In Response to Joe Schmidt, Andreessen Horowitz

The System of Work Is the Moat.

a16z draws a line through the enterprise AI map: the Yellow Brick Road (what the labs already own) versus the Rest of Oz (vertical complexity that compounds). Here is where the semantic governance layer fits, and why it is a system, not a tool.

Alexander Braun · May 2026
"The model is fungible underneath; the system of work is not."
Joe Schmidt, Andreessen Horowitz · May 2026
½ the work
Roughly half of any real enterprise workflow is non-agentic deterministic software, where labs hold zero structural advantage over a focused application company. The other half is agentic, but still requires domain-specific tuning, training, and constraints built from production exposure in the specific vertical.
Prabhav Jain, CEO 11x · via Joe Schmidt, Andreessen Horowitz · May 2026
4×
11x AI's positive reply rates in sales outreach have increased 4x over recent months, generating hundreds of millions in pipeline. The improvement came from continuously adapting agents to a shifting market, not from a better foundation model. That adaptation compounds only from inside the specific workflow.
Prabhav Jain, CEO 11x · via Andreessen Horowitz, May 2026
3 tests
a16z gives three tests for whether you are in the Rest of Oz: how complex the software under the model is (tools-and-steps test), whether you own the workflow end-to-end or layer on top of it (system test), and whether your customer measures you against benchmark scores or their own P&L outcomes (P&L test). All three reward vertical depth.
Joe Schmidt, Andreessen Horowitz · May 2026
1×
Business context indexed once, governed in code the enterprise owns. Every agent query runs from that governed layer, not from the model re-ingesting organizational knowledge per call. That is the semantic governance layer as a system of work: not a tool on top of a workflow, but the place where definitions, tribal knowledge, and governance rules compound over time.
LazyFox architecture · Core design principle
Executive Summary
Key Finding
Enterprise AI application companies building on the Yellow Brick Road (a generic model, off-the-shelf connectors, and a horizontal orchestration layer) are walking toward displacement. OpenAI and Anthropic are building exactly this, with better margins, model ownership, and scale distribution. Schmidt is explicit: if your playbook runs the same connectors, adds no vertical software depth, and lacks domain-specific context, you are building a product the labs already ship.
Root Cause
The labs' structural advantage is pre-training compute, not vertical depth. Complex enterprise workflows do not improve linearly with raw model capability. The intelligence in insurance underwriting, legal redlining, or sales qualification lives in the workflow itself: which risks get escalated, which exceptions matter, when a human must approve. That knowledge is not in any training set, and production exposure inside the specific workflow is the only way to acquire it.
Market Recommendation
Build in the Rest of Oz: own the system of work end-to-end, not a tool that sits on top of it. Four compounding defenses are available to application companies that commit to vertical depth: data and learning flywheels, governance as a compliance control plane, cost optimization through model routing across tiers, and absorbing model migration complexity so the customer does not have to. "The model is fungible underneath; the system of work is not."
How LazyFox Delivers on This
The Semantic Layer as System of Work
Schmidt's system/tool distinction maps precisely to LazyFox's architecture. LazyFox is not a layer of intelligence sitting on top of a BI tool the customer already runs. It is the system enterprise data agents run through: governed metric definitions, sourcing rules, and tribal knowledge in the logical and contextual layers that agents query before touching any warehouse. The customer depends on LazyFox as the orchestration layer for meaning, not as an optional add-on.
Compounding Context as Data Flywheel
Schmidt describes two stacked flywheels: across-customer pattern recognition and within-customer operational memory. LazyFox's contextual layer captures both. Each new metric definition indexed, each tribal knowledge rule encoded, each semantic drift event detected adds to an institutional memory that compounds. The 167% net revenue retention LazyFox has achieved with early enterprise customers reflects this dynamic: the value of the semantic layer grows with every new definition captured.
Governance as the Control Plane for AI
a16z names governance as one of the four Rest of Oz defenses: the control plane for permissions, auditing, and what the agent is allowed to do. LazyFox's logical layer is built on this architecture: versioned metric definitions, audit logs of every governed answer, drift detection when upstream data changes, and access controls by agent role. That control plane lives in code the enterprise owns, not in a model provider's weights or cloud tenant.
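As a rough illustration of what such a control plane involves, the sketch below keeps versioned metric definitions, checks the agent's role before answering, and writes an audit entry for every lookup. It is not LazyFox's implementation: `GovernedLayer`, `MetricDefinition`, the role names, and the SQL fragments are all hypothetical, and drift detection is left out for brevity.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class MetricDefinition:
    """One immutable version of a governed metric (hypothetical shape)."""
    name: str
    version: int
    sql: str                       # governed expression agents must use
    allowed_roles: frozenset[str]  # agent roles permitted to query it


@dataclass
class GovernedLayer:
    """Toy control plane: versioned definitions, role checks, audit log."""
    metrics: dict[str, list[MetricDefinition]] = field(default_factory=dict)
    audit_log: list[dict] = field(default_factory=list)

    def publish(self, name: str, sql: str, allowed_roles: set[str]) -> MetricDefinition:
        # Each publish appends a new version; older versions are kept.
        versions = self.metrics.setdefault(name, [])
        definition = MetricDefinition(name, len(versions) + 1, sql, frozenset(allowed_roles))
        versions.append(definition)
        return definition

    def resolve(self, name: str, agent_role: str) -> MetricDefinition:
        # Return the latest version, recording every attempt, allowed or not.
        latest = self.metrics[name][-1]
        allowed = agent_role in latest.allowed_roles
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "metric": name,
            "version": latest.version,
            "agent_role": agent_role,
            "allowed": allowed,
        })
        if not allowed:
            raise PermissionError(f"{agent_role} may not query {name}")
        return latest


# Illustrative usage with made-up metric and role names.
layer = GovernedLayer()
layer.publish("net_revenue", "SUM(amount) - SUM(refunds)", {"finance_agent"})
layer.publish("net_revenue", "SUM(amount) - SUM(refunds) - SUM(chargebacks)", {"finance_agent"})

definition = layer.resolve("net_revenue", "finance_agent")
print(definition.version, definition.sql)
```

The point of the sketch is that the definition, the permission check, and the audit trail sit in ordinary code the enterprise can version and review, independent of whichever model the agent happens to call.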
Token Efficiency & Vendor Independence
Business context is indexed once. Every subsequent agent query runs directly from governed code; no re-tokenization per call, no organizational knowledge migrating into a model provider's weights. When the enterprise upgrades models, the semantic layer persists. LazyFox absorbs the migration complexity Schmidt describes, routing agents through the best model for each sub-task while the system of work stays constant underneath.
The Article

Finding 1

What a16z Is Actually Arguing

The question is not whether the app layer is dead. It is whether you are building on the path the labs are already walking.

Joe Schmidt's piece in May 2026 is organized around a spatial metaphor that earns its weight. The Yellow Brick Road is the obvious path through the enterprise AI application layer: take a high-performing model, plug in off-the-shelf connectors for the standard enterprise systems (Google Drive, Slack, Salesforce, Notion, GitHub), add an agentic orchestration layer on top, and ship. The problem, as Schmidt points out with precision, is that this is exactly what OpenAI and Anthropic are already building.

The clearest evidence that the labs share Schmidt's read is buried in a footnote of the piece but deserves attention: both OpenAI and Anthropic have announced large forward-deployed joint ventures to build entire companies around configuring and customizing their models for enterprise. Schmidt's logic is clean: you do not pour billions into those programs if you believe the next model release will close the gap automatically. The labs are telling the market that horizontal product plus better model does not solve complex enterprise workflows. They are standing up services businesses to close what the model alone cannot reach.

The Rest of Oz, in Schmidt's taxonomy, is everything that does not improve linearly with pre-training compute. These are multi-step, multi-system workflows where the value comes not from the model's raw capability but from the software scaffolding around it: domain-specific agents, legacy system integrations, deterministic output requirements, vertical compliance regimes, and tribal knowledge that has never appeared in any training set. He walks through sales and insurance as detailed examples. Both make the same point from different angles: the intelligence lives in the workflow, and the workflow can only be learned from the inside.

"You don't pour billions into those programs if you think the next model release is going to take care of it."

Joe Schmidt, Andreessen Horowitz, May 2026

Schmidt's framing matters for the enterprise buyer as much as for the founder or investor. If you are a CIO deciding which AI vendors to consolidate on, the system/tool distinction is the most consequential architectural question you face. The tool delivers intelligence on top of a system you already run. The system owns data capture, workflow execution, and governance. When a lab releases a competing product, you switch the tool. You do not switch the system. The vendors worth committing to are the ones whose removal would require reorganizing how the actual work happens, not just swapping one intelligence layer for another.

Same inputs. Very different outcomes.

Two architectures are forming in the enterprise AI market. One sits on the Yellow Brick Road the labs already own. The other builds the system of work the labs cannot reach without becoming a hundred different verticals at once.

Yellow Brick Road
Foundation model (API)
OpenAI / Anthropic / Gemini. Capability improves with compute. Labs own the roadmap.
Off-the-shelf connectors
Google Drive, Slack, Salesforce, Notion, GitHub. Same connectors every horizontal tool ships.
Horizontal orchestration layer
No vertical domain software. No governance per use case. No tribal knowledge captured.
Labs already ship this
Labs own the model and the distribution. When Codex or Cowork ships the same workflow, there is no moat below the orchestration layer to defend.
Rest of Oz
Model routing across tiers
Frontier for hardest judgment calls, mid-tier for the bulk, fine-tuned models where production has earned them. Vendor-agnostic by design.
Domain software + governance control plane
Multi-step workflow software, vertical-specific agents, compliance logic per use case, audit logs, and permissions. Years of focused engineering the labs do not have.
Data flywheel: within-customer + across-customer
Every escalation, exception, and human correction feeds the workflow's operational memory. Pattern recognition compounds across similar problem types seen in production.
System the customer depends on
New model generations are delivered through the system of work. The model is fungible. The system is not. Removing it requires reorganizing how the work happens.
The Yellow Brick Road collapses when labs ship the same thing. Off-the-shelf connectors plus a foundation model is the exact architecture Cowork and Codex are built on. No distribution advantage. No moat below the model. When the lab releases a competitive product, the customer switches.
The Rest of Oz compounds with production exposure. The workflow you ship on day one is not the defense. The governance layer, the data flywheel, and the operational memory built from thousands of production runs are what the next entrant cannot replicate by spinning up a fresh agent.
Finding 2

The Intelligence Lives in the Workflow, Not the Model

Insurance underwriting and sales qualification do not improve linearly with pre-training compute. The knowledge that makes them work lives in production workflows, and only production exposure inside those workflows can provide it.

The FurtherAI and 11x examples Schmidt includes are the most instructive part of the piece, and they make the same point from opposite ends of the enterprise.

Aman Gour, CEO of FurtherAI, describes what insurance carriers mean when they talk about AI-automated underwriting. Two carriers can run a submission through what looks like identical steps: submission, review, quote, bind. What separates them is everything inside that path: which risks get escalated, which loss signals carry weight, which appetite rule wins when two conflict, when a human must sign off, which external data gets called, and how the final decision gets documented. None of that logic lives in a clean rules engine. It is distributed across SOPs, manager reviews, underwriting philosophy, and years of operational experience. Much of it has never been written down in any form a model could read from a training set.

Gour's conclusion from building this in production is precise: every escalation is a signal, every exception is feedback, every human correction reveals where the runbook was incomplete. Over time, the workflow stops being a script and becomes the carrier's operating memory. A lab cannot build that by pointing a general-purpose agent at a carrier's data on day one. It accumulates from running the workflow in production, many thousands of times, with accountability for each outcome.

"The workflow you ship on day one is not the moat. The loop that production usage creates over time is."

Aman Gour, CEO FurtherAI, via Joe Schmidt, Andreessen Horowitz, May 2026

11x CEO Prabhav Jain puts numbers to the same dynamic on the sales side. His agents' positive reply rates have increased 4x in recent months. The improvement did not come from a better base model. It came from continuously adapting agents to a changing market: buyer sensitivity to AI-written emails shifts every few months, and the team has built the workflow surfaces that detect and respond to that shift in near real-time. That is application company work, and it cannot be replicated by a horizontal platform without the same production exposure and vertical-specific workflow engineering.

Jain's framing on the engineering split is worth holding: roughly half of any real GTM workflow is non-agentic deterministic software. No agent writes the logic for lead deduplication across messy CRM data, for domain matching against subsidiary hierarchies, or for detecting stale matching fields before an outreach agent cold-pitches a current customer's CRO. Labs have no structural advantage on that half. The other half is agentic, but it still requires tuning, training, and constraining against the specific domain, persona, and outcome the workflow targets. Domain knowledge that does not exist in general training data has to be constructed from the ground up and injected at the right moment in the workflow.
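To make the deterministic half concrete, here is a minimal sketch of the kind of lead hygiene described above: normalizing domains, rolling subsidiaries up to a parent account, dropping duplicate contacts, and suppressing anyone who belongs to a current customer. The record shape, the `parent_of` mapping, and all company names are assumptions made for illustration; none of this is 11x's actual code, and stale-field detection is not covered.

```python
from urllib.parse import urlparse


def normalize_domain(raw: str) -> str:
    """Reduce a URL or email address to a bare lowercase domain."""
    value = raw.strip().lower()
    if "@" in value:
        value = value.rsplit("@", 1)[1]   # keep only the part after the @
    if "//" not in value:
        value = "//" + value              # lets urlparse treat the text as a host
    host = urlparse(value).hostname or ""
    return host.removeprefix("www.")


def account_key(raw: str, parent_of: dict[str, str]) -> str:
    """Map a domain to its parent account when a subsidiary link is known."""
    domain = normalize_domain(raw)
    return parent_of.get(domain, domain)


def eligible_leads(leads: list[dict], customer_domains: list[str],
                   parent_of: dict[str, str]) -> list[dict]:
    """Drop duplicate contacts and anyone at a current customer or its subsidiary."""
    customers = {account_key(d, parent_of) for d in customer_domains}
    seen_contacts: set[str] = set()
    keep = []
    for lead in leads:
        contact = lead["email"].strip().lower()
        if account_key(contact, parent_of) in customers or contact in seen_contacts:
            continue
        seen_contacts.add(contact)
        keep.append(lead)
    return keep


# Hypothetical data: acme-labs.io is a subsidiary of current customer acme.com.
parent_of = {"acme-labs.io": "acme.com"}
leads = [
    {"email": "cro@acme-labs.io"},
    {"email": "Pat@Globex.com"},
    {"email": "pat@globex.com "},   # same contact, messy casing and whitespace
    {"email": "lee@initech.com"},
]
result = eligible_leads(leads, ["https://www.acme.com"], parent_of)
print([lead["email"] for lead in result])
```

Nothing here needs a model. It is plain, testable software, which is exactly why a lab holds no structural advantage on it.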

Finding 3

Own the System of Work, Not the Tool on Top

The system versus tool distinction is the most consequential architectural question an enterprise AI company faces, and the answer determines whether the labs can replace you with a model release.

Schmidt gives three tests for deciding whether you are building in the Rest of Oz. The tools-and-steps test asks how many steps the workflow takes and how complex the software underneath the model layer must be. The system test asks whether you own the workflow end-to-end (data capture, governance, records of what got done) or whether you add intelligence to a workflow the customer already runs elsewhere. The P&L test asks whether your customer tracks your performance against benchmark scores or against their own business outcomes: deals closed, contracts redlined correctly, policies bound on the right risk.

The system/tool distinction carries the most weight. Schmidt's diagnostic is worth restating: ask whether a lab releasing a product that competes directly with yours would cause your customer to switch. If yes, you are a tool, even at high ACV. If no, you own something the lab cannot take without displacing how the work actually happens. The question is not about feature overlap. It is about whether the customer's operational dependence is on you as the orchestration layer, or merely on the intelligence you provide on top of a system they could route around.

Governance is one of the four Rest of Oz defenses Schmidt names, and in regulated verticals it is the least substitutable. Legal, healthcare, insurance, and financial services require a counterparty that contractually absorbs compliance complexity for the specific domain: FRCP and bar rules, HIPAA, SEC and FINRA requirements, state insurance regulations. A horizontal platform cannot take on that obligation across every vertical simultaneously. It is structurally the same trade-off that keeps the labs on the Yellow Brick Road: you can be everywhere for everyone, or you can be great at one specific workflow. Not both at the same time.

"The best agent businesses are going to need to execute like hedge funds, winning on alpha measured in customer P&L, not in benchmark scores."

Joe Schmidt, Andreessen Horowitz, May 2026

The cost optimization point Schmidt makes matters for the enterprise buyer in concrete terms. Running every query through the most capable frontier model is the fastest path to negative gross margins for application companies, and the highest-cost path to AI adoption for enterprise buyers. Rest of Oz companies route across tiers: frontier for the hardest judgment calls, mid-tier for the bulk, fine-tuned or custom models where production exposure has earned the right to use them. That routing is only possible with a deep understanding of what each sub-task actually requires, and that understanding comes from the vertical-specific software underneath the agent layer. The lab sells you the floor. The application company with the right architecture sells you the lowest cost for the specific level of intelligence each step of the workflow needs.
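The routing idea can be sketched in a few lines. The tier names follow the paragraph above, but the per-call costs, the two routing flags, and the example underwriting sub-tasks are invented for illustration; a real router would decide from production signals rather than hand-set booleans.

```python
from dataclasses import dataclass

# Hypothetical per-call costs in arbitrary units, for illustration only.
TIER_COST = {"fine_tuned": 1, "mid": 5, "frontier": 40}


@dataclass(frozen=True)
class SubTask:
    name: str
    needs_judgment: bool    # ambiguous, high-stakes call
    has_tuned_model: bool   # production exposure has earned a fine-tuned model


def route(task: SubTask) -> str:
    """Pick the cheapest tier that this sketch considers adequate for the sub-task."""
    if task.needs_judgment:
        return "frontier"
    if task.has_tuned_model:
        return "fine_tuned"
    return "mid"


def workflow_cost(tasks: list[SubTask]) -> int:
    """Total cost when each sub-task goes to its routed tier."""
    return sum(TIER_COST[route(t)] for t in tasks)


# A made-up four-step workflow.
workflow = [
    SubTask("extract_fields", needs_judgment=False, has_tuned_model=True),
    SubTask("classify_risk", needs_judgment=False, has_tuned_model=False),
    SubTask("check_appetite", needs_judgment=False, has_tuned_model=False),
    SubTask("escalation_decision", needs_judgment=True, has_tuned_model=False),
]

routed = workflow_cost(workflow)
all_frontier = TIER_COST["frontier"] * len(workflow)
print(routed, all_frontier)
```

Under these assumed costs, routing sends only one of the four steps to the frontier tier. The savings depend entirely on knowing which steps need judgment, which is the vertical knowledge the paragraph above describes.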

What connects all four defenses (data flywheels, governance, model routing, and migration absorption) is a single underlying principle Schmidt states cleanly at the end of the piece: the system of work that owns data capture, workflow execution, and governance becomes the layer through which every new model generation gets delivered to the customer. Model generations are fungible. The system of work is not. Application companies that own it will integrate whatever model lands next. Those that do not will be replaced by it.

Are you in the Rest of Oz?

a16z gives three concrete tests. Pass all three and a lab cannot replace you with a model release. Fail any one and the moat you think you have is a feature, not a system.

01
Tools & Steps Test
How many steps, and how complex is the software under the model?
Yellow Brick Road
One step, one tool, forgiving outcome. User re-asks if the answer is wrong. Labs can ship this in a product update.
Rest of Oz
Dozens of steps, many purpose-built tools, output that has to clear compliance review or be argued in a regulated context. Takes years of focused engineering to build correctly.
02
System Test
Do you own the workflow, or do you sit on top of one the customer already has?
Yellow Brick Road
Intelligence added to a workflow the customer runs via another system. Customer could swap you for a lab product without reorganizing how the work happens.
Rest of Oz
You own data capture, execution, and governance. The customer points to your system when describing how the actual work gets done. Removing you requires replacing the operational infrastructure.
03
P&L Test
Does your customer care about benchmark scores or specific business outcomes?
Yellow Brick Road
Customer pays for generic capability. They could get equivalent value from a Claude or Codex seat. ACV may be high, but the dependency is on the intelligence, not the system.
Rest of Oz
Customer tracks whether the agent closed the deal, redlined the contract correctly, or bound the right policy. Performance is measured against their P&L, not a benchmark leaderboard.
"The model is fungible underneath; the system of work is not."
Joe Schmidt, Andreessen Horowitz · May 2026

See the semantic governance layer in practice.

Book a technical walkthrough. We will connect LazyFox to your data stack and show exactly what your agents are running through.