a16z draws a line through the enterprise AI map: the Yellow Brick Road (what the labs already own) versus the Rest of Oz (vertical complexity that compounds). Here is where the semantic governance layer fits, and why it is a system, not a tool.
"The model is fungible underneath; the system of work is not."
Joe Schmidt, Andreessen Horowitz, May 2026
The question is not whether the app layer is dead. It is whether you are building on the path the labs are already walking.
Joe Schmidt's May 2026 piece is organized around a spatial metaphor that earns its weight. The Yellow Brick Road is the obvious path through the enterprise AI application layer: take a high-performing model, plug in off-the-shelf connectors for the standard enterprise systems (Google Drive, Slack, Salesforce, Notion, GitHub), add an agentic orchestration layer on top, and ship. The problem, as Schmidt points out, is that this is exactly what OpenAI and Anthropic are already building.
The clearest evidence that the labs share Schmidt's read is buried in a footnote of the piece but deserves attention: both OpenAI and Anthropic have announced large forward-deployed joint ventures to build entire companies around configuring and customizing their models for enterprise. Schmidt's logic is clean: you do not pour billions into those programs if you believe the next model release will close the gap automatically. The labs are telling the market that horizontal product plus better model does not solve complex enterprise workflows. They are standing up services businesses to close the gap the model alone cannot.
The Rest of Oz, in Schmidt's taxonomy, is everything that does not improve linearly with pre-training compute. These are multi-step, multi-system workflows where the value comes not from the model's raw capability but from the software scaffolding around it: domain-specific agents, legacy system integrations, deterministic output requirements, vertical compliance regimes, and tribal knowledge that has never appeared in any training set. He walks through sales and insurance as detailed examples. Both make the same point from different angles: the intelligence lives in the workflow, and the workflow can only be learned from the inside.
"You don't pour billions into those programs if you think the next model release is going to take care of it."
Joe Schmidt, Andreessen Horowitz, May 2026
Schmidt's framing matters for the enterprise buyer as much as for the founder or investor. If you are a CIO deciding which AI vendors to consolidate on, the system/tool distinction is the most consequential architectural question you face. The tool delivers intelligence on top of a system you already run. The system owns data capture, workflow execution, and governance. When a lab releases a competing product, you switch the tool. You do not switch the system. The vendors worth committing to are the ones whose removal would require reorganizing how the actual work happens, not just swapping one intelligence layer for another.
Two architectures are forming in the enterprise AI market. One sits on the Yellow Brick Road the labs already own. The other builds the system of work the labs cannot reach without becoming a hundred different verticals at once.
Insurance underwriting and sales qualification do not improve linearly with pre-training compute. The knowledge that makes them work lives in production workflows, and only production exposure inside those workflows can provide it.
The FurtherAI and 11x examples Schmidt includes are the most instructive part of the piece, and they make the same point from opposite ends of the enterprise.
Aman Gour, CEO of FurtherAI, describes what insurance carriers mean when they talk about AI-automated underwriting. Two carriers can run a submission through what looks like identical steps: submission, review, quote, bind. What separates them is everything inside that path: which risks get escalated, which loss signals carry weight, which appetite rule wins when two conflict, when a human must sign off, which external data gets called, and how the final decision gets documented. None of that logic lives in a clean rules engine. It is distributed across SOPs, manager reviews, underwriting philosophy, and years of operational experience. Much of it has never been written down at all, let alone in a form that could end up in a model's training set.
Gour's conclusion from building this in production is precise: every escalation is a signal, every exception is feedback, every human correction reveals where the runbook was incomplete. Over time, the workflow stops being a script and becomes the carrier's operating memory. A lab cannot build that by pointing a general-purpose agent at a carrier's data on day one. It accumulates from running the workflow in production, many thousands of times, with accountability for each outcome.
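To make the loop concrete, here is a minimal Python sketch, assuming each automated decision can be traced to a named runbook rule and that human reversals are logged. The `RunbookLedger` class, the rule names, and the thresholds are hypothetical, not FurtherAI's implementation; the point is only that overrides, counted per rule, show where the runbook is incomplete.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class RunbookLedger:
    """Hypothetical ledger that turns production outcomes into runbook feedback."""
    fired: Counter = field(default_factory=Counter)       # times each rule decided a case
    overridden: Counter = field(default_factory=Counter)  # times a human reversed that decision

    def record(self, rule_id: str, human_overrode: bool) -> None:
        self.fired[rule_id] += 1
        if human_overrode:
            self.overridden[rule_id] += 1

    def gaps(self, min_decisions: int = 20, max_override_rate: float = 0.1) -> list[str]:
        """Rules humans correct often enough that the runbook is probably incomplete there."""
        return sorted(
            rule_id
            for rule_id, n in self.fired.items()
            if n >= min_decisions and self.overridden[rule_id] / n > max_override_rate
        )


ledger = RunbookLedger()
for i in range(30):
    ledger.record("escalate_coastal_property", human_overrode=(i % 3 == 0))  # 10 of 30 reversed
    ledger.record("auto_quote_small_fleet", human_overrode=False)
print(ledger.gaps())  # ['escalate_coastal_property']
```

The minimum-decision floor keeps a rule from being flagged on a handful of cases; in a real workflow each flagged rule would go back to the underwriters who own it.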

"The workflow you ship on day one is not the moat. The loop that production usage creates over time is."
Aman Gour, CEO of FurtherAI, via Joe Schmidt, Andreessen Horowitz, May 2026
11x CEO Prabhav Jain puts numbers to the same dynamic on the sales side. His agents' positive reply rates have increased 4x in recent months. The improvement did not come from a better base model. It came from continuously adapting agents to a changing market: buyer sensitivity to AI-written emails shifts every few months, and the team has built the workflow surfaces that detect and respond to that shift in near real-time. That is application company work, and it cannot be replicated by a horizontal platform without the same production exposure and vertical-specific workflow engineering.
Jain's framing of the engineering split is worth keeping in mind: roughly half of any real GTM workflow is non-agentic, deterministic software. No agent writes the logic for deduplicating leads across messy CRM data, matching domains against subsidiary hierarchies, or detecting stale matching fields before the system cold-pitches a current customer's CRO. Labs have no structural advantage on that half. The other half is agentic, but it still requires tuning, training, and constraining against the specific domain, persona, and outcome the workflow targets. Domain knowledge that does not exist in general training data has to be built from the ground up and injected at the right moment in the workflow.
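A minimal sketch of that deterministic half, covering only contact deduplication and subsidiary matching (not stale-field detection). It assumes a hand-maintained subsidiary map and treats the email host as the account domain, which real corporate hierarchies will not always satisfy; `PARENT_DOMAIN`, `qualify`, and the sample leads are hypothetical, not 11x's code.

```python
from urllib.parse import urlparse

# Hypothetical subsidiary hierarchy: child domain -> parent (customer-of-record) domain.
PARENT_DOMAIN = {
    "acme-logistics.com": "acme.com",
    "acmeinsure.com": "acme.com",
}


def normalize_domain(raw: str) -> str:
    """Reduce an email address, URL, or bare domain to a lowercase host without 'www.'."""
    raw = raw.strip().lower()
    if "@" in raw:
        raw = raw.split("@", 1)[1]  # keep the part after the @ in an email address
    if "://" not in raw:
        raw = "http://" + raw       # urlparse only fills .hostname when a '//' netloc is present
    host = urlparse(raw).hostname or ""
    return host.removeprefix("www.")


def root_account(raw: str) -> str:
    """Map a domain to its parent account if it is a known subsidiary."""
    domain = normalize_domain(raw)
    return PARENT_DOMAIN.get(domain, domain)


def qualify(leads: list[dict], customer_roots: set[str]) -> list[dict]:
    """Drop duplicate contacts and anyone inside an existing customer's corporate family."""
    seen_emails: set[str] = set()
    kept = []
    for lead in leads:
        email = lead["email"].strip().lower()
        if email in seen_emails:
            continue  # same person entered twice with different casing or whitespace
        seen_emails.add(email)
        if root_account(email) in customer_roots:
            continue  # subsidiary of a current customer: do not cold-pitch
        kept.append(lead)
    return kept


leads = [
    {"name": "Dana", "email": "dana@acme-logistics.com"},  # subsidiary of customer acme.com
    {"name": "Lee", "email": "lee@globex.com"},
    {"name": "Lee", "email": " LEE@Globex.com"},           # duplicate CRM record
]
result = qualify(leads, customer_roots={"acme.com"})
print([lead["name"] for lead in result])  # ['Lee']
```

Nothing here needs a model, which is the point Jain makes: this logic is exact, testable, and specific to how one vertical's data is messy.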
The system versus tool distinction is the most consequential architectural question an enterprise AI company faces, and the answer determines whether the labs can replace you with a model release.
Schmidt gives three tests for deciding whether you are building in the Rest of Oz. The tools-and-steps test asks how many steps the workflow takes and how complex the software underneath the model layer must be. The system test asks whether you own the workflow end-to-end (data capture, governance, records of what got done) or whether you add intelligence to a workflow the customer already runs elsewhere. The P&L test asks whether your customer tracks your performance against benchmark scores or against their own business outcomes: deals closed, contracts redlined correctly, policies bound on the right risk.
The system/tool distinction carries the most weight. Schmidt's diagnostic is simple: ask whether a lab releasing a product that competes directly with yours would cause your customer to switch. If yes, you are a tool, even at high ACV. If no, you own something the lab cannot take without displacing how the work actually happens. The question is not about feature overlap. It is about whether the customer's operational dependence is on you as the orchestration layer, or merely on the intelligence you provide on top of a system they could route around.
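One way to make the three tests operational is to encode them as a checklist. This is a hypothetical scoring sketch, not a16z's rubric: the `Assessment` fields are our reading of the tests, the step count is a crude stand-in for workflow complexity, and the five-step threshold is arbitrary.

```python
from dataclasses import dataclass


@dataclass
class Assessment:
    workflow_steps: int                  # tools-and-steps test (crude proxy: step count only)
    owns_data_capture: bool              # system test inputs
    owns_workflow_execution: bool
    owns_governance: bool
    customer_would_switch_to_lab: bool   # "would a competing lab release cause a switch?"
    measured_on_business_outcomes: bool  # P&L test: deals closed, policies bound, not benchmarks


def rest_of_oz_tests(a: Assessment, min_steps: int = 5) -> dict[str, bool]:
    """Score the three tests; min_steps is an illustrative threshold, not Schmidt's."""
    owns_system = a.owns_data_capture and a.owns_workflow_execution and a.owns_governance
    return {
        "tools_and_steps": a.workflow_steps >= min_steps,
        "system": owns_system and not a.customer_would_switch_to_lab,
        "p_and_l": a.measured_on_business_outcomes,
    }


def verdict(a: Assessment) -> str:
    """Pass all three and you own a system; fail any one and you are a tool."""
    return "system" if all(rest_of_oz_tests(a).values()) else "tool"


copilot = Assessment(3, False, False, False, True, False)
underwriting = Assessment(12, True, True, True, False, True)
print(verdict(copilot), verdict(underwriting))  # tool system
```

The `all(...)` mirrors the framing that failing any single test leaves you with a feature rather than a system.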
Governance is one of the four Rest of Oz defenses Schmidt names, and in regulated verticals it is the least substitutable. Legal, healthcare, insurance, and financial services require a counterparty that contractually absorbs compliance complexity for the specific domain: FRCP and bar rules, HIPAA, SEC and FINRA requirements, state insurance regulations. A horizontal platform cannot take on that obligation across every vertical simultaneously. It is structurally the same trade-off that keeps the labs on the Yellow Brick Road: you can be everywhere for everyone, or you can be great at one specific workflow. Not both at the same time.
"The best agent businesses are going to need to execute like hedge funds, winning on alpha measured in customer P&L, not in benchmark scores."
Joe Schmidt, Andreessen Horowitz, May 2026
Schmidt's cost optimization point matters for the enterprise buyer in concrete terms. Running every query through the most capable frontier model is the fastest path to negative gross margins for application companies, and the most expensive path to AI adoption for enterprise buyers. Rest of Oz companies route across tiers: frontier models for the hardest judgment calls, mid-tier models for the bulk, fine-tuned or custom models where production exposure has earned the right to use them. That routing depends on knowing what each sub-task actually requires, and that knowledge lives in the vertical-specific software underneath the agent layer. The lab sells you the floor. The application company with the right architecture sells you the lowest cost for the specific level of intelligence each step of the workflow needs.
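The routing idea can be sketched as "use the cheapest tier trusted with this sub-task." Everything here is an assumption for illustration: the tier names, the relative costs, the 1-10 difficulty scale, and the rule that a fine-tuned model is used only once production exposure has validated it for that step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    cost_per_call: float  # illustrative relative cost units, not real pricing
    max_difficulty: int   # hardest sub-task (assumed 1-10 scale) this tier is trusted with


# Ordered cheapest first. Names, costs, and thresholds are hypothetical.
TIERS = [
    Tier("fine-tuned", 0.1, 4),
    Tier("mid-tier", 1.0, 7),
    Tier("frontier", 10.0, 10),
]


def route(difficulty: int, fine_tuned_earned: bool) -> Tier:
    """Return the cheapest tier trusted with a sub-task of this difficulty."""
    for tier in TIERS:
        if tier.name == "fine-tuned" and not fine_tuned_earned:
            continue  # custom model only where production exposure has validated it
        if difficulty <= tier.max_difficulty:
            return tier
    raise ValueError(f"difficulty {difficulty} exceeds every tier")


# Hypothetical underwriting sub-tasks: (name, difficulty, fine-tuned model validated?)
steps = [
    ("parse submission", 2, True),
    ("extract loss runs", 5, False),
    ("resolve conflicting appetite rules", 9, False),
]
plan = {name: route(d, earned) for name, d, earned in steps}
routed_cost = sum(t.cost_per_call for t in plan.values())
frontier_cost = TIERS[-1].cost_per_call * len(steps)
print({name: t.name for name, t in plan.items()})
print(routed_cost, frontier_cost)
```

The hard part in practice is the difficulty score for each step, which is exactly the vertical-specific knowledge the paragraph describes; the routing loop itself is trivial.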
What connects all four defenses (data flywheels, governance, model routing, and migration absorption) is a single underlying principle Schmidt states cleanly at the end of the piece: the system of work that owns data capture, workflow execution, and governance becomes the layer through which every new model generation gets delivered to the customer. Model generations are fungible. The system of work is not. Application companies that own it will integrate whatever model lands next. Those that do not will be replaced by it.
a16z gives three concrete tests. Pass all three and a lab cannot replace you with a model release. Fail any one and the moat you think you have is a feature, not a system.
Book a technical walkthrough. We will connect LazyFox to your data stack and show exactly what your agents are running through.