A taxonomy for CFOs who can’t tell which fix to buy, and why buying the wrong one means paying twice without solving the original problem.
In Part 1, we established that Karpathy’s wiki architecture solves a content-layer problem, not the semantic-layer problem most enterprises are actually paying for. Here we lay out all three layers and what each requires.
Every CFO has noticed the AI bill. Not every CFO has noticed that the AI bill is actually three different bills, with three different drivers, that look identical on the invoice. The reference architectures going around online conflate all three. Buying the wrong fix for your actual bottleneck is how companies end up paying twice and still having the original problem.
This is the taxonomy. Three layers, three waste modes, three different shapes of solution.
When an AI agent answers a question about prose-heavy material (a research paper, a legal contract, a 40-page policy doc), most of the tokens go into re-reading that material on every query. The waste mode is "I have already read this PDF nine times today, and the tenth person is going to ask the same thing about it tomorrow."
This is the layer Andrej Karpathy is solving with his LLM-compiled wiki. The architecture is to read each source once, compile it into a dense, deduplicated, cross-linked Markdown corpus, and let subsequent queries operate on the compiled artifact instead of the raw files. For workloads that genuinely run on prose (research labs, law firms, policy shops, due-diligence teams), the savings are real and large.
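To make the read-once idea concrete, here is a minimal sketch in Python. It is not Karpathy's implementation: `CompiledCorpus`, `page_for`, and `fake_compile` are hypothetical names, the compile step is a stand-in for an LLM call, and a real wiki would also deduplicate and cross-link pages. It only shows the cost shape: the expensive read happens once per source, not once per query.

```python
import hashlib
from typing import Callable, Dict


class CompiledCorpus:
    """Caches one compiled page per source, keyed by a hash of its content."""

    def __init__(self, compile_fn: Callable[[str], str]) -> None:
        self._compile_fn = compile_fn      # hypothetical: would wrap an LLM call
        self._pages: Dict[str, str] = {}   # content hash -> compiled page
        self.compile_calls = 0             # how many times the expensive read ran

    def page_for(self, raw_text: str) -> str:
        key = hashlib.sha256(raw_text.encode("utf-8")).hexdigest()
        if key not in self._pages:
            # First time this exact content is seen: pay the compile cost once.
            self.compile_calls += 1
            self._pages[key] = self._compile_fn(raw_text)
        return self._pages[key]


def fake_compile(raw_text: str) -> str:
    # Stand-in for the LLM compile step (assumption): keep the first sentence.
    return raw_text.split(".")[0].strip() + "."


corpus = CompiledCorpus(fake_compile)
policy = "Refunds over 500 USD need manager approval. A long appendix follows."

# Ten queries touch the same document; only the first triggers a compile.
answers = [corpus.page_for(policy) for _ in range(10)]
```

Because the cache key is a content hash, any edit to a source forces a recompile of that source, which is where the maintenance cost of a wiki comes from.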
Two warnings even when you have a content-layer problem: the wiki has to be maintained (at enterprise scale, recompile and lint costs scale linearly with corpus size), and it does nothing for queries that aren’t really about prose.
Most enterprise AI queries are not "summarize this document." They’re "compute this number under these conditions."
Every one of these depends on a definition. Revenue. Open. Pipeline. Coverage. In most enterprises these terms aren’t ambiguous because nobody thought about them. They’re ambiguous because each business unit got them right for its own context. Finance recognizes revenue. Sales books contracts. Both are correct. The conflict is real and load-bearing.
When an AI agent runs into this, it does not say "definitions conflict." It guesses. It retries. It hedges. It quietly chooses whichever interpretation the prompt nudged it toward. The token bill is the visible cost. The wrong number in the next board deck is the invisible one.
This is the semantic layer. It is where most enterprise AI token spend actually lives, and the layer most teams don’t recognize as a layer at all. They call it "data quality," or "context," or "AI accuracy," and try to solve it with prompting tricks, longer context windows, or another retrieval system. None of that works for long, because none of it carries the version history of how Finance defined ARR last quarter.
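One way to picture what "carrying the version history" means is a registry of definitions that are scoped to an owner and dated. The sketch below is illustrative only: `Definition`, `DefinitionRegistry`, and every definition text and date in it are made-up assumptions, not any vendor's schema or anyone's real ARR policy.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Definition:
    term: str
    owner: str             # business unit that owns this meaning
    text: str
    effective_from: date   # first day this version applies


class DefinitionRegistry:
    """Keeps every version of every definition instead of overwriting."""

    def __init__(self) -> None:
        self._defs: List[Definition] = []

    def add(self, definition: Definition) -> None:
        self._defs.append(definition)

    def resolve(self, term: str, owner: str, as_of: date) -> Optional[Definition]:
        # Latest version for this owner that was already in force on `as_of`.
        candidates = [
            d for d in self._defs
            if d.term == term and d.owner == owner and d.effective_from <= as_of
        ]
        return max(candidates, key=lambda d: d.effective_from, default=None)

    def owners(self, term: str) -> List[str]:
        # More than one owner means the term is contested, not wrong.
        return sorted({d.owner for d in self._defs if d.term == term})


registry = DefinitionRegistry()
# All entries below are hypothetical examples.
registry.add(Definition("revenue", "Finance", "Recognized revenue", date(2024, 1, 1)))
registry.add(Definition("revenue", "Sales", "Booked contract value", date(2024, 1, 1)))
registry.add(Definition("ARR", "Finance", "Excludes one-off services", date(2024, 1, 1)))
registry.add(Definition("ARR", "Finance", "Excludes services and pilots", date(2024, 4, 1)))

last_quarter = registry.resolve("ARR", "Finance", date(2024, 2, 15))
this_quarter = registry.resolve("ARR", "Finance", date(2024, 5, 1))
```

With something like this in place, an agent can ask which owner and which date a question refers to, and `owners("revenue")` makes the Finance-versus-Sales conflict explicit instead of leaving the model to guess.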
The third layer is the one nobody talks about until an AI agent has to actually do something: resolve a customer ticket, approve a refund, escalate an incident, route an exception. Now it needs to know how the work actually gets done in this company. Not "what does revenue mean," but "what does a senior support rep do when a tier-2 customer churns mid-contract for the third time this year."
That knowledge usually lives in nobody’s head and everybody’s hands. It’s procedural. It’s tacit. It rarely makes it into documentation because the people who know it don’t have time to write it down, and what they do write becomes outdated within a quarter.
This is the layer companies like Interloom are solving, by ingesting operational records and mapping how decisions actually get made by the experienced humans in the room. It’s a real category, adjacent to ours, and a different fix again: you can’t compile prose to solve it, and you can’t compile a definition to solve it.
Most reference architectures pick one layer, usually content because Karpathy’s setup is the most visible one to copy, and apply it to all three. A company that follows them pays the wiki’s setup and recompile costs but doesn’t fix the bill, because the bill is being burnt on the semantic layer underneath.
Re-reading prose on every query. Research papers, contracts, policy docs. The waste Karpathy’s wiki architecture is designed to eliminate, and does, at the right scale.
Fix: build a wiki
Re-resolving meaning on every query. Conflicting definitions, ambiguous status codes, cross-system schema conflicts. The dominant driver of enterprise token cost, and the layer most teams misidentify.
Fix: build a semantic layer
Re-discovering how decisions get made. Tacit operational knowledge that lives in nobody’s documentation but everybody’s institutional memory. Can’t be fixed by prose or definitions alone.
Fix: procedural infrastructure
Imagine an AI agent reasoning about a tier-2 customer churning mid-contract for the third time. To answer well it needs:
Three layers, three fixes. Compress each in the right shape and you take 70–80% of the cost out of the workflow. Compress them in the wrong shape and you reinvent expensive architecture for the wrong layer.
Take a thousand of your real AI queries from last week. Look at where the tokens went. Bucket them by waste mode.
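The bucketing step itself is simple arithmetic once each query carries a label. The sketch below assumes every logged query has already been tagged with a waste mode, by hand or by some classifier; that tagging is the hard part and is not shown. `token_share_by_layer` is a hypothetical helper, and the sample figures are invented purely to show the calculation.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def token_share_by_layer(queries: Iterable[Tuple[str, int]]) -> Dict[str, float]:
    """Return each layer's share of total tokens, as a fraction of 1.0."""
    totals: Counter = Counter()
    for layer, tokens in queries:
        totals[layer] += tokens
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}  # nothing logged, so there are no shares to report
    return {layer: tokens / grand_total for layer, tokens in totals.items()}


# (layer label, tokens consumed) per query. Invented sample data.
sample_log = [
    ("content", 1200),
    ("semantic", 5000),
    ("semantic", 2000),
    ("procedural", 1800),
]

shares = token_share_by_layer(sample_log)
for layer, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{layer:<11} {share:.0%}")
```

Sorting by share puts the largest bucket first, which is the layer to fix first.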
What we see in mid-market and enterprise customers: Layer 2 is 60–80% of the bill, Layer 1 is 10–20%, Layer 3 is the rest. Your numbers will differ. But knowing which bucket is which is the difference between fixing your AI cost and generating consulting work for someone.
Architecture wins beat brand wins in enterprise software. The team that identifies the right layer for their bottleneck spends a fraction of what the team that copies the trendiest reference architecture does. In Part 3, we run the cost math to show exactly how large that gap is.