The Karpathy Series · Part 1 of 3

Karpathy’s Approach To Knowledge Bases.
And Why It Won’t Solve
Your Company’s Problem.

The architecture of enterprise AI cost, and why the savings your AI team is excited about won’t fix the bill on your desk.

By Alexander Braun · April 2026 · 6 min read
01 Karpathy & the CFO layer problem 02 Three layers of token waste 03 Wiki costs at enterprise scale
$300M: Salesforce’s entire 2026 Anthropic spend
5–7% of every knowledge worker’s salary (Rory O’Driscoll, 20VC)
71% of companies exceeded their AI budget in 2025
60–80% of enterprise queries never need to hit the model at all
36% annual growth in AI token costs (CloudZero 2026)
10 margin points: what AI governance is worth by 2029 (Gartner)
"Five or seven percent of every knowledge worker’s salary and twenty percent of every engineering salary as additional token costs."
Rory O’Driscoll, Scale Venture Partners: 20VC Podcast, 2026
5–7%
Of every knowledge worker’s salary — the emerging benchmark for AI token spend per head, per year.
Rory O’Driscoll, Scale Venture Partners · 20VC, 2026
$300M
Salesforce’s annual Anthropic spend in 2026. Almost entirely on coding agents — from nothing two years ago.
Marc Benioff / Yahoo Finance · 20VC Podcast
60–80%
Of enterprise data queries are routine and repeatable. With semantic governance, they never need to hit the model at all.
LazyFox internal analysis · Conservative estimate
71%
Share of companies that blew their AI cost budget in 2025. GenAI ranked the single least predictable cost category.
FinOps Foundation, 2025

Andrej Karpathy’s wiki architecture delivers real savings, at the right layer. Here’s why enterprise finance, ops, and RevOps teams are paying a different bill entirely.

Andrej Karpathy posted a thread about compiling LLM-built knowledge wikis. The math is incredible: 70 to 90% token savings on repeated queries. Engineers loved it. Within a week, CFOs were forwarding it to their AI architecture teams asking, essentially: can we do this?

The honest answer is yes, you can do it. It just won’t fix the bill you’re paying.

That’s not because Karpathy is wrong. It’s because his savings come from a very specific place, and that place is not where the enterprise token bill is actually being burnt.

What Karpathy actually built

Steelman first. He’s solving a real problem with a clean architecture:

  • A raw/ directory holds source material: papers, articles, repos, web clippings.
  • An LLM "compiles" it into a wiki/ of Markdown files: per-source summaries, concept articles, backlinks, auto-maintained indexes.
  • Obsidian is the frontend. Periodic "lint" passes check for inconsistencies, fill gaps via search, suggest new article candidates.
  • Q&A happens against the wiki, not against the raw files.

The architecture is elegant. The savings are real. At his scale (about 100 articles, ~400K words), we’ve costed the maintenance at roughly $200 a month. For a researcher who lives in that corpus 40 hours a week, that’s an excellent trade.

The objection going around ("do you know how much that wiki will cost to maintain?") does not hold at his scale. Karpathy is right.

Where the savings come from, exactly

Three sources, in order of magnitude:

  1. Deduplication. Many raw documents repeat the same facts. The wiki captures each fact once.
  2. Cleaning. Boilerplate, ads, navigation, and headers are stripped.
  3. Summarization plus linking. The model jumps to the right 600-word concept article instead of scanning the full 80-page source.
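The first two sources can be sketched in a few lines. This is a minimal illustration, not Karpathy’s pipeline: the boilerplate prefixes and sample documents are invented, and exact-match hashing only catches near-verbatim repeats. Collapsing facts that are phrased differently is the part that would actually need a model.

```python
import hashlib
import re

# Hypothetical boilerplate markers; a real pipeline would need its own list.
BOILERPLATE_PREFIXES = ("subscribe", "cookie", "share this")


def normalize(sentence: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies match."""
    return re.sub(r"\s+", " ", sentence.strip().lower())


def clean_and_dedupe(documents: list[str]) -> list[str]:
    """Return each distinct sentence once, skipping boilerplate sentences."""
    seen: set[str] = set()
    kept: list[str] = []
    for doc in documents:
        # Naive sentence split: break on whitespace that follows . ! or ?
        for sentence in re.split(r"(?<=[.!?])\s+", doc):
            norm = normalize(sentence)
            if not norm or norm.startswith(BOILERPLATE_PREFIXES):
                continue  # cleaning step
            digest = hashlib.sha256(norm.encode("utf-8")).hexdigest()
            if digest not in seen:  # deduplication step
                seen.add(digest)
                kept.append(sentence.strip())
    return kept


# Invented sample documents that overlap in one sentence.
docs = [
    "Subscribe to our newsletter. Attention scales quadratically with sequence length.",
    "Attention scales  quadratically with sequence length. Sparse attention reduces that cost.",
]
facts = clean_and_dedupe(docs)
for fact in facts:
    print(fact)
```

Four sentences go in and two come out. That is the content-layer effect: fewer words for the model to re-read on each query.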

All three are content-layer optimizations. They reduce how much prose the model has to re-read on each query. If your workload is prose-heavy (research, legal contracts, policy docs) these savings show up as advertised.

Now compare to where enterprise tokens actually go.

What enterprise queries actually look like

Knowledge workers in your finance, ops, and revenue teams aren’t asking "summarize this 80-page paper." They’re asking:

  • "Pipeline coverage by region this quarter."
  • "All invoices with status ‘offen’ from Q1."
  • "Revenue vs forecast in Germany."

These don’t waste tokens by re-reading prose. They waste tokens by re-disambiguating meaning on every single call. What is "pipeline coverage": booked, qualified, weighted, in the last 30 days? Does "offen" mean open or pending or unbilled? Which of the five tables called "revenue" is the right one for Q1 Germany?

The Token Tax in Action
Query to AI agent — without LazyFox
"Summarize all invoices with status ‘offen’ from this quarter"
4,200 tokens — AI fires 3 clarifying sub-queries
Same query — with LazyFox
"Summarize all invoices with status ‘open’ from this quarter"
1,100 tokens — resolved in one pass
Savings per query
74%
fewer tokens consumed per ambiguous data query

Semantic ambiguity is a hidden token tax

When an AI agent runs into ambiguous data (and it does, on most enterprise queries), it doesn’t fail gracefully. It fires clarifying sub-queries, retrieves more context, generates hedged outputs, and retries. We’ve measured 4,200 tokens spent on a query that should have cost 1,100.

Multiply by 10,000 queries a day. That’s the CFO bill.
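The arithmetic behind those figures is worth writing out. The sketch below uses only the numbers quoted above and stays in tokens; no price per token is assumed, so apply your own contracted rate to get a dollar figure.

```python
TOKENS_AMBIGUOUS = 4_200  # measured cost of the ambiguous query
TOKENS_RESOLVED = 1_100   # cost of the same query resolved in one pass
QUERIES_PER_DAY = 10_000

wasted_per_query = TOKENS_AMBIGUOUS - TOKENS_RESOLVED
savings_ratio = wasted_per_query / TOKENS_AMBIGUOUS
wasted_per_day = wasted_per_query * QUERIES_PER_DAY

print(f"Wasted tokens per query: {wasted_per_query:,}")
print(f"Savings per query: {savings_ratio:.0%}")
print(f"Wasted tokens per day: {wasted_per_day:,}")
```

That is 3,100 wasted tokens per query, a 74% saving once the ambiguity is removed, and 31 million wasted tokens a day at 10,000 queries.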

A Karpathy-style wiki doesn’t help here. A wiki article called "Revenue" will be written, and so will four others, because the model dutifully compiled what it found across SAP, Salesforce, MongoDB, and the data lake. Five articles. Five definitions. Same problem.

"People go nuts with Claude: one person spent a thousand dollars in token costs over one weekend, on a report you can build with your standard CRM."

Head of RevOps & Enablement, mid-market SaaS company

The one-line reframe

Karpathy precomputes content. The enterprise needs to precompute meaning.

That’s the architectural difference. Everything else flows from it.

Precomputing content gets you out of re-reading prose. Precomputing meaning gets you out of re-resolving ambiguity, and then, critically, lets the query run deterministically. The model gets used once, at setup, to resolve "revenue means X under context Y." After that, queries are generated as code, executed against the source systems, and the LLM is not in the hot path. The bill on the millionth identical query approaches zero.
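One way to picture "precomputing meaning" is a lookup table of governed definitions that turns a metric and a context into a fixed query. The sketch below is an illustration under assumptions, not LazyFox’s implementation: the class, the function names, the table name, and the columns are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricDefinition:
    """One governed meaning of a metric within one business context."""
    metric: str
    context: str
    sql_template: str


# Hypothetical definitions: table and column names are invented for
# illustration. In the pattern described above, entries like these would be
# resolved once at setup time, then frozen.
DEFINITIONS = {
    ("revenue", "finance"): MetricDefinition(
        metric="revenue",
        context="finance",
        sql_template=(
            "SELECT SUM(amount) FROM booked_invoices "
            "WHERE country = :country AND quarter = :quarter"
        ),
    ),
}

# Status synonyms resolved once, e.g. German "offen" mapped to "open".
STATUS_SYNONYMS = {"offen": "open", "open": "open"}


def normalize_status(raw: str) -> str:
    """Map a raw status label to its governed value; fail loudly if unknown."""
    key = raw.strip().lower()
    if key not in STATUS_SYNONYMS:
        raise LookupError(f"No governed status for {raw!r}")
    return STATUS_SYNONYMS[key]


def compile_query(metric: str, context: str, **params: str) -> tuple[str, dict[str, str]]:
    """Look up the governed definition. No model call happens here."""
    definition = DEFINITIONS.get((metric, context))
    if definition is None:
        raise LookupError(f"No governed definition for {metric!r} in {context!r}")
    return definition.sql_template, dict(params)


sql, params = compile_query("revenue", "finance", country="DE", quarter="2026-Q1")
print(sql)
print(params)
```

The same inputs always produce the same query, so the millionth call costs a dictionary lookup rather than a round of clarification. An unknown metric raises an error instead of guessing, which is the point of governing the definitions first.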

That’s not a marginal improvement on the wiki. It’s a different cost curve.

The CFO’s test

If you want to know whether your bill looks more like Karpathy’s problem or the enterprise problem, sort one day’s worth of AI queries by intent.

If they’re mostly "find me the right paragraph in the right document" (research, contracts, policy lookups), you have a content-layer problem. Build a wiki. The savings will be real.

If they’re mostly "what’s the right number for this metric, in this context, this quarter" (finance, ops, RevOps, executive reporting), a Markdown wiki will not save you. The waste isn’t in re-reading prose; it’s in re-resolving definitions that haven’t been governed.

In our experience with mid-market and enterprise customers, the second bucket is 60–80% of the bill. Yours may differ. But you should look before you copy.
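A first pass at this test can be run on a query log with a crude keyword heuristic. The sketch below is only a starting point: the keyword lists are hypothetical, the sample log reuses the example queries from this article plus one invented contract lookup, and a real audit would weight each query by the tokens it consumed rather than counting queries.

```python
from collections import Counter

# Hypothetical keyword lists; a real audit would use labeled samples.
CONTENT_HINTS = ("summarize", "policy", "contract", "paper", "document")
MEANING_HINTS = ("revenue", "pipeline", "forecast", "invoices", "by region", "this quarter")


def classify(query: str) -> str:
    """Bucket a query as 'content', 'meaning', or 'unclear' by keyword hits."""
    text = query.lower()
    content_hits = sum(hint in text for hint in CONTENT_HINTS)
    meaning_hits = sum(hint in text for hint in MEANING_HINTS)
    if meaning_hits > content_hits:
        return "meaning"
    if content_hits > meaning_hits:
        return "content"
    return "unclear"


def bucket_shares(queries: list[str]) -> dict[str, float]:
    """Share of queries per bucket; all zeros for an empty log."""
    counts = Counter(classify(q) for q in queries)
    total = len(queries)
    return {
        bucket: (counts[bucket] / total if total else 0.0)
        for bucket in ("content", "meaning", "unclear")
    }


# Illustrative log, not real data.
sample_log = [
    "Pipeline coverage by region this quarter.",
    "All invoices with status 'offen' from Q1.",
    "Revenue vs forecast in Germany.",
    "Find the termination clause in the supplier contract.",
    "Summarize this 80-page paper.",
]
shares = bucket_shares(sample_log)
for bucket, share in shares.items():
    print(f"{bucket}: {share:.0%}")
```

If the "meaning" share dominates your own log, the wiki pattern is aimed at the smaller part of your bill.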

The point isn’t to argue with him

Karpathy did the field a real favor. He took an architectural pattern that almost nobody was using at personal scale and made it obvious. He raised the floor on how people think about LLM cost.

The next move isn’t to argue with the architecture. It’s to ask which surface your bill is actually being burnt against. If your surface is prose, his answer is right. If your surface is meaning (which is most of what the enterprise is paying for), the right shape of compression looks different.

That’s the layer LazyFox is built for. We compile your definition graph the way Karpathy compiles his wiki: once, with the model doing real work. After that, every query you ask runs deterministically, against your actual systems, with no LLM call in the path. The cost curve flattens. The CFO bill stops growing with query volume.

Find out which layer your bill is being burnt against.

Share your email and we’ll run a free token cost audit, showing where your knowledge worker queries are burning budget and what a 90-day reduction path looks like.