LazyFox — Karpathy's Approach To Knowledge Bases. And Why It Won't Solve Your Company's Problem.

The architecture of enterprise AI cost, and why the savings your AI team is excited about won’t fix the bill on your desk.

By Alexander Braun · April 2026 · 6 min read

Andrej Karpathy’s wiki architecture delivers real savings, at the right layer. Here’s why enterprise finance, ops, and RevOps teams are paying a different bill entirely.

Andrej Karpathy posted a thread about compiling LLM-built knowledge wikis. The math is incredible: 70 to 90% token savings on repeated queries. Engineers loved it. Within a week, CFOs were forwarding it to their AI architecture teams asking, essentially: can we do this?

LLM Knowledge Bases

Something I'm finding very useful recently: using LLMs to build personal knowledge bases for various topics of research interest. In this way, a large fraction of my recent token throughput is going less into manipulating code, and more into manipulating…
— Andrej Karpathy (@karpathy) April 2, 2026

The honest answer is yes, you can do it. It just won’t fix the bill you’re paying.

That’s not because Karpathy is wrong. It’s because his savings come from a very specific place, and that place is not where the enterprise token bill is actually being burnt.

What Karpathy actually built

Steelman first. He’s solving a real problem with a clean architecture:

A raw/ directory holds source material: papers, articles, repos, web clippings.
An LLM "compiles" it into a wiki/ of Markdown files: per-source summaries, concept articles, backlinks, auto-maintained indexes.
Obsidian is the frontend. Periodic "lint" passes check for inconsistencies, fill gaps via search, suggest new article candidates.
Q&A happens against the wiki, not against the raw files.
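To make the pipeline concrete, here is a minimal sketch of the compile step in Python. It is not Karpathy’s code: `compile_wiki` and `summarize` are hypothetical names, the “LLM” is a placeholder that keeps each source’s first sentence, concepts are matched by plain substring search, and the lint and web-search passes are left out. It only shows the shape of the output: per-source summaries, concept pages with Obsidian-style `[[backlinks]]`, and an index.

```python
import re
from collections import defaultdict


def summarize(text: str) -> str:
    """Hypothetical stand-in for an LLM summary call: keeps the first sentence."""
    return re.split(r"(?<=[.!?])\s+", text.strip(), maxsplit=1)[0]


def compile_wiki(raw: dict[str, str], concepts: list[str]) -> dict[str, str]:
    """Compile raw sources into summary pages, concept pages with backlinks, and an index."""
    wiki: dict[str, str] = {}
    backlinks: dict[str, list[str]] = defaultdict(list)

    for name, text in sorted(raw.items()):
        page = f"summaries/{name}.md"
        wiki[page] = f"# {name}\n\n{summarize(text)}\n"
        # Naive concept detection: case-insensitive substring match.
        for concept in concepts:
            if concept.lower() in text.lower():
                backlinks[concept].append(page)

    for concept in concepts:
        links = "\n".join(f"- [[{p}]]" for p in backlinks[concept])
        wiki[f"concepts/{concept}.md"] = f"# {concept}\n\nMentioned in:\n{links}\n"

    # Auto-maintained index over every page compiled so far.
    wiki["index.md"] = "# Index\n\n" + "\n".join(f"- [[{p}]]" for p in sorted(wiki)) + "\n"
    return wiki


raw = {
    "source_a": "Caching cuts cost. More detail follows.",
    "source_b": "Indexes speed up retrieval. Caching helps too.",
}
wiki = compile_wiki(raw, ["caching", "indexes"])
print(wiki["concepts/caching.md"])
```

In a real setup, `summarize` would be a model call and the pages would be written to disk under wiki/ for Obsidian to pick up; Q&A then reads these pages instead of the raw files.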

The architecture is elegant. The savings are real. At his scale (about 100 articles, ~400K words), we’ve costed the maintenance at roughly $200 a month. For a researcher who lives in that corpus 40 hours a week, that’s an excellent trade.

The objection going around (“do you know how much that wiki will cost to maintain?”) is wrong at his scale. He’s right.

Where the savings come from, exactly

Three sources, in order of magnitude:

Deduplication. Many raw documents repeat the same facts. The wiki captures each fact once.
Cleaning. Boilerplate, ads, navigation, and headers get stripped.
Summarization plus linking. The model jumps to the right 600-word concept article instead of scanning the full 80-page source.

All three are content-layer optimizations. They reduce how much prose the model has to re-read on each query. If your workload is prose-heavy (research, legal contracts, policy docs), these savings show up as advertised.
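The first two savings are cheap enough to sketch without a model. The example below is illustrative only: the boilerplate patterns are made up, word counts stand in for tokens, and sentence-level exact matching is a much cruder form of deduplication than an LLM merging paraphrased facts. Summarization and linking, the third source, need the model itself and are not shown.

```python
import re

# Illustrative boilerplate patterns (navigation bars, newsletter and cookie banners).
BOILERPLATE = re.compile(r"^(home \||subscribe|cookie)", re.IGNORECASE)
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def clean(doc: str) -> list[str]:
    """Cleaning: drop blank lines and lines that look like page chrome."""
    lines = (ln.strip() for ln in doc.splitlines())
    return [ln for ln in lines if ln and not BOILERPLATE.match(ln)]


def deduplicate(docs: list[str]) -> list[str]:
    """Deduplication: keep each sentence once across all documents (case-insensitive exact match)."""
    seen: set[str] = set()
    facts: list[str] = []
    for doc in docs:
        for line in clean(doc):
            for sentence in SENTENCE_END.split(line):
                if sentence and sentence.lower() not in seen:
                    seen.add(sentence.lower())
                    facts.append(sentence)
    return facts


def word_count(texts: list[str]) -> int:
    """Rough proxy for tokens: whitespace-separated words."""
    return sum(len(t.split()) for t in texts)


docs = [
    "Home | Docs | Blog\nThe API is rate limited. Keys expire yearly.",
    "Subscribe for updates!\nThe API is rate limited. Errors return JSON.",
]
facts = deduplicate(docs)
print(word_count(docs), "->", word_count(facts))  # 24 -> 11
```

Both steps shrink the prose the model re-reads; neither touches what a word like “revenue” means.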

Now compare to where enterprise tokens actually go.

What enterprise queries actually look like

Knowledge workers in your finance, ops, and revenue teams aren’t asking "summarize this 80-page paper." They’re asking:

"Pipeline coverage by region this quarter."
"All invoices with status ‘offen’ from Q1."
"Revenue vs forecast in Germany."

These don’t waste tokens by re-reading prose. They waste tokens by re-disambiguating meaning on every single call. What is “pipeline coverage”: booked, qualified, weighted, or only the last 30 days? Does “offen” mean open, pending, or unbilled? Which of the five tables called “revenue” is the right one for Q1 Germany?

Semantic ambiguity is a hidden token tax

When an AI agent runs into ambiguous data (and it does, on most enterprise queries), it doesn’t fail gracefully. It fires clarifying sub-queries, retrieves more context, generates hedged outputs, and retries. We’ve measured 4,200 tokens spent on a query that should have cost 1,100.

Multiply by 10,000 queries a day. That’s the CFO bill.
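The arithmetic behind that bill, as a back-of-envelope sketch using the figures above. It assumes every query carries the same overhead as the measured one, which is the simplification “multiply by 10,000” makes. `to_usd` takes whatever per-million-token rate you actually pay, since no rate is quoted here; both function names are hypothetical.

```python
def daily_ambiguity_tax(actual_tokens: int, ideal_tokens: int, queries_per_day: int) -> int:
    """Excess tokens per day spent re-resolving meaning instead of answering."""
    return (actual_tokens - ideal_tokens) * queries_per_day


def to_usd(tokens: int, usd_per_million_tokens: float) -> float:
    """Convert a token count to dollars at a rate you supply."""
    return tokens / 1_000_000 * usd_per_million_tokens


excess = daily_ambiguity_tax(actual_tokens=4_200, ideal_tokens=1_100, queries_per_day=10_000)
print(f"{excess:,} excess tokens per day")          # 31,000,000 excess tokens per day
print(f"{4_200 / 1_100:.1f}x the necessary spend")  # 3.8x the necessary spend
```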

A Karpathy-style wiki doesn’t help here. A wiki article called "Revenue" will be written, and so will four others, because the model dutifully compiled what it found across SAP, Salesforce, MongoDB, and the data lake. Five articles. Five definitions. Same problem.
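A small, hypothetical sketch of why compiling doesn’t help here: if the compiler records what each system says, the wiki faithfully preserves every conflicting definition. The system names come from the paragraph above; the definitions are invented for illustration. Grouping by term makes the conflict visible, but nothing in the code can decide which definition is right. That choice is a governance decision, not a compression step.

```python
from collections import defaultdict

# Illustrative only: system names from the text, definitions made up.
extracted = [
    ("SAP", "Revenue", "invoiced amount net of credit notes"),
    ("Salesforce", "Revenue", "closed-won opportunity amount"),
    ("MongoDB", "Revenue", "order totals including tax"),
    ("data lake", "Revenue", "recognized revenue per general ledger"),
    ("data lake", "Revenue", "bookings including renewals"),
]


def conflicting_terms(rows):
    """Group extracted definitions by term; return terms with more than one distinct meaning."""
    by_term = defaultdict(list)
    for system, term, definition in rows:
        by_term[term.lower()].append((system, definition))
    return {
        term: entries
        for term, entries in by_term.items()
        if len({definition for _, definition in entries}) > 1
    }


for system, definition in conflicting_terms(extracted)["revenue"]:
    print(f"{system:>10}: {definition}")
```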

"People go nuts with Claude: one person spent a thousand dollars in token costs over one weekend, on a report you can build with your standard CRM."

Head of RevOps & Enablement, mid-market SaaS company

The one-line reframe

Karpathy precomputes content. The enterprise needs to precompute meaning.

That’s the architectural difference. Everything else flows from it.

Precomputing content gets you out of re-reading prose. Precomputing meaning gets you out of re-resolving ambiguity, and then, critically, lets the query run deterministically. The model gets used once, at setup, to resolve "revenue means X under context Y." After that, queries are generated as code, executed against the source systems, and the LLM is not in the hot path. The model cost of the millionth identical query approaches zero.
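A minimal sketch of “precompute meaning, then run deterministically,” under loud assumptions. This is not LazyFox’s implementation: `DefinitionRegistry`, `fake_resolver`, and the `invoices` table are hypothetical, the “context” is simplified to a country code, and the resolver stands in for the one-time model-assisted step. Once a (term, context) pair has a stored SQL expression, every later query is rendered and executed against the data with no model call.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class DefinitionRegistry:
    """Caches one governed SQL expression per (term, context)."""
    resolver: Callable[[str, str], str]  # hypothetical one-time, model-assisted step
    definitions: dict = field(default_factory=dict)
    resolver_calls: int = 0

    def resolve(self, term: str, context: str) -> str:
        key = (term, context)
        if key not in self.definitions:  # only the first request pays for resolution
            self.definitions[key] = self.resolver(term, context)
            self.resolver_calls += 1
        return self.definitions[key]

    def run(self, conn: sqlite3.Connection, term: str, country: str, quarter: str):
        # The expression comes from the governed registry, not from user input;
        # user-supplied values are still passed as bound parameters.
        expr = self.resolve(term, country)
        sql = f"SELECT {expr} FROM invoices WHERE country = ? AND quarter = ?"
        return conn.execute(sql, (country, quarter)).fetchone()[0]


def fake_resolver(term: str, context: str) -> str:
    # Stand-in for an LLM proposing a definition that someone then approves.
    return {"revenue": "SUM(CASE WHEN status = 'paid' THEN amount ELSE 0 END)"}[term]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (country TEXT, quarter TEXT, status TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?, ?)",
    [("DE", "Q1", "paid", 100.0), ("DE", "Q1", "offen", 50.0), ("FR", "Q1", "paid", 70.0)],
)

registry = DefinitionRegistry(resolver=fake_resolver)
for _ in range(1_000):
    total = registry.run(conn, "revenue", "DE", "Q1")
print(total, registry.resolver_calls)  # 100.0 1
```

Here the one-time decision is that “revenue” counts only paid invoices, so the ‘offen’ row is excluded. A different governed definition would change the number, but it would change it the same way on every call.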

That’s not a marginal improvement on the wiki. It’s a different cost curve.
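One way to write the two cost curves, as a simplified model rather than a measurement. Over a fixed period, let N be the number of queries, C_maint the wiki’s compile-and-lint cost (the roughly $200 a month above, at Karpathy’s scale), c_LLM the average model cost per query answered from the wiki, C_setup the cost of resolving definitions once, and c_exec the per-query cost of running generated code against the source systems:

```latex
\begin{aligned}
C_{\text{wiki}}(N)    &= C_{\text{maint}} + N\,c_{\text{LLM}} \\
C_{\text{meaning}}(N) &= C_{\text{setup}} + N\,c_{\text{exec}}, \qquad c_{\text{exec}} \ll c_{\text{LLM}} \\
N^{*}                 &= \frac{C_{\text{setup}} - C_{\text{maint}}}{c_{\text{LLM}} - c_{\text{exec}}}
\end{aligned}
```

Both curves are linear in N; the difference is the slope. The wiki lowers c_LLM but keeps the model in every query, while the precompiled path pays a fixed cost up front for a slope close to zero. Past the break-even volume N*, the gap widens with every query; if C_setup is no larger than C_maint, the precompiled path is cheaper at any volume.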

The CFO’s test

If you want to know whether your bill looks more like Karpathy’s problem or the enterprise problem, sort one day’s worth of AI queries by intent.

If they’re mostly "find me the right paragraph in the right document" (research, contracts, policy lookups), you have a content-layer problem. Build a wiki. The savings will be real.

If they’re mostly "what’s the right number for this metric, in this context, this quarter" (finance, ops, RevOps, executive reporting), a Markdown wiki will not save you. The waste isn’t in re-reading prose; it’s in re-resolving definitions that haven’t been governed.

In our experience with mid-market and enterprise customers, the second bucket is 60–80% of the bill. Yours may differ. But you should look before you copy.
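A rough sketch of that look, assuming you can export one day’s queries with token counts and label each one by intent (manually or with a simple classifier). The log format, the labels `content` and `meaning`, and the numbers are all made up for illustration; `bill_share_by_intent` is a hypothetical helper.

```python
from collections import Counter


def bill_share_by_intent(log: list[dict]) -> dict[str, float]:
    """Sum tokens per intent label and return each label's share of the day's total."""
    tokens: Counter = Counter()
    for entry in log:
        tokens[entry["intent"]] += entry["tokens"]
    total = sum(tokens.values())
    return {intent: count / total for intent, count in tokens.items()}


day = [
    {"query": "Find the termination clause in the Acme MSA", "intent": "content", "tokens": 1_000},
    {"query": "Pipeline coverage by region this quarter", "intent": "meaning", "tokens": 4_000},
    {"query": "Revenue vs forecast in Germany", "intent": "meaning", "tokens": 4_000},
    {"query": "Summarize the new travel policy", "intent": "content", "tokens": 1_000},
]
shares = bill_share_by_intent(day)
print(shares)  # {'content': 0.2, 'meaning': 0.8}
```

If the `meaning` share dominates, a wiki is optimizing the smaller part of the bill.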

The point isn’t to argue with him

Karpathy did the field a real favor. He took an architectural pattern that almost nobody was using at personal scale and made it obvious. He raised the floor on how people think about LLM cost.

The next move isn’t to argue with the architecture. It’s to ask which surface your bill is actually being burnt against. If your surface is prose, his answer is right. If your surface is meaning (which is most of what the enterprise is paying for), the right shape of compression looks different.

That’s the layer we built LazyFox at. We compile your definition graph the way Karpathy compiles his wiki: once, with the model doing real work. After that, every query you ask runs deterministically, against your actual systems, with no LLM call in the path. The cost curve flattens. The CFO bill stops growing with query volume.
