The Token Cost Is in the Discovery

When people talk about AI token costs, they usually mean generation — how many tokens the model produces. The more interesting cost is the one before generation: discovery. What does the LLM have to read before it can answer your question?

For a framework with no contracts, the answer is: everything that might be relevant.

The Discovery Problem

Ask an LLM how to add timestamp tracking to a DarJS model. Without any retrieval layer, the LLM has two options:

1. Guess from general knowledge (probably wrong: DarJS uses mixins, not field declarations)
2. Read the source

If it reads the source, it reads Model.js first — the base class. That file is 8,976 bytes, approximately 2,244 tokens. Then it needs to find the relevant mixin. Trackable.js is another 2,149 bytes — 538 tokens. Total before it can even begin to answer: roughly 2,782 tokens of file content in the context.

The correct answer is a single line:

class Order extends Model.with(Trackable) {}

The ratio is uncomfortable: 2,782 tokens consumed to produce a one-line answer.
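For readers unfamiliar with the pattern behind that one line, here is a minimal sketch of how a `with`-style mixin helper can be written in plain JavaScript. It is not the DarJS implementation: the `Model` and `Trackable` below are stand-ins, and the `createdAt`, `updatedAt`, and `touch` names are hypothetical, chosen only to make the example concrete.

```javascript
// Stand-in base class, NOT the real DarJS Model.
class Model {
  // Applies each mixin in order, starting from the class it is called on.
  static with(...mixins) {
    return mixins.reduce((Base, mixin) => mixin(Base), this);
  }
}

// Hypothetical mixin: a function that takes a base class and returns a subclass.
// The field and method names here are assumptions for illustration only.
const Trackable = (Base) =>
  class extends Base {
    constructor(...args) {
      super(...args);
      this.createdAt = new Date();
      this.updatedAt = this.createdAt;
    }

    touch() {
      this.updatedAt = new Date();
    }
  };

// The call site has the same shape as the one-line answer above.
class Order extends Model.with(Trackable) {}

const order = new Order();
console.log(order instanceof Model, typeof order.touch);
```

The point of the sketch is the shape of the call site: nothing in it can be guessed from general knowledge of field-declaration style ORMs, which is why the LLM has to discover it somewhere.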

A Reproducible Measurement

The byte counts above are measured, not estimated: they come from fs.statSync() on the actual files. The token counts are derived from those byte counts. Here is the verification command:

wc -c packages/core/model/Model.js packages/mixins/model/Trackable.js

Output on this repo:

 8976 packages/core/model/Model.js
 2149 packages/mixins/model/Trackable.js

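Divide each byte count by 4, the common approximation of about four characters per token for Claude and GPT-4 class models, and round up: 8,976 / 4 = 2,244, and 2,149 / 4 = 537.25, which rounds up to 538. These match the token figures above.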
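The same calculation can be sketched in Node.js. This is an illustration of the arithmetic, not the code inside the DarJS CLI; the function names are hypothetical, and rounding up is an assumption that happens to reproduce every per-file figure quoted in this article.

```javascript
const fs = require("node:fs");

// Assumption: about 4 bytes per token, the rule of thumb used in this article.
const BYTES_PER_TOKEN = 4;

// Converts a byte count into an estimated token count, rounding up.
function estimateTokens(bytes) {
  return Math.ceil(bytes / BYTES_PER_TOKEN);
}

// Reads the size of a file on disk and estimates its token count.
function estimateFileTokens(filePath) {
  return estimateTokens(fs.statSync(filePath).size);
}

console.log(estimateTokens(8976)); // 2244 (Model.js)
console.log(estimateTokens(2149)); // 538  (Trackable.js)
```

Run from the repository root, `estimateFileTokens("packages/core/model/Model.js")` would apply the same estimate to the live file, so the figure tracks the source as it changes.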

dar find returns the Trackable contract, including the @example, @reuse-when, and import path, in approximately 592 tokens of output. The gap is 2,190 tokens, close to the 2,244-token cost of reading Model.js just to discover that Trackable exists.

Eight Scenarios, Documented

The dar simulate --compare command in the DarJS CLI runs eight scenarios using this methodology. For each scenario it records:

Which files an LLM would read without the contract layer, and why
The actual byte size of each file
The dar find output in tokens
Whether the scenario also has an incorrect-generation risk — cases where the LLM not only reads unnecessary tokens, but generates functionally wrong code

The full output:

  S01  multi step wizard form                          ✓ reuse  score 90%
         Path A reads: CreateView.js         3644 bytes  ≈  911 tok
         Path A reads: PageDefRenderer.js   12883 bytes  ≈ 3221 tok
         Path A reads: procedures.js        15522 bytes  ≈ 3881 tok
         Path A total:                                   ≈ 8013 tok
         Path B result: dar find output     ≈  596 tok  (avoided 7417 tok)
         ⚠  Structural: without contracts, LLM likely generates incorrect solution

  S02  add timestamps to my model                      ✓ reuse  score 55%
         Path A reads: Model.js              8976 bytes  ≈ 2244 tok
         Path A reads: Trackable.js          2149 bytes  ≈  538 tok
         Path A total:                                   ≈ 2782 tok
         Path B result: dar find output     ≈  592 tok  (avoided 2190 tok)

  [... 6 more scenarios ...]

  File reads avoided (Path A): 28,696 tokens
  dar find output (Path B):    4,650 tokens
  Net tokens avoided:          24,046 tokens  (per session of 8 queries)
  Scenarios with incorrect-generation risk: 2/8

24,046 tokens across 8 queries. Every file size comes from fs.statSync(), and the reader can verify any of them with a single wc -c call.
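The per-scenario arithmetic can be reproduced from the figures printed above. The sketch below covers only S01 and S02, because those are the only scenarios whose file sizes appear in this article. The record shape and field names are assumptions made for illustration, not the CLI's internal format.

```javascript
// Assumption: about 4 bytes per token, rounded up per file.
const estimateTokens = (bytes) => Math.ceil(bytes / 4);

// Byte sizes and dar find token counts are copied from the output above.
const scenarios = [
  {
    id: "S01",
    query: "multi step wizard form",
    pathAFiles: [
      { file: "CreateView.js", bytes: 3644 },
      { file: "PageDefRenderer.js", bytes: 12883 },
      { file: "procedures.js", bytes: 15522 },
    ],
    pathBTokens: 596,
    incorrectGenerationRisk: true,
  },
  {
    id: "S02",
    query: "add timestamps to my model",
    pathAFiles: [
      { file: "Model.js", bytes: 8976 },
      { file: "Trackable.js", bytes: 2149 },
    ],
    pathBTokens: 592,
    incorrectGenerationRisk: false,
  },
];

// Path A is the sum of per-file estimates; "avoided" is Path A minus Path B.
function compare(scenario) {
  const pathATokens = scenario.pathAFiles.reduce(
    (sum, f) => sum + estimateTokens(f.bytes),
    0
  );
  return {
    id: scenario.id,
    pathATokens,
    pathBTokens: scenario.pathBTokens,
    avoided: pathATokens - scenario.pathBTokens,
  };
}

const results = scenarios.map(compare);
console.log(results);
```

For these two scenarios the sketch yields 8,013 and 2,782 tokens for Path A, and 7,417 and 2,190 tokens avoided, matching the printed output.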

The More Important Case

S01 — the wizard scenario — deserves a separate paragraph.

Without contracts, when a developer asks “how do I build a multi-step form?”, the LLM does not know that DarJS has a wizard: declaration in the pageDef. No amount of reading CreateView.js or PageDefRenderer.js surfaces this — the wizard feature is configured in the pageDef, not the implementation. The LLM reads three files (8,013 tokens), still does not find the pattern, and generates a custom multi-step form with Alpine.js state management.

That code works. It handles the UX correctly. But it does not integrate with the DarJS PageDef renderer, which means it bypasses form validation, i18n labels, and the wizard step skip conditions. The developer does not discover this until they push to staging.

The token cost — 7,417 tokens avoided — is the small part of the story. The real cost is the code review, the rewrite, and the afternoon lost to a solution that looked right.

The @contract block for configureWizardForm contains the entire answer:

// @example
wizard: [ { title: 'Customer', fields: ['name','phone'] }, { title: 'Items', fields: ['items'] } ]  // in pageDef
// @reuse-when
// I need a multi-step form, or I want to split a long form into steps, or I need a wizard with conditional steps

The example is under 100 characters. It is discoverable in 596 tokens. The alternative is 8,013 tokens of source reading followed by a wrong solution.

What This Is Not

This comparison does not prove that dar find produces perfect results. It shows that, for framework-specific questions, discovery without a retrieval layer is expensive and its result is unreliable.

The scores in the output (90% for S01, 55% for S02) are TF-IDF similarity scores. They are not “percentage of the time it gets it right.” They are a ranking signal: they measure how closely the query text matches a contract's text, and they are used to order the results. In S02 the right contract is still the top result at 55%; the score does not mean the system is right 55% of the time.
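To make "ranking signal" concrete, here is a textbook TF-IDF ranking with cosine similarity over @reuse-when text. It is a generic sketch, not the scoring code used by dar find, and its scores will not match the percentages above. The configureWizardForm text is taken from the contract shown earlier; the Trackable text is a hypothetical stand-in.

```javascript
// Lowercases the text and splits it into alphanumeric terms.
const tokenize = (text) => text.toLowerCase().match(/[a-z0-9]+/g) ?? [];

function buildIndex(docs) {
  const tokenized = docs.map((d) => ({ id: d.id, terms: tokenize(d.text) }));

  // Document frequency: in how many documents each term appears.
  const df = new Map();
  for (const { terms } of tokenized) {
    for (const t of new Set(terms)) df.set(t, (df.get(t) ?? 0) + 1);
  }

  // Smoothed IDF, so terms found in every document keep a small positive weight.
  const n = docs.length;
  const idf = (t) => Math.log((1 + n) / (1 + (df.get(t) ?? 0))) + 1;

  // Term frequency times IDF, stored as a sparse vector.
  const vectorize = (terms) => {
    const vec = new Map();
    for (const t of terms) vec.set(t, (vec.get(t) ?? 0) + 1);
    for (const [t, tf] of vec) vec.set(t, tf * idf(t));
    return vec;
  };

  const vectors = tokenized.map(({ id, terms }) => ({ id, vec: vectorize(terms) }));
  return { vectors, vectorize };
}

function cosine(a, b) {
  let dot = 0;
  for (const [t, w] of a) dot += w * (b.get(t) ?? 0);
  const norm = (v) => Math.sqrt([...v.values()].reduce((s, w) => s + w * w, 0));
  const denom = norm(a) * norm(b);
  return denom === 0 ? 0 : dot / denom;
}

// Returns every document with its similarity to the query, best match first.
function rank(index, query) {
  const q = index.vectorize(tokenize(query));
  return index.vectors
    .map(({ id, vec }) => ({ id, score: cosine(q, vec) }))
    .sort((x, y) => y.score - x.score);
}

const index = buildIndex([
  {
    id: "configureWizardForm",
    text: "I need a multi-step form, or I want to split a long form into steps, or I need a wizard with conditional steps",
  },
  // Hypothetical @reuse-when text, written for this example only.
  { id: "Trackable", text: "I need createdAt and updatedAt timestamps on a model" },
]);

console.log(rank(index, "add timestamps to my model"));
```

In this toy index the timestamps query ranks Trackable first and the query "multi step wizard form" ranks configureWizardForm first, each with a score well below 1. The order is the useful part; the score says how much of the query's wording overlaps with the contract, not how often the answer is correct.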

The methodology is also conservative by design:

It does not count incorrect generation cycles (discovering the wrong answer and asking again)
It does not count repeated reads across a session (for example, the same file read three times across eight queries)
It does not count the LLM’s reasoning tokens when interpreting a 3,000-token file

The 24,046 figure is a floor.

The General Principle

Every framework develops a gap between what is in the source code and what a developer actually needs to know. The source code tells you how things are implemented. The developer needs to know which thing to reach for and what the call site looks like.

That gap is exactly what @reuse-when and @example close. Not “here is the implementation” — “here is the moment you would reach for this, and here is the one-line call.”

A retrieval layer built on these two fields turns 8,013 tokens of source reading into a 596-token result. The math is not the point. The point is that the question has a right answer — one specific contract, one specific example — and the retrieval layer puts it at the top of the result set in under a second.

Without that layer, every question about your framework is a discovery expedition. With it, the expedition happens once, when the contract is written.
