Loading episodes…
0:00 0:00

The Contract Corpus Has Two Layers

00:00
BACK TO HOME

The Contract Corpus Has Two Layers

10xTeam May 22, 2026 6 min read

After building the NLP contract system for DarJS — 245 contracts, 100% @reuse-when coverage, TF-IDF index, dar find running — Ahmed asked whether it was actually enough for a junior developer to use without an LLM.

The answer required running real queries. Not coverage checks. Real questions a junior would type when they don’t know the answer.

Most of them worked. dar find "timestamps on my model" found Trackable. dar find "protect route to managers only" found requireRole. The retrieval system did what it was designed to do.

Then: dar find "how do I add a field to a model". The top result was adapter.getFields() — the method that reads an existing field map. Correct function, wrong question entirely. The junior needed to know the workflow: edit the model file, regenerate the schema, run the migration. No function in the codebase does that. There’s nothing to retrieve.

The same pattern repeated across a dozen queries. “How do I scaffold a model.” “How do I define transitions.” “How do I configure a wizard form.” All returned the wrong thing or nothing useful.


What the System Can’t See

Articles 011 and 012 covered the retrieval case: when a function already exists, write @reuse-when in the caller’s vocabulary so TF-IDF can find it. That works for questions that have a function as their answer.

But junior developer questions divide into two categories, and only one of them has a function as the answer.

The first category: “What exists that does X?” The answer is a function. dar find retrieves it. This is what the contract system was built for.

The second category: “How do I accomplish X?” The answer is a workflow — a sequence of steps, CLI commands, and patterns. No single function is the answer. The information exists in the framework’s conventions, not in any function’s signature.

A corpus of function contracts handles the first category well. It handles the second category not at all. Every “how do I” query routes to the LLM regardless of how complete the function coverage is, because there’s structurally nothing to retrieve.


The Second Layer

The fix is a second layer in the corpus: procedure contracts.

A procedure contract doesn’t describe a function. It describes a task — a recipe a developer follows to accomplish something. It lives in a documentation module that gets indexed alongside the function contracts, but it points to CLI commands and code patterns instead of callable APIs.

In DarJS this became packages/cli/procedures.js — a module that exports nothing but carries a contract block for every common development task:

/**
 * @contract
 * @role        coordinator
 * @domain      cli
 * @does        Documents the steps to add a new field to an existing DarJS model and sync it to the database.
 * @tags        add-field, new-field, extend-model, schema, migration, prisma
 * @example     1. Edit models/Order.js — add field to static schema. 2. dar schema prisma. 3. npx prisma db push
 * @reuse-when  I need to add a new field to a model, or I want to extend a model with more data,
 *              or I changed my model schema and need to update the database
 * @complexity  simple
 */
function addFieldToModel() { /* step-by-step in comments */ }

The @example isn’t a function call — it’s the sequence of commands. The @reuse-when uses “I need” and “I want” and “I changed” — the language of someone partway through a task who got stuck. The @role is coordinator rather than transformer or query, signaling that this contract orchestrates steps rather than wrapping a callable.

After adding thirteen of these — covering field addition, mixin application, transition definition, PageDef generation, locale sync, scenario testing, migrations, RBAC configuration, wizard forms, app scaffolding — the results changed:

dar find "add a field to a model"          → addFieldToModel     53%  ✓ reuse as-is
dar find "run migration database"          → runMigration        64%  ✓ reuse as-is
dar find "multi step form wizard"          → configureWizardForm 90%  ✓ reuse as-is
dar find "define transitions workflow"     → defineTransitions   68%  ✓ reuse as-is
dar find "write a test scenario"           → writeScenarioTest   65%  ✓ reuse as-is

The LLM is no longer called for any of these. The answer exists in the corpus. Retrieval handles it.


Why This Layer Is Usually Missing

Function contracts are written as a byproduct of writing functions. When you add a @contract block to a function, the function is right there — you read its signature, describe what it does, name the reuse condition.

Procedure contracts have no corresponding code artifact. The workflow for adding a field to a model is real knowledge in the framework — it’s just not localized anywhere. It lives in the heads of people who’ve done it before. There’s no file to annotate.

This makes procedure contracts easy to skip and easy to forget entirely. The function coverage metric reaches 100% and looks complete. The gap is invisible until someone types “how do I” into the search and gets back the wrong thing.

The signal that the second layer is missing: dar find routes most questions beginning with “how do” to ↑ generate new, regardless of how mature the function contracts are. That’s not an LLM routing decision — it’s a corpus gap.


What the Two Layers Cover

Taken together, the two layers answer different questions:

Function contracts answer: “What exists in this codebase that I should use instead of writing new code?” They’re written for developers who have a clear technical need and are checking whether it’s already solved.

Procedure contracts answer: “I’m trying to accomplish something and I don’t know the path.” They’re written for developers who know what they want to build but don’t yet know the steps. For junior developers, this is most questions. For an AI agent starting a new task in an unfamiliar codebase, it’s also most questions.

A corpus with only the first layer is complete from an API documentation perspective and incomplete from a usability perspective. The people who most need the system — the ones who don’t already know the answer — hit the gap immediately.


The Pattern

When building a contract corpus for retrieval, add a procedures file alongside the function contracts. One contract per common task. @role: coordinator. @example shows the command sequence, not a function call. @reuse-when uses the language of someone midway through a task: “I need to,” “I want to,” “I changed X and now.”

The target audience for function contracts is someone who knows what they need but doesn’t know what to call. The target audience for procedure contracts is someone who knows what they want to accomplish but doesn’t know where to start. The second group is larger than the first — and until the second layer exists, they all end up at the LLM.


Next: The Token Cost Is in the Discovery


Join the 10xdev Community

Subscribe and get 8+ free PDFs that contain detailed roadmaps with recommended learning periods for each programming language or field, along with links to free resources such as books, YouTube tutorials, and courses with certificates.

Audio Interrupted

We lost the audio stream. Retry with shorter sentences?