From Oracle to Builder: Write-Capable AI Tools and the Scaffold Workflow

Most AI tool layers built over frameworks are read-only. The AI can inspect the app — list models, check health, search contracts — but everything it wants to create or change has to flow through a human editing files.

The read layer has real value: an AI that can see your app’s structure will give better answers than one that can’t. But there’s a ceiling. At some point the AI says “here’s the model code to add” and the human copies it into a file. At some point the AI says “your locale keys are missing these” and the human runs dar locale sync --fix. The human is the execution layer.

Write tools change the shape of the interaction. The AI doesn’t hand you code — it writes the file. The difference sounds small and turns out to be significant.

What Write Tools Actually Do

The naive case for write tools is automation: skip the copy-paste. That’s real, but not the interesting part.

The interesting part is what happens to the AI’s reasoning when it has write capability.

With read tools only, the AI gives advice. It proposes a model structure, shows the code, explains what mixins to use. The human applies it. Errors are caught by the human at apply time. The AI never sees the output of its own suggestions.

With write tools, the AI applies the change, then reads the result back to verify. It creates the model, runs health() to confirm no validation errors, checks getModels() to see the model appears, runs fixLocale() to populate missing keys, confirms the result. Each tool call is a step in a feedback loop, not a one-shot suggestion.

The AI catches its own errors. It scaffolds, checks, adjusts, checks again — without a human in the loop for each step.

The Seven Write Tools

A write-capable MCP layer over DarJS has seven write tools alongside six read tools:

Tool	What it does
`create_model`	Write `models/Name.js` with chosen mixins and fields
`generate_pagedef`	Run the pagedef-gen tool to auto-generate a PageDef from a model
`write_pagedef`	Write a fully customized PageDef to `pages/id.js`
`fix_locale`	Run `dar locale sync --fix` — write all missing i18n keys
`suggest_mixins`	Return scored mixin suggestions for a plain-English description
`scaffold_app`	Generate a complete app structure from a brief
`run_scenario`	Run a named test scenario and return pass/fail

Each tool is a thin wrapper over a method on the adapter class. The tool handler calls the adapter, the adapter does the work, the tool returns structured output the AI can read and reason about.

suggest_mixins is technically read-only — it returns recommendations without writing anything. But it’s part of the write workflow: the AI calls it before create_model to get scored candidates for the mixin list.
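The "thin wrapper" shape can be sketched roughly as follows. This is an illustration, not the actual DarJS tool code: makeCreateModelTool, the handler signature, the { ok, error } result shape, and the stub adapter are all assumptions, and registration with an MCP server is left out.

```javascript
// Hypothetical sketch: the handler validates input, delegates to the adapter,
// and returns structured output. It does no file work of its own.
function makeCreateModelTool(adapter) {
  return {
    name: 'create_model',
    handler(args) {
      if (!args || typeof args.name !== 'string' || args.name === '') {
        return { ok: false, error: 'name is required' };
      }
      try {
        const result = adapter.createModel({
          name: args.name,
          mixins: args.mixins || [],
          fields: args.fields || [],
        });
        return { ok: true, ...result };
      } catch (err) {
        // Surface failures as data so the caller can read and react to them.
        return { ok: false, error: err.message };
      }
    },
  };
}

// Stub adapter standing in for the real one, so the sketch runs on its own.
const stubAdapter = {
  createModel({ name }) {
    return { path: `models/${name}.js`, size: 0 };
  },
};

const tool = makeCreateModelTool(stubAdapter);
console.log(tool.handler({ name: 'Product', mixins: ['Batchable'] }));
```

Returning errors as structured data rather than throwing is one way to keep a failed call readable by the AI as an ordinary tool result.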

The Scaffold Workflow

The pattern that emerges from these tools working together is a scaffold workflow — the AI builds an app from a description:

1. suggest_mixins("medicine product with batch lot and expiry")
   → Batchable (0.82), Expirable (0.71), Stockable (0.64)

2. scaffold_app({ brief: { name: "pharmacie", models: [{ name: "Product", mixins: [...] }] }, dry_run: true })
   → preview: dar.config.js, manifest.js, models/Product.js, locales/en.json

3. scaffold_app({ brief: { ... }, dry_run: false })
   → writes all files

4. generate_pagedef("Product")
   → writes pages/product.js

5. health()
   → { ok: true, models: 1, pages: 1, ... }

6. fix_locale()
   → populated locales/en.json

Six tool calls produce a runnable app with real models, pages, and locale keys. The AI wrote every file, verified the result, and fixed the gaps. The only human input is the confirmation between the preview in step 2 and the write in step 3.

The Dry-Run Gate

There’s a problem with write tools in a stdio-based MCP server: there’s no round-trip for confirmation.

A typical write workflow in a UI might be: show the user a preview, wait for confirmation, then write. In a stdio MCP server, tools are called by the AI autonomously. There’s no interaction channel for “are you sure?” between a preview and a write.

The solution is a dry_run parameter on scaffold_app (and any write tool where preview matters):

scaffold_app({
  brief:   { name: "pharmacie", models: [...] },
  dry_run: true   // default — returns file contents without writing
})

dry_run: true (the default) returns a full preview: file paths, sizes, and complete content for each file. The AI presents this to the user. The user says “looks good.” The AI calls again with dry_run: false.

This gives you a confirmation gate without requiring the tool protocol to support round-trips. The AI implements preview-then-write with two ordinary tool calls, and the conversation with the user carries the confirmation in between.

The default is dry_run: true for a reason: a tool called without explicit intent writes nothing. The AI has to deliberately opt into writing by passing dry_run: false. This matches how humans should think about it — preview is the safe default, write is the deliberate step.
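A minimal sketch of such a gate, assuming a scaffoldApp function that takes the app directory plus the { brief, dry_run } argument shown earlier. The planFiles helper and the file contents it produces are hypothetical placeholders, not the real generator.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical stand-in for whatever turns a brief into file contents.
function planFiles(brief) {
  return (brief.models || []).map((model) => ({
    path: path.join('models', `${model.name}.js`),
    content: `// model ${model.name}\n`,
  }));
}

function scaffoldApp(appDir, { brief, dry_run = true }) {
  const files = planFiles(brief).map((file) => ({
    ...file,
    size: Buffer.byteLength(file.content, 'utf8'),
  }));

  // Anything other than an explicit `false` is treated as a preview.
  if (dry_run !== false) {
    return { written: false, files };
  }

  for (const file of files) {
    const target = path.join(appDir, file.path);
    fs.mkdirSync(path.dirname(target), { recursive: true });
    fs.writeFileSync(target, file.content, 'utf8');
  }
  return { written: true, files };
}

// Usage: preview first, then write into a throwaway directory.
const appDir = fs.mkdtempSync(path.join(os.tmpdir(), 'scaffold-'));
const brief = { name: 'pharmacie', models: [{ name: 'Product' }] };

const preview = scaffoldApp(appDir, { brief });
const previewTouchedDisk = fs.existsSync(path.join(appDir, 'models'));
const result = scaffoldApp(appDir, { brief, dry_run: false });
console.log(preview.written, previewTouchedDisk, result.written); // false false true
```

Checking dry_run !== false instead of a plain truthiness test means a missing, null, or malformed flag still falls on the preview side.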

Mixin Scoring

suggest_mixins runs TF-IDF over a static catalog of 33 mixins. The catalog has one entry per mixin with a keyword list, field list, and description:

{
  name: 'Batchable',
  category: 'domain',
  description: 'Tracks items by batch lot number and expiry date',
  keywords: ['batch', 'lot', 'expiry', 'expire', 'traceability', 'recall', 'pharmaceutical', 'food'],
  fields: ['batch_number', 'lot_number', 'expiry_date', 'manufacture_date'],
}

When the AI calls suggest_mixins("medicine product with batch lot and expiry"), the function tokenizes the description, computes keyword overlap with each catalog entry, and returns the top matches sorted by score.

The scores are best read as a relative ranking rather than a precise measurement. A score of 0.82 for Batchable on “batch lot” input means high overlap; 0.20 for Timestamps means low overlap but not zero. The AI uses scores to decide which mixins to propose and which to skip.
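One way the scoring could work is an IDF-weighted keyword overlap, a simplified stand-in for full TF-IDF. The sketch below is an assumption about the general approach, not the real suggest_mixins implementation: the weighting formula is a guess, only the Batchable keywords come from the catalog entry above, and the Expirable and Stockable keyword lists are invented. Its scores will not reproduce the numbers quoted in this article.

```javascript
// Hypothetical three-entry catalog. Only the Batchable keywords come from the
// article; the other two keyword lists are made up for illustration.
const catalog = [
  { name: 'Batchable', keywords: ['batch', 'lot', 'expiry', 'expire', 'traceability', 'recall', 'pharmaceutical', 'food'] },
  { name: 'Expirable', keywords: ['expiry', 'expire', 'shelf', 'perishable'] },
  { name: 'Stockable', keywords: ['stock', 'inventory', 'warehouse', 'quantity'] },
];

function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Inverse document frequency: keywords shared by fewer entries weigh more.
function idf(term, entries) {
  const df = entries.filter((e) => e.keywords.includes(term)).length;
  return Math.log((1 + entries.length) / (1 + df)) + 1;
}

function suggestMixins(description, entries = catalog, limit = 5) {
  const queryTerms = new Set(tokenize(description));
  return entries
    .map((entry) => {
      let matched = 0;
      let total = 0;
      for (const keyword of entry.keywords) {
        const weight = idf(keyword, entries);
        total += weight;
        if (queryTerms.has(keyword)) matched += weight;
      }
      // Score is the matched share of the entry's total keyword weight (0 to 1).
      return { name: entry.name, score: total === 0 ? 0 : matched / total };
    })
    .filter((s) => s.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}

const suggestions = suggestMixins('medicine product with batch lot and expiry');
console.log(suggestions.map((s) => s.name)); // [ 'Batchable', 'Expirable' ]
```

Entries with no matching keyword are dropped here; a real catalog might instead keep a small baseline score for general-purpose mixins.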

The catalog doubles as documentation. Reading mixinCatalog.js tells you what every mixin does and what keywords it responds to — more readable than trawling through implementation files.

The Write Method Architecture

Write methods on the adapter are hand-written and live outside the generated section. They don’t follow the same pattern as read methods — they’re not thin wrappers over CLI calls. They do real work: file path computation, content generation, file I/O.

createModel builds a model file from a spec:

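// Assumes `const fs = require('fs')` and `const path = require('path')` at the top of the adapter module.
createModel({ name, mixins, fields }) {
  const modelsDir = path.join(this._appDir, 'models');
  const filePath  = path.join(modelsDir, `${name}.js`);
  // Render first, so a rendering error leaves the file system untouched.
  const content   = this._renderModelFile({ name, mixins, fields }, modelsDir);
  fs.mkdirSync(modelsDir, { recursive: true });
  fs.writeFileSync(filePath, content, 'utf8');
  // Report size in bytes; content.length would count UTF-16 code units instead.
  return { path: filePath, size: Buffer.byteLength(content, 'utf8') };
}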

_renderModelFile computes require paths dynamically from the model file’s location to the packages:

const corePath = path.relative(fromModelsDir, path.join(this._repoRoot, 'packages/core/index.js'));

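An app at apps/my-app/models/Product.js gets require('../../../packages/core/index.js'). An app at the repo root, with its models in models/, gets require('../packages/core/index.js'), because the path is computed from the models directory rather than from the app directory. The adapter handles both: the caller passes a model spec, not a path.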
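The path computation can be sketched as a small standalone helper. requirePathFor is a hypothetical name, and the separator normalization and the "./" prefix guard are additions for illustration that the article does not describe.

```javascript
const path = require('node:path');

// Hypothetical helper: compute the string to place inside require() in a
// generated model file. `modelsDir` and `repoRoot` are absolute paths.
function requirePathFor(modelsDir, repoRoot, target = 'packages/core/index.js') {
  let rel = path.relative(modelsDir, path.join(repoRoot, target));
  // path.relative returns backslashes on Windows; require() accepts forward slashes.
  rel = rel.split(path.sep).join('/');
  // A bare "packages/..." would be resolved as a package name, so force a relative prefix.
  return rel.startsWith('./') || rel.startsWith('../') ? rel : `./${rel}`;
}

const repoRoot = path.resolve('/repo');
const nested = requirePathFor(path.join(repoRoot, 'apps/my-app/models'), repoRoot);
const atRoot = requirePathFor(path.join(repoRoot, 'models'), repoRoot);
console.log(nested); // ../../../packages/core/index.js
console.log(atRoot); // ../packages/core/index.js
```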

The Feedback Loop in Practice

The scaffold workflow isn’t just about writing files faster. It’s about what changes when the AI can verify its own output.

Before write tools, the AI might generate a model file with a field type it got slightly wrong. The human applies it, runs dar health, sees the validation error, reports back, the AI fixes it. Three round-trips through the human.

With write tools, the AI creates the model, runs health() immediately, sees the validation error in the tool response, fixes the model file, runs health() again, gets clean. The human never saw the error. The correction happened inside the AI’s tool-call loop.
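The loop described above can be sketched as create, verify, repair, with a bounded number of attempts. The createModel and health names follow the article; the error format, the repair callback (standing in for the AI's correction step), and the in-memory stub adapter are assumptions made so the sketch runs on its own.

```javascript
// Hypothetical create/verify/repair loop with a retry limit.
function createWithVerification(adapter, spec, repair, maxAttempts = 3) {
  let current = spec;
  const history = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    adapter.createModel(current);
    const report = adapter.health();
    history.push({ attempt, ok: report.ok, errors: report.errors });
    if (report.ok) return { ok: true, spec: current, history };
    current = repair(current, report.errors);
  }
  // Out of attempts: hand the recorded errors back instead of looping forever.
  return { ok: false, spec: current, history };
}

// In-memory stub adapter: health() rejects field types outside a small set.
const VALID_TYPES = new Set(['string', 'number', 'date', 'boolean']);
const models = new Map();
const stubAdapter = {
  createModel(spec) { models.set(spec.name, spec); },
  health() {
    const errors = [];
    for (const model of models.values()) {
      for (const field of model.fields) {
        if (!VALID_TYPES.has(field.type)) {
          errors.push({ model: model.name, field: field.name, message: `unknown type "${field.type}"` });
        }
      }
    }
    return { ok: errors.length === 0, errors };
  },
};

// Stand-in for the AI's fix: replace the type of every reported field.
function repair(spec, errors) {
  const bad = new Set(errors.map((e) => e.field));
  return {
    ...spec,
    fields: spec.fields.map((f) => (bad.has(f.name) ? { ...f, type: 'date' } : f)),
  };
}

const outcome = createWithVerification(
  stubAdapter,
  { name: 'Product', mixins: ['Batchable'], fields: [{ name: 'expiry_date', type: 'datetime2' }] },
  repair,
);
console.log(outcome.ok, outcome.history.length); // true 2
```

The attempt limit matters in practice: when the loop cannot converge, the recorded history gives the human the full sequence of errors rather than only the last one.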

This changes the quality bar for AI-generated code. The AI can iterate to correctness before the human sees the result, rather than delivering a first draft that the human has to repair.

It also changes what the AI will attempt. A write-capable AI will scaffold a full app from a brief because it can verify each step works. A read-only AI will describe the scaffold and hand it to the human because it can’t verify anything it generates.

What Still Requires Human Judgment

Write tools don’t replace human review — they change when it happens.

The AI calls dry_run: true and shows the human what it will build. That’s when the human reviews the mixin choices, checks the field names, confirms the structure matches the real domain. This is higher-value review than “check this code compiles” — it’s “is this the right model for the domain?”

The AI handles the mechanical correctness (valid field types, proper require paths, locale keys populated). The human handles the domain judgment (is this the right model shape, are these the right mixins for this business, does the PageDef match the actual workflow).

The division works because the tools make the mechanical part reliable. If health() passes after scaffold, the app is structurally valid. The human reviewing a structurally valid draft does less error-spotting and more design-thinking.

The Read/Write Balance

Thirteen tools: six read, seven write. The read tools are lighter — they call CLI internals and return JSON. The write tools do more work — file I/O, content generation, path computation.

The balance matters. The write tools are only useful if the read tools give the AI enough context to make good write decisions. suggest_mixins scores candidates before create_model chooses them. health() verifies after every write. getModel() confirms the model appears correctly after creation.

The read tools are the AI’s senses. The write tools are its hands. Neither is useful without the other.

A read-only AI tool layer is a good oracle. A read-write AI tool layer is a working collaborator.

The pattern in these last few articles runs in one direction: each layer builds on the last. Inspect gives the AI eyes. PageDef autofill gives it judgment about UI structure. The write tools give it hands. The adapter layer keeps all of it from breaking when the framework evolves. The system is designed for the steady state, not the demo — where refactors happen, where coverage matters, where partial breakage is worse than total breakage.