Right Model, Right Layer

Most developers use AI the same way they use a search engine: one query, one answer, move on. One model, one conversation, one project. The model does everything — architecture, implementation, documentation, templates — and the developer accepts whatever comes out.

This works until it doesn’t. The model that’s fast and cheap at generating boilerplate tends to be weak at architectural reasoning. The model that thinks carefully about tradeoffs is slow, expensive, and overkill for writing a Nunjucks template. Using one model for everything is like using one tool for every job in a workshop: technically possible, consistently suboptimal.

The paradigm that actually scales is multi-agent: different models handling different layers, each given exactly what it needs to do its job, none of them needing to understand the full system.

The Layers

Every serious software project has at least three layers of cognitive work:

Architecture and decisions. What should this system be? What are the tradeoffs? What gets built first? This layer requires deep reasoning, tolerance for ambiguity, and the ability to hold the whole system in mind. It’s slow work and it should be slow — wrong decisions here compound through every layer below.

Implementation. Given a precise spec, produce correct code. This is where most of the token volume lives. It doesn’t require architectural judgment — the spec already encodes the decisions. It requires speed, consistency, and the ability to follow a contract exactly.

Consumer work. Given a surface description — context variables, URL conventions, constraints — build something that plugs into the system. Templates, dashboards, integrations. This layer requires zero knowledge of internals. It just needs the contract.

These layers have different requirements. Routing them to the same model at the same cost is waste.
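As a rough sketch of what routing by layer could look like, here is a small lookup table in TypeScript. The type names, tier labels, and sample tasks are all assumptions made for illustration; they do not come from any real tool or from the DarJS codebase.

```typescript
// Illustrative only: WorkLayer, ModelTier, and the tier labels are hypothetical names.
type WorkLayer = "architecture" | "implementation" | "consumer";
type ModelTier = "deep-reasoning" | "fast" | "interchangeable";

// One routing decision per layer, instead of one model for everything.
const tierByLayer: Record<WorkLayer, ModelTier> = {
  architecture: "deep-reasoning", // slow, explicit reasoning about tradeoffs
  implementation: "fast",         // high token volume, follows the spec exactly
  consumer: "interchangeable",    // the surface description carries the load
};

interface LayerTask {
  layer: WorkLayer;
  summary: string;
}

// Pick a model tier from the kind of work, not from habit.
function routeTask(task: LayerTask): ModelTier {
  return tierByLayer[task.layer];
}

// Hypothetical tasks, one per layer.
const sampleTasks: LayerTask[] = [
  { layer: "architecture", summary: "Decide what gets built first" },
  { layer: "implementation", summary: "Implement a phase from its spec" },
  { layer: "consumer", summary: "Build a dashboard template" },
];

const routedTiers = sampleTasks.map(routeTask);
```

The point of keeping the table explicit is that the routing decision becomes a reviewable artifact rather than a default.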

What Each Layer Needs

Architecture layer needs: the full problem space, the constraints, the history of what’s been tried. It benefits from a model that reasons slowly and explicitly — one that will push back, surface tradeoffs, and refuse to paper over ambiguity. In a phase-based build, this is the model that writes the specs. It reads the PRD, the prior phase output, and produces a spec tight enough that the implementation layer can follow it without asking questions.

Implementation layer needs: a precise spec, the relevant contracts, and nothing else. It doesn’t need the PRD. It doesn’t need the architectural history. It needs “here’s what to build, here’s the interface it must satisfy, here’s the test count that means you’re done.” Given that, a fast model can usually produce correct output in one pass.

Consumer layer needs: only the surface. As covered in the previous piece, that means context variables, URL conventions, and constraints. No internals. Almost any model can handle this layer, because the prompt does most of the work.
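One way to make these information requirements concrete is a function that hands each layer only its slice of the project context. This is a minimal sketch: the ProjectContext shape, its field names, and the placeholder values are assumptions for illustration.

```typescript
// Hypothetical bundle of everything the project knows.
interface ProjectContext {
  prd: string;
  decisionHistory: string[];
  priorPhaseOutput: string;
  phaseSpec: string;
  contracts: string[];
  surfaceDescription: string;
}

type LayerName = "architecture" | "implementation" | "consumer";

// Return only the fields a layer needs; everything else stays out of its prompt.
function contextFor(layer: LayerName, ctx: ProjectContext): Partial<ProjectContext> {
  switch (layer) {
    case "architecture":
      return {
        prd: ctx.prd,
        decisionHistory: ctx.decisionHistory,
        priorPhaseOutput: ctx.priorPhaseOutput,
      };
    case "implementation":
      return { phaseSpec: ctx.phaseSpec, contracts: ctx.contracts };
    case "consumer":
      return { surfaceDescription: ctx.surfaceDescription };
  }
}

// Placeholder content, not real project data.
const exampleContext: ProjectContext = {
  prd: "(product requirements)",
  decisionHistory: ["(what was tried and rejected)"],
  priorPhaseOutput: "(test output from the previous phase)",
  phaseSpec: "(spec for the current phase)",
  contracts: ["(interface the code must satisfy)"],
  surfaceDescription: "(context variables, URL conventions, constraints)",
};

const implementationInput = contextFor("implementation", exampleContext);
const consumerInput = contextFor("consumer", exampleContext);
```

Here implementationInput carries the spec and contracts but no PRD, and consumerInput carries the surface description and nothing else.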

The DarJS Example

Across the eleven phases of the build, this played out concretely:

The phase specs were written with deep reasoning — what to build, what to defer, what alternatives were rejected and why. Each spec was designed to be self-contained: a model reading only the spec and the prior test output could implement the phase correctly. No re-reading of earlier phases needed.

The implementation of each phase followed the spec exactly. The spec encoded the decisions; the implementation just produced the code. Fast, high-volume, low-reasoning work.

The dashboard prompt from the previous piece is consumer-layer work. Any model given that prompt can produce correct Nunjucks templates. It needs no knowledge of MixinEngine, SchemaGenerator, or the phase history. The surface description is sufficient.

Three layers. Three different cognitive profiles. Three different information requirements.

The Handoff Contract

What makes multi-agent work is not the models — it’s the handoffs. A handoff is the artifact that passes from one layer to the next. If the handoff is wrong, the receiving layer produces wrong output regardless of how capable the model is.

A phase spec is a handoff from the architecture layer to the implementation layer. It must be self-contained — everything the implementation model needs, nothing it doesn’t. Ambiguity in the spec becomes a bug in the implementation.
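A spec handoff can be checked mechanically before it reaches the implementation model. The sketch below assumes a hypothetical PhaseSpecHandoff shape and a deliberately simple gap check; a real spec format would likely carry more fields, and the sample specs are placeholders.

```typescript
// Hypothetical shape for the architecture-to-implementation handoff.
interface PhaseSpecHandoff {
  phase: number;
  build: string[];           // what to build
  deferred: string[];        // what to defer
  interfaceContract: string; // the interface the code must satisfy
  expectedTestCount: number; // the test count that means "done"
}

// List the gaps that would force the implementation model to guess.
function findSpecGaps(spec: PhaseSpecHandoff): string[] {
  const gaps: string[] = [];
  if (spec.build.length === 0) gaps.push("nothing listed to build");
  if (spec.interfaceContract.trim() === "") gaps.push("missing interface contract");
  if (spec.expectedTestCount <= 0) gaps.push("no test count defining done");
  for (const item of spec.build) {
    // Placeholder words are a cheap signal of unresolved ambiguity.
    if (/\b(TBD|TODO)\b/i.test(item)) gaps.push("ambiguous item: " + item);
  }
  return gaps;
}

// Placeholder specs for illustration.
const tightSpec: PhaseSpecHandoff = {
  phase: 3,
  build: ["Add a list view for each registered resource"],
  deferred: ["Pagination"],
  interfaceContract: "listView(resource) returns an array of rows",
  expectedTestCount: 12,
};

const vagueSpec: PhaseSpecHandoff = {
  phase: 3,
  build: ["Add views, details TBD"],
  deferred: [],
  interfaceContract: "",
  expectedTestCount: 0,
};

const tightGaps = findSpecGaps(tightSpec); // no gaps
const vagueGaps = findSpecGaps(vagueSpec); // three gaps
```

A check like this cannot prove a spec is complete, but it catches the obvious holes before they become implementation bugs.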

A framework prompt is a handoff from the implementation layer to the consumer layer. It must describe the surface completely — every context variable, every URL, every constraint. Gaps in the prompt become wrong assumptions in the templates.
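To illustrate, a framework prompt can be rendered from a plain object so that every context variable, URL, and constraint is listed in one place. The SurfaceDescription shape and the sample values here are hypothetical and are not the actual dashboard prompt from the previous piece.

```typescript
// Hypothetical description of what a consumer is allowed to see.
interface SurfaceDescription {
  contextVariables: { [varName: string]: string }; // variable name -> meaning
  urlConventions: string[];
  constraints: string[];
}

// Render the surface as prompt text. Nothing about internals can leak in,
// because the function has no access to anything except the surface.
function renderFrameworkPrompt(surface: SurfaceDescription): string {
  const variableLines = Object.keys(surface.contextVariables).map(
    (key) => "- " + key + ": " + surface.contextVariables[key]
  );
  return [
    "Context variables:",
    ...variableLines,
    "URL conventions:",
    ...surface.urlConventions.map((u) => "- " + u),
    "Constraints:",
    ...surface.constraints.map((c) => "- " + c),
  ].join("\n");
}

// Placeholder surface for illustration.
const dashboardSurface: SurfaceDescription = {
  contextVariables: {
    items: "array of records to list",
    pageTitle: "string shown in the page heading",
  },
  urlConventions: ["/items lists records", "/items/:id shows one record"],
  constraints: ["Do not rely on framework internals"],
};

const frameworkPrompt = renderFrameworkPrompt(dashboardSurface);
```

If a template later needs a variable that is not in the object, that is a gap in the handoff, and it shows up as a missing entry rather than a wrong assumption.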

A decisions file is a handoff from any layer to the future. It captures what was rejected and why, so the next model joining the project doesn’t re-evaluate settled questions. Without it, every new model is context-blind — it sees the current state but not the path.
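A decisions file can be as small as a list of records. The following sketch assumes a hypothetical DecisionEntry shape and a lookup helper; the sample entry is a placeholder, not a real decision from the project.

```typescript
// Hypothetical record format for a decisions file.
interface RejectedOption {
  option: string;
  reason: string; // why it was rejected
}

interface DecisionEntry {
  question: string;
  chosen: string;
  rejected: RejectedOption[];
}

// Before re-opening a question, check whether it is already settled.
function lookupSettled(
  decisions: DecisionEntry[],
  question: string
): DecisionEntry | undefined {
  const matches = decisions.filter((d) => d.question === question);
  return matches.length > 0 ? matches[0] : undefined;
}

// Placeholder entry for illustration.
const decisionsFile: DecisionEntry[] = [
  {
    question: "Where do templates get their data?",
    chosen: "Context variables only",
    rejected: [
      {
        option: "Direct access to internals",
        reason: "Leaks the surface abstraction",
      },
    ],
  },
];

const settled = lookupSettled(decisionsFile, "Where do templates get their data?");
const unsettled = lookupSettled(decisionsFile, "Which database should we use?");
```

The rejected list is the part that matters most: it records the path, not just the current state.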

The quality of the output at each layer is bounded by the quality of the handoff it received. A brilliant model given a vague spec produces vague code. A modest model given a precise spec produces precise code.

Handoffs are the bottleneck. Not the models.

The Benefits

Cost scales with the work, not the project. Architecture reasoning is expensive and infrequent — use a capable model sparingly. Implementation is cheap and high-volume — use a fast model at scale. Consumer work is almost free if the prompt is well-written. Multi-agent lets you pay for reasoning only where reasoning is actually needed.
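The arithmetic is easy to sketch. Every number below is invented purely to show the shape of the calculation: the token volumes and the two prices are placeholders, not measurements or real pricing.

```typescript
type CostLayer = "architecture" | "implementation" | "consumer";

interface LayerUsage {
  layer: CostLayer;
  kiloTokens: number; // volume in thousands of tokens (made-up)
}

// Made-up prices in arbitrary cost units per 1,000 tokens.
const PRICE_DEEP = 10;
const PRICE_FAST = 1;

// Made-up volumes: little architecture work, lots of implementation.
const layerUsage: LayerUsage[] = [
  { layer: "architecture", kiloTokens: 50 },
  { layer: "implementation", kiloTokens: 800 },
  { layer: "consumer", kiloTokens: 150 },
];

// Everything goes through one model at one price.
function singleModelCost(usage: LayerUsage[], price: number): number {
  return usage.reduce((sum, u) => sum + u.kiloTokens * price, 0);
}

// Only the architecture layer pays the deep-reasoning price.
function layeredCost(usage: LayerUsage[]): number {
  return usage.reduce(
    (sum, u) =>
      sum + u.kiloTokens * (u.layer === "architecture" ? PRICE_DEEP : PRICE_FAST),
    0
  );
}

const allDeep = singleModelCost(layerUsage, PRICE_DEEP); // 10000 units
const layered = layeredCost(layerUsage);                 // 1450 units
```

With these placeholder figures the layered setup costs 1,450 units against 10,000 for routing everything to the expensive model. The real ratio depends entirely on actual volumes and prices.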

Parallelism becomes possible. In a single-model approach, everything is sequential: the model does one thing at a time. In a multi-agent approach, multiple consumer-layer models can work in parallel on different parts of the system, each given its own surface prompt, none blocking the others.

Model switches don’t break the project. If a better model is released tomorrow, you swap it into one layer without touching the others. The handoff contracts don’t change. The architecture layer’s spec format doesn’t care which model implements it. The implementation layer’s output format doesn’t care which model consumes it.
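In code terms, a layer depends on an interface, not on a particular model. The sketch below uses two stub functions as stand-ins for models; a real implementer would call a model API, which is deliberately left out. All names are hypothetical.

```typescript
// The handoff in: what the implementation layer receives.
interface SpecInput {
  title: string;
  expectedTestCount: number;
}

// The handoff out: what the implementation layer must return.
interface ImplementationOutput {
  code: string;
  testsWritten: number;
}

// Any model that satisfies this signature can fill the layer.
type SpecImplementer = (spec: SpecInput) => ImplementationOutput;

// Stubs standing in for two different models. They only mimic the output shape.
const currentModel: SpecImplementer = (spec) => ({
  code: "// " + spec.title + " (stub A)",
  testsWritten: spec.expectedTestCount,
});

const newerModel: SpecImplementer = (spec) => ({
  code: "// " + spec.title + " (stub B)",
  testsWritten: spec.expectedTestCount,
});

// The layer's acceptance check never mentions which model ran.
function runImplementationLayer(spec: SpecInput, implement: SpecImplementer): boolean {
  const output = implement(spec);
  return output.code.length > 0 && output.testsWritten === spec.expectedTestCount;
}

const phaseInput: SpecInput = { title: "Phase 3 list views", expectedTestCount: 12 };
const acceptedWithCurrent = runImplementationLayer(phaseInput, currentModel);
const acceptedWithNewer = runImplementationLayer(phaseInput, newerModel);
```

Swapping models means passing a different function. The spec format and the acceptance check stay the same.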

The human stays at the right level. In a single-model approach, the developer is constantly context-switching — architecture one minute, template debugging the next. In a multi-agent approach, the developer works at the architecture and handoff level. They write specs, review contracts, and validate outputs. The implementation is delegated. The decision-making isn’t.

What This Requires

Multi-agent only works if the layers are actually separated. If the implementation layer needs to understand the architecture to produce correct code, the spec is incomplete. If the consumer layer needs to understand the implementation to build templates, the surface abstraction leaks.

Separation of concerns isn’t a nice-to-have in this paradigm. It’s load-bearing. Every leak in an abstraction is a gap in a handoff, and gaps in handoffs become wrong outputs.

This is why the patterns from the previous pieces matter. Contracts as plain objects — so handoffs are inspectable and serializable. Pure generators — so implementation output is testable without I/O. Escape hatches in the spec — so consumer-layer models know exactly when and how to override. Decisions files — so no model ever re-evaluates a settled question.
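The first two patterns are easy to show together. This is a minimal sketch with a hypothetical RouteContract and generator, not the actual contracts from earlier pieces: the contract survives a JSON round trip because it is a plain object, and the generator can be checked without touching the filesystem or network because it is pure.

```typescript
// A contract as a plain object: no methods, no class instances.
interface RouteContract {
  resource: string;
  actions: string[];
}

// Pure generator: same input, same output, no I/O.
function generateRoutePaths(contract: RouteContract): string[] {
  return contract.actions.map((action) => "/" + contract.resource + "/" + action);
}

// Placeholder contract for illustration.
const contractExample: RouteContract = {
  resource: "items",
  actions: ["list", "create"],
};

// Serializable: the handoff can be written to a file or pasted into a prompt.
const serializedContract = JSON.stringify(contractExample);
const roundTripped: RouteContract = JSON.parse(serializedContract);

// Testable: the output depends only on the contract.
const pathsFromOriginal = generateRoutePaths(contractExample);
const pathsFromRoundTrip = generateRoutePaths(roundTripped);
```

Both calls produce the same paths, which is what lets one agent write the contract and another consume it later.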

Each pattern is independently useful. Together, they make a system that multiple agents can work on without stepping on each other.

That’s the subject of the next piece.

Next: Contract-Based Architecture Is Agent-Ready Architecture