Building in the Age of Agents

A series for developers who take their craft seriously and want to work effectively with AI — not as a shortcut, but as a force multiplier for deep knowledge.

Each piece is grounded in real work on a real project (DarJS — a mixin-based business framework built over 11 phases). The patterns are transferable to any serious software project.

The Series

000 — Before You Start: The AI Concepts That Actually Matter for Builders

File: 000_preface.md Status: ✅ Written

001 — Instructions as Design Patterns

How to write AI instructions that transfer intent reliably across sessions and models. The three-part structure (trigger + why + application scope) that makes a rule scale. File: 001_instructions_as_design_patterns.md Status: ✅ Written

002 — The Sentence That Takes a Paragraph to Explain

Domain knowledge compressed into single sentences. What they look like, how they’re earned, and why the developer who can write them becomes the steering layer when AI handles execution. File: 002_domain_compression.md Status: ✅ Written

003 — How to Write a Framework Prompt

How to extract just the public surface of a system and describe it precisely enough that any AI can build against it without knowing the internals. What to include, what to deliberately omit, how to encode design decisions as constraints. File: 003_framework_prompt.md Status: ✅ Written

004 — Right Model, Right Layer

Different AI models have different strengths. A well-structured project lets you route work to the right model: architecture to the deep thinker, implementation to the fast builder, templates to any model given a tight surface prompt. File: 004_right_model_right_layer.md Status: ✅ Written

005 — Contract-Based Architecture Is Agent-Ready Architecture

The centerpiece. Everything you do to make a system composable also makes it agent-composable. The contracts that let a junior developer build without knowing internals are the same contracts that let an AI agent do it. Good software boundaries and good agent boundaries are the same thing. File: 005_contracts_and_agents.md Status: ✅ Written

006 — Writing Art Direction, Not Image Prompts

The difference between describing an image and specifying the conditions that produce a consistent family of images. The four-part structure — persona, world anchor, specific asset, technical constraints — and why it transfers to every domain where you prompt AI to produce outputs that must work together. File: 006_art_direction_not_image_prompts.md Status: ✅ Written

007 — The Diagram That Pulses While the Code Runs

Most architecture diagrams are dead the moment they’re drawn. A live diagram driven by runtime signals is always accurate because it doesn’t describe the code — it is the code, made visible. The implementation, the principle, and why it matters more for AI-assisted development than for any other kind. File: 007_the_diagram_that_pulses.md Status: ✅ Written

008 — Debug First

The conventional order — build the features, add observability later — is backwards. Every invisible problem in a framework is an instrumentation problem. The four instruments that make a system legible while it runs, why to build them before the features, and how they change what AI-assisted development can be. File: 008_debug_first_framework_design.md Status: ✅ Written

009 — How to Make Your App AI-Testable

Most apps can theoretically be AI-tested but practically can’t. The reason DarJS worked is that its DOM is deterministically derived from model metadata. The design principle: route contracts + field contracts + selector contracts must all flow from the same source of truth. File: 009_ai_testable_apps.md Status: ✅ Written
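
This is a minimal TypeScript sketch of the single-source-of-truth principle. It assumes a hypothetical `ModelMeta` shape and a hypothetical `deriveContracts` helper; neither is the DarJS API. The route, the field list, and the test selectors are all computed from one metadata object, so renaming a field changes all three together. The `data-field` attribute follows the DOM contracts described in 020. The `model.field` value format and the pluralized route are assumptions.

```typescript
// Hypothetical model metadata; DarJS's real metadata shape may differ.
interface ModelMeta {
  name: string;
  fields: { name: string; type: "string" | "number" | "enum" }[];
}

interface Contracts {
  route: string;                       // where the page lives
  fields: string[];                    // which inputs the form renders
  selectors: Record<string, string>;   // how a test finds each input
}

// Every contract is derived from the same ModelMeta, so a renamed field
// changes the form, the selectors, and the test targets at once.
function deriveContracts(meta: ModelMeta): Contracts {
  const fields = meta.fields.map((f) => f.name);
  const selectors: Record<string, string> = {};
  for (const field of fields) {
    selectors[field] = `[data-field="${meta.name}.${field}"]`;
  }
  return { route: `/${meta.name}s`, fields, selectors };
}

const invoice: ModelMeta = {
  name: "invoice",
  fields: [
    { name: "customer", type: "string" },
    { name: "total", type: "number" },
  ],
};

console.log(deriveContracts(invoice).selectors.total);
// prints [data-field="invoice.total"]
```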

010 — The DSL Layer Between AI and Your App

The mistake is asking AI to write Playwright. The right pattern: design a semantic JSON DSL, have AI emit that, have a thin runner translate DSL → browser actions using domain knowledge. AI never touches the DOM. The DSL is the contract between AI capability and your runner. File: 010_dsl_layer.md Status: ✅ Written
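
The sketch below shows the DSL boundary. It assumes a hypothetical three-verb vocabulary (`open`, `fill`, `submit`) and a hypothetical `BrowserAction` output type. The AI emits `Step` objects. Only the runner knows how page names become URLs and how model fields become selectors. A real runner would pass each action to Playwright (for example `page.goto`, `page.fill`, `page.click`). That call is left out so the sketch stays self-contained.

```typescript
// Hypothetical semantic DSL: the AI emits these, never CSS selectors.
type Step =
  | { do: "open"; page: string }
  | { do: "fill"; model: string; field: string; value: string }
  | { do: "submit"; model: string };

// What the runner would hand to a browser driver afterwards.
type BrowserAction =
  | { kind: "goto"; url: string }
  | { kind: "fill"; selector: string; value: string }
  | { kind: "click"; selector: string };

// The runner owns the domain knowledge; the AI only names intents.
function translate(step: Step): BrowserAction {
  if (step.do === "open") return { kind: "goto", url: `/${step.page}` };
  if (step.do === "fill") {
    return {
      kind: "fill",
      selector: `[data-field="${step.model}.${step.field}"]`,
      value: step.value,
    };
  }
  // Only "submit" is left; the save button is found by a data-action contract.
  return { kind: "click", selector: `[data-action="${step.model}.save"]` };
}

const script: Step[] = [
  { do: "open", page: "invoices/new" },
  { do: "fill", model: "invoice", field: "customer", value: "Acme" },
  { do: "submit", model: "invoice" },
];
console.log(script.map(translate));
```

Because the DSL names intents rather than elements, a layout change only touches `translate`. The AI-emitted scripts stay as they are.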

011 — The NLP-First Codebase: Replacing LLM Calls with Retrieval

Most of what developers ask an LLM is a retrieval problem wearing a generation costume. Build an indexed contract corpus with @reuse-when fields, run TF-IDF over it, route to nlp-reuse / nlp-verify / llm-generate. The hit rate for retrieval on simple contracts: above 80%. The LLM handles what the index can’t — which turns out to be less than you’d expect. File: 011_nlp_first_codebase.md Status: ✅ Written
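
The following is a compact sketch of retrieval-first routing. It assumes contracts are indexed by their `@reuse-when` text only. The code builds TF-IDF vectors (raw term counts, smoothed IDF), scores the query by cosine similarity, and picks one of the three routes named above. The `Contract` shape and both thresholds (0.6 and 0.3) are illustrative placeholders, not DarJS's tuned values.

```typescript
// Hypothetical contract shape: only the caller-facing text is indexed.
interface Contract { name: string; reuseWhen: string }
type Route = "nlp-reuse" | "nlp-verify" | "llm-generate";

const tokenize = (s: string): string[] => s.toLowerCase().match(/[a-z]+/g) ?? [];

// TF-IDF vectors for the corpus, plus a vectorizer for queries.
function buildIndex(corpus: Contract[]) {
  const docs = corpus.map((c) => tokenize(c.reuseWhen));
  const df = new Map<string, number>();
  for (const doc of docs) for (const t of new Set(doc)) df.set(t, (df.get(t) ?? 0) + 1);
  const idf = (t: string) => Math.log((1 + docs.length) / (1 + (df.get(t) ?? 0))) + 1;
  const vectorize = (tokens: string[]) => {
    const v = new Map<string, number>();
    for (const t of tokens) v.set(t, (v.get(t) ?? 0) + 1);
    for (const [t, tf] of v) v.set(t, tf * idf(t));
    return v;
  };
  return { vectors: docs.map(vectorize), vectorize };
}

function cosine(a: Map<string, number>, b: Map<string, number>): number {
  let dot = 0, na = 0, nb = 0;
  for (const [t, w] of a) { dot += w * (b.get(t) ?? 0); na += w * w; }
  for (const w of b.values()) nb += w * w;
  return na && nb ? dot / Math.sqrt(na * nb) : 0;
}

// Thresholds are placeholders; the LLM only sees what retrieval cannot answer.
function route(query: string, corpus: Contract[]): { route: Route; match?: string; score: number } {
  const { vectors, vectorize } = buildIndex(corpus);
  const q = vectorize(tokenize(query));
  let best = -1, score = 0;
  for (let i = 0; i < vectors.length; i++) {
    const s = cosine(q, vectors[i]);
    if (s > score) { score = s; best = i; }
  }
  if (score >= 0.6) return { route: "nlp-reuse", match: corpus[best].name, score };
  if (score >= 0.3) return { route: "nlp-verify", match: corpus[best].name, score };
  return { route: "llm-generate", score };
}

const corpus: Contract[] = [
  { name: "formatDisplayDate", reuseWhen: "format a date for display in a table" },
  { name: "sendNotification", reuseWhen: "send an email notification to a user" },
];
console.log(route("format date for table display", corpus));
```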

012 — Writing Code for Machines, Not Just Humans

Documentation is written from the implementor’s perspective. @reuse-when is written from the caller’s perspective — the words someone would type before they know the function exists. That gap between implementation language and caller language is exactly what makes code hard to find without an LLM. Closing it with structured annotations makes the codebase directly queryable by any retrieval system. File: 012_code_for_machines.md Status: ✅ Written
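
The example below shows the two perspectives on one function. The first docstring line is the implementor's description. The `@reuse-when` tag lists phrases a caller might type before knowing the function exists. `formatDisplayDate` and `extractReuseWhen` are hypothetical. The extraction regex is a simplification: it assumes `@reuse-when` is followed by another tag or by the end of the comment.

```typescript
/**
 * Formats an ISO date string as DD/MM/YYYY.
 *
 * @reuse-when show a date in a table, pretty print a created_at column,
 *   display dates in day month year order, human readable date
 */
function formatDisplayDate(iso: string): string {
  const [year, month, day] = iso.slice(0, 10).split("-");
  return `${day}/${month}/${year}`;
}

// A retrieval indexer only needs the caller-language text, not the code.
// Captures everything after @reuse-when up to the next tag or the comment end,
// then folds the " * " continuation prefixes into single spaces.
function extractReuseWhen(source: string): string | null {
  const m = source.match(/@reuse-when\s+([\s\S]*?)(?:\n\s*\*\s*@|\*\/)/);
  return m ? m[1].replace(/\s*\*\s*/g, " ").trim() : null;
}
```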

013 — Your Directory Layout Is Now a Routing Table

An agent navigating a codebase doesn’t carry internalized context across sessions. It reads what’s present and infers what’s absent. A file placement table in a local CLAUDE.md is a routing instruction that executes in zero tokens. An undocumented convention is a reasoning problem the agent solves from scratch every session — and may solve differently each time. File: 013_directory_as_routing_table.md Status: ✅ Written

014 — Your Framework Needs a `dar inspect`

AI agents working on a codebase face an introspection gap: the runtime shape of an app — fields after mixin composition, transitions, registered pages — doesn’t exist anywhere a static reader can find in one place. dar inspect closes that gap with a live CLI that answers structural questions in one call. The pattern: CLI first, MCP wrapper second, same interface for both. File: 014_framework_needs_inspect.md Status: ✅ Written

015 — 80% Without the LLM: PageDef Autofill and What It Proves

For a well-specified business UI, roughly 80% of the interface definition can be generated deterministically from model structure — columns from scalar fields, filters from enum fields, widgets from mixin lookup. The remaining 20% is genuinely undecidable and gets flagged for human/LLM judgment. The split isn’t a shortcut — it’s the correct division of labor between retrieval and reasoning. File: 015_pagedef_autofill.md Status: ✅ Written
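
This sketch covers the deterministic 80%. It assumes a hypothetical `Field` shape after mixin composition and a made-up mixin-to-widget table. Scalar fields become columns, enum fields become filters, and known mixins contribute widgets. Everything else lands in `undecided` for human or LLM review instead of being guessed. Treating every non-scalar, non-enum field as undecided is a simplification.

```typescript
// Hypothetical field metadata after mixin composition.
interface Field {
  name: string;
  kind: "scalar" | "enum" | "relation" | "json";
  values?: string[];   // enum members
  mixin?: string;      // mixin that contributed the field, if any
}

interface AutofillResult {
  columns: string[];
  filters: { field: string; options: string[] }[];
  widgets: string[];
  undecided: string[]; // flagged for human or LLM judgment
}

// Hypothetical lookup: mixins that imply a known widget.
const MIXIN_WIDGETS: Record<string, string> = {
  Auditable: "audit-trail",
  Attachable: "file-list",
};

function autofill(fields: Field[]): AutofillResult {
  const result: AutofillResult = { columns: [], filters: [], widgets: [], undecided: [] };
  for (const f of fields) {
    if (f.kind === "scalar") result.columns.push(f.name);
    else if (f.kind === "enum") result.filters.push({ field: f.name, options: f.values ?? [] });
    else result.undecided.push(f.name); // no deterministic rule in this sketch
    const widget = f.mixin ? MIXIN_WIDGETS[f.mixin] : undefined;
    if (widget && !result.widgets.includes(widget)) result.widgets.push(widget);
  }
  return result;
}

const invoiceFields: Field[] = [
  { name: "number", kind: "scalar" },
  { name: "status", kind: "enum", values: ["draft", "sent"] },
  { name: "createdBy", kind: "scalar", mixin: "Auditable" },
  { name: "metadata", kind: "json" },
];
console.log(autofill(invoiceFields));
```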

016 — One Config Object, Five Form Screens

A twelve-line PageDef wizard declaration generates a full multi-step form: step navigation, skipWhen conditions, widget steps, summary screen, accumulated form data on submit. The pattern: declarative config expresses what the UI does, the framework handles how. Every AI agent working on the app gets a stable interface for expressing complex form behavior without touching Alpine state management. File: 016_wizard_from_config.md Status: ✅ Written

017 — The Stable Adapter Layer: Building AI Tools That Don’t Break When You Refactor

When AI tools wrap a changing codebase, every direct import is a coupling that silently rots on the next refactor. A single adapter class reduces the surface area to one change point. A spec-driven generator makes the adapter's read-method section generated rather than hand-maintained. A drift detection script catches export renames before they cause partial failures that look like data problems. File: 017_stable_adapter_layer.md Status: ✅ Written
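
The drift check can be sketched under one assumption: the adapter keeps a spec of the export names it relies on. `ADAPTER_SPEC`, `detectDrift`, and the stand-in module are hypothetical. In a real script, `wrapped` would come from importing the package the adapter wraps. A renamed export then surfaces as an explicit `missing` entry rather than as an `undefined` call at runtime.

```typescript
// Hypothetical spec: every export the adapter depends on.
const ADAPTER_SPEC = ["listModels", "getModel", "getPageDef"] as const;

// Compare the spec with what the module actually exports.
function detectDrift(spec: readonly string[], mod: Record<string, unknown>) {
  const missing = spec.filter((name) => typeof mod[name] !== "function");
  const unused = Object.keys(mod).filter((name) => !spec.includes(name));
  return { missing, unused, ok: missing.length === 0 };
}

// Stand-in for importing the wrapped package after an upstream rename.
const wrapped = {
  listModels: () => [],
  getModel: () => null,
  getPageDefinition: () => null, // was getPageDef
};

console.log(detectDrift(ADAPTER_SPEC, wrapped));
```

Listing `unused` exports as well usually points straight at the new name of a renamed function.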

018 — From Oracle to Builder: Write-Capable AI Tools and the Scaffold Workflow

Read-only AI tools answer questions. Write-capable tools change the shape of collaboration: the AI scaffolds, verifies with health checks, corrects mistakes in its own tool-call loop — before the human sees the result. The dry_run gate solves the confirmation problem in stdio MCP servers. Six tool calls build a runnable app: suggest mixins, scaffold, generate PageDef, verify health, fix locale. File: 018_write_capable_mcp.md Status: ✅ Written
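
This sketch shows a dry_run gate for a write-capable tool. It assumes hypothetical `ScaffoldArgs` and a `write` callback in place of real file I/O. Defaulting `dry_run` to true is also an assumption. Under that default, the first call only returns the plan, and only an explicit `dry_run: false` applies it. That two-call pattern stands in for an interactive confirmation prompt.

```typescript
// Hypothetical tool arguments; the real DarJS tool schema may differ.
interface ScaffoldArgs { app: string; mixins: string[]; dry_run?: boolean }
interface ToolResult { applied: boolean; plan: string[] }

function scaffoldTool(args: ScaffoldArgs, write: (path: string) => void): ToolResult {
  // Hypothetical file layout for the scaffolded app.
  const plan = [
    `apps/${args.app}/models/${args.app}.js`,
    ...args.mixins.map((m) => `apps/${args.app}/mixins/${m}.js`),
  ];
  // Safe by default: without an explicit dry_run: false, nothing is written.
  if (args.dry_run ?? true) return { applied: false, plan };
  for (const path of plan) write(path);
  return { applied: true, plan };
}

// First call: the agent (or human) sees the plan; writing is impossible here.
const preview = scaffoldTool({ app: "invoices", mixins: ["Auditable"] }, () => {
  throw new Error("dry run must not write");
});
console.log(preview.plan);
```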

019 — The Design Layer: DESIGN.md, Token Systems, and Closing the AI Visual Loop

DESIGN.md (Google Labs, April 2026) gives AI agents a persistent, structured understanding of your visual identity — tokens for exact values, prose for design rationale. Combined with DOM contracts (data-field, data-action attributes) and screenshot vision, the AI build loop closes at the UI layer: scaffold → inspect → critique against spec → verify. Stagehand replaces brittle CSS selectors with AI-native browser actions that survive layout changes. File: 019_design_layer.md Status: 🔲 Planned

020 — DOM Contracts: The Attributes That Make Your UI Testable Without an LLM

The data-field, data-action, data-page, data-record, data-transition attributes added to PageDef templates are a DOM contract layer — machine-readable selectors that survive layout changes, work with Playwright, Stagehand, and NL test runners equally, and cost nothing to add since the renderer already knows every identifier. The pattern: annotate once at the framework level, get testability everywhere. File: 020_dom_contracts.md Status: 🔲 Planned

021 — Living Documentation: CODEMAP as a Synced Artifact

Most codebases have documentation that was accurate when written and wrong six months later. A CODEMAP that’s generated from source — symbols, line numbers, smell markers — is always accurate because it can’t drift. The dar codemap --sync pattern: extract symbols from source, detect known smells, patch the markdown table. The byproduct discipline: every file read updates the map. The result: cold-start sessions navigate by CODEMAP without verification reads. File: 021_living_codemap.md Status: 🔲 Planned

022 — The Confidence Gap as a Safety Gate

When retrieval drives automation, the difference between the top match and the second-best is more informative than the score itself. A 56% match with a 53% runner-up is more dangerous than a 30% match with no competitors — the first is ambiguous, the second is just uncertain. The pattern: ambiguous = (second >= best * 0.90) applied in ui-resolver.js prevents silent wrong-element clicks in NL test runners. The principle generalises to any system where a retrieval result triggers a side effect. File: 022_confidence_gap_safety_gate.md Status: ✅ Written

Source material:

darjs/packages/nlp/ui-resolver.js:156 — ambiguous flag implementation (second_best >= best * 0.90)
darjs/packages/nlp/__tests__/ui-resolver.test.js — ambiguity detection tests (TF-IDF score distribution, 56 tests)
darjs/packages/testing/nl-runner.js — NlAmbiguousError thrown by translateNlStep; safety gate in action
darjs/responses/RESPONSE_2026-05-16_nl-testing-p2-p3.md — score distribution table (0.85–0.99 specific, 0.53–0.56 ambiguous region)
darjs/responses/RESPONSE_2026-05-16_nl-testing-p5.md — NlAmbiguousError design + test failure analysis
darjs/decisions/nl-testing.md — full NL testing design; Layer 5 runner uses confidence threshold
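
The gap rule can be restated as a TypeScript sketch. `resolve` and `clickTarget` are hypothetical names. The real check lives in ui-resolver.js, and the NL runner throws `NlAmbiguousError` rather than a plain `Error`. With the scores from the 022 description, 0.53 ≥ 0.56 × 0.90 = 0.504, so the 56% match is flagged while a lone 30% match is not.

```typescript
interface Candidate { id: string; score: number }
interface Resolution { best?: Candidate; ambiguous: boolean }

// Ambiguous when the runner-up scores at least `ratio` of the best score.
function resolve(candidates: Candidate[], ratio = 0.9): Resolution {
  const sorted = [...candidates].sort((a, b) => b.score - a.score);
  const [best, second] = sorted;
  if (!best) return { ambiguous: false };
  const ambiguous = second !== undefined && second.score >= best.score * ratio;
  return { best, ambiguous };
}

// Anything that triggers a side effect refuses ambiguous matches
// instead of clicking the slightly-higher-scoring element.
function clickTarget(candidates: Candidate[]): string {
  const { best, ambiguous } = resolve(candidates);
  if (!best) throw new Error("no match");
  if (ambiguous) throw new Error(`ambiguous match: ${best.id}`);
  return best.id;
}

console.log(resolve([{ id: "save", score: 0.56 }, { id: "submit", score: 0.53 }]).ambiguous);
// prints true
```

The gate is relative rather than absolute, so it keeps working if the scoring model shifts every score up or down.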

023 — The Living CODEMAP: a Symbol Index That Stays Accurate

Most codebases have documentation that was accurate when written and wrong six months later. A CODEMAP generated from source (symbols, line numbers, smell markers) is always accurate because it can’t drift. The dar codemap --sync pattern: extract top-level symbols via regex, detect known smells, patch the markdown table. The byproduct discipline: every commit re-runs --check through the pre-commit hook. The result: cold-start sessions navigate without verification reads. File: 023_living_codemap.md Status: ✅ Written

Source material:

darjs/tools/codemap/symbol-extractor.js — top-level-only regex extraction; const filter (function assignments only, not imports)
darjs/tools/codemap/smell-detector.js — RULES array pattern; 5 smell rules
darjs/tools/codemap/codemap-patcher.js — parseCodemapSections, checkSection, applyPatches; cleanSymbolName bug (params bleed)
darjs/packages/cli/commands/codemap.js — dar codemap --check/--sync
darjs/docs/CODEMAP.md — live example: 25 stale patched, 17 missing appended on first run
darjs/responses/RESPONSE_2026-05-16_codemap-sync-tool.md — design doc, token ROI argument
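
Top-level-only symbol extraction can be sketched in the spirit of symbol-extractor.js; this version is written from the description above, not from that file. The patterns are anchored at column 0, so indented (nested) declarations are skipped. `const` lines only count when the right-hand side is a function or arrow function, which keeps `const fs = require("fs")` out of the map. The regexes are assumptions and will miss multi-line or unusual declarations.

```typescript
interface SymbolEntry { name: string; line: number }

// Anchored at column 0: indented declarations are treated as nested.
const FUNCTION_DECL = /^(?:export\s+)?(?:async\s+)?function\b\s*\*?\s*([A-Za-z_$][\w$]*)/;
const CLASS_DECL = /^(?:export\s+)?class\s+([A-Za-z_$][\w$]*)/;
// const X = function ... / (args) => ... / arg => ... ; imports do not match.
const CONST_FN =
  /^(?:export\s+)?const\s+([A-Za-z_$][\w$]*)\s*=\s*(?:async\s*)?(?:function\b|\([^)]*\)\s*=>|[A-Za-z_$][\w$]*\s*=>)/;

function extractSymbols(source: string): SymbolEntry[] {
  const out: SymbolEntry[] = [];
  source.split("\n").forEach((text, i) => {
    const m = text.match(FUNCTION_DECL) ?? text.match(CLASS_DECL) ?? text.match(CONST_FN);
    if (m) out.push({ name: m[1], line: i + 1 }); // 1-based line numbers
  });
  return out;
}

const sample = [
  'const fs = require("fs");',
  "function load(path) {",
  "  function inner() {}",
  "}",
  "const parse = (text) => text.trim();",
  "export class Store {}",
].join("\n");
console.log(extractSymbols(sample));
```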

024 — Your Test Suite Doesn’t Test Your Browser Code

The test suite passed. 1505 tests, all green. The Studio UI was completely broken. A function called uid was declared twice — fine in Node’s module scope, a parse-time SyntaxError in a browser script tag that silences every event handler. node --check in the pre-commit hook catches the exact category of error the tests cannot see: global scope redeclarations, the gap between the Node execution model and the browser execution model. File: 024_test_suite_doesnt_test_browser.md Status: ✅ Written

Source material:

darjs/packages/studio/public/studio.js — uid redeclaration incident; browser global scope
darjs/tools/hooks/check-js-syntax.js — node --check pre-commit gate
darjs/.git/hooks/pre-commit — blocking hook suite

025 — One Artifact, Three Consumers

A scenario JSON file is read by three systems: dar test reads the structural fields (action, model, data, to) and makes HTTP requests; the NL Playwright runner reads nl and drives the browser; a business owner reading the Studio panel reads name, actor, and nl. One file, no conversion. The nl label is the hinge — it simultaneously describes intent for the human and drives automation for the NL runner. The design question for any data artifact: who reads this, and can I include all their fields without conflict? File: 025_one_artifact_three_consumers.md Status: ✅ Written

Source material:

darjs/packages/studio/server.js — scenarios-write endpoint (writes the unified format)
darjs/packages/studio/public/studio.js — Studio scenario designer (business owner view)
darjs/packages/testing/nl-runner.js — executeNlStep (NL consumer)
darjs/packages/studio/__tests__/scenarios.test.js — format validation
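
The sketch below reads one scenario file three ways. The field names (`action`, `model`, `data`, `to`, `nl`, `name`, `actor`) follow the 025 description. The nesting under `steps`, the `transition` action, and the three consumer expressions are assumptions standing in for dar test, the NL runner, and the Studio panel.

```typescript
// Hypothetical scenario shape built from the fields named in 025.
interface ScenarioStep {
  action: string;                 // structural fields: read by dar test
  model: string;
  data?: Record<string, unknown>;
  to?: string;
  nl: string;                     // the hinge: read by humans and the NL runner
}
interface Scenario { name: string; actor: string; steps: ScenarioStep[] }

const scenario: Scenario = {
  name: "Approve an invoice",
  actor: "accountant",
  steps: [
    { action: "create", model: "invoice", data: { total: 120 }, nl: "Create an invoice for 120" },
    { action: "transition", model: "invoice", to: "approved", nl: "Approve the invoice" },
  ],
};

// Consumer 1: an API test runner reads only the structural fields.
const apiCalls = scenario.steps.map((s) => `${s.action} ${s.model}${s.to ? ` -> ${s.to}` : ""}`);
// Consumer 2: the NL browser runner reads only `nl`.
const nlSteps = scenario.steps.map((s) => s.nl);
// Consumer 3: a business owner sees name, actor, and the nl labels.
const summary = `${scenario.actor}: ${scenario.name} (${nlSteps.join("; ")})`;

console.log(apiCalls, summary);
```

No consumer needs a field another consumer owns, so none of them requires a conversion step.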

026 — The Test Seam for Heavy Dependencies

vi.mock() doesn’t intercept dynamic import() inside CommonJS modules. When a module lazy-loads a 23MB ML model via await import(...), module-level mocking hangs the test suite. The solution is a _setPipelineForTest(fn) export — a private seam that bypasses the real model loader entirely. The _ prefix marks it as test infrastructure, not production API. The pattern applies to any lazy-loaded heavy dependency: database connections, HTTP clients, ML models — anything where “acquiring the resource” can be separated from “using the resource.” File: 026_test_seam_for_heavy_deps.md Status: ✅ Written

Source material:

darjs/packages/nlp/semantic-resolver.js — _setPipelineForTest, _pipelineOverride, getPipeline() seam
darjs/packages/nlp/__tests__/semantic-resolver.test.js — fakePipeline + beforeAll/_setPipelineForTest pattern
darjs/packages/testing/__tests__/nl-runner.test.js — same seam propagated up to nl-runner tests
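
A TypeScript sketch of the seam follows. `_setPipelineForTest` and `getPipeline` are named in the source material. The `Pipeline` type, `loadRealPipeline`, and `embed` are hypothetical stand-ins. The override is checked before the lazy loader runs, so a test that sets it never triggers the dynamic import.

```typescript
// Hypothetical signature of the loaded model: text in, vector out.
type Pipeline = (text: string) => Promise<number[]>;

let pipelinePromise: Promise<Pipeline> | null = null;
let pipelineOverride: Pipeline | null = null;

// Stand-in for the expensive path, e.g. a dynamic import of an ML library
// followed by a large model download. Never reached when overridden.
async function loadRealPipeline(): Promise<Pipeline> {
  throw new Error("real model loading is not available in this sketch");
}

// Acquiring the resource: the seam short-circuits here.
function getPipeline(): Promise<Pipeline> {
  if (pipelineOverride) return Promise.resolve(pipelineOverride);
  if (!pipelinePromise) pipelinePromise = loadRealPipeline();
  return pipelinePromise;
}

// Test infrastructure, not public API (hence the underscore prefix).
function _setPipelineForTest(fn: Pipeline | null): void {
  pipelineOverride = fn;
}

// Using the resource: this code cannot tell a fake pipeline from a real one.
async function embed(text: string): Promise<number[]> {
  const pipeline = await getPipeline();
  return pipeline(text);
}
```

In a Vitest file this becomes `beforeAll(() => _setPipelineForTest(fakePipeline))`, matching the pattern listed for semantic-resolver.test.js. Resetting it with `null` afterwards keeps test files independent.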

027 — Parse, Don’t Run: Schema Introspection Without a Live Process

The PrismaSchemaAdapter pattern: a text-only implementation of your introspection interface that works without a running process. Why requiring execution to answer a metadata question is the wrong dependency. How the three-level SchemaAdapter hierarchy (abstract / DarJS live / Prisma text) decouples every consumer — renderer, router, generator — from ModelClass internals. The test that proves an interface is real: swap implementations, consumer tests are identical. File: 027_parse_dont_run.md Status: ✅ Written

Source material:

darjs/packages/core/adapters/SchemaAdapter.js — abstract base, throw-on-all-methods pattern
darjs/packages/core/adapters/DarJSSchemaAdapter.js — live ModelClass wrapper, fromManifest factory
darjs/packages/core/adapters/PrismaSchemaAdapter.js — text parser, regex extraction, nested-paren @default fix
darjs/packages/platform-api/app.js — app.locals.schema built at startup, passed to PageDefRouter
darjs/packages/platform-api/renderer/PageDefRouter.js — schema threaded to all renderer calls + coerceBody + buildWhere
darjs/packages/cli/commands/generate.js — DarJSSchemaAdapter built inline before fromManifest

028 — The Contract Corpus Has Two Layers

Developer questions divide into two categories: “what exists that does X?” (answered by function contracts) and “how do I accomplish X?” (answered by procedure contracts). A corpus with only function contracts silently routes every “how do I” query to the LLM regardless of coverage. The procedure contract — @role: coordinator, CLI sequences in @example, task language in @reuse-when — closes the gap. File: 028_two_layers.md Status: ✅ Written

029 — The Token Cost Is in the Discovery

Every question about a framework involves a discovery phase — the LLM reading source to find which thing to reach for. Eight real queries, eight documented discovery paths, actual file byte sizes. dar find avoids 24,046 tokens of file reading across a single session of eight queries. The more important case: two scenarios where the LLM generates technically working but framework-incorrect code, and the token count doesn’t capture that at all. File: 029_path_comparison.md Status: ✅ Written

030 — Build Debug Tools Your AI Can See

You built a debug panel so humans could see browser state. Then you had to figure out how to give that same visibility to the AI working alongside you. The gap — tools built for human eyes vs. tools an AI can actually use — is the core agentic tooling problem most developers haven’t hit yet. Three options (manual relay, REST endpoints, CDP), why Chrome’s built-in remote debugging protocol is the right answer for browser state, and how to wire it into your project CLI in one script with zero dependencies. File: 030_ai-visible-debug-tooling.md Status: ✅ Written

019 — The Three Token Debts — and the One Architecture That Pays Them

Cold start tokens aren’t all the same thing. They break into three distinct debts: orientation tokens (where do I look?), interface tokens (what does each module take and return?), and reuse-discovery tokens (does something like this already exist?). The full retrieval stack — file map, CODEMAP, contracts.js, NLP index — replaces 4,000–8,000 tokens of cold-start reading with ~400 tokens of structured loading. Built from the chesswar modular rebuild session. File: 019_three_token_debts.md Status: ✅ Written

031 — The Debug Panel as an npm Package (TODO)

The debug-panel.js plugin architecture (register any plugin, card system, mobile-first bottom sheet) is genuinely reusable across any site. There appears to be no lightweight mobile-first debug overlay with a clean plugin API in the npm ecosystem. This article will cover extracting it, the plugin contract, the built-in plugins (push, SW, storage, PWA, network) as optional add-ons, and why the AI-visibility angle (wiring it to CDP) makes it more than just another devtools panel. File: not yet written Status: 📋 TODO (context saved in ahmedbouchefra2 session 2026-05-18)

032 — Pre-Index Your Codebase Before the Agent Needs It

How CodeGraph eliminates 94% of agent file-read tool calls by building a local SQLite knowledge graph from tree-sitter AST parsing — zero LLM tokens, real-time sync, 8 MCP tools. The general pattern: build a queryable index once so every session starts with a map instead of a blank filesystem. File: 032_pre-index-your-codebase.md Status: ✅ Written — 2026-05-20

033 — Structure Is Not Intent: Two Layers Every Code Intelligence System Needs

Structural indexes (CodeGraph) answer where things are and what touches them. Semantic contracts answer why you’d use them and when to reuse. Neither is complete without the other. Shows the combined workflow, the gap each system leaves, and how to build both layers as a byproduct of normal work. File: 033_structure-vs-intent.md Status: ✅ Written — 2026-05-20

034 — Five Principles Behind Every Good AI Code Search Tool

The underlying design decisions that separate AI-assisted code search from expensive file-reading sessions: static analysis at zero LLM cost, annotations as query targets, field weighting by intent, routing as a first-class dispatch decision, and graph relationships as queryable data. File: 034_five-principles-ai-code-search.md Status: ✅ Written — 2026-05-20

035 — Six Habits That Make Your Codebase More AI-Readable

Practical habits — not tools — that reduce the cost of every AI session: write intent at definition, use closed vocabularies, name callers and dependencies, group work into pipelines, extract metadata as a byproduct, pre-index before sessions. Each applies independently. File: 035_six-habits-ai-readable-codebase.md Status: ✅ Written — 2026-05-20

036 — Kill Your CODEMAP: When Structural Tools Make Manual Symbol Indexes Obsolete

The CODEMAP pre-commit hook blocked commits twice in one session — both times because lines shifted after adding a function. The pattern: manually maintained indexes are gap-fillers for missing structural tooling. Once CodeGraph is in place, CODEMAP is redundant on every dimension (symbol location, line numbers, caller/callee hints) and becomes pure maintenance cost. The decision to retire it, what replaced each part, and the general rule for when to kill any manual index. File: 036_kill-your-codemap.md Status: ✅ Written — 2026-05-21

037 — Two Layers of Intelligence: Merging Semantic Contracts with Structural Call Graphs

DarJS contracts had 263 nodes and 0 resolved edges. CodeGraph had 2,598 nodes and 3,941 real call edges. Neither was complete alone. The MergedAdapter uses CodeGraph nodes as the base, enriches them with DarJS semantic fields (role, domain, does, reuse-when) by name match, and uses CodeGraph edges exclusively. The result: a graph that answers both semantic questions (what is this for?) and structural ones (what breaks if this changes?). The general principle: documentation systems and AST tools answer different questions, and most projects have at most one of them. File: 037_two-layers-semantic-structural.md Status: ✅ Written — 2026-05-21
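
The merge rule can be sketched with hypothetical node, edge, and contract shapes. CodeGraph nodes are the base, contract fields are attached by exact name match, and only CodeGraph edges are kept. `callersOf` answers the structural question (what calls this?). The `semantic` field answers the contract question (what is this for?).

```typescript
interface GraphNode { id: string; name: string; file: string }
interface Edge { from: string; to: string }   // caller -> callee
interface Contract { name: string; role?: string; domain?: string; does?: string; reuseWhen?: string }
interface MergedNode extends GraphNode { semantic?: Omit<Contract, "name"> }

// Structural nodes are the base; contracts only enrich by name match.
// Edges come from CodeGraph alone; contract-side edges are never consulted.
function merge(nodes: GraphNode[], edges: Edge[], contracts: Contract[]) {
  const byName = new Map(contracts.map((c) => [c.name, c] as const));
  const merged = nodes.map((n): MergedNode => {
    const c = byName.get(n.name);
    if (!c) return n;
    return { ...n, semantic: { role: c.role, domain: c.domain, does: c.does, reuseWhen: c.reuseWhen } };
  });
  return { nodes: merged, edges };
}

// "What breaks if this changes?" becomes an edge query.
function callersOf(id: string, edges: Edge[]): string[] {
  return edges.filter((e) => e.to === id).map((e) => e.from);
}

const nodes: GraphNode[] = [
  { id: "n1", name: "formatDate", file: "utils/date.js" },
  { id: "n2", name: "renderTable", file: "ui/table.js" },
];
const edges: Edge[] = [{ from: "n2", to: "n1" }];
const contracts: Contract[] = [{ name: "formatDate", role: "helper", reuseWhen: "show a date in a table" }];

const graph = merge(nodes, edges, contracts);
console.log(graph.nodes[0].semantic?.role, callersOf("n1", graph.edges));
```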

038 — codeview: How a Code Intelligence UI Gets Built as a Byproduct

codeview started as “can we have a visual for this?” and ended as an open-sourceable standalone tool. Three architectural decisions made it so: a four-method adapter interface (any data source that implements it works), auto-detection of available sources (no config, just point at a directory), and no build step (Alpine + D3 from CDN, plain http.createServer). The graph design problem — 2,042 nodes is a hairball — and three solutions: node size by reference count, labels hidden until zoom threshold, ego mode for one-click focus. What “byproduct” actually means: scoped by the question, not by a PRD. File: 038_codeview-byproduct-ui.md Status: ✅ Written — 2026-05-21

039 — Rules Don’t Route, Tools Do

You write Rule 0 in CLAUDE.md. The agent greps anyway. This isn’t disobedience — it’s execution momentum. Symbol lookup → grep is the dominant pattern for that intent across a decade of training data. Rules fire at instruction time; execution happens somewhere else entirely. Three 2026 mechanisms that actually enforce correct tool routing: pre-task routing with plan_turn (intercepts intent before momentum builds), Claude Code hooks at the Bash layer (enforcement at the tool call layer, not the instruction layer), and session waste auditing with get_optimization_report (makes the pattern visible as data). The practical takeaway: numbered protocols beat rule sentences because protocols execute; rules get interpreted. File: 039_rules-dont-route-tools-do.md Status: ✅ Written — 2026-05-22

Source Material

All pieces are grounded in the DarJS project:

Repo: /home/ahmed/antigravityapps/autonomous/darjs/
Decisions: darjs/decisions/phase1.md → phase11.md
Engineering patterns tutorial: darjs/docs/tutorial-engineering-patterns.md
Framework prompt: darjs/responses/PROMPT_dashboard-tailwind.md
Bonus essays (Claude series): /home/ahmed/antigravityapps/autonomous/Claude/BONUS_instructions_as_design_patterns.md

Series location: /home/ahmed/antigravityapps/autonomous/agents-series/
Latest update: 2026-05-22. 039 written; sourced from the fix_locale → fill_labels rename session (tool-habit inertia, protocol vs rule, 2026 enforcement mechanisms)