tags: [code-search, static-analysis, annotations, routing, graph, ai-tools, concepts]
related:
  032_pre-index-your-codebase.md
  033_structure-vs-intent.md
  011_nlp_first_codebase.md
supersedes: ~
status: current

034 — Five Principles Behind Every Good AI Code Search Tool

The Problem

Most developers think of AI code search as: type a query, the AI reads files, returns an answer. That workflow is expensive and slow. Every file read is a tool call, and every tool call costs latency and tokens. A session that spends 30 tool calls orienting itself is a session that hasn’t shipped anything yet.

The tools that actually reduce that cost share five underlying principles. They’re not implementation details — they’re design decisions you can apply to any codebase, any toolchain, regardless of what the AI agent is.

1. Static Analysis Produces Intelligence at Zero LLM Cost

The most counterintuitive insight in AI-assisted development: the AI doesn’t need to read your code to understand its structure. A parser does.

Tree-sitter, AST parsers, import analyzers: these tools extract symbols, statically resolvable call relationships, and type hierarchies from your codebase, typically in milliseconds per file. No model involved. No tokens spent. The output is a queryable graph: normalizeOrder calls PriceCalculator, is called by saveOrder, lives in packages/orders/processor.js at line 42.

When the agent asks “where is this defined?” or “what calls this?”, it’s asking a structural question. Structural questions have structural answers. An LLM reading file contents to answer “what calls normalizeOrder?” is doing the work a grep or a graph query could do instantly for free.

The principle: separate structural questions from semantic questions. Structural questions — location, relationships, type hierarchies, call chains — should never require an LLM call. Build or use an index that answers them in one query.
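A minimal sketch of such an index, assuming Python's standard `ast` module as a stand-in for Tree-sitter or any other parser. The sample source, the snake_case function names, and the `orders/processor.py` path are hypothetical, and only direct calls by bare name are resolved:

```python
import ast
from collections import defaultdict

# Hypothetical sample source; a real index would read files from disk.
SOURCE = '''
def price_calculator(items):
    return sum(items)

def normalize_order(payload):
    return price_calculator(payload["items"])

def save_order(payload):
    order = normalize_order(payload)
    return order
'''

def build_index(source, filename="<memory>"):
    """Return (definitions, calls, called_by) for top-level functions."""
    tree = ast.parse(source, filename=filename)
    definitions = {}               # function name -> (file, line)
    calls = defaultdict(set)       # caller -> set of callees
    called_by = defaultdict(set)   # callee -> set of callers
    for node in tree.body:
        if not isinstance(node, ast.FunctionDef):
            continue
        definitions[node.name] = (filename, node.lineno)
        for inner in ast.walk(node):
            # Only calls of the form name(...) are recorded; method calls
            # and dynamic dispatch are out of scope for this sketch.
            if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                calls[node.name].add(inner.func.id)
                called_by[inner.func.id].add(node.name)
    return definitions, calls, called_by

definitions, calls, called_by = build_index(SOURCE, "orders/processor.py")

# Structural questions become dictionary lookups, with no model involved.
print(definitions["normalize_order"])
print(sorted(calls["normalize_order"]))
print(sorted(called_by["normalize_order"]))
```

Once the index exists, "where is this defined?" and "what calls this?" are lookups rather than file reads.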

2. Annotations Are Query Targets, Not Documentation

Documentation is written for humans. Annotations for AI-assisted search are written for machines — specifically, for the retrieval layer that decides whether to reuse existing code or generate new code.

The difference shows up in what you write:

Documentation	Annotation for retrieval
“This function handles the order normalization flow.”	`@does Converts raw API payload into normalized Order with computed subtotal, tax, total.`
“Used in the checkout pipeline.”	`@reuse-when You receive a raw order payload from any REST endpoint and need a normalized Order object.`
“Complex function, touch carefully.”	`@complexity moderate`

Documentation is prose. It’s good at explaining. It’s bad at being searched.

Annotations are structured. Each field is a query target: @does is searched when someone describes what they need. @reuse-when is matched against the trigger condition. @complexity is a routing signal — it tells the tool whether to reuse this function as-is, verify it first, or hand off to an LLM.

The principle: write at least four fields for every public function — what it does (one active-verb sentence), when to reuse it (a plain English condition), what role it plays (from a closed list), and how complex it is (simple/moderate/complex). These four fields are the minimum for a retrieval system to route correctly.
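To make the four-field minimum concrete, here is a rough sketch that parses an annotation block into a dictionary and reports missing fields. The comment syntax, the `@role transformer` value, and the helper names are assumptions for illustration, not a fixed format:

```python
import re

REQUIRED_FIELDS = ("does", "reuse-when", "role", "complexity")
ALLOWED_COMPLEXITY = {"simple", "moderate", "complex"}

# Hypothetical annotation block; "transformer" is an invented role value.
COMMENT = """
/**
 * @does Converts raw API payload into normalized Order with computed subtotal, tax, total.
 * @reuse-when You receive a raw order payload from any REST endpoint and need a normalized Order object.
 * @role transformer
 * @complexity moderate
 */
"""

# Matches "@field-name value" anywhere on a line.
FIELD_RE = re.compile(r"@([a-z][a-z-]*)\s+(.+)")

def parse_annotations(comment):
    """Turn '@field value' lines into a {field: value} dictionary."""
    fields = {}
    for line in comment.splitlines():
        match = FIELD_RE.search(line)
        if match:
            fields[match.group(1)] = match.group(2).strip()
    return fields

def missing_fields(fields):
    """List required fields that are absent or carry an unknown value."""
    problems = [name for name in REQUIRED_FIELDS if name not in fields]
    if "complexity" in fields and fields["complexity"] not in ALLOWED_COMPLEXITY:
        problems.append("complexity (unknown value)")
    return problems

fields = parse_annotations(COMMENT)
print(fields["complexity"])
print(missing_fields(fields))
```

A check like `missing_fields` could run in CI so that a public function without the minimum fields never reaches the index.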

3. Field Weighting Encodes Your Priority

Not all metadata is equally useful for finding the right function. A TF-IDF or embedding-based search that treats @role the same as @reuse-when will produce worse results than one that knows @reuse-when is the most intent-rich field.

The weights used in a well-tuned system look something like:

Field	Weight	Why
`@reuse-when`	×3	Highest intent signal — written exactly for this query
`@does`	×2	Second-most specific — names inputs and outputs explicitly
`@tags`	×2	Dense keywords — retrieval terms without prose noise
function name	×2	Exact name match matters — devs often know what they’re looking for
`@role`	×1	Classification signal — useful for filtering, not ranking
`@domain`	×1	Namespace — useful for scoping, not ranking

These weights aren’t arbitrary. They reflect how much each field narrows the search space. @reuse-when was written with the query intent in mind. @role was written for classification. Weighting them the same ignores that.

The principle: before you build any search over annotations, decide which fields carry intent versus classification versus context. Weight accordingly. A field written to be searched deserves more weight than a field written to categorize.
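The weighting idea can be sketched with plain token overlap instead of TF-IDF or embeddings. The weights come from the table above; the two records, the `formatInvoice` function, and the role and tag values are hypothetical:

```python
import re

# Field weights taken from the table above.
FIELD_WEIGHTS = {
    "reuse-when": 3,
    "does": 2,
    "tags": 2,
    "name": 2,
    "role": 1,
    "domain": 1,
}

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def score(query, record):
    """Sum, over fields, the weight times the number of shared tokens."""
    query_terms = tokenize(query)
    total = 0
    for field, weight in FIELD_WEIGHTS.items():
        overlap = query_terms & tokenize(record.get(field, ""))
        total += weight * len(overlap)
    return total

def search(query, records):
    """Return records ordered from best to worst match."""
    return sorted(records, key=lambda r: score(query, r), reverse=True)

# Hypothetical annotation records.
RECORDS = [
    {
        "name": "formatInvoice",
        "does": "Renders an Order as a printable invoice.",
        "reuse-when": "You need a human-readable invoice for an existing Order.",
        "tags": "invoice render",
        "role": "formatter",
        "domain": "billing",
    },
    {
        "name": "normalizeOrder",
        "does": "Converts raw API payload into normalized Order with computed subtotal, tax, total.",
        "reuse-when": "You receive a raw order payload from any REST endpoint and need a normalized Order object.",
        "tags": "order normalize payload",
        "role": "transformer",
        "domain": "orders",
    },
]

best = search("normalize a raw order payload", RECORDS)[0]
print(best["name"])
```

A real system would likely add stop-word handling and a proper ranking function. What the sketch shows is only that a hit in `@reuse-when` counts three times as much as a hit in `@role`.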

4. Routing Is a First-Class Decision

Binary thinking — “use the LLM or don’t” — is too coarse. The right model has three outputs:

Reuse as-is — the match is strong, the function is simple, the example is copy-pasteable. No LLM needed.

Verify before using — a candidate exists but the match isn’t certain, or the function is moderately complex. The LLM checks whether it fits before wiring it.

Generate new code — no match, or the existing function is too complex to adapt. The LLM writes something new, then a contract is captured immediately.

This routing decision should be made by the retrieval system, not the LLM. The signals are already in the metadata: @complexity tells you the function’s risk level, the match score tells you confidence. A simple function with a high-confidence match is a safe reuse. A moderate-complexity function with a weak match should be verified. A complex function or zero-match situation goes straight to generation.

Every LLM call that could have been a reuse is wasted tokens. Every reuse that should have been verified is a bug waiting to surface. The routing signal prevents both.

The principle: encode a complexity field on every annotation. Let the retrieval system use it as a dispatch signal. Stop making the LLM decide whether to reuse — that’s a metadata question, not a reasoning question.
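The dispatch rule described above fits in a few lines. In this sketch the 0.8 and 0.3 thresholds, the 0-to-1 score scale, and the `route` function name are assumptions; the three outcomes follow the text:

```python
from enum import Enum

class Route(Enum):
    REUSE = "reuse as-is"
    VERIFY = "verify before using"
    GENERATE = "generate new code"

# Assumed thresholds on a 0..1 match score; tune them for your own scorer.
HIGH_CONFIDENCE = 0.8
LOW_CONFIDENCE = 0.3

def route(match_score, complexity):
    """Pick a route from the match score and the @complexity field."""
    # No candidate, or a match too weak to count as one: generate.
    if match_score is None or match_score < LOW_CONFIDENCE:
        return Route.GENERATE
    # Complex functions are treated as too risky to adapt.
    if complexity == "complex":
        return Route.GENERATE
    # Simple function with a strong match: safe to reuse directly.
    if complexity == "simple" and match_score >= HIGH_CONFIDENCE:
        return Route.REUSE
    # Everything in between gets checked by the LLM before wiring.
    return Route.VERIFY

print(route(0.9, "simple").value)
print(route(0.5, "moderate").value)
print(route(None, "simple").value)
```

The function takes only metadata as input, so the decision is made before any LLM call.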

5. Graph Relationships Change How You Search

A flat list of functions with annotations lets you answer “which function does X?” A graph of those functions lets you answer “which function does X, and what happens if I change it?”

The graph edges come from two sources: static analysis (what calls what in the source, extracted from the AST) and annotations (what the author declares as callers and dependencies). Both matter. Static analysis reflects the code as written, although it can miss calls made through dynamic dispatch or reflection. Annotations carry intent: they name the architectural relationship, not just the call site.

When you add graph traversal to search, three new query types become possible:

Impact query: before editing normalizeOrder, find everything that calls it and depends on it. One graph walk, zero file reads.
Context assembly: for a given task, find the relevant functions and expand their graph neighbors into a single context block. The agent reads one assembled document instead of navigating five files.
Pipeline composition: find all steps in a named workflow, ordered by step number. The agent sees the full sequence before touching any of it.

None of these require the agent to explore. The graph was built offline. The agent queries it.

The principle: don’t treat functions as isolated units. Record their relationships — at minimum, what calls them and what they depend on. Make those relationships queryable by name. The difference between “find this function” and “find this function and its context” is the difference between a lookup and an understanding.
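Two of the query types above, impact and pipeline composition, can be sketched over plain dictionaries. The edge data, the `checkoutHandler`, `validatePayload`, and `formatInvoice` names, and the `pipeline` and `step` fields are hypothetical:

```python
from collections import deque

# Hypothetical reverse call edges: callee -> set of direct callers.
CALLED_BY = {
    "PriceCalculator": {"normalizeOrder"},
    "normalizeOrder": {"saveOrder"},
    "saveOrder": {"checkoutHandler"},
}

def impact(name, called_by):
    """Return every function that directly or transitively calls `name`."""
    seen = set()
    queue = deque([name])
    while queue:
        current = queue.popleft()
        for caller in called_by.get(current, ()):
            if caller not in seen:   # the seen set also guards against cycles
                seen.add(caller)
                queue.append(caller)
    return seen

# Hypothetical workflow metadata, as it might be declared in annotations.
STEPS = [
    {"name": "saveOrder", "pipeline": "checkout", "step": 3},
    {"name": "validatePayload", "pipeline": "checkout", "step": 1},
    {"name": "normalizeOrder", "pipeline": "checkout", "step": 2},
    {"name": "formatInvoice", "pipeline": "billing", "step": 1},
]

def pipeline(name, steps):
    """Return the function names of one workflow, ordered by step number."""
    members = [s for s in steps if s["pipeline"] == name]
    return [s["name"] for s in sorted(members, key=lambda s: s["step"])]

print(sorted(impact("normalizeOrder", CALLED_BY)))
print(pipeline("checkout", STEPS))
```

Context assembly follows the same pattern: take the search hits, expand each one by its graph neighbors, and concatenate the annotations into one block for the agent.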

Practical Takeaway

These five principles aren’t tied to any specific tool. They’re the design constraints that separate an AI-assisted codebase from one that just happens to have an AI agent running against it.

Apply them in ascending order of effort:

Field weighting — costs nothing; just restructure your annotation fields by importance.
Routing signals — add @complexity to every function; update the search to use it.
Static analysis index — one-time setup; run a parser against your codebase; store the output.
Annotations as query targets — rewrite documentation as intent fields; make them machine-readable.
Graph relationships — add @used-by and @depends-on; build a graph query layer.

Each step makes the next one more powerful. A codebase with all five is one where the agent can answer structural questions without reading files, find the right function in one query, route decisions without LLM reasoning, and understand impact before making changes. That’s the baseline a serious AI-assisted workflow needs.