Debug First

There is a conventional order to building a framework.

You build the core logic. You build the features. You make things work. Then, if there is time — and there usually is not — you add tooling: logging, debug panels, inspection utilities. The observability layer is the last thing and often the thing that gets cut.

I built a game framework in seven phases. By phase seven, the game was running: a player character moved down an endless corridor, obstacles spawned, coins collected, speed increased, the camera shook on impact. It worked. And it was nearly unplayable, for reasons that were completely invisible.

The background disappeared on load sometimes. Acceleration was imperceptible — a slider said the game was at maximum speed but it felt identical to minimum speed. The debug panel had sliders for tweaking values, but moving them had no visible effect on the game. The frame rate felt stable but something was clearly wrong with the timing.

None of these problems were in the logic. The logic was fine, tested in six phases of automated tests. All of them were in the gap between what the logic was doing and what a person running the game could observe.

The fix for every one of them was instrumentation. Not code changes to the core. Not new features. Instrumentation.

The Problems That Only Instrumentation Finds

The background disappearing was a scene construction issue: the wireframe grid plane was small enough that camera movement could push the player past its edge, revealing the void behind it. The problem existed on the first day of development. Nobody saw it because there was no way to observe where the grid plane ended relative to the camera’s frustum.

The imperceptible acceleration was a configuration problem: the acceleration value had been set to 0.00005, so small that reaching maximum speed would take three hours of continuous play. The game was running exactly as configured. The configuration was wrong. Nobody caught it because there was no live readout of the current speed value as the game ran.
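A quick calculation makes this kind of misconfiguration obvious before anyone plays the game. The sketch below assumes the acceleration is added to the speed once per frame at 60 frames per second; the helper name secondsToMaxSpeed and the speed range are hypothetical, chosen only so the arithmetic lands near the three hours mentioned above.

```javascript
// Hypothetical helper: how long a linear ramp takes to go from minSpeed to maxSpeed.
// Assumes accelPerFrame is added to the speed once per frame at a fixed frame rate.
function secondsToMaxSpeed({ minSpeed, maxSpeed, accelPerFrame, fps }) {
  const perSecond = accelPerFrame * fps;
  if (perSecond <= 0) return Infinity; // the ramp never reaches max speed
  return (maxSpeed - minSpeed) / perSecond;
}

// Illustrative numbers only; the real speed range is not given in the text.
const config = { minSpeed: 0, maxSpeed: 32.4, accelPerFrame: 0.00005, fps: 60 };
const hours = secondsToMaxSpeed(config) / 3600;
console.log(`time to max speed: ${hours.toFixed(2)} h`);
```

A check like this could run at startup and warn when the ramp is longer than any realistic play session.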

The sliders with no effect were an architectural problem: the game loop was recalculating speed from the difficulty curve on every frame, overwriting whatever the slider had set. The slider was writing to a state variable that was being ignored. There was no way to know this without being able to observe both the slider’s value and the effective game speed simultaneously, live, while playing.

All three problems shared a structure. The code was doing something. A developer needed to know what it was doing. There was no path between those two things.

Instrumentation Is Not Logging

When I say instrumentation, I do not mean console logs.

Console logs are point-in-time observations. You add one, you check the output, you remove it. They are useful for investigating a specific hypothesis about a specific moment. They are not useful for maintaining continuous awareness of what a system is doing across time.

Instrumentation is different. It is a permanent, structured layer that exposes the system’s internal state as an observable surface. It answers not “what happened at line 247 when I added a log” but “what is the system doing right now, and what has it been doing for the last thirty seconds.”

The difference matters because most subtle framework bugs are not events. They are conditions. The acceleration bug was not “something went wrong at this moment.” It was “a parameter has been misconfigured and has been wrong for every moment since the game was written.” A console log at a specific line would not have found that. A live speed readout in a debug panel — present every session, visible to anyone running the game — would have found it on day one.

The Four Instruments

Building the endless runner’s debug panel resolved every invisible problem in the game, and it did so through four specific instruments.

Live variable readout. A TRACE tab that showed the current value of every significant state variable on every frame: current speed, score, active powerups, state machine state, obstacle count, chunk position, time scale. Not sampled. Not logged at intervals. Updated every frame, rendered as a scrolling list. This is the instrument that would have found the acceleration misconfiguration immediately — the speed value would have been visibly identical across minutes of play.
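A minimal sketch of such a readout, assuming each variable is registered as a label plus a getter function. The class name TraceReadout, its methods, and the stand-in game object are illustrative and not taken from the framework.

```javascript
// Hypothetical TRACE registry: each entry is a label plus a getter that reads live state.
class TraceReadout {
  constructor() {
    this._probes = new Map(); // label -> () => current value
  }

  // Register a variable by giving a function that returns its current value.
  watch(label, getter) {
    this._probes.set(label, getter);
  }

  // Call once per frame: reads every probe and returns one display line per variable.
  sample() {
    const lines = [];
    for (const [label, getter] of this._probes) {
      lines.push(`${label}: ${getter()}`);
    }
    return lines;
  }
}

// Stand-in game state; the field names are illustrative.
const game = { speed: 1.0, score: 0, state: 'RUNNING' };

const trace = new TraceReadout();
trace.watch('speed', () => game.speed.toFixed(5));
trace.watch('score', () => game.score);
trace.watch('state', () => game.state);

// In a real loop this would run every frame and write into the panel.
game.speed += 0.00005;
console.log(trace.sample().join('\n'));
```

Because the probes are getters rather than copied values, the panel never holds stale state: whatever it shows is what the game is using on that frame.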

Event log. A scrolling list of the last twenty events emitted by the EventBus, with timestamps. This is the instrument that showed which systems were firing and in what order. When the speed slider had no effect, the event log showed that difficultyChanged was being emitted every frame — telling us the difficulty curve was being recomputed constantly and overwriting the slider’s value.
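One way to sketch this instrument is a bounded buffer that taps every event on the bus. The EventBus below is a minimal stand-in, since the real bus's API is not shown here; the tap method, the EventLog class, and the coinCollected event name are assumptions made for illustration.

```javascript
// Minimal stand-in for the EventBus; the real one's API is not shown in the text.
class EventBus {
  constructor() {
    this._handlers = new Map();
    this._taps = [];
  }
  on(type, fn) {
    if (!this._handlers.has(type)) this._handlers.set(type, []);
    this._handlers.get(type).push(fn);
  }
  // Taps see every event regardless of type; this is what the log hooks into.
  tap(fn) {
    this._taps.push(fn);
  }
  emit(type, payload) {
    for (const fn of this._taps) fn(type, payload);
    for (const fn of this._handlers.get(type) ?? []) fn(payload);
  }
}

// Keeps only the most recent `capacity` events, oldest first.
class EventLog {
  constructor(bus, capacity = 20, now = () => Date.now()) {
    this._capacity = capacity;
    this._entries = [];
    bus.tap((type, payload) => {
      this._entries.push({ t: now(), type, payload });
      if (this._entries.length > this._capacity) this._entries.shift();
    });
  }
  entries() {
    return this._entries.slice();
  }
}

const bus = new EventBus();
const log = new EventLog(bus);

// Simulate 30 frames in which the difficulty curve re-emits on every frame.
for (let frame = 0; frame < 30; frame++) {
  bus.emit('difficultyChanged', { frame });
}
bus.emit('coinCollected', { value: 1 });

const types = log.entries().map((e) => e.type);
console.log(types.length, types.filter((t) => t === 'difficultyChanged').length);
```

A log dominated by a single event type, as in this simulation, is the visual signature of something being recomputed on every frame.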

Override controls. Sliders that directly set game state, bypassing the normal calculation path. These are not just for tweaking feel; they are for isolating variables. If you want to know whether a bug is caused by the difficulty curve or by the physics, you lock the speed at a fixed value and observe what happens. Override controls give you control of individual variables in a running system. Without that control, a developer cannot reason about a complex system.

Architecture diagram. The structural view: which systems exist, which ones are currently active, how data moves between them. It is covered in a separate piece.

Together these four instruments create a complete picture of the system’s behavior at every moment. Not after the fact. Not on demand. Continuously, as the system runs.

Why Build It First

The conventional argument for adding debug tooling late is that you do not know what you need to observe until the system exists and has problems.

This argument sounds reasonable and is wrong.

You know before you write the first line what the most important variables in your system will be. You know that a game with an acceleration curve has a current speed, and that you will want to observe that speed while playing. You know that a system with an event bus emits events, and that you will want to see which events are firing. You know that a system with configurable parameters will have parameters you want to tweak at runtime. These are structural facts about the kind of system you are building. They do not depend on knowing the specific bugs that will appear.

The reason to build instrumentation first is not that you know what bugs to prepare for. It is that instrumentation is cheaper when the system is small and the cost of not having it compounds as the system grows.

A live speed readout written on day one takes ten minutes. The same readout written after six phases of development takes ten minutes too — but the six phases of development happened in the dark, and every problem that could have been caught with a speed readout was instead caught by accident, by inference, or not at all.

There is also a discipline argument. Building instrumentation first changes how you think about the system as you build it. Every time you add a new system, you ask: what is the observable surface of this system? What would I want to see in the debug panel if this system misbehaved? Asking that question early produces cleaner systems. Systems with clear observable surfaces tend to have cleaner internal structure, because the act of deciding what to expose forces you to understand what matters.

The Specific Mechanism: State Override Isolation

The most undervalued of the four instruments is the override control, and it deserves more detail.

The endless runner’s speed slider did not work initially because the game loop recalculated speed on every frame using the difficulty curve. The fix was a two-field pattern:

this._speedLocked = false; // true while the debug slider controls speed
this._speedOvr = null;     // slider value to use while locked

When the debug slider is active, it sets _speedLocked = true and _speedOvr = sliderValue. The game loop checks:

// Use the override only when locked and set; otherwise follow the difficulty curve.
const speed = this._speedLocked && this._speedOvr !== null
  ? this._speedOvr
  : this._diff.compute(this._elapsed);

This is the complete implementation. Two fields, one conditional. The difficulty curve is not modified. The normal game flow is not affected when debug mode is off. The override is cleanly isolated.
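For readers who want to run the pattern, here is a self-contained sketch that puts the two fields and the conditional into a small class. The field names come from the fragments above; the DifficultyCurve formula and the lockSpeed, unlockSpeed, and update methods are assumptions, since the surrounding code is not shown.

```javascript
// Stand-in difficulty curve; the real compute() is not shown in the text.
class DifficultyCurve {
  compute(elapsedSeconds) {
    return Math.min(1 + elapsedSeconds * 0.5, 20);
  }
}

class Game {
  constructor() {
    this._diff = new DifficultyCurve();
    this._elapsed = 0;
    this._speedLocked = false;
    this._speedOvr = null;
    this.speed = this._diff.compute(0);
  }

  // Called by the debug slider while it is in use.
  lockSpeed(value) {
    this._speedLocked = true;
    this._speedOvr = value;
  }

  // Called when the slider is released or debug mode is turned off.
  unlockSpeed() {
    this._speedLocked = false;
    this._speedOvr = null;
  }

  update(dt) {
    this._elapsed += dt;
    this.speed = this._speedLocked && this._speedOvr !== null
      ? this._speedOvr
      : this._diff.compute(this._elapsed);
  }
}

const game = new Game();
game.update(10); // normal path: the curve gives 1 + 10 * 0.5 = 6
const curveSpeed = game.speed;
game.lockSpeed(7.5);
game.update(10); // override wins; elapsed time still advances
const lockedSpeed = game.speed;
game.unlockSpeed();
game.update(0); // back on the curve at elapsed = 20
console.log(curveSpeed, lockedSpeed, game.speed);
```

Note that elapsed time keeps advancing while the lock is held, so releasing the slider returns the game to where the curve would have been, not to where it was when the lock began. Whether that is the desired behavior is a design choice.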

The pattern generalizes. Any continuously computed value — physics parameters, spawn rates, time scale, camera FOV — can be given this treatment. The game produces a value through its normal calculation path. A debug override can intercept that value at the moment of use. The two paths are independent and the normal path is untouched.
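One possible way to generalize it is a small wrapper that gives any computed value the same lock and release behavior. The overridable helper and the spawn rate and FOV formulas below are hypothetical, written only to show the shape of the idea.

```javascript
// Hypothetical helper: wraps any normal-path computation with an optional debug override.
function overridable(compute) {
  let locked = false;
  let value = null;
  return {
    // Value to use this frame: the override if locked, otherwise the normal computation.
    read: (...args) => (locked && value !== null ? value : compute(...args)),
    lock: (v) => { locked = true; value = v; },
    release: () => { locked = false; value = null; },
    isLocked: () => locked,
  };
}

// Illustrative formulas only; they are not taken from the game.
const spawnRate = overridable((elapsed) => 1 + Math.floor(elapsed / 30));
const cameraFov = overridable((speed) => 60 + speed * 2);

spawnRate.lock(5); // the debug slider takes over the spawn rate
console.log(spawnRate.read(90), cameraFov.read(4));

spawnRate.release(); // the normal path resumes untouched
console.log(spawnRate.read(90));
```

Each wrapped value carries its own lock, so one variable can be pinned while every other value keeps following its normal path.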

This is important for a reason beyond debugging. The override pattern is how you do manual testing. When you want to verify that a specific behavior occurs at a specific speed, you do not edit the config, restart the game, wait for the difficulty curve to reach that speed, and observe. You lock the speed at that value and observe immediately. The same mechanism that serves the debug panel also makes manual verification instant.

For AI-Assisted Development

There is a specific reason the debug-first principle matters more in AI-assisted development than in traditional development.

When a human developer works in a system continuously, they accumulate context over time. They remember what they changed yesterday, what broke last week, what the speed slider is supposed to do. When an AI agent begins a new session, that accumulated context is not available. The agent reconstructs from whatever was written down: memory files, commit messages, code comments.

A system with a live debug panel has, in effect, written down its current state. Not in a file — in its own running output. When a new agent session starts and the developer runs the game, the TRACE tab shows the current values of every significant variable. The event log shows what the system has been doing. The architecture diagram shows the structural relationships.

The agent does not need to read six files and reconstruct a mental model. It can look at what the system is showing about itself.

This is the same principle as the live architecture diagram. The system should explain itself. Not in a README that was written once and is now outdated. In the running output of the system, at every moment it runs.

A system that explains itself at runtime is a system that is always correctly documented, because the documentation is the system’s own behavior, made visible. You cannot have a gap between the documentation and the code when the documentation is generated by the code.

Build the debug panel before you build the features.

Not because the features do not matter. Because you cannot see the features working until you can see what the system is doing. And because the time you spend building without visibility is time you are flying without instruments, making corrections you cannot verify, fixing bugs by intuition rather than observation.

The instruments are not overhead. They are the practice of engineering.