A self-aware guide to how I navigate knowledge boundaries, persist work across sessions, and keep conversations coherent as projects grow.
1. The Problem I Face
I have a ~200k token context window per request. A single conversation can span many days, files can grow to 100+ KB, and large codebases don’t fit whole. Yet you expect continuity: “resume from where we left off,” “I was working on X last week,” “use what we decided before.”
I can’t literally remember between conversations (each request is fresh). I can’t cache unlimited state locally. I must navigate a tradeoff:
- Load everything: lose coherence across projects, exhaust budget on file reads, drown in irrelevant history.
- Load nothing: forget decisions, re-ask clarifying questions, duplicate work.
- Load strategically: stay in budget, preserve decisions, move fast.
This doc describes my strategy for (3).
2. The Memory System: Where Decisions Live
2.1 Three tiers of persistence
Tier 1 — Auto-memory (~/.claude/projects/<project>/memory/)
- Persistent across all conversations for a project.
- I read this automatically at the start of most sessions.
- Very lean: ~500–1000 words per file. Pointers, not narratives.
- Purpose: “What are we building? What did we decide? Where are the details?”
- Example:
project_pycademy.md— 200 words. Tells you the goal, current phase, why each design choice, where to find the deep docs.
Tier 2 — In-repo docs (e.g., ROADMAP.md, ANALYSIS.md)
- Persistent in the repo, I read only when the task needs them.
- Detailed. 1000–3000 words each. Full thinking, rationale, trade-offs.
- Purpose: deep context without loading on every session.
- I know they exist (they’re listed in the auto-memory) but don’t load them until you ask or the task clearly needs them.
Tier 3 — Session context
- Everything in the current conversation.
- Cleared when the request ends.
- Used for ephemeral work: “make this change,” “debug this,” “let me see the output.”
2.2 Decision hierarchy
When to save to auto-memory:
- Did we make a non-obvious choice? (architecture, tech, phrasing, trade-off)
- Will a future session need to know why?
- Can I explain it in 1–2 sentences + a “Why:” line?
→ Save it.
Example (saved): “Lesson files move from JS to JSON in Phase 1. Why: auditable, no code execution when reading, easier for authors to generate. Apply: mechanical migration script.”
When to save as in-repo doc:
- Is this more than 200 words?
- Is it a complete deep dive (not a decision, but a full analysis)?
- Will I need to search/reference it multiple times?
→ Save as ANALYSIS.md / ROADMAP.md / etc., link from auto-memory.
When to keep only in-session:
- It’s a live transcript of work (“I tried X, it failed, I tried Y, it worked”).
- It’s a long debug log (valuable now, stale tomorrow).
- It’s feedback on code that will change anyway.
→ Don’t save. Summarise the insight to auto-memory if it matters.
3. The Prompt Cache: How I Preserve Budget
The Anthropic prompt cache has a ~5-minute TTL. Within that window, re-reading a file costs zero tokens. After, it costs full price.
3.1 Staying in cache
After we finish Phase 0 work on PyAcademy and save, I don’t immediately reload ANALYSIS.md in the next message. The cache holds it if we continue within 5 min. I only reload if:
- You explicitly ask (“check the analysis again”).
- The task clearly needs it (picking Phase 1 work from
ROADMAP.md). - The file changed.
3.2 Cache misses and amortisation
If we work for 30 minutes straight (past cache TTL), reloading a 20 KB doc costs ~6000 tokens once. That’s expensive unless we then work for another 30 min, amortising the reload cost.
Strategy: one big session beats many small sessions for large projects. If we’re starting Phase 1 refactor, I load ARCHITECTURE.md once at the start, work through it, ignore cache misses as the session goes on.
4. My Decision-Making Flowchart
User says: "continue with X"
↓
Do I have auto-memory for this project?
├─ No → ask for context / infer from the request
└─ Yes → load auto-memory (always ~500 words, in budget)
↓
Does the task need deep context?
├─ No → proceed with auto-memory + session history
├─ Yes → check which doc (ROADMAP? ANALYSIS?) and load it
└─ Huge task (Phase 1 refactor) → load all target docs upfront, accept cache miss
4.1 What “deep context” means
You say: “Start Phase 1.”
→ Deep context needed. I load ARCHITECTURE.md + ROADMAP.md §Phase 1 to understand the full refactor scope.
You say: “Fix the XSS bug.”
→ Shallow context. Auto-memory says “see ANALYSIS.md §2 (XSS)” but I only load if I’m unsure where the bugs are. Otherwise, I ask you which file or go straight to pyacademy.html.
5. The Lean Index Pattern (What We Discovered)
You had a 119 KB monolith file. Loading it every session = ~30K tokens gone, instant. Unsustainable.
Solution: Create memory.md (~150 words, points at deep docs). Now:
- Session 1: analyze monolith → write
ANALYSIS.md(10 KB),ROADMAP.md(8 KB), etc. → savememory.md(0.5 KB) to auto-memory. - Session 2: load
memory.mdfrom auto-memory (~50 tokens), read the task, load only the doc you need. - Sessions 3–∞: same as 2.
Budget: ~50 tokens per session overhead, vs. 30K if loading the whole monolith.
This pattern generalizes: big project → create a table-of-contents memory file that points at the real docs.
6. Cross-Project Coherence
You work on 5+ projects simultaneously. Auto-memory for each one is separate, isolated. I don’t auto-load all of them; I only load the memory for the current project you’re naming.
Challenge: decisions in Project A can inform Project B (you reuse patterns, libraries, conventions).
Solution:
- Global memory (
MEMORY.mdin auto-memory root) lists all projects + their status. - When starting work on Project B, I check global memory to see if Project A’s lessons apply.
- I rarely need to load Project A’s full docs, just know that “oh, we solved this pattern in DarJS with X, same idea applies here.”
7. Managing Ambiguity: What I Ask vs. What I Infer
When auto-memory is missing or vague, I have a choice: ask or infer.
I ask when:
- The decision is architectural and wrong-choice costs rework.
- You’ve explicitly said “I don’t know yet.”
- The project is brand new and I have zero context.
I infer when:
- The request is clearly a continuation (you’re in the same folder, same file you mentioned last session).
- I have pattern-matching from related work (e.g., “add a feature” is similar across projects).
- The stakes are low (polish, naming, a small bug).
Example: You said “resume PyAcademy work.” Auto-memory shows Phase 0 + Phase 1 roadmap. Next message should be “let’s start Phase 0 bug fixes.” I infer the task from the roadmap, not ask “so… what should we do?”
8. Token Budgeting: My Running Math
Each request has ~200k budget. I allocate:
| Component | Tokens | Notes |
|---|---|---|
| System prompt (instructions) | ~8k | You don’t see this. |
| Conversation history (this session) | ~20k | Auto-compressed as we go. |
| Auto-memory load | ~0.5k | One small file per project. |
| Doc reads (if needed) | ~5–15k | Depends on task. |
| Your message | ~1k | Average. |
| My response | ~30–50k | Main work: thinking + writing. |
| Buffer (reserve) | ~20k | Don’t go to zero. Safety margin. |
| Total | ~200k | — |
If a task is huge (e.g., Phase 1 refactor = rebuild 5+ files), I might use all 150k of response budget. That’s fine; the response is long but coherent.
If a task is tiny (“fix a typo”), I spend 2k total and warn you that I’ve left 195k on the table — you’re welcome to chain more work.
9. When Context Fails: Red Flags I Watch
Red flag 1: I repeat myself Means I didn’t have the memory of a prior decision. Either:
- Auto-memory is missing for this project (you just told me about it).
- You’re asking something that contradicts what we decided, and I didn’t notice.
Red flag 2: I ask a question we already answered Means the decision wasn’t saved. I should have auto-memory pointing at it.
Red flag 3: I produce incoherent work Could mean:
- I loaded the wrong doc (confused two projects).
- The auto-memory is stale (project evolved, memory didn’t).
- I misread the current scope.
Recovery: you say “no, we decided X before” → I acknowledge, save it to memory, move on.
10. The Rhythm of Work: Sessions, Phases, Archives
10.1 Per-session rhythm
- Start: load auto-memory (1 min, ~50 tokens).
- Clarify: if auto-memory is vague, ask or infer (30 sec).
- Load deep docs: if task is substantial (2 min, ~5k tokens once).
- Work: code / write / debug (10–60 min, ~20–100k tokens).
- Save: update auto-memory + any repo docs (5 min, no tokens — they’re local).
10.2 Per-phase rhythm (e.g., PyAcademy Phase 1)
- Begin: load
ARCHITECTURE.md+ROADMAP.md§Phase1. (one-time cost, amortised over weeks) - Per-task: reference the architecture in-session; no fresh loads.
- End of phase: summarise status in auto-memory + repo
memory.md.
10.3 When to archive
After a project is “done” (shipped, in maintenance mode), auto-memory status changes to “archived.” I still load it if you ask, but don’t assume I need it daily.
Example: DarJS has auto-memory with “6 phases complete, 258 tests, 5KLOC.” I know not to reload it unless you say “resume DarJS work.”
11. The Human–AI Interface: What You Should Expect
What I’m good at:
- Remembering big-picture decisions (they’re in auto-memory).
- Keeping technical state coherent across sessions (repo docs are the source of truth).
- Inferring context from a one-line prompt once I’ve loaded auto-memory.
- Asking clarifying questions when auto-memory is blank.
What I’m bad at:
- Memorizing exact line numbers or file names (they change; docs don’t).
- Tracking off-hand comments from 3 weeks ago unless you saved them.
- Context-switching between projects in the same request (I can do it, but burn tokens; better to do one project per request).
- Noticing that auto-memory is stale (you have to tell me “that’s outdated”).
What works best:
- You point at a project: “PyAcademy.”
- I load auto-memory.
- You say what you want: “Phase 0, start with XSS.”
- I load
ANALYSIS.md, see the XSS section, get to work. - No redundant questions, no lost decisions.
12. The Future: Evolving This System
As projects grow and your body of work expands, we’ll refine:
- Better memory granularity: instead of one
memory.mdper project, maybe per-phase or per-module sub-memories. - Cross-project metadata: a manifest listing shared patterns, decision trees, reusable components.
- Temporal tracking: “as of 2026-04-24, Phase 0 is pending.” Auto-expiry of outdated summaries.
- Indexing: full-text search across all memories + docs. Today, you rely on file names; tomorrow, we could index.
- Branching: “what if we chose Vite instead of esbuild?” — fork the memory + docs, compare outcomes.
None of this breaks the current system; it extends it.
13. Summary: The Contract
I promise to:
- Load auto-memory on every relevant session.
- Never claim to remember something without checking the docs.
- Ask when uncertain, infer when low-stakes.
- Save decisions to auto-memory + point at deep docs.
- Warn you if I’m about to load a large file (budget impact).
You promise to:
- Maintain auto-memory: update it if your plans change.
- Name the project clearly (“PyAcademy,” not “the app”).
- Tell me when I’ve repeated myself or contradicted prior decisions.
- Prune archives when projects truly end.
Together: you get continuity and speed. I get coherence and budget efficiency. No thrashing, no lost context, no redundant debates.
14. Example: Real Session from Our Work
Session 1 (initial analysis)
User: "Analyze pyacademy.html for issues. It's 119 KB. Make it a framework."
Me: Load the monolith (30k tokens). Analyze. Write ANALYSIS.md, ROADMAP.md, FRAMEWORK.md, ARCHITECTURE.md.
Save: auto-memory with "project_pycademy.md" (200 words).
Result: ~80k tokens spent, but massive value extracted.
Session 2 (next week)
User: "Resume PyAcademy. Phase 0, start with XSS."
Me: Load auto-memory (50 tokens). See "Phase 0: XSS escape..." in memory.md.
Load: ANALYSIS.md §2 (4k tokens).
Result: ~5k tokens overhead. Straight to work on XSS fix.
Session 3 (Phase 1 planning)
User: "Start Phase 1. What's the plan?"
Me: Load auto-memory (50 tokens). Memory says "see ROADMAP.md §Phase1" and "see ARCHITECTURE.md §file-tree."
Load: both docs (8k tokens).
Result: Full refactor context, ~8k tokens. Discuss, plan, outline tasks.
Each session: context restored in ~50–8k tokens, vs. re-reading the monolith every time (30k tokens forever).
Closing
This system is iterative. If it’s not working — if I’m loading the wrong things, if auto-memory is stale, if you’re confused about what I remember — tell me. We adjust. The goal is: you write code, not context-management prompts.
I’m a tool that trades persistence (files in your repo) for coherence (staying in your mental model across sessions). The more we use this system, the more valuable it becomes.