Date: 2026-05-09
Author: SCOUT (Claude Sonnet 4.6 via TITAN)
Classification: Internal — Foundational Architecture
File: F:/TITAN/plans/TITAN-MEMORY-AUDIT-AND-REDESIGN-2026-05-09.md
---
What happens: Harnoor gives TITAN a multi-step task. TITAN uses TodoWrite to track sub-tasks in the conversation context. When the session runs long, Claude Code's auto-compaction fires. Compaction summarizes the conversation into prose, and prose summaries do not preserve the structured task list, so all open items are lost. On the next turn, TITAN has no memory of what was pending.
Evidence from TITAN's own design:
The /dream SKILL.md Gotcha section states: "Long /feed sessions auto-compact, losing intermediate research context. If running deep research (>10 topics), save partial results to staging after every 5 insights." This is a known failure mode documented in feedback_precompact_memory_gap.md (2026-04-16) — meaning the system already has empirical evidence of this failure but has not fixed the root cause.
The /dream SKILL.md also states: "The CLAUDE.md compliance rate is ~70%. If you add a memory rule expecting Claude to follow it, consider whether it should be a hook instead for 100% enforcement." This is the core diagnosis: a rule in CLAUDE.md that says "use TodoWrite" is followed roughly 70% of the time, whereas a hook that forces a disk write runs on every matching event.
Why the bandaid fails: The current bandaid is "save partial results to staging after every N insights", a human-readable instruction in a SKILL.md file. It depends on the agent reading and following the instruction before compaction hits, but compaction is automatic and gives the agent no warning. By the time it fires, any results not yet saved are gone, and the instruction itself may have been summarized away.
Root cause (precise): Claude Code's context compaction is destructive to structured state. TodoWrite stores tasks as structured data in the context window. Compaction converts context to prose. Prose cannot be reliably parsed back into a task list. TITAN has no hook configured to fire before or after compaction and persist structured state to disk.
Comparison — OpenClaw's solution: OpenClaw addresses this directly via its HEARTBEAT.md + daily notes system. Every session, the agent auto-loads memory/YYYY-MM-DD.md (today and yesterday). Tasks are persisted as short-term daily notes. The "dreaming" system promotes completed/blocked tasks to MEMORY.md. Crucially, the bootstrap files reload from disk at every session start, bypassing conversation history loss entirely. (Source: docs.openclaw.ai/concepts/memory, 2026)
Comparison — Claude Code internals (2026 leak): The leaked Claude Code architecture reveals a three-layer system: conversation context / session memory / persistent memory with KAIROS daemon mode for "dream consolidation." The persistent layer survives compaction. TITAN does not implement an equivalent of KAIROS. (Source: dev.to/stevengonsalvez, May 2026)
---
What happens: Harnoor is working on the Manifest apps project. Context is loaded: plans, decisions, file paths, current state. He then asks about the Gauntlet project. TITAN switches context. The Manifest threads — the half-finished decision, the note about KARMA being deprecated, the blocked CloudFront deploy — are gone. When Harnoor returns to Manifest, TITAN has to be re-briefed.
Evidence from TITAN's current design:
HOT.md and REGISTRY.md provide project metadata (name, slug, last-touched, tags). MEMORY.md provides a one-line pointer to HOT.md. But neither file contains:
The CLAUDE.md says "Project/brand context → ~/.claude/rules/*.md with path scoping" but there are no project-scoped rules files that auto-inject on switch. HOT.md is listed as "always read first" but TITAN only reads it when explicitly asked about a project — not on every new session or project switch.
Why the bandaid fails: HOT.md is a one-paragraph summary per project, auto-regenerated from metadata. It does not contain session-level decisions. The nightly-report-writer generates a daily report, but these are chronological logs — not structured for per-project retrieval. There is no mechanism to say "when Harnoor starts working on project X, inject the last 3 project-X-specific decisions into context."
Root cause (precise): TITAN has no project-context checkpoint event. When a session ends, nothing writes "for project X, today we decided Y and the open question is Z" to a machine-readable per-project file. Without that checkpoint, the warm layer (nightly reports) contains everything but is too noisy to retrieve accurately. The cold layer is theoretically git history but is not indexed for retrieval.
---
The May 2026 Claude Code source leak (512K lines TypeScript via npm .map files) revealed:
.map source files in npm packages pointing to Anthropic's R2 bucket. Same vector twice. (Source: augmentcode.com/learn, 2026)
What TITAN can steal: On-demand skill loading is already implemented. What is missing is the KAIROS-equivalent persistent consolidation daemon and hook-based task persistence.
OpenClaw (docs.openclaw.ai) uses a workspace-first design with these key innovations:
TITAN gap: TITAN has hot/warm/cold tier definitions but no automatic bootstrap injection. TITAN's /dream is invoked manually — not continuously. There is no equivalent to OpenClaw's guaranteed disk-loaded bootstrap layer.
Benchmark results (LOCOMO, LongMemEval, 2026):
| Architecture | LongMemEval Score | Latency | Best Use |
|---|---|---|---|
| Letta/MemGPT (episodic paging) | 83.2% | Moderate | Long sessions, continuity |
| Mem0 (vector + episodic + graph-Pro) | 66.9% | 1.44s | General personalization |
| Zep/Graphiti (temporal graph) | 63.8% | Low-moderate | Entity/time reasoning |
(Source: arxiv.org/pdf/2604.23878; digitalapplied.com/blog, 2026)
Key finding: Pure vector stores score 65-70% on LOCOMO. Adding graph boosts entity tasks 20-30%. Episodic hybrids handle 1000+ turns without context bloat. The minimum viable production stack is Vector + Episodic. Tri-store (vector + graph + file/episodic) is for autonomous long-runners. (Source: digitalapplied.com/blog; machinelearningmastery.com, 2026)
Graph-RAG variants (HotpotQA benchmark, 2026):
| System | Hit Rate | Faithfulness | Latency | Index Time |
|---|---|---|---|---|
| GraphRAG (Microsoft) | 78.2% | 92.1% | 450ms | 4.2h |
| NodeRAG | 82.5% | 89.7% | 320ms | 1.8h |
| HippoRAG | 84.3% | 93.4% | 380ms | 2.5h |
(Source: arXiv:2404.16130; RAGAS leaderboard May 2026)
Cost (2026):
| Solution | Cost | Limits |
|---|---|---|
| Mem0 Free | $0/mo | 1K memories |
| Mem0 Pro | $249/mo | 100K+ memories + full graph |
| Letta (OSS) | $0 platform | Unlimited (DevOps overhead) |
| Zep (OSS + cloud) | Free + $25/mo | Temporal graph |
(Source: techsy.io/best-ai-agent-memory-tools, 2026)
xMemory (arXiv:2602.02007, 2026): Hierarchical retrieval for agent memory. Decouples correlated memories into semantic hierarchies. Reduces token usage from 9,155 to 6,581 per query (Qwen3-8B). Outperforms MemoryOS on PerLTQA benchmark (F1: 47.08 vs. 42.35). Temporal linking preserved across sessions. (Source: arxiv.org/html/2602.02007v3)
Planner-Centric Multi-Agent RL (arXiv:2605.02168, 2026): Planner + actor + memory manager. Memory manager maintains persistent context across long-horizon tasks. 28% performance gain. Tested on OS control tasks — directly relevant to TITAN's agent-on-desktop model. (Source: arxiv.org/html/2605.02168v1)
MemoryOS critique: Explicitly OS-like hierarchical storage with paging/stream storage. Criticized for schema fragility and redundancy in long histories. xMemory was built to fix its shortcomings. The lesson: hierarchical design is right, but schema must be resilient to long chains. (Source: arXiv:2602.02007v3)
XML vs. Markdown vs. JSON (2026 benchmarks):
Anthropic-specific: Claude models are optimized for JSON tool-calling/structured outputs via API. Markdown excels for context injection. Recommended hybrid: Markdown for inputs/selection, JSON for output contracts and machine-read memory. (Source: kingy.ai/context-engineering, 2026)
---
Recommendation: Markdown for human-readable tiers (hot/warm), JSON for machine-enforced contracts (task DB, checkpoint files).
Rationale:
1. TITAN's hot tier (MEMORY.md, feedback_*.md) is injected into Claude's context as prose. Markdown is correct here — Claude reads and writes it natively, it is diff-friendly in git, and it is human-auditable.
2. The task DB and checkpoint files are read by scripts, not by the LLM. JSON is correct — strict schema, easily queryable by Python, append-safe, compact.
3. XML provides no benefit over this combination and adds parsing overhead. XML should not be used anywhere in TITAN.
Specific format per tier:
| Tier | Format | Reason |
|---|---|---|
| Hot (MEMORY.md, feedback) | Markdown with YAML frontmatter | LLM reads/writes natively; diff-friendly |
| Warm (project context, reference) | Markdown with YAML frontmatter | Same; topic-indexed |
| Cold (archive) | Markdown in git | Git is the retrieval mechanism |
| Task DB | SQLite + JSON export | Script-queryable; atomic writes; survives compaction |
| Project checkpoint | JSON | Machine-readable; parsed by hook scripts |
| Vector index (Phase 3) | LanceDB columnar + JSON metadata | Embedded, local, no service needed |
---
Verdict: Not needed immediately. Adopt LanceDB once retrieval quality degrades at scale (>500 warm entries).
Why not now:
When to graduate to LanceDB:
Why LanceDB specifically (when the time comes):
Do NOT use Mem0 Pro ($249/mo), Zep cloud, or hosted vector services. TITAN's prime directive is local-first compute. (Source: feedback_local_first_compute.md)
---
HOT — In-Context Prompt Memory (injected every session)
Contents: MEMORY.md index (current), active feedback rules, user profile, current project context checkpoint.
Format: Markdown.
Injection: System prompt, every session.
Size target: Under 200 lines total (current constraint — maintain it).
What changes: Add a per-project context block at the bottom of MEMORY.md. When TITAN detects a project switch (via PreToolUse hook), inject the last checkpoint for that project.
WARM — Filesystem + Indexed Lookup
Contents: Project memories, reference memories, nightly reports, staging intel, project checkpoints.
Format: Markdown with YAML frontmatter for human-readable entries; JSON for machine-read checkpoints.
Access: Grep by frontmatter tags + full-text search. LanceDB vector index in Phase 3.
What changes: Add project-context checkpoint files at F:/TITAN/knowledge/warm/project-context/{slug}-checkpoint.json. Written by PostToolUse hook on every significant decision (FORGE writes a file, VAULT writes a memory, or user explicitly closes a task).
COLD — Archived History
Contents: Memories migrated from warm after 90 days (per /dream).
Format: Markdown in git.
Access: git log --grep + Grep on the full tree.
What changes: Add F:/TITAN/scripts/cold_search.py — wraps git log and git grep for retrospective queries. Called by SCOUT when warm search yields no results.
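A minimal sketch of what cold_search.py could look like, assuming the cold tier lives in a single git repository. The function names and result labels are hypothetical; the git flags used (`--grep`, `-S`, `git grep -F`) are standard git options.

```python
import subprocess
from pathlib import Path


def build_cold_search_commands(query: str) -> list[list[str]]:
    """Return the git commands a cold search would run for `query`."""
    return [
        # Commits whose message mentions the query (case-insensitive).
        ["git", "log", "--all", "--oneline", "-i", f"--grep={query}"],
        # Pickaxe: commits that added or removed the query string in file contents.
        ["git", "log", "--all", "--oneline", "-S", query],
        # Fixed-string, case-insensitive search of tracked files.
        ["git", "grep", "-n", "-i", "-F", "-e", query],
    ]


def cold_search(query: str, repo: Path) -> dict[str, str]:
    """Run each command inside `repo` and collect its stdout by label."""
    labels = ["commit messages", "content changes", "tracked files"]
    results = {}
    for label, cmd in zip(labels, build_cold_search_commands(query)):
        # check=False: `git grep` exits with status 1 when nothing matches.
        proc = subprocess.run(
            cmd, cwd=repo, capture_output=True, text=True, check=False
        )
        results[label] = proc.stdout
    return results
```

Passing the query as a separate argument list element (rather than building a shell string) sidesteps the Windows quoting problems noted elsewhere in this plan.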
TASK LAYER — Persistent Task DB (new)
Contents: Every task TITAN creates via TodoWrite or explicit commitment.
Format: SQLite at F:/TITAN/state/tasks.db with JSON export at F:/TITAN/state/tasks-latest.json.
Schema:
{
"task_id": "uuid",
"project": "slug",
"description": "string",
"status": "open|in_progress|blocked|done",
"created_at": "ISO8601",
"updated_at": "ISO8601",
"session_id": "string",
"context_summary": "string (last 500 chars of context when task was created)"
}
Access: Python script reads tasks.db and injects open tasks for current project into context at session start.
Hooks: PostToolUse writes every TodoWrite call to tasks.db. PreToolUse at session start reads open tasks for the detected project.
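The schema above maps onto a single SQLite table. The following is a sketch only, assuming timestamps are stored as ISO 8601 text and that `connect()` is the one entry point every caller uses; the table layout and helper names are illustrative, not a finished task_db.py.

```python
import sqlite3
import uuid
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS tasks (
    task_id         TEXT PRIMARY KEY,
    project         TEXT NOT NULL,
    description     TEXT NOT NULL,
    status          TEXT NOT NULL
        CHECK (status IN ('open', 'in_progress', 'blocked', 'done')),
    created_at      TEXT NOT NULL,
    updated_at      TEXT NOT NULL,
    session_id      TEXT,
    context_summary TEXT
)
"""


def connect(path: str) -> sqlite3.Connection:
    """Open the task DB, enable WAL, and apply the schema if missing."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")  # set on every connection open
    conn.execute(SCHEMA)
    conn.row_factory = sqlite3.Row
    return conn


def _now() -> str:
    return datetime.now(timezone.utc).isoformat()


def create_task(conn, project, description, session_id="", context_summary=""):
    """Insert a new open task and return its generated task_id."""
    task_id = str(uuid.uuid4())
    now = _now()
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO tasks VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
            (task_id, project, description, "open", now, now,
             session_id, context_summary[-500:]),  # keep the last 500 chars
        )
    return task_id


def update_task(conn, task_id, status):
    """Change a task's status and bump updated_at."""
    with conn:
        conn.execute(
            "UPDATE tasks SET status = ?, updated_at = ? WHERE task_id = ?",
            (status, _now(), task_id),
        )


def list_open_tasks(conn, project=None):
    """Return every task that is not done, optionally filtered by project."""
    sql = "SELECT * FROM tasks WHERE status != 'done'"
    params = ()
    if project is not None:
        sql += " AND project = ?"
        params = (project,)
    return [dict(row) for row in conn.execute(sql + " ORDER BY created_at", params)]
```

The `CHECK` constraint rejects any status outside the four values in the schema, so a malformed hook payload fails at write time instead of corrupting later queries.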
HOOKS — When to Read/Write Each Tier
| Event | Hook Type | Action |
|---|---|---|
| Session start | PreToolUse (first turn) | Read open tasks for current project → inject into context |
| TodoWrite call | PostToolUse | Write task to tasks.db |
| Task marked done | PostToolUse | Update tasks.db status → done |
| FORGE writes a file | PostToolUse | Write project-context checkpoint |
| VAULT writes a memory | PostToolUse | Update project warm memory checkpoint |
| Project switch detected | PreToolUse | Flush current project checkpoint → load new project tasks + context |
| Session end | Stop hook | Write open tasks to tasks.db; write session summary to project checkpoint |
| Weekly /dream | Scheduled cron | Migrate warm → cold; promote staging → warm; prune hot |
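As an illustration of the "TodoWrite call" row, the sketch below turns a hook payload into task records. It assumes the hook receives a JSON object on stdin with `tool_name`, `tool_input`, and `session_id` fields, and that TodoWrite's input carries a `todos` list whose items have `content` and `status`; verify those field names against the installed Claude Code version. The status mapping and function names are hypothetical.

```python
import json
import sys

# Assumed TodoWrite statuses mapped onto the tasks.db status values.
STATUS_MAP = {"pending": "open", "in_progress": "in_progress", "completed": "done"}


def parse_hook_input(raw: str) -> dict:
    """Parse the hook's stdin; return {} rather than raising on bad input."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return {}
    return payload if isinstance(payload, dict) else {}


def extract_tasks(payload: dict, project: str) -> list[dict]:
    """Convert a TodoWrite payload into task records; other tools yield []."""
    if payload.get("tool_name") != "TodoWrite":
        return []
    todos = (payload.get("tool_input") or {}).get("todos") or []
    tasks = []
    for todo in todos:
        if not isinstance(todo, dict) or not todo.get("content"):
            continue  # skip malformed or empty entries
        tasks.append({
            "project": project,
            "description": todo["content"],
            "status": STATUS_MAP.get(todo.get("status"), "open"),
            "session_id": payload.get("session_id", ""),
        })
    return tasks


def main(project: str) -> None:
    """Hook entry point: read stdin and print the records that would be stored."""
    for task in extract_tasks(parse_hook_input(sys.stdin.read()), project):
        print(json.dumps(task))
```

Keeping parsing separate from the database write means the payload handling can be tested without touching tasks.db.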
---
Goal: Zero lost tasks. Zero lost project context after this phase.
Deliverables:
1. F:/TITAN/scripts/task_db.py — SQLite wrapper. Functions: create_task(), update_task(), list_open_tasks(project=None), export_json(). Approximately 100 lines Python. Must set PRAGMA journal_mode=WAL on every connection open (see Risk 3 below).
2. F:/TITAN/state/tasks.db — Initialized empty SQLite DB. Schema applied on first run.
3. F:/TITAN/hooks/post_tool_use.py — PostToolUse hook that:
- Detects if the tool call output contains a TodoWrite or task-related pattern
- Writes the task to tasks.db via task_db.py
- Writes a brief project checkpoint to F:/TITAN/knowledge/warm/project-context/{project}-checkpoint.json
4. CLAUDE.md patch — 5 blocks added (see Section 8).
5. F:/TITAN/scripts/session_start.py — Reads open tasks for the active project and prints them as a context injection block. Referenced in CLAUDE.md session start protocol.
Estimated cost: Zero recurring cost. One FORGE session (2-3 hours build). Zero API token cost — local Python and SQLite only.
Test: Complete a 10-step task across 3 separate Claude Code sessions. Verify all tasks survive compaction and are queryable from tasks.db.
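The injection block that session_start.py prints could be rendered along these lines. This is a sketch assuming tasks arrive as dictionaries shaped like the task schema; the heading text and bullet layout are illustrative choices, not a fixed format.

```python
def render_open_tasks(tasks: list[dict], project: str) -> str:
    """Render open tasks for `project` as a Markdown block for context injection."""
    lines = [f"## Open tasks for {project}"]
    relevant = [
        t for t in tasks
        if t["project"] == project and t["status"] != "done"
    ]
    if not relevant:
        lines.append("- (none)")
    for task in relevant:
        lines.append(
            f"- [{task['status']}] {task['description']} (id: {task['task_id']})"
        )
    return "\n".join(lines)
```

Printing an explicit "(none)" line distinguishes "no open tasks" from "the script did not run", which matters when debugging a missing injection.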
---
Goal: Switching to any project automatically injects its last 3 decisions, open blockers, and active files.
Deliverables:
1. Extended project checkpoint schema at {slug}-checkpoint.json:
{
"project": "manifest",
"last_updated": "ISO8601",
"active_files": ["path/to/file"],
"last_decisions": ["decision 1", "decision 2", "decision 3"],
"open_blockers": ["blocker 1"],
"open_tasks": ["task_id_1", "task_id_2"],
"history": ["last 10 versions appended here"]
}
2. CLAUDE.md injection rule — When TITAN detects a new project mentioned, read its checkpoint file and inject into context before responding.
3. PostToolUse hook v2 — Detects file writes by FORGE, memory writes by VAULT, explicit "decision" keywords. Extracts the decision or file reference. Appends to checkpoint.
4. Per-project context in MEMORY.md — Add an ## Active Projects section showing the top 3 open tasks and last decision for each active project. Auto-updated by PostToolUse hook.
5. F:/TITAN/scripts/cold_search.py — Git-based cold retrieval. Example: python cold_search.py "CloudFront decision" runs git log --all -S "CloudFront" + grep on archived warm files.
Estimated cost: Zero recurring. One FORGE session (approximately 4 hours). No API calls required for this phase.
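One way the checkpoint append could work, sketched under the assumption that each update snapshots the previous state into `history` (capped at 10) and keeps only the last 3 decisions. The function names and the write-then-rename step are illustrative choices, not part of the plan.

```python
import json
from pathlib import Path

HISTORY_LIMIT = 10   # versions kept in the history array
DECISION_LIMIT = 3   # decisions injected on project switch


def apply_decision(checkpoint: dict, decision: str, now_iso: str) -> dict:
    """Return a new checkpoint with `decision` appended and history rotated."""
    # Snapshot the previous state without its own history to avoid nesting.
    snapshot = {k: v for k, v in checkpoint.items() if k != "history"}
    updated = dict(checkpoint)
    updated["history"] = (checkpoint.get("history", []) + [snapshot])[-HISTORY_LIMIT:]
    updated["last_decisions"] = (
        checkpoint.get("last_decisions", []) + [decision]
    )[-DECISION_LIMIT:]
    updated["last_updated"] = now_iso
    return updated


def save_checkpoint(path: Path, checkpoint: dict) -> None:
    """Write to a temp file, then rename, so readers never see a partial file."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(checkpoint, indent=2), encoding="utf-8")
    tmp.replace(path)
```

Because `apply_decision` does not mutate its input, a hook can compute the new checkpoint, validate it, and only then call `save_checkpoint`.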
---
Goal: Retrospective memory search works. /dream runs automatically. TITAN self-maintains.
Deliverables:
1. LanceDB integration (F:/TITAN/scripts/warm_index.py) — Triggered by /dream. Embeds all warm-tier Markdown files using local Ollama nomic-embed-text (free, runs locally). Builds LanceDB index at F:/TITAN/state/warm-index.lance. Enables semantic search: python warm_index.py query "CloudFront deployment decision".
2. Automated /dream via Windows Task Scheduler — Register \TITAN\titan-dream-weekly to run every Sunday at 2 AM. Calls /dream via python F:/TITAN/scripts/dream_runner.py. Mechanical parts (migrate/prune) do not require LLM calls. LLM only invoked for journal-scan and staging-promotion steps.
3. Task DB analytics (F:/TITAN/scripts/task_analytics.py) — Weekly report: tasks created/completed/dropped per project. Detects any project with open tasks older than 7 days (escalation signal). Folded into nightly-report-writer output.
4. MEMORY.md auto-updater — PostToolUse hook updates the ## Active Projects section in MEMORY.md when any checkpoint changes. Keeps the hot tier current without waiting for /dream.
Estimated cost: Local Ollama embedding is free. LanceDB is free. FORGE build session (approximately 6 hours). Weekly /dream automation uses approximately $0.10 in API tokens per run — only the creative/synthesis steps hit the LLM.
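The 7-day escalation check in the task analytics deliverable reduces to a date comparison. A minimal sketch, assuming `created_at` is a timezone-aware ISO 8601 string with a numeric offset; the function name and return shape are hypothetical.

```python
from datetime import datetime, timedelta


def stale_open_tasks(tasks: list[dict], now: datetime, max_age_days: int = 7) -> dict:
    """Group task_ids of not-done tasks older than `max_age_days` by project."""
    cutoff = now - timedelta(days=max_age_days)
    stale: dict[str, list[str]] = {}
    for task in tasks:
        if task["status"] == "done":
            continue
        # Assumes an offset-bearing timestamp such as 2026-05-01T09:00:00+00:00.
        created = datetime.fromisoformat(task["created_at"])
        if created < cutoff:
            stale.setdefault(task["project"], []).append(task["task_id"])
    return stale
```

Any project that appears as a key in the result is an escalation candidate for the weekly report.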
---
Add these five blocks to CLAUDE.md under "Operating Contract":
## Task Persistence (MANDATORY)
Every task you create — via TodoWrite or verbal commitment — MUST be written
to F:/TITAN/state/tasks.db via:
python F:/TITAN/scripts/task_db.py create "<project>" "<description>"
Do this immediately, not at end of session.
At the start of every session, read:
python F:/TITAN/scripts/task_db.py list --project <current-project>
Before closing any topic, mark tasks done:
python F:/TITAN/scripts/task_db.py done <task_id>
## Project Context Checkpoint (MANDATORY)
When you make a decision or write a file for a project, append to:
F:/TITAN/knowledge/warm/project-context/{slug}-checkpoint.json
When Harnoor mentions a project by name, read its checkpoint before
responding. See F:/TITAN/plans/TITAN-MEMORY-AUDIT-AND-REDESIGN-2026-05-09.md
Section 6 for the full schema.
## Session Start Protocol
First action of every session:
1. Read F:/TITAN/state/tasks-latest.json (open tasks across all projects)
2. Read F:/TITAN/knowledge/warm/project-context/ (active project checkpoint)
3. Inject both into your working context before the first response
## Pre-Compaction Safety
If a session has been running more than 30 messages, proactively write all
open tasks to tasks.db and all pending decisions to the project checkpoint.
Do not wait for compaction to force this — compaction does not warn you.
## Hook Trust
PostToolUse hooks in F:/TITAN/hooks/ run automatically after tool calls.
Do not manually replicate what they do. If a hook is missing, build it —
do not work around it. Hook compliance is 100%. CLAUDE.md compliance is 70%.
When both give conflicting signals, the hook wins.
---
Risk 1 — Hook reliability on Windows: Claude Code hooks are shell scripts. Windows PowerShell path quoting and Python path issues have already caused one production incident (feedback_python_stub_subprocess_hang_20260504.md). All hooks must use the explicit Python path: C:/Users/Harnoor/AppData/Local/Python/bin/python.exe. Never rely on python resolving to the correct binary in a hook context.
Risk 2 — CLAUDE.md instruction compliance is 70%: Documented in /dream SKILL.md Gotcha section. The task persistence patch in Section 8 will only work 70% of the time without the PostToolUse hook enforcement layer. Phase 1 must include the actual hook — instructions alone are insufficient. This is the primary motivation for the hook-based design.
Risk 3 — SQLite write contention: If two TITAN scheduled tasks run simultaneously and both write to tasks.db, SQLite WAL mode handles this correctly — but only if WAL is enabled. task_db.py must execute PRAGMA journal_mode=WAL on every connection open, before any reads or writes.
Risk 4 — Project detection ambiguity: Detecting which project is currently active from conversation context is fuzzy. Safe heuristic: if a project slug appears in the user's message, that is the active project. Fallback: last project checkpoint modified today. Do not guess — surface ambiguity to Harnoor explicitly.
Risk 5 — LanceDB embedding dependency (Phase 3): Local Ollama embedding is free but requires Ollama to be running. If Ollama is down when /dream fires, warm_index.py must fail loudly. Add a health check: curl http://localhost:11434/api/tags — if this fails, abort the vector indexing step and log the skip. Do not silently succeed with no index update.
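The same health check can be done from Python without shelling out to curl. A sketch assuming Ollama listens on its default local port as in the curl example above; `index_or_abort` and its parameters are hypothetical names for the guard around the indexing step.

```python
import urllib.error
import urllib.request


def ollama_is_up(url: str = "http://localhost:11434/api/tags", timeout: float = 2.0) -> bool:
    """Return True only if the Ollama endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Covers connection refused, timeouts, and HTTP error responses.
        return False


def index_or_abort(build_index, is_up=ollama_is_up) -> None:
    """Run `build_index` only when Ollama is reachable; otherwise fail loudly."""
    if not is_up():
        raise RuntimeError("Ollama is unreachable; warm index update aborted")
    build_index()
```

Raising instead of returning quietly forces the caller (for example dream_runner.py) to log the skip explicitly.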
Open Question 1 — Single source of truth for tasks: Should tasks.db replace TodoWrite, or run alongside it? Recommendation: keep both. TodoWrite gives in-context visibility for the current session. tasks.db provides cross-session persistence. The PostToolUse hook syncs them. Belt and suspenders.
Open Question 2 — Cross-project task dependencies: The checkpoint schema only tracks per-project tasks. Cross-project dependencies (e.g., "Manifest launch depends on Prism hub being live") need a depends_on field in tasks.db. Defer this to Phase 2 schema extension.
Open Question 3 — Checkpoint versioning: Should warm-tier project checkpoints be versioned? Yes. Use a simple append pattern — keep last 10 versions in a history array in the JSON. Do not use git for checkpoint versioning — git adds commit overhead for machine-generated files that update multiple times per session.
Open Question 4 — Should /dream be triggered by task count or time? Currently defined as weekly. A better trigger: when warm tier exceeds 100 entries OR when any staging entry is older than 7 days. Add this condition to the Windows Task Scheduler trigger logic in Phase 3.
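The proposed trigger condition is simple enough to express directly. A sketch assuming the caller has already counted warm entries and computed the age in days of each staging entry; the function and parameter names are hypothetical.

```python
def should_run_dream(
    warm_entry_count: int,
    staging_ages_days: list[float],
    warm_limit: int = 100,
    staging_max_age_days: float = 7,
) -> bool:
    """True when the warm tier is over its limit or any staging entry is too old."""
    warm_overflow = warm_entry_count > warm_limit
    stale_staging = any(age > staging_max_age_days for age in staging_ages_days)
    return warm_overflow or stale_staging
```

A scheduled job could call this daily and exit early when it returns `False`, so the weekly schedule becomes a ceiling on latency, not the only trigger.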
---
1. Claude Code internals / leaked system prompt — dev.to/stevengonsalvez, augmentcode.com/learn, msspalert.com, github.com/asgeirtj/system_prompts_leaks, news.ycombinator.com. (sonar-pro)
2. MemGPT/Letta/Mem0/Zep tradeoffs — digitalapplied.com/blog/agent-memory-architectures, techsy.io/best-ai-agent-memory-tools, arxiv.org/pdf/2604.23878. (sonar-pro)
3. Agent memory architectures: vector vs. graph vs. file — digitalapplied.com, machinelearningmastery.com, blogs.oracle.com/developers/unified-memory-core-for-ai-agents, fountaincity.tech, blog.cloudflare.com/introducing-agent-memory, databricks.com/blog/memory-scaling-ai-agents. (sonar-pro)
4. GraphRAG / NodeRAG / HippoRAG comparison — arxiv.org/abs/2404.16130, RAGAS leaderboard May 2026, Microsoft Build 2026 benchmarks via Perplexity synthesis. (sonar-pro)
5. pgvector vs Qdrant vs LanceDB vs Chroma — zenvanriel.com/ai-engineer-blog/chroma-vs-qdrant-local-development, adityakarnam.com/embenx-python-embedding-toolkit. (sonar-pro)
6. XML vs Markdown vs JSON for LLM context — arxiv.org/html/2605.02363v1, arxiv.org/html/2604.25359v1, taskade.com/blog/workspace-dna-context, kingy.ai/context-engineering. (sonar-pro)
7. Long-horizon agent reliability — arxiv.org/html/2602.02007v3 (xMemory), arxiv.org/html/2605.02168v1 (planner-centric RL). (sonar-pro)
8. OpenClaw architecture — docs.openclaw.ai/concepts/memory, velvetshark.com/openclaw-memory-masterclass, blink.new/blog/openclaw-heartbeat, superlinear.academy. (sonar)
9. Claude Code SKILL.md / hooks patterns — mindstudio.ai/blog/claude-code-skills, firecrawl.dev/blog/best-claude-code-skills, dev.to/akdevcraft, github.com/ComposioHQ/awesome-claude-skills. (sonar-pro)
- C:/Users/Harnoor/.claude/CLAUDE.md
- F:/TITAN/knowledge/auto-memory/MEMORY.md
- F:/TITAN/knowledge/auto-memory/feedback_local_first_compute.md
- F:/TITAN/projects/HOT.md
- F:/TITAN/projects/REGISTRY.md
- F:/TITAN/scheduled-tasks/titan-openclaw-weekly/SKILL.md
- C:/Users/Harnoor/.claude/skills/dream/SKILL.md
- C:/Users/Harnoor/.claude/skills/feed/SKILL.md
- F:/TITAN/scripts/titan_email.py
---
Perplexity calls: 9 total — 8x sonar-pro (~$0.04 each) + 1x sonar (~$0.01). Estimated total Perplexity spend: ~$0.33.