CLAUDE-CODE-MEMORY-DEEP-DIVE-2026-04-27

Prong 1 — How Claude Code Remembers Things About the User: Architecture, Internals, and TITAN Replication Plan

Commissioned: A079 | Author: SCOUT | Date: 2026-04-27

Audience: Harnoor (TITAN architect) + FORGE (implementer)

Classification: Internal PhD-depth research memo

Related streams: A074 (spiritualaf.me PhD), A072 (education plan), A077 (agent visibility)

---

EXECUTIVE SUMMARY

Claude Code's memory system is more architecturally elegant than it first appears — and more limited than most users assume. It combines four layers (in-context working memory, auto-saved markdown notes, CLAUDE.md instruction files, and the new API-level memory tool) with a growing hook infrastructure for lifecycle events. The 200-line hard cap on MEMORY.md and the total absence of semantic/vector retrieval are its two biggest structural weaknesses. TITAN's current memory system largely mirrors the markdown-file approach but lacks: (1) semantic retrieval over the full memo corpus, (2) LLM-driven extraction at session end, (3) a personal style fingerprint, (4) robust cross-session continuity through compaction, and (5) a scheduled consolidation that ages and demotes stale memories. This memo documents the verified architecture, the community-documented internals, the gap analysis, and five concrete upgrades with full implementation specifications.

---

PART 1 — VERIFIED ARCHITECTURE: HOW CLAUDE CODE REMEMBERS

1.1 The Four-Layer Model

Claude Code uses four distinct memory layers, each with different persistence, scope, and access patterns. Understanding the boundaries between them is critical for replication.

Layer 0: In-Context Working Memory

Everything currently in the active context window — conversation history, tool outputs, file contents read during the session. This is ephemeral and lost at session end or compaction. It is the "RAM" of the system. No persistence mechanism beyond the duration of a single session. Not configurable.

Layer 1: Auto Memory (Machine-Local Markdown Files)

The most interesting layer for replication purposes. Introduced with Claude Code v2.1.59, it allows Claude to autonomously write notes to disk during a session. Storage location:


~/.claude/projects/<encoded-project-path>/memory/
├── MEMORY.md          ← Concise index; first 200 lines OR 25KB injected at session start
├── debugging.md       ← Detailed debugging patterns (loaded on demand)
├── api-conventions.md ← Discovered API patterns (loaded on demand)
└── [any topic file]   ← Claude creates these as needed

The <encoded-project-path> is derived from the git repository root, meaning all worktrees and subdirectories within one repo share one auto memory directory. Outside git, the working directory is used.

Critical limits confirmed from source code:

MEMORY.md hard limit: 200 lines OR 25KB — whichever comes first
Only MEMORY.md is injected at session start; topic files are loaded on demand via file read tools
A feature flag tengu_coral_fern controls whether topic files auto-load; this flag is currently disabled by default — topic files must be explicitly requested
The constant pZ = 200 is hardcoded in the compiled bundle

What gets saved: Build commands, debugging insights, architecture patterns, code style preferences, workflow habits, corrections from the user. Claude decides what is worth remembering — there is no deterministic rule. The heuristic is roughly "would this information be useful in a future conversation?"

Injection mechanism: The first 200 lines of MEMORY.md are injected as a user message after the system prompt (not as system prompt itself), at the start of every session. This is an important distinction: Claude reads it and tries to follow it, but there is no enforcement — it is context, not configuration.
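The "200 lines OR 25KB, whichever comes first" rule can be illustrated with a minimal sketch. The function name `injected_portion` is hypothetical, and cutting at whole-line boundaries when the byte cap is reached is an assumption; the memo's sources do not say how a partial line is treated.

```python
MAX_LINES = 200          # line cap reported in this memo
MAX_BYTES = 25 * 1024    # 25KB cap reported in this memo


def injected_portion(memory_md: str,
                     max_lines: int = MAX_LINES,
                     max_bytes: int = MAX_BYTES) -> str:
    """Return the leading part of MEMORY.md that fits both caps."""
    kept, used = [], 0
    for line in memory_md.splitlines(keepends=True)[:max_lines]:
        size = len(line.encode("utf-8"))
        if used + size > max_bytes:
            break  # byte cap reached before the line cap
        kept.append(line)
        used += size
    return "".join(kept)
```

Anything past the returned prefix would only be visible to the model if it reads the file explicitly, which is why the memo treats MEMORY.md as an index rather than a store.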

Agent-specific memory: Sub-agents in Claude Code can maintain their own auto memory at separate paths. Three scopes exist for agents:

Project: ./.claude/agent-memory/<agent-name>/
Local: ./.claude/agent-memory-local/<agent-name>/
User: ~/.claude/agent-memory/<agent-name>/

Layer 2: CLAUDE.md Instruction Files (Human-Written)

Persistent instruction files loaded at session start. These carry higher signal than auto memory because they are human-curated. Four scopes:

| Scope | Path | Shared With |
|-------|------|-------------|
| Managed Policy | C:\Program Files\ClaudeCode\CLAUDE.md | All users on machine |
| Project | ./CLAUDE.md or ./.claude/CLAUDE.md | Team via version control |
| User | ~/.claude/CLAUDE.md | Just you, across all projects |
| Local | ./CLAUDE.local.md (gitignored) | Just you, current project only |

Loading behavior:

Claude walks up the directory tree from the current working directory, loading all CLAUDE.md and CLAUDE.local.md files found
When instructions conflict, more specific locations generally take precedence over broader ones
Precedence is not a mechanical override: all discovered files are concatenated into context
Subdirectory CLAUDE.md files load lazily — only when Claude reads files in those subdirectories
Block-level HTML comments (<!-- ... -->) are stripped before injection, so maintainers can leave annotations that cost no context
@path/to/file syntax imports additional files inline (max 5-hop recursion depth)
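The comment stripping and the 5-hop import limit in the list above can be sketched as follows. This is an illustration, not Claude Code's implementation: `expand` is a hypothetical name, files come from an in-memory dict rather than disk, and treating only whole-line `@path` references as imports is a simplifying assumption.

```python
import re

MAX_IMPORT_DEPTH = 5  # recursion limit stated in this memo
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)
IMPORT_RE = re.compile(r"^@(\S+)[ \t]*$", re.MULTILINE)


def expand(path: str, files: dict[str, str], depth: int = 0) -> str:
    """Strip HTML comments, then inline @path imports up to the hop limit."""
    text = COMMENT_RE.sub("", files[path])

    def replace(match: re.Match) -> str:
        target = match.group(1)
        if depth >= MAX_IMPORT_DEPTH or target not in files:
            return match.group(0)  # leave the reference untouched
        return expand(target, files, depth + 1).rstrip("\n")

    return IMPORT_RE.sub(replace, text)
```

The depth counter also bounds accidental cycles: a file that imports itself is expanded five times and then left with a literal `@` reference.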

Path-scoped rules: The .claude/rules/ directory supports files with YAML frontmatter:


---
paths:
  - "src/api/**/*.ts"
---
# API rules only load when working with matching files

This is critical for context economy — rules only enter the window when relevant.
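To make the path-scoping idea concrete, here is a small sketch of how a `paths:` list could be matched against the file being edited. The glob semantics (`**/` spans zero or more directories, `*` stays within one path segment) are an assumption based on common glob conventions, and both function names are hypothetical.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a path glob into an anchored regular expression."""
    i, out = 0, []
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append(r"(?:[^/]+/)*")   # zero or more directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(r".*")            # anything, including slashes
            i += 2
        elif pattern[i] == "*":
            out.append(r"[^/]*")         # anything within one segment
            i += 1
        elif pattern[i] == "?":
            out.append(r"[^/]")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out) + r"\Z")


def rule_applies(rule_paths: list[str], file_path: str) -> bool:
    """True if any glob in the rule's paths list matches the file."""
    return any(glob_to_regex(p).match(file_path) for p in rule_paths)
```

With the frontmatter shown above, a rule would load for `src/api/v1/users.ts` but stay out of context for `src/web/app.ts`.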

Layer 3: The API-Level Memory Tool (Beta, Released September 2025)

A different beast entirely. This is not part of Claude Code specifically but part of the Anthropic API for agent developers. The tool type is memory_20250818, and it is enabled with the context-management-2025-06-27 beta header. Supported on Claude Sonnet 4.5, Sonnet 4, Haiku 4.5, Opus 4.1, Opus 4+.

The tool provides CRUD file operations on a client-controlled /memories directory:

view — list directory or read file with line numbers
create — create new file
str_replace — precise text replacement
insert — insert at line number
delete — delete file or directory (recursive)
rename — rename/move

At session start, Claude is automatically instructed via system prompt: "ALWAYS VIEW YOUR MEMORY DIRECTORY BEFORE DOING ANYTHING ELSE." Claude reads and writes actively during the session. The application developer controls the actual storage backend — it can be files, a database, encrypted storage, or cloud. This is the plumbing TITAN could directly adopt for agent pipelines.

Compaction integration: The memory tool pairs explicitly with compaction — when context is compacted, important information in /memories survives the summary boundary. This is the designed answer to the "lost after compaction" problem.
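Because the storage backend is client-controlled, the application has to execute each memory command itself. The sketch below handles three of the six commands against a local directory and rejects paths that escape the memory root. The class name `MemoryStore` is hypothetical, and the input field names (`path`, `file_text`, `old_str`, `new_str`) should be checked against the current tool reference before relying on them.

```python
from pathlib import Path


class MemoryStore:
    """Minimal file-backed handler for a subset of memory tool commands."""

    def __init__(self, root: Path):
        self.root = root.resolve()
        self.root.mkdir(parents=True, exist_ok=True)

    def _resolve(self, virtual: str) -> Path:
        # Map "/memories/x.md" onto the real root and refuse path traversal.
        rel = virtual.removeprefix("/memories").lstrip("/")
        real = (self.root / rel).resolve()
        if real != self.root and self.root not in real.parents:
            raise ValueError(f"path escapes memory root: {virtual}")
        return real

    def handle(self, call: dict) -> str:
        cmd = call["command"]
        if cmd == "view":
            target = self._resolve(call["path"])
            if target.is_dir():
                return "\n".join(sorted(p.name for p in target.iterdir()))
            lines = target.read_text(encoding="utf-8").splitlines()
            return "\n".join(f"{i}: {line}" for i, line in enumerate(lines, 1))
        if cmd == "create":
            target = self._resolve(call["path"])
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(call["file_text"], encoding="utf-8")
            return f"created {call['path']}"
        if cmd == "str_replace":
            target = self._resolve(call["path"])
            text = target.read_text(encoding="utf-8")
            if text.count(call["old_str"]) != 1:
                return "error: old_str must match exactly once"
            target.write_text(text.replace(call["old_str"], call["new_str"]),
                              encoding="utf-8")
            return f"edited {call['path']}"
        return f"error: unsupported command {cmd}"
```

The traversal check matters more than it looks: the model chooses the paths, so the handler is the only place where writes outside the memory directory can be refused.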

---

1.2 The Hook Infrastructure (Memory-Relevant Hooks)

Claude Code's settings.json (at ~/.claude/settings.json, .claude/settings.json, or .claude/settings.local.json) supports a rich hook system. Memory-relevant hooks:

PreCompact — fires before context compaction. Can block compaction (exit code 2). Receives: session_id, transcript_path, cwd, source (manual|auto). Key use case: extract session state to files before the summary erases detail.

PostCompact — fires after compaction completes. Receives compact_summary. Cannot block. Key use case: append the compact summary to a session log, trigger memory consolidation.

SessionStart — fires at session start (startup, resume, clear, compact). Can inject additionalContext into the session via hookSpecificOutput. Key use case: inject fresh context from external sources (git log, calendar, recently modified files).

SessionEnd — fires at session end. Cannot block. Key use case: trigger end-of-session memory extraction.

InstructionsLoaded — fires when CLAUDE.md or rules files load. Provides file_path, memory_type, load_reason, globs. Key use case: audit which instructions load and when; debug path-scoped rule failures.

Stop — fires when Claude finishes responding. Can block stopping (keep conversation alive). Key use case: post-turn memory updates.

SubagentStop — fires when a subagent finishes. Key use case: extract subagent learnings into the parent memory.

Hook types supported:

command — shell command; receives JSON on stdin; responds with JSON on stdout
http — POST to local endpoint
mcp_tool — call an MCP server tool
prompt — LLM evaluation (costs tokens)
agent — spawns a subagent for verification

Environment variables available to hooks: $CLAUDE_PROJECT_DIR, ${CLAUDE_PLUGIN_ROOT}, ${CLAUDE_PLUGIN_DATA}, CLAUDE_ENV_FILE (for SessionStart/CwdChanged/FileChanged).
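As a concrete example of the command hook contract described above (JSON in on stdin, JSON out on stdout), here is a minimal SessionStart sketch. The function names are hypothetical and the context string is a placeholder; the `hookSpecificOutput` / `additionalContext` shape follows the fields named in this section.

```python
import json


def session_start_response(context: str) -> dict:
    """Build the stdout payload that adds context to a starting session."""
    return {
        "hookSpecificOutput": {
            "hookEventName": "SessionStart",
            "additionalContext": context,
        }
    }


def run_hook(stdin_text: str, context: str) -> str:
    """Parse the hook event and return the JSON string to print."""
    event = json.loads(stdin_text)
    # The event's "source" (startup, resume, clear, compact) is noted so the
    # injected text shows which kind of session start produced it.
    tagged = f"[{event.get('source', 'unknown')}] {context}"
    return json.dumps(session_start_response(tagged))
```

In a real hook script the last line would be `print(run_hook(sys.stdin.read(), ...))`, with the context built from whatever external source is being injected.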

---

1.3 The Memory Injection Sequence

Based on official docs and community leak analysis, the session startup sequence is:

1. System prompt (Anthropic-controlled)

2. --append-system-prompt contents (if set)

3. Managed policy CLAUDE.md (if exists at system path)

4. User CLAUDE.md (~/.claude/CLAUDE.md)

5. Project CLAUDE.md (directory walk from cwd upward)

6. CLAUDE.local.md (same walk)

7. .claude/rules/*.md (non-path-scoped, all projects)

8. Auto memory: first 200 lines of MEMORY.md (injected as user message)

9. SessionStart hook additionalContext output

10. User's first message

The key architectural insight: CLAUDE.md is injected as context (a user turn), not as a system prompt. This means it can be "crowded out" in very long conversations, and adherence degrades as context fills. This is a documented limitation: CLAUDE.md is guidance the model reads, not enforced configuration.

---

1.4 What Claude Code Does NOT Have

These are the verified gaps vs. what users might assume:

1. No semantic/vector retrieval. Memory is found only by exact keyword match or by Claude explicitly requesting a file by name. There is no embedding-based "find memories relevant to this query."

2. No fingerprinting or cross-session user modeling. Claude Code does not build a user profile embedding. The only user model is what ends up in MEMORY.md and CLAUDE.md — plain text, human- or Claude-written.

3. No automatic extraction at session end. Auto memory is written during the session when Claude decides something is worth saving. There is no guaranteed end-of-session extraction pass.

4. No memory aging or consolidation. A MEMORY.md written 6 months ago has the same weight as one written today. No decay function, no priority scoring.

5. No cross-project memory synthesis. Each project has its own isolated memory directory. User-level CLAUDE.md is the only cross-project persistent layer.

---

PART 2 — COMMUNITY-DOCUMENTED PATTERNS AND OPEN-SOURCE IMPLEMENTATIONS

2.1 claude-mem (thedotmack)

Repo: github.com/thedotmack/claude-mem

Architecture: Hook-based capture + AI compression + context injection

How it works:

Uses SessionEnd and PreCompact hooks to capture the full session transcript
Sends transcript to Claude via the Agent SDK for LLM-powered extraction ("extract decisions, lessons learned, patterns, gotchas")
Compresses into daily markdown logs
On SessionStart, scans log directory and injects relevant recent context

What it adds over native auto memory: Guaranteed extraction (every session, not just when Claude decides), structured compression rather than raw notes, and daily chronological organization.

Weakness: No semantic retrieval — still keyword/recency based.

2.2 claude-memory-compiler (coleam00)

Repo: github.com/coleam00/claude-memory-compiler

Inspiration: Karpathy's LLM knowledge base architecture

Architecture: Session capture → LLM extraction → structured article compilation

File structure:


knowledge-base/
├── index.md        ← Central retrieval index (read by Claude)
├── concepts/       ← Technical patterns and core ideas
├── connections/    ← Cross-referenced concept relationships
└── qa/             ← Question-answer pairs for direct retrieval

Key insight, attributed to Karpathy: "At personal scale (50-500 articles), the LLM reading a structured index.md outperforms vector similarity." The reasoning is that an LLM can infer semantic intent from a well-structured index, while vector search can return false positives based on surface-level token similarity. This supports TITAN's current markdown-first approach and suggests that the design of the index matters a great deal.

Automation: End-of-day compilation fires after 6 PM via scheduled task. Manual compilation also available.

2.3 Mem0 (mem0ai)

Repo: github.com/mem0ai/mem0 — 48K+ GitHub stars

Paper: arxiv.org/abs/2504.19413 (April 2025)

Architecture: Three-storage hybrid

Vector database: semantic similarity search over extracted facts
Graph database: entity-relationship triplets (nodes = entities with type/embedding/timestamp; edges = labeled relationships like lives_in, prefers)
Key-value store: fast fact retrieval

LLM extraction pipeline (two phases):

1. Extraction: Message pair (m_{t-1}, m_t) + conversation summary → LLM φ → candidate memories Ω

2. Update: For each candidate, retrieve top-s semantically similar existing memories → LLM determines: ADD, UPDATE, DELETE, or NOOP
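The update phase can be sketched as a pure function that applies the LLM's per-candidate decisions to a memory store. This is not Mem0's code: the decision dict shape (`op`, `id`, `text`) and the function name are assumptions for illustration, and the LLM call that produces the decisions is left out.

```python
def apply_decisions(store: dict[str, str], decisions: list[dict]) -> dict[str, str]:
    """Apply ADD / UPDATE / DELETE / NOOP decisions and return a new store."""
    updated = dict(store)  # never mutate the caller's store
    for decision in decisions:
        op, mem_id = decision["op"], decision.get("id")
        if op == "ADD":
            if mem_id in updated:
                raise ValueError(f"ADD for existing memory: {mem_id}")
            updated[mem_id] = decision["text"]
        elif op == "UPDATE":
            if mem_id not in updated:
                raise ValueError(f"UPDATE for unknown memory: {mem_id}")
            updated[mem_id] = decision["text"]
        elif op == "DELETE":
            updated.pop(mem_id, None)
        elif op != "NOOP":
            raise ValueError(f"unknown operation: {op}")
    return updated
```

Keeping the application step deterministic means a bad extraction run can be replayed or rolled back from the logged decisions.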

Performance benchmarks (LOCOMO benchmark, 2025):

26% relative improvement in LLM-as-a-Judge vs. OpenAI's approach
91% lower p95 latency vs. full-context methods
90%+ token cost savings vs. processing full conversation history
Average memory footprint: ~7K tokens (base) vs. ~14K tokens (graph variant) vs. 26K tokens (full context)

OpenMemory MCP: Mem0 offers an MCP server (openmemory) that integrates with Claude Code, Cursor, VS Code — giving any MCP-compatible agent semantic memory.

Key weakness for TITAN: Graph memory is paywalled at $249/month in managed tier. Open-source version supports vector + key-value; graph requires self-hosting.

2.4 Letta (formerly MemGPT)

Repo: github.com/letta-ai/letta

Architecture: OS-inspired three-tier model

Core memory (RAM): Always in-context; the agent's working beliefs about the user and world; actively edited by the agent using explicit tool calls
Archival memory (disk): External searchable vector store; unlimited size; retrieved on demand
Recall memory (conversation history): Searchable past conversation log

Key differentiator: Agents are active memory curators — they explicitly decide what stays in core (in-context), what gets archived (to vector store), and what gets deleted. This is the "OS as metaphor" — the agent manages its own memory hierarchy the way an OS manages RAM and disk.

Critical difference from TITAN's current model: TITAN currently uses a human-curated CLAUDE.md + Claude-written MEMORY.md model. Letta is a full agent runtime where the agent self-manages all three tiers. Adopting Letta = adopting its full runtime, not just a memory component.

2.5 Zep / Graphiti

Architecture: Temporal knowledge graph

Key innovation: Temporal validity windows — facts are stored with "valid_from" and "valid_until" metadata. When a fact is superseded (user moves cities, changes jobs), Zep models the supersession rather than overwriting.

Benchmark: 63.8% on LongMemEval vs. Mem0's 49.0%, a gap of roughly 15 points. This advantage comes specifically from temporal fact tracking, not general intelligence.

TITAN relevance: For a personal OS accumulating user facts over months and years, temporal modeling is critical. "Harnoor is targeting a data engineering role" should eventually transition to "Harnoor got a senior data engineering role" — and TITAN should model that transition, not silently overwrite it.
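A minimal sketch of validity windows, using the role example above. This is not Zep's or Graphiti's data model: the `Fact` fields other than valid_from and valid_until, the helper names, and the dates in the usage note are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Fact:
    subject: str
    predicate: str
    value: str
    valid_from: date
    valid_until: Optional[date] = None  # None means "still current"


def supersede(facts: list[Fact], new: Fact) -> None:
    """Close the open window of any matching fact, then record the new one."""
    for fact in facts:
        same_slot = (fact.subject, fact.predicate) == (new.subject, new.predicate)
        if same_slot and fact.valid_until is None:
            fact.valid_until = new.valid_from
    facts.append(new)


def value_at(facts: list[Fact], subject: str, predicate: str, when: date) -> Optional[str]:
    """Return the value that was valid on the given date, if any."""
    for fact in facts:
        if (fact.subject, fact.predicate) != (subject, predicate):
            continue
        if fact.valid_from <= when and (fact.valid_until is None or when < fact.valid_until):
            return fact.value
    return None
```

Recording "targeting a data engineering role" and later "got a senior data engineering role" leaves both facts in the list, so a query for an earlier date still returns the earlier state.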

2.6 Cognee

Architecture: Poly-store (graph + vector + relational)

Key strength: "6 lines of code to start," fully offline via Ollama, privacy-first

Use case fit: Local-first personal OS where data sovereignty is non-negotiable

TITAN relevance: Because TITAN runs on Harnoor's machine (F:/TITAN), Cognee's local-first architecture is directly compatible.

---

PART 3 — TITAN'S CURRENT MEMORY ARCHITECTURE VS. THE GAP

3.1 What TITAN Currently Has (Strengths)

1. CLAUDE.md at user scope (~/.claude/CLAUDE.md) — TITAN's global operating brain. This is the highest-quality persistent instruction layer and already better than most users' setups.

2. Per-agent memory directories (~/.claude/agent-memory/scout/, etc.) — Each sub-agent writes its own memory files. SCOUT uses MEMORY.md as an index with typed memory files (user, feedback, project, reference). This directly mirrors Claude Code's MEMORY.md → topic files architecture.

3. Typed memory schema — TITAN distinguishes user memories, feedback memories, project memories, and reference memories. Claude Code's auto memory does not have this taxonomy — it's flat.

4. Skills system (~/.claude/skills/) — Task-specific instruction packages that load on demand. This maps to Claude Code's .claude/rules/ with path-scoped rules, but TITAN's implementation is arguably more powerful (full skill files with frontmatter, matcher patterns, async execution).

5. Hooks infrastructure — TITAN uses ~/.claude/settings.json hooks. PreCompact and PostCompact are available. The infrastructure exists but is underutilized for memory-specific work.

3.2 The Gap (What TITAN Lacks)

| Capability | Claude Code Has | TITAN Has | Gap |
|------------|-----------------|-----------|-----|
| Semantic/vector retrieval | No | No | Both lack this |
| Personal style fingerprint | No | No | Major gap |
| Memory aging / decay | No | No | Both lack this |
| Temporal fact validity windows | No | No | Both lack this |
| Cross-session continuity score | No | No | Both lack this |

Net assessment: TITAN's memory architecture is broadly equivalent to Claude Code's baseline. The innovations needed are the same innovations the open-source community is building on top of Claude Code — and TITAN can implement them first.

---

PART 4 — THE FIVE UPGRADES: IMPLEMENTATION PLAN

UPGRADE 1: Embedding-Based Semantic Memory Retrieval (Vector DB on TITAN's memo corpus)

The problem: TITAN's memory is found only by filename or explicit reference. When FORGE is working on a new feature, it cannot ask "what do we know about similar past decisions?" and get a ranked list of relevant memories. The MEMORY.md index is read in full, not queried semantically.

The solution: Build a vector index over all TITAN memory files. On each agent session start, run a semantic query against the index to surface the top-K most relevant memories given the session's initial context.

Architecture:


Input: Harnoor's initial prompt for a session
       ↓
SCOUT/ORACLE: Embed the prompt using OpenAI text-embedding-3-small (1536 dims)
       ↓
ChromaDB (local, F:/TITAN/vector-db/): Query for top-5 most similar memory chunks
       ↓
SessionStart hook: Inject top-5 results as additionalContext in hookSpecificOutput
       ↓
Agent session begins with relevant memory pre-surfaced
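The retrieval step in the flow above reduces to a top-K cosine similarity search. The sketch below uses plain Python lists as stand-ins for real embeddings and an in-memory dict in place of ChromaDB, so it shows the ranking logic only; `cosine` and `top_k` are hypothetical helper names.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec: list[float],
          index: dict[str, list[float]],
          k: int = 5) -> list[tuple[str, float]]:
    """Rank memory chunks by similarity to the query and keep the best k."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```

A brute-force scan like this is adequate at the scale of hundreds of memory files; a vector database mainly adds persistence and faster lookup as the corpus grows.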

Files and storage:

Vector DB: F:/TITAN/vector-db/ (ChromaDB local persistence)
Embedding script: F:/TITAN/scripts/memory_embedder.py
Hook: F:/TITAN/hooks/session_start_semantic_inject.py (reads from vector DB, writes to stdout as JSON)
Index rebuild script: F:/TITAN/scripts/rebuild_vector_index.py (full rebuild from all ~/.claude/agent-memory/*/)

Cron schedule: rebuild_vector_index.py runs daily at 03:00 (after /dream consolidation at 02:00). Incremental upserts on every PostToolUse where a memory file is written.

When Harnoor sees value: Within 2 weeks of deployment, agent sessions begin with an "I found 3 relevant past memories" injection. Over 30 days, the target is a 40%+ reduction in context load, because agents spend less time re-establishing context; this figure is a projection and should be measured once the hook is live.

Implementation cost: 8 hours (ChromaDB setup: 2h, embedding script: 2h, SessionStart hook: 2h, testing: 2h)

Sub-agent owner: FORGE builds it; DARWIN audits it quarterly.

ChromaDB vs. Qdrant decision: ChromaDB for initial implementation (Python-native, local-first, zero config). Migrate to Qdrant if the corpus exceeds 50K chunks or if latency becomes noticeable. At TITAN's current scale (hundreds of memory files), ChromaDB is the right call — Karpathy's finding holds: the LLM reading a structured index often outperforms raw vector search, so this is a hybrid approach where vector search surfaces candidates and the LLM still reasons over the final context.

---

UPGRADE 2: Auto-Summarization of Long Sessions into Long-Term Memory

The problem: Sessions that run long (1h+) produce valuable insights that disappear on compaction. The current system relies on Claude deciding in-session what to save, which is inconsistent and biased toward recent context.

The solution: A PostCompact hook that fires after every compaction, extracts the compact_summary field, and runs it through an LLM extraction pipeline to write structured memory updates.

Architecture:


Trigger: PostCompact hook fires
  ↓
Hook reads compact_summary from stdin (JSON)
  ↓
F:/TITAN/scripts/session_extractor.py called:
  - Calls Claude API with extraction prompt:
    "From this session summary, extract:
     1. User preferences or corrections (→ feedback memory)
     2. Project decisions made (→ project memory)
     3. New facts about user's role/goals (→ user memory)
     4. External resources referenced (→ reference memory)
    Output as JSON with type, name, content fields."
  ↓
For each extracted item:
  - Check if a matching memory file exists (exact name match)
  - If yes: UPDATE the file (str_replace relevant section)
  - If no: CREATE new memory file with correct frontmatter
  - Update MEMORY.md index
  ↓
Done. Logging to F:/TITAN/logs/memory_extractions.jsonl
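The "for each extracted item" step can be sketched as below, with the Claude API call left out. Several details are assumptions rather than TITAN's actual conventions: the `<type>_<name>.md` file naming, the frontmatter fields, and appending to an existing file instead of the targeted str_replace the flow describes. Item names are assumed to be safe filename stems.

```python
from pathlib import Path

VALID_TYPES = {"feedback", "project", "user", "reference"}


def apply_extraction(items: list[dict], memory_dir: Path) -> list[str]:
    """Create or update one memory file per item and keep MEMORY.md in sync."""
    memory_dir.mkdir(parents=True, exist_ok=True)
    index = memory_dir / "MEMORY.md"
    index_text = index.read_text(encoding="utf-8") if index.exists() else ""
    actions = []
    for item in items:
        if item.get("type") not in VALID_TYPES:
            actions.append(f"skipped {item.get('name')}")
            continue
        target = memory_dir / f"{item['type']}_{item['name']}.md"
        if target.exists():
            # Simplification: append rather than rewrite the relevant section.
            with target.open("a", encoding="utf-8") as handle:
                handle.write(f"\n{item['content']}\n")
            actions.append(f"updated {target.name}")
        else:
            frontmatter = f"---\ntype: {item['type']}\nname: {item['name']}\n---\n"
            target.write_text(frontmatter + item["content"] + "\n", encoding="utf-8")
            actions.append(f"created {target.name}")
        entry = f"- [{item['name']}]({target.name})"
        if entry not in index_text:
            index_text += entry + "\n"
    index.write_text(index_text, encoding="utf-8")
    return actions
```

Returning the list of actions gives the caller something concrete to write to memory_extractions.jsonl.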

Hook config in ~/.claude/settings.json:


{
  "hooks": {
    "PostCompact": [
      {
        "matcher": "auto",
        "hooks": [
          {
            "type": "command",
            "command": "python F:/TITAN/scripts/session_extractor.py",
            "async": true
          }
        ]
      }
    ]
  }
}

Files:

Script: F:/TITAN/scripts/session_extractor.py
Log: F:/TITAN/logs/memory_extractions.jsonl
Extraction prompt template: F:/TITAN/prompts/memory_extraction.txt

When Harnoor sees value: After 2-3 week ramp-up, VAULT starts reporting "extracted 4 memories from today's sessions" in the daily digest. Over 60 days, the agent-memory directories double in richness without Harnoor writing a single memory manually.

Implementation cost: 6 hours (PostCompact hook: 1h, extraction script: 3h, testing: 2h)

Sub-agent owner: VAULT writes the extraction prompt; FORGE implements the script; DARWIN monitors extraction quality monthly.

---

UPGRADE 3: Personal Style Fingerprint

The problem: TITAN has no persistent model of Harnoor's writing voice, preferred frameworks, recurring themes, decision-making patterns, or communication style. This forces every new agent session to re-infer style from context — expensive and inconsistent.

The solution: A dedicated style_fingerprint.md memory file maintained by VAULT, with structured fields for voice, framework preferences, recurring themes, and decision heuristics. Updated monthly by a /dream-like consolidation pass.

Architecture — what gets captured:


# F:/TITAN/knowledge/memory/hot/style_fingerprint.md

## Writing Voice
- Terse over verbose. Bullet points over paragraphs.
- Technical precision preferred; no hedging language.
- Emoji: never, unless explicitly requested.
- Sentence completion style: declarative, not interrogative.

## Framework Preferences
- Data stack: Kafka, Iceberg, Flink, Trino, dbt, Spark, DuckDB, Polars
- AI stack: Claude API, Claude Code, ChromaDB, Python
- Frontend: prefers minimal (no React framework when raw HTML works)
- Infrastructure: Windows 11 native + WSL2 + F:/TITAN as workspace

## Recurring Themes
- Sovereignty (data, mental, financial)
- Felt intelligence (AI that feels alive vs. AI that feels mechanical)
- Speed of execution ("move fast" cadence)
- Asymmetric leverage (10x moves over incremental ones)

## Decision Heuristics
- Prefers bundled PRs over many small ones for refactors
- Uses absolute file paths always (Windows + Unix hybrid environment)
- Avoids creating documentation unless explicitly requested
- Plan-first for 3+ file changes; verify with evidence not claims

## Communication Calibration
- Lead with the answer; reasoning only if asked
- Short sentences beat three hedged ones
- When unsure: say so in one line + offer 2-3 options
- No trailing summaries after completed work

Update mechanism: The /dream skill (scheduled at 02:00) includes a step to:

1. Read the last 30 days of session transcripts (the per-session JSONL files under ~/.claude/projects/<encoded-project-path>/)

2. Ask Claude: "Does any of Harnoor's recent behavior update the style fingerprint?"

3. If yes: propose updates to style_fingerprint.md; apply after a 24h review window

Injection mechanism: SessionStart hook reads style_fingerprint.md and includes it in additionalContext. Since it's a structured file (not MEMORY.md), it does not count against the 200-line cap.

When Harnoor sees value: Immediately on launch — agents start sessions with a pre-loaded voice model, reducing the "ramp up" period from ~3 exchanges to 0. The compounding value is that corrections converge faster and diverge less over time.

Implementation cost: 4 hours (initial fingerprint writing: 1h, dream extension: 2h, SessionStart injection: 1h)

Sub-agent owner: VAULT owns the file; DARWIN audits drift quarterly.

---

UPGRADE 4: Cross-Session Continuity via PreCompact + PostCompact Hooks

The problem: When a long session gets compacted, Claude Code generates a summary. That summary is useful but not structured — it's prose, not typed memories. Critical in-progress state (what was being decided, what was blocked, what was the next planned step) can be lost or diluted.

The solution: A PreCompact hook that writes a structured "state snapshot" to disk before compaction, and a PostCompact hook that reads the compact_summary and reconciles it with the snapshot.

Architecture:

PreCompact hook:


# F:/TITAN/hooks/pre_compact_snapshot.py
# Receives JSON on stdin with transcript_path
# Reads the LAST 50 messages of the transcript (the "hot zone")
# Asks Claude to extract:
#   - current_task: what are we in the middle of?
#   - blockers: what was blocked or unresolved?
#   - next_steps: what was the agreed next action?
#   - decisions_made: key choices made in this session
# Writes to: F:/TITAN/state/session_snapshots/<session_id>.md
# Exits 0 (allows compaction to proceed)
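The two mechanical parts of the PreCompact outline, reading the transcript's "hot zone" and writing the snapshot file, can be sketched as follows. The LLM extraction step in between is omitted, the function names are hypothetical, and the transcript is assumed to be JSONL with one record per line.

```python
import json
from collections import deque
from pathlib import Path

SNAPSHOT_FIELDS = ("current_task", "blockers", "next_steps", "decisions_made")


def last_messages(transcript_path: Path, limit: int = 50) -> list[dict]:
    """Return the last `limit` parseable records of a JSONL transcript."""
    tail = deque(maxlen=limit)
    with transcript_path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            try:
                tail.append(json.loads(line))
            except json.JSONDecodeError:
                continue  # tolerate a partially written line
    return list(tail)


def write_snapshot(session_id: str, fields: dict, out_dir: Path) -> Path:
    """Write the structured state snapshot as <session_id>.md."""
    out_dir.mkdir(parents=True, exist_ok=True)
    lines = [f"# Session snapshot {session_id}", ""]
    for key in SNAPSHOT_FIELDS:
        lines += [f"## {key}", str(fields.get(key, "(not extracted)")), ""]
    path = out_dir / f"{session_id}.md"
    path.write_text("\n".join(lines), encoding="utf-8")
    return path
```

Missing fields are written as "(not extracted)" rather than dropped, so the PostCompact reconciliation can tell an empty answer from a failed extraction.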

PostCompact hook:


# F:/TITAN/hooks/post_compact_reconcile.py
# Reads compact_summary from stdin
# Reads the corresponding session snapshot
# Merges: compact_summary (prose) + snapshot (structured)
# Updates MEMORY.md for the relevant project with the reconciled state
# Appends to F:/TITAN/logs/compaction_log.jsonl

What survives compaction after this upgrade:

The compact_summary prose (always survived)
A structured current_task, blockers, next_steps snapshot
Reconciled memory updates written to agent MEMORY.md files

When Harnoor sees value: Sessions that span multiple days (e.g., a multi-session FORGE implementation task) resume seamlessly. The agent begins by reading the snapshot and knows exactly where work left off — without re-establishing context through conversation.

Implementation cost: 10 hours (PreCompact hook: 3h, PostCompact reconciliation: 4h, snapshot schema design: 1h, testing: 2h)

Sub-agent owner: FORGE implements; ORACLE monitors snapshot quality bi-weekly.

Files:

F:/TITAN/hooks/pre_compact_snapshot.py
F:/TITAN/hooks/post_compact_reconcile.py
F:/TITAN/state/session_snapshots/ (directory, rotated monthly)
F:/TITAN/logs/compaction_log.jsonl

---

UPGRADE 5: Memory Consolidation Cron — Extending /dream

The problem: The existing /dream skill does memory hygiene but does not implement: (1) aging and demotion of stale memories, (2) contradiction detection across memory files, (3) cluster synthesis (if 10 memories all say Harnoor prefers DuckDB over Pandas, consolidate them into one authoritative preference), or (4) cross-agent memory synthesis (FORGE's memories and SCOUT's memories may contain complementary facts that neither agent sees).

The solution: Extend /dream with four new phases.

Extended /dream pipeline (runs daily at 02:00):


Phase 0 [existing]: Read all agent MEMORY.md files and flag stale/outdated entries

Phase 1 [NEW — Aging]:
  For each memory file, check last-modified date.
  If a memory has not been accessed or modified in 90+ days:
    - Move to F:/TITAN/knowledge/memory/cold/<agent>/<file>
    - Remove from MEMORY.md index
    - Log to F:/TITAN/logs/dream_aging.jsonl
  If 30-89 days: demote to "warm" section of MEMORY.md (lower injection priority)

Phase 2 [NEW — Contradiction Detection]:
  Embed all active memory facts (using text-embedding-3-small)
  Find pairs with cosine similarity > 0.85 but opposing sentiment
  Flag as potential contradictions to F:/TITAN/logs/contradictions.md
  Send Harnoor a daily digest if contradictions > 0

Phase 3 [NEW — Cluster Synthesis]:
  Group memories by semantic cluster (k-means, k=10-20)
  For each cluster with > 3 members: ask Claude to synthesize into 1 authoritative memory
  Replace cluster members with synthesized memory
  Log changes to F:/TITAN/logs/dream_synthesis.jsonl

Phase 4 [NEW — Cross-Agent Synthesis]:
  Load memories from all 6 agent directories
  Identify facts that appear in multiple agents (high-similarity matches)
  Promote to shared "cross-agent" memory at F:/TITAN/knowledge/memory/hot/cross_agent.md
  Each agent's MEMORY.md index gains an @import pointer to cross_agent.md

Phase 5 [existing]: Rebuild the vector index (triggers Upgrade 1's ChromaDB rebuild)
Phase 6 [existing]: Write the /dream summary to F:/TITAN/logs/dream_<date>.md
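Phase 1's thresholds can be expressed as a small classification function, sketched here under the assumption that "last touched" is a single timestamp per file (how access time versus modification time is tracked is left open). `age_tier` and `plan_aging` are hypothetical names.

```python
from datetime import datetime

COLD_AFTER_DAYS = 90   # moved to cold storage and removed from the index
WARM_AFTER_DAYS = 30   # demoted to the "warm" section of MEMORY.md


def age_tier(last_touched: datetime, now: datetime) -> str:
    """Classify a memory file as hot, warm, or cold by days since last touch."""
    days = (now - last_touched).days
    if days >= COLD_AFTER_DAYS:
        return "cold"
    if days >= WARM_AFTER_DAYS:
        return "warm"
    return "hot"


def plan_aging(files: dict[str, datetime], now: datetime) -> dict[str, list[str]]:
    """Group file names by tier without moving anything (a dry-run plan)."""
    plan = {"hot": [], "warm": [], "cold": []}
    for name, touched in sorted(files.items()):
        plan[age_tier(touched, now)].append(name)
    return plan
```

Producing a plan first and applying it in a separate step makes it easy to log the intended moves to dream_aging.jsonl and review them before any file is relocated.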

Cron configuration (illustrative: scheduledTasks is a TITAN-side convention, not a confirmed Claude Code settings.json key; on Windows the same schedule can be registered in Task Scheduler instead):


{
  "scheduledTasks": [
    {
      "name": "dream-consolidation",
      "schedule": "0 2 * * *",
      "command": "python F:/TITAN/scripts/dream.py --full"
    }
  ]
}

Files:

Extended script: F:/TITAN/scripts/dream.py (extend existing)
Aging log: F:/TITAN/logs/dream_aging.jsonl
Contradiction log: F:/TITAN/logs/contradictions.md
Synthesis log: F:/TITAN/logs/dream_synthesis.jsonl
Cross-agent hot memory: F:/TITAN/knowledge/memory/hot/cross_agent.md
Cold storage: F:/TITAN/knowledge/memory/cold/

When Harnoor sees value: Month 1: MEMORY.md files stay lean (stale items removed). Month 2: contradiction flags catch belief drift before it causes hallucination. Month 3: cluster synthesis means agents start with 20 high-confidence memories instead of 100 noisy ones. Month 4+: cross-agent synthesis means SCOUT's research context automatically informs FORGE's implementation decisions.

Implementation cost: 20 hours (aging: 3h, contradiction detection: 4h, cluster synthesis: 6h, cross-agent: 5h, testing/tuning: 2h)

Sub-agent owner: DARWIN designs and monitors; VAULT executes the nightly run; SCOUT handles the semantic embedding quality checks.

---

PART 5 — IMPLEMENTATION PRIORITY AND TIMELINE

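| Upgrade | Cost (hours) | Builder | Monitor |
|---------|--------------|---------|---------|
| 1. Semantic memory retrieval | 8 | FORGE | DARWIN (quarterly) |
| 2. Session auto-summarization | 6 | FORGE (VAULT writes the prompt) | DARWIN (monthly) |
| 3. Personal style fingerprint | 4 | VAULT | DARWIN (quarterly) |
| 4. Cross-session continuity | 10 | FORGE | ORACLE (bi-weekly) |
| 5. /dream consolidation | 20 | DARWIN (design), VAULT (nightly run) | SCOUT (embedding quality) |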

Total implementation cost: ~48 hours

Net value: TITAN's memory system would match or exceed Claude Code's architecture in all five dimensions and would surpass Mem0 in the specific dimensions that matter for a personal OS (style fingerprint, cross-agent synthesis, temporal ordering).

---

PART 6 — CONTRADICTIONS AND UNCERTAINTIES

1. The tengu_coral_fern feature flag: This controls automatic loading of topic files. It is currently disabled by default per the experimental memory article (giuseppegurgone.com, 2025). The official docs now describe on-demand loading as the standard behavior — it is unclear whether the flag was removed, renamed, or is still silently controlling behavior. Recommendation: Test empirically — try having Claude reference a topic file and observe whether it loads automatically or requires explicit Read tool call.

2. Karpathy's "LLM index > vector search" finding: This was documented for 50-500 article corpora. TITAN's memory corpus is smaller today but will grow. The crossover point (where vector search wins) is not precisely known. Recommendation: Start with index-based retrieval (Upgrade 3/5) and add vector search (Upgrade 1) once corpus exceeds 200 memory files.

3. PostCompact receives compact_summary in JSON: The hook reference confirms this. However, the format of compact_summary is not fully documented — it appears to be a prose string, not structured JSON. Recommendation: Log the first 20 PostCompact events to understand the actual format before building Upgrade 2/4 around it.

4. autoMemoryDirectory bug (referenced in scout memory): A prior SCOUT investigation noted a bug in autoMemoryDirectory setting. Verify current behavior on Claude Code v2.2+ before relying on this for path customization.

---