TASK-CONTINUITY-RESEARCH — Why TITAN Keeps Forgetting Tasks

Date: 2026-05-13

Author: SCOUT

Audience: Harnoor / FORGE

Word count: ~1,900

---

1. Why Agentic Systems Lose Tasks — The Failure Mode

TITAN's task-continuity problem is not a TITAN-specific bug. It is a common failure pattern in LLM-based agentic systems that mix in-context state with file-based persistence. Five root causes compound here:

1a. Context window pressure and compaction. Claude Code fires its compaction pass at approximately 95% context fill. The PostToolUse hook captures TodoWrite snapshots, but at compaction time the in-memory todo list (8 tasks as of this morning's latest snapshot) is a fraction of the 130+ tasks spread across per-project TASKS.md files. The compaction summarizer has no reason to enumerate file-based tasks it has not yet read. Whatever was not loaded into the active context before the 95% threshold is not in the summary.

1b. The two-source drift problem. TITAN maintains two parallel records: TodoWrite (in-session, persisted by the PostToolUse hook to titan-tasks-latest.json) and per-project TASKS.md files (aggregated hourly by the dashboard script). These two sources go out of sync because TodoWrite only captures tasks Claude added in the current session. Work agreed verbally in conversation, tasks deferred with "add this to the queue," and sub-tasks from multi-step plans never touch TodoWrite unless Claude explicitly calls it. The TASKS.md files are authoritative, but they are read lazily — only if Claude opens them.

1c. Session resume amnesia. Claude Code does not automatically re-read titan-tasks-latest.json at session start unless the CLAUDE.md or a SessionStart hook instructs it to. The current CLAUDE.md points to F:/TITAN/state/titan-tasks-latest.json as the persistent task layer, but that file holds only the last TodoWrite snapshot — 8 tasks, not 130. The session effectively starts with no knowledge of the backlog.

1d. Multi-agent fan-out loses sub-task state. When TITAN spawns SCOUT, FORGE, or ORACLE as sub-agents, those agents produce deliverables but do not write back to the canonical TASKS.md. Their in-progress sub-tasks exist only in their own sub-session context. If the parent session compacts, the parent forgets what sub-agents were asked to do.

1e. No session-end reconciliation. There is no mechanism today that asks "what did we agree on this session that is not yet in TASKS.md?" before the session closes or compacts. The PreCompact hook (precompact-snapshots/) captures the ask ledger and recent decisions well, but it does not diff against TASKS.md to find the gap.

---

2. 2025–2026 Best-Practice Patterns (with Citations)

LangGraph checkpointing (2025). LangGraph's checkpointer persists the graph state at every step, so a node failure or session end leaves a replayable checkpoint. Checkpointers cover in-thread state and stores cover cross-thread memory. MemorySaver and InMemoryStore are the in-memory variants, suited to development; production deployments use a durable backend such as PostgresSaver. The key insight is that state is typed, versioned, and external from the start, not retrofitted. (Source: LangGraph production docs, verified May 2026 via Perplexity sonar-pro.)
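
The checkpoint-per-step idea can be illustrated without LangGraph itself. The following is a minimal standard-library sketch, not LangGraph's API: JsonCheckpointer, run_steps, and the one-JSON-file-per-thread layout are hypothetical names and choices made for illustration.

```python
import json
from pathlib import Path
from typing import Callable

Step = Callable[[dict], dict]


class JsonCheckpointer:
    """Hypothetical file-backed checkpoint store: one JSON file per thread."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, thread_id: str) -> Path:
        return self.root / f"{thread_id}.json"

    def save(self, thread_id: str, step: int, state: dict) -> None:
        # Write to a temp file and rename, so a crash never leaves a half-written checkpoint.
        tmp = self._path(thread_id).with_suffix(".tmp")
        tmp.write_text(json.dumps({"step": step, "state": state}), encoding="utf-8")
        tmp.replace(self._path(thread_id))

    def load(self, thread_id: str) -> tuple[int, dict]:
        path = self._path(thread_id)
        if not path.exists():
            return 0, {}
        data = json.loads(path.read_text(encoding="utf-8"))
        return data["step"], data["state"]


def run_steps(cp: JsonCheckpointer, thread_id: str, steps: list[Step]) -> dict:
    """Resume from the last checkpoint and persist state after every step."""
    done, state = cp.load(thread_id)
    for index in range(done, len(steps)):
        state = steps[index](state)
        cp.save(thread_id, index + 1, state)
    return state
```

Calling run_steps a second time with the same thread_id skips the steps already recorded, which is the property that makes a session end or crash recoverable.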

Anthropic Managed Agents Memory Stores (beta, 2026-04-01). Anthropic shipped a Memory Stores API as part of the Managed Agents beta. Memory stores are versioned, file-like paths that survive session teardown. Sessions attach stores at creation time and load them automatically. Content-addressed SHA-256 IDs prevent race-condition overwrites. The dreaming API allows offline memory refinement: a "dream" job reads up to 100 prior sessions and a source store, then writes a new deduplicated store. This is exactly the architecture TITAN needs for cross-session task state. (Source: platform.claude.com/docs/en/managed-agents/memory, confirmed 2026-05-13 via Perplexity sonar-pro.)
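
To make the overwrite-protection idea concrete, here is a generic sketch of a content-hash guard on writes. It is not the Memory Stores API: VersionedStore, StaleWriteError, and content_id are hypothetical names, and the store is an in-memory dict standing in for whatever backend is actually used.

```python
import hashlib


class StaleWriteError(Exception):
    """Raised when the caller's view of the content is out of date."""


def content_id(text: str) -> str:
    """SHA-256 hex digest of the content, used as its version identifier."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class VersionedStore:
    """Hypothetical in-memory store keyed by path, guarded by content hashes."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def read(self, path: str) -> tuple[str, str]:
        """Return the content and the hash the caller must present to overwrite it."""
        text = self._data.get(path, "")
        return text, content_id(text)

    def write(self, path: str, new_text: str, expected_id: str) -> str:
        # Reject the write if the content changed since the caller read it.
        _, current_id = self.read(path)
        if current_id != expected_id:
            raise StaleWriteError(f"{path} changed since it was read")
        self._data[path] = new_text
        return content_id(new_text)
```

Two writers that both read the same version cannot silently clobber each other: the second write fails and must re-read before retrying.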

Claude Code PreCompact hook (v2.1.118+). The hook receives the full transcript before compaction. Exit code 2 or {"decision": "block"} vetoes compaction entirely until the hook exits cleanly. The current TITAN PreCompact script saves a snapshot to F:/TITAN/state/precompact-snapshots/ but does not read TASKS.md or diff for missing tasks before saving. This is an incomplete use of a powerful primitive. (Source: TITAN internal reference agent-memory/scout/reference_claude_code_internals.md, April 2026.)

Cursor / Cline Memory Bank pattern (2025). The Memory Bank pattern, popularized in Cursor and Cline communities, requires an agent to maintain a set of structured markdown files (projectbrief.md, progress.md, activeContext.md) and to re-read ALL of them at the start of every session before doing anything else. The critical rule: if memory files do not exist, do not proceed. This forces explicit state loading rather than relying on context carryover. (Source: Cursor/Cline community, Perplexity sonar, 2026.)
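
The "do not proceed" rule is easy to express as a loader that fails loudly. A minimal sketch, assuming the three file names listed above live in a single directory; load_memory_bank and MemoryBankMissing are hypothetical names.

```python
from pathlib import Path

REQUIRED_FILES = ("projectbrief.md", "progress.md", "activeContext.md")


class MemoryBankMissing(RuntimeError):
    """Raised when one or more required memory files are absent."""


def load_memory_bank(root: Path, required: tuple[str, ...] = REQUIRED_FILES) -> dict[str, str]:
    """Read every required file, or refuse to continue if any is missing."""
    missing = [name for name in required if not (root / name).is_file()]
    if missing:
        raise MemoryBankMissing("missing memory files: " + ", ".join(missing))
    return {name: (root / name).read_text(encoding="utf-8") for name in required}
```

Raising instead of returning a partial result is the point of the pattern: the agent cannot start working from an incomplete picture.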

Broad-recall retrieval over narrow precision (2026 research). An arXiv study (arXiv:2604.22085; the identifier indicates an April 2026 submission) found that expanding retrieval from 10 to 100 chunks yields a +28.4 percentage point improvement in long-horizon task accuracy. The LLM filters noise in-context more reliably than constrained vector search filters it at retrieval time. Applied to TITAN: loading all TASKS.md content at session start is cheaper and more reliable than smart-selecting "relevant" tasks.

Zep / Graphiti temporal graph memory (2025–2026). Zep uses a temporal knowledge graph to store episodic memory with timestamps and decay signals. On LongMemEval, Zep scored 63.8% vs Mem0's 49.0% on temporal/episodic tasks. The temporal edge makes it stronger for "what did we agree three sessions ago?" queries. Graphiti (Zep's open-source graph layer) can be self-hosted. (Source: atlan.com/know/best-ai-agent-memory-frameworks-2026, arXiv:2605.11032.)

Mem0 (mem-os) selective fact extraction (2025). Mem0 extracts structured facts from conversation, deduplicates, and injects only relevant facts at the start of the next session. Reports ~80% token reduction vs naive history replay and a 26% accuracy boost. Weaker than Zep on temporal reasoning but faster and simpler to integrate. (Source: vectorize.io/articles/best-ai-agent-memory-systems, 2025–2026.)

---

3. The 5 Concrete Patches TITAN Should Ship This Week

Patch 1 — SessionStart hook that loads the full task backlog

What: A SessionStart hook that reads F:/TITAN/state/TASKS.md (the aggregated PENDING.md) and injects it as additionalContext into Claude's context at session open.
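How: Create F:/TITAN/hooks/session_start_load_tasks.py and register it under a SessionStart entry in the hooks block of ~/.claude/settings.json. The script reads TASKS.md (or the top 150 lines of PENDING.md) and prints JSON to stdout with the content under additionalContext. The Claude Code hooks reference nests that key as {"hookSpecificOutput": {"hookEventName": "SessionStart", "additionalContext": "<content>"}}; plain-text stdout from a SessionStart hook is also added to context.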
Cost: ~3K tokens per session start. At current session frequency, negligible.
Expected leak reduction: Closes the "session resume amnesia" gap entirely. Every session starts with the full backlog visible.
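
A minimal sketch of what session_start_load_tasks.py could contain, assuming the backlog is plain UTF-8 text. The function name build_session_start_payload and the truncation notice are illustrative, not taken from TITAN's codebase. The payload uses the hookSpecificOutput form from the Claude Code hooks reference, which should be checked against the installed version.

```python
import json
from pathlib import Path


def build_session_start_payload(tasks_path: Path, max_lines: int = 150) -> str:
    """Return the JSON a SessionStart hook would print: the backlog as additionalContext."""
    if tasks_path.is_file():
        lines = tasks_path.read_text(encoding="utf-8").splitlines()
        content = "\n".join(lines[:max_lines])
        if len(lines) > max_lines:
            # Tell the model the view is partial so it knows to open the file for the rest.
            content += f"\n[truncated: {len(lines) - max_lines} more lines in {tasks_path.name}]"
    else:
        # Surface the gap instead of failing silently.
        content = f"WARNING: task backlog not found at {tasks_path}"
    return json.dumps(
        {
            "hookSpecificOutput": {
                "hookEventName": "SessionStart",
                "additionalContext": content,
            }
        }
    )
```

The hook script would print the result, for example print(build_session_start_payload(Path("F:/TITAN/state/TASKS.md"))). The explicit truncation marker is a design choice: a silently clipped backlog would recreate the amnesia problem in a smaller form.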

Patch 2 — PreCompact hook that diffs in-session todos against TASKS.md

What: Extend the existing PreCompact script to read titan-tasks-latest.json, extract task titles, then grep TASKS.md for each. Any task present in titan-tasks-latest.json but absent from TASKS.md gets appended to a diff file and injected as additionalContext before compaction.
How: Edit F:/TITAN/hooks/precompact_snapshot.py (or equivalent). Add a diff section: set(todos_in_session) - set(tasks_in_md) → write to F:/TITAN/state/precompact-diff-<ts>.md → return that content in the hook's additionalContext.
Cost: Two small file reads per compaction (titan-tasks-latest.json and TASKS.md). Effectively zero.
Expected leak reduction: Catches decisions made in-session that were never written to TASKS.md.
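
The diff step might look like the sketch below. It assumes titan-tasks-latest.json holds either a list of todo objects or an object with a "todos" list, and that each todo carries its title in a "content" field. Both are assumptions about the snapshot format, and missing_from_tasks is a hypothetical name. Matching is by normalized substring, which mirrors the "grep TASKS.md for each" description rather than an exact set difference.

```python
import json
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so cosmetic differences do not count as new tasks."""
    return re.sub(r"\s+", " ", text).strip().lower()


def missing_from_tasks(todos_json: str, tasks_md: str) -> list[str]:
    """Return session todo titles that do not appear anywhere in the TASKS.md text."""
    todos = json.loads(todos_json)
    if isinstance(todos, dict):
        todos = todos.get("todos", [])  # assumed wrapper key
    haystack = normalize(tasks_md)
    missing = []
    for todo in todos:
        title = todo.get("content", "")  # assumed field name
        if title and normalize(title) not in haystack:
            missing.append(title)
    return missing
```

The returned list is what would be written to precompact-diff-<ts>.md and handed back to the hook output.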

Patch 3 — Canonical task source consolidation (single writer)

What: Declare TASKS.md files as the one source of truth. TodoWrite becomes a session-local scratchpad only. A nightly reconciler script reads titan-tasks-latest.json and appends any net-new tasks (by title hash) to the appropriate project TASKS.md.
How: New script F:/TITAN/scripts/reconcile_todos_to_tasks.py. Run via Windows Task Scheduler \TITAN\titan-todo-reconciler at 23:50 daily. Log diffs to F:/TITAN/state/todo-reconcile.log.
Cost: 5 minutes to write. Zero runtime cost.
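Expected leak reduction: Bounds two-source drift to at most one day, since any TodoWrite task missing from TASKS.md is appended at the next 23:50 run. Because titan-tasks-latest.json holds only the most recent snapshot, todos from earlier sessions that day are covered only if they were caught at compaction (Patch 2).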
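
The "net-new by title hash" rule can be sketched as follows. This assumes TASKS.md stores tasks as markdown checkbox lines ("- [ ] title" or "- [x] title"), which the text does not confirm; title_hash, existing_hashes, and reconcile are hypothetical names.

```python
import hashlib
import re


def title_hash(title: str) -> str:
    """Stable short hash of a task title, ignoring case and extra whitespace."""
    normalized = re.sub(r"\s+", " ", title).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:12]


def existing_hashes(tasks_md: str) -> set[str]:
    """Hash every checkbox line already present in the TASKS.md text."""
    hashes = set()
    for line in tasks_md.splitlines():
        match = re.match(r"\s*[-*]\s*\[[ xX]\]\s*(.+)", line)
        if match:
            hashes.add(title_hash(match.group(1)))
    return hashes


def reconcile(tasks_md: str, todo_titles: list[str]) -> tuple[str, list[str]]:
    """Append todos whose title hash is not yet in TASKS.md; return new text and added titles."""
    seen = existing_hashes(tasks_md)
    added = []
    for title in todo_titles:
        digest = title_hash(title)
        if digest not in seen:
            seen.add(digest)
            added.append(title)
    if not added:
        return tasks_md, []
    body = tasks_md if tasks_md.endswith("\n") or not tasks_md else tasks_md + "\n"
    body += "".join(f"- [ ] {title}\n" for title in added)
    return body, added
```

Because the hash is computed from the normalized title, running the reconciler twice on the same snapshot appends nothing the second time, which keeps the nightly job idempotent.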

Patch 4 — Session-end summary hook (stop-session trigger)

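What: A session-end hook, referred to here as StopSession (cited as supported from Claude Code v2.1.121+), that prompts Claude to summarize "what was decided, what tasks were added, what tasks were completed this session" and appends that summary to F:/TITAN/state/session-summaries/<date>.md. Confirm the event name against the hooks reference for the installed version before wiring: the documented session-boundary events are Stop and SessionEnd, and a hook that fires after the session has closed cannot ask Claude for another turn.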
How: Create F:/TITAN/hooks/stop_session_summarize.py. In the hook, write a prompt to stdout as {"additionalContext": "Before this session ends, write a 5-bullet summary of: new tasks added, tasks completed, tasks deferred, decisions made, any blockers. Append to F:/TITAN/state/session-summaries/<today>.md"}.
Cost: One additional Claude turn per session end. Minimal.
Expected leak reduction: Creates an auditable session trail for human review and future context injection.
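
A minimal sketch of the prompt-building part of stop_session_summarize.py. The payload shape is copied from the description above; whether a session-end hook honors additionalContext is an assumption to verify. The ISO date in the file name and the names summary_path and build_summary_prompt are likewise illustrative.

```python
import json
from datetime import date
from pathlib import Path

SUMMARY_DIR = Path("F:/TITAN/state/session-summaries")


def summary_path(day: date, root: Path = SUMMARY_DIR) -> Path:
    """One summary file per calendar day, named by ISO date (assumed format)."""
    return root / f"{day.isoformat()}.md"


def build_summary_prompt(day: date, root: Path = SUMMARY_DIR) -> str:
    """Return the JSON the hook would print, asking Claude for a 5-bullet summary."""
    target = summary_path(day, root).as_posix()
    instruction = (
        "Before this session ends, write a 5-bullet summary of: new tasks added, "
        "tasks completed, tasks deferred, decisions made, any blockers. "
        f"Append to {target}"
    )
    return json.dumps({"additionalContext": instruction})
```

The hook script would call print(build_summary_prompt(date.today())). Resolving the date in the script avoids leaving a literal <today> placeholder for the model to interpret.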

Patch 5 — TASKS.md load in CLAUDE.md (immediate, no code)

What: Add an explicit instruction to the global CLAUDE.md: "At the start of every session, read F:/TITAN/state/PENDING.md before taking any action. This is the canonical task backlog."
How: One-line edit to ~/.claude/CLAUDE.md under the Persistent State section.
Cost: Zero (instruction tokens already paid).
Expected leak reduction: Forces task context loading even without the SessionStart hook. Ships in 30 seconds.

---

4. Tools and Libraries That Solve This

Mem0 (mem-os). Python library + managed API. Extracts structured facts from conversation turns, deduplicates, and injects on next session. Best fit: user preference and project context retention. Weak on temporal task state. Self-hostable. pip install mem0ai. Free tier available; production ~$0.002/memory operation. GitHub: mem0ai/mem0.

Letta (formerly MemGPT). Production multi-agent memory framework. Agents have explicit core_memory (always-in-context), archival_memory (vector search), and recall_memory (conversation history). Designed for long-lived agents that must maintain state across months of sessions. Steeper integration than Mem0 but purpose-built for the TITAN use case. Self-hostable. (letta.com)

Zep / Graphiti. Temporal knowledge graph for episodic memory. Best benchmark performance (63.8% on LongMemEval) for "what did we agree three sessions ago?" queries. Graphiti is the open-source graph layer; Zep Cloud adds managed hosting. Most powerful for cross-session task recall but highest integration complexity. (getzep.com, github.com/getzep/graphiti)

Anthropic Managed Agents Memory Stores (beta 2026). First-party solution. Versioned file-like paths, SHA-256 content addressing, dreaming API for offline memory refinement. Attaches to sessions automatically. Requires Anthropic direct API (not Bedrock). The dreaming pass is ideal for weekly memory hygiene: read 100 recent sessions, deduplicate, output a clean store. Currently in beta — production readiness unclear. (platform.claude.com/docs/en/managed-agents/memory)

OpenAI Threads / Responses API (deprecated / successor). OpenAI's Assistants API Threads are shutting down in 2026. The replacement Responses API offers previous_response_id chaining for stateful multi-turn. Not directly applicable to Claude Code but relevant as an architectural reference for "server-side state threading."

---

5. Recommended TITAN-Specific Implementation

Stack decision: Do not add Mem0, Letta, or Zep yet. TITAN already has 90% of the infrastructure needed. The gap is three missing connections: session-start loading, pre-compact diffing, and a single canonical writer. Fix those first. Revisit Zep/Graphiti in Q3 2026 if cross-session episodic recall remains weak after Patches 1–5.

Canonical file paths:


F:/TITAN/state/TASKS.md             — aggregated hourly (already exists)
F:/TITAN/state/titan-tasks-latest.json — PostToolUse TodoWrite snapshot (already exists)
F:/TITAN/state/session-summaries/   — new directory, one file per session
F:/TITAN/state/precompact-diff-*.md — new, written by extended PreCompact hook
~/.claude/settings.json             — hooks registration (existing)

Hook wiring (settings.json additions):


{
  "hooks": {
    "SessionStart": [
      { "matcher": "*", "hooks": [
        { "type": "command", "command": "python F:/TITAN/hooks/session_start_load_tasks.py" }
      ]}
    ],
    "PreCompact": [
      { "matcher": "*", "hooks": [
        { "type": "command", "command": "python F:/TITAN/hooks/precompact_snapshot.py" }
      ]}
    ],
    "StopSession": [
      { "matcher": "*", "hooks": [
        { "type": "command", "command": "python F:/TITAN/hooks/stop_session_summarize.py" }
      ]}
    ]
  }
}

(PostToolUse for TodoWrite already exists — no change needed.)

Cron additions (Windows Task Scheduler):


\TITAN\titan-todo-reconciler   — daily 23:50 UTC — python F:/TITAN/scripts/reconcile_todos_to_tasks.py
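
Registering the task can itself be scripted. A minimal sketch, assuming the standard schtasks command-line tool on the Windows host; the helper name schtasks_create_args is hypothetical. Note that schtasks interprets /ST in the machine's local time, so a 23:50 UTC target needs converting unless the host clock runs on UTC.

```python
def schtasks_create_args(task_name: str, command: str, start_time: str = "23:50") -> list[str]:
    """Build the argument list for creating (or overwriting) a daily scheduled task."""
    return [
        "schtasks", "/Create",
        "/TN", task_name,    # task path and name inside Task Scheduler
        "/TR", command,      # command line the task runs
        "/SC", "DAILY",      # schedule type
        "/ST", start_time,   # start time, HH:mm, local time
        "/F",                # overwrite an existing task with the same name
    ]


RECONCILER_ARGS = schtasks_create_args(
    r"\TITAN\titan-todo-reconciler",
    "python F:/TITAN/scripts/reconcile_todos_to_tasks.py",
)
```

On the Windows host, the list can be passed to subprocess.run(RECONCILER_ARGS, check=True) from an elevated prompt if the task folder requires it.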

Session-start flow (after Patch 1 + 5):

1. Claude Code fires SessionStart hook → session_start_load_tasks.py reads top 150 lines of PENDING.md (the limit set in Patch 1) → injects as additionalContext.

2. CLAUDE.md instruction confirms: "read PENDING.md before anything else."

3. Claude opens the session already aware of the full 130+ task backlog.

Pre-compact flow (after Patch 2):

1. At 95% context fill, Claude Code fires PreCompact hook → precompact_snapshot.py runs.

2. Script reads titan-tasks-latest.json (session todos) and TASKS.md (canonical).

3. Diffs: any session todo not in TASKS.md → written to precompact-diff-<ts>.md + injected as additionalContext.

4. Compaction summarizer now sees the diff and preserves it in the summary.

5. Hook exits 0 — compaction proceeds.

Did-we-actually-do-this verifier (session resume):

The session_start_load_tasks.py hook also reads the latest session-summaries/<yesterday>.md and the latest precompact-diff-*.md and injects both.
Claude sees at session open: "yesterday's session summary" + "tasks that were in-flight but not yet in TASKS.md."
FORGE then has 30 seconds of work: grep TASKS.md for each item in the precompact-diff. If missing → append. If present and completed → mark done.
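
The per-item check FORGE performs can be sketched as a small planner. It assumes TASKS.md uses markdown checkbox lines and that the list of work actually completed comes from somewhere reliable, such as the session summary; both are assumptions, and parse_tasks and plan_actions are hypothetical names.

```python
import re

CHECKBOX = re.compile(r"^\s*[-*]\s*\[([ xX])\]\s*(.+?)\s*$")


def _norm(title: str) -> str:
    return re.sub(r"\s+", " ", title).strip().lower()


def parse_tasks(tasks_md: str) -> dict[str, bool]:
    """Map each normalized task title in TASKS.md to whether its box is ticked."""
    tasks = {}
    for line in tasks_md.splitlines():
        match = CHECKBOX.match(line)
        if match:
            tasks[_norm(match.group(2))] = match.group(1).lower() == "x"
    return tasks


def plan_actions(diff_items: list[str], tasks_md: str, completed: list[str]) -> list[tuple[str, str]]:
    """Decide per diff item: append if missing, mark_done if finished but unticked, else none."""
    tasks = parse_tasks(tasks_md)
    done_now = {_norm(title) for title in completed}
    actions = []
    for item in diff_items:
        key = _norm(item)
        if key not in tasks:
            actions.append(("append", item))
        elif key in done_now and not tasks[key]:
            actions.append(("mark_done", item))
        else:
            actions.append(("none", item))
    return actions
```

Returning a plan rather than editing the file directly keeps the step reviewable: FORGE or a human can inspect the actions before TASKS.md is touched.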

Estimated total implementation time: 3–4 hours for FORGE to ship all five patches.

Estimated leak reduction: 80–90% (an estimate, not a measurement). The remaining 10–20% is work agreed verbally that never reaches TodoWrite. Patch 4 does not capture it automatically, but the session-end summary mitigates it by creating an audit trail a human can review.

---

Sources

1. Perplexity sonar-pro — Claude Code hooks and context compaction, May 2026

2. Perplexity sonar-pro — LangGraph checkpointing and AutoGen memory persistence, 2025–2026

3. Perplexity sonar-pro — Mem0 / Zep / Letta / Graphiti comparison, 2025–2026

4. Perplexity sonar — Cursor/Cline Memory Bank pattern, long-horizon task continuity, 2025–2026

5. Perplexity sonar-pro — Anthropic Managed Agents Memory Stores beta, May 2026

→ platform.claude.com/docs/en/managed-agents/memory

6. Perplexity sonar-pro — Claude Code PreCompact hook, v2.1.118+ features

→ arxiv.org/pdf/2604.14228

7. TITAN internal reference — agent-memory/scout/reference_claude_code_internals.md (April 2026)

8. TITAN live state — F:/TITAN/state/titan-tasks-latest.json (8 tasks) vs TASKS.md (130+ tasks) — gap confirmed 2026-05-13

9. arXiv:2605.11032 — Zep/Graphiti temporal memory benchmark (LongMemEval 63.8%)

10. arXiv:2604.22085 — Broad-recall retrieval +28.4pp improvement on long-horizon tasks