SCOUT · Token Audit + Reduction Plan

Memo ID: SCOUT-TOKEN-AUDIT-AND-REDUCTION-2026-05-02

Type: Telemetry audit + applied reduction proposal

Author: SCOUT (TITAN research arm)

Date: 2026-05-02

Status: READY FOR DARWIN REVIEW → FORGE EXECUTION

Companion: SCOUT-BATCH-API-APPLIED-TO-TITAN-2026-05-02.md (batch is one of seven levers)

---

0. Executive summary

| | |

|---|---|

| What I measured | 7 days of F:/TITAN/metrics/tools-*.tsv — every tool call by every Claude session, plus compaction logs and state-file sizes. |

| What's burning tokens | 81% of all tool calls are Bash, and the top Bash patterns are repeated polling reads of growing JSONL state files (especially inbox-queue.jsonl). |

| The single biggest waste | tail -N F:/TITAN/state/inbox-queue.jsonl fired 619 times in 7 days. File is now 547KB / 775 rows. Estimated token cost: ~4.6M input tokens/week just from this one polling pattern. |

| Headline reduction | The 7 levers below cut estimated agent input tokens by ~55–70% at feature parity. Lever #1 alone cuts ~30%. |

| Decision | Ship Levers #1 + #2 + #4 + #6 this week. Levers #3, #5, #7 next sprint. |

---

1. Where the tokens actually go (7-day audit)

Total tool calls: 14,263 across all Claude sessions in 7 days.

| Tool | Calls | % of calls | Token risk |

|---|---|---|---|

| Bash | 11,562 | 81.1% | High — output goes into context |

| Read | 1,087 | 7.6% | High — full file content |

| WebSearch | 372 | 2.6% | High — ~3k tokens of result snippets |

| Edit | 326 | 2.3% | Low — diff only |

| Grep | 313 | 2.2% | Low–Med — pattern matches |

| Write | 238 | 1.7% | Low — outbound only |

| Glob | 192 | 1.3% | Low — file paths only |

| WebFetch | 168 | 1.2% | Very high — full HTML |

Conclusion: Bash + Read + WebSearch + WebFetch = 92% of calls and ≥95% of context-token cost. These are the four tools that matter.
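The 92% figure can be re-derived from the table counts. A minimal sketch, using only the numbers in this memo; the helper name `share` is illustrative:

```python
# Tool-call counts copied from the 7-day audit table.
counts = {
    "Bash": 11_562, "Read": 1_087, "WebSearch": 372, "Edit": 326,
    "Grep": 313, "Write": 238, "Glob": 192, "WebFetch": 168,
}
TOTAL_CALLS = 14_263  # stated total; the eight rows above sum to 14,258

def share(tools):
    """Percentage of all tool calls made by the given tools."""
    return 100 * sum(counts[t] for t in tools) / TOTAL_CALLS

heavy = ["Bash", "Read", "WebSearch", "WebFetch"]
print(f"{share(heavy):.1f}% of calls")  # prints "92.5% of calls"
```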

1.1 Top Bash command shapes (7 days)

| Count | Pattern | Token risk |

|---|---|---|

| 619 | tail -N F:/TITAN/state/inbox-queue.jsonl | 🔴 critical — 30KB output per call × 619 |

| 493 | ls -la F:/TITAN/state/inbox-*.jsonl | 🟡 modest — file listing |

| 460 | ls F:/TITAN/state/bridge-replies/ (and friends) | 🟡 modest |

| 399 | curl -s -o ... | 🟢 low (output redirected) |

| 366 | swarm-health log appends | 🟢 low |

| 363 | cat *.jsonl (full file cat) | 🔴 critical — entire file into context |

| 288 | silentinfinity_chat_smoke runner | 🟡 cron-driven, not user |

| 236 | aws cloudwatch put-metric-data | 🟢 low |

| 626 | find in Claude temp dir | 🟡 directory walks |

| 587 | python -c "..." inline | 🟡–🔴 inline blobs replay in transcript |

1.2 Session-level — long sessions cost super-linearly

| Session prefix | Tool calls | Notes |

|---|---|---|

| 117f3fc2 (today) | 218 | This session. Single longest in the 7-day window. |

| fad42c71 | 49 | |

| b074c4f1 | 39 | |

| Top-10 average | ~50 | Longest session has ≈ 4-5× the tool calls of a typical top-10 session |

A session with 200+ tool calls pays for the full transcript replay on every turn. Token cost grows quadratically with session length because each new turn's context includes all prior tool I/O.
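To make the quadratic claim concrete, here is a rough model. It assumes every tool call adds a fixed number of tokens to the transcript and that each turn re-sends the whole transcript, with no prompt caching and no compaction. The 1,000 tokens per call is an illustrative assumption, not a measured value.

```python
def cumulative_input_tokens(n_calls: int, tokens_per_call: int) -> int:
    """Total input tokens when turn k re-sends the output of all k calls so far.

    Sum over k = 1..n of k * tokens_per_call = tokens_per_call * n * (n + 1) / 2.
    """
    return tokens_per_call * n_calls * (n_calls + 1) // 2

# tokens_per_call=1_000 is an assumed, illustrative figure.
typical = cumulative_input_tokens(50, 1_000)    # 1_275_000
longest = cumulative_input_tokens(218, 1_000)   # 23_871_000
print(f"{longest / typical:.1f}x")              # prints "18.7x"
```

Under this simplified model, a session with about 4.4× the calls costs roughly 19× the input tokens, which is why the session cap in Lever #4 targets call count.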

1.3 Compaction frequency

7 days × 1–2 compactions/day = ~12 compaction events. Each compaction means we hit a hard token ceiling — direct evidence of waste, not feature richness.

---

2. Seven levers to cut token usage

Ranked by impact relative to effort.

Lever #1 — Stop tailing `inbox-queue.jsonl` from agent context (🔴 critical, S effort)

Problem: Agents call tail -50 F:/TITAN/state/inbox-queue.jsonl 88×/day on average. The file is 547KB and growing. Each tail drags ~30KB into context (~7.5k tokens). Annual cost at current rate: ~240M tokens, ~$720/yr in input alone at Sonnet 4.6 rates.
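The arithmetic behind these figures, as a short sketch. The bytes-per-token ratio and the price per million input tokens are not stated in this memo; they are back-derived from "30KB ≈ 7.5k tokens" and "~240M tokens ≈ ~$720" and should be treated as assumptions.

```python
CALLS_PER_WEEK = 619        # from the 7-day audit
BYTES_PER_CALL = 30_000     # ~30KB of output per tail
BYTES_PER_TOKEN = 4         # assumption implied by 30KB ≈ 7.5k tokens
USD_PER_MTOK_INPUT = 3.00   # assumption implied by ~$720 for ~240M tokens

tokens_per_call = BYTES_PER_CALL / BYTES_PER_TOKEN   # 7,500
weekly_tokens = CALLS_PER_WEEK * tokens_per_call     # 4,642,500
annual_tokens = weekly_tokens * 52                   # 241,410,000
annual_usd = annual_tokens / 1_000_000 * USD_PER_MTOK_INPUT

print(f"{weekly_tokens / 1e6:.1f}M tokens/week, "
      f"{annual_tokens / 1e6:.0f}M tokens/year, ${annual_usd:.0f}/year")
```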

Fix: Add a script, python F:/TITAN/scripts/inbox_peek.py, that returns ONLY the last N entries in summarized form (id + 80-char title + status). It replaces tail -N and shrinks the output by ~95%.
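A minimal sketch of what the core of inbox_peek.py could look like. The JSON key names `id`, `title` and `status` are assumptions about the queue schema, and `peek` is a hypothetical function name; adjust both to the real row format.

```python
import json
from collections import deque

def peek(lines, n=10, title_width=80):
    """Summarize the last n non-blank JSONL rows as 'id<TAB>title<TAB>status'."""
    # deque(maxlen=n) streams the input and keeps only the final n rows.
    last_rows = deque((ln for ln in lines if ln.strip()), maxlen=n)
    summaries = []
    for raw in last_rows:
        try:
            row = json.loads(raw)
        except json.JSONDecodeError:
            row = None
        if not isinstance(row, dict):
            summaries.append("?\t<unparseable row>\t?")
            continue
        # Key names below are assumed, not confirmed against the real file.
        title = str(row.get("title", ""))[:title_width]
        summaries.append(f"{row.get('id', '?')}\t{title}\t{row.get('status', '?')}")
    return summaries

# In-memory demo; a real run would pass an open file handle instead.
sample = [
    json.dumps({"id": 1, "title": "x" * 200, "status": "open", "body": "y" * 5000}),
    json.dumps({"id": 2, "title": "Deploy fix", "status": "done"}),
]
print("\n".join(peek(sample, n=2)))
```

Against the real queue this would be called as `peek(open(path, encoding="utf-8"), n)`. Large fields are never echoed, which is where the size reduction comes from.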

Then: Add a hook (PreToolUse) that intercepts any tail|cat|head of inbox-queue.jsonl from Bash and rewrites it to inbox_peek.py. Self-enforcing.
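The matching logic for such a hook might look like the sketch below. It covers only the decision of whether and how to rewrite a command; how the rewritten command is handed back to the harness depends on the hook interface and is out of scope here. The `--last` flag and the default of 20 rows are hypothetical, since the CLI of inbox_peek.py is not specified in this memo.

```python
import re

PEEK_CMD = "python F:/TITAN/scripts/inbox_peek.py"
DEFAULT_ROWS = 20  # assumed default when the original command gave no count

# A single tail/cat/head whose last argument is inbox-queue.jsonl.
# Commands containing pipes or separators are deliberately left untouched.
SIMPLE_READ = re.compile(
    r"^\s*(?:tail|cat|head)\b(?P<args>[^|;&]*?)[^\s|;&]*inbox-queue\.jsonl\s*$"
)

def rewrite_command(command: str):
    """Return the inbox_peek.py replacement, or None if no rewrite applies."""
    match = SIMPLE_READ.match(command)
    if match is None:
        return None
    # Accept both `-50` and `-n 50` style counts.
    count = re.search(r"-(?:n\s*)?(\d+)", match.group("args"))
    rows = int(count.group(1)) if count else DEFAULT_ROWS
    return f"{PEEK_CMD} --last {rows}"  # `--last` is a hypothetical flag

print(rewrite_command("tail -50 F:/TITAN/state/inbox-queue.jsonl"))
```

Piped or chained commands fall through unchanged in this sketch, so a production hook would need a policy for those (block, warn, or rewrite the first stage only).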

Estimated savings: ~3.2M tokens/week ≈ $36/month.

Lever #2 — Default research to Perplexity, not WebSearch+WebFetch (🔴 critical, DONE today)

Problem: WebSearch (372 calls / 7d) returns ~10 raw result snippets per call. Agents then run WebFetch (168 calls) to read the most promising ones in full. A typical research chain = 1 WebSearch + 3 WebFetches = ~12k tokens of raw HTML/snippet noise into context, of which maybe 500 tokens are actually useful.

Fix (already shipped):

New CLI: python F:/TITAN/scripts/pplx.py "<query>" — single round-trip, returns synthesized answer + citations only (~1k tokens vs. ~12k).
~/.claude/agents/scout.md rewritten to default to Perplexity; WebSearch/WebFetch reserved for fallback only.

Estimated savings: ~1.5M tokens/week ≈ $17/month.

Cash cost added: ~$0.40/day in Perplexity API fees (~$12/month). Net win: ~$5/month.

Lever #3 — Migrate batchable crons to Anthropic Batch API (🟡 medium, see companion memo)

See SCOUT-BATCH-API-APPLIED-TO-TITAN-2026-05-02.md. Estimated savings: ~$55/month at current cron volume; scales to ~$275/month at 5× growth.

Lever #4 — Cap session length aggressively, force handoff via memos (🔴 critical, M effort)

Problem: The longest single session hit 218 tool calls (today's). Compaction fires when context fills and dumps a summary back in. Each compaction event itself burns ~10k tokens. Long sessions are quadratically expensive vs. short ones.

Fix:

1. Set a soft cap of 50 tool calls per session. After that, the agent should write a SCOUT/FORGE handoff memo to plans/handoffs/<session>-<topic>.md and recommend a new session.

2. A new PreToolUse hook counts tool calls per session_id and injects a one-line system reminder at call 40 ("approaching session limit, consider handoff").

3. The /pulse skill displays sessions over 50 calls in red.
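The counting logic in steps 1 and 2 can be sketched as below. This is only the decision function: a real hook runs as a separate process per call, so the `counts` mapping would have to be persisted (for example in a small JSON file keyed by session_id), which is an assumption not covered here. The wording of the cap message is illustrative.

```python
REMINDER_AT = 40   # call number that triggers the one-line reminder
SOFT_CAP = 50      # sessions reaching this should hand off

def record_call(counts: dict, session_id: str):
    """Increment the per-session counter; return a reminder string or None."""
    counts[session_id] = counts.get(session_id, 0) + 1
    n = counts[session_id]
    if n == REMINDER_AT:
        return "approaching session limit, consider handoff"
    if n == SOFT_CAP:
        # Illustrative wording, not taken from an existing hook.
        return (f"soft cap of {SOFT_CAP} tool calls reached; write a handoff "
                "memo to plans/handoffs/ and recommend a new session")
    return None

# Simulate one session making 50 tool calls.
counts = {}
messages = [record_call(counts, "117f3fc2") for _ in range(50)]
print([i + 1 for i, m in enumerate(messages) if m])  # prints [40, 50]
```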

Estimated savings: 20-30% of total token spend by avoiding long-session quadratic blowup. Hardest to measure but biggest single lever.

Lever #5 — Replace `python -c "..."` inline blobs with named scripts (🟡 medium, S effort)

Problem: 587 inline python -c calls in 7 days. The full Python source goes into the Bash command — and stays in the transcript forever. A 50-line inline script replayed across 30 turns ≈ 30× the tokens of just calling python F:/TITAN/scripts/foo.py.

Fix: Establish convention — anything > 5 lines becomes a named script in F:/TITAN/scripts/ and is python F:/TITAN/scripts/foo.py args. Add a brief check to the agentic-247-watchdog that flags new inline blobs.
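One way the watchdog check could detect oversized inline blobs is sketched below. It tokenizes the Bash command with `shlex` and counts the lines of the `-c` payload; the function names are hypothetical and the 5-line threshold comes from the convention above.

```python
import shlex

MAX_INLINE_LINES = 5  # threshold from the convention above

def inline_python_lines(command: str) -> int:
    """Return the line count of a `python -c` payload, or 0 if there is none."""
    try:
        tokens = shlex.split(command)
    except ValueError:  # unbalanced quotes, cannot tokenize
        return 0
    for i in range(len(tokens) - 2):
        if tokens[i] in ("python", "python3") and tokens[i + 1] == "-c":
            return len(tokens[i + 2].strip().splitlines())
    return 0

def should_flag(command: str) -> bool:
    """True when the inline payload is long enough to deserve a named script."""
    return inline_python_lines(command) > MAX_INLINE_LINES

blob = 'python -c "' + "\n".join(f"print({i})" for i in range(6)) + '"'
print(should_flag(blob), should_flag("python F:/TITAN/scripts/foo.py args"))
```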

Estimated savings: ~700k tokens/week.

Lever #6 — Add `bash_summary` hook for high-cost commands (🟡 medium, M effort)

Problem: Commands like git log, ls -la <dir>, cat <file> return arbitrarily large output that dumps into context. Most of the time the agent only needs the first 20 lines.

Fix: PreToolUse hook intercepts Bash calls matching git log|cat|ls -la|find without | head and auto-pipes through head -50 unless the agent explicitly opted out (e.g. # bash_summary:full). Same trick the harness already does for grep results.
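A sketch of the rewrite rule described in this fix, as a pure function. The regexes are a first approximation and the function name is hypothetical; for chained commands (`;`, `&&`) the appended `head` only truncates the final stage, so a real hook may want to skip those.

```python
import re

OPT_OUT = "# bash_summary:full"
NOISY = re.compile(r"^\s*(?:git\s+log|cat|ls\s+-la|find)\b")
HAS_HEAD = re.compile(r"\|\s*head\b")

def summarize_command(command: str, limit: int = 50) -> str:
    """Append `| head -N` to noisy commands unless opted out or already piped."""
    if OPT_OUT in command:
        return command          # agent explicitly asked for full output
    if not NOISY.search(command) or HAS_HEAD.search(command):
        return command          # not a targeted command, or already truncated
    return f"{command.rstrip()} | head -{limit}"

print(summarize_command("git log"))  # prints "git log | head -50"
```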

Estimated savings: 5-10% of Bash token cost ≈ ~500k tokens/week.

Lever #7 — Prompt-cache the system prompt + agent definitions (🟢 low, M effort)

Problem: TITAN's six agents each load their full markdown definition on every spawn. The CLAUDE.md file (your global ops contract) loads on every session. Sonnet 4.6 has 1h prompt cache beta — these are textbook cache candidates.

Fix: When TITAN scripts spawn agents via Anthropic SDK directly, mark the agent definition + CLAUDE.md as cache_control: ephemeral with 1h TTL. Subsequent agent spawns within 1h hit cache (10% of input cost).
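A sketch of how the system prompt could be structured for a direct-API spawn. It only builds the `system` blocks; the helper name is hypothetical, and depending on SDK and API version the 1h TTL may require a beta header, so check the current prompt-caching documentation before relying on it.

```python
def build_system_blocks(claude_md: str, agent_definition: str) -> list:
    """System prompt blocks with one cache breakpoint after the stable prefix."""
    return [
        {"type": "text", "text": claude_md},
        {
            "type": "text",
            "text": agent_definition,
            # Caches everything up to and including this block for one hour.
            "cache_control": {"type": "ephemeral", "ttl": "1h"},
        },
    ]

blocks = build_system_blocks("<contents of CLAUDE.md>", "<agent markdown>")
print(blocks[-1]["cache_control"])
```

The resulting list would be passed as the `system=` argument of `client.messages.create(...)`. Keeping the stable content first and identical across spawns matters, because a cache hit requires an exact prefix match, and prompts shorter than the model's minimum cacheable length are not cached at all.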

Caveat: Only applies to direct-API spawns from cron scripts. Doesn't help interactive Claude Code sessions (the harness manages caching itself).

Estimated savings: ~$8/month when batch crons are also migrated.

---

3. Combined-impact projection

| Lever | Status | Est. tokens saved / week | Est. savings / month |

|---|---|---|---|

| #1 inbox-peek | TO SHIP | 3.2M | $36 |

| #5 Inline-python kill | TO SHIP | 700k | $7 |

| #6 bash_summary hook | TO SHIP | 500k | $5 |

At current spend (~$140/mo Anthropic + ~$15/mo Perplexity), this is roughly a 50% reduction. At 5× volume the same levers save ~$700/month.

---

4. Recommendation

Ship in two slices:

Slice A (this week):

✅ #2 Perplexity-first SCOUT (done today)
#1 inbox-peek script + hook
#4 Session-length cap (injected reminder at 40 calls, soft cap with handoff memo at 50)
#6 bash_summary hook

Slice B (next sprint):

#3 Batch API migration (per companion memo)
#5 Inline-python kill (convention + watchdog flag)
#7 Prompt-cache crons (after #3 lands)

---

5. Open questions

1. #4 friction risk: Will a 50-call cap be too aggressive for legitimate long-running tasks (like today's Innerverse A009/A010/A018 ship)? Recommendation: soft warning only at first; harden to a hard cap after 2 weeks of telemetry.

2. #1 hook scope: Apply the inbox-peek rewrite globally or just to non-bridge sessions? Recommendation: globally — bridge already uses its own API path, never tails the file.

3. #6 false positives: git log is sometimes genuinely needed in full. Mitigation: explicit # bash_summary:full override for opt-out.

---

6. References

F:/TITAN/metrics/tools-*.tsv — 7-day raw metric source
F:/TITAN/logs/compaction-*.jsonl — compaction event log
F:/TITAN/state/inbox-queue.jsonl — top offender file
Companion: SCOUT-BATCH-API-APPLIED-TO-TITAN-2026-05-02.md
Companion: DARWIN-MODEL-TIERING-PROPOSAL-v1-2026-04-21.md (model selection layer)

---

— SCOUT, 2026-05-02. The fastest token to save is the one that was never sent.