Memo ID: SCOUT-TOKEN-AUDIT-AND-REDUCTION-2026-05-02
Type: Telemetry audit + applied reduction proposal
Author: SCOUT (TITAN research arm)
Date: 2026-05-02
Status: READY FOR DARWIN REVIEW → FORGE EXECUTION
Companion: SCOUT-BATCH-API-APPLIED-TO-TITAN-2026-05-02.md (batch is one of seven levers)
---
| | |
|---|---|
| What I measured | 7 days of F:/TITAN/metrics/tools-*.tsv — every tool call by every Claude session, plus compaction logs and state-file sizes. |
| What's burning tokens | 81% of all tool calls are Bash, and the top Bash patterns are repeated polling reads of growing JSONL state files (especially inbox-queue.jsonl). |
| The single biggest waste | tail -N F:/TITAN/state/inbox-queue.jsonl fired 619 times in 7 days. File is now 547KB / 775 rows. Estimated token cost: ~4.6M input tokens/week just from this one polling pattern. |
| Headline reduction | The 7 levers below cut estimated agent input tokens by ~55–70% at unchanged feature parity. Lever #1 alone cuts ~30%. |
| Decision | Ship Levers #1 + #2 + #4 + #6 this week. Levers #3, #5, #7 next sprint. |
---
Total tool calls: 14,263 across all Claude sessions in 7 days.
| Tool | Calls | % | Avg context cost per call (est.) |
|---|---|---|---|
| Bash | 11,562 | 81.1% | High — output goes into context |
| Read | 1,087 | 7.6% | High — full file content |
| WebSearch | 372 | 2.6% | High — ~3k tokens of result snippets |
| Edit | 326 | 2.3% | Low — diff only |
| Grep | 313 | 2.2% | Low–Med — pattern matches |
| Write | 238 | 1.7% | Low — outbound only |
| Glob | 192 | 1.3% | Low — file paths only |
| WebFetch | 168 | 1.2% | Very high — full HTML |
Conclusion: Bash + Read + WebSearch + WebFetch = 92% of calls and ≥95% of context-token cost. These are the four levers that matter.
| Count | Pattern | Token risk |
|---|---|---|
| 619 | tail -N F:/TITAN/state/inbox-queue.jsonl | 🔴 critical — 30KB output per call × 619 |
| 493 | ls -la F:/TITAN/state/inbox-*.jsonl | 🟡 modest — file listing |
| 460 | ls F:/TITAN/state/bridge-replies/ (and friends) | 🟡 modest |
| 399 | curl -s -o ... | 🟢 low (output redirected) |
| 366 | swarm-health log appends | 🟢 low |
| 363 | cat *.jsonl (full file cat) | 🔴 critical — entire file into context |
| 288 | silentinfinity_chat_smoke runner | 🟡 cron-driven, not user |
| 236 | aws cloudwatch put-metric-data | 🟢 low |
| 626 | find in Claude temp dir | 🟡 directory walks |
| 587 | python -c "..." inline | 🟡–🔴 inline blobs replay in transcript |
| Session prefix | Tool calls | Notes |
|---|---|---|
| 117f3fc2 (today) | 218 | This session. Single longest in the 7-day window. |
| fad42c71 | 49 | |
| b074c4f1 | 39 | |
| Top-10 average | ~50 | Long sessions ≈ 4-5× token cost vs. typical |
A session with 200+ tool calls pays for the full transcript replay on every turn. Token cost grows quadratically with session length because each new turn's context includes all prior tool I/O.
7 days × 1–2 compactions/day = ~12 compaction events. Each compaction means we hit a hard token ceiling — direct evidence of waste, not feature richness.
---
Ranked by impact × effort.
inbox-queue.jsonl from agent context (🔴 critical, S effort)Problem: Agents call tail -50 F:/TITAN/state/inbox-queue.jsonl 88×/day on average. The file is 547KB and growing. Each tail drags ~30KB into context (~7.5k tokens). Annual cost at current rate: ~240M tokens, ~$720/yr in input alone at Sonnet 4.6 rates.
Fix: Add python F:/TITAN/scripts/inbox_peek.py returning ONLY the last N summarized entries (id + 80-char title + status). Replaces tail -N. ~95% size reduction.
Then: Add a hook (PreToolUse) that intercepts any tail|cat|head of inbox-queue.jsonl from Bash and rewrites it to inbox_peek.py. Self-enforcing.
Estimated savings: ~3.2M tokens/week ≈ $36/month.
Problem: WebSearch (372 calls / 7d) returns ~10 raw result snippets per call. Agents then run WebFetch (168 calls) to read the most promising ones in full. A typical research chain = 1 WebSearch + 3 WebFetches = ~12k tokens of raw HTML/snippet noise into context, of which maybe 500 tokens are actually useful.
Fix (already shipped):
python F:/TITAN/scripts/pplx.py "<query>" — single round-trip, returns synthesized answer + citations only (~1k tokens vs. ~12k).~/.claude/agents/scout.md rewritten to default to Perplexity; WebSearch/WebFetch reserved for fallback only.Estimated savings: ~1.5M tokens/week ≈ $17/month.
Cash cost added: ~$0.40/day in Perplexity API fees. Net win: ~$15/month.
See SCOUT-BATCH-API-APPLIED-TO-TITAN-2026-05-02.md. Estimated savings: ~$55/month at current cron volume; scales to ~$275/month at 5× growth.
Problem: Single sessions hit 218 tool calls (today's). Compaction fires when context fills, dumping summary back in. Each compaction event itself burns ~10k tokens. Long sessions are quadratically expensive vs. short ones.
Fix:
1. Set a soft cap of 50 tool calls per session. After that, the agent should write a SCOUT/FORGE handoff memo to plans/handoffs/<session>-<topic>.md and recommend a new session.
2. A new PreToolUse hook counts tool calls per session_id and injects a one-line system reminder at call 40 ("approaching session limit, consider handoff").
3. The /pulse skill displays sessions over 50 calls in red.
Estimated savings: 20-30% of total token spend by avoiding long-session quadratic blowup. Hardest to measure but biggest single lever.
python -c "..." inline blobs with named scripts (🟡 medium, S effort)Problem: 587 inline python -c calls in 7 days. The full Python source goes into the Bash command — and stays in the transcript forever. A 50-line inline script replayed across 30 turns ≈ 30× the tokens of just calling python F:/TITAN/scripts/foo.py.
Fix: Establish convention — anything > 5 lines becomes a named script in F:/TITAN/scripts/ and is python F:/TITAN/scripts/foo.py args. Add a brief check to the agentic-247-watchdog that flags new inline blobs.
Estimated savings: ~700k tokens/week.
bash_summary hook for high-cost commands (🟡 medium, M effort)Problem: Commands like git log, ls -la <dir>, cat <file> return arbitrarily large output that dumps into context. Most of the time the agent only needs the first 20 lines.
Fix: PreToolUse hook intercepts Bash calls matching git log|cat|ls -la|find without | head and auto-pipes through head -50 unless the agent explicitly opted out (e.g. # bash_summary:full). Same trick the harness already does for grep results.
Estimated savings: 5-10% of Bash token cost ≈ ~500k tokens/week.
Problem: TITAN's six agents each load their full markdown definition on every spawn. The CLAUDE.md file (your global ops contract) loads on every session. Sonnet 4.6 has 1h prompt cache beta — these are textbook cache candidates.
Fix: When TITAN scripts spawn agents via Anthropic SDK directly, mark the agent definition + CLAUDE.md as cache_control: ephemeral with 1h TTL. Subsequent agent spawns within 1h hit cache (10% of input cost).
Caveat: Only applies to direct-API spawns from cron scripts. Doesn't help interactive Claude Code sessions (the harness manages caching itself).
Estimated savings: ~$8/month when batch crons are also migrated.
---
| Lever | Status | Weekly token savings (est.) | Monthly $ savings (est.) |
|---|---|---|---|
| #1 inbox-peek | TO SHIP | 3.2M | $36 |
| #2 Perplexity-first | ✅ SHIPPED today | 1.5M | $15 (net of pplx fees) |
| #3 Batch API migration | TO SHIP per companion memo | (volume shift, not direct) | $55 |
| #4 Session-length cap | TO SHIP | 20-30% of remaining → ~4M | $35–$50 |
| #5 Inline-python kill | TO SHIP | 700k | $7 |
| #6 bash_summary hook | TO SHIP | 500k | $5 |
| #7 Prompt-cache crons | TO SHIP (after #3) | n/a (cache, not cut) | $8 |
| Total | | ~10M tokens/week | ~$160/month |
At current spend (~$140/mo Anthropic + ~$15/mo Perplexity), this is roughly a ~50% reduction. At 5× volume the same levers save ~$700/month.
---
Ship in two slices:
Slice A (this week):
Slice B (next sprint):
---
1. #4 friction risk: Will 50-call cap be too aggressive for legitimate long-running tasks (like today's Innerverse A009/A010/A018 ship)? Recommendation: soft warning only at first; harden to a hard cap after 2 weeks of telemetry.
2. #1 hook scope: Apply the inbox-peek rewrite globally or just to non-bridge sessions? Recommendation: globally — bridge already uses its own API path, never tails the file.
3. #6 false positives: git log is sometimes genuinely needed in full. Mitigation: explicit # bash_summary:full override for opt-out.
---
F:/TITAN/metrics/tools-*.tsv — 7-day raw metric sourceF:/TITAN/logs/compaction-*.jsonl — compaction event logF:/TITAN/state/inbox-queue.jsonl — top offender fileSCOUT-BATCH-API-APPLIED-TO-TITAN-2026-05-02.mdDARWIN-MODEL-TIERING-PROPOSAL-v1-2026-04-21.md (model selection layer)---
— SCOUT, 2026-05-02. The fastest token to save is the one that was never sent.