TITAN Credit Audit — 2026-05-09

Window: trailing 7 days (2026-05-02 → 2026-05-09)

Generated by: FORGE

Source of truth: ~/.claude/projects/*.jsonl via F:/TITAN/scripts/session_token_audit.py

---

1. Executive Summary

7-day notional Bedrock-equivalent burn: ~$5,626. This is what the same Claude Code traffic would cost at public Bedrock prices. Actual Max-plan dollars are flat $200/mo with rate-cap throttling — but the rate caps are what Harnoor is hitting, and the ratios below tell us where quota is going.
One project consumes 100% of session tokens: C:/Users/Harnoor/Desktop/Trillionair Trillionaire Trillionaire. Every single one of the 17,341 assistant turns over 7 days came from this directory. There is no diversification — TITAN runs entirely from one cwd.
Opus 4.7 = 99.2% of cost ($5,581 of $5,626). Sonnet $40, Haiku $0.94. Model-tiering is essentially absent — Opus is the default for everything from "what time is it" to "rebuild the swarm."

2. By-provider breakdown

| Provider | 7-day spend | Visibility (source) | Notes |
|---|---|---|---|
| Anthropic (Claude Code) | $5,625.63 notional | ✅ Excellent (session_token_audit.py) | Almost all Opus (99.2% of cost), one project, 100% cache-hit ratio (caching works; the burn comes from sheer call volume, cache replays, and output tokens) |
| AWS Bedrock (direct) | ~$0 | ✅ Code search | Only 3 scripts reference bedrock-runtime (innerverse_apps_nightly, lambda_innerverse_nightly, titan_bridge). No daily cron uses Bedrock for LLM calls. |

Finding F-1 (red flag #1): the external-spend.jsonl and llm-costs.jsonl publishers are 19–20 days stale (per nightly report 2026-05-09, lines 36–37). TITAN has no live ingestion of the Anthropic billing dashboard, OpenAI usage, or ElevenLabs character counts. Anything not in the Claude Code session JSONL is dark.

3. Today's 24-hour spike (2026-05-09)

$345.01 notional across 1,097 assistant turns, all Opus, one session-tree.
76,386 raw input tokens, 710,516 output tokens, 114M cache reads, 7.3M cache creations.
Cache hit ratio 100% — caching is doing its job; cost is volume × output × Opus rate.
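The arithmetic behind these figures can be sketched as follows. This is a minimal illustration, not the logic of session_token_audit.py: the per-million-token rates are placeholder assumptions rather than confirmed Bedrock prices, the cache counts are the rounded values quoted above, and the cache-share definition (cache reads over cache reads plus raw input) is also an assumption. With those inputs the result will not reproduce $345.01 exactly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    input_tokens: int
    output_tokens: int
    cache_read_tokens: int
    cache_creation_tokens: int


# ASSUMPTION: placeholder USD rates per million tokens, for illustration only.
ASSUMED_RATES_PER_MTOK = {
    "input": 15.00,
    "output": 75.00,
    "cache_read": 1.50,
    "cache_creation": 18.75,
}


def notional_cost(u: Usage, rates: dict = ASSUMED_RATES_PER_MTOK) -> float:
    """Sum each token class times its per-million-token rate."""
    total = (
        u.input_tokens * rates["input"]
        + u.output_tokens * rates["output"]
        + u.cache_read_tokens * rates["cache_read"]
        + u.cache_creation_tokens * rates["cache_creation"]
    )
    return total / 1_000_000


def cache_read_share(u: Usage) -> float:
    """ASSUMED definition: cache reads / (cache reads + raw input)."""
    denom = u.cache_read_tokens + u.input_tokens
    return u.cache_read_tokens / denom if denom else 0.0


# Figures quoted in the memo for 2026-05-09 (cache counts are rounded).
today = Usage(
    input_tokens=76_386,
    output_tokens=710_516,
    cache_read_tokens=114_000_000,
    cache_creation_tokens=7_300_000,
)

if __name__ == "__main__":
    print(f"notional cost: ${notional_cost(today):,.2f}")
    print(f"cache read share: {cache_read_share(today):.2%}")
```

Swapping in the real rate card and unrounded token counts from the session JSONL would be the first step before comparing this against the audit script's output.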

What ran today: 5 MANIFEST forge agents, 5 GAUNTLET, 5 SADHANA (+2 replacements), 5 Innerverse upgrades, 3 newsletter scripts, project registry agent, orphan-distro cleanup, tasks dashboard agent, this credit audit. ~26 sub-agents in one day. Each sub-agent is its own Opus session at ~$0.30–$0.60/turn × 30–60 turns = $10–$30 per agent. 26 agents × $20 average ≈ $520, which lines up directionally with the $345 measured (cache reduces it).
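The back-of-envelope estimate above can be reproduced in a few lines. This sketch uses only the numbers quoted in the paragraph; the function name is illustrative. Note that multiplying the endpoints directly gives roughly $9 to $36 per agent, so the memo's $10–$30 is a rounded range.

```python
def agent_cost_range(cost_per_turn=(0.30, 0.60), turns=(30, 60)):
    """Return (low, high) cost per sub-agent from per-turn cost and turn count."""
    low = cost_per_turn[0] * turns[0]
    high = cost_per_turn[1] * turns[1]
    return low, high


low, high = agent_cost_range()

AGENTS = 26            # sub-agents counted in the memo
AVG_PER_AGENT = 20.0   # memo's midpoint estimate, USD
MEASURED = 345.01      # measured notional cost for the day, USD

estimate = AGENTS * AVG_PER_AGENT
measured_vs_estimate = MEASURED / estimate

if __name__ == "__main__":
    print(f"per-agent range: ${low:.2f} to ${high:.2f}")
    print(f"estimate: ${estimate:.2f}, measured/estimate: {measured_vs_estimate:.0%}")
```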

The spike vs the trend: May 7 = $1,525 (peak), May 3 = $1,335, May 5 = $1,024, May 4 = $826. Today's $345 is actually below the recent average. The "burn rate problem" Harnoor noticed is that the last 7 days have averaged $804/day notional ($5,626 / 7). This is the new normal, not a one-off.
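A quick check of the daily average, using only the figures itemized in this memo, looks like this. The days not listed above are lumped into a single remainder, since their individual values are not given here.

```python
WEEK_TOTAL = 5625.63  # 7-day notional total, USD
DAYS = 7

# Daily figures itemized in the memo; other days in the window are not listed.
itemized = {
    "2026-05-03": 1335.00,
    "2026-05-04": 826.00,
    "2026-05-05": 1024.00,
    "2026-05-07": 1525.00,
    "2026-05-09": 345.01,
}

daily_avg = WEEK_TOTAL / DAYS
above_avg = sorted(day for day, cost in itemized.items() if cost > daily_avg)
unitemized_total = WEEK_TOTAL - sum(itemized.values())

if __name__ == "__main__":
    print(f"daily average: ${daily_avg:.2f}")
    print(f"days above average: {above_avg}")
    print(f"total for days not itemized: ${unitemized_total:.2f}")
```

Four of the five itemized days sit above the average, which supports reading today's figure as a quiet day rather than a spike.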

4. Top 5 Recommendations (priority-ordered)

R1. Hard model-tiering for sub-agents — Sonnet by default, Opus on demand.

Estimated savings: 60–75% of forge/scout sub-agent burn = ~$2,500–$3,500/wk notional, which translates to materially fewer Max rate-cap throttles.
How: Update agent definitions in ~/.claude/agents/ so SCOUT, FORGE (small), GUIDE, VAULT, ORACLE specify model: claude-sonnet-4-7 unless the parent explicitly passes --model opus. Reserve Opus for FORGE-on-codebase and DARWIN.
Why this is #1: 99.2% Opus share with 100% cache hits means most calls are simple turns where Sonnet would deliver identical output at 20% of the cost.
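The tiering rule in R1 can be expressed as a small selection function. This is a sketch of the policy only, not how Claude Code resolves models: the function name, the agent labels, and the "sonnet"/"opus" strings are hypothetical placeholders to be replaced with whatever identifiers the agent definitions actually use.

```python
from typing import Optional

# ASSUMPTION: illustrative labels for the agents R1 reserves for Opus.
OPUS_AGENTS = {"FORGE-CODEBASE", "DARWIN"}
DEFAULT_MODEL = "sonnet"  # placeholder model alias
HEAVY_MODEL = "opus"      # placeholder model alias


def pick_model(agent_name: str, parent_override: Optional[str] = None) -> str:
    """Sonnet by default, Opus for reserved agents or on explicit request."""
    if parent_override:
        # The parent explicitly asked for a model; honor it.
        return parent_override
    if agent_name.upper() in OPUS_AGENTS:
        return HEAVY_MODEL
    return DEFAULT_MODEL


if __name__ == "__main__":
    for name in ("SCOUT", "GUIDE", "VAULT", "ORACLE", "DARWIN"):
        print(name, "->", pick_model(name))
    print("SCOUT with override ->", pick_model("SCOUT", parent_override="opus"))
```

Keeping the Opus list short and explicit makes the default visible: any new agent lands on the cheaper model unless someone deliberately adds it to the reserved set.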

R2. Fix the publisher gap on `external-spend.jsonl` / `llm-costs.jsonl`.

Estimated savings: $0 directly — but eliminates a 20-day blind spot. Without this you can't see OpenAI or ElevenLabs spend at all.
How: Find the publisher script (likely F:/TITAN/scripts/llm_cost_publisher.py or similar), restart the cron, and validate freshness in the next nightly report. If the publisher doesn't exist, add the missing logging: image-generation logging in the vision app and voice-generation logging in any ElevenLabs path.
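The freshness validation could be as simple as comparing each file's modification time against a threshold. The sketch below is a standalone assumption, not the existing nightly check: the function name and the 24-hour default are hypothetical, and the file locations would need to be filled in with the real paths.

```python
import time
from pathlib import Path
from typing import Dict, Iterable, Optional


def stale_files(
    paths: Iterable[str],
    max_age_hours: float = 24.0,
    now: Optional[float] = None,
) -> Dict[str, Optional[float]]:
    """Map each stale path to its age in hours; missing files map to None."""
    now = time.time() if now is None else now
    report: Dict[str, Optional[float]] = {}
    for p in map(Path, paths):
        if not p.exists():
            report[str(p)] = None  # the publisher never wrote the file
            continue
        age_hours = (now - p.stat().st_mtime) / 3600
        if age_hours > max_age_hours:
            report[str(p)] = age_hours
    return report


if __name__ == "__main__":
    # File names from the memo; their directory is not stated, so these are relative.
    for path, age in stale_files(["external-spend.jsonl", "llm-costs.jsonl"]).items():
        status = "missing" if age is None else f"{age / 24:.1f} days old"
        print(f"STALE: {path} ({status})")
```

Modification time is a cheap proxy. A stricter check would parse the timestamp of the last JSONL record, which also catches a publisher that touches the file without appending data.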

R3. Cap parallel sub-agent fan-out per session.

Estimated savings: ~30% of spike-day cost = ~$300/day on heavy days.
How: Add a soft-warn hook (extend titan-token-saver.py) at >5 sub-agents in a 30-min window: "you've spawned N agents; consider batching or sequencing." Today's 5+5+5+5 fan-out is the textbook spike pattern.
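The soft-warn logic amounts to a sliding-window counter over spawn timestamps. The class below is a hypothetical standalone sketch: it does not reflect the actual interface of titan-token-saver.py, and how spawn events reach it is left open.

```python
from collections import deque
from typing import Deque, Optional


class FanOutMonitor:
    """Warn when more than max_agents sub-agents spawn within the window."""

    def __init__(self, max_agents: int = 5, window_seconds: int = 30 * 60):
        self.max_agents = max_agents
        self.window_seconds = window_seconds
        self._spawns: Deque[float] = deque()

    def record_spawn(self, ts: float) -> Optional[str]:
        """Record a spawn at time ts (seconds); return a warning or None."""
        self._spawns.append(ts)
        cutoff = ts - self.window_seconds
        # Drop spawns that have aged out of the window.
        while self._spawns and self._spawns[0] <= cutoff:
            self._spawns.popleft()
        count = len(self._spawns)
        if count > self.max_agents:
            minutes = self.window_seconds // 60
            return (
                f"you've spawned {count} agents in the last {minutes} min; "
                "consider batching or sequencing."
            )
        return None


if __name__ == "__main__":
    monitor = FanOutMonitor()
    for i in range(7):
        warning = monitor.record_spawn(ts=i * 60.0)  # one spawn per minute
        if warning:
            print(warning)
```

Because this only warns and never blocks, it fits the "soft" intent of R3: the operator can still fan out deliberately, but the count is surfaced at the moment it matters.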

R4. Throttle / kill scheduled tasks that are noisy or duplicative.

See sweep table below. Several scheduled tasks run via claude CLI (Opus) when a pure-Python script would suffice. claude-code-audit-every-6h and swarm-health-orchestrator are the suspects.
Estimated savings: 5–15% of weekly burn.

R5. One-prompt-per-agent rule for routine tasks.

Estimated savings: Hard to quantify, but cuts sub-agent context bloat.
How: Add a rule to the agent definitions: "for read-only / inventory tasks, finish in one assistant turn; do not iterate." Many of today's agents took 30–60 turns to produce a memo that needed 5.

5. Sweep table — every recurring cost surface

|---|---|---|---|---|

6. Three Biggest Red Flags

1. No model tiering. Opus = 99.2% of 7-day burn ($5,581 of $5,626). Even cron-scheduled audits and routine inventory agents are running on Opus. This is the single biggest fixable lever: per R1, switching the sub-agent default to Sonnet would cut sub-agent burn by an estimated 60–75% (~$2,500–$3,500/wk notional).

2. Spend visibility is broken outside Claude Code. external-spend.jsonl and llm-costs.jsonl are 19–20 days stale. OpenAI image gen, ElevenLabs voice, and any direct Bedrock calls are completely untracked. We are flying blind on every provider except Anthropic-via-Claude-Code and Perplexity.

3. The spike isn't a spike — it's the new baseline. 7-day daily average = $804 notional. Today's $345 is below average. May 7 hit $1,525 in a single day. The pattern is "Harnoor opens claude → spawns 5+ sub-agents → repeat 3x" and it has been running this way for at least a week. Without R1+R3 above, this trajectory continues.

---

Memo author: FORGE

Source script: F:/TITAN/scripts/session_token_audit.py --window 7d

Path: F:/TITAN/plans/TITAN-CREDIT-AUDIT-2026-05-09.md