ALL MEMOS Download .docx

LOCAL-FIRST-COMPUTE STUDY — 2026-05-03

> Goal: maximize TITAN capability per dollar. Local compute is free. Every LLM call is a tax — pay it only when the task genuinely needs reasoning.

---

Pricing Reality

Sourced from token_cost_estimator.py (internal, May 2026) + Perplexity sonar cross-check.

| Model | Input $/MTok | Output $/MTok | Batch (50% off) |

|---|---|---|---|

| Haiku 4.5 | $0.80 | $4.00 | $0.40 / $2.00 |

| Sonnet 4.6 | $3.00 | $15.00 | $1.50 / $7.50 |

| Opus 4.7 | $15.00 | $75.00 | $7.50 / $37.50 |

| Prompt cache read | 10% of input rate | — | — |

Claude Code Max plan: $200/mo flat. API calls (Bedrock, direct Anthropic) bill separately on top.

Per-invocation cost estimate (typical TITAN agent call)

Assumptions: 8K tokens in (context + skill + memory) + 1.5K tokens out, Sonnet 4.6, no cache hit.

With prompt caching (hot context reused): input drops to 10%, so ~$0.026/call.

Heavy calls (SCOUT deep research, 15K in / 3K out): ~$0.09.

Running 5 scheduled agent tasks/day = ~$0.25/day = ~$7.50/month in API spend. This is real money at scale; more importantly it's wasted latency and context slots.

---

Decision Framework


Is this a RECURRING / SCHEDULED task?
├── YES → Can it run as a local script (file reads, regex, template fill, log aggregation)?
│         ├── YES → Local script + cron. No LLM. (Save $0.05–$0.09/run × N runs/day)
│         └── NO (needs prose, judgment, or ambiguity resolution) →
│               Is the output structure KNOWN IN ADVANCE?
│               ├── YES → Haiku ($0.01/call) or local Jinja2 template
│               └── NO → Sonnet (or Opus only for cross-file refactor / empathic persona)
└── NO (on-demand, exploratory, complex) →
      Does it need PERSONA / VOICE / EMPATHY?
      ├── YES → LLM required. Haiku for structured persona; Sonnet for nuanced voice.
      └── NO → Does it need AMBIGUOUS REASONING across files?
               ├── YES → Sonnet (or Opus if architecture-level)
               └── NO → Local script. grep/glob/python. Always.

Price-per-decision-class

| Task class | Right tool | Typical cost/run |

|---|---|---|

| File ops, grep, aggregation | Local Python/bash | $0 |

| Template rendering (HTML email, reports) | Jinja2 local | $0 |

| Log parsing + structured digest | Local Python | $0 |

| Regex NLP (extract/dedupe/stats) | Local Python | $0 |

| Lint / format / test runner | Local Python/pre-commit | $0 |

| Structured summarization (known schema) | Haiku batch | $0.003 |

| Persona / voice-matched writing | Haiku or Sonnet | $0.01–$0.05 |

| Cross-file ambiguous reasoning | Sonnet | $0.05–$0.09 |

| Novel architecture / creative strategy | Opus | $0.20–$0.50 |

---

TITAN Apply Checklist — 5 Concrete Switches

1. nightly-report-writer — strip LLM from data aggregation

Current pattern: Full Sonnet call to read 10 source files, aggregate metrics, write a report. Cost ~$0.06/night = $1.80/month. Latency: 2–4 min.

Local replacement: Write F:/TITAN/scripts/nightly_report_builder.py. It reads events.jsonl, watchdog.log, llm-costs.jsonl, CloudWatch (via boto3), and swarm-orchestrator.log — all structured data. Outputs the report by filling a Jinja2 template. LLM invoked only if a RED watchdog flag is present and a plain-English explanation is needed (Haiku, <500 tokens).

Schedule: Keep existing 5:00 AM cron. Replace the Claude Code task body with python F:/TITAN/scripts/nightly_report_builder.py.

Estimated savings: $1.80/month + faster (20s vs 3min).

---

2. agentic-247-watchdog — pure local health checks

Current pattern: Claude Code agent spawned every 30 min to run grep/read checks and write a log line. Even at Haiku this is 48 Claude Code invocations/day. Cost: ~$0.01 × 48 = $0.48/day = $14.40/month.

Local replacement: F:/TITAN/scripts/watchdog_check.py — pure Python: reads log files, does regex checks on secrets patterns, calls netstat, checks lastRunAt via the scheduled-tasks MCP JSON file, sums cost logs. Writes one JSON line to watchdog-structured.jsonl. Only spawns a Claude Code agent (Haiku) when status = RED to compose the alert email.

Schedule: Windows Task Scheduler every 30 min: python F:/TITAN/scripts/watchdog_check.py.

Estimated savings: ~$12–$14/month (eliminates 95% of invocations; RED alerts are rare).

---

3. claude-code-audit-every-6h — already batched, push to daily + cache

Current pattern: Already downgraded to daily batch (good). But the SCOUT prompt is ~4K tokens of instructions re-sent cold every time. No prompt caching.

Local augmentation: Add a preflight_audit_check.py that runs before the batch job: (a) checks if npm view @anthropic-ai/claude-code version differs from the last audited version stored in F:/TITAN/state/cc-last-audited-version.txt; (b) if no version change, SKIP the LLM call entirely and write "no change" to the audit log. Most days the version hasn't changed — this eliminates ~80% of audit calls.

Prompt cache the system prompt: The 4K-token skill instruction block is identical every run. Use the Anthropic batch API with cache_control: ephemeral on the system block. Cache read = $0.0003 vs $0.012 per call.

Estimated savings: ~$0.50/month (small base, but pattern applies everywhere).

---

4. /feed (ORACLE intel ingestion) — local RSS before Perplexity

Current pattern: Every /feed run starts with Perplexity calls for all default topics regardless of whether anything has changed. ~10 Perplexity queries/run × $0.01 = $0.10/run. If run daily: $3/month.

Local-first augmentation: Write F:/TITAN/scripts/rss_check.py — fetches known RSS feeds (Anthropic blog, Claude Code releases GitHub atom, AWS ML blog) with feedparser. Writes new-item digests to staging without any LLM. ORACLE's Perplexity/WebSearch calls then fire only for: (a) curiosity queue items, (b) topics where RSS reported a new entry. Reduces Perplexity queries by ~60%.

Estimated savings: ~$1.80/month + ORACLE runs faster with smaller scope.

---

5. /daily-summary and /briefing — file reads, not agent calls

Current pattern: Both skills spin up a Sonnet agent whose primary work is: read some files, count lines, extract sections. These are file operations, not reasoning tasks.

Local replacement: F:/TITAN/scripts/daily_summary_builder.py — reads events.jsonl, plan-review files, journals, staging count, MEMORY.md line count. Outputs the structured summary as Markdown. No LLM needed for the data-gathering pass. Only if "mood analysis" or "next priorities" require judgment: pipe the raw data into a single Haiku call with structured output (JSON), < 500 tokens in.

The /briefing skill is already marked read-only — convert it to run the local script first, display the structured part, then optionally invoke Haiku for the "TODAY'S FOCUS" sentence only.

Estimated savings: $0.05/call × ~30 calls/month = $1.50/month. Faster startup (no agent spin-up).

---

Monthly savings summary

| Switch | $/month saved |

|---|---|

| nightly-report-writer local | $1.80 |

| watchdog pure-local | $12–14 |

| audit preflight skip | $0.50 |

| RSS before Perplexity | $1.80 |

| daily-summary / briefing local | $1.50 |

| Total | ~$17–20/month |

At current TITAN scale that's a 40–60% reduction in discretionary API spend while keeping all intelligent outputs intact.

---

Hook Recommendation — local-first-preflight

Add a PreToolUse hook (alongside titan-token-saver.py) named titan-local-first.py. Logic:


# Intercept Bash tool calls that match known "dumb data fetch" patterns.
# If the call is: reading a .jsonl / .log / .md file for summarization,
# check if a local summarizer script exists for that file.
# If yes: soft-block (exit 2) with the local alternative command.

LOCAL_ALTERNATIVES = {
    r"F:/TITAN/state/inbox-queue\.jsonl": "python F:/TITAN/scripts/inbox_peek.py",
    r"F:/TITAN/logs/events\.jsonl":       "python F:/TITAN/scripts/event_summary.py",
    r"F:/TITAN/logs/llm-costs\.jsonl":    "python F:/TITAN/scripts/cost_summary.py",
    r"watchdog\.log":                     "python F:/TITAN/scripts/watchdog_check.py --report",
}

The orchestrator sees the soft-block message, switches to the local script, and gets a pre-digested summary (100–500 tokens) instead of dumping a raw file (5,000–50,000 tokens) into context. This alone can cut 30–50% of input tokens on information-gathering passes.

Install path: C:/Users/Harnoor/.claude/hooks/titan-local-first.py (use titan_skill_writer.py to write it — bypass the approval bug).

---

Open Questions

1. Haiku 4.5 quality ceiling — some ORACLE summaries need voice-matching for The Agent Stack newsletter. Need a 10-call A/B: Haiku vs Sonnet for newsletter hero story. If Haiku quality is acceptable, newsletter cost drops from $0.05 to $0.01/run.

2. Prompt caching ROI — TITAN's scheduled tasks have stable system prompts (skill SKILL.md = same every run). Measuring actual cache hit rates requires instrumenting the API response headers. Add cache_hit: bool field to batch-cost-log.jsonl in the next batch job update.

3. Watchdog local script maintenance — the Python watchdog must be kept in sync with new TITAN state files as they're added. Consider a watchdog_manifest.json that lists monitored paths so the script doesn't need code changes for new files.

4. Claude Code Max plan vs API spend — Max plan ($200/mo) covers Claude.ai + Claude Code interactive sessions. The API-billed calls (Bedrock, direct Anthropic) are additive. The framework above reduces API spend; Max plan is sunk cost. Confirm which scheduled tasks hit Bedrock vs Claude Code Max to correctly attribute savings.