name: Token-First Compute (Prime Directive)

Rule: The prime directive is minimize token usage, not minimize cloud spend.

Coding, processing, file work, scripts, scheduled jobs → run locally on Harnoor's machine. Free.
Hosting, deployment, persistent services → run on AWS / Lambda / S3 / CloudFront as usual. Cloud spend is acceptable.
The trade-off being optimized: every token billed against Claude Code Max plan or Anthropic API. Local compute = $0 token cost. Cloud compute = pennies and unconstrained.

Why: Harnoor burnt 53% of his Claude Code credits in 2 days (2026-05-03). Routine pattern-matching, file ops, log scans, syntax checks, deploys, and template rendering don't need LLM reasoning — they need shell commands and Python scripts. Burning Max-plan tokens on grep is waste. Cloud hosting on the other hand is cheap and well-managed; no need to penny-pinch S3 or Lambda.

How to apply — before every agent spawn or LLM call, ask in this order:

1. Can Grep / Glob / Read answer it? → use those tools directly, no agent.

2. Can a local Python / Node / PowerShell / bash script do it? → write the script, run it locally.

3. Is this recurring? → schedule it as a local cron / Windows Task / mcp__scheduled-tasks__create_scheduled_task running a local script. The cron itself shouldn't invoke Claude unless the task genuinely needs prose.

4. Can cached / prior work answer it? → check F:/TITAN/knowledge/, recent session outputs, advisor memos.

5. Does this genuinely need LLM reasoning across ambiguous content? → THEN spawn an agent. Prefer Haiku 4.5 → Sonnet → Opus in cost order. Use Bedrock prompt caching and batch APIs where applicable.

Hosting / deployment is NOT covered by this rule — deploy whatever, wherever, on AWS or whatever cloud. The directive only applies to LLM TOKEN consumption during the work that produces the deploy.

Examples — token-cheap routes:

| Task | Bad (tokens) | Good (local CPU) |

|---|---|---|

| File inventory | scout/explore agent | Glob "*/.html" directly |

| Single-file edit | forge agent | Edit tool directly |

| Recurring digest | LLM each run | Python + Jinja2 + cron, LLM only for narrative bits |

| Syntax check | agent reads | node --check, python -m py_compile |

| Data transform / format conversion | batch LLM | local pandas / Python / pandoc |

| "Find all X in directory" | scout agent | Grep pattern |

| Watchdog audit | Sonnet agent | Python script reading logs (saves $12-14/mo) |

| Newsletter HTML assembly | LLM per article | Jinja2 template + structured staging data; LLM only for hero prose |

| Deploy a Lambda | agent | Local PowerShell + AWS CLI |

| Linting / formatting / tests | agent reads | ruff, black, eslint, pytest |

| AWS hosting / S3 / CloudFront | n/a | go ahead — cloud spend is fine |

When agents ARE the right call (LLM is the right tool):

Brand-aligned creative writing (newsletters, marketing copy, persona-driven content)
Cross-file architectural reasoning where the answer's structure isn't known
Voice/tone-matching content (Innerverse mirror, Cloud 8 academy lessons)
Open-ended research with synthesis across many sources

When agents ARE needed, prefer:

Haiku 4.5 over Sonnet/Opus for routine reasoning
Bedrock prompt caching (cache_point on system prompts)
Batch API for non-realtime work (50% off via titan-master-batch-nightly)
Single comprehensive agent over multiple narrow ones

Avoid:

Spawning multiple agents for what one well-briefed agent can do
Spawning a "research" agent when web search + a single Read would suffice
Re-spawning the same agent because the prompt was unclear (debug locally, give the next agent better context once)
Using the LLM as a calculator, dictionary, code formatter, regex engine, JSON parser, or shell wrapper

Cloud is not the enemy. Tokens are.