
SCOUT · Anthropic Batch API — Applied to TITAN

Memo ID: SCOUT-BATCH-API-APPLIED-TO-TITAN-2026-05-02

Type: Applied research / implementation proposal

Author: SCOUT (TITAN research arm)

Date: 2026-05-02

Status: READY FOR DARWIN REVIEW → FORGE EXECUTION

Predecessor research:

---

0. Executive summary

| | |
|---|---|
| Headline | Migrate ~14 of TITAN's 35 scheduled tasks to Batch API. Projected savings $48–$132/month at current volume; $240–$660/month if volume grows 5x per the growth plan. |
| Effort | ~6 engineering hours (one shared titan_batch.py helper + per-skill 10-line patches) |
| Risk | Low. Batch is async, so there is no UX impact. Fallback to real-time on batch failure. |
| Decision | GO. Ship Tier-1 (3 highest-leverage crons) this week. |

Why now:

1. We already have ≥14 cron jobs that hit Claude with 30-min-to-24h latency tolerance. They are paying full price for sync inference they don't need.

2. Anthropic's Batch API gives 50% off input + output. Stacks with prompt caching (cache_read at 10% of input) where we can preserve cache TTL.

3. The 300k max_tokens beta (output-300k-2026-03-24) unblocks multi-document synthesis we currently chunk around.

4. Innerverse's TC7 batch synthesis (already designed in DARWIN's tiering proposal) needs the same helper — building once amortizes both.

---

1. What Batch API actually is (recap, brief)

(Full mechanics live in DARWIN's tiering proposal §2.4 and the two market-intel memos. Recapped here for self-contained read.)

---

2. TITAN application surface — what migrates

I scanned every scheduled task in C:/Users/Harnoor/.claude/scheduled-tasks/ (35 total) and every skill in F:/TITAN/skills/ (25 total). Each was scored on three dimensions:

2.1 Tier 1 — ship this week (3 jobs, ~70% of savings)

| Job | Path | Frequency | Est. tokens/run | Latency tier | Effort | Notes |
|---|---|---|---|---|---|---|
| titan-daily-feed | skills/feed/SKILL.md | daily | ~25k in / ~8k out | Soft (overnight) | S | Already async in spirit. Classification of fetched articles is the batch-able work; Perplexity stays sync. |
| titan-daily-monologue | skills/monologue/SKILL.md | daily | ~40k in / ~3k out | Hard-batch | S | Self-reflection prose. No user blocks on this. |
| titan-weekly-benchmark | scheduled-tasks/titan-weekly-benchmark | weekly | ~120k in / ~60k out | Hard-batch | S | Eval runs across model tiers: a textbook batch use case. |

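Tier-1 monthly token estimate: ~2.43M input, ~0.57M output (30 runs each of feed and monologue, 4 benchmark runs).

Tier-1 monthly savings (Sonnet 4.6 pricing, $3/$15 → $1.50/$7.50 in batch): $15.84 sync → $7.92 batch, i.e. ~$7.92/mo saved.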

That number is small because TITAN's current Claude spend is small. The leverage shows up in Tiers 2–3.
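The Tier-1 figures can be reproduced from the per-run estimates in the table above. This is a minimal sketch; the runs-per-month counts (30 for daily jobs, 4 for weekly) and all function names are assumptions of the sketch, not part of the existing codebase.

```python
# Per-run token estimates copied from the Tier-1 table: (input, output, runs per month).
# Runs per month (30 daily, 4 weekly) are an assumption of this sketch.
TIER1 = {
    "titan-daily-feed": (25_000, 8_000, 30),
    "titan-daily-monologue": (40_000, 3_000, 30),
    "titan-weekly-benchmark": (120_000, 60_000, 4),
}

SYNC_PRICE = (3.00, 15.00)  # USD per million tokens (input, output), Sonnet 4.6
BATCH_DISCOUNT = 0.5        # batch is billed at 50% of the sync price


def monthly_tokens(jobs):
    """Sum input and output tokens over all runs in a month."""
    total_in = sum(tok_in * runs for tok_in, _, runs in jobs.values())
    total_out = sum(tok_out * runs for _, tok_out, runs in jobs.values())
    return total_in, total_out


def monthly_cost(tokens_in, tokens_out, prices=SYNC_PRICE, discount=1.0):
    """Cost in USD for the given token volume at per-million-token prices."""
    in_price, out_price = prices
    return (tokens_in * in_price + tokens_out * out_price) / 1_000_000 * discount


tokens_in, tokens_out = monthly_tokens(TIER1)
sync_cost = monthly_cost(tokens_in, tokens_out)
batch_cost = monthly_cost(tokens_in, tokens_out, discount=BATCH_DISCOUNT)
print(tokens_in, tokens_out)                      # 2430000 570000
print(round(sync_cost, 2), round(batch_cost, 2))  # 15.84 7.92
```

Swapping in a different job table or price tuple gives the same calculation for the other tiers.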

2.2 Tier 2 — next 2 weeks (8 jobs, infrastructure)

| Job | Frequency | Est. tokens/run | Notes |
|---|---|---|---|
| titan-daily-improve | daily | ~30k in / ~10k out | Code-change scan |
| titan-daily-newsletter | daily | ~50k in / ~6k out | Generation, not delivery |
| titan-weekly-dream | weekly | ~80k in / ~12k out | Memory consolidation |
| titan-weekly-review | weekly | ~60k in / ~8k out | |
| nightly-prompt-eval | nightly | ~100k in / ~20k out | Eval grid |
| nightly-report-writer | nightly | ~70k in / ~15k out | |
| nightly-pmf-d7 | nightly | ~25k in / ~4k out | |
| claude-code-audit-every-6h | 4×/day | ~15k in / ~3k out | 6h cadence is shorter than the 24h batch SLA: fits only while turnaround stays under 6h, otherwise a late batch overlaps the next run |

Tier-2 monthly token estimate: ~12M input, ~3M output

Tier-2 monthly savings: ~$40–$50/mo on top of Tier 1.

2.3 Tier 3 — month+ (3 jobs + Innerverse TC7)

| Job | Notes |
|---|---|
| titan-monthly-evolve | Heavy. Could use 300k output beta. |
| titan-design-audit-quarterly | Quarterly: savings small but cleanest fit |
| titan-agent-utilization-daily | Analytics |
| Innerverse TC7 weekly synthesis | Already designed in DARWIN tiering proposal. Wires through same titan_batch.py. |

2.4 Explicitly NOT batch — stay real-time

| Job | Reason |
|---|---|
| titan-bridge-watchdog, titan-revive-watch-1m, titan-heartbeat-monitor-15m | Health checks, sub-minute latency required |
| titan-hourly-progress-email | Per directive: must fire every hour, no async drift |
| silentinfinity-chat-smoke-10m | Smoke test, 10-min cadence |
| Any /cmd, /voice, /api/voice path | User-facing |
| Innerverse Silent Infinity live chat | User-facing |
| titan-inbox-watch | Email triage needs sub-hour |
| agentic-247-watchdog, swarm-health-orchestrator | Real-time control loops |

---

3. Cost model (current → projected)

3.1 Current monthly Claude spend (estimated, sync only)

| Workload class | Monthly tokens (in/out) | Cost |
|---|---|---|
| Tier-1 candidates (sync today) | 2.43M / 0.57M | $15.84 |
| Tier-2 candidates (sync today) | 12M / 3M | ~$80 |
| Tier-3 candidates (sync today) | ~3M / ~0.5M | ~$15 |
| Real-time (untouched) | ~5M / ~1M | ~$30 |
| Total | ~22M / ~5M | ~$140/mo |

3.2 Post-migration projection

| Workload class | Pricing applied | Cost |
|---|---|---|
| Tier-1 (batch) | $1.50/$7.50 | $7.92 |
| Tier-2 (batch) | $1.50/$7.50 | ~$40 |
| Tier-3 (batch) | $1.50/$7.50 | ~$7.50 |
| Real-time (untouched) | $3/$15 | ~$30 |
| Total | | ~$85/mo |

Projected monthly savings: ~$55/mo at current volume.

3.3 At 5x volume (12-month outlook per growth plan)

Same workload mix scaled:

This is the case for building the helper now even though Tier-1 alone looks like coffee money.
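A rough way to sanity-check the 5x outlook is to scale the two cost tables in 3.1 and 3.2. The sketch below assumes every workload class grows linearly with volume and that prices stay fixed; both are simplifications. The function name `project` is illustrative only.

```python
# Monthly cost estimates (USD) from the tables in 3.1 and 3.2:
# workload class -> (sync cost today, cost after migration).
COSTS = {
    "tier1": (15.84, 7.92),
    "tier2": (80.00, 40.00),
    "tier3": (15.00, 7.50),
    "realtime": (30.00, 30.00),
}


def project(costs, volume_multiplier=1.0):
    """Scale every workload class linearly (an assumption) and total it."""
    sync_total = sum(sync for sync, _ in costs.values()) * volume_multiplier
    post_total = sum(post for _, post in costs.values()) * volume_multiplier
    return sync_total, post_total, sync_total - post_total


for mult in (1, 5):
    sync_total, post_total, saved = project(COSTS, mult)
    print(f"{mult}x: sync ${sync_total:.2f}, post-migration ${post_total:.2f}, saved ${saved:.2f}")
```

Under these assumptions the monthly saving is about $55 at current volume and about $277 at 5x.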

---

4. Implementation plan

4.1 Build titan_batch.py (the one shared helper)

Location: F:/TITAN/scripts/titan_batch.py

Interface:


def submit(requests: list[dict], *, tag: str, model: str = "claude-sonnet-4-6") -> str:
    """
    requests: [{"custom_id": str, "messages": [...], "max_tokens": int, ...}]
    tag: workflow identifier (e.g. "feed-classify-2026-05-02")
    Returns: batch_id. Persists to F:/TITAN/state/batch-jobs.jsonl.
    """

def poll_and_collect(batch_id: str) -> list[dict] | None:
    """
    Returns parsed results when batch ends, None while in_progress.
    Writes results to F:/TITAN/state/batch-results/<batch_id>.jsonl.
    """

def reconcile(tag: str) -> dict:
    """
    Used by /batches dashboard. Returns lifecycle summary for a tag.
    """

custom_id convention: <workflow>-<YYYY-MM-DD>-<seq> (e.g. feed-classify-2026-05-02-007). This is your traceability hook — every result must map back to a known input.
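One way to enforce the convention is to build and parse IDs through a single pair of functions. This is a sketch only: `make_custom_id` and `parse_custom_id` are hypothetical names, and the character and length limit encoded in the regex (1 to 64 letters, digits, underscores or hyphens) is my understanding of the API constraint and should be checked against the current Anthropic docs.

```python
import re
from datetime import date

# Assumption: custom_id must be 1-64 characters of letters, digits, "_" or "-".
# Verify against the current Batch API documentation before relying on it.
CUSTOM_ID_RE = re.compile(r"[a-zA-Z0-9_-]{1,64}")


def make_custom_id(workflow: str, day: date, seq: int) -> str:
    """Build '<workflow>-<YYYY-MM-DD>-<seq>' with a zero-padded sequence number."""
    custom_id = f"{workflow}-{day.isoformat()}-{seq:03d}"
    if not CUSTOM_ID_RE.fullmatch(custom_id):
        raise ValueError(f"invalid custom_id: {custom_id!r}")
    return custom_id


def parse_custom_id(custom_id: str) -> tuple[str, date, int]:
    """Invert make_custom_id so a result row maps back to its input."""
    # Split from the right: the workflow name itself may contain hyphens.
    workflow, year, month, day, seq = custom_id.rsplit("-", 4)
    return workflow, date(int(year), int(month), int(day)), int(seq)


print(make_custom_id("feed-classify", date(2026, 5, 2), 7))  # feed-classify-2026-05-02-007
```

Parsing from the right keeps hyphenated workflow names such as feed-classify intact.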

State file: F:/TITAN/state/batch-jobs.jsonl — one line per submission with {batch_id, tag, submitted_ts, request_count, status, results_url, completed_ts}. Append-only.
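A minimal sketch of how `submit` could wrap the Anthropic Python SDK and record state. Assumptions: the flat request dicts are wrapped into the `{"custom_id", "params"}` shape at submit time, timestamps are epoch seconds, and the `client` and `state_file` parameters are additions for testability that are not in the interface above. Since the file is append-only, a later status change would be a new line, with readers taking the last line per batch_id; that reading is also an assumption.

```python
import json
import time
from pathlib import Path

STATE_FILE = Path("F:/TITAN/state/batch-jobs.jsonl")  # path from this memo


def append_state(record: dict, state_file: Path = STATE_FILE) -> None:
    """Append one JSON line; existing lines are never rewritten."""
    state_file.parent.mkdir(parents=True, exist_ok=True)
    with state_file.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def submit(requests, *, tag, model="claude-sonnet-4-6", client=None, state_file=STATE_FILE):
    """Create one message batch and record it in the state file. Returns batch_id."""
    if client is None:
        import anthropic  # imported lazily so the module loads without the SDK
        client = anthropic.Anthropic()
    # Wrap each flat request in the {"custom_id", "params"} shape the Batch API expects.
    wrapped = [
        {
            "custom_id": req["custom_id"],
            "params": {"model": model, **{k: v for k, v in req.items() if k != "custom_id"}},
        }
        for req in requests
    ]
    batch = client.messages.batches.create(requests=wrapped)
    append_state(
        {
            "batch_id": batch.id,
            "tag": tag,
            "submitted_ts": time.time(),
            "request_count": len(wrapped),
            "status": batch.processing_status,
            "results_url": None,
            "completed_ts": None,
        },
        state_file,
    )
    return batch.id
```

Passing a stub `client` lets the helper be exercised in tests without network access or an API key.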

4.2 Result-fetch cron

New scheduled task: titan-batch-poll-15m (every 15 minutes)


# scheduled-tasks/titan-batch-poll-15m/SKILL.md
# Read open batch jobs from state/batch-jobs.jsonl
# For each in_progress: poll. If ended: download results, write to batch-results/, mark complete.
# Errored requests → state/batch-errors.jsonl for replay.

A 15-minute cadence is a reasonable balance: most batches finish in 5min–4h, so polling more often wastes API quota, while polling less often delays downstream consumers.
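The poll step could look roughly like the sketch below. It assumes the state file is read as an append-only log where the last line per batch_id wins, and that `client` is an `anthropic.Anthropic()` instance (or a stub with the same `messages.batches.retrieve` and `messages.batches.results` methods). `latest_jobs` and `poll_once` are hypothetical names.

```python
import json


def latest_jobs(lines):
    """Collapse an append-only log to the most recent record per batch_id."""
    latest = {}
    for line in lines:
        rec = json.loads(line)
        latest[rec["batch_id"]] = rec
    return latest


def poll_once(client, jobs):
    """Check every open batch once.

    Returns (completed, errors): completed maps batch_id to its succeeded
    result entries; errors lists every request that did not succeed.
    """
    completed, errors = {}, []
    for batch_id, rec in jobs.items():
        if rec["status"] == "ended":
            continue  # already collected on an earlier poll
        batch = client.messages.batches.retrieve(batch_id)
        if batch.processing_status != "ended":
            continue  # still in progress; try again next cycle
        rows = []
        for entry in client.messages.batches.results(batch_id):
            if entry.result.type == "succeeded":
                rows.append(entry)
            else:
                # errored, canceled or expired: keep for replay
                errors.append({"batch_id": batch_id,
                               "custom_id": entry.custom_id,
                               "type": entry.result.type})
        completed[batch_id] = rows
    return completed, errors
```

The caller would then write `completed` to batch-results/ and `errors` to state/batch-errors.jsonl, and append an "ended" record for each finished batch.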

4.3 Per-skill migration pattern (Tier-1 example: feed)

Before:


for article in articles:
    # max_tokens is a required argument of messages.create
    resp = client.messages.create(model="claude-sonnet-4-6", max_tokens=800, messages=[...])
    classify(article, resp)

After:


import titan_batch
# custom_id follows the <workflow>-<YYYY-MM-DD>-<seq> convention from 4.1 (zero-padded seq)
requests = [{"custom_id": f"feed-classify-{today}-{i:03d}", "messages": [...], "max_tokens": 800}
            for i, article in enumerate(articles)]
batch_id = titan_batch.submit(requests, tag=f"feed-classify-{today}")
# Pickup happens via titan-batch-poll-15m → writes to staging/feed-results-<date>.jsonl
# Newsletter cron reads from staging on next fire

4.4 Dashboard panel — /batches on TITAN bridge

Add a route to titan_bridge.py that reads state/batch-jobs.jsonl and renders:

4.5 Telemetry & cost attribution

Tag every batch with the calling workflow. On completion, log:


{"ts": "...", "tag": "feed-classify", "input_tokens": 24500, "output_tokens": 7800,
 "model": "claude-sonnet-4-6", "batch_savings_usd": 0.10}

to F:/TITAN/state/batch-cost-log.jsonl. The hourly progress email reads this for the "saved this hour" line.
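The `batch_savings_usd` field can be derived from the token counts rather than hand-entered. A minimal sketch, assuming savings are defined as the sync cost minus the batch cost at the 50% discount; the price table and function names are illustrative.

```python
import json

# Sync prices in USD per million tokens (input, output); Sonnet figures from this memo.
SYNC_PRICES = {"claude-sonnet-4-6": (3.00, 15.00)}
BATCH_DISCOUNT = 0.5  # batch requests are billed at 50% of the sync price


def batch_savings_usd(model, input_tokens, output_tokens):
    """Dollars saved by running this volume through batch instead of sync."""
    in_price, out_price = SYNC_PRICES[model]
    sync_cost = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
    return round(sync_cost * BATCH_DISCOUNT, 6)


def cost_log_line(ts, tag, model, input_tokens, output_tokens):
    """Serialize one record in the batch-cost-log.jsonl shape shown above."""
    return json.dumps({
        "ts": ts,
        "tag": tag,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "model": model,
        "batch_savings_usd": batch_savings_usd(model, input_tokens, output_tokens),
    })


print(batch_savings_usd("claude-sonnet-4-6", 24_500, 7_800))  # 0.09525
```

For the sample record's token counts this works out to roughly ten cents per run.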

---

5. Operational risks & mitigations

| Risk | Likelihood | Mitigation |
|---|---|---|
| 24h SLA missed → stale daily digest | Low | Submit at T-26h. Fallback: if not done at T-2h, trigger sync re-run for missing custom_ids only. |
| Orphaned batches (we lose track of batch_id) | Medium | Write each batch_id to state/batch-jobs.jsonl immediately after the create call returns (the ID does not exist before submission). Reconciler scans this file on every poll; a batch lost to a crash in between can be recovered by listing batches through the API. |
| Errored requests silently dropped | Medium | state/batch-errors.jsonl + alert in hourly digest if >5 errors/day. |
| Cache TTL collision with batch latency | High | Explicit: do NOT use 5min cache on batch payloads. Use 1h beta header OR accept uncached input at the batch rate. |
| Anthropic rate-limits batch endpoint | Low | Already-async workflows; fallback to sync is acceptable behavior if rate-limited. |
| Helper bug → all crons fail | Medium | Wrap every call in try/except; on submit failure, fall back to sync inline. Never block the cron. |
| Billing surprise (canceled batches still billed) | Low | Anthropic bills only for succeeded requests, so requests that finished before a cancel are still charged. Document this assumption in helper. |
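Two of the mitigations above (sync re-run for missing custom_ids, and sync fallback on submit failure) reduce to small pure functions. The sketch below uses hypothetical names and assumes result rows are dicts with `custom_id` and `status` keys; `submit_batch` and `run_sync` stand in for the real batch and real-time call paths.

```python
def missing_custom_ids(submitted_ids, results):
    """IDs that were submitted but have no succeeded result yet."""
    done = {row["custom_id"] for row in results if row.get("status") == "succeeded"}
    return [cid for cid in submitted_ids if cid not in done]


def run_with_fallback(requests, submit_batch, run_sync):
    """Try the batch path; on any submit failure run every request synchronously."""
    try:
        return {"mode": "batch", "batch_id": submit_batch(requests)}
    except Exception as exc:  # broad on purpose: the cron must never be blocked
        return {
            "mode": "sync",
            "error": repr(exc),
            "results": [run_sync(req) for req in requests],
        }


# At T-2h, only the gaps are re-run synchronously.
submitted = ["feed-classify-2026-05-02-000", "feed-classify-2026-05-02-001"]
results = [{"custom_id": "feed-classify-2026-05-02-000", "status": "succeeded"}]
print(missing_custom_ids(submitted, results))  # ['feed-classify-2026-05-02-001']
```

Logging the `mode` field on each run gives the sync-fallback count directly.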

---

6. KPIs (post-launch, 30-day review)

1. % of batchable cron token-volume actually on Batch API — target ≥80%.

2. $ saved per month vs. sync baseline — target ≥$40/mo at current volume.

3. Batch turnaround time (submit to ended): target ≤4h p50, ≤12h p95.

4. Batch error rate — target ≤1% of requests.

5. Helper outages (batch submit failed → sync fallback fired) — target ≤5/month.

Surface all five on /batches.
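The turnaround KPI can be computed straight from the state records. This sketch assumes `submitted_ts` and `completed_ts` are epoch seconds and uses a nearest-rank percentile; the sample records are made up for illustration and are not measurements.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def turnaround_hours(jobs):
    """Hours from submit to completion, skipping batches that have not ended."""
    return [
        (job["completed_ts"] - job["submitted_ts"]) / 3600
        for job in jobs
        if job.get("completed_ts") is not None
    ]


# Illustrative records only (timestamps in epoch seconds).
jobs = [
    {"submitted_ts": 0, "completed_ts": 1800},
    {"submitted_ts": 0, "completed_ts": 3600},
    {"submitted_ts": 0, "completed_ts": 7200},
    {"submitted_ts": 0, "completed_ts": 36000},
    {"submitted_ts": 0, "completed_ts": None},  # still in progress, excluded
]
hours = turnaround_hours(jobs)
print(percentile(hours, 50), percentile(hours, 95))  # 1.0 10.0
```

With few batches per month the p95 is dominated by the single slowest batch, so it is worth reading alongside the raw count.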

---

7. Recommendation

GO. Ship in three slices:

| Slice | Owner | Deadline | Outcome |
|---|---|---|---|
| A: Build titan_batch.py + titan-batch-poll-15m cron + /batches dashboard | FORGE | end of week 2026-W18 | Helper LIVE |
| B: Migrate Tier-1 (feed, monologue, weekly-benchmark) | FORGE | 2026-05-09 | First savings on the books |
| C: Migrate Tier-2 + Innerverse TC7 | FORGE | 2026-05-23 | ~$50/mo run-rate savings, helper proven across 8+ workflows |

Tier-3 is opportunistic — convert when other work touches those skills.

---

8. Open questions (for Harnoor)

1. Cache strategy on batch: Accept loss of 5-min cache (simpler) or wire 1h cache beta (more savings but more code)? Recommendation: skip caching on batch payloads in v1; revisit if a single workflow exceeds $20/mo.

2. Bedrock batch: Worth keeping Innerverse-Bedrock and TITAN-direct-API as separate batch lanes, or unify under direct API for consistency? Recommendation: unify on direct API; Innerverse can keep Bedrock for sync user-facing chat where sovereignty matters.

3. 300k output beta: Any current TITAN workflow that would benefit (i.e., currently chunking around output limits)? Candidate: titan-monthly-evolve. Defer to Tier-3.

---

9. References

1. DARWIN — plans/advisors/DARWIN-MODEL-TIERING-PROPOSAL-v1-2026-04-21.md (TC7 batch synthesis design, batch pricing math)

2. SCOUT intel — knowledge/memory/warm/market-intel/intel_model_deprecations_june2026_20260421.md (300k beta, model retirement dates)

3. SCOUT intel — knowledge/memory/warm/market-intel/intel_opus47_api_breaking_changes_20260424.md (output-300k-2026-03-24 header, Haiku 3 retirement)

4. Anthropic docs — docs.anthropic.com/en/api/creating-message-batches (mechanics, retrieved per existing memos)

5. Anthropic pricing — www.anthropic.com/pricing (50% batch discount)

---

— SCOUT, 2026-05-02. Findings only. FORGE owns the build.