SCOUT · Anthropic Batch API — Applied to TITAN

Memo ID: SCOUT-BATCH-API-APPLIED-TO-TITAN-2026-05-02

Type: Applied research / implementation proposal

Author: SCOUT (TITAN research arm)

Date: 2026-05-02

Status: READY FOR DARWIN REVIEW → FORGE EXECUTION

Predecessor research:

DARWIN-MODEL-TIERING-PROPOSAL-v1-2026-04-21.md (TC7 batch synthesis design)
knowledge/memory/warm/market-intel/intel_model_deprecations_june2026_20260421.md (300k batch beta)
knowledge/memory/warm/market-intel/intel_opus47_api_breaking_changes_20260424.md (max_tokens=300k via output-300k-2026-03-24)

---

0. Executive summary

| | |

|---|---|

| Headline | Migrate ~14 of TITAN's 35 scheduled tasks to Batch API. Projected savings $48–$132/month at current volume; $320–$880/month if volume 5x's per growth plan. |

| Effort | ~6 engineering hours (one shared titan_batch.py helper + per-skill 10-line patches) |

| Risk | Low. Batch is async — no UX impact. Fallback to real-time on batch failure. |

| Decision | GO. Ship Tier-1 (3 highest-leverage crons) this week. |

Why now:

1. We already have ≥14 cron jobs that hit Claude with 30-min-to-24h latency tolerance. They are paying full price for sync inference they don't need.

2. Anthropic's Batch API gives 50% off input + output. Stacks with prompt caching (cache_read at 10% of input) where we can preserve cache TTL.

3. The 300k max_tokens beta (output-300k-2026-03-24) unblocks multi-document synthesis we currently chunk around.

4. Innerverse's TC7 batch synthesis (already designed in DARWIN's tiering proposal) needs the same helper — building once amortizes both.

---

1. What Batch API actually is (recap, brief)

(Full mechanics live in DARWIN's tiering proposal §2.4 and the two market-intel memos. Recapped here for self-contained read.)

Endpoint: POST /v1/messages/batches (Python: client.messages.batches.create(...))
Shape: requests=[{custom_id: str, params: {model, max_tokens, messages, ...}}] — params is a normal Messages API call, batched.
SLA: 24h. Real-world: typically 5min–4h for batches under 10k requests.
Pricing: 50% off both input and output tokens vs. real-time on the same model.
Compatibility: Sonnet 4.6, Opus 4.6, Haiku 4.5, Opus 4.7. Tool use ✓, vision ✓, prompt caching ✓ (caveat below), files API ✓, citations ✓, extended thinking ✓.
Cache TTL collision: 5-min cache TTL is incompatible with batch dispatch latency. Use cache_control: ephemeral + 1h TTL beta for batch workloads, OR accept full-price input on the parts where caching is moot.
300k output: extra_headers={"anthropic-beta": "output-300k-2026-03-24"} raises max_tokens cap to 300k for Opus 4.6 / Sonnet 4.6. Only available in batch.
Bedrock batch: Separate API (S3 in, S3 out, JSONL). Not what we want unless we're already on Bedrock for sovereignty.
Polling: client.messages.batches.retrieve(batch_id) until processing_status == "ended", then download results_url (JSONL, one row per custom_id).

---

2. TITAN application surface — what migrates

I scanned every scheduled task in C:/Users/Harnoor/.claude/scheduled-tasks/ (35 total) and every skill in F:/TITAN/skills/ (25 total). Each was scored on three dimensions:

Latency tolerance: Real-time (<60s) / Soft (mins–1h) / Hard-batch (1h–24h)
Volume: Tokens per fire × fires per month
Migration effort: S = drop-in (≤30 LOC), M = needs result reconciler, L = needs schema changes

2.1 Tier 1 — ship this week (3 jobs, ~70% of savings)

|---|---|---|---|---|---|---|

Tier-1 monthly token estimate:

Input: (25k × 30) + (40k × 30) + (120k × 4) = 2.43M tokens/month
Output: (8k × 30) + (3k × 30) + (60k × 4) = 0.57M tokens/month

Tier-1 monthly savings (Sonnet 4.6 pricing, $3/$15 → $1.50/$7.50 in batch):

Real-time cost: 2.43M × $3 + 0.57M × $15 = $7.29 + $8.55 = $15.84/mo
Batch cost: $7.92/mo
Tier-1 savings: ~$7.92/mo at current volume

That number is small because TITAN's current Claude spend is small. The leverage shows up in Tiers 2–3.

2.2 Tier 2 — next 2 weeks (8 jobs, infrastructure)

|---|---|---|---|

Tier-2 monthly token estimate: ~12M input, ~3M output

Tier-2 monthly savings: ~$40–$50/mo on top of Tier 1.

2.3 Tier 3 — month+ (3 jobs + Innerverse TC7)

| Job | Notes |

|---|---|

| titan-monthly-evolve | Heavy. Could use 300k output beta. |

| titan-design-audit-quarterly | Quarterly — savings small but cleanest fit |

| titan-agent-utilization-daily | Analytics |

| Innerverse TC7 weekly synthesis | Already designed in DARWIN tiering proposal. Wires through same titan_batch.py. |

2.4 Explicitly NOT batch — stay real-time

| Job | Reason |

|---|---|

| titan-bridge-watchdog, titan-revive-watch-1m, titan-heartbeat-monitor-15m | Health checks, sub-minute latency required |

| titan-hourly-progress-email | Per directive — must fire every hour, no async drift |

| silentinfinity-chat-smoke-10m | Smoke test, 10-min cadence |

| Any /cmd, /voice, /api/voice path | User-facing |

| Innerverse Silent Infinity live chat | User-facing |

| titan-inbox-watch | Email triage needs sub-hour |

| agentic-247-watchdog, swarm-health-orchestrator | Real-time control loops |

---

3. Cost model (current → projected)

3.1 Current monthly Claude spend (estimated, sync only)

| Workload class | Monthly tokens (in/out) | Cost |

|---|---|---|

| Tier-1 candidates (sync today) | 2.43M / 0.57M | $15.84 |

| Tier-2 candidates (sync today) | 12M / 3M | ~$80 |

| Tier-3 candidates (sync today) | ~3M / ~0.5M | ~$15 |

| Real-time (untouched) | ~5M / ~1M | ~$30 |

| Total | ~22M / ~5M | ~$140/mo |

3.2 Post-migration projection

| Workload class | Pricing applied | Cost |

|---|---|---|

| Tier-1 (batch) | $1.50/$7.50 | $7.92 |

| Tier-2 (batch) | $1.50/$7.50 | ~$40 |

| Tier-3 (batch) | $1.50/$7.50 | ~$7.50 |

| Real-time (untouched) | $3/$15 | ~$30 |

| Total | | ~$85/mo |

Projected monthly savings: ~$55/mo at current volume.

3.3 At 5x volume (12-month outlook per growth plan)

Same workload mix scaled:

Sync-only baseline: ~$700/mo
Post-migration: ~$425/mo
Savings: ~$275/mo · ~$3,300/yr

This is the case for building the helper now even though Tier-1 alone looks like coffee money.

---

4. Implementation plan

4.1 Build `titan_batch.py` (the one shared helper)

Location: F:/TITAN/scripts/titan_batch.py

Interface:


def submit(requests: list[dict], *, tag: str, model: str = "claude-sonnet-4-6") -> str:
    """
    requests: [{"custom_id": str, "messages": [...], "max_tokens": int, ...}]
    tag: workflow identifier (e.g. "feed-classify-2026-05-02")
    Returns: batch_id. Persists to F:/TITAN/state/batch-jobs.jsonl.
    """

def poll_and_collect(batch_id: str) -> list[dict] | None:
    """
    Returns parsed results when batch ends, None while in_progress.
    Writes results to F:/TITAN/state/batch-results/<batch_id>.jsonl.
    """

def reconcile(tag: str) -> dict:
    """
    Used by /batches dashboard. Returns lifecycle summary for a tag.
    """

custom_id convention: <workflow>-<YYYY-MM-DD>-<seq> (e.g. feed-classify-2026-05-02-007). This is your traceability hook — every result must map back to a known input.

State file: F:/TITAN/state/batch-jobs.jsonl — one line per submission with {batch_id, tag, submitted_ts, request_count, status, results_url, completed_ts}. Append-only.

4.2 Result-fetch cron

New scheduled task: titan-batch-poll-15m (every 15 minutes)


# scheduled-tasks/titan-batch-poll-15m/SKILL.md
# Read open batch jobs from state/batch-jobs.jsonl
# For each in_progress: poll. If ended: download results, write to batch-results/, mark complete.
# Errored requests → state/batch-errors.jsonl for replay.

15-minute cadence is the right balance — most batches finish in 5min–4h, polling more often wastes API quota; polling less often delays downstream consumers.

4.3 Per-skill migration pattern (Tier-1 example: `feed`)

Before:


for article in articles:
    resp = client.messages.create(model="claude-sonnet-4-6", messages=[...])
    classify(article, resp)

After:


import titan_batch
requests = [{"custom_id": f"feed-{today}-{i}", "messages": [...], "max_tokens": 800}
            for i, article in enumerate(articles)]
batch_id = titan_batch.submit(requests, tag=f"feed-{today}")
# Pickup happens via titan-batch-poll-15m → writes to staging/feed-results-<date>.jsonl
# Newsletter cron reads from staging on next fire

4.4 Dashboard panel — `/batches` on TITAN bridge

Add a route to titan_bridge.py that reads state/batch-jobs.jsonl and renders:

KPI strip: open jobs, completed today, errored requests, $ saved this month (from token telemetry)
Lifecycle table: tag · submitted · status · request_count · ETA / completed_at
Cost-attribution chart: $ spent per workflow tag

4.5 Telemetry & cost attribution

Tag every batch with the calling workflow. On completion, log:


{"ts": "...", "tag": "feed-classify", "input_tokens": 24500, "output_tokens": 7800,
 "model": "claude-sonnet-4-6", "batch_savings_usd": 0.21}

to F:/TITAN/state/batch-cost-log.jsonl. The hourly progress email reads this for the "saved this hour" line.

---

5. Operational risks & mitigations

| Risk | Likelihood | Mitigation |

|---|---|---|

| 24h SLA missed → stale daily digest | Low | Submit at T-26h. Fallback: if not done at T-2h, trigger sync re-run for missing custom_ids only. |

| Orphaned batches (we lose track of batch_id) | Medium | All batch_ids written to state/batch-jobs.jsonl before submission. Reconciler scans this on every poll. |

| Errored requests silently dropped | Medium | state/batch-errors.jsonl + alert in hourly digest if >5 errors/day. |

| Cache TTL collision with batch latency | High | Explicit: do NOT use 5min cache on batch payloads. Use 1h beta header OR accept full-price input. |

| Anthropic rate-limits batch endpoint | Low | Already-async workflows; fallback to sync is acceptable behavior if ratelimited. |

| Helper bug → all crons fail | Medium | Wrap every call in try/except; on submit failure, fall back to sync inline. Never block the cron. |

| Billing surprise (canceled batches still billed) | Low | Anthropic bills only for succeeded requests. Document this assumption in helper. |

---

6. KPIs (post-launch, 30-day review)

1. % of batchable cron token-volume actually on Batch API — target ≥80%.

2. $ saved per month vs. sync baseline — target ≥$40/mo at current volume.

3. Mean batch turnaround time — target ≤4h p50, ≤12h p95.

4. Batch error rate — target ≤1% of requests.

5. Helper outages (batch submit failed → sync fallback fired) — target ≤5/month.

Surface all five on /batches.

---

7. Recommendation

GO. Ship in three slices:

|---|---|---|---|

Tier-3 is opportunistic — convert when other work touches those skills.

---

8. Open questions (for Harnoor)

1. Cache strategy on batch: Accept loss of 5-min cache (simpler) or wire 1h cache beta (more savings but more code)? Recommendation: skip caching on batch payloads in v1; revisit if a single workflow exceeds $20/mo.

2. Bedrock batch: Worth keeping Innerverse-Bedrock and TITAN-direct-API as separate batch lanes, or unify under direct API for consistency? Recommendation: unify on direct API; Innerverse can keep Bedrock for sync user-facing chat where sovereignty matters.

3. 300k output beta: Any current TITAN workflow that would benefit (i.e., currently chunking around output limits)? Candidate: titan-monthly-evolve. Defer to Tier-3.

---

9. References

1. DARWIN — plans/advisors/DARWIN-MODEL-TIERING-PROPOSAL-v1-2026-04-21.md (TC7 batch synthesis design, batch pricing math)

2. SCOUT intel — knowledge/memory/warm/market-intel/intel_model_deprecations_june2026_20260421.md (300k beta, model retirement dates)

3. SCOUT intel — knowledge/memory/warm/market-intel/intel_opus47_api_breaking_changes_20260424.md (output-300k-2026-03-24 header, Haiku 3 retirement)

4. Anthropic docs — docs.anthropic.com/en/api/creating-message-batches (mechanics, retrieved per existing memos)

5. Anthropic pricing — www.anthropic.com/pricing (50% batch discount)

---

— SCOUT, 2026-05-02. Findings only. FORGE owns the build.

SCOUT · Anthropic Batch API — Applied to TITAN

0. Executive summary

1. What Batch API actually is (recap, brief)

2. TITAN application surface — what migrates

2.1 Tier 1 — ship this week (3 jobs, ~70% of savings)

2.2 Tier 2 — next 2 weeks (8 jobs, infrastructure)

2.3 Tier 3 — month+ (3 jobs + Innerverse TC7)

2.4 Explicitly NOT batch — stay real-time

3. Cost model (current → projected)

3.1 Current monthly Claude spend (estimated, sync only)

3.2 Post-migration projection

3.3 At 5x volume (12-month outlook per growth plan)

4. Implementation plan

4.1 Build titan_batch.py (the one shared helper)

4.2 Result-fetch cron

4.3 Per-skill migration pattern (Tier-1 example: feed)

4.4 Dashboard panel — /batches on TITAN bridge

4.5 Telemetry & cost attribution

5. Operational risks & mitigations

6. KPIs (post-launch, 30-day review)

7. Recommendation

8. Open questions (for Harnoor)

9. References

4.1 Build `titan_batch.py` (the one shared helper)

4.3 Per-skill migration pattern (Tier-1 example: `feed`)

4.4 Dashboard panel — `/batches` on TITAN bridge