Newsletter Infrastructure Overhaul — Plan

Date: 2026-05-13 · Status: drafting (scout research running in parallel)

> Problem identified by Harnoor: newsletters repeat the same news across days. No indexed research. No timestamps. No multimedia variety. Same titles + summaries appear repeatedly. The shell looks nice but the engine is broken.

> Goal: every newsletter must feel fresh every single day. Same story across days OK — same title/angle/summary across days NEVER.

---

Root cause

Current architecture:

Each newsletter script (openclaw_newsletter.py etc.) starts fresh each run — no memory of what it sent yesterday
Source feeds are queried fresh each morning — but if the same news item is top-trending 3 days in a row (e.g. Anthropic release week), it lands in 3 issues with similar framing
No fingerprinting / dedup / freshness scoring
Generic "[emoji] The Agent Stack #042" subject lines that don't reflect what's actually inside
No timestamps on items inside the newsletter ("when did this break")
No multimedia diversity (today: usually text + 1 stat. Best-in-class: text + YouTube + screenshot + chart + quote)

---

New architecture (4 components)

1. STORY LEDGER (the brain)

File: F:/TITAN/state/newsletter-stories.sqlite

Schema:


CREATE TABLE stories (
  story_id      TEXT PRIMARY KEY,     -- sha256(entity_normalized + headline_keywords)
  first_seen    DATETIME NOT NULL,
  last_sent     DATETIME,
  send_count    INTEGER DEFAULT 0,
  sent_in       TEXT,                 -- JSON array of newsletter slugs that have used it
  entity        TEXT,                 -- "Anthropic", "OpenAI", "Linear" etc
  headline      TEXT,
  summary_hash  TEXT,                 -- sha256 of summary so we never re-use exact text
  source_url    TEXT,
  story_kind    TEXT,                 -- release, partnership, hire, funding, leak, opinion
  freshness     REAL,                 -- 0.0–1.0 decay score
  tags          TEXT                  -- json array
);
CREATE INDEX idx_freshness ON stories(freshness DESC, last_sent);
CREATE INDEX idx_entity_date ON stories(entity, first_seen);

Dedup rule on send: before any item lands in a newsletter draft:

1. Compute story_id from entity + headline-keywords

2. Look up in ledger

3. If sent in last 3 days and no new angle → SKIP

4. If sent in last 7 days but has a new development → ALLOWED, but Gemini must rewrite headline + summary completely (hash the new summary and compare it against the stored summary_hash to confirm the text actually changed)

5. Insert/update row after send

2. RESEARCH INDEXER (the eyes)

Script: F:/TITAN/scripts/newsletter_research_indexer.py (new)

Runs at 05:00 UTC daily, BEFORE any newsletter is generated.

Tasks:

1. Perplexity Pro queries (via pplx.py) — 6 queries per newsletter topic:

- "what shipped in {topic} in last 24 hours, with source URLs and timestamps"

- "what's trending on /r/{subreddit} in {topic} this morning"

- "GitHub trending repos in {topic} last 24h"

- "latest YouTube uploads from {channel-list} in {topic}"

- "latest blog posts from {author-list}"

- "biggest opinion / hot take in {topic} from past day"

2. Gemini Pro dedupes results across the 6 queries, scores freshness, generates a fingerprint per story

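3. YouTube Data API → pull thumbnail + duration for any embedded video (plain oEmbed is enough when only the thumbnail and embed HTML are needed; its response does not include duration)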

4. Screenshot service — for product launches, capture og:image or a Playwright screenshot

5. Output: F:/TITAN/state/research-index-{date}.jsonl — every story with id/entity/headline/summary/source-url/timestamp/media
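
A rough sketch of the two mechanical ends of the indexer: expanding the six query templates per topic, and writing the JSONL output. The Perplexity and Gemini calls in between are left out because their interfaces (pplx.py in particular) are not described here. The function names are hypothetical, and the placeholders {channel-list} and {author-list} are spelled with underscores so they work as Python format fields.

```python
import json
from pathlib import Path

# The six per-topic queries from the task list above.
QUERY_TEMPLATES = [
    "what shipped in {topic} in last 24 hours, with source URLs and timestamps",
    "what's trending on /r/{subreddit} in {topic} this morning",
    "GitHub trending repos in {topic} last 24h",
    "latest YouTube uploads from {channel_list} in {topic}",
    "latest blog posts from {author_list}",
    "biggest opinion / hot take in {topic} from past day",
]


def build_queries(topic: str, subreddit: str,
                  channels: list[str], authors: list[str]) -> list[str]:
    """Fill every template for one newsletter topic."""
    values = {
        "topic": topic,
        "subreddit": subreddit,
        "channel_list": ", ".join(channels),
        "author_list": ", ".join(authors),
    }
    return [template.format(**values) for template in QUERY_TEMPLATES]


def write_index(stories: list[dict], out_dir: str, date_str: str) -> Path:
    """Write one JSON object per line to research-index-{date}.jsonl."""
    path = Path(out_dir) / f"research-index-{date_str}.jsonl"
    with path.open("w", encoding="utf-8") as fh:
        for story in stories:
            fh.write(json.dumps(story, ensure_ascii=False) + "\n")
    return path
```

JSONL keeps each story on its own line, so a newsletter script can stream the file and stop early once it has picked enough items.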

3. NEWSLETTER DRAFTER (the writer)

Each newsletter script reads the indexed research, filters via ledger, picks N stories with diversity rules:

Diversity rule: max 2 stories per entity. No 3 Anthropic stories in one issue.
Freshness rule: 70% must be < 24h old, 20% can be 24-72h with new angle, 10% are evergreen ("here's what to remember from this week")
Multimedia rule: every issue gets at least 1 video embed + 1 screenshot + 1 quote pull
Subject line: Gemini generates a unique subject based on TOP story, not "Issue #N"

4. POST-SEND LEDGER UPDATE

After SES send, the script:

Marks every story sent with current timestamp
Increments send_count
Records which newsletter sent it
Backs up the rendered HTML to S3 for archive at newsletter-archive/<slug>/<date>.html

---

Multimedia handling (Harnoor explicitly asked)

| Type | Approach | Copyright |
|---|---|---|
| YouTube video | Embed <iframe> via oEmbed; thumbnail as fallback | Fair use: single embedded clip, attribution required, no re-upload |
| Product screenshot | Playwright/og:image scrape, attribute source, max 600×400 | Fair-use commentary; always credit + link |
| Quote | <15 words verbatim only; otherwise paraphrase | Already in copyright rules |
| Chart | Generate our own via Chart.js or Imagen 4 if data quoted | Avoid republishing source charts |
| Tweet/X post | Native blockquote with attribution | Standard embed rules |
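
Two of the rows above are easy to enforce mechanically. The sketch below builds the YouTube oEmbed request URL for a video and checks the quote-length rule; both helper names are hypothetical, and the network fetch is left out so the snippet runs offline.

```python
from urllib.parse import urlencode

YOUTUBE_OEMBED_ENDPOINT = "https://www.youtube.com/oembed"


def youtube_oembed_url(video_url: str) -> str:
    """Build the oEmbed request URL; the JSON response carries the
    thumbnail URL and embed HTML for the given video."""
    return YOUTUBE_OEMBED_ENDPOINT + "?" + urlencode(
        {"url": video_url, "format": "json"}
    )


def quote_is_short_enough(quote: str, max_words: int = 15) -> bool:
    """True if the verbatim quote stays under the 15-word limit."""
    return len(quote.split()) < max_words
```

A quote that fails the check should be paraphrased rather than trimmed mid-sentence.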

---

Architecture diagram


                                ┌────────────────────────────────┐
                                │  newsletter_research_indexer.py│
                                │   (cron 05:00 UTC daily)       │
                                └────────────┬───────────────────┘
                                             │
                          ┌──────────────────┼──────────────────┐
                          │                  │                  │
                  ┌───────▼──────┐  ┌────────▼─────┐  ┌─────────▼────────┐
                  │ Perplexity   │  │ Gemini Pro   │  │  YouTube oEmbed  │
                  │ (6 queries)  │  │ (dedupe +    │  │  + Playwright    │
                  │              │  │  fingerprint)│  │  screenshots     │
                  └───────┬──────┘  └────────┬─────┘  └─────────┬────────┘
                          │                  │                  │
                          └──────────────────┼──────────────────┘
                                             │
                            ┌────────────────▼─────────────────┐
                            │ research-index-{date}.jsonl      │
                            │  story_id · entity · headline ·  │
                            │  summary · source · media · ts   │
                            └────────────────┬─────────────────┘
                                             │
                  ┌──────────────────────────┼──────────────────────────┐
                  │                          │                          │
        ┌─────────▼────────┐       ┌─────────▼─────────┐       ┌─────────▼─────────┐
        │ openclaw_news.py │       │ agentic_ai_news.py│       │ claude_news.py    │
        │  08:00 UTC       │       │  08:15 UTC        │       │  08:30 UTC        │
        └─────────┬────────┘       └─────────┬─────────┘       └─────────┬─────────┘
                  │                          │                          │
                  └───── filter via STORY LEDGER (sqlite) ───────────────┘
                                             │
                              ┌──────────────▼──────────────┐
                              │   SES send → ledger update  │
                              │   archive HTML → S3         │
                              └─────────────────────────────┘

---

Cost estimate (per day, all 4 newsletters)

| Item | Cost |
|---|---|
| Perplexity Pro (Harnoor's existing Pro plan, no added cost) | $0 |
| Gemini Pro dedup + fingerprint (~50k tokens) | $0.15 |
| Gemini Flash subject-line + headline rewrites | $0.05 |
| YouTube oEmbed + Playwright screenshots | $0 (free + local) |
| SES sends (4 emails × 1 subscriber, Harnoor only for now) | $0.0004 |
| Total daily | ~$0.20 |

At 1,000 subscribers the only line that grows is SES: 4,000 emails/day at the same per-email rate is about $0.40, so roughly $0.60/day all-in. Cheap.
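
The daily figure can be recomputed for any subscriber count with a few lines. This sketch assumes the Gemini costs stay flat as the list grows and derives the per-email SES rate from the table ($0.0004 for 4 emails); the names are illustrative.

```python
GEMINI_PRO_DAILY = 0.15        # dedup + fingerprint, from the table
GEMINI_FLASH_DAILY = 0.05      # subject-line + headline rewrites
SES_COST_PER_EMAIL = 0.0004 / 4  # table row: 4 emails cost $0.0004


def daily_cost(subscribers: int, newsletters: int = 4) -> float:
    """Estimated all-in cost per day in USD for a given subscriber count."""
    ses = newsletters * subscribers * SES_COST_PER_EMAIL
    return GEMINI_PRO_DAILY + GEMINI_FLASH_DAILY + ses


if __name__ == "__main__":
    for count in (1, 1_000, 10_000):
        print(f"{count:>6} subscribers: ${daily_cost(count):.4f}/day")
```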

---

Implementation order (8 tracked tasks queued)

🔴 P0 — ship this week

1. newsletter-stories.sqlite (the ledger file named above) created with the schema above; empty DB ready

2. newsletter_research_indexer.py — Perplexity + Gemini + writes JSONL daily

3. newsletter_ledger.py — shared module for dedup lookup + update by every newsletter script

4. Retrofit openclaw_newsletter.py to use ledger

🟠 P1 — week 2

5. Retrofit agentic-ai + claude + agent-stack newsletter scripts

6. Add YouTube oEmbed support to newsletter templates (we already have 4 picked: Rolling Stone / Pop-Sci / Anthropic-brand / Comic-panel)

7. Add Playwright screenshot capture for product launches

🟡 P2 — week 3

8. Smart subject-line generator (Gemini Flash, dynamic per top story)

---

What's running right now

🛰 SCOUT doing deep research on top 10 newsletters' dedup strategies (Stratechery / TLDR / Morning Brew / Lenny's / Every / The Information / Hustle / Platformer / Axios / Pragmatic Engineer)
Output: F:/TITAN/plans/audits/NEWSLETTER-DEDUP-RESEARCH-2026-05-13.md (memo + emailed)

When SCOUT delivers, I'll merge its findings into this plan and then ship the P0 components.

---

Success metrics (how we know it's working)

| Metric | Target |
|---|---|
| Same headline appearing twice in same week across any 2 newsletters | 0 |
| Same summary text (>50% similarity) | 0 |
| Issues with embedded video | ≥1 per newsletter per week |
| Issues with screenshot | ≥1 per newsletter per week |
| Subject-line click-through (when we have analytics) | +30% vs current "Issue #N" |
| Reader survey "felt fresh" rating | ≥4/5 |
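
The ">50% similarity" metric needs a concrete measure, which the plan does not pick. One cheap option, shown here as an assumption rather than the chosen method, is a word-level ratio from the standard library's difflib; too_similar is a hypothetical name.

```python
from difflib import SequenceMatcher


def too_similar(summary_a: str, summary_b: str, threshold: float = 0.5) -> bool:
    """True if two summaries share more than `threshold` of their wording.

    Compares lowercase word sequences, so punctuation attached to a word
    counts as part of that word.
    """
    words_a = summary_a.lower().split()
    words_b = summary_b.lower().split()
    ratio = SequenceMatcher(None, words_a, words_b).ratio()
    return ratio > threshold
```

Unlike the exact summary_hash comparison, this catches summaries that were only lightly reworded, at the cost of comparing each new summary against recent ones pairwise.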