Version: v1 · 2026-04-21 · HERALD
Authority: design doc, build M1 on approval
Rough-Ask: R0112
Companion: USER-FEEDBACK-SYSTEM-2026-04-21.md
> Harnoor: "dashboard from the data analyst — per user cost, average user cost, maximum cost, days/times, how many users, demographics. All the data should be collected. CloudFront may be a good place. Comprehensive plan. Nice format. Three tiers. PhD-level."
---
Silent Infinity has six users and 189 conversations. We have just enough data to start instrumenting — and we are at the exact inflection point where instrumenting now prevents every founder's nightmare twelve months from now: "we scaled to 10k users and don't know what any of them are doing."
Three PhD-stacked disciplines govern what we build:
1. Information architecture (Ralph Kimball 1996, The Data Warehouse Toolkit) — the discipline of modeling events, facts, and dimensions into a queryable shape.
2. Behavioral analytics (Amy Heineike's work on product instrumentation; Hofmann 2014 on implicit feedback) — what events to capture, what to ignore.
3. Privacy-by-design (Cavoukian 2010, Privacy by Design; GDPR Art. 25) — data minimization, purpose limitation, storage limitation. A wellness app cannot treat user data casually.
These three must balance. Over-collect and we inherit liability + breach surface. Under-collect and we fly blind. The architecture below is the middle path.
---
The standard warehouse pattern is Bronze → Silver → Gold (Databricks / Lakehouse pattern; Ralph Kimball's variant is Staging → Integration → Presentation). Same idea.
What's in it:
- innerverse-logs-bronze/cf/
- innerverse-logs-bronze/lambda/

Format: newline-delimited JSON, one line per event, partitioned by dt=YYYY-MM-DD/hour=HH. Gzip-compressed. Parquet-converted on daily rollup.
Retention: 90 days raw, then aggregated to Silver and deleted. This puts the GDPR "storage limitation" principle into practice and supports data minimization.
Owner: automatic (Kinesis Firehose writes from CloudFront + Lambda + DDB Streams).
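To make the Bronze layout concrete, here is a minimal sketch of the object naming and encoding described above. The names `bronze_key`, `encode_batch`, and `batch_id`, and the `.json.gz` suffix, are hypothetical. Only the dt=YYYY-MM-DD/hour=HH partitioning, the newline-delimited JSON format, and gzip compression come from this document. In production Firehose would write these objects, so the sketch only spells out the convention.

```python
import gzip
import json
from datetime import datetime, timezone


def bronze_key(source: str, event_ts_ms: int, batch_id: str) -> str:
    """Build a Bronze object key partitioned by UTC date and hour."""
    ts = datetime.fromtimestamp(event_ts_ms / 1000, tz=timezone.utc)
    return f"{source}/dt={ts:%Y-%m-%d}/hour={ts:%H}/{batch_id}.json.gz"


def encode_batch(events: list[dict]) -> bytes:
    """Serialize events as gzip-compressed newline-delimited JSON."""
    lines = [json.dumps(event, separators=(",", ":")) for event in events]
    return gzip.compress(("\n".join(lines) + "\n").encode("utf-8"))
```

Partitioning on UTC rather than local time keeps hour boundaries unambiguous when Athena later prunes partitions by dt and hour.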
What's in it:
- sessions fact table (one row per conversation session): uid, cid, start_ts, end_ts, turn_count, region, device_type, referrer, total_tokens_in/out, cost_usd, crisis_flag
- turns fact table (one row per turn): uid, cid, turn_index, ts, user_char_count, assistant_char_count, model_id, latency_ms, tokens_in/out, cost_usd, sentiment, emotion (from Sentinel), frustration_flags, feature_wishes
- users dim table: uid, first_seen, last_seen, total_sessions, total_turns, total_cost_usd, region, device_type, consent_state, opt_out_flags, cohort_week
- feedback fact table: ts, uid, type (form|reaction|rating|sentinel_observation), love, bad, wish, mood, email_hash, related_turn_id
- costs fact table: ts, service (bedrock|transcribe|polly|cloudfront|lambda|dynamodb), component, usd, uid, cid

Format: Parquet in S3 innerverse-logs-silver/, partitioned by table name + dt=YYYY-MM-DD. AWS Glue catalog for schema. Queryable via Athena.
PII handling:
Retention: 2 years. Then aggregated to Gold and deleted.
Owner: nightly Glue ETL job that reads Bronze → transforms → writes Silver.
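As an illustration of the Bronze → Silver transform, the sketch below collapses the turn rows of one conversation into a sessions row. It is not the Glue job itself. `SessionRow` and `build_session_row` are hypothetical names, the per-turn `crisis_flag` key is an assumption carried over from the EMF payload, and the region, device_type, and referrer columns are omitted for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionRow:
    uid: str
    cid: str
    start_ts: int
    end_ts: int
    turn_count: int
    total_tokens_in: int
    total_tokens_out: int
    cost_usd: float
    crisis_flag: bool


def build_session_row(turns: list[dict]) -> SessionRow:
    """Collapse the turn rows of a single conversation into one sessions row."""
    if not turns:
        raise ValueError("a session needs at least one turn")
    if len({(t["uid"], t["cid"]) for t in turns}) != 1:
        raise ValueError("turns must belong to a single conversation")
    ordered = sorted(turns, key=lambda t: t["ts"])
    return SessionRow(
        uid=ordered[0]["uid"],
        cid=ordered[0]["cid"],
        start_ts=ordered[0]["ts"],
        end_ts=ordered[-1]["ts"],
        turn_count=len(ordered),
        total_tokens_in=sum(t["tokens_in"] for t in ordered),
        total_tokens_out=sum(t["tokens_out"] for t in ordered),
        cost_usd=round(sum(t["cost_usd"] for t in ordered), 6),
        # Assumption: a session is flagged if any of its turns was flagged.
        crisis_flag=any(t.get("crisis_flag") for t in ordered),
    )
```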
What's in it:
Format: Parquet + materialized views in S3 innerverse-logs-gold/. Also exposed to QuickSight dashboard + the internal /sage dashboard (we build that).
Retention: forever. These are aggregates; no single user can be re-identified.
Owner: weekly SAGE rollup job. Publishes to the SAGE dashboard and to the quarterly transparency report.
---
CloudFront standard access logs capture these fields with no application code, once logging is turned on. Fields we care about:

- timestamp, client_ip (truncated to /24 for privacy), cs(Referer), cs(User-Agent), cs-uri-stem, sc-status, time-taken, x-edge-location (city-level)

This gives us traffic volume, geographic distribution, device types, referrers, and error rates at no cost beyond S3 storage.

Action: run aws cloudfront get-distribution-config --id E2M8T6S9SM3OQY and confirm Logging.Enabled = true with an S3 bucket set for logs; enable logging if it is off. We need this TODAY.
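The /24 truncation of client_ip can be sketched with the standard library alone. `truncate_ip` is a hypothetical helper name. This document only specifies IPv4 behavior, so the /48 width used for IPv6 is an assumption.

```python
import ipaddress


def truncate_ip(raw_ip: str) -> str:
    """Zero the host bits: /24 for IPv4, /48 for IPv6 (IPv6 width is an assumption)."""
    addr = ipaddress.ip_address(raw_ip)
    prefix = 24 if addr.version == 4 else 48
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network.network_address)
```

For example, `truncate_ip("203.0.113.77")` returns `"203.0.113.0"`, so only the network portion ever reaches storage.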
Every Lambda invocation emits an Embedded Metric Format (EMF) JSON blob to CloudWatch:
```json
{
  "_aws": {
    "Timestamp": 1745222400000,
    "CloudWatchMetrics": [
      {
        "Namespace": "Innerverse",
        "Dimensions": [["model_id", "route"]],
        "Metrics": [
          {"Name": "latency_ms"},
          {"Name": "tokens_in"},
          {"Name": "tokens_out"},
          {"Name": "cost_usd"}
        ]
      }
    ]
  },
  "model_id": "us.anthropic.claude-sonnet-4-6",
  "route": "/invoke",
  "uid": "sha256(raw_uid)",
  "cid": "sha256(raw_cid)",
  "turn_index": 7,
  "latency_ms": 1280,
  "tokens_in": 4532,
  "tokens_out": 287,
  "cost_usd": 0.0179,
  "region": "us-west",
  "device": "mobile",
  "crisis_flag": null
}
```
Partially there already (EMF in handler.py). Need to add: cost_usd computation per turn, device detection, region from CloudFront header.
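A minimal sketch of how the handler might assemble and emit that payload. The function names (`hash_id`, `build_emf_record`, `emit`) are hypothetical and do not reflect what handler.py currently contains. The sketch assumes unsalted SHA-256 is the intended hashing for uid and cid, as the example payload suggests.

```python
import hashlib
import json
import time
from typing import Optional

METRIC_NAMES = ("latency_ms", "tokens_in", "tokens_out", "cost_usd")
DIMENSION_NAMES = ("model_id", "route")


def hash_id(raw_id: str) -> str:
    """Pseudonymize an identifier with SHA-256 before it is logged."""
    return hashlib.sha256(raw_id.encode("utf-8")).hexdigest()


def build_emf_record(fields: dict, raw_uid: str, raw_cid: str,
                     now_ms: Optional[int] = None) -> dict:
    """Wrap per-turn fields in the EMF envelope used by the Innerverse namespace."""
    missing = [n for n in METRIC_NAMES + DIMENSION_NAMES if n not in fields]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return {
        **fields,
        "_aws": {
            "Timestamp": int(time.time() * 1000) if now_ms is None else now_ms,
            "CloudWatchMetrics": [{
                "Namespace": "Innerverse",
                "Dimensions": [list(DIMENSION_NAMES)],
                "Metrics": [{"Name": name} for name in METRIC_NAMES],
            }],
        },
        # Hashed identifiers are written last so raw IDs in `fields` cannot leak through.
        "uid": hash_id(raw_uid),
        "cid": hash_id(raw_cid),
    }


def emit(record: dict) -> None:
    """Print one JSON line; Lambda forwards stdout to CloudWatch Logs."""
    print(json.dumps(record))
```

Keeping uid and cid as plain properties rather than dimensions avoids creating one CloudWatch metric per user while still leaving them queryable in the logs.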
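Enable streams on innerverse-users, innerverse-conversations, innerverse-feedback, with Kinesis Firehose delivering to S3 Bronze. Firehose does not read DynamoDB Streams directly, so the delivery path needs either Kinesis Data Streams for DynamoDB as the Firehose source or a small forwarding Lambda. Either way, every write event is captured for audit + analytics without the request-handling Lambda having to double-write.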
Every user turn → Haiku observation → DynamoDB innerverse-observations → stream → S3.
Derived from the CloudFront-Viewer-Country header that CloudFront adds, plus User-Agent string parsing (ua-parser library). Gives us device class, browser, and country (not city, for privacy). No fingerprinting.
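The following sketch shows the shape of that derivation using only the standard library. It is a deliberately crude stand-in for the ua-parser library, which is what the plan actually calls for: the substring rules are illustrative assumptions and will misclassify some devices, such as Android tablets. `classify_request` is a hypothetical name.

```python
def classify_request(headers: dict[str, str]) -> dict[str, str]:
    """Derive coarse country and device class from request headers."""
    # Header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    ua = normalized.get("user-agent", "").lower()
    if any(token in ua for token in ("ipad", "tablet")):
        device = "tablet"
    elif any(token in ua for token in ("mobi", "iphone", "android")):
        device = "mobile"
    elif ua:
        device = "desktop"
    else:
        device = "unknown"
    return {
        "country": normalized.get("cloudfront-viewer-country", "unknown"),
        "device": device,
    }
```

Only the two coarse values are returned. The raw User-Agent string is not passed through, which keeps the output in line with the no-fingerprinting rule.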
---
Two URLs, one private, one public.
/sage (private: admin only, IP-restricted or Cognito group)

Landing page with these panels:
/safety/transparency (public quarterly)

Per Innovation 5. Shows users we're honest:
---
Each turn's cost is computed synchronously after the Bedrock response:
```python
# USD per 1M tokens, keyed by Bedrock model ID.
PRICING = {
    "us.anthropic.claude-sonnet-4-6": {"in": 3.00, "out": 15.00},
    "anthropic.claude-opus-4-7": {"in": 18.00, "out": 90.00},
    "anthropic.claude-opus-4-6-v1": {"in": 15.00, "out": 75.00},
    "anthropic.claude-haiku-4-5-20251001-v1:0": {"in": 0.80, "out": 4.00},
}
# Unknown model IDs fall back to Sonnet rates.
DEFAULT_PRICE = {"in": 3.00, "out": 15.00}


def compute_turn_cost(model_id: str, tokens_in: int, tokens_out: int) -> float:
    """Return the USD cost of one turn from its input and output token counts."""
    p = PRICING.get(model_id, DEFAULT_PRICE)
    return (tokens_in / 1_000_000) * p["in"] + (tokens_out / 1_000_000) * p["out"]
```
The pricing table lives in F:/projects/innerverse/backend/src/pricing.py under version control, so historical cost data stays accurate when Anthropic changes rates.
Per-user cost = sum(turn.cost) across all their turns.
Average user cost = sum(turn.cost) / count(distinct uid) over window.
Max user cost = max(user.total_cost) over window.
Cost percentiles = p50/p90/p99 distribution of per-user cost (lets us spot outlier "power users" who consume disproportionate resources).
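The four definitions above can be sketched as a single pass over turn rows. `user_cost_summary` is a hypothetical name, and the choice of the "inclusive" interpolation method for percentiles is an assumption. The input is assumed to be already filtered to the reporting window.

```python
from collections import defaultdict
from statistics import quantiles


def user_cost_summary(turns: list[dict]) -> dict:
    """Aggregate per-turn cost_usd into per-user, average, max, and percentile costs."""
    per_user: dict[str, float] = defaultdict(float)
    for turn in turns:
        per_user[turn["uid"]] += turn["cost_usd"]
    costs = sorted(per_user.values())
    if not costs:
        return {"users": 0}
    summary = {
        "users": len(costs),
        "per_user": dict(per_user),
        # sum(turn.cost) / count(distinct uid)
        "average": sum(costs) / len(costs),
        "max": costs[-1],
    }
    if len(costs) >= 2:
        # n=100 yields 99 cut points; index i-1 holds the i-th percentile.
        cuts = quantiles(costs, n=100, method="inclusive")
        summary.update(p50=cuts[49], p90=cuts[89], p99=cuts[98])
    return summary
```

With only a handful of users the p90 and p99 values are interpolated between very few points, so they should be read as rough indicators until the user base grows.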
---
| Step | Ship in | Cost | Owner |
|---|---|---|---|
| 1. Enable CloudFront access logging to S3 | today | ~$0.50/mo S3 | FORGE |
| 2. Per-turn cost computation in handler.py → EMF | this week | 0 | FORGE |
| 3. DynamoDB Streams → Kinesis Firehose → S3 Bronze | this week | ~$2/mo | FORGE |
| 4. Glue ETL: Bronze → Silver nightly | next week | ~$5/mo | FORGE |
| 5. Athena + QuickSight dashboard (SAGE MVP) | next week | ~$12/mo QuickSight reader seat | SAGE |
| 6. Chat Sentinel (per feedback memo) | 2 weeks out | ~$30/mo Haiku | SCOUT |
| 7. /sage private dashboard page | 3 weeks | 0 (reuses QuickSight embed) | FORGE |
| 8. Public /safety/transparency page | 4 weeks | 0 | HERALD |
| 9. Weekly SAGE rollup job → Gold | month out | 0 | SAGE |
Total incremental infra cost at 1,000 DAU: ~$50-80/month. Trivial vs the value of actually knowing what's happening.
---
| Concern | Control |
|---|---|
| GDPR Art. 6 (lawful basis) | Consent at clickwrap + Privacy Policy §4 (already live) |
| GDPR Art. 17 (right to erasure) | DELETE route removes from Silver + Gold within 30 days; Bronze auto-expires at 90 days |
| GDPR Art. 25 (privacy by design) | Bronze → Silver transform drops raw message content + IP below /24; never enters Silver |
| CCPA §1798.105 (right to delete) | Same as GDPR above; honored for all CA users automatically |
| COPPA | 13+ attestation in clickwrap gate; no identifiers collected for users we can't verify |
| CA SB 243 | /safety page already discloses + /safety/transparency will show aggregates |
| SOC 2 (future) | Audit trail in Bronze (immutable S3 object-lock) is the basis |
---
1. Goodhart's law. The moment a metric becomes a target, it ceases to be a good metric. We must NEVER target NPS, MAU, or any single number. Use the triangulation rules from the feedback memo.
2. Sentinel over-reach. The AI monitor will want to classify everything. Constrain its schema tightly + audit 5% of outputs weekly.
3. Creepy personalization. Just because we can tag every emotion doesn't mean we should remember them. Memory should serve the user, not show them off.
4. False precision. A dashboard that says "sentiment trending -3.2%" invites false confidence. Always show error bars or confidence intervals.
5. Cohort collapse. At 6 users, single interesting users dominate averages. Don't publish per-cohort metrics until we have n ≥ 50 per cohort.
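Items 4 and 5 can be enforced in code rather than by convention. The sketch below reports a mean with an approximate 95% interval and returns nothing for cohorts under the n ≥ 50 threshold. The function names are hypothetical, and the normal approximation (z = 1.96) is an assumption that holds less well for small or skewed samples.

```python
import math
from statistics import mean, stdev
from typing import Optional

MIN_COHORT_SIZE = 50  # from the n >= 50 rule above


def mean_with_interval(values: list[float], z: float = 1.96) -> tuple[float, float]:
    """Return (mean, half-width) of an approximate 95% confidence interval."""
    if len(values) < 2:
        raise ValueError("need at least two values to estimate spread")
    half_width = z * stdev(values) / math.sqrt(len(values))
    return mean(values), half_width


def cohort_metric(values: list[float],
                  min_n: int = MIN_COHORT_SIZE) -> Optional[tuple[float, float]]:
    """Return the metric with its interval, or None if the cohort is too small to publish."""
    if len(values) < min_n:
        return None
    return mean_with_interval(values)
```

Returning `None` instead of a number forces the dashboard to render an explicit "insufficient data" state, so a six-user cohort cannot pass for a trend.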
---
— HERALD
2026-04-21