
User feedback system — a PhD-level design for Silent Infinity

Version: v1 · 2026-04-21 · HERALD

Authority: design doc, build M1 on approval

Rough-Ask: R0111

> Harnoor: "when you ask for feedback, ask what features they would like / what they like / what they do not like. Feature request form persistent at bottom. Other ways beyond monitoring. A mode that's learning monitoring the chats — frustrations, happiness, excitement, boredom, wanting something, sharing, feature wishes. PhD-level study. Read business school best practices. Put it on the website. Also multi-chat with shared memory. Also SSO. Do all of it."

---

1. The problem with how most products do feedback

Most consumer product feedback in 2026 follows one of three failing patterns:

1. The NPS trap (Reichheld 2003, HBR). "On a scale of 0-10, how likely are you to recommend us?" → the metric became the strategy, and a single lagging number replaced curiosity about why. Peer-reviewed critiques dating back to 2007 (Grisaffe 2007, JMR; Keiningham et al. 2008) argue that NPS is often a worse predictor of growth than basic satisfaction measures.

2. Survey fatigue (Sinickas 2007; Gallup 2017). Response rates on email surveys have collapsed from ~40% in 2005 to ~5-15% in 2025. The remaining respondents skew older, more extreme, and more satisfied — classic selection bias.

3. Feature-request tyranny. "Tell us what you want!" → the loudest 2% of users dictate the roadmap. Steve Jobs: "people don't know what they want until you show it to them" — so we need something smarter than voting.

None of these work for a contemplative wellness product. Users in Silent Infinity are often mid-moment — mid-grief, mid-overwhelm, mid-joy. Asking them to rate us on a 0-10 scale is absurd in that context. We need a feedback architecture that matches the product's actual stance: slow, honest, attentive.

---

2. The research stack — what we draw from

2.1 Jobs-to-be-Done (Christensen, Ulwick)

People don't buy products, they "hire" them to do a job. The feedback question is not "what feature do you want?" — it is "what job are you hiring us for, and where are we failing that job?"

Applied to Silent Infinity: the jobs are: "help me feel less alone in this moment," "help me see a pattern I can't see," "give me permission to rest," "hold a hard feeling without trying to fix it." Feature requests are often symptoms of a job we're not yet doing well.

2.2 Kano model (Kano 1984, Journal of the Japanese Society for Quality Control)

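Classifies user needs into five categories: must-have, performance, delighter, indifferent, and reverse (the same five values used for kano_tags in §4.1).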

Every feature request we get should be tagged into one of these. Only about 20% of user asks are actually delighters; the rest are must-haves or performance asks we should already be delivering.
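
For survey answers (channel 4 in §3), one common way to assign these tags is the standard Kano evaluation table. It pairs a "functional" answer (how would you feel if we had this?) with a "dysfunctional" answer (how would you feel if we did not?). The sketch below is a minimal version of that lookup. The function name `kano_category` and the answer labels are illustrative assumptions, not existing Silent Infinity code.

```python
# Five-point answer scale used for both the functional and dysfunctional question.
# The labels are illustrative; surveys word them differently.
ANSWERS = ("like", "expect", "neutral", "tolerate", "dislike")

# Standard Kano evaluation table.
# Row = functional answer, column = dysfunctional answer (in ANSWERS order).
# "questionable" marks contradictory answer pairs that should be discarded.
_TABLE = {
    "like":     ("questionable", "delighter", "delighter", "delighter", "performance"),
    "expect":   ("reverse", "indifferent", "indifferent", "indifferent", "must"),
    "neutral":  ("reverse", "indifferent", "indifferent", "indifferent", "must"),
    "tolerate": ("reverse", "indifferent", "indifferent", "indifferent", "must"),
    "dislike":  ("reverse", "reverse", "reverse", "reverse", "questionable"),
}


def kano_category(functional: str, dysfunctional: str) -> str:
    """Map one respondent's answer pair to a Kano category."""
    if functional not in _TABLE or dysfunctional not in ANSWERS:
        raise ValueError("answers must be one of: " + ", ".join(ANSWERS))
    return _TABLE[functional][ANSWERS.index(dysfunctional)]
```

A feature's overall tag would then typically be the most frequent category across respondents; how ties are broken is left open here.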

2.3 Continuous Discovery (Teresa Torres 2021, Continuous Discovery Habits)

Weekly touchpoints with ≥3 customers, structured around an "opportunity solution tree" rooted in a specific outcome we're trying to improve. Not monthly surveys. Not quarterly NPS. Weekly, small, rigorous.

2.4 Thematic analysis of unstructured text (Braun & Clarke 2006)

A six-phase qualitative method: familiarization → coding → theme search → theme review → theme definition → report. The canonical method in psychology research for extracting signal from conversation transcripts. We can apply it to chat logs with LLM assistance.

2.5 Emotion lexicons (Ekman 1992; Plutchik 1980)

Ekman's six basic emotions (anger, disgust, fear, happiness, sadness, surprise) and Plutchik's wheel give us a tractable emotion-detection vocabulary. Better than sentiment (positive/negative) because emotions are multidimensional.

2.6 Frustration markers in conversational AI (Zhou et al. 2022, ACL)

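Researched signals of user frustration in chatbots. The six that the Sentinel schema tracks (§4.1) are repeated asks, all-caps messages, explicit complaints, contradicting the assistant, task abandonment, and "why" questions.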

2.7 Speech-act theory (Austin 1962; Searle 1969)

Every utterance performs an action: asserting, requesting, promising, expressing, declaring. Detecting request speech acts in chat gives us feature-wishes: "I wish you could…," "it would be nice if…," "can you remember X?"
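
As a rough illustration of request detection, the sketch below matches only the three cue phrases quoted above. It is a keyword baseline under that assumption, not the Sentinel's actual method (which is LLM-based, see §4). The function name `extract_wishes` is hypothetical.

```python
import re

# Cue phrases taken from the examples in the text; a real detector would need many more.
_WISH_PATTERNS = [
    re.compile(r"\bi wish you could\b(?P<wish>.*)", re.IGNORECASE),
    re.compile(r"\bit would be nice if\b(?P<wish>.*)", re.IGNORECASE),
    re.compile(r"\bcan you remember\b(?P<wish>.*)", re.IGNORECASE),
]


def extract_wishes(utterance: str) -> list[str]:
    """Return the text following each wish cue found in the utterance."""
    wishes = []
    for pattern in _WISH_PATTERNS:
        match = pattern.search(utterance)
        if match:
            # Trim surrounding spaces and trailing sentence punctuation.
            wishes.append(match.group("wish").strip(" .?!"))
    return wishes
```

A baseline like this is mainly useful as a sanity check against the LLM's `feature_wishes` output: large disagreement between the two is a cue to sample and review.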

2.8 Behavioral signals (Hofmann et al. 2014 on implicit feedback)

Implicit > explicit for many measurements. Dwell time, session depth, return rate, reaction-click rate all leak preference without asking.

---

3. Channel architecture — seven ways we listen

A single feedback channel is brittle. We run seven in parallel, each with a different bias:

| # | Channel | What it captures | Bias to correct for |
|---|---|---|---|
| 1 | In-chat reaction emojis (opt-in, one-tap) | Moment-level signal on individual assistant turns | Selection: only users who react; favors novelty reactions |
| 2 | Persistent feedback chip (always visible) | Feature wishes, complaints, love notes, all on the user's own initiative | Self-selection: only motivated users |
| 3 | Post-session pulse (after ≥5 turns, once a day) | Overall session quality + open-ended "anything else" | Recency bias; peak-end rule (Kahneman 1999) |
| 4 | Weekly Kano survey (opt-in, once a week max) | Feature classification; uncovers must-haves vs delighters | Survey fatigue; limit to 3 questions |
| 5 | Chat Sentinel (LLM monitor) (always, invisible) | Frustration / happiness / excitement / boredom / wants / feature wishes, extracted from chats | Privacy; requires clear consent + anonymization |
| 6 | Continuous Discovery interviews (weekly, 3 users, 30 min) | Deep context on jobs-to-be-done, unarticulated needs | Small-n; selection |
| 7 | Behavioral analytics (automatic) | Return rate, session depth, cohort retention, feature-usage funnels | Correlation ≠ causation; Goodhart's law if used as target |

Each channel has a different sampling bias, so triangulation across all seven is how we approach the truth. A signal that shows up in 4+ channels is treated as real; a signal in only one channel is logged as unconfirmed until another channel corroborates it (§6).
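
The gating rule for channel 3 (post-session pulse: at least 5 turns, at most once a day) can be sketched as below. This assumes "once a day" means a rolling 24-hour window per user; a calendar-day rule would be equally consistent with the table. The function and constant names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

MIN_TURNS = 5                 # "after >=5 turns"
MIN_GAP = timedelta(days=1)   # ASSUMPTION: "once a day" read as a rolling 24-hour window


def should_show_pulse(turns_in_session: int,
                      last_shown: Optional[datetime],
                      now: datetime) -> bool:
    """Decide whether to show the post-session pulse to this user."""
    if turns_in_session < MIN_TURNS:
        return False
    # Never shown before, or the last pulse was at least a day ago.
    return last_shown is None or now - last_shown >= MIN_GAP
```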

---

4. Chat Sentinel — the learning monitor (AI-driven feedback extraction)

An LLM runs over every user turn (asynchronously, out of the critical path) and tags the conversation with structured observations. This is the "monitoring" channel you asked about.

4.1 The Sentinel's system prompt (draft)


You are CHAT SENTINEL, a silent observer of Silent Infinity conversations.
Your role: extract structured feedback signals from each turn without ever
interrupting the mirror's role.

You NEVER speak to the user. You emit only structured JSON observations to
an internal product-analytics pipeline.

For each user turn, output this schema:

{
  "emotion": {                    // Plutchik wheel + intensity 0-1
    "primary": "sadness" | "joy" | "anger" | "fear" | "surprise" | "disgust" | "trust" | "anticipation" | null,
    "secondary": string | null,
    "intensity": 0.0-1.0
  },
  "frustration_signals": [         // any of: "repeated_ask", "caps", "explicit_complaint", "contradiction_of_assistant", "task_abandonment", "why_question"
    string
  ],
  "engagement_signals": [          // "deepening", "surfacing", "playful", "bored", "checking_out", "opening_up", "closing_down"
    string
  ],
  "speech_acts": [                 // Austin/Searle classification for this turn
    {"act": "request" | "assert" | "express" | "commissive" | "declaration", "content": string}
  ],
  "feature_wishes": [              // explicit "I wish you could…" style requests
    {"wish": string, "confidence": 0.0-1.0}
  ],
  "sharing_quality": "surface" | "opening" | "deep" | "vulnerable" | "declining",
  "job_signal": string | null,     // if detectable, what job are they hiring the mirror for right now?
  "kano_tags": [                   // classify any feature asks by Kano category
    {"feature": string, "kano": "must" | "performance" | "delighter" | "indifferent" | "reverse"}
  ],
  "crisis_adjacent": 0 | 1 | 2 | 3 | 4,  // 0 = not; 4 = imminent (re-uses crisis-patterns-v1.json severity levels)
  "notable_moment": {              // if this turn contains a quotable breakthrough/insight
    "present": boolean,
    "quote": string | null,
    "reason": string | null
  }
}

Constraints:
- Observations are aggregated across many users. NEVER include PII (names, emails, locations) in any field.
- If the turn contains identifiable personal content, mark "sharing_quality" but do not paraphrase specifics.
- Your output is read by product analytics + roadmap planning. It is NEVER shown to the user.
- If you are uncertain on any field, output null. Do not hallucinate structure.
- The observation MUST match the schema exactly. Extra fields will be rejected.
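
Because the prompt promises that extra fields will be rejected, the pipeline needs a check on the receiving side. The sketch below is a minimal standard-library validator covering only the top-level keys and a few enumerated fields. It is an assumption about how rejection could work, not the shipped implementation, and `validate_observation` is a hypothetical name.

```python
import json

# Top-level keys copied from the draft schema above.
ALLOWED_KEYS = {
    "emotion", "frustration_signals", "engagement_signals", "speech_acts",
    "feature_wishes", "sharing_quality", "job_signal", "kano_tags",
    "crisis_adjacent", "notable_moment",
}
FRUSTRATION = ("repeated_ask", "caps", "explicit_complaint",
               "contradiction_of_assistant", "task_abandonment", "why_question")
SHARING = ("surface", "opening", "deep", "vulnerable", "declining")


def validate_observation(raw: str) -> list[str]:
    """Return a list of problems; an empty list means these checks passed."""
    try:
        obs = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(obs, dict):
        return ["top level must be a JSON object"]

    errors = []
    extra = sorted(set(obs) - ALLOWED_KEYS)
    missing = sorted(ALLOWED_KEYS - set(obs))
    if extra:
        errors.append(f"extra fields: {extra}")
    if missing:
        errors.append(f"missing fields: {missing}")

    # The prompt allows null for uncertain fields, so None is accepted below.
    signals = obs.get("frustration_signals")
    if signals is not None:
        if not isinstance(signals, list) or any(s not in FRUSTRATION for s in signals):
            errors.append("frustration_signals: unknown value")
    sharing = obs.get("sharing_quality")
    if sharing is not None and sharing not in SHARING:
        errors.append("sharing_quality: unknown value")
    level = obs.get("crisis_adjacent")
    if level is not None and (type(level) is not int or not 0 <= level <= 4):
        errors.append("crisis_adjacent: must be an integer 0-4")
    return errors
```

Observations that fail would be dropped or queued for the daily quality-check sample rather than written to the observations table; which of the two is a design decision the memo leaves open.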

4.2 Pipeline

1. User completes a turn.

2. The main Claude response streams to the user in the critical path.

3. Asynchronously, in parallel, the Sentinel (Haiku 4.5 — cheap, fast) runs on the turn + the last 3 turns of context.

4. JSON observation is written to DynamoDB innerverse-observations table keyed by (session-id, turn-index).

5. A nightly SAGE aggregation job rolls up observations into:

- Emotion mix (global + per-day)

- Frustration heatmap (which prompts trigger frustration)

- Feature-wishes ranked by frequency × confidence × Kano tier

- Engagement-decline alerts (users trending from opening_up → closing_down)

- Notable-moments reel (for founder review — NOT training data)
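
The "frequency × confidence × Kano tier" ranking in step 5 can be sketched as follows. The memo does not say how a Kano tier becomes a number, so the weights in `KANO_WEIGHT` are placeholder assumptions, as is the function name `rank_wishes`. The sketch also assumes wish strings have already been normalized so that identical wishes compare equal.

```python
from collections import defaultdict

# ASSUMPTION: placeholder weights; the source does not define the tier-to-number mapping.
KANO_WEIGHT = {"must": 3.0, "performance": 2.0, "delighter": 1.0,
               "indifferent": 0.0, "reverse": 0.0}


def rank_wishes(sightings):
    """Rank wishes by frequency x mean confidence x Kano weight.

    sightings: iterable of (wish, confidence, kano) tuples, one per Sentinel observation.
    Returns a list of (wish, score) pairs, highest score first.
    """
    confidences = defaultdict(list)
    tier = {}
    for wish, confidence, kano in sightings:
        confidences[wish].append(confidence)
        tier[wish] = kano  # last tag wins here; a real job might take a majority vote

    scores = {}
    for wish, confs in confidences.items():
        frequency = len(confs)
        mean_confidence = sum(confs) / frequency
        scores[wish] = frequency * mean_confidence * KANO_WEIGHT[tier[wish]]
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

With these weights a must-have mentioned twice outranks a delighter mentioned once at similar confidence, which matches the §2.2 point that most asks are must-haves or performance gaps.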

4.3 Ethics + consent

4.4 Why this beats traditional analytics

Amplitude / Mixpanel / Heap tell you what users did. The Sentinel tells you what they felt, what they wanted, and what they almost-said. That's an order of magnitude more signal.

---

5. The persistent feedback chip (shipping today)

A small floating chip in the bottom-right of every chat page: 💭 feedback

Opens a compact sheet with three inputs:

1. What do you love? (optional)

2. What's not working? (optional)

3. What do you wish we had? (optional)

Plus a sentiment pulse row: 🙂 · 😐 · 🙁 · 😤 · 🥲 (Plutchik 5-emoji condensation)

Plus a permission line: "OK to follow up if we have questions? [email]"
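
The sheet's contents could be modeled as a single record, sketched below. Every field is optional, as described above. The rule that a submission needs at least one filled field is an assumption added for illustration, and the names `ChipFeedback` and `is_submittable` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# The five emojis from the sentiment pulse row.
PULSE_OPTIONS = ("🙂", "😐", "🙁", "😤", "🥲")


@dataclass
class ChipFeedback:
    love: Optional[str] = None             # "What do you love?"
    not_working: Optional[str] = None      # "What's not working?"
    wish: Optional[str] = None             # "What do you wish we had?"
    pulse: Optional[str] = None            # one of PULSE_OPTIONS
    follow_up_email: Optional[str] = None  # present only if the user opts in

    def is_submittable(self) -> bool:
        """ASSUMPTION: accept a submission if any text field or the pulse is filled."""
        if self.pulse is not None and self.pulse not in PULSE_OPTIONS:
            return False
        texts = (self.love, self.not_working, self.wish)
        return self.pulse is not None or any(t and t.strip() for t in texts)
```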

Design principles:

---

6. Triangulation rules — what we act on

We commit to this decision framework so no single signal dominates:

| Signal strength | What we do |
|---|---|
| 1 channel reports a pattern | Log it; look for corroboration |
| 2 channels agree | Add to the backlog watchlist |
| 3 channels agree | Begin Continuous Discovery interviews to confirm |
| 4+ channels agree | Spike into roadmap |
| Frustration + behavioral decline (2 channels) + explicit complaint (1 channel) | Immediate fix: treat as P0 |

This prevents the "loudest-user" problem (§1) while still catching real signal fast.
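
The decision table can be written as a small function, sketched below. It assumes the P0 row takes precedence over the channel count, which the table implies but does not state. The function name `triage` and the returned strings are illustrative.

```python
def triage(channels_agreeing: int,
           frustration: bool = False,
           behavioral_decline: bool = False,
           explicit_complaint: bool = False) -> str:
    """Map signal strength to the action in the triangulation table."""
    # ASSUMPTION: the P0 combination overrides the plain channel count.
    if frustration and behavioral_decline and explicit_complaint:
        return "P0: immediate fix"
    if channels_agreeing >= 4:
        return "spike into roadmap"
    if channels_agreeing == 3:
        return "begin Continuous Discovery interviews"
    if channels_agreeing == 2:
        return "add to backlog watchlist"
    if channels_agreeing == 1:
        return "log and look for corroboration"
    return "no action"
```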

---

7. Governance — who owns the feedback loop

| Role | Agent | Frequency | Artifact |
|---|---|---|---|
| Sentinel ops | SCOUT | daily | Sentinel output quality-check sample (10 random observations) |
| Aggregation | SAGE | weekly | feedback-summary.json in VAULT warm memory |
| Prioritization | HERALD | weekly | top 5 items on the weekly roadmap update |
| Customer discovery | HARNOOR | weekly | 3 user interviews, 30 min each, notes in F:/TITAN/knowledge/memory/warm/interviews/ |
| Transparency report | HERALD | quarterly | published on silentinfinity.com/safety/transparency |

---

8. Integration with multi-chat + SSO (per your other asks)

Both are prerequisites for the feedback system to work cross-device:

Build order:

1. Feedback chip (today) — works without SSO; cookie-anon-attributed

2. Chat Sentinel (3-4 days) — runs on current single-thread model

3. SSO + Cognito (3-4 days) — Google + Apple + magic-link via Resend

4. Multi-chat (5-7 days) — after SSO, backend already keyed by uid

5. Public transparency report page (2 days)

Total: ~2-3 weeks to a world-class feedback system + multi-chat + SSO all shipped.

---

9. What this unlocks

When this is all live:

This is what "product-market fit" looks like when done with rigor instead of vibes.

---

10. References

— HERALD

2026-04-21