Version: v2 · 2026-04-21 · SCOUT
Task reference: T008
Authority: Design memo — build on approval
Prior state: Per-bubble 8-emoji reaction row (🪷✨💛👁🫶🌊🙏😶) removed per Harnoor directive: "remove emoji hats on each text bubble."
---
The per-bubble emoji row was not decoration. It was the product's only real-time, turn-level feedback mechanism. Every tap — even a single 🪷 — told us: this specific mirror response landed. That signal is now gone.
The removal was architecturally correct. Per the Brand Book v1, Silent Infinity refuses to gamify the inner-work experience. A row of reaction buttons on every chat bubble is structurally identical to the mechanics it rejects: it is borrowed from social media, it exploits the same trigger-reward loops, and it places an implicit demand on a user who is, in many sessions, mid-grief, mid-overwhelm, or mid-silence. The removal was right.
But the problem it creates is real. We now have a gap in the feedback stack:
| Layer | Before removal | After removal |
|---|---|---|
| Turn-level (real-time, per message) | 8-emoji reaction row | Nothing explicit |
| Session-level | Daily rating widget (40 variants) | Daily rating widget (40 variants) |
| Passive / always-on | Chat Sentinel (Haiku 4.5) | Chat Sentinel (Haiku 4.5) |
| User-initiated | /feedback form in footer | /feedback form in footer |
Turn-level signal is the most granular and most diagnostic signal we can collect. The Chat Sentinel fills some of this gap passively, but passive inference is not a substitute for explicit signal. This memo answers the question: what, if anything, fills that gap in a way that is consistent with who Silent Infinity is?
---
The per-bubble emoji row was a System 1 instrument. It captured the immediate, pre-reflective gut response to a message — the kind of knowing that happens before conscious thought forms words. A tap on 🫶 three seconds after reading a mirror response is not a reasoned judgment. It is an affective signal: this touched me. That is valuable precisely because it has not been filtered through rationalization.
The daily rating widget is a System 2 instrument. It asks the user to reflect, compare, and evaluate — to produce a considered judgment about the session as a whole. Both types of signal are useful. They are not substitutes for each other.
The design challenge is: can we recover System 1 signal without re-importing the social-media mechanics that made the original emoji row inappropriate?
The answer requires separating what made the emoji row feel wrong — its visual prominence, its eight-option width, its always-on presence on every bubble — from what made it useful — its proximity in time to the specific response it was rating. A minimal, hidden-by-default, one-bit signal (did this land / did this not) attached to individual responses can preserve System 1 usefulness while removing the social-media aesthetic.
Users do not interact with a feedback mechanism to rate us. They "hire" a feedback mechanism when one specific condition is met: they feel something strongly enough that the friction of expressing it falls below the energy of holding it unexpressed. No one taps 🙁 on a chat message because they want to improve our model training. They tap it because something felt off and the tap is a small exhale.
The design implication is critical: the feedback signal should be offered at moments of felt response, not at moments of neutral evaluation. A prompt after every 3rd message will be ignored by users in neutral emotional states (the signal cost is higher than the felt need). The same prompt will be used by users who have just read something that surprised, moved, or frustrated them. The mechanism should reduce friction for the second group without imposing friction on the first.
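Kano's model distinguishes, among other categories, must-haves (expected; their absence frustrates), performance qualities (the better executed, the more satisfaction), and delighters (unexpected; their absence goes unnoticed).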
Applied to feedback mechanisms themselves: the Chat Sentinel is a must-have once it exists — users expect (per privacy policy disclosure) that we are paying attention at some level. The daily rating widget is performance quality — a better-designed widget generates marginally more response. A hidden, discoverable, contextual feedback affordance is a potential delighter: users who find it feel seen and heard; users who never find it do not miss what they never knew existed.
This is the architectural insight that drives Option D below.
Torres argues that a single weekly micro-touchpoint with 3 users — structured, focused, 20–30 minutes — generates more actionable product signal than months of passive analytics. The reason: implicit behavioral data tells you what users did; conversation tells you why.
The weekly digest email (Option E) is a Torres-compatible instrument. It creates a low-friction invitation to a richer signal channel. A user who replies to "was this version of me helpful?" with three sentences has given us more signal than forty emoji taps. The digest is not a replacement for structured discovery interviews — it is a funnel that surfaces candidates for them.
Nielsen's think-aloud method captures in-context reasoning that retrospective surveys miss entirely. The user who says "I'm clicking this because..." while using the product reveals motivations they would not remember or articulate in a post-session survey.
The closest asynchronous analogue is the mid-session micro-prompt (Option B): a brief, non-modal invitation to express what is happening right now, in the moment. This is not think-aloud in the classic usability-lab sense, but it shares the key property: the signal is captured while the experience is live, not in retrospect.
Any feedback metric that becomes a target ceases to be a good measure. This is the central risk of explicit per-bubble feedback. If the product team begins optimizing the mirror's outputs for high emoji reaction rates, the mirror stops reflecting the user and starts performing for the rating. The user's inner weather becomes the raw material for a performance metric rather than the subject of an honest conversation.
This is not a hypothetical risk. Goodhart dynamics are observable in every recommendation system that optimizes for engagement: the metric improves while the underlying quality degrades. A feedback system for Silent Infinity must be designed with this explicitly in mind. The Chat Sentinel is Goodhart-resistant by design because it is not a KPI that the mirror's outputs are directly optimized against. Explicit per-turn ratings are more vulnerable and must be treated with care.
---
Each option is scored on five dimensions. Scores are 1 (worst) to 5 (best).
Friction: 1 = high friction, 5 = zero friction
Signal quality: 1 = low quality / easily gamed, 5 = rich / hard to game
False-positive rate: 1 = high false positive, 5 = low false positive
Implementation effort: 1 = high effort, 5 = minimal effort
Cultural fit: 1 = contradicts Silent Infinity's slow/contemplative voice, 5 = deeply aligned
---
Description: No new explicit mechanism. The Chat Sentinel (Haiku 4.5) observes every conversation turn asynchronously and emits structured JSON: emotion, frustration signals, engagement signals, feature wishes, sharing quality, job signal, crisis-adjacency, Kano tags. Daily rating widget and /feedback footer form remain as-is.
What we gain: Zero UX disruption. No risk of gamification. Fully Goodhart-resistant for turn-level signal. Already operational.
What we lose: System 1 explicit signal entirely. We cannot know whether a specific response resonated — only infer from the Sentinel's probabilistic read of the conversation. The Sentinel's emotion and engagement tags are imputed, not reported. There is a fundamental epistemic difference between a user tapping "this landed" and an LLM inferring "this probably landed."
Scores:
| Dimension | Score | Rationale |
|---|---|---|
| Friction | 5 | User does nothing |
| Signal quality | 3 | Rich inference, but always imputed; cannot replace explicit signal |
| False-positive rate | 4 | Sentinel is calibrated; still subject to LLM hallucination of emotional state |
| Implementation effort | 5 | Already shipped |
| Cultural fit | 5 | Invisible; zero violation of contemplative aesthetic |
| Total | 22/25 | |
Verdict: Strong baseline. Insufficient alone because imputed signal and explicit signal measure different things. Valid as the permanent floor; not sufficient as the ceiling.
---
Description: After every 3rd to 5th assistant message (configurable via variants.py), a small, non-modal chip appears below the message: a single line with three affordances — thumbs-up · thumbs-down · skip — in a very small, muted typographic treatment. No label. No explanation. Disappears after 5 seconds if untapped, or on next user input.
Design detail: The chip must not interrupt the reading of the message. It must not appear until the user has had 2–3 seconds to read. It must not be present on every message (that re-creates the original emoji row problem). It must auto-dismiss so that ignoring it is the zero-effort path.
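The timing rules above can be sketched as a single visibility check. This is a minimal illustration, not the shipped logic: `ChipPolicy`, `chip_visible`, and their parameter names are hypothetical, and it assumes the 5-second auto-dismiss is counted from the moment the chip appears rather than from the moment the message renders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChipPolicy:
    """Hypothetical settings; the defaults mirror the values in the memo."""
    every_n_messages: int = 5          # memo allows every 3rd to 5th message
    read_delay_seconds: float = 2.0    # memo: 2-3 seconds of reading time
    auto_dismiss_seconds: float = 5.0  # memo: disappears after 5 seconds


def chip_visible(policy: ChipPolicy, assistant_msg_count: int,
                 seconds_since_render: float, user_has_typed: bool) -> bool:
    """Return True while the chip should be on screen for the latest message."""
    if assistant_msg_count <= 0:
        return False
    if assistant_msg_count % policy.every_n_messages != 0:
        return False  # not a trigger message, so never on every bubble
    if user_has_typed:
        return False  # the next user input dismisses the chip
    if seconds_since_render < policy.read_delay_seconds:
        return False  # give the user time to read first
    # Visible only inside [read_delay, read_delay + auto_dismiss).
    window_end = policy.read_delay_seconds + policy.auto_dismiss_seconds
    return seconds_since_render < window_end
```

Because every branch defaults to hidden, ignoring the chip stays the zero-effort path.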
What we gain: Periodic explicit System 1 signal, specifically the binary "did this land" question. Low enough frequency to avoid survey fatigue. Auto-dismiss means the friction of not responding is zero.
What we lose: If the trigger fires during a vulnerable or deep moment, even a well-designed chip can feel like an intrusion. The appearance of an evaluation prompt mid-conversation subtly reframes the interaction as one being monitored and rated — which contradicts the mirror metaphor's promise of an agenda-free space.
Goodhart risk: Medium. If we A/B test mirror prompts against thumbs-up rate, the mirror will be nudged toward prompts that generate positive reactions — which may be prompts that validate rather than challenge. The mirror's job is to reflect, not to comfort. This is a design risk to actively manage.
Scores:
| Dimension | Score | Rationale |
|---|---|---|
| Friction | 4 | Auto-dismiss makes ignoring costless; appearing at all is mild friction |
| Signal quality | 4 | Explicit binary per-message; higher quality than inference |
| False-positive rate | 3 | Thumbs-up/down is coarse; a thumbs-up on a comforting response and one on a challenging response look identical |
| Implementation effort | 4 | 1-2 days; variants.py controls trigger frequency |
| Cultural fit | 2 | The evaluation frame, however subtle, contradicts the non-judging mirror aesthetic |
| Total | 17/25 | |
Verdict: Better signal acquisition than Option A but at a real cultural cost. Worth testing in a low-exposure cohort (5–10%) before any broader rollout. The auto-dismiss is the key design requirement — without it, this is worse than the removed emoji row.
---
Description: After a session reaches a natural close — either the user explicitly closes/navigates away, or there has been a 10-minute silence — a single line appears: "Before you go — one word for how this landed." A small open text field, max 50 characters. Submit or skip. Disappears with the session.
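A rough sketch of the two server-side rules implied here, session-end detection and the 50-character cap. The function names and the whitespace normalization are assumptions for illustration; only the 10-minute and 50-character values come from the memo.

```python
from datetime import datetime, timedelta
from typing import Optional

IDLE_LIMIT = timedelta(minutes=10)  # memo: 10-minute silence
CHAR_LIMIT = 50                     # memo: max 50 characters


def session_has_ended(last_activity: datetime, now: datetime,
                      explicit_close: bool = False) -> bool:
    """A session ends on an explicit close or after the idle limit."""
    return explicit_close or (now - last_activity) >= IDLE_LIMIT


def clean_reflection(raw: str) -> Optional[str]:
    """Collapse whitespace and enforce the cap; None means 'skip'."""
    text = " ".join(raw.split())
    if not text:
        return None  # an empty submission is treated as a skip (assumption)
    return text[:CHAR_LIMIT]
```

Treating an empty submission as a skip keeps the stored data free of blank rows without adding a validation error to the user's last moment in the session.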
What we gain: Rich, qualitative, user-generated signal captured at the moment of highest reflective capacity — after the conversation, when System 2 can speak. A single word like "seen," "stuck," "surprised," "grateful" is more diagnostic than a thumbs-up and less gameable than a star rating. Aggregated across sessions, word frequency builds a lexicon of how the product lands.
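The aggregation step could be as small as a case-insensitive word count. The sketch below is illustrative only: `build_lexicon` is a hypothetical helper and the sample words are made up, not collected data.

```python
from collections import Counter
from typing import Iterable


def build_lexicon(words: Iterable[str]) -> Counter:
    """Count session-end words, ignoring case, padding, and blanks."""
    normalized = (w.strip().lower() for w in words)
    return Counter(w for w in normalized if w)


# Made-up sample input for illustration.
sample = ["seen", "Stuck", "seen ", "grateful", "", "SEEN"]
lexicon = build_lexicon(sample)
print(lexicon.most_common(1))  # [('seen', 3)]
```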
What we lose: Post-session recency and peak-end bias (Kahneman 1999). The user's one-word summary is dominated by the final two or three turns. A difficult exchange followed by a gentle resolution will produce a more positive word than an identical difficult exchange that ended there. This is structurally unavoidable with session-end prompts.
Cultural fit: This is the strongest cultural fit of any explicit mechanism. "One word" is consistent with the slow, contemplative voice. It does not ask for a rating. It asks for honest expression — which is exactly what the mirror asks the user for in the conversation itself. The prompt mirrors the product.
Scores:
| Dimension | Score | Rationale |
|---|---|---|
| Friction | 4 | One-tap dismiss; open text requires typing but single word is genuinely low effort |
| Signal quality | 5 | Qualitative, uncoerced, user-authored; extremely high diagnostic value |
| False-positive rate | 4 | Single words are hard to game; peak-end bias is the main distortion |
| Implementation effort | 4 | 1.5 days; requires session-end detection (idle timer or window event) |
| Cultural fit | 5 | Asking for honest expression is the product's core act |
| Total | 22/25 | |
Verdict: Tied with Option A on score but complementary to it. Option A gives us passive turn-level inference; Option C gives us active session-level explicit signal in the product's own voice. These should ship together.
---
Description: No visible affordance. On long-press (desktop: hover-and-pause or right-click area near message; mobile: 500ms hold), a small contextual panel appears adjacent to any message bubble (the user's or the mirror's) with a compact 4-emoji picker. The emoji set is reduced from the original 8 to 4, chosen for signal density: 🌊 (moved me), 💛 (warm), 👁 (saw me), 😶 (missed).
The panel disappears after 3 seconds of no interaction. There is no visible badge, count, or confirmation. The tap is silent.
Rationale for this set: The original 8 emojis included some that are emotionally redundant (🪷 and 💛 overlap; ✨ and 🌊 overlap). Reducing to 4 maximizes signal-per-choice. The four chosen map to: emotional movement, warmth, accuracy of reflection, and miss — covering the four most diagnostically useful dimensions for a contemplative AI product.
What we gain: The hidden affordance is a delighter in Kano terms: users who discover it feel heard in a subtle, private way. Because it is never presented, it resists being optimized against by the product team in a Goodhart loop: it is hard to A/B test toward a metric that no one sees in a dashboard. It also cannot create the social-media evaluation frame, because it is invisible until the user decides to seek it.
What we lose: Discovery rate will be low — perhaps 5–15% of users ever find it. This limits statistical power. It is not a primary signal channel; it is a secondary enrichment.
Design constraint: The long-press must not accidentally trigger during normal reading on mobile. The 500ms hold is standard for this interaction pattern but requires careful touch event handling to distinguish from scroll.
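One common way to separate a long-press from a scroll is to require both a minimum hold time and a maximum finger travel. The sketch below shows that classification in isolation. The 10px movement tolerance, the `TouchSample` type, and `classify_touch` are assumptions for illustration; only the 500ms hold comes from the memo, and the real handler would live in the front-end touch events.

```python
from dataclasses import dataclass
from typing import Sequence

HOLD_MS = 500           # memo: 500ms hold on mobile
MOVE_TOLERANCE_PX = 10  # assumption: more travel than this means scrolling


@dataclass(frozen=True)
class TouchSample:
    t_ms: int  # milliseconds since touch-down
    x: float
    y: float


def classify_touch(samples: Sequence[TouchSample]) -> str:
    """Return 'longpress', 'scroll', or 'tap' for one touch, down to up."""
    if not samples:
        return "tap"
    start = samples[0]
    for s in samples:
        moved = max(abs(s.x - start.x), abs(s.y - start.y))
        if moved > MOVE_TOLERANCE_PX:
            return "scroll"     # finger travelled before the hold completed
        if s.t_ms - start.t_ms >= HOLD_MS:
            return "longpress"  # held still for the full threshold
    return "tap"
```

Checking movement before duration means a slow scroll never opens the panel, which is the failure mode the constraint is guarding against.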
Scores:
| Dimension | Score | Rationale |
|---|---|---|
| Friction | 5 | Zero friction for non-discoverers; low friction for discoverers |
| Signal quality | 4 | 4-option explicit signal; limited to users who discover the affordance |
| False-positive rate | 4 | Contextual; harder to tap accidentally than a visible row |
| Implementation effort | 3 | 2–3 days; requires touch event handling + mobile testing |
| Cultural fit | 5 | Hidden, quiet, voluntary — perfectly aligned with anti-engagement stance |
| Total | 21/25 | |
Verdict: The most culturally aligned explicit mechanism. Should be shipped but not as the primary signal channel given its low discovery rate. Pairs naturally with Option C — the session-end word and the hidden long-press together give two explicit channels that never impose on the conversation.
---
Description: Once per week, a brief email (Resend → harnoors@gmail.com infrastructure already operational) to opted-in users: "Here's a glimpse of what we reflected together this week. Was this version of the mirror helpful?" The email contains one anonymized theme pulled from the user's own conversations (not raw text — a meta-observation like "you returned to the question of belonging three times this week"). A single reply-style prompt: "Yes / Not really / Tell me more."
What we gain: A Torres-compatible weekly touchpoint that surfaces users who want a richer dialogue. A "tell me more" reply becomes a micro-interview candidate. The weekly cadence matches the Torres recommendation of "at least weekly discovery contact with real users." The reply-based architecture means the signal is a deliberate act — not a tap — which makes it higher-value and lower Goodhart risk.
What we lose: Requires email opt-in (another friction layer on top of SSO, which itself depends on T003 resolution). Requires the conversation-theme-extraction pipeline (SAGE aggregation job from USER-FEEDBACK-SYSTEM doc) to be operational. The personalized meta-observation requires Sentinel output to be stable and high-quality. This is a Q3 deliverable, not a sprint one.
Scores:
| Dimension | Score | Rationale |
|---|---|---|
| Friction | 3 | One-click response is low; email opt-in is medium; depends on infrastructure |
| Signal quality | 5 | Reply is deliberate, qualitative, high-intent; Torres-validated channel |
| False-positive rate | 5 | Deliberate written reply is nearly impossible to game |
| Implementation effort | 2 | Requires SSO (T003), SAGE pipeline, email templates, opt-in management |
| Cultural fit | 4 | A weekly letter is contemplative; mass email in wellness is a cultural minefield |
| Total | 19/25 | |
Verdict: Tied with Option C for the highest signal-quality score (5). Not immediately actionable because it sits behind multiple infrastructure prerequisites. Queue for Q3 after T003, multi-chat, and SAGE are operational.
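As an arithmetic check on the five score tables, the totals can be recomputed and ranked from the per-dimension scores. The tuples below are copied from the tables in this memo, in the order friction, signal quality, false-positive rate, implementation effort, cultural fit; the dictionary layout itself is just an illustrative choice.

```python
# (friction, signal quality, false-positive rate, effort, cultural fit)
SCORES = {
    "A: Passive Sentinel only": (5, 3, 4, 5, 5),
    "B: Micro-prompt":          (4, 4, 3, 4, 2),
    "C: Session-end word":      (4, 5, 4, 4, 5),
    "D: Hidden long-press":     (5, 4, 4, 3, 5),
    "E: Weekly digest":         (3, 5, 5, 2, 4),
}

totals = {name: sum(values) for name, values in SCORES.items()}

# Stable sort: options with equal totals keep their A-to-E order.
ranked = sorted(totals.items(), key=lambda item: -item[1])

for name, total in ranked:
    print(f"{name}: {total}/25")
```

This reproduces the totals stated in the tables: A and C at 22, D at 21, E at 19, and B at 17.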
---
The options are placed on two axes: discovery friction (x-axis, low to high) and signal richness (y-axis, low to high). Read from the top of the chart (high signal richness) to the bottom (low):

| Option | Position on signal-richness axis | Annotation |
|---|---|---|
| E: Weekly Digest | Upper | high infra, high signal |
| C: Session-End Word | Upper | low friction, rich |
| B: Micro-Prompt 3rd-5th | Middle | medium friction, binary |
| D: Long-Press Hidden | Middle | zero discovery friction, medium signal |
| A: Passive Sentinel Only | Lower | zero friction, inferred |

Quadrant classification (Kano mapped):
---
Phase 1 (immediate, this sprint): Ship Option C (session-end one-word reflection).

- beforeunload + 10-minute idle timer. Single text input, 50-char cap, skip button. Quiet ✓ on submit.
- session_end_reflection: 100% on immediately. No A/B needed at this stage; the baseline is zero signal, so any response is additive.

Phase 2 (next sprint, 5–7 days out): Ship Option D (long-press hidden emoji picker).

- .message-bubble, 4-emoji panel, silent write to DynamoDB innerverse-observations with reaction type + turn index.
- bubble_longpress_reactions: launch at 20% to test mobile touch handling before full rollout.

Phase 3 (conditional, 7-day check after Phase 1): Evaluate Option B (micro-prompt).

- bubble_micro_prompt_frequency at 5% cohort with 5-turn trigger interval (most conservative configuration). 14-day A/B read before expanding.

Phase 4 (Q3, after T003 + SAGE operational): Ship Option E (weekly digest).

- weekly_digest_enabled: launch at 10% of opted-in users.

No return of the 8-emoji row or any visible per-bubble reaction UI. The removal decision is permanent.
---
```python
# Phase 1: session-end reflection
VARIANTS.register(Variant(
    category="session_end_reflection",
    name="one_word_v1",
    description="Post-session single-word reflection prompt",
    weight=100,
    config={
        "trigger": "session_end",  # beforeunload + 10min idle
        "prompt_text": "Before you go — one word for how this landed.",
        "char_limit": 50,
        "dismiss_label": "skip",
        "auto_dismiss_seconds": None,  # persists until dismissed or submitted
        "storage_table": "innerverse-observations",
        "storage_key": "session_end_word",
    },
))

# Phase 2: hidden long-press reactions
VARIANTS.register(Variant(
    category="bubble_longpress_reactions",
    name="hidden_4emoji_v1",
    description="Long-press bubble reveals 4-emoji contextual reaction picker",
    weight=20,  # 20% rollout initially
    config={
        "trigger": "longpress_500ms",
        "emoji_set": ["🌊", "💛", "👁", "😶"],
        "emoji_labels": ["moved me", "warm", "saw me", "missed"],
        "panel_dismiss_seconds": 3,
        "visible_affordance": False,  # hidden until discovered
        "storage_table": "innerverse-observations",
        "storage_key": "bubble_reaction",
    },
))

# Phase 3: conditional micro-prompt (do not activate by default)
VARIANTS.register(Variant(
    category="bubble_micro_prompt_frequency",
    name="thumbs_every_5th_v1",
    description="Binary thumbs up/down chip after every Nth mirror message",
    weight=0,  # OFF by default; activate only per Phase 3 conditions
    config={
        "trigger_every_n_messages": 5,
        "affordance": "thumbs",  # up / down / skip
        "auto_dismiss_seconds": 5,
        "storage_table": "innerverse-observations",
        "storage_key": "message_thumbs",
    },
))

# Phase 4: weekly digest (Q3, gate on SSO + SAGE)
VARIANTS.register(Variant(
    category="weekly_digest_enabled",
    name="weekly_reflection_v1",
    description="Weekly email digest with session themes + reply-based feedback",
    weight=0,  # OFF until T003 + SAGE prerequisites met
    config={
        "cadence": "weekly_sunday_18:00_user_tz",
        "theme_source": "sage_aggregation",
        "reply_options": ["Yes", "Not really", "Tell me more"],
        "opt_in_required": True,
        "discovery_interview_trigger": "tell_me_more_reply",
    },
))
```
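The registrations above do not show how `weight` turns into a per-user decision. One plausible reading, sketched here purely as an assumption, is that `weight` is a rollout percentage and each user falls into a stable bucket from 0 to 99. The helpers `rollout_bucket` and `is_enabled` are hypothetical and are not part of variants.py as described in this memo.

```python
import hashlib


def rollout_bucket(user_id: str, category: str) -> int:
    """Map (category, user) to a stable bucket in the range 0..99."""
    key = f"{category}:{user_id}".encode("utf-8")
    return int(hashlib.sha256(key).hexdigest(), 16) % 100


def is_enabled(user_id: str, category: str, weight: int) -> bool:
    """Assumes weight is a percentage: 0 is off, 100 is everyone."""
    return rollout_bucket(user_id, category) < weight
```

Hashing the category together with the user id keeps cohorts independent, so a user in the 20% long-press cohort is not automatically in the 5% micro-prompt cohort.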
---
Option A (Sentinel — always on):

- emotion.primary null rate < 15% (higher nulls indicate Sentinel is under-confident on emotional state; investigate system prompt)
- frustration_signals fire rate: baseline in first 30 days; alert if rate increases > 2x vs baseline week-over-week
- feature_wishes accumulation: first 100 wishes collected → Kano classification → roadmap input

Option C (Session-end word):
Option D (Long-press hidden):
Option E (Weekly digest, Q3):
A healthy feedback stack in 90 days post-Phase-2:
1. Sentinel emitting clean JSON on > 99% of turns
2. Session-end words accumulating at > 8% response rate
3. Long-press reactions discovered by > 5% of users, with 😶 rate < 20%
4. No single Sentinel frustration pattern firing on > 30% of sessions (high rate = systemic prompt problem, not individual session variation)
5. Feature wishes (Sentinel field) generating at least 20 unique wishes per month for Kano analysis
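Two of these thresholds, the emotion.primary null rate (< 15%) and the session-end response rate (> 8%), reduce to simple ratios. The sketch below assumes each Sentinel record is a dict with an optional nested `emotion.primary` field. That record shape and both function names are assumptions, since the memo does not give the Sentinel schema.

```python
from typing import Iterable, Mapping, Optional

NULL_RATE_CEILING = 0.15    # memo: emotion.primary null rate < 15%
RESPONSE_RATE_FLOOR = 0.08  # memo: session-end response rate > 8%


def emotion_null_rate(records: Iterable[Mapping]) -> Optional[float]:
    """Share of records whose emotion.primary is missing or null."""
    total = nulls = 0
    for rec in records:
        total += 1
        emotion = rec.get("emotion")
        primary = emotion.get("primary") if isinstance(emotion, Mapping) else None
        if primary is None:
            nulls += 1
    return nulls / total if total else None


def response_rate(prompts_shown: int, words_submitted: int) -> Optional[float]:
    """Fraction of session-end prompts that received a word."""
    return words_submitted / prompts_shown if prompts_shown else None
```

Returning None for an empty sample keeps "no data yet" distinct from a 0% rate, which matters in the first days after launch.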
---
Description: If Option B is ever activated and the team begins tracking thumbs-up rate as a performance metric for mirror outputs, the mirror will be tuned toward comfort and validation over honest reflection. The product's core promise breaks.
Mitigation: Option B stays gated at weight=0 in variants.py until the Phase 3 conditions are met. If it is ever activated, the thumbs-up rate is stored in DynamoDB but is explicitly excluded from the primary performance dashboard. It is reviewed quarterly, not weekly. A documented policy in the engineering runbook: "bubble_micro_prompt signals may not be used as optimization targets for system prompt A/B tests."
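The runbook policy could also be backed by a small guard in whatever code selects A/B optimization targets. This is a hedged sketch: `EXCLUDED_TARGETS` and `check_optimization_target` are hypothetical names, and the only value taken from the memo is the `message_thumbs` storage key.

```python
# Signals that must never be used as A/B optimization targets.
# "message_thumbs" is the storage_key of the Option B chip.
EXCLUDED_TARGETS = frozenset({"message_thumbs"})


def check_optimization_target(metric_key: str) -> str:
    """Return the key unchanged, or raise if policy forbids optimizing on it."""
    if metric_key in EXCLUDED_TARGETS:
        raise ValueError(
            f"{metric_key!r} may not be used as an optimization target "
            "for system prompt A/B tests"
        )
    return metric_key
```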
Description: A "before you go" prompt, however gently designed, is structurally identical to the exit-prevention mechanics that predatory apps use to keep users on the screen. If users experience it as an attempt to delay their departure, it violates the brand's "the mirror does not chase you" principle.
Mitigation: The prompt appears as the user is already leaving (triggered by beforeunload or long idle). It is not a modal — it cannot block departure. It has no animation, no countdown, no urgency. The copy ("one word for how this landed") positions it as an offering, not a request for more time. Monitor for skip rate; if skip rate exceeds 90%, the prompt is unwelcome and should be disabled.
Description: If fewer than 3% of users discover the long-press mechanism, the reaction data lacks statistical power for any meaningful analysis. We have the implementation cost with none of the signal benefit.
Mitigation: 30-day discovery rate check after Phase 2 launch. If below 3%, add a single non-intrusive onboarding disclosure (e.g., in the user settings page: "Press and hold any message to leave a private note on it — no one else will see it"). This is a one-time disclosure, not a recurring prompt. Never surface in the conversation itself.
---
Prepared by SCOUT for TITAN · Task T008 · 2026-04-21
Replace, do not extend, per-bubble emoji reaction row. Ship C first, then D.