Silent Infinity — Feedback Signal v2

Replacing the Removed Per-Bubble Emoji Reactions

Version: v2 · 2026-04-21 · SCOUT

Task reference: T008

Authority: Design memo — build on approval

Prior state: Per-bubble 8-emoji reaction row (🪷✨💛👁🫶🌊🙏😶) removed per Harnoor directive: "remove emoji hats on each text bubble."

---

1. Why This Decision Matters

The per-bubble emoji row was not decoration. It was the product's only real-time, turn-level feedback mechanism. Every tap — even a single 🪷 — told us: this specific mirror response landed. That signal is now gone.

The removal was architecturally correct. Per the Brand Book v1, Silent Infinity refuses to gamify the inner-work experience. A row of reaction buttons on every chat bubble is structurally identical to the mechanics it rejects: it is borrowed from social media, it exploits the same trigger-reward loops, and it places an implicit demand on a user who is, in many sessions, mid-grief, mid-overwhelm, or mid-silence. The removal was right.

But the problem it creates is real. We now have a gap in the feedback stack:

| Layer | Before removal | After removal |
|---|---|---|
| Turn-level (real-time, per message) | 8-emoji reaction row | Nothing explicit |
| Session-level | Daily rating widget (40 variants) | Daily rating widget (40 variants) |
| Passive / always-on | Chat Sentinel (Haiku 4.5) | Chat Sentinel (Haiku 4.5) |
| User-initiated | /feedback form in footer | /feedback form in footer |

Turn-level signal is the most granular and most diagnostic signal we can collect. The Chat Sentinel fills some of this gap passively, but passive inference is not a substitute for explicit signal. This memo answers the question: what, if anything, fills that gap in a way that is consistent with who Silent Infinity is?

---

2. Theoretical Grounding

2.1 Kahneman: System 1 vs. System 2 (Thinking, Fast and Slow, 2011)

The per-bubble emoji row was a System 1 instrument. It captured the immediate, pre-reflective gut response to a message — the kind of knowing that happens before conscious thought forms words. A tap on 🫶 three seconds after reading a mirror response is not a reasoned judgment. It is an affective signal: this touched me. That is valuable precisely because it has not been filtered through rationalization.

The daily rating widget is a System 2 instrument. It asks the user to reflect, compare, and evaluate — to produce a considered judgment about the session as a whole. Both types of signal are useful. They are not substitutes for each other.

The design challenge is: can we recover System 1 signal without re-importing the social-media mechanics that made the original emoji row inappropriate?

The answer requires separating what made the emoji row feel wrong — its visual prominence, its eight-option width, its always-on presence on every bubble — from what made it useful — its proximity in time to the specific response it was rating. A minimal, hidden-by-default, one-bit signal (did this land / did this not) attached to individual responses can preserve System 1 usefulness while removing the social-media aesthetic.

2.2 Christensen: Jobs to Be Done (2016, Competing Against Luck)

Users do not interact with a feedback mechanism to rate us. They "hire" a feedback mechanism when one specific condition is met: they feel something strongly enough that the friction of expressing it falls below the energy of holding it unexpressed. No one taps 🙁 on a chat message because they want to improve our model training. They tap it because something felt off and the tap is a small exhale.

The design implication is critical: the feedback signal should be offered at moments of felt response, not at moments of neutral evaluation. A prompt after every 3rd message will be ignored by users in neutral emotional states (the signal cost is higher than the felt need). The same prompt will be used by users who have just read something that surprised, moved, or frustrated them. The mechanism should reduce friction for the second group without imposing friction on the first.

2.3 Kano (1984, Journal of JSQC): Delighter vs. Basic Feedback Channels

Kano's model distinguishes:

Must-have quality: absence causes dissatisfaction; presence is simply expected (e.g., the product not losing your conversation).
Performance quality: more = better (e.g., response speed).
Delighter quality: unexpected presence creates disproportionate satisfaction; absence is not noticed until the delighter exists.

Applied to feedback mechanisms themselves: the Chat Sentinel is a must-have once it exists — users expect (per privacy policy disclosure) that we are paying attention at some level. The daily rating widget is performance quality — a better-designed widget generates marginally more response. A hidden, discoverable, contextual feedback affordance is a potential delighter: users who find it feel seen and heard; users who never find it do not miss what they never knew existed.

This is the architectural insight that drives Option D below.

2.4 Torres: Continuous Discovery (2021)

Torres argues that a single weekly micro-touchpoint with 3 users — structured, focused, 20–30 minutes — generates more actionable product signal than months of passive analytics. The reason: implicit behavioral data tells you what users did; conversation tells you why.

The weekly digest email (Option E) is a Torres-compatible instrument. It creates a low-friction invitation to a richer signal channel. A user who replies to "was this version of me helpful?" with three sentences has given us more signal than forty emoji taps. The digest is not a replacement for structured discovery interviews — it is a funnel that surfaces candidates for them.

2.5 Nielsen: Heuristic Evaluation and Think-Aloud Signals (1994)

Nielsen's think-aloud method captures in-context reasoning that retrospective surveys miss entirely. The user who says "I'm clicking this because..." while using the product reveals motivations they would not remember or articulate in a post-session survey.

The closest asynchronous analogue is the mid-session micro-prompt (Option B): a brief, non-modal invitation to express what is happening right now, in the moment. This is not think-aloud in the classic usability-lab sense, but it shares the key property: the signal is captured while the experience is live, not in retrospect.

2.6 Goodhart's Law

Any feedback metric that becomes a target ceases to be a good measure. This is the central risk of explicit per-bubble feedback. If the product team begins optimizing the mirror's outputs for high emoji reaction rates, the mirror stops reflecting the user and starts performing for the rating. The user's inner weather becomes the raw material for a performance metric rather than the subject of an honest conversation.

This is not a hypothetical risk. Goodhart dynamics are widely observed in recommendation systems that optimize for engagement: the metric improves while the underlying quality degrades. A feedback system for Silent Infinity must be designed with this explicitly in mind. The Chat Sentinel is Goodhart-resistant by design because it is not a KPI that the mirror's outputs are directly optimized against. Explicit per-turn ratings are more vulnerable and must be treated with care.

---

3. Option Analysis

Each option is scored on five dimensions. Scores are 1 (worst) to 5 (best).

Friction: 1 = high friction, 5 = zero friction

Signal quality: 1 = low quality / easily gamed, 5 = rich / hard to game

False-positive rate: 1 = high false positive, 5 = low false positive

Implementation effort: 1 = high effort, 5 = minimal effort

Cultural fit: 1 = contradicts Silent Infinity's slow/contemplative voice, 5 = deeply aligned

---

Option A — Passive Sentinel Only (Current Default)

Description: No new explicit mechanism. The Chat Sentinel (Haiku 4.5) observes every conversation turn asynchronously and emits structured JSON: emotion, frustration signals, engagement signals, feature wishes, sharing quality, job signal, crisis-adjacency, Kano tags. Daily rating widget and /feedback footer form remain as-is.

What we gain: Zero UX disruption. No risk of gamification. Fully Goodhart-resistant for turn-level signal. Already operational.

What we lose: System 1 explicit signal entirely. We cannot know whether a specific response resonated — only infer from the Sentinel's probabilistic read of the conversation. The Sentinel's emotion and engagement tags are imputed, not reported. There is a fundamental epistemic difference between a user tapping "this landed" and an LLM inferring "this probably landed."

Scores:

| Dimension | Score | Rationale |
|---|---|---|
| Friction | 5 | User does nothing |
| Signal quality | 3 | Rich inference, but always imputed; cannot replace explicit signal |
| False-positive rate | 4 | Sentinel is calibrated; still subject to LLM hallucination of emotional state |
| Implementation effort | 5 | Already shipped |
| Cultural fit | 5 | Invisible; zero violation of contemplative aesthetic |
| Total | 22/25 | |

Verdict: Strong baseline. Insufficient alone because imputed signal and explicit signal measure different things. Valid as the permanent floor; not sufficient as the ceiling.

---

Option B — Micro-Prompt After Every 3rd–5th Mirror Message

Description: After every 3rd to 5th assistant message (configurable via variants.py), a small, non-modal chip appears below the message: a single line with three affordances — thumbs-up · thumbs-down · skip — in a very small, muted typographic treatment. No label. No explanation. Disappears after 5 seconds if untapped, or on next user input.

Design detail: The chip must not interrupt the reading of the message. It must not appear until the user has had 2–3 seconds to read. It must not be present on every message (that re-creates the original emoji row problem). It must auto-dismiss so that ignoring it is the zero-effort path.
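
The timing rules above can be sketched as a small scheduling function. This is an illustrative sketch only: `ChipTiming`, `chip_timing`, and the 2.5-second default (a midpoint of the 2–3 second reading window) are assumptions, not existing code, and dismissal on the next user input is left to the UI layer.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChipTiming:
    """When a feedback chip becomes visible and when it auto-dismisses."""
    show_at: float     # seconds after the message finished rendering
    dismiss_at: float  # seconds after the message finished rendering


def chip_timing(
    assistant_message_count: int,
    every_n: int = 5,           # trigger interval; the memo allows 3 to 5
    read_delay_s: float = 2.5,  # assumed midpoint of the 2-3 second reading window
    visible_s: float = 5.0,     # auto-dismiss after 5 seconds if untapped
) -> Optional[ChipTiming]:
    """Return chip timing for this assistant message, or None if no chip is due."""
    if every_n < 1:
        raise ValueError("every_n must be at least 1")
    if assistant_message_count < 1 or assistant_message_count % every_n != 0:
        return None  # most messages carry no chip at all
    return ChipTiming(show_at=read_delay_s, dismiss_at=read_delay_s + visible_s)
```

Returning `None` for most messages keeps "no chip" as the default path, which is the property that separates this option from the removed emoji row.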

What we gain: Periodic explicit System 1 signal, specifically the binary "did this land" question. Low enough frequency to avoid survey fatigue. Auto-dismiss means the friction of not responding is zero.

What we lose: If the trigger fires during a vulnerable or deep moment, even a well-designed chip can feel like an intrusion. The appearance of an evaluation prompt mid-conversation subtly reframes the interaction as one being monitored and rated — which contradicts the mirror metaphor's promise of an agenda-free space.

Goodhart risk: Medium. If we A/B test mirror prompts against thumbs-up rate, the mirror will be nudged toward prompts that generate positive reactions — which may be prompts that validate rather than challenge. The mirror's job is to reflect, not to comfort. This is a design risk to actively manage.

Scores:

| Dimension | Score | Rationale |
|---|---|---|
| Friction | 4 | Auto-dismiss makes ignoring costless; appearing at all is mild friction |
| Signal quality | 4 | Explicit binary per-message; higher quality than inference |
| False-positive rate | 3 | Thumbs-up/down is coarse; a thumbs-up on a comforting response and one on a challenging response look identical |
| Implementation effort | 4 | 1-2 days; variants.py controls trigger frequency |
| Cultural fit | 2 | The evaluation frame, however subtle, contradicts the non-judging mirror aesthetic |
| Total | 17/25 | |

Verdict: Better signal acquisition than Option A but at a real cultural cost. Worth testing in a low-exposure cohort (5–10%) before any broader rollout. The auto-dismiss is the key design requirement — without it, this is worse than the removed emoji row.

---

Option C — Session-End Reflection Prompt

Description: After a session reaches a natural close — either the user explicitly closes/navigates away, or there has been a 10-minute silence — a single line appears: "Before you go — one word for how this landed." A small open text field, max 50 characters. Submit or skip. Disappears with the session.
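
A minimal server-side sketch of the two rules in this description, the 10-minute silence and the 50-character cap, might look like the following. The function names are hypothetical, and how the client reports a close or navigation event is not specified here, so it is modeled as a plain `closed` flag.

```python
from datetime import datetime, timedelta
from typing import Optional

IDLE_LIMIT = timedelta(minutes=10)  # the 10-minute silence from the description
CHAR_LIMIT = 50                     # maximum length of the reflection field


def session_has_ended(last_activity: datetime, now: datetime, closed: bool = False) -> bool:
    """True if the user closed the session or has been silent for the idle limit."""
    return closed or (now - last_activity) >= IDLE_LIMIT


def clean_reflection(raw: str) -> Optional[str]:
    """Trim whitespace and enforce the character cap; None means the user skipped."""
    text = " ".join(raw.split())  # collapse runs of whitespace
    if not text:
        return None
    return text[:CHAR_LIMIT]
```

Treating an empty submission as a skip, rather than storing an empty string, keeps the response-rate metric honest.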

What we gain: Rich, qualitative, user-generated signal captured at the moment of highest reflective capacity — after the conversation, when System 2 can speak. A single word like "seen," "stuck," "surprised," "grateful" is more diagnostic than a thumbs-up and less gameable than a star rating. Aggregated across sessions, word frequency builds a lexicon of how the product lands.

What we lose: Accuracy across the whole session, because the answer is subject to recency and peak-end bias (Kahneman 1999). The user's one-word summary is dominated by the most intense moment and the final two or three turns. A difficult exchange followed by a gentle resolution will produce a more positive word than an identical difficult exchange that ended there. This is structurally unavoidable with session-end prompts.

Cultural fit: This is the strongest cultural fit of any explicit mechanism. "One word" is consistent with the slow, contemplative voice. It does not ask for a rating. It asks for honest expression — which is exactly what the mirror asks the user for in the conversation itself. The prompt mirrors the product.

Scores:

| Dimension | Score | Rationale |
|---|---|---|
| Friction | 4 | One-tap dismiss; open text requires typing but single word is genuinely low effort |
| Signal quality | 5 | Qualitative, uncoerced, user-authored; extremely high diagnostic value |
| False-positive rate | 4 | Single words are hard to game; peak-end bias is the main distortion |
| Implementation effort | 4 | 1.5 days; requires session-end detection (idle timer or window event) |
| Cultural fit | 5 | Asking for honest expression is the product's core act |
| Total | 22/25 | |

Verdict: Tied with Option A on score but complementary to it. Option A gives us passive turn-level inference; Option C gives us active session-level explicit signal in the product's own voice. These should ship together.

---

Option D — Long-Press Any Bubble → Contextual Emoji Picker

Description: No visible affordance. On long-press (desktop: hover-and-pause or right-click area near message; mobile: 500ms hold), a small contextual panel appears adjacent to any message bubble — user's or mirror's — with a compact 4-emoji picker. The emoji set is reduced from the original 8 to 4, chosen for signal density:

🌊 (moved me / something shifted)
💛 (warm / gentle)
👁 (saw me clearly)
😶 (this missed the mark)

The panel disappears after 3 seconds of no interaction. There is no visible badge, count, or confirmation. The tap is silent.

Rationale for this set: The original 8 emojis included some that are emotionally redundant (🪷 and 💛 overlap; ✨ and 🌊 overlap). Reducing to 4 maximizes signal-per-choice. The four chosen map to: emotional movement, warmth, accuracy of reflection, and miss — covering the four most diagnostically useful dimensions for a contemplative AI product.

What we gain: The hidden affordance is a Kano delighter: users who discover it feel heard in a subtle, private way. Because it is never presented and is not surfaced as a dashboard KPI, it is much harder for the product team to optimize against it in a Goodhart loop; a metric no one watches is unlikely to become an A/B target. It also avoids the social-media evaluation frame, because it stays invisible until the user decides to seek it.

What we lose: Discovery rate will be low — perhaps 5–15% of users ever find it. This limits statistical power. It is not a primary signal channel; it is a secondary enrichment.

Design constraint: The long-press must not accidentally trigger during normal reading on mobile. The 500ms hold is standard for this interaction pattern but requires careful touch event handling to distinguish from scroll.
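
One way to separate a hold from a scroll is to require that the finger stays near its starting point for the full hold duration. The sketch below expresses that logic in Python for clarity; the real handler would live in the client. `TouchSample`, `is_long_press`, and the 10-pixel drift tolerance are assumptions for illustration, not measured values.

```python
import math
from dataclasses import dataclass
from typing import Sequence

HOLD_MS = 500           # hold duration from the description
MOVE_TOLERANCE_PX = 10  # assumption: finger drift allowed before it counts as a scroll


@dataclass(frozen=True)
class TouchSample:
    t_ms: float  # milliseconds since touch start
    x: float
    y: float


def is_long_press(samples: Sequence[TouchSample]) -> bool:
    """True if the touch stayed within tolerance of its start point for HOLD_MS."""
    if not samples:
        return False
    start = samples[0]
    for s in samples:
        if math.hypot(s.x - start.x, s.y - start.y) > MOVE_TOLERANCE_PX:
            return False  # moved too far before the hold completed: treat as scroll
        if s.t_ms - start.t_ms >= HOLD_MS:
            return True
    return False  # finger lifted before the hold completed: treat as tap
```

Checking movement before duration means a slow scroll that happens to last 500ms is still rejected.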

Scores:

| Dimension | Score | Rationale |
|---|---|---|
| Friction | 5 | Zero friction for non-discoverers; low friction for discoverers |
| Signal quality | 4 | 4-option explicit signal; limited to users who discover the affordance |
| False-positive rate | 4 | Contextual; harder to tap accidentally than a visible row |
| Implementation effort | 3 | 2–3 days; requires touch event handling + mobile testing |
| Cultural fit | 5 | Hidden, quiet, voluntary; perfectly aligned with anti-engagement stance |
| Total | 21/25 | |

Verdict: Tied with Option C as the most culturally aligned explicit mechanism (both score 5/5 on cultural fit). Should be shipped, but not as the primary signal channel given its low discovery rate. Pairs naturally with Option C: the session-end word and the hidden long-press together give two explicit channels that never impose on the conversation.

---

Option E — Weekly Digest Email

Description: Once per week, a brief email (Resend → harnoors@gmail.com infrastructure already operational) to opted-in users: "Here's a glimpse of what we reflected together this week. Was this version of the mirror helpful?" The email contains one anonymized theme pulled from the user's own conversations (not raw text — a meta-observation like "you returned to the question of belonging three times this week"). A single reply-style prompt: "Yes / Not really / Tell me more."

What we gain: A Torres-compatible weekly touchpoint that surfaces users who want a richer dialogue. A "tell me more" reply becomes a micro-interview candidate. The weekly cadence matches the Torres recommendation of "at least weekly discovery contact with real users." The reply-based architecture means the signal is a deliberate act — not a tap — which makes it higher-value and lower Goodhart risk.

What we lose: Requires email opt-in (another friction layer on top of SSO, which itself depends on T003 resolution). Requires the conversation-theme-extraction pipeline (SAGE aggregation job from USER-FEEDBACK-SYSTEM doc) to be operational. The personalized meta-observation requires Sentinel output to be stable and high-quality. This is a Q3 deliverable, not a sprint one.

Scores:

| Dimension | Score | Rationale |
|---|---|---|
| Friction | 3 | One-click response is low; email opt-in is medium; depends on infrastructure |
| Signal quality | 5 | Reply is deliberate, qualitative, high-intent; Torres-validated channel |
| False-positive rate | 5 | Deliberate written reply is nearly impossible to game |
| Implementation effort | 2 | Requires SSO (T003), SAGE pipeline, email templates, opt-in management |
| Cultural fit | 4 | A weekly letter is contemplative; mass email in wellness is a cultural minefield |
| Total | 19/25 | |

Verdict: Highest signal quality of any option. Not immediately actionable because it sits behind multiple infrastructure prerequisites. Queue for Q3 after T003, multi-chat, and SAGE are operational.
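
The five score tables can be cross-checked with a few lines of Python. The scores are copied from the tables in this section; the unweighted sum mirrors how the totals out of 25 are computed, and the tie-break by option letter is only a convenience for stable output.

```python
# Order of each tuple: friction, signal quality, false-positive rate,
# implementation effort, cultural fit (all 1 = worst, 5 = best).
SCORES = {
    "A": (5, 3, 4, 5, 5),
    "B": (4, 4, 3, 4, 2),
    "C": (4, 5, 4, 4, 5),
    "D": (5, 4, 4, 3, 5),
    "E": (3, 5, 5, 2, 4),
}

# Unweighted totals out of 25, as in the tables.
totals = {option: sum(values) for option, values in SCORES.items()}

# Highest total first; ties broken alphabetically.
ranking = sorted(totals, key=lambda option: (-totals[option], option))

# Cultural fit is the last element of each tuple.
cultural_fit = {option: values[-1] for option, values in SCORES.items()}

print(totals)   # {'A': 22, 'B': 17, 'C': 22, 'D': 21, 'E': 19}
print(ranking)  # ['A', 'C', 'D', 'E', 'B']
```

The totals alone do not decide the rollout order: Option B trails mainly on cultural fit, and Option E on implementation effort.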

---

4. Kano-Style Quadrant Analysis

The two axes: Discovery friction (x-axis, low to high) and Signal richness (y-axis, low to high).


HIGH SIGNAL
RICHNESS
    |
    |   [E: Weekly Digest]        [C: Session-End Word]
    |   (high infra, high signal) (low friction, rich)
    |
    |   [B: Micro-Prompt 3rd-5th] [D: Long-Press Hidden]
    |   (medium friction, binary) (high discovery friction,
    |                              medium signal)
    |
    |   [A: Passive Sentinel Only]
    |   (zero friction, inferred)
    |
LOW ---+------------------------------------------- HIGH
SIGNAL  LOW DISCOVERY FRICTION            HIGH DISCOVERY FRICTION
RICHNESS

Quadrant classification (Kano mapped):

Option A: Must-have (floor — cannot remove once present)
Option C: Performance quality (more session-ends captured = more signal; culturally fit)
Option D: Delighter (unexpected discovery; creates disproportionate loyalty in those who find it)
Option B: Indifferent-to-Reverse (binary signal with cultural-fit cost; test carefully)
Option E: Delighter once mature (requires infrastructure; high payoff when operational)

---

5. Recommendation and Rollout Order

Recommended stack (in ship order)

Phase 1 — Immediate (this sprint): Ship Option C (session-end one-word reflection).

Cultural fit: 5/5. The only mechanism that asks users to do exactly what the mirror asks them to do in every session.
Implementation: 1.5 days. Session-end detection via beforeunload + 10-minute idle timer. Single text input, 50-char cap, skip button. Quiet ✓ on submit.
variants.py entry: session_end_reflection — 100% on immediately. No A/B needed at this stage; the baseline is zero signal, so any response is additive.

Phase 2 — Next sprint (5–7 days out): Ship Option D (long-press hidden emoji picker).

Complements Phase 1. Phase 1 captures session-level explicit signal. Phase 2 recovers turn-level signal without the social-media aesthetic.
Implementation: 2–3 days. Touch event listener on .message-bubble, 4-emoji panel, silent write to DynamoDB innerverse-observations with reaction type + turn index.
variants.py entry: bubble_longpress_reactions — launch at 20% to test mobile touch handling before full rollout.
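
The silent write in Phase 2 could store one small record per reaction. The sketch below only builds the record; the attribute names (`session_id`, `observation_type`, `turn_index`, `reaction`, `created_at`) and the helper `build_reaction_item` are hypothetical, since the actual schema of innerverse-observations is not described in this memo.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional

# The four reactions from Option D, mapped to short machine-readable labels.
ALLOWED_REACTIONS = {"🌊": "moved_me", "💛": "warm", "👁": "saw_me", "😶": "missed"}


def build_reaction_item(
    session_id: str,
    turn_index: int,
    emoji: str,
    now: Optional[datetime] = None,
) -> Dict[str, Any]:
    """Build one observation record for a long-press reaction (hypothetical schema)."""
    if emoji not in ALLOWED_REACTIONS:
        raise ValueError(f"unsupported reaction: {emoji!r}")
    if turn_index < 0:
        raise ValueError("turn_index must be non-negative")
    timestamp = now or datetime.now(timezone.utc)
    return {
        "session_id": session_id,
        "observation_type": "bubble_reaction",  # matches storage_key in variants.py
        "turn_index": turn_index,
        "reaction": ALLOWED_REACTIONS[emoji],
        "created_at": timestamp.isoformat(),
    }
```

With boto3, a dict of this shape could be passed as the `Item` argument of a DynamoDB `Table.put_item` call. Rejecting emoji outside the four-item set keeps the stored distribution clean if the client ever sends something unexpected.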

Phase 3 — Conditional (7-day check after Phase 1): Evaluate Option B (micro-prompt).

Only activate if Option C's session-end response rate falls below 6% AND Option A's Sentinel frustration signal is showing patterns that session-end words are not explaining.
If activated: variants.py entry bubble_micro_prompt_frequency at 5% cohort with 5-turn trigger interval (most conservative configuration). 14-day A/B read before expanding.
Do not ship as a default. The cultural-fit score (2/5) makes this a last-resort instrument.

Phase 4 — Q3 (after T003 + SAGE operational): Ship Option E (weekly digest).

Gate: SSO live, SAGE aggregation job stable for 30+ days, email opt-in UI built.
variants.py entry: weekly_digest_enabled — launch at 10% of opted-in users.

What we do NOT ship

No return of the 8-emoji row or any visible per-bubble reaction UI. The removal decision is permanent.

---

6. variants.py Entries


# Phase 1 — Session-end reflection
VARIANTS.register(Variant(
    category="session_end_reflection",
    name="one_word_v1",
    description="Post-session single-word reflection prompt",
    weight=100,
    config={
        "trigger": "session_end",  # beforeunload + 10min idle
        "prompt_text": "Before you go — one word for how this landed.",
        "char_limit": 50,
        "dismiss_label": "skip",
        "auto_dismiss_seconds": None,  # persists until dismissed or submitted
        "storage_table": "innerverse-observations",
        "storage_key": "session_end_word",
    }
))

# Phase 2 — Hidden long-press reactions
VARIANTS.register(Variant(
    category="bubble_longpress_reactions",
    name="hidden_4emoji_v1",
    description="Long-press bubble reveals 4-emoji contextual reaction picker",
    weight=20,  # 20% rollout initially
    config={
        "trigger": "longpress_500ms",
        "emoji_set": ["🌊", "💛", "👁", "😶"],
        "emoji_labels": ["moved me", "warm", "saw me", "missed"],
        "panel_dismiss_seconds": 3,
        "visible_affordance": False,  # hidden until discovered
        "storage_table": "innerverse-observations",
        "storage_key": "bubble_reaction",
    }
))

# Phase 3 — Conditional micro-prompt (do not activate by default)
VARIANTS.register(Variant(
    category="bubble_micro_prompt_frequency",
    name="thumbs_every_5th_v1",
    description="Binary thumbs up/down chip after every Nth mirror message",
    weight=0,  # OFF by default; activate only per Phase 3 conditions
    config={
        "trigger_every_n_messages": 5,
        "affordance": "thumbs",  # up / down / skip
        "auto_dismiss_seconds": 5,
        "storage_table": "innerverse-observations",
        "storage_key": "message_thumbs",
    }
))

# Phase 4 — Weekly digest (Q3, gate on SSO + SAGE)
VARIANTS.register(Variant(
    category="weekly_digest_enabled",
    name="weekly_reflection_v1",
    description="Weekly email digest with session themes + reply-based feedback",
    weight=0,  # OFF until T003 + SAGE prerequisites met
    config={
        "cadence": "weekly_sunday_18:00_user_tz",
        "theme_source": "sage_aggregation",
        "reply_options": ["Yes", "Not really", "Tell me more"],
        "opt_in_required": True,
        "discovery_interview_trigger": "tell_me_more_reply",
    }
))
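
The entries above use `weight` as a rollout percentage (for example, `weight=20` for a 20% launch). How `Variant` and `VARIANTS` actually assign users is not shown in this memo, so the following is only a sketch of one common approach, deterministic hash bucketing. `bucket` and `is_enabled` are hypothetical helpers, not part of variants.py.

```python
import hashlib


def bucket(user_id: str, category: str) -> int:
    """Map a user to a stable bucket in [0, 100) for one variant category."""
    digest = hashlib.sha256(f"{category}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled(user_id: str, category: str, weight: int) -> bool:
    """Treat weight as a rollout percentage: 0 is off, 100 is everyone."""
    if not 0 <= weight <= 100:
        raise ValueError("weight must be between 0 and 100")
    return bucket(user_id, category) < weight
```

Including the category in the hash input means a user who lands in the 20% long-press cohort is not automatically in the 5% micro-prompt cohort, which keeps the experiments independent.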

---

7. Metrics to Watch

Healthy signal per option

Option A (Sentinel — always on):

Sentinel JSON parse success rate > 99%
emotion.primary null rate < 15% (higher nulls indicate Sentinel is under-confident on emotional state — investigate system prompt)
frustration_signals fire rate: establish a baseline over the first 30 days; alert if any week's rate exceeds 2x that baseline
feature_wishes accumulation: first 100 wishes collected → Kano classification → roadmap input
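
The first three Sentinel checks can be computed directly from the raw outputs. This sketch assumes each Sentinel output is available as a JSON string and that the primary emotion sits at `emotion.primary`, as named above; the function names and the returned keys are illustrative.

```python
import json
from typing import Dict, Iterable


def sentinel_health(raw_outputs: Iterable[str]) -> Dict[str, float]:
    """Compute parse success rate and emotion.primary null rate from raw Sentinel outputs."""
    total = parsed = null_primary = 0
    for raw in raw_outputs:
        total += 1
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            continue  # counts against parse success
        if not isinstance(record, dict):
            continue  # valid JSON but not the expected object shape
        parsed += 1
        emotion = record.get("emotion")
        primary = emotion.get("primary") if isinstance(emotion, dict) else None
        if primary is None:
            null_primary += 1
    return {
        "parse_success_rate": parsed / total if total else 0.0,
        "emotion_primary_null_rate": null_primary / parsed if parsed else 0.0,
    }


def frustration_alert(weekly_rate: float, baseline_rate: float) -> bool:
    """True if this week's frustration fire rate is more than double the baseline."""
    return baseline_rate > 0 and weekly_rate > 2 * baseline_rate
```

The null rate is measured against successfully parsed records only, so a parsing outage does not masquerade as an under-confident Sentinel.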

Option C (Session-end word):

Response rate target: > 8% of sessions with 5+ turns
Lexical diversity: healthy if top-10 words cover < 60% of all responses (high concentration = users defaulting to safe words, prompt may be too narrow)
Most common words: first read at 30 days — look for negative cluster (stuck, lost, confused, weird) as signal for system prompt or interaction quality
Skip rate: < 75% is healthy (most users will skip; that is correct behavior); above 90%, the prompt is unwelcome and should be disabled
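
The lexical-diversity and negative-cluster checks reduce to simple word counts. A minimal sketch, assuming responses arrive as a list of strings; `top_k_coverage` and `negative_share` are hypothetical names, and the negative cluster is the four example words listed above.

```python
from collections import Counter
from typing import Iterable, List

NEGATIVE_CLUSTER = {"stuck", "lost", "confused", "weird"}  # example words from the memo


def _normalize(words: Iterable[str]) -> List[str]:
    """Lowercase and trim each response, dropping empty ones."""
    return [w.strip().lower() for w in words if w.strip()]


def top_k_coverage(words: Iterable[str], k: int = 10) -> float:
    """Share of all responses accounted for by the k most common words."""
    normalized = _normalize(words)
    if not normalized:
        return 0.0
    counts = Counter(normalized)
    return sum(count for _, count in counts.most_common(k)) / len(normalized)


def negative_share(words: Iterable[str]) -> float:
    """Share of responses that fall in the negative cluster."""
    normalized = _normalize(words)
    if not normalized:
        return 0.0
    return sum(1 for w in normalized if w in NEGATIVE_CLUSTER) / len(normalized)
```

Under the threshold above, `top_k_coverage(words) < 0.60` would count as healthy diversity. With few responses the top ten words trivially cover everything, so the check only becomes meaningful once the vocabulary exceeds ten distinct words.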

Option D (Long-press hidden):

Discovery rate target: > 5% of users find it within 30 days (if below 3%, add a single subtle onboarding hint in settings, never in the conversation)
Reaction distribution: a healthy product should see 😶 (miss) at < 20% of all reactions. Persistently higher → investigate which turn types generate misses.
🌊 (moved me) rate: leading indicator of session depth quality

Option E (Weekly digest, Q3):

Open rate target: > 35% (wellness-adjacent email; above-average engagement expected from opted-in users)
Reply rate target: > 12% ("Tell me more" + other replies combined)
Discovery-interview yield: > 1 interview candidate per 50 digest sends

What "healthy feedback signal" looks like system-wide

A healthy feedback stack at 90 days post-Phase-2:

1. Sentinel emitting clean JSON on > 99% of turns

2. Session-end words accumulating at > 8% response rate

3. Long-press reactions discovered by > 5% of users, with 😶 rate < 20%

4. No single Sentinel frustration pattern firing on > 30% of sessions (high rate = systemic prompt problem, not individual session variation)

5. Feature wishes (Sentinel field) generating at least 20 unique wishes per month for Kano analysis

---

8. Top 3 Risks

Risk 1 — Goodhart contamination of the mirror via micro-prompts

Description: If Option B is ever activated and the team begins tracking thumbs-up rate as a performance metric for mirror outputs, the mirror will be tuned toward comfort and validation over honest reflection. The product's core promise breaks.

Mitigation: Option B stays gated at weight=0 in variants.py unless the Phase 3 conditions are met. If it is ever activated, the thumbs-up rate is stored in DynamoDB but is explicitly excluded from the primary performance dashboard. It is reviewed quarterly, not weekly. The engineering runbook carries a documented policy: "bubble_micro_prompt signals may not be used as optimization targets for system prompt A/B tests."
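
The runbook policy can also be backed by a guard in code, so that a misconfigured experiment fails loudly rather than silently optimizing against thumbs. This is a sketch under assumptions: `validate_ab_targets` is a hypothetical helper, and it assumes A/B tests declare their target metrics by name, using the `message_thumbs` storage key from variants.py.

```python
from typing import Iterable, List

# Signals that may be stored and reviewed but never optimized against.
EXCLUDED_FROM_OPTIMIZATION = frozenset({"message_thumbs"})


def validate_ab_targets(target_metrics: Iterable[str]) -> List[str]:
    """Return the targets unchanged, or raise if any excluded signal is among them."""
    targets = list(target_metrics)
    blocked = sorted(set(targets) & EXCLUDED_FROM_OPTIMIZATION)
    if blocked:
        raise ValueError(
            "these signals may not be used as optimization targets: " + ", ".join(blocked)
        )
    return targets
```

A check like this would run wherever system prompt A/B tests are registered; the policy text remains the authority, and the guard only catches accidents.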

Risk 2 — Session-end prompt as re-engagement vector

Description: A "before you go" prompt, however gently designed, is structurally identical to the exit-prevention mechanics that predatory apps use to keep users on the screen. If users experience it as an attempt to delay their departure, it violates the brand's "the mirror does not chase you" principle.

Mitigation: The prompt appears as the user is already leaving (triggered by beforeunload or long idle). It is not a modal — it cannot block departure. It has no animation, no countdown, no urgency. The copy ("one word for how this landed") positions it as an offering, not a request for more time. Monitor for skip rate; if skip rate exceeds 90%, the prompt is unwelcome and should be disabled.

Risk 3 — Low discovery rate of Option D undermines signal viability

Description: If fewer than 3% of users discover the long-press mechanism, the reaction data lacks statistical power for any meaningful analysis. We have the implementation cost with none of the signal benefit.

Mitigation: 30-day discovery rate check after Phase 2 launch. If below 3%, add a single non-intrusive onboarding disclosure (e.g., in the user settings page: "Press and hold any message to leave a private note on it — no one else will see it"). This is a one-time disclosure, not a recurring prompt. Never surface in the conversation itself.
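
To make the statistical-power concern concrete, the margin of error on the 😶 miss rate can be estimated from the number of reactions collected. The sketch uses the standard normal approximation for a proportion, which is rough at small sample sizes. The reaction counts in the usage note are purely hypothetical, since this memo does not state the size of the user base.

```python
import math


def miss_rate_margin(n_reactions: int, miss_rate: float, z: float = 1.96) -> float:
    """Half-width of the normal-approximation 95% interval for a proportion."""
    if n_reactions <= 0:
        raise ValueError("n_reactions must be positive")
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return z * math.sqrt(miss_rate * (1.0 - miss_rate) / n_reactions)
```

As an illustration with made-up volumes: at an observed miss rate of 20%, 100 reactions give a margin of roughly 8 percentage points, and 60 reactions give roughly 10. At that width, a true miss rate of 15% and one of 25% are hard to tell apart, which is why a very low discovery rate leaves the 20% threshold difficult to evaluate.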

---

9. References

Kahneman, D. (2011). Thinking, Fast and Slow. Farrar, Straus and Giroux.
Kahneman, D., Wakker, P.P., & Sarin, R. (1997). "Back to Bentham? Explorations of experienced utility." Quarterly Journal of Economics, 112(2), 375–406.
Christensen, C., Hall, T., Dillon, K., & Duncan, D. (2016). Competing Against Luck: The Story of Innovation and Customer Choice. HarperBusiness.
Torres, T. (2021). Continuous Discovery Habits. Product Talk.
Kano, N. (1984). "Attractive quality and must-be quality." Journal of the Japanese Society for Quality Control, 14(2), 39–48.
Nielsen, J. (1994). Usability Engineering. Morgan Kaufmann. (Chapter 4: Think-Aloud Testing)
Goodhart, C. (1975). "Problems of Monetary Management: The UK Experience." Papers in Monetary Economics, Reserve Bank of Australia. (Strathern's formulation: "When a measure becomes a target, it ceases to be a good measure.")
Grisaffe, D. (2007). "Questions About the Ultimate Question: Conceptual Considerations in Evaluating Reichheld's Net Promoter Score." Journal of Consumer Satisfaction, Dissatisfaction and Complaining Behavior, 20.
Kahneman, D. (1999). "Objective happiness." In D. Kahneman, E. Diener, & N. Schwarz (Eds.), Well-Being: The Foundations of Hedonic Psychology. Russell Sage Foundation.
Austin, J.L. (1962). How to Do Things with Words. Oxford University Press.
Zhou, K., et al. (2022). "Is Your Goal-Oriented Dialog Model Performing Really Well? Empirical Analysis of System-wise Evaluation." ACL 2022.

---

Prepared by SCOUT for TITAN · Task T008 · 2026-04-21

Replace, do not extend, per-bubble emoji reaction row. Ship C first, then D.