
CLAUDE-UI-AND-LOCAL-STT-2026-04-25

SCOUT Research Memo — Two-Part: Claude.ai UI Reverse Engineering + Client-Side STT (A013)

Prepared: 2026-04-25 | Agent: SCOUT | For: Silent Infinity / TITAN

---

PART 1 — Claude.ai UI Reverse Engineering

1.1 Loading / Thinking States

What Claude.ai actually shows:

Claude.ai's web interface uses an animated ellipsis (three dots, ...) during streaming — no text label like "Thinking..." or "Pondering..." appears in the production web UI. The dots pulse or animate in sequence. This is confirmed by multiple UI comparisons and is distinct from Claude Code's approach.

Claude Code's loading phrases (different product, same brand):

Claude Code uses a rotating set of playful single-word gerunds displayed as the task status:

These appear after every prompt. They are deliberate design decisions adding personality. They rotate; they are not tied to task type.

Contrast with competitors:

Pre-first-token vs. mid-stream:

Design decision for Silent Infinity:

Do NOT copy ellipsis literally — it reads as "loading". Instead adopt Claude Code's personality-loading approach for the companion context: single rotating word, warm tone (e.g., "listening...", "feeling...", "weaving..."). This is more on-brand for an emotional companion.
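A minimal sketch of the rotating-word idea, assuming a hypothetical word list (the memo only suggests the three words shown) and a hypothetical helper name `createLoadingWordPicker`. Words are picked at random, not tied to task type, and the same word never appears twice in a row:

```javascript
// Hypothetical word list; only these three examples come from the memo
const LOADING_WORDS = ['listening', 'feeling', 'weaving'];

// Returns a picker that chooses a random word on each call,
// excluding the word shown last time so the label visibly rotates
function createLoadingWordPicker(words = LOADING_WORDS, random = Math.random) {
  let last = null;
  return function next() {
    const candidates = words.length > 1 ? words.filter((w) => w !== last) : words;
    last = candidates[Math.floor(random() * candidates.length)];
    return `${last}...`;
  };
}

const nextLoadingWord = createLoadingWordPicker();
console.log(nextLoadingWord()); // e.g. shown while waiting for the first token
```

The `random` parameter exists only so the picker can be tested deterministically; production code can leave it at the default.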

---

1.2 Suggestion Chips / Starter Prompts

Observed behavior:

When shown:

Implication for Silent Infinity:

Starter chips in emotional companion context should be topic-seeded ("Tell me about your day", "I'm feeling overwhelmed", "Help me process something"), shown only on empty state, 3–4 maximum. Use pill shape for brevity. Follow Claude's pattern: disappear on first send.
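The chip rules above (empty state only, at most 4, gone after first send) reduce to one small function. This is a sketch; `visibleStarterChips` and the shape of the `messages` array are assumptions, not existing Silent Infinity code:

```javascript
const MAX_CHIPS = 4; // upper bound from the 3-4 recommendation above

// Returns the chips to render: none once the conversation has any message,
// otherwise the first MAX_CHIPS entries of the configured list
function visibleStarterChips(messages, chips) {
  if (messages.length > 0) return [];
  return chips.slice(0, MAX_CHIPS);
}

const chips = ['Tell me about your day', "I'm feeling overwhelmed", 'Help me process something'];
console.log(visibleStarterChips([], chips).length); // chips shown on empty state
console.log(visibleStarterChips([{ role: 'user', text: 'hi' }], chips).length); // hidden after first send
```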

---

1.3 Message Bubbles

Assistant messages:

User messages:

Differentiation pattern:

The pattern is asymmetric — assistant is full-width naked text, user is a contained pill bubble. This creates visual hierarchy (assistant content dominates the page canvas; user input is contained/subordinate).

Implication for Silent Infinity:

This is the RIGHT pattern for an emotional companion. The AI's response should feel like it fills the space; the user's message should feel like it was received and held. Adopt this exact asymmetric layout.

---

1.4 Compose Box

Shape and behavior:

Buttons inside compose box (left side):

Buttons inside compose box (right side):

Keyboard shortcuts:

Voice mode activation:

---

1.5 Color Palette — Verified Values

Primary brand palette (sourced from Pi generative-ui Claude guidelines, Anthropic assets):

| Token | Name | Hex | Usage |
|-------|------|-----|-------|
| c-coral-400 | Crail / Terracotta | #D85A30 | Primary accent, CTA buttons |
| c-coral-200 | Light Coral | #F0997B | Hover states |
| c-coral-600 | Dark Coral | #993C1D | Pressed states |
| c-gray-50 | Pampas | #F4F3EE | Page background (light mode) |
| c-gray-100 | Cloudy / Warm Gray | #D3D1C7 | Hairlines, card borders |
| c-gray-200 | Mid Gray | #B4B2A9 | Input borders default |
| c-gray-600 | Charcoal | #5F5E5A | Secondary text |
| c-gray-800 | Near Black | #444441 | Body text |
| c-gray-900 | Deep Charcoal | #2C2C2A | Page background (dark mode) |
| White | White | #FFFFFF | Card/panel surfaces |
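To apply the table in one pass, the tokens can be kept in a single object and emitted as CSS custom properties. A sketch only: the token names and hex values are copied from the table above, while `toCssVariables` and the `c-white` name are hypothetical:

```javascript
// Token names and hex values from the palette table; 'c-white' is an assumed name
const PALETTE = {
  'c-coral-400': '#D85A30',
  'c-coral-200': '#F0997B',
  'c-coral-600': '#993C1D',
  'c-gray-50': '#F4F3EE',
  'c-gray-100': '#D3D1C7',
  'c-gray-200': '#B4B2A9',
  'c-gray-600': '#5F5E5A',
  'c-gray-800': '#444441',
  'c-gray-900': '#2C2C2A',
  'c-white': '#FFFFFF',
};

// Builds a CSS rule that declares one custom property per token
function toCssVariables(tokens, selector = ':root') {
  const lines = Object.entries(tokens).map(([name, hex]) => `  --${name}: ${hex};`);
  return `${selector} {\n${lines.join('\n')}\n}`;
}

console.log(toCssVariables(PALETTE));
```

Components would then reference `var(--c-coral-400)` and similar, so a later palette change touches only this object.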

Dark mode specifics (confirmed):

Accent / oklch equivalents (for CSS-native implementation):

Typography:

Purple used for:

---

1.6 Feedback Widget

Mechanism: Inline thumbs-up / thumbs-down at the bottom of each assistant message, in a message action toolbar

Thumbs-down opens a feedback form with:

- Dropdown: "What type of issue do you wish to report?"

- Text area: "What was unsatisfying about this response?"

- Submit button

When it appears: after each complete assistant turn; visible throughout the conversation, not just at the end

Implication for Silent Infinity:

Adopt the same pattern. Thumbs inline, thumbs-down opens a lightweight bottom sheet (not full modal on mobile) with 2–3 tap options (too long / felt off / missed the point) and optional free text. Do NOT use stars — they create indecision. Binary + optional text is the minimum viable signal.
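A sketch of the signal this widget would produce, assuming a hypothetical payload shape and reason identifiers derived from the three tap options above. `buildFeedback` is illustrative, not an existing endpoint contract:

```javascript
// Reason identifiers are assumptions mapped from the tap options in the memo
const DOWNVOTE_REASONS = ['too_long', 'felt_off', 'missed_the_point'];

// Builds the feedback record for one assistant message.
// Thumbs-up carries no detail; thumbs-down may carry a reason and free text.
function buildFeedback(messageId, rating, reason = null, comment = '') {
  if (rating !== 'up' && rating !== 'down') {
    throw new Error(`Unknown rating: ${rating}`);
  }
  if (rating === 'up') return { messageId, rating };
  if (reason !== null && !DOWNVOTE_REASONS.includes(reason)) {
    throw new Error(`Unknown reason: ${reason}`);
  }
  const payload = { messageId, rating, reason };
  const trimmed = comment.trim();
  if (trimmed) payload.comment = trimmed; // free text stays optional
  return payload;
}

console.log(buildFeedback('msg-1', 'down', 'felt_off', '  too clinical  '));
```

Keeping the reason optional means a bare thumbs-down still registers if the user dismisses the bottom sheet.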

---

1.7 Spacing, Typography Ratios

Based on observed screenshots and design comparisons:

| Property | Value |
|----------|-------|
| Page horizontal padding | 20–24px mobile / 40–48px desktop |
| Message vertical gap | 24–32px between turns |
| Compose box padding | 12–16px vertical, 16–20px horizontal |
| Border radius (compose) | 24px |
| Border radius (user bubble) | 18px |
| Border radius (cards/chips) | 12px |
| Line height (body) | 1.65–1.75 |
| Font size (body) | 15–16px |
| Font size (meta/labels) | 12–13px |
| Max content width | ~680px centered |

---

PART 1 — Top 3 R-Numbers to Ship

R0212 — Claude-palette adoption (Effort: S, ~2 days)

Replace any existing brand colors in Silent Infinity with the verified Claude-adjacent warm palette: #F4F3EE page background (light mode), #2C2C2A page background (dark mode), coral #D85A30 for primary CTA. Update the CSS variables in one pass for an immediate visual lift.

R0213 — Asymmetric bubble layout (Effort: S, ~1–2 days)

Implement user-right-contained-bubble vs. AI-full-width-naked-text pattern. This is the single biggest visual differentiation that makes it feel "premium AI" rather than SMS-clone. Max content width 680px centered.

R0214 — Compose box + inline feedback (Effort: M, ~3–4 days)

Auto-expanding textarea with 24px radius, coral send button appears on input, sound-wave voice icon right side, thumbs inline below each AI message, thumbs-down bottom sheet with 3 tap options. Covers feedback loop in same R-number.

---

PART 2 — Client-Side STT (A013) Implementation Spec

2.1 Web Speech API — Browser Support 2026

Current state (April 2026):

| Browser | Support | Notes |
|---------|---------|-------|
| Chrome desktop (v33+) | Partial | Cloud STT via Google servers by default |
| Chrome v139+ desktop | Partial + on-device | processLocally: true flag enables on-device (shipped August 2025) |
| Edge (v79+) | Partial | Chromium-based, same as Chrome |
| Firefox (all) | Not supported | Never implemented SpeechRecognition |
| Safari macOS (v14.1+) | Partial | Sends audio to Apple servers; shows permission modal |
| Safari iOS (v14.5+) | Partial | Sends to Apple servers; still server-based despite label |
| Chrome iOS | Not supported | WebKit restriction; same as Safari iOS |
| Chrome Android (v134+) | Partial | Google cloud STT |
| Firefox Android | Not supported | |
| Samsung Internet | Partial | Chromium-based |

Global reach: ~88% of users are on browsers with at least partial SpeechRecognition support (CanIUse, 2026). "Partial" means the API exists but implementation varies.

Critical caveat: The processLocally: true property is Chrome 139+ desktop ONLY (shipped August 2025). iOS Safari's "partial support" still routes audio to Apple servers — it is NOT on-device from the web.

---

2.2 iOS Safari — Actual Behavior

Bottom line for Silent Infinity iOS web: Web Speech API on iOS = Apple cloud. You get the browser convenience but not true local processing. For true on-device iOS STT you'd need a React Native / native layer.

---

2.3 Android Chrome — Actual Behavior

---

2.4 Capability Detection


// Tier-1: Check API availability
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
const isSupported = !!SpeechRecognition;

// Tier-2: Check on-device availability (Chrome 139+ only)
async function checkOnDevice(lang = 'en-US') {
  if (!SpeechRecognition || !SpeechRecognition.available) return 'unavailable';
  const result = await SpeechRecognition.available({ langs: [lang], processLocally: true });
  return result; // 'available' | 'downloadable' | 'downloading' | 'unavailable'
}

// Tier-3: Install language pack if downloadable
async function ensureOnDevice(lang = 'en-US') {
  const status = await checkOnDevice(lang);
  if (status === 'available') return true;
  if (status === 'downloadable' || status === 'downloading') {
    // Report the install outcome itself: install() resolves to a boolean,
    // so a failed download no longer counts as "ready"
    return SpeechRecognition.install({ langs: [lang], processLocally: true });
  }
  return false;
}

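// Platform detection for routing
function getSTTTier() {
  const ua = navigator.userAgent;
  const isIOS = /iPad|iPhone|iPod/.test(ua);
  // Parse the major version as a number instead of comparing the matched string
  const chromeMajor = Number(/Chrome\/(\d+)/.exec(ua)?.[1] ?? 0);
  // On-device mode is desktop-only per section 2.1, so exclude Android.
  // Edge also reports Chrome/<version>; checkOnDevice() settles whether local mode really exists.
  const isChrome139Plus = chromeMajor >= 139 && !/Android/.test(ua);

  if (!SpeechRecognition) return 'server-only'; // Firefox, older browsers
  if (isIOS) return 'browser-cloud-apple';       // iOS: browser routes to Apple
  if (isChrome139Plus) return 'browser-ondevice-capable'; // Can use processLocally
  return 'browser-cloud-google';                 // Chrome <139, Chrome Android, etc.
}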

---

2.5 Continuous vs. Single-Shot


// Single-shot (default): captures until pause detected, fires one result
recognition.continuous = false;
recognition.interimResults = false; // Wait for final only

// Continuous: keeps listening, fires interim + final results
recognition.continuous = true;
recognition.interimResults = true; // Show live transcription as the user speaks

// For Silent Infinity voice chat: use continuous = true, interimResults = true
// Display interim results in the compose box as "ghost text"
// Commit final transcript on end/silence detection

Single-shot is best for: push-to-talk, quick commands

Continuous is best for: free-flow voice conversation, showing the user their words appear live

Silent Infinity should use continuous with interim results for the emotional intimacy of seeing your words appear in real time.
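One detail matters in continuous mode: `event.results` keeps every result of the session, and `event.resultIndex` marks the first entry that changed in the current event. A sketch of a handler that reads only the changed entries, so finals are committed once and ghost text always reflects the current interim phrase (`createTranscriptAccumulator` is a hypothetical helper name):

```javascript
// Returns an onresult-style handler that tracks committed and interim text.
// Works with any event exposing resultIndex and an array-like results list.
function createTranscriptAccumulator() {
  let committed = '';
  return function handleResult(event) {
    let interim = '';
    // Entries before resultIndex were already processed in earlier events
    for (let i = event.resultIndex; i < event.results.length; i++) {
      const text = event.results[i][0].transcript;
      if (event.results[i].isFinal) committed += text;
      else interim += text;
    }
    return { committed, interim };
  };
}

// Usage in the browser (assumes a SpeechRecognition instance named recognition):
// const handle = createTranscriptAccumulator();
// recognition.onresult = (e) => {
//   const { committed, interim } = handle(e);
//   // render committed text normally and interim text as ghost text
// };
```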

---

2.6 Hybrid Architecture: Client STT + Server Audio for Emotion

The "best of both worlds" design:


User speaks
    |
    +-- MediaRecorder (compressed webm/opus chunks, 250ms intervals)
    |       |
    |       +-- WebSocket to server --> Hume EVI or Deepgram emotion pipeline
    |                                   (vocal prosody, arousal, valence)
    |
    +-- SpeechRecognition API (client-side, on-device if available)
            |
            +-- interim results --> compose box ghost text (live)
            +-- final transcript --> sent to LLM for response

Why this works:

Constraint: Hume EVI is a speech-to-speech pipeline; if you want emotion scores only (not full EVI response), you'd use the Expression Measurement API separately, not EVI. The Expression Measurement API accepts audio and returns arousal/valence/expression scores — this is the right tool for the hybrid.


// Dual-stream hybrid implementation sketch
async function startHybridVoice() {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });

  // Stream 1: Client STT via Web Speech API
  const recognition = new (window.SpeechRecognition || window.webkitSpeechRecognition)();
  recognition.continuous = true;
  recognition.interimResults = true;
  recognition.lang = 'en-US';
  if ('processLocally' in recognition) recognition.processLocally = true; // Chrome 139+

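  recognition.onresult = (event) => {
    // event.results accumulates for the whole session in continuous mode,
    // so read only the entries that changed (resultIndex onward) to avoid
    // re-committing finals that were already sent
    let interim = '';
    let final = '';
    for (let i = event.resultIndex; i < event.results.length; i++) {
      const text = event.results[i][0].transcript;
      if (event.results[i].isFinal) final += text;
      else interim += text;
    }
    updateComposeGhostText(interim);
    if (final) commitTranscript(final);
  };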
  recognition.start();

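  // Stream 2: Raw audio to server (Hume Expression Measurement)
  // Not every browser records WebM/Opus, so fall back to the browser's default container
  const preferred = 'audio/webm;codecs=opus';
  const recorder = new MediaRecorder(
    stream, MediaRecorder.isTypeSupported(preferred) ? { mimeType: preferred } : undefined);
  const humeSocket = new WebSocket('wss://api.hume.ai/v0/stream/models?...');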

  recorder.ondataavailable = (e) => {
    if (humeSocket.readyState === WebSocket.OPEN && e.data.size > 0) {
      // Convert blob to base64, send to Hume Expression API
      const reader = new FileReader();
      reader.onload = () => {
        humeSocket.send(JSON.stringify({
          data: reader.result.split(',')[1], // base64 audio
          models: { prosody: {} }
        }));
      };
      reader.readAsDataURL(e.data);
    }
  };
  recorder.start(250); // 250ms chunks

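  return { recognition, recorder, stop: () => {
    recognition.stop();
    if (recorder.state !== 'inactive') recorder.stop();
    humeSocket.close(); // release the emotion socket along with the mic
    stream.getTracks().forEach(t => t.stop());
  }};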
}

---

2.7 Graceful Fallback Architecture


class SilentInfinitySTT {
  async initialize() {
    this.tier = getSTTTier();

    if (this.tier === 'server-only') {
      // Firefox, no API: fall to server STT (Deepgram/Whisper via MediaRecorder)
      this.strategy = 'server';
    } else if (this.tier === 'browser-ondevice-capable') {
      // Chrome 139+: try on-device first
      const onDeviceReady = await ensureOnDevice('en-US');
      this.strategy = onDeviceReady ? 'local' : 'browser-cloud';
    } else {
      // Browser-routed cloud (Chrome <139, Safari, iOS): still better than our server
      // — we don't pay Deepgram, browser handles it natively
      this.strategy = 'browser-cloud';
    }
  }

  async start(onInterim, onFinal, onError) {
    if (this.strategy === 'server') {
      return this._startServerSTT(onInterim, onFinal, onError);
    }
    return this._startBrowserSTT(onInterim, onFinal, onError, this.strategy === 'local');
  }

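  _startBrowserSTT(onInterim, onFinal, onError, processLocally = false) {
    const SR = window.SpeechRecognition || window.webkitSpeechRecognition;
    const r = new SR();
    r.continuous = true;
    r.interimResults = true;
    r.lang = 'en-US';
    if (processLocally && 'processLocally' in r) r.processLocally = true;

    // The handle always stops whichever recognizer is currently active,
    // including a cloud recognizer started by the fallback below
    const handle = { stop: () => r.stop() };

    r.onresult = (e) => {
      // Only results from resultIndex onward changed in this event
      for (let i = e.resultIndex; i < e.results.length; i++) {
        const alt = e.results[i][0];
        if (e.results[i].isFinal) onFinal(alt.transcript, alt.confidence);
        else onInterim(alt.transcript);
      }
    };
    r.onerror = (e) => {
      // 'no-speech' only means silence, so it must not trigger a fallback.
      // Assumption: an unusable on-device path surfaces as one of these two codes.
      const localFailed = e.error === 'language-not-supported' || e.error === 'service-not-allowed';
      if (processLocally && localFailed) {
        const fallback = this._startBrowserSTT(onInterim, onFinal, onError, false);
        handle.stop = () => fallback.stop();
      } else {
        onError(e.error);
      }
    };
    r.start();
    return handle;
  }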

  async _startServerSTT(onInterim, onFinal, onError) {
    // Existing Deepgram/Whisper server path
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
    const recorder = new MediaRecorder(stream);
    const ws = new WebSocket(`${API_BASE}/voice/stream`); // existing endpoint

    // Start recording only once the socket is open, so the first chunk
    // (which carries the container header) is never dropped
    ws.onopen = () => recorder.start(100);
    recorder.ondataavailable = (e) => {
      if (ws.readyState === WebSocket.OPEN && e.data.size > 0) ws.send(e.data);
    };
    ws.onmessage = (e) => {
      const { type, transcript, confidence } = JSON.parse(e.data);
      if (type === 'interim') onInterim(transcript);
      else onFinal(transcript, confidence);
    };
    ws.onerror = () => onError('network');
    return { stop: () => {
      if (recorder.state !== 'inactive') recorder.stop();
      ws.close();
      stream.getTracks().forEach(t => t.stop());
    }};
  }
}

---

2.8 Cost Model — How Much We Save

Current server-side STT costs (Deepgram Nova-3, PAYG):

| Metric | Value |
|--------|-------|
| Cost per minute | $0.0077 |
| Cost per hour | $0.46 |
| 1,000 hours/month | $462/month |
| 10,000 hours/month | ~$4,620/month at PAYG (~$3,900 with Growth Plan discount) |

OpenAI Whisper (alternative server path):

Web Speech API cost: $0.00/min — zero direct cost to us

Break-even math:

At 1,000 monthly active users with average 5 minutes of voice per session and 10 sessions/month:
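The scenario works out to 1,000 × 5 × 10 = 50,000 voice minutes per month, roughly 833 hours, or about $385/month at the Deepgram PAYG rate in the table above. A small calculator for re-running the numbers as usage assumptions change (`monthlySttCost` is a hypothetical helper; the rate comes from the cost table):

```javascript
const DEEPGRAM_PER_MINUTE = 0.0077; // PAYG rate from the cost table above

// Computes monthly voice volume and server STT cost for a usage scenario
function monthlySttCost({ users, minutesPerSession, sessionsPerMonth, ratePerMinute }) {
  const minutes = users * minutesPerSession * sessionsPerMonth;
  return {
    minutes,
    hours: minutes / 60,
    cost: minutes * ratePerMinute,
  };
}

const scenario = monthlySttCost({
  users: 1000,
  minutesPerSession: 5,
  sessionsPerMonth: 10,
  ratePerMinute: DEEPGRAM_PER_MINUTE,
});
console.log(scenario.minutes, scenario.hours.toFixed(0), scenario.cost.toFixed(2));
```

Treat the result as an upper bound on savings: users who fall back to server STT (Firefox, for example) still incur the per-minute rate.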

The real savings multiplier: beyond direct cost, local STT also eliminates:

---

2.9 Privacy Posture

| Path | What leaves device | Who processes |
|------|-------------------|---------------|
| Web Speech API (cloud) | Raw audio | Google (Chrome) or Apple (Safari) |
| Web Speech API (processLocally, Chrome 139+) | Nothing | On-device language model |
| Server STT (Deepgram) | Raw audio | Deepgram cloud |
| Server STT (Whisper self-hosted) | Raw audio | Your server only |

Winner for privacy: processLocally: true on Chrome 139+ — audio never leaves device

iOS reality: both Web Speech API AND server STT send audio off-device; only difference is Apple vs. your server

Marketing angle: "Your voice stays on your device" is only true for Chrome 139+ users. For iOS, accurate messaging is: "Your voice is processed securely and never stored."
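To keep the privacy claim tied to the path actually in use, the copy can be selected from the STT strategy at runtime. A sketch under assumptions: the strategy names match the fallback class in 2.7, the `platform` argument and the `processor` labels are hypothetical, and reusing the iOS wording for the other off-device paths is a simplification that should be reviewed before shipping:

```javascript
const CLOUD_MESSAGE = 'Your voice is processed securely and never stored.';

// Maps the active STT strategy to user-facing privacy copy.
// Only the 'local' strategy may claim that audio stays on the device.
function privacyDisclosure(strategy, platform = 'other') {
  switch (strategy) {
    case 'local':
      return { processor: 'on-device', message: 'Your voice stays on your device.' };
    case 'browser-cloud':
      return {
        processor: platform === 'ios' ? "Apple's voice services" : "your browser's speech service",
        message: CLOUD_MESSAGE,
      };
    case 'server':
      return { processor: 'our servers', message: CLOUD_MESSAGE };
    default:
      throw new Error(`Unknown STT strategy: ${strategy}`);
  }
}

console.log(privacyDisclosure('browser-cloud', 'ios').processor);
```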

---

2.10 What We Lose vs. Server STT

| Feature | Web Speech API | Deepgram | Whisper |
|---------|----------------|----------|---------|
| Word-level timestamps | No | Yes | Yes (with verbose_json) |
| Speaker diarization | No | Yes (add-on) | No (base model) |
| Confidence per word | No (phrase-level only) | Yes | No |
| Noise robustness | Medium | High | High |
| Custom vocabulary | No | Yes (keywords) | No |
| Offline capability | Chrome 139+ only | No | Self-hosted yes |
| Emotion/prosody | No | No | No |
| Latency (first word) | Very low (<200ms) | ~300–500ms | ~500ms+ |
| Multi-language auto-detect | No | Yes | Yes |

Key losses for Silent Infinity specifically:

1. No word timestamps — means no karaoke-style word highlight replay, no precise sync to TTS playback

2. No speaker diarization — not relevant for 1:1 companion use case; skip

3. Weaker noise robustness — could cause issues in noisy environments; Chrome's on-device model has smaller vocabulary

Recommendation: Accept these trade-offs for the base voice path. Keep Deepgram/Whisper as the server-side fallback for users on Firefox, iOS PWA, or when on-device quality is poor.
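"When on-device quality is poor" needs a concrete trigger. One possible sketch uses the phrase-level confidence that final results report and switches to the server path when a rolling mean drops below a threshold. The window size, the threshold, and the name `createQualityMonitor` are all assumptions to tune against real sessions; some speech engines may not report meaningful confidence values, in which case this signal should be ignored:

```javascript
// Tracks recent final-result confidences and signals when quality looks poor.
// windowSize and minMeanConfidence are placeholder values, not measured ones.
function createQualityMonitor({ windowSize = 5, minMeanConfidence = 0.6 } = {}) {
  const recent = [];
  return function record(confidence) {
    recent.push(confidence);
    if (recent.length > windowSize) recent.shift();
    // Wait for a full window before judging, to avoid reacting to one bad phrase
    if (recent.length < windowSize) return false;
    const mean = recent.reduce((sum, c) => sum + c, 0) / recent.length;
    return mean < minMeanConfidence; // true means: consider falling back to server STT
  };
}

const shouldFallBack = createQualityMonitor();
// In the onFinal callback: if (shouldFallBack(confidence)) switch to the server path
```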

---

PART 2 — Top 3 R-Numbers to Ship

A013-R1 — Client STT with tier detection (Effort: M, ~3–4 days)

Implement SilentInfinitySTT class with 3-tier detection (local / browser-cloud / server fallback). Add processLocally: true for Chrome 139+ with language pack install flow. Wire to existing /voice page. Show live interim results as ghost text in compose box. Zero new dependencies.

A013-R2 — Dual-stream hybrid: local STT + emotion audio (Effort: M–L, ~5–7 days)

Parallel MediaRecorder stream for Hume Expression Measurement API alongside local SpeechRecognition. Server receives emotion scores (arousal/valence/prosody) per audio chunk; client-side transcript goes to LLM separately. This achieves zero transcript cost + full emotional context. Requires Hume Expression API integration (separate from EVI).

A013-R3 — iOS fallback quality UX (Effort: S, ~1–2 days)

Since iOS cannot do true on-device STT from the web, show a clear "Using Apple's voice services" disclosure when iOS Safari takes the Web Speech API cloud path, instead of routing silently, and let the user choose between that path and server STT. This is a trust/transparency feature that significantly affects perceived privacy posture.

---

COMBINED SHIPPING PRIORITY

Recommended execution order given Harnoor's "do local voice chat first" directive:

| # | R-Number | What | Effort | Unblocks |
|---|----------|------|--------|----------|
| 1 | A013-R1 | Client STT tier detection + compose ghost text | M (3–4d) | All voice features |
| 2 | R0212 | Claude palette CSS vars | S (2d) | Visual feel |
| 3 | R0213 | Asymmetric bubble layout | S (1–2d) | Chat feel |
| 4 | A013-R2 | Dual-stream emotion hybrid | M–L (5–7d) | Emotional intelligence |
| 5 | R0214 | Compose box + inline feedback | M (3–4d) | User signal loop |
| 6 | A013-R3 | iOS STT transparency UX | S (1–2d) | Trust/privacy |

---

Sources Consulted

1. MDN Web Docs — Web Speech API: https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API

2. MDN — Using the Web Speech API: https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API/Using_the_Web_Speech_API

3. CanIUse — Speech Recognition: https://caniuse.com/speech-recognition

4. blog.addpipe.com — Deep Dive Web Speech API: https://blog.addpipe.com/a-deep-dive-into-the-web-speech-api/

5. Medium — On-Device Speech UIs in Chrome 139: https://medium.com/@roman_fedyskyi/on-device-speech-uis-in-chrome-139-4b9f0397b9c9

6. MDN — SpeechRecognition.processLocally: https://developer.mozilla.org/en-US/docs/Web/API/SpeechRecognition/processLocally

7. xjavascript.com — iOS Speech Recognition Support: https://www.xjavascript.com/blog/add-ios-speech-recognition-support-for-web-app/

8. Apple Developer — WWDC25 SpeechAnalyzer: https://developer.apple.com/videos/play/wwdc2025/277/

9. Hume AI — EVI Architecture: https://dev.hume.ai/docs/speech-to-speech-evi/overview

10. Deepgram Pricing: https://brasstranscripts.com/blog/deepgram-pricing-per-minute-2025-real-time-vs-batch

11. OpenAI Whisper Pricing: https://costgoat.com/pricing/openai-transcription

12. IntuitionLabs — AI UI Comparison 2025: https://intuitionlabs.ai/articles/conversational-ai-ui-comparison-2025

13. Guideflow — Claude.ai Feedback UI: https://www.guideflow.com/tutorial/how-to-give-negative-feedback-on-a-response-in-claudeai

14. Claude Help Center — Voice Mode: https://support.claude.com/en/articles/11101966-using-voice-mode

15. Pi Generative-UI — Claude Color Palette: https://github.com/Michaelliv/pi-generative-ui/blob/main/.pi/extensions/generative-ui/claude-guidelines/sections/color_palette.md

16. Begins With AI — Claude Logo Colors: https://beginswithai.com/claude-ai-logo-color-codes-fonts-downloadable-assets/

17. KeyDiscussions — Claude Feedback Privacy: https://keydiscussions.com/2025/09/29/dont-even-dismiss-the-how-is-claude-doing-this-session-prompt-as-it-may-compromise-your-chats-privacy/

18. LinkedIn — Claude Code Loading Phrases: https://www.linkedin.com/posts/sparky-witte_i-recently-started-using-claude-code-and-activity-7416496806093672448-0a3F

---

End of memo — SCOUT, 2026-04-25