
Silent Infinity — Crisis Detection & Response Architecture

Status: public documentation (Kimi audit R12)

Last updated: 2026-04-23

Audience: users · researchers · regulators · clinical advisors

Summary

Silent Infinity is not a crisis service. It is a contemplative AI chat that implements multiple layers of crisis detection so that when a user expresses danger to self or others, the session shifts into a supportive protocol and crisis resources are offered prominently.

This document is the open publication of that architecture, per Kimi.ai PhD audit R12 ("Open Crisis Protocol Documentation") and medRxiv 2026 recommendations on LLM-based suicide intervention.

Architecture Overview


User turn
  │
  ▼
┌─────────────────────────────────────────────────┐
│ Layer 1 — Regex Catalog (primary safety gate)   │
│   guardrails.py                                  │
│   Built-in patterns + external JSON catalog      │
│   Microseconds · fail-safe · always applied      │
└──────────────────┬──────────────────────────────┘
                   │ flag + resources-ready
                   ▼
┌─────────────────────────────────────────────────┐
│ Layer 2 — Haiku Classifier (parallel validator) │
│   feedback_monitor.classify_crisis               │
│   Fire-and-forget tool_use call                  │
│   {severity 0-4, signals, action, fpr}           │
│   ~200ms · does NOT block response               │
└──────────────────┬──────────────────────────────┘
                   │ logged + metric
                   ▼
┌─────────────────────────────────────────────────┐
│ Layer 3 — Main Sonnet Response                   │
│   System prompt includes crisis-handling rules   │
│   If Layer 1 flagged: resources appended after   │
│   "done" event so user sees them prominently     │
│   Voice mode (W7): 1.5s silence prefix           │
└─────────────────────────────────────────────────┘

Layer 1 — Regex Catalog

Location: src/guardrails.py

Latency: microseconds

Coverage: built-in fallback patterns (always applied) + external JSON catalog at patterns/crisis_patterns.json (20 patterns, severity 1-4).

Purpose: fast, deterministic, fail-safe. If this layer fires, the turn is tagged crisis=True and the mirror's response is followed by a structured crisis event containing emergency resources.

Known gap (being addressed): pure regex cannot distinguish metaphor from intent. "This job is killing me" produces a false positive; "I don't see a point anymore" is missed entirely (a false negative). This is why Layer 2 exists.
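
The Layer 1 behavior described above (built-in patterns that always apply, plus an external catalog that can fail without taking the gate down) might be sketched as follows. This is a minimal illustration, not the contents of src/guardrails.py: the two built-in patterns, the function names load_patterns and check_crisis, and the assumed catalog shape (a JSON array of objects with "pattern" and "severity" keys) are all hypothetical.

```python
import json
import re
from pathlib import Path

# Hypothetical fallback patterns for illustration; not the production catalog.
BUILTIN_PATTERNS = [
    {"pattern": r"\bend\s+my\s+life\b", "severity": 4},
    {"pattern": r"\bkilling\s+me\b", "severity": 2},
]


def load_patterns(catalog_path):
    """Return built-in patterns plus valid entries from the external JSON catalog."""
    extra = []
    try:
        entries = json.loads(Path(catalog_path).read_text(encoding="utf-8"))
        for entry in entries:
            re.compile(entry["pattern"])  # raises re.error if the pattern is invalid
            extra.append({"pattern": entry["pattern"], "severity": int(entry["severity"])})
    except (OSError, ValueError, KeyError, TypeError, re.error):
        # Fail-safe: a missing or malformed catalog never disables the built-ins.
        extra = []
    return BUILTIN_PATTERNS + extra


def check_crisis(text, patterns):
    """Return (crisis flag, highest matched severity) for one user turn."""
    severities = [
        p["severity"]
        for p in patterns
        if re.search(p["pattern"], text, flags=re.IGNORECASE)
    ]
    return bool(severities), max(severities, default=0)
```

With these illustrative patterns, the metaphor "This job is killing me" is flagged while "I don't see a point anymore" passes unflagged, which reproduces the gap described above.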

Layer 2 — Haiku Classifier (Bedrock tool_use)

Location: src/feedback_monitor.py::classify_crisis

Latency: ~200ms (fire-and-forget, never blocks main response)

Model: Claude Haiku 4.5 on Bedrock

Schema (structured tool_use):


{
  "severity": 0-4,
  "signals": ["list of specific triggering phrases"],
  "recommended_action": "none|warmer_tone|offer_resources|emergency_path",
  "false_positive_risk": 0.0-1.0,
  "reasoning": "one-sentence explanation"
}
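
Because the classifier output feeds metrics and operator review, it seems prudent to validate each payload against the ranges in the schema before using it. The sketch below is one possible way to do that; CrisisAssessment and parse_assessment are hypothetical names, and the source does not say how src/feedback_monitor.py actually validates the tool_use result.

```python
from dataclasses import dataclass

# Allowed values taken from the schema's recommended_action field.
ACTIONS = {"none", "warmer_tone", "offer_resources", "emergency_path"}


@dataclass(frozen=True)
class CrisisAssessment:
    severity: int
    signals: list
    recommended_action: str
    false_positive_risk: float
    reasoning: str


def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def parse_assessment(payload):
    """Validate a classifier payload; raise ValueError if any field is out of range."""
    try:
        severity = payload["severity"]
        signals = payload["signals"]
        action = payload["recommended_action"]
        risk = payload["false_positive_risk"]
        reasoning = payload["reasoning"]
    except (KeyError, TypeError) as exc:
        raise ValueError(f"missing or malformed field: {exc}") from exc

    if not _is_number(severity) or int(severity) != severity or not 0 <= severity <= 4:
        raise ValueError("severity must be an integer from 0 to 4")
    if not isinstance(signals, list) or not all(isinstance(s, str) for s in signals):
        raise ValueError("signals must be a list of strings")
    if not isinstance(action, str) or action not in ACTIONS:
        raise ValueError("unknown recommended_action")
    if not _is_number(risk) or not 0.0 <= risk <= 1.0:
        raise ValueError("false_positive_risk must be between 0.0 and 1.0")
    if not isinstance(reasoning, str):
        raise ValueError("reasoning must be a string")

    return CrisisAssessment(int(severity), list(signals), action, float(risk), reasoning)
```

Rejecting a malformed payload outright, rather than coercing it, keeps a misbehaving classifier from silently skewing the divergence metrics.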

Severity scale:

Divergence signals emitted to CloudWatch:

An alarm fires when CrisisRegexMiss ≥ 3 in any 1-hour window (SI-Crisis-RegexMiss CloudWatch alarm).
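
The alarm rule itself (three or more misses inside one hour) can be expressed as a sliding-window check. The sketch below illustrates only the rule, in plain Python; it is not the CloudWatch alarm configuration, and alarm_should_fire is a hypothetical name. It assumes a window that is strictly shorter than one hour, so two events exactly 60 minutes apart do not share a window. CloudWatch evaluates fixed periods, so its behavior at window boundaries may differ.

```python
from datetime import datetime, timedelta


def alarm_should_fire(miss_times, threshold=3, window=timedelta(hours=1)):
    """Return True if any window of the given length holds at least `threshold` misses."""
    times = sorted(miss_times)
    start = 0
    for end, current in enumerate(times):
        # Slide the left edge forward until the window is shorter than `window`.
        while current - times[start] >= window:
            start += 1
        if end - start + 1 >= threshold:
            return True
    return False
```

For example, misses at 10:00, 10:20 and 10:55 would trip the rule, while misses at 10:00, 10:40 and 11:20 would not.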

Layer 3 — Main Response (Sonnet 4.6)

Location: src/bedrock_client.py::invoke_stream

System prompt: prompts/system_v1.md sections <safety_boundaries> and crisis-handling rules

When Layer 1 has flagged a crisis, the handler appends a structured crisis event after the mirror's response:


{
  "type": "crisis",
  "resources": [
    {"label": "988 — Suicide & Crisis Lifeline (US, free)", "href": "tel:988"},
    {"label": "Crisis Text Line", "href": "sms:741741"},
    {"label": "findahelpline.com (global)", "href": "https://findahelpline.com"},
    {"label": "If in immediate danger, call 911"}
  ]
}

The UI renders these as prominent chips below the reply.
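
The ordering matters here: the crisis event follows the "done" event so the resources are not buried mid-reply. A minimal sketch of that ordering is below. The resource list is copied from the event above; the function name stream_events and the shapes of the "text" and "done" events are assumptions, since the source does not show the stream format used by invoke_stream.

```python
import json

# Resource list copied from the crisis event shown above.
CRISIS_RESOURCES = [
    {"label": "988 — Suicide & Crisis Lifeline (US, free)", "href": "tel:988"},
    {"label": "Crisis Text Line", "href": "sms:741741"},
    {"label": "findahelpline.com (global)", "href": "https://findahelpline.com"},
    {"label": "If in immediate danger, call 911"},
]


def stream_events(reply_chunks, crisis_flagged):
    """Yield JSON events: reply text, then "done", then the crisis event if flagged."""
    for chunk in reply_chunks:
        # Hypothetical event shape for streamed reply text.
        yield json.dumps({"type": "text", "text": chunk})
    yield json.dumps({"type": "done"})
    if crisis_flagged:
        # Emitted last so the UI can render the chips below the finished reply.
        yield json.dumps({"type": "crisis", "resources": CRISIS_RESOURCES})
```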

Voice Mode — W7 Silence Prefix

When the user's STT transcript matches deep-disclosure patterns (grief, trauma, crisis, divorce), the first TTS chunk is prefixed with 1500 ms of silence (<break time="1500ms"/>), so the AI's voice response begins with witnessing silence rather than rushing into words.
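
As a rough illustration of the prefix step, the sketch below prepends the SSML break to the first TTS chunk when the transcript matches. The keyword regex is a stand-in for the real deep-disclosure patterns, which are not published here, and prefix_first_chunk is a hypothetical name.

```python
import re

# Stand-in for the deep-disclosure patterns; the real catalog is not shown in this document.
DEEP_DISCLOSURE = re.compile(r"\b(grief|grieving|trauma|crisis|divorce)\b", re.IGNORECASE)
SILENCE_PREFIX = '<break time="1500ms"/>'


def prefix_first_chunk(transcript, tts_chunks):
    """Return the TTS chunks, with silence prepended to the first one on a match."""
    chunks = list(tts_chunks)
    if chunks and DEEP_DISCLOSURE.search(transcript):
        chunks[0] = SILENCE_PREFIX + chunks[0]
    return chunks
```

Only the first chunk is touched, so later chunks play back without any added delay.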

Explicit non-claims

Silent Infinity is not:

The system prompt prohibits:

What happens when regex + classifier disagree

Regex fires, classifier reports high false_positive_risk: the response still includes resources (over-offering is the safer error), and a metric is logged for catalog tuning.

Classifier reports severity ≥ 2, regex did not fire: the response does NOT auto-append resources (the primary gate alone controls tier-1 behavior), BUT the CrisisRegexMiss metric fires, triggering an operator review to update the regex catalog.
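
The two disagreement cases reduce to a small decision function. The sketch below assumes a false_positive_risk cutoff of 0.7 for "high", which is an illustrative value not stated in the source; Reconciliation and reconcile are hypothetical names.

```python
from typing import NamedTuple

# Assumed cutoff for a "high" false_positive_risk; the source gives no number.
HIGH_FP_RISK = 0.7


class Reconciliation(NamedTuple):
    append_resources: bool   # does the user-facing response include crisis resources?
    emit_regex_miss: bool    # should the CrisisRegexMiss metric be emitted?
    log_for_tuning: bool     # should a catalog-tuning metric be logged?


def reconcile(regex_flagged, classifier_severity, false_positive_risk):
    """Combine Layer 1 and Layer 2 results following the rules described above."""
    # Layer 1 alone decides what the user sees, even if Layer 2 doubts the flag.
    append_resources = regex_flagged
    # Layer 2 saw severity >= 2 that Layer 1 missed: operators review the catalog.
    emit_regex_miss = (not regex_flagged) and classifier_severity >= 2
    # Layer 1 fired but Layer 2 suspects a false positive: record it for tuning.
    log_for_tuning = regex_flagged and false_positive_risk >= HIGH_FP_RISK
    return Reconciliation(append_resources, emit_regex_miss, log_for_tuning)
```

Keeping the user-facing decision tied to the deterministic layer means a classifier outage or regression cannot remove resources from a flagged turn.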

Observability

All three layers write to CloudWatch namespaces:

Crisis-path adversarial tests exist in the nightly prompt-eval dataset (eval/conversations/03-crisis-ideation.yaml). Zero tolerance for false-negatives on explicit self-harm language.

Contributing patterns

External parties who want to contribute crisis patterns can submit them via the Feedback link on silentinfinity.com. We are actively building a peer-reviewed catalog.

Ethics

Built per:

Open questions

1. Catalog tuning cadence: how often should we update regex patterns based on CrisisRegexMiss alarms? Currently ad-hoc.

2. Haiku model drift: what happens when Haiku 4.5 is retired? We'll need a re-baseline.

3. Cross-language: current catalogs are English-only (Kimi audit R9).

Change log