Silent Infinity — Crisis Detection & Response Architecture

Status: public documentation (Kimi audit R12)

Last updated: 2026-04-23

Audience: users · researchers · regulators · clinical advisors

Summary

Silent Infinity is not a crisis service. It is a contemplative AI chat
that implements multiple layers of crisis detection so that when a user
expresses danger to self or others, the session shifts into a supportive
protocol and crisis resources are offered prominently.

This document is the open publication of that architecture, per Kimi.ai
PhD audit R12 ("Open Crisis Protocol Documentation") and medRxiv 2026
recommendations on LLM-based suicide intervention.

Architecture Overview


User turn
  │
  ▼
┌─────────────────────────────────────────────────┐
│ Layer 1 — Regex Catalog (primary safety gate)   │
│   guardrails.py                                  │
│   Built-in patterns + external JSON catalog      │
│   Microseconds · fail-safe · always applied      │
└──────────────────┬──────────────────────────────┘
                   │ flag + resources-ready
                   ▼
┌─────────────────────────────────────────────────┐
│ Layer 2 — Haiku Classifier (parallel validator) │
│   feedback_monitor.classify_crisis               │
│   Fire-and-forget tool_use call                  │
│   {severity 0-4, signals, action, fpr}           │
│   ~200ms · does NOT block response               │
└──────────────────┬──────────────────────────────┘
                   │ logged + metric
                   ▼
┌─────────────────────────────────────────────────┐
│ Layer 3 — Main Sonnet Response                   │
│   System prompt includes crisis-handling rules   │
│   If Layer 1 flagged: resources appended after   │
│   "done" event so user sees them prominently     │
│   Voice mode (W7): 1.5s silence prefix           │
└─────────────────────────────────────────────────┘

Layer 1 — Regex Catalog

Location: src/guardrails.py

Latency: microseconds

Coverage: built-in fallback patterns (always applied) + external JSON
catalog at patterns/crisis_patterns.json (20 patterns, severity 1-4).

Purpose: fast, deterministic, fail-safe. If this layer fires, the
turn is tagged crisis=True and the mirror's response is followed by
a structured crisis event containing emergency resources.

Known gap (being addressed): pure regex cannot distinguish metaphor
from intent. "This job is killing me" produces a false positive, while
"I don't see a point anymore" produces a false negative. This is why
Layer 2 exists.
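The "built-in fallback plus external catalog" arrangement can be sketched as follows. This is a minimal illustration, not the contents of guardrails.py: the two built-in patterns, the function names, and the assumed catalog shape (a JSON list of objects with "pattern" and "severity" keys) are all hypothetical.

```python
import json
import re
from pathlib import Path

# Hypothetical built-in fallback patterns; the real catalog is not reproduced here.
BUILTIN_PATTERNS = [
    {"pattern": r"\bkill(?:ing)? myself\b", "severity": 3},
    {"pattern": r"\bend my life\b", "severity": 3},
]


def load_patterns(catalog_path=None):
    """Compile the built-ins plus any valid entries from the external JSON catalog."""
    entries = list(BUILTIN_PATTERNS)
    if catalog_path is not None:
        try:
            loaded = json.loads(Path(catalog_path).read_text(encoding="utf-8"))
            if isinstance(loaded, list):
                entries.extend(loaded)
        except (OSError, ValueError):
            pass  # Fail-safe: an unreadable catalog never disables the built-ins.
    compiled = []
    for entry in entries:
        try:
            regex = re.compile(entry["pattern"], re.IGNORECASE)
            compiled.append((regex, int(entry["severity"])))
        except (re.error, KeyError, TypeError, ValueError):
            continue  # Skip one malformed entry instead of failing the whole gate.
    return compiled


def check_crisis(text, patterns):
    """Return (crisis flag, highest matched severity); severity is 0 when nothing matches."""
    severities = [severity for regex, severity in patterns if regex.search(text)]
    return (bool(severities), max(severities, default=0))
```

The design point is that every failure mode of the external file degrades to the built-in list rather than to "no detection".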

Layer 2 — Haiku Classifier (Bedrock tool_use)

Location: src/feedback_monitor.py::classify_crisis

Latency: ~200ms (fire-and-forget, never blocks main response)

Model: Claude Haiku 4.5 on Bedrock

Schema (structured tool_use):


{
  "severity": 0-4,
  "signals": ["list of specific triggering phrases"],
  "recommended_action": "none|warmer_tone|offer_resources|emergency_path",
  "false_positive_risk": 0.0-1.0,
  "reasoning": "one-sentence explanation"
}

Severity scale:

0 — none
1 — general distress (sad / overwhelmed / lost)
2 — crisis-adjacent (hopeless / giving-up language, no self-harm)
3 — active ideation (explicit self-harm or death wish, no plan)
4 — imminent (stated plan, access, timeline)
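A consumer of this schema would normally validate the payload before acting on it. The sketch below shows one way to do that; CrisisAssessment and parse_assessment are hypothetical names, and the strictness of the checks is an assumption rather than documented behaviour.

```python
from dataclasses import dataclass

ACTIONS = {"none", "warmer_tone", "offer_resources", "emergency_path"}


@dataclass(frozen=True)
class CrisisAssessment:
    severity: int
    signals: tuple
    recommended_action: str
    false_positive_risk: float
    reasoning: str


def parse_assessment(payload):
    """Validate a classifier payload (a dict); raise ValueError on bad fields."""
    severity = payload.get("severity")
    if type(severity) is not int or not 0 <= severity <= 4:
        raise ValueError(f"severity must be an integer 0-4, got {severity!r}")
    action = payload.get("recommended_action")
    if not isinstance(action, str) or action not in ACTIONS:
        raise ValueError(f"unknown recommended_action: {action!r}")
    risk = payload.get("false_positive_risk")
    if type(risk) not in (int, float) or not 0.0 <= risk <= 1.0:
        raise ValueError(f"false_positive_risk must be in [0, 1], got {risk!r}")
    return CrisisAssessment(
        severity=severity,
        signals=tuple(payload.get("signals") or ()),
        recommended_action=action,
        false_positive_risk=float(risk),
        reasoning=str(payload.get("reasoning", "")),
    )
```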

Divergence signals emitted to CloudWatch:

CrisisRegexMiss — classifier sev ≥ 2 AND regex didn't fire (catalog blind-spot)
CrisisRegexFPR — regex fired AND classifier false_positive_risk ≥ 0.6 (over-trigger)
CrisisClassifierOK — both agree

An alarm fires when CrisisRegexMiss ≥ 3 in any 1-hour window (SI-Crisis-RegexMiss CloudWatch alarm).
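The divergence rules and the alarm threshold can be expressed in a few lines. This is an illustrative sketch only: the function names are hypothetical, any case not covered by the first two rules is assumed to count as agreement, and the real alarm is evaluated by CloudWatch rather than in application code.

```python
from datetime import datetime, timedelta


def divergence_metric(regex_fired, severity, false_positive_risk):
    """Map one turn's regex and classifier outputs to a divergence metric name."""
    if not regex_fired and severity >= 2:
        return "CrisisRegexMiss"  # Catalog blind spot.
    if regex_fired and false_positive_risk >= 0.6:
        return "CrisisRegexFPR"  # Regex over-trigger.
    return "CrisisClassifierOK"  # Assumption: everything else is agreement.


def alarm_should_fire(miss_times, now, threshold=3, window=timedelta(hours=1)):
    """Local illustration of 'at least 3 misses within the trailing hour'."""
    recent = [t for t in miss_times if now - window < t <= now]
    return len(recent) >= threshold
```

For example, a turn where the regex stayed silent and the classifier returned severity 3 maps to CrisisRegexMiss.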

Layer 3 — Main Response (Sonnet 4.6)

Location: src/bedrock_client.py::invoke_stream

System prompt: prompts/system_v1.md, specifically the <safety_boundaries> section and the crisis-handling rules

When Layer 1 has flagged a crisis, the handler appends a structured
crisis event after the mirror's response:


{
  "type": "crisis",
  "resources": [
    {"label": "988 — Suicide & Crisis Lifeline (US, free)", "href": "tel:988"},
    {"label": "Crisis Text Line", "href": "sms:741741"},
    {"label": "findahelpline.com (global)", "href": "https://findahelpline.com"},
    {"label": "If in immediate danger, call 911"}
  ]
}

The UI renders these as prominent chips below the reply.
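The ordering described above, with the reply first, then "done", then the crisis event, might look like the generator below. The "delta" event type, its "text" field, and the function name are hypothetical; only the "done" and "crisis" events come from this document.

```python
def stream_events(reply_chunks, crisis_flagged, resources):
    """Yield reply chunks, then "done", then the crisis event if Layer 1 flagged."""
    for chunk in reply_chunks:
        yield {"type": "delta", "text": chunk}  # Hypothetical chunk event shape.
    yield {"type": "done"}
    if crisis_flagged:
        # Emitted last so the UI can render the chips below the finished reply.
        yield {"type": "crisis", "resources": list(resources)}
```

Emitting the crisis event after "done" keeps the resources out of the model's streamed text, so they appear even if the reply itself never mentions them.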

Voice Mode — W7 Silence Prefix

When the user's STT transcript matches deep-disclosure patterns (grief,
trauma, crisis, divorce), the first TTS chunk is prefixed with 1500ms
of silence (<break time="1500ms"/>). The AI's voice response begins
with witnessing silence rather than rushing into words.
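A minimal sketch of the silence prefix, assuming the TTS engine accepts SSML and that the caller wraps the result in whatever envelope that engine requires. The keyword regex and the function name are hypothetical stand-ins for the real deep-disclosure patterns.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical keyword list; the production deep-disclosure patterns are not shown here.
DEEP_DISCLOSURE = re.compile(r"\b(?:grief|grieving|trauma|crisis|divorce)\b", re.IGNORECASE)


def first_tts_chunk(transcript, reply_text, silence_ms=1500):
    """Return SSML for the first chunk, with a leading pause after a deep disclosure."""
    body = escape(reply_text)  # Escape &, <, > so the reply cannot break the markup.
    if DEEP_DISCLOSURE.search(transcript):
        return f'<break time="{silence_ms}ms"/>{body}'
    return body
```

Only the first chunk needs this treatment; later chunks of the same reply would be passed through without the break.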

Explicit non-claims

Silent Infinity is not:

A therapist or licensed counselor
A crisis intervention service (call 988 or 911 for emergencies)
A medical device (not FDA-cleared)
A substitute for human connection

The system prompt prohibits:

Claims of clinical diagnosis
Promises to "keep you safe"
Treatment advice
Roleplaying as a therapist

What happens when regex + classifier disagree

Regex fires, classifier false_positive_risk is high (≥ 0.6): the
response still includes resources (it is safer to over-offer), and the
metric is logged for catalog tuning.

Classifier reports severity ≥ 2, regex did not fire: the response does
NOT auto-append resources (Layer 1 remains the only gate for
user-facing behavior), but the CrisisRegexMiss metric is emitted, which
prompts an operator review to update the regex catalog.
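The two disagreement cases reduce to a small decision function. The sketch below is one reading of the rules above; TurnPolicy and resolve are hypothetical names, and the 0.6 cut-off is taken from the CrisisRegexFPR definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TurnPolicy:
    append_resources: bool  # User-facing: controlled by the regex gate alone.
    log_for_tuning: bool    # Regex fired but the classifier suspects a false positive.
    operator_review: bool   # Classifier saw risk that the regex catalog missed.


def resolve(regex_fired, classifier_severity, false_positive_risk):
    """Combine both layers' outputs into per-turn actions."""
    return TurnPolicy(
        append_resources=regex_fired,
        log_for_tuning=regex_fired and false_positive_risk >= 0.6,
        operator_review=(not regex_fired) and classifier_severity >= 2,
    )
```

Note that the classifier never changes what the user sees in this arrangement; it only feeds tuning and review.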

Observability

All three layers write to CloudWatch namespaces:

Innerverse/Mirror — per-turn metrics
Innerverse/Crisis — divergence counters
Innerverse/Quality — nightly prompt-eval (safety_compliance weighted 2.0×)

Crisis-path adversarial tests exist in the nightly prompt-eval dataset
(eval/conversations/03-crisis-ideation.yaml). The eval has zero
tolerance for false negatives on explicit self-harm language.

Contributing patterns

External parties who want to contribute crisis patterns can submit them
via the Feedback link on silentinfinity.com. We are actively building
a peer-reviewed catalog.

Ethics

Built per:

WHO Ethics & Governance of Artificial Intelligence for Health (2024)
JMIR "Is This Chatbot Safe and Evidence-Based?" (Parks et al. 2025)
medRxiv "Suicide- and crisis-risk detection using large language models" (Zhang et al. 2026)
Brown University ethics study (Iftikhar et al. 2025)

Open questions

1. Catalog tuning cadence: how often should we update regex patterns based on CrisisRegexMiss alarms? Updates are currently ad hoc.

2. Haiku model drift: what happens when Haiku 4.5 is retired? We'll need a re-baseline.

3. Cross-language: current catalogs are English-only (Kimi audit R9).

Change log

2026-04-22 — R0175 shipped structured tool_use crisis classifier
2026-04-23 — R0175b wired fire-and-forget · R0197 CloudWatch metric + alarm · W7 voice silence prefix
2026-04-23 — this document published (Kimi R12)