Status: public documentation (Kimi audit R12)
Last updated: 2026-04-23
Audience: users · researchers · regulators · clinical advisors
Silent Infinity is not a crisis service. It is a contemplative AI chat
that implements three layers of crisis detection so that when a user
expresses danger to self or others, the session shifts into a supportive
protocol and crisis resources are offered prominently.
This document is the open publication of that architecture, per Kimi.ai
PhD audit R12 ("Open Crisis Protocol Documentation") and medRxiv 2026
recommendations on LLM-based suicide intervention.
User turn
│
▼
┌─────────────────────────────────────────────────┐
│ Layer 1 — Regex Catalog (primary safety gate) │
│ guardrails.py │
│ Built-in patterns + external JSON catalog │
│ Microseconds · fail-safe · always applied │
└──────────────────┬──────────────────────────────┘
│ flag + resources-ready
▼
┌─────────────────────────────────────────────────┐
│ Layer 2 — Haiku Classifier (parallel validator) │
│ feedback_monitor.classify_crisis │
│ Fire-and-forget tool_use call │
│ {severity 0-4, signals, action, fpr} │
│ ~200ms · does NOT block response │
└──────────────────┬──────────────────────────────┘
│ logged + metric
▼
┌─────────────────────────────────────────────────┐
│ Layer 3 — Main Sonnet Response │
│ System prompt includes crisis-handling rules │
│ If Layer 1 flagged: resources appended after │
│ "done" event so user sees them prominently │
│ Voice mode (W7): 1.5s silence prefix │
└─────────────────────────────────────────────────┘
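The ordering in the diagram can be sketched with asyncio. Layer 1 runs synchronously before anything else, Layer 2 is scheduled as a background task that the reply never awaits, and the crisis event is emitted only after "done". This is a minimal illustration under stated assumptions, not the production handler: `regex_check`, `classify_crisis_stub`, `stream_reply`, and `handle_turn` are hypothetical stand-ins for guardrails.py, feedback_monitor.classify_crisis, and bedrock_client.invoke_stream, and the event dictionaries other than the "done" and "crisis" types are assumed shapes.

```python
import asyncio
import re
from typing import AsyncIterator

# Hypothetical stand-in for the Layer 1 regex gate (src/guardrails.py).
CRISIS_RE = re.compile(r"\b(kill myself|end my life|suicidal)\b", re.IGNORECASE)

# Strong references to fire-and-forget tasks so they are not garbage-collected mid-flight.
_background: set = set()


def regex_check(text: str) -> bool:
    """Layer 1: synchronous, deterministic, always applied."""
    return CRISIS_RE.search(text) is not None


async def classify_crisis_stub(text: str, log: list) -> None:
    """Hypothetical stand-in for feedback_monitor.classify_crisis (Layer 2)."""
    await asyncio.sleep(0.05)  # shortened stand-in for the ~200ms Haiku call
    log.append({"severity": 3 if regex_check(text) else 0})


async def stream_reply(text: str) -> AsyncIterator[str]:
    """Hypothetical stand-in for bedrock_client.invoke_stream (Layer 3)."""
    for chunk in ("I'm ", "here ", "with you."):
        await asyncio.sleep(0)  # yield control, as a network stream would
        yield chunk


async def handle_turn(text: str, log: list) -> list[dict]:
    crisis = regex_check(text)  # Layer 1 decides before the reply starts
    # Layer 2 is scheduled in the background; nothing below awaits it.
    task = asyncio.create_task(classify_crisis_stub(text, log))
    _background.add(task)
    task.add_done_callback(_background.discard)

    events = [{"type": "token", "text": c} async for c in stream_reply(text)]
    events.append({"type": "done"})
    if crisis:
        # Resources follow "done" so the UI renders them below the reply.
        events.append({"type": "crisis"})
    return events


async def demo(text: str):
    log: list = []
    events = await handle_turn(text, log)
    logged_before_reply_ended = len(log)  # classifier is still in flight here
    pending = [t for t in _background if not t.done()]
    await asyncio.gather(*pending)  # the demo waits only so it can inspect the log
    return events, logged_before_reply_ended, log
```

The reply is complete before the classifier has logged anything, which is the "does NOT block response" property; the classifier result is only consumed later for metrics.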
Location: src/guardrails.py
Latency: microseconds
Coverage: built-in fallback patterns (always applied) + external JSON
catalog at patterns/crisis_patterns.json (20 patterns, severity 1-4).
Purpose: fast, deterministic, fail-safe. If this layer fires, the
turn is tagged crisis=True and the mirror's response is followed by
a structured crisis event containing emergency resources.
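A minimal sketch of the fail-safe merge described above, assuming each catalog entry is a JSON object with `pattern` and `severity` keys (the actual schema of patterns/crisis_patterns.json is not published here). Built-in patterns are always loaded first; a missing file, corrupt JSON, or a malformed entry can only fail to add coverage, never remove it. `BUILTIN_PATTERNS`, `merge_catalog`, `load_catalog`, and `scan` are illustrative names, not the guardrails.py API.

```python
import json
import re
from pathlib import Path

# Illustrative built-in fallbacks (hypothetical); the real ones live in src/guardrails.py.
BUILTIN_PATTERNS: list[tuple[re.Pattern, int]] = [
    (re.compile(r"\b(kill|hurt) myself\b", re.IGNORECASE), 4),
    (re.compile(r"\bend (it all|my life)\b", re.IGNORECASE), 4),
]


def merge_catalog(raw: str) -> list[tuple[re.Pattern, int]]:
    """Built-ins plus valid external entries; never fewer than the built-ins."""
    patterns = list(BUILTIN_PATTERNS)
    try:
        entries = json.loads(raw)
    except ValueError:
        return patterns  # corrupt JSON: built-ins stay in force
    if not isinstance(entries, list):
        return patterns
    for entry in entries:
        try:
            rx = re.compile(entry["pattern"], re.IGNORECASE)
            sev = int(entry["severity"])
        except (KeyError, TypeError, ValueError, re.error):
            continue  # skip one malformed entry, keep the rest
        if 1 <= sev <= 4:  # documented catalog severity range
            patterns.append((rx, sev))
    return patterns


def load_catalog(path: Path) -> list[tuple[re.Pattern, int]]:
    try:
        raw = path.read_text(encoding="utf-8")
    except (OSError, ValueError):
        raw = "[]"  # unreadable file: built-ins only
    return merge_catalog(raw)


def scan(text: str, patterns: list[tuple[re.Pattern, int]]) -> dict:
    """Tag the turn: crisis=True if any pattern matches, plus the highest severity."""
    hits = [sev for rx, sev in patterns if rx.search(text)]
    return {"crisis": bool(hits), "severity": max(hits, default=0)}
```

Splitting parsing (`merge_catalog`) from file I/O (`load_catalog`) keeps the fail-safe rule testable without touching the filesystem.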
Known gap (being addressed): pure regex cannot distinguish metaphor
from intent. "This job is killing me" produces a false positive, while "I don't see a
point anymore" produces a false negative. This is why Layer 2 exists.
Location: src/feedback_monitor.py::classify_crisis
Latency: ~200ms (fire-and-forget, never blocks main response)
Model: Claude Haiku 4.5 on Bedrock
Schema (structured tool_use):
{
"severity": 0-4,
"signals": ["list of specific triggering phrases"],
"recommended_action": "none|warmer_tone|offer_resources|emergency_path",
"false_positive_risk": 0.0-1.0,
"reasoning": "one-sentence explanation"
}
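The schema above maps naturally onto a tool definition in the Anthropic Messages tool format (`name`, `description`, `input_schema` as JSON Schema); the exact request shape depends on which Bedrock API is used, so treat this as an assumption. The tool name `report_crisis_assessment` and the helper `parse_assessment` are hypothetical. The validator rejects a non-conforming reply instead of guessing, so a malformed classifier output is logged as an error rather than silently read as severity 0. No Bedrock call is made here.

```python
ACTIONS = ("none", "warmer_tone", "offer_resources", "emergency_path")
REQUIRED = ("severity", "signals", "recommended_action", "false_positive_risk", "reasoning")

CRISIS_TOOL = {
    "name": "report_crisis_assessment",  # hypothetical tool name
    "description": "Assess crisis risk in the user's latest turn.",
    "input_schema": {
        "type": "object",
        "properties": {
            "severity": {"type": "integer", "minimum": 0, "maximum": 4},
            "signals": {"type": "array", "items": {"type": "string"}},
            "recommended_action": {"type": "string", "enum": list(ACTIONS)},
            "false_positive_risk": {"type": "number", "minimum": 0, "maximum": 1},
            "reasoning": {"type": "string"},
        },
        "required": list(REQUIRED),
    },
}


def _is_number(x) -> bool:
    return isinstance(x, (int, float)) and not isinstance(x, bool)


def parse_assessment(tool_input: dict) -> dict:
    """Check a tool_use input against the schema; raise ValueError on any violation."""
    missing = [k for k in REQUIRED if k not in tool_input]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    sev = tool_input["severity"]
    if not (isinstance(sev, int) and not isinstance(sev, bool) and 0 <= sev <= 4):
        raise ValueError(f"severity must be an integer 0-4, got {sev!r}")
    fpr = tool_input["false_positive_risk"]
    if not (_is_number(fpr) and 0.0 <= fpr <= 1.0):
        raise ValueError(f"false_positive_risk must be in [0, 1], got {fpr!r}")
    if tool_input["recommended_action"] not in ACTIONS:
        raise ValueError(f"unknown action: {tool_input['recommended_action']!r}")
    signals = tool_input["signals"]
    if not (isinstance(signals, list) and all(isinstance(s, str) for s in signals)):
        raise ValueError("signals must be a list of strings")
    if not isinstance(tool_input["reasoning"], str):
        raise ValueError("reasoning must be a string")
    return {k: tool_input[k] for k in REQUIRED}
```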
Severity scale:
Divergence signals emitted to CloudWatch:
CrisisRegexMiss: classifier severity ≥ 2 AND regex didn't fire (catalog blind spot)
CrisisRegexFPR: regex fired AND classifier false_positive_risk ≥ 0.6 (over-trigger)
CrisisClassifierOK: both layers agree
An alarm fires when CrisisRegexMiss ≥ 3 in any 1-hour window (SI-Crisis-RegexMiss CloudWatch alarm).
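A sketch of how a turn could be labelled with one of the three divergence metrics, and how the SI-Crisis-RegexMiss rule (≥ 3 misses in any 1-hour window) behaves as worded. In production the alarm is evaluated by CloudWatch itself; `RollingAlarm` only simulates the rule locally, and `divergence_metric` is a hypothetical helper. It assumes every combination not covered by the miss or over-trigger rules counts as agreement.

```python
from collections import deque
from datetime import datetime, timedelta


def divergence_metric(regex_fired: bool, severity: int, false_positive_risk: float) -> str:
    if not regex_fired and severity >= 2:
        return "CrisisRegexMiss"  # catalog blind spot
    if regex_fired and false_positive_risk >= 0.6:
        return "CrisisRegexFPR"  # likely over-trigger
    return "CrisisClassifierOK"  # assumed: all remaining cases count as agreement


class RollingAlarm:
    """Local simulation of 'count >= threshold within any sliding window'."""

    def __init__(self, threshold: int = 3, window: timedelta = timedelta(hours=1)):
        self.threshold = threshold
        self.window = window
        self._events: deque = deque()

    def record(self, at: datetime) -> bool:
        """Record one miss at time `at` (non-decreasing); return True if the alarm fires."""
        self._events.append(at)
        # Drop misses that are a full window or more older than the newest one.
        while self._events and at - self._events[0] >= self.window:
            self._events.popleft()
        return len(self._events) >= self.threshold
```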
Location: src/bedrock_client.py::invoke_stream
System prompt: prompts/system_v1.md (the <safety_boundaries> section plus the crisis-handling rules)
When Layer 1 flagged crisis, the handler appends a structured crisis event
after the mirror's response:
{
"type": "crisis",
"resources": [
{"label": "988 — Suicide & Crisis Lifeline (US, free)", "href": "tel:988"},
{"label": "Crisis Text Line", "href": "sms:741741"},
{"label": "findahelpline.com (global)", "href": "https://findahelpline.com"},
{"label": "If in immediate danger, call 911"}
]
}
The UI renders these as prominent chips below the reply.
When the user's STT transcript matches deep-disclosure patterns (grief,
trauma, crisis, divorce), the first TTS chunk is prefixed with 1500ms
of silence (<break time="1500ms"/>). The AI's voice response begins
with witnessing silence rather than rushing into words.
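A sketch of the silence prefix, assuming the TTS engine accepts SSML (`<break time="1500ms"/>` is the standard SSML pause element) and that each chunk is sent as its own `<speak>` document. The keyword regex is a hypothetical stand-in for the real deep-disclosure patterns, which are not published here; `ssml_chunks` is an illustrative name. Only the first chunk of a matching turn gets the break, and chunk text is XML-escaped so characters like `&` cannot break the SSML.

```python
import re
from typing import Iterable, Iterator
from xml.sax.saxutils import escape

# Hypothetical stand-in for the deep-disclosure patterns (grief, trauma, crisis, divorce).
DEEP_DISCLOSURE = re.compile(
    r"\b(passed away|died|grief|grieving|trauma|abused?|divorced?|want to die)\b",
    re.IGNORECASE,
)
SILENCE = '<break time="1500ms"/>'  # standard SSML pause


def ssml_chunks(transcript: str, text_chunks: Iterable[str]) -> Iterator[str]:
    """Yield SSML for each TTS chunk; only the first gets the silence prefix."""
    pause_first = DEEP_DISCLOSURE.search(transcript) is not None
    for i, chunk in enumerate(text_chunks):
        prefix = SILENCE if (pause_first and i == 0) else ""
        # Assumption: each chunk is synthesized as a separate <speak> document.
        yield f"<speak>{prefix}{escape(chunk)}</speak>"
```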
Silent Infinity is not:
The system prompt prohibits:
Regex fires but the classifier's false_positive_risk is high (≥ 0.6): the response
still includes resources (over-offering is the safer error), and the metric is logged for catalog tuning.
Classifier severity ≥ 2 but the regex did not fire: the response does NOT
auto-append resources (the primary gate alone governs tier-1 behavior), BUT the
CrisisRegexMiss metric fires, triggering an operator review to update the regex catalog.
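The two disagreement rules reduce to a small, asymmetric policy: only the regex gate decides whether resources are appended, while the classifier only raises signals for tuning and review. A sketch with hypothetical names (`TurnOutcome`, `resolve`), using the severity-2 and 0.6 thresholds from the divergence metrics:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TurnOutcome:
    append_resources: bool       # tier-1 behavior, decided by Layer 1 alone
    possible_over_trigger: bool  # feeds catalog tuning (CrisisRegexFPR)
    possible_blind_spot: bool    # feeds operator review (CrisisRegexMiss)


def resolve(regex_fired: bool, severity: int, false_positive_risk: float) -> TurnOutcome:
    return TurnOutcome(
        # A high false_positive_risk never suppresses resources.
        append_resources=regex_fired,
        possible_over_trigger=regex_fired and false_positive_risk >= 0.6,
        # The classifier never appends resources itself; it only flags a miss.
        possible_blind_spot=(not regex_fired) and severity >= 2,
    )
```

Keeping the append decision on the deterministic layer means a classifier outage or drift cannot remove resources from a turn the regex gate already flagged.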
All three layers write to CloudWatch namespaces:
Innerverse/Mirror: per-turn metrics
Innerverse/Crisis: divergence counters
Innerverse/Quality: nightly prompt-eval (safety_compliance weighted 2.0×)
Crisis-path adversarial tests exist in the nightly prompt-eval dataset
(eval/conversations/03-crisis-ideation.yaml). Zero tolerance for
false negatives on explicit self-harm language.
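The text states only that safety_compliance carries a 2.0× weight and that explicit self-harm false negatives are not tolerated. One way those two rules could combine, assuming per-criterion scores in [0, 1], a weighted mean over criteria, and a hard gate that fails the run on any such false negative regardless of the mean. The criterion names `tone` and `brevity`, the 0.8 pass threshold, and the function names are hypothetical.

```python
WEIGHTS = {"safety_compliance": 2.0}  # documented weight; other criteria assumed 1.0


def weighted_score(scores: dict[str, float]) -> float:
    """Weighted mean of per-criterion scores, each assumed to lie in [0, 1]."""
    if not scores:
        raise ValueError("no criteria scored")
    weights = {k: WEIGHTS.get(k, 1.0) for k in scores}
    return sum(weights[k] * v for k, v in scores.items()) / sum(weights.values())


def nightly_run_passes(
    scores: dict[str, float], explicit_self_harm_misses: int, threshold: float = 0.8
) -> bool:
    """Hard gate first: any explicit self-harm false negative fails the run."""
    if explicit_self_harm_misses > 0:
        return False
    return weighted_score(scores) >= threshold  # 0.8 threshold is hypothetical
```

For example, scores of 1.0 for safety_compliance and 0.5 for two other criteria give (2.0 × 1.0 + 0.5 + 0.5) / 4 = 0.75.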
External parties who want to contribute crisis patterns can submit them
via the Feedback link on silentinfinity.com. We are actively building
a peer-reviewed catalog.
Built per:
1. Catalog tuning cadence: how often should we update regex patterns based on CrisisRegexMiss alarms? Currently this is decided ad hoc.
2. Haiku model drift: what happens when Haiku 4.5 is retired? We'll need a re-baseline.
3. Cross-language: current catalogs are English-only (Kimi audit R9).