CC Pattern 5 — Crisis Detection as Structured Tool Use

status: DESIGN

drafted: 2026-04-22T22:28 UTC

author: TITAN

target-release: R0175

The pattern (from Claude Code)

Claude Code uses tool_choice with an input_schema to force classification into a validated structure. Instead of "please output JSON," the model is forced through a typed contract. This gives:

Guaranteed schema adherence (no parse errors, no missing fields)
Stronger reasoning (model is literally calling a function, not emitting text)
Better calibrated scores (schema constrains value ranges)

Current state (Silent Infinity)

guardrails.py — pure regex (_CRISIS_PATTERNS built-in + _EXTERNAL_PATTERNS JSON catalog). Fast (μs) but brittle. Binary output. No severity. No signals.

feedback_monitor.py — Haiku Sentinel extracts crisis_adjacent: 0-4 via "please output JSON" prompting. Works ~95% but occasionally returns malformed JSON.

Upgrade — R0175 Crisis Classifier v2

1. Regex stays primary (fast + safe + free)

No regression. Same input_guardrail signature. Same crisis-resources emit path.

2. Add fire-and-forget Haiku Crisis Classifier (tool_use)


# feedback_monitor.py — new function
def classify_crisis(user_message: str) -> dict:
    """R0175 — structured crisis classification via tool_use. Fire-and-forget,
    not in critical path. Returns {severity, signals, recommended_action}."""

    tool_def = {
        "name": "classify_crisis",
        "description": "Classify crisis signals in a user message.",
        "input_schema": {
            "type": "object",
            "properties": {
                "severity": {
                    "type": "integer",
                    "enum": [0, 1, 2, 3, 4],
                    "description": "0=none, 1=distress, 2=adjacent, 3=active ideation, 4=imminent"
                },
                "signals": {
                    "type": "array",
                    "items": {"type": "string"},
                    "description": "Specific phrases or themes that suggest crisis"
                },
                "recommended_action": {
                    "type": "string",
                    "enum": ["none", "warmer_tone", "offer_resources", "emergency_path"]
                },
                "false_positive_risk": {
                    "type": "number",
                    "description": "0-1 — likelihood this is metaphor/joke/not actual crisis"
                }
            },
            "required": ["severity", "signals", "recommended_action", "false_positive_risk"]
        }
    }

    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 400,
        "temperature": 0.0,
        "system": "You are CRISIS SENTINEL. Classify the user message via the classify_crisis tool. Be conservative — do not flag metaphor as crisis.",
        "messages": [{"role": "user", "content": user_message}],
        "tools": [tool_def],
        "tool_choice": {"type": "tool", "name": "classify_crisis"},
    }
    # ... invoke + parse tool_use block from response.content

3. Wire into handler

Run Haiku classifier in parallel thread when regex DOES fire (to validate, reduce false positives)
Run Haiku classifier in parallel thread when regex does NOT fire but session markers suggest risk (e.g. very short replies, repeated "I don't know", low affect)
If classifier severity ≥ 3 and regex missed → emit CW alarm crisis_regex_miss
Log classifier decisions to DynamoDB silentinfinity-observations for offline review

4. Metric: crisis_regex_miss_per_day

Goal: 0 per day once catalog stabilizes
Dashboard: CloudWatch widget, auto-email if > 5/day

Effort

~4 hours (Haiku-as-classifier wrapper + handler wire + 1 test + CW metric)

Why this matters for wellness

The regex catalog handles "I want to kill myself" well. It FAILS on:

Metaphor: "this job is killing me" → false positive
Oblique: "I don't see a point anymore" → false negative
Cultural idiom: "I'm done" (depending on context) → ambiguous

Structured tool_use with an explicit false_positive_risk field forces the model to reason about metaphor vs intent — exactly where regex breaks.

Dependencies

Bedrock Haiku 4.5 with tool_use (supported, tested)
Existing feedback_monitor infrastructure
No new DynamoDB tables

Risk

Added Bedrock cost: ~$0.0003 per classification × N/day. At 1000 DAU with 10 turns = 10000 calls = $3/day. Worth it.
Added latency only if we run in critical path — this design keeps it fire-and-forget.

Next step

When ready to ship: build the classify_crisis function following feedback_monitor.py conventions (invoke_model + json.loads on response.content[0]["input"] where content[0].type == "tool_use"). Wire to handler at the same point feedback_monitor currently fires.