ALL MEMOS Download .docx

CC Pattern 5 — Crisis Detection as Structured Tool Use

status: DESIGN

drafted: 2026-04-22T22:28 UTC

author: TITAN

target-release: R0175

The pattern (from Claude Code)

Claude Code uses tool_choice with an input_schema to force classification into a validated structure. Instead of "please output JSON," the model is forced through a typed contract. This gives:

Current state (Silent Infinity)

guardrails.py — pure regex (_CRISIS_PATTERNS built-in + _EXTERNAL_PATTERNS JSON catalog). Fast (μs) but brittle. Binary output. No severity. No signals.

feedback_monitor.py — Haiku Sentinel extracts crisis_adjacent: 0-4 via "please output JSON" prompting. Works ~95% but occasionally returns malformed JSON.

Upgrade — R0175 Crisis Classifier v2

1. Regex stays primary (fast + safe + free)

No regression. Same input_guardrail signature. Same crisis-resources emit path.

2. Add fire-and-forget Haiku Crisis Classifier (tool_use)


# feedback_monitor.py — new function
def classify_crisis(user_message: str) -> dict:
    """R0175 — structured crisis classification via tool_use. Fire-and-forget,
    not in critical path. Returns {severity, signals, recommended_action}."""

    tool_def = {
        "name": "classify_crisis",
        "description": "Classify crisis signals in a user message.",
        "input_schema": {
            "type": "object",
            "properties": {
                "severity": {
                    "type": "integer",
                    "enum": [0, 1, 2, 3, 4],
                    "description": "0=none, 1=distress, 2=adjacent, 3=active ideation, 4=imminent"
                },
                "signals": {
                    "type": "array",
                    "items": {"type": "string"},
                    "description": "Specific phrases or themes that suggest crisis"
                },
                "recommended_action": {
                    "type": "string",
                    "enum": ["none", "warmer_tone", "offer_resources", "emergency_path"]
                },
                "false_positive_risk": {
                    "type": "number",
                    "description": "0-1 — likelihood this is metaphor/joke/not actual crisis"
                }
            },
            "required": ["severity", "signals", "recommended_action", "false_positive_risk"]
        }
    }

    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 400,
        "temperature": 0.0,
        "system": "You are CRISIS SENTINEL. Classify the user message via the classify_crisis tool. Be conservative — do not flag metaphor as crisis.",
        "messages": [{"role": "user", "content": user_message}],
        "tools": [tool_def],
        "tool_choice": {"type": "tool", "name": "classify_crisis"},
    }
    # ... invoke + parse tool_use block from response.content

3. Wire into handler

4. Metric: crisis_regex_miss_per_day

Effort

~4 hours (Haiku-as-classifier wrapper + handler wire + 1 test + CW metric)

Why this matters for wellness

The regex catalog handles "I want to kill myself" well. It FAILS on:

Structured tool_use with an explicit false_positive_risk field forces the model to reason about metaphor vs intent — exactly where regex breaks.

Dependencies

Risk

Next step

When ready to ship: build the classify_crisis function following feedback_monitor.py conventions (invoke_model + json.loads on response.content[0]["input"] where content[0].type == "tool_use"). Wire to handler at the same point feedback_monitor currently fires.