status: DESIGN
drafted: 2026-04-22T22:28 UTC
author: TITAN
target-release: R0175
Claude Code uses tool_choice with an input_schema to force classification into a validated structure. Instead of "please output JSON," the model is forced through a typed contract. This gives:
guardrails.py — pure regex (_CRISIS_PATTERNS built-in + _EXTERNAL_PATTERNS JSON catalog). Fast (μs) but brittle. Binary output. No severity. No signals.
feedback_monitor.py — Haiku Sentinel extracts crisis_adjacent: 0-4 via "please output JSON" prompting. Works ~95% but occasionally returns malformed JSON.
No regression. Same input_guardrail signature. Same crisis-resources emit path.
# feedback_monitor.py — new function
def classify_crisis(user_message: str) -> dict:
"""R0175 — structured crisis classification via tool_use. Fire-and-forget,
not in critical path. Returns {severity, signals, recommended_action}."""
tool_def = {
"name": "classify_crisis",
"description": "Classify crisis signals in a user message.",
"input_schema": {
"type": "object",
"properties": {
"severity": {
"type": "integer",
"enum": [0, 1, 2, 3, 4],
"description": "0=none, 1=distress, 2=adjacent, 3=active ideation, 4=imminent"
},
"signals": {
"type": "array",
"items": {"type": "string"},
"description": "Specific phrases or themes that suggest crisis"
},
"recommended_action": {
"type": "string",
"enum": ["none", "warmer_tone", "offer_resources", "emergency_path"]
},
"false_positive_risk": {
"type": "number",
"description": "0-1 — likelihood this is metaphor/joke/not actual crisis"
}
},
"required": ["severity", "signals", "recommended_action", "false_positive_risk"]
}
}
body = {
"anthropic_version": "bedrock-2023-05-31",
"max_tokens": 400,
"temperature": 0.0,
"system": "You are CRISIS SENTINEL. Classify the user message via the classify_crisis tool. Be conservative — do not flag metaphor as crisis.",
"messages": [{"role": "user", "content": user_message}],
"tools": [tool_def],
"tool_choice": {"type": "tool", "name": "classify_crisis"},
}
# ... invoke + parse tool_use block from response.content
crisis_regex_misssilentinfinity-observations for offline review~4 hours (Haiku-as-classifier wrapper + handler wire + 1 test + CW metric)
The regex catalog handles "I want to kill myself" well. It FAILS on:
Structured tool_use with an explicit false_positive_risk field forces the model to reason about metaphor vs intent — exactly where regex breaks.
When ready to ship: build the classify_crisis function following feedback_monitor.py conventions (invoke_model + json.loads on response.content[0]["input"] where content[0].type == "tool_use"). Wire to handler at the same point feedback_monitor currently fires.