Purpose: Automated production audit of response quality + anti-pattern detection. Run against log group /aws/lambda/innerverse-mirror every Tuesday at 09:00 ET as part of the HERALD rhythm. Results are emailed to harnoors@gmail.com.
Metrics source: _emit_emf_metrics() emission path in handler.py (R0156). Every chat turn now includes ResponseLenChars, ResponseParaCount, FrameworkMentionCount.
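For orientation, here is a minimal sketch of what an EMF log line carrying these three metrics could look like. It is not the actual _emit_emf_metrics() code from handler.py: the function name build_emf_record, the namespace, and the sample values are all assumptions made for illustration. Only the "_aws" envelope layout follows the published Embedded Metric Format.

```python
import json
import time


def build_emf_record(response_len, para_count, framework_mentions,
                     namespace="InnerverseMirror/Quality"):  # hypothetical namespace
    """Return one EMF log line carrying the three response-quality metrics."""
    metric_names = ["ResponseLenChars", "ResponseParaCount", "FrameworkMentionCount"]
    record = {
        "ResponseLenChars": response_len,
        "ResponseParaCount": para_count,
        "FrameworkMentionCount": framework_mentions,
        # The "_aws" envelope tells CloudWatch which top-level keys are metrics.
        # It is placed last here so every metric value is followed by a comma,
        # which the parse globs in this file ('"ResponseLenChars": *,') rely on.
        "_aws": {
            "Timestamp": int(time.time() * 1000),  # epoch milliseconds
            "CloudWatchMetrics": [{
                "Namespace": namespace,
                "Dimensions": [[]],  # no dimensions in this sketch
                "Metrics": [{"Name": n, "Unit": "Count"} for n in metric_names],
            }],
        },
    }
    # json.dumps uses ", " and ": " separators by default, giving '"Key": value,'
    return json.dumps(record)


# Sample values are made up. In Lambda, a line printed to stdout
# ends up in the function's log group.
line = build_emf_record(412, 3, 1)
print(line)
```

One thing the sketch makes visible: if a metric key happened to be the last key in the JSON object, its value would be followed by "}" rather than ",", and a glob ending in a comma would not match it.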
---
filter @message like /ResponseLenChars/
| parse @message '"ResponseLenChars": *,' as responseLen
# sum() of a comparison adds 1 per matching event; count() of it would count every event
| stats count() as total,
sum(responseLen < 200) as short_count,
(sum(responseLen < 200) * 100.0 / count()) as short_pct
by bin(1h)
| sort @timestamp desc
Healthy target: short_pct < 10% per hour. Spike = sage prompt compliance slipping.
---
filter @message like /FrameworkMentionCount/
| parse @message '"FrameworkMentionCount": *,' as frameworks
| parse @message '"ResponseLenChars": *,' as responseLen
| filter responseLen > 200
# sum() of the comparison counts only responses with zero framework mentions
| stats count() as total,
sum(frameworks = 0) as unsourced,
(sum(frameworks = 0) * 100.0 / count()) as unsourced_pct
by bin(1h)
| sort @timestamp desc
Healthy target: unsourced_pct < 20% per hour. If higher, add 2-3 few-shot examples to the prompt.
---
filter @message like /ResponseParaCount/
| parse @message '"ResponseParaCount": *,' as paras
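# sum() of each comparison gives the per-bucket count; three_plus_pct is the share checked against the target
| stats count() as total,
sum(paras = 1) as one_para,
sum(paras = 2) as two_para,
sum(paras >= 3) as three_plus,
(sum(paras >= 3) * 100.0 / count()) as three_plus_pct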
by bin(6h)
| sort @timestamp desc
Healthy target: three_plus >= 80% of total in each 6h bin. Enforcement rule (C) of the core_behavior_rule mandates 3+ paragraphs in reflective mode.
---
filter @message like /VoiceTurnTotalMs/
| parse @message '"VoiceTurnTotalMs": *,' as totalMs
| parse @message '"SttMs": *,' as sttMs
| parse @message '"LlmFirstTokenMs": *,' as llmMs
| parse @message '"TtsFirstAudioMs": *,' as ttsMs
| stats
avg(totalMs) as avg_total_ms,
pct(totalMs, 50) as p50_total_ms,
pct(totalMs, 95) as p95_total_ms,
pct(sttMs, 50) as p50_stt_ms,
pct(llmMs, 50) as p50_llm_ms,
pct(ttsMs, 50) as p50_tts_ms
by bin(1h)
| sort @timestamp desc
Healthy target: p50_total_ms < 4000, p50_stt_ms < 2000, p50_tts_ms < 3500. R0148 PCM16 path should hold these.
---
filter @message like /CacheHit/
| parse @message '"CacheHit": *,' as hit
| stats avg(hit) * 100 as cache_hit_pct,
count() as total
by bin(1h)
| sort @timestamp desc
Healthy target: cache_hit_pct > 70% after warmup. Low hit rate = system prompt cache checkpoint not firing.
---
filter ispresent(ErrorCode)
| stats count() as n by ErrorCode, bin(6h)
| sort @timestamp desc
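Healthy target: no single ErrorCode > 5% of all invokes. The query returns raw counts only, so each share has to be computed against the total invoke count for the same window. stt_empty should drop to ~0% now that PCM16 landed (R0148).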
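The 5% check is simple arithmetic on top of the query output. A minimal sketch follows, assuming the per-code counts and the total invoke count for the window have already been fetched. The function name error_shares, the code "example_other", and all numbers are made up for illustration.

```python
def error_shares(error_counts, total_invokes, limit_pct=5.0):
    """Return {code: pct} for error codes whose share of invokes exceeds limit_pct."""
    if total_invokes <= 0:
        return {}  # nothing to compare against
    breaches = {}
    for code, n in error_counts.items():
        pct = n * 100.0 / total_invokes
        if pct > limit_pct:
            breaches[code] = round(pct, 2)
    return breaches


# Made-up counts for one 6h window; "example_other" is a hypothetical code.
counts = {"stt_empty": 3, "example_other": 64}
over_limit = error_shares(counts, total_invokes=1000)
print(over_limit)
```

With these sample numbers only example_other (64 of 1000 invokes) is above the limit; stt_empty at 3 of 1000 is not.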
---
These queries are registered in herald-cron-registration.json under the tuesday-redteam cron entry. HERALD's wrapper:
1. Runs each query via aws logs start-query
2. Polls aws logs get-query-results until the status is Complete
3. Aggregates results into a single weekly email
4. Compares each metric to the "healthy target" threshold
5. Flags any breach in the email subject
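The wrapper steps above can be sketched roughly as follows. This is not HERALD's actual code: the function names, the subject-line wording, and the FakeLogs stand-in are assumptions. The only real API surface used is start_query and get_query_results as exposed by boto3.client("logs"), and the client is passed in so the sketch runs without AWS access.

```python
import time

LOG_GROUP = "/aws/lambda/innerverse-mirror"
TERMINAL_FAILURES = {"Failed", "Cancelled", "Timeout", "Unknown"}


def run_query(logs, query, start_ts, end_ts, poll_s=1.0, max_polls=60):
    """Start one Logs Insights query and poll until it completes.

    `logs` is expected to behave like boto3.client("logs").
    Returns rows as a list of {field: value} dicts (values stay strings).
    """
    query_id = logs.start_query(
        logGroupName=LOG_GROUP,
        startTime=start_ts,  # epoch seconds
        endTime=end_ts,
        queryString=query,
    )["queryId"]
    for _ in range(max_polls):
        resp = logs.get_query_results(queryId=query_id)
        status = resp["status"]
        if status == "Complete":
            return [{c["field"]: c["value"] for c in row} for row in resp["results"]]
        if status in TERMINAL_FAILURES:
            raise RuntimeError(f"query {query_id} ended with status {status}")
        time.sleep(poll_s)
    raise TimeoutError(f"query {query_id} still running after {max_polls} polls")


def find_breaches(rows, column, is_healthy):
    """Return the rows whose `column` value fails the healthy-target check."""
    return [r for r in rows if column in r and not is_healthy(float(r[column]))]


def email_subject(breaches_by_metric):
    """Build the weekly subject line, naming every metric that has a breach."""
    flagged = sorted(name for name, rows in breaches_by_metric.items() if rows)
    if not flagged:
        return "HERALD weekly audit: all targets met"  # hypothetical wording
    return "HERALD weekly audit: BREACH in " + ", ".join(flagged)


class FakeLogs:
    """Stand-in client returning made-up rows after one 'Running' poll."""

    def __init__(self):
        self.polls = 0

    def start_query(self, **kwargs):
        return {"queryId": "q-1"}

    def get_query_results(self, queryId):
        self.polls += 1
        if self.polls < 2:
            return {"status": "Running", "results": []}
        return {"status": "Complete", "results": [
            [{"field": "bin(1h)", "value": "hour-1"}, {"field": "short_pct", "value": "4.2"}],
            [{"field": "bin(1h)", "value": "hour-2"}, {"field": "short_pct", "value": "12.5"}],
        ]}


rows = run_query(FakeLogs(), "stats count() by bin(1h)", 0, 3600, poll_s=0)
breaches = {"short_pct": find_breaches(rows, "short_pct", lambda v: v < 10)}
subject = email_subject(breaches)
print(subject)
```

Passing the healthy target as a small predicate keeps each threshold next to the metric it belongs to, so "less than" targets (short_pct) and "greater than" targets (cache_hit_pct) go through the same code path.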