
Claude Code Audit — 2026-04-26 22:19 UTC

Cycle: 17th audit of this cadence

Auditor: SCOUT (TITAN research agent)

Baseline: F:/TITAN/plans/advisors/CLAUDE-CODE-ARCHITECTURE-DEEP-DIVE-2026-04-22.md

Prior audit: F:/TITAN/plans/advisors/claude-code-audit-2026-04-26-1619.md (cycle 16, v2.1.119, 0 regressions, T058-T060 filed)

CC version at prior audit: v2.1.119

CC version this cycle: v2.1.119 (confirmed; github.com/anthropics/claude-code/releases fetched 2026-04-26; latest release April 23 23:24 UTC; no new release in six-hour window)

v2.1.120 status: ROLLED BACK. 8 regressions remain open per gist.github.com/yurukusa/a866b4cd2976486156a00c190c39cef6 (last updated 2026-04-25). T052 hard ceiling on T042 still applies. Pin to v2.1.119 (or v2.1.117 if stability is paramount).

Local TITAN install: v2.1.49 (70-version gap; T030 open, ceiling pinned at v2.1.119 per T052)

Next unclaimed T-numbers: T061, T062, T063

Word count: ~2,100

---

1. What Changed in Claude Code Since Last Audit (2026-04-26 16:19 → 2026-04-26 22:19 UTC)

1.1 Version Status: v2.1.119 Unchanged — Six-Hour Stability Confirmed, Cycle 17

Finding: Confirmed (primary sources: github.com/anthropics/claude-code/releases fetched 2026-04-26; releasebot.io/updates/anthropic/claude-code fetched 2026-04-26).

No new release shipped in the six-hour window between cycle 16 (16:19 UTC) and cycle 17 (22:19 UTC). v2.1.119, released April 23 at 23:24 UTC, remains the latest stable build. v2.1.120 continues in rolled-back status. v2.1.121 was searched explicitly and not found — no evidence of a patch release for the v2.1.120 regressions as of this cycle.

Architectural note on the v2.1.120 regression cluster (new framing this cycle):

The regression checklist (gist.github.com/yurukusa/a866b4cd2976486156a00c190c39cef6) records eight distinct failure modes: startup crash on --resume/--continue, auto-update break, silent model swap, two resume-time crashes, reintroduced UI-duplication bug, WSL2-only /mcp freeze, CLAUDE.md-is-ignored regression, and broken sandbox.excludedCommands. This is not a single defect cluster — it is a multi-surface regression pattern suggesting the v2.1.120 build lacked adequate integration test coverage across the compaction/session-state boundary (resume crashes), the settings loading pipeline (CLAUDE.md ignored, sandbox config broken), and the model selection pathway (silent model swap). The architectural implication for TITAN is that upgrading to any release within two releases of a rollback event should be treated as elevated-risk. The T052 ceiling at v2.1.119 is correct.
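The pinning rule above can be sketched as a small classifier. This is a minimal illustration, not TITAN's actual upgrade logic: the names `CEILING`, `ROLLED_BACK`, `RISK_WINDOW`, and `upgrade_risk` are hypothetical, and the sketch assumes that consecutive patch numbers correspond to consecutive releases.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int


def parse(tag: str) -> Version:
    """Parse a tag such as 'v2.1.119' into a comparable tuple."""
    major, minor, patch = (int(part) for part in tag.lstrip("v").split("."))
    return Version(major, minor, patch)


CEILING = parse("v2.1.119")        # T052 ceiling stated in this memo
ROLLED_BACK = [parse("v2.1.120")]  # rolled-back releases known this cycle
RISK_WINDOW = 2                    # assumption: "two releases" = two patch numbers


def upgrade_risk(candidate: str) -> str:
    """Classify an upgrade target as 'blocked', 'elevated', or 'normal'."""
    version = parse(candidate)
    # Anything above the ceiling, or itself rolled back, is not a valid target.
    if version > CEILING or version in ROLLED_BACK:
        return "blocked"
    # Releases close to a rollback event are allowed but flagged.
    for bad in ROLLED_BACK:
        same_line = (version.major, version.minor) == (bad.major, bad.minor)
        if same_line and abs(version.patch - bad.patch) <= RISK_WINDOW:
            return "elevated"
    return "normal"
```

Under these assumptions v2.1.119 classifies as elevated and v2.1.117 as normal, which is consistent with the header guidance to pin to v2.1.119, or to v2.1.117 if stability is paramount.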

New architectural item confirmed this cycle (not previously foregrounded):

Quality regression postmortem — reasoning effort, cache bug, verbosity cap (The Register, 2026-04-23): This cycle's search surfaced The Register's coverage of Anthropic's public admission of three quality regressions introduced between March and April 2026. The items are known individually to TITAN (they drove T056 in cycle 15), but this cycle's review of the postmortem language adds one structural signal not previously captured in any audit memo: Anthropic's reversal of the 25-word verbosity cap was accompanied by "account usage level resets for all customers" — an unusual remediation step that implies the quality drop was material enough to require financial compensation. This is the strongest signal yet that verbosity constraints in system prompts are not benign stylistic choices: they measurably degraded evaluated output quality by 3% on the Opus 4.6/4.7 evaluation suite, triggering a company-level response. T056 (add verbosity constraint criterion to SI Feature Readiness Standard) carries higher urgency than its MEDIUM priority suggests. Upgrading to HIGH is recommended — see Recommendation AN below.

Source: github.com/anthropics/claude-code/releases (fetched 2026-04-26); releasebot.io/updates/anthropic/claude-code (fetched 2026-04-26); gist.github.com/yurukusa/a866b4cd2976486156a00c190c39cef6 (last updated 2026-04-25); theregister.com/2026/04/23/anthropic_says_it_has_fixed/ (fetched 2026-04-26).

---

1.2 Newly Confirmed This Cycle: The Anthropic Reasoning-Effort → Verbosity → Cache Regression Sequence as a System Design Signal

Finding: Confirmed (primary source: The Register / Anthropic postmortem, 2026-04-23).

Three separate quality-degrading changes shipped to production Claude Code between March 4 and April 20, 2026, each independently introduced without integration testing against the full evaluation suite:

1. March 4 — Default reasoning effort reduced from high to medium. Rationale: latency reduction. Outcome: measurable quality drop. Reverted April 7. Anthropic's own language: "This was the wrong tradeoff."

2. March 26 — Cache optimization introduced thinking-history wipe bug. The bug caused Claude to clear its thinking sessions between turns, producing forgetful and repetitive responses. Fixed April 10.

3. April 16 — Verbosity cap injected into system prompt (≤25 words between tool calls, ≤100 words final responses). Fixed April 20 after 3% evaluation drop detected.

The design signal for TITAN and SI: These are three independent data points showing that changes to the reasoning infrastructure (effort level), context management (cache/compaction), and system prompt content (verbosity cap) can each degrade AI output quality in ways that are not visible from code review alone. None of the three would have been flagged by a static diff review; all three required behavioral evaluation to detect.

TITAN implication: TITAN has no behavioral evaluation suite. CC has an internal evaluation suite that caught the verbosity cap regression within 4 days. If TITAN or SI ships a system prompt change that inadvertently degrades response quality, there is currently no automated detection mechanism. T056 (verbosity constraint criterion) is a partial safeguard for SI. But the broader gap — no behavioral regression test suite for either TITAN skills or SI system prompt variants — has never been filed as a task.

Source: theregister.com/2026/04/23/anthropic_says_it_has_fixed/ (fetched 2026-04-26).

---

1.3 ~/.claude Filesystem: No New Skills, Hooks, or MCP Servers Since Cycle 16

Finding: Confirmed (glob scan of C:\Users\Harnoor\.claude executed 2026-04-26).

Filesystem state is unchanged from cycle 16. The glob scan confirms: no skills/ directory, no hooks/ directory (absent — T026 hook has not shipped), plugins/install-counts-cache.json present but no confirmed activated plugins. The projects/ directory shows new subagent session artifacts for the current working directory session — consistent with normal TITAN multi-agent operation during this audit cycle. Project-scoped memory files exist for C--Users-Harnoor-Desktop, C--Users-Harnoor-downloads, C--Users-Harnoor, and C--Users-Harnoor-Desktop-Trillionair-Trillionaire-Trillionaire — the last containing user_business_structure.md (new since last full scan). This file is project-scoped memory, not a skill, and does not constitute a new skill or hook entry.

Observation on the T059 audit prerequisite: T059 requires grepping 13 skill files under ~/.claude/ for inline shell execution patterns. The current glob scan shows no skills/ directory at C:\Users\Harnoor\.claude\skills\. This is consistent with the T022/T023 skills being installed under TITAN's F:/TITAN path or a custom skills directory. T059 should verify the skills path in settings.json before running the grep audit — if skills are stored outside ~/.claude/skills/, the audit must target that path instead.

---

2. Silent Infinity Production Audit Against the 14-Pattern Checklist

Status: No new SI production deployments detected since cycle 16. Gap table carries forward from cycle 16 unchanged. One new evaluation note added for Pattern 11 based on cycle 17 quality-regression research.

| # | Pattern | CC Baseline | SI Status | Gap |
|---|---------|------------|-----------|-----|
| 1 | Memory layering (hot/warm/cold) | MEMORY.md file-tiered | ALIGNED — DDB 4-tier memory.py live | None |
| 2 | System prompt composition (conditional stack) | 6-layer conditional | ALIGNED — versioned + variant + user context injection | None |
| 3 | Structured tool use (schema-validated) | 50 tools, JSON Schema | GAP — capabilities in prose, not formal tool schemas | T025 open |
| 4 | Sub-agent orchestration | Named agents, frontmatter isolation (v2.1.101) | PARTIAL — Chat Sentinel exists; no parallel workers; frontmatter isolation N/A on Lambda | Partial |
| 5 | Verification-before-claim | Harness validates tool results | ALIGNED — system prompt discipline instruction live | None |
| 6 | Plan mode / reflective pause | Shift+Tab read-only posture | PARTIAL — contemplative persona exists; no explicit mode | Partial |
| 7 | Correction-as-memory | Live feedback → persistent rules | ALIGNED — extract_correction() → memory.put_correction() live | None |
| 8 | Skill auto-invocation (domain injection) | Semantic match, lazy-load | PARTIAL — skills_loader.py behind SKILLS_ENABLED=1; manifest unconfirmed | T046 open (urgent) |
| 9 | Session transcript rehydration on reconnect | JSONL + /recap + /fork + 67% faster resume (v2.1.116) | PARTIAL — recap wired; no fork endpoint; returning-user UX absent | T060 open |
| 10 | Interruptible streaming / barge-in | ESC mid-stream + partial transcript | PARTIAL — SSE abort at Lambda; no client interrupt UX | Partial |
| 11 | Memory compaction (graduated pipeline) | 5-layer cheapest-first | ALIGNED — 2-layer compaction in conversation_store.py | None (see note below) |
| 12 | Permission / guardrail model (deny-first) | disableSkillShellExecution (v2.1.90); 8 security layers | ALIGNED — guardrails.py + Haiku classifier; N/A for skill shell (SI skills are text-only) | None |
| 13 | Pre-session briefing (context injection) | SessionStart hook + managed-settings.d/ fragments | ALIGNED — memory block injected as late user message (T014 closed) | None |
| 14 | Parallel tool calls | StreamingToolExecutor concurrent; asyncio.gather on sentinels | GAP — single-threaded Lambda; asyncio.gather() partially mitigates (T051 open) | T051 open |

Regressions this cycle: 0. Stable from cycle 16.

Pattern 11 new evaluation note: CC's quality regression postmortem (Section 1.2) reveals that the March 26 cache bug — which caused thinking sessions to be wiped between turns — is functionally equivalent to a compaction that discards reasoning state. SI's 2-layer compaction in conversation_store.py preserves the conversation transcript but does not preserve extended thinking traces if Sonnet 4.6's extended thinking mode is enabled. If SI ever enables Bedrock's extended thinking mode, a review of conversation_store.py's compaction behavior against reasoning-trace preservation should precede the rollout. This is not a current gap (SI does not use extended thinking) but is noted as a design risk for future feature work.
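One possible shape for the pre-rollout review described in this note is a check that compares messages before and after compaction and reports any that survived but lost their reasoning blocks. This is a sketch under stated assumptions: the `"thinking"` block type follows Anthropic Messages API naming and may differ on Bedrock, the `id` field on each message is hypothetical, and nothing here reflects how conversation_store.py actually stores turns.

```python
from typing import Dict, List


def count_blocks(message: Dict, block_type: str) -> int:
    """Count typed content blocks in one message; string content has none."""
    content = message.get("content", [])
    if isinstance(content, str):
        return 0
    return sum(1 for block in content if block.get("type") == block_type)


def reasoning_lost(before: List[Dict], after: List[Dict],
                   block_type: str = "thinking") -> List[str]:
    """Return ids of messages kept by compaction that lost reasoning blocks.

    Messages dropped entirely by compaction are ignored: the concern is a
    retained turn whose reasoning trace was silently stripped.
    """
    kept = {message["id"]: message for message in after}
    lost = []
    for message in before:
        survivor = kept.get(message["id"])
        if survivor is None:
            continue
        if count_blocks(survivor, block_type) < count_blocks(message, block_type):
            lost.append(message["id"])
    return lost
```

A non-empty result would indicate compaction behavior comparable to the March 26 cache bug and is worth reviewing before extended thinking is enabled.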

---

3. Top 3 Recommendations This Cycle

Next unclaimed T-numbers: T061, T062, T063.

---

Recommendation AN — Escalate T056 Priority to HIGH and Add Cross-Reference to SI System Prompt Review Gate (SI)

Problem. T056 was filed in cycle 15 as MEDIUM priority with the rationale: "prevents a category of silent quality regression validated by Anthropic postmortem." This cycle's deeper review of the postmortem (The Register, 2026-04-23) reveals stronger-than-documented evidence that verbosity constraints in system prompts are measurably harmful. The 3% evaluation drop on Opus 4.6/4.7 was large enough that Anthropic issued "account usage level resets for all customers" — a financial remediation step, not merely a rollback. This is the highest-confidence external validation available that verbosity caps harm output quality. The MEDIUM priority on T056 understates the risk.

Additionally, T056 as filed covers only future system prompt changes going through the Feature Readiness Standard. It does not cover the existing SI system prompt — which has accumulated complexity across variants A–F, user context injection, memory block injection, and skills injection since T056 was filed. A single targeted review pass against the existing prompt for inadvertent brevity constraints is warranted.

Fix — ~1.5 hours total:

1. Update T056's priority field from MEDIUM to HIGH in the task registry (5 minutes).

2. Add a second action item to T056: "Run a one-time audit of the existing SI system prompt (all variants) for any instruction that contains a word count, character limit, sentence count, or brevity directive. If any are found: run the 5-item SI evaluation set before removing them and document the rationale. If none are found: confirm in the T056 close note." (30 minutes of audit time; no further work if the prompt is clean)

3. Add to the SI Feature Readiness Standard's system prompt change checklist: "Does this change add any constraint on response length, output verbosity, or phrasing length? If yes: mandatory Harnoor sign-off + pre/post evaluation run. Reference: Anthropic postmortem 2026-04-23, 3% eval drop from 25-word cap, financial remediation issued." (45 minutes of documentation time)
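The one-time audit in step 2 could start from a simple pattern scan like the one below. It is a first-pass sketch only: the pattern list and the function name `find_brevity_directives` are illustrative assumptions, the scan will miss paraphrased constraints, and a manual read of each variant is still needed.

```python
import re
from typing import List, Tuple

# Illustrative patterns for length or brevity directives; extend as needed.
BREVITY_PATTERNS = [
    # Numeric limits such as "25 words", "25-word", "3 sentences".
    re.compile(r"\b\d+[\s-]*(?:words?|characters?|chars?|sentences?|lines?)\b", re.I),
    # Bounded quantities such as "≤25", "<= 100", "no more than 3".
    re.compile(r"(?:≤|<=|\bat most\b|\bno more than\b|\bmaximum of\b)\s*\d+", re.I),
    # Plain-language brevity instructions.
    re.compile(r"\b(?:be brief|be concise|keep (?:it|responses?|replies) short)\b", re.I),
]


def find_brevity_directives(prompt_text: str) -> List[Tuple[int, str]]:
    """Return (line_number, line) pairs that look like brevity directives."""
    hits = []
    for lineno, line in enumerate(prompt_text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in BREVITY_PATTERNS):
            hits.append((lineno, line.strip()))
    return hits


sample = (
    "You are a contemplative mirror.\n"
    "Keep text between tool calls to ≤25 words.\n"
    "Hold space for the user."
)
print(find_brevity_directives(sample))
```

Running the scan over each prompt variant and attaching the hit list, empty or not, to the T056 close note would give the audit a reproducible record.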

Why this is under 1 day. Step 1 is a registry edit (5 min). Step 2 is a grep/read of existing prompt files (30 min). Step 3 is a documentation addition (45 min). Total: ~1.5 hours. No code changes. No deployment. No SI production impact.

Why now. The postmortem evidence is now fully digested across cycles 15-17. The risk is real, externally validated, and cheap to guard against. Further deferral reduces the return on T056's original filing.

Blast radius: Task registry (T056 priority field). Feature Readiness Standard document (one criterion). Existing SI system prompt (read-only audit; no changes unless brevity constraint found). Zero SI production impact.

Effort: 1.5 hours (TRIVIAL — registry edit + audit read + documentation)

Priority: HIGH

Dependencies: None

File as T061.

Sources: theregister.com/2026/04/23/anthropic_says_it_has_fixed/ (fetched 2026-04-26); T056 in TASK-REGISTRY-2026-04-21.md (read 2026-04-26); baseline memo section 1.1 (system prompt layering, read 2026-04-26).

---

Recommendation AO — Add Behavioral Regression Detection Stub to SI Evaluation Pipeline (SI)

Problem. CC caught the verbosity cap regression within 4 days because it has an internal evaluation suite. SI has a 582-green-test suite per the baseline memo, but those tests are structural (does the Lambda return 200? does DDB write succeed?) not behavioral (does the response meet contemplative quality standards?). The quality regression sequence documented in Section 1.2 — three independent changes across reasoning, cache, and system prompt, each undetectable by structural tests — confirms that behavioral regression detection is a distinct and necessary test category that SI currently lacks.

This is not a request to build a comprehensive evaluation suite in one cycle. It is a request to install the stub that makes future evaluation tests possible: a test harness that accepts a set of (input_message, expected_behavior_descriptor) pairs and invokes the SI Lambda, returning a pass/fail on the behavior. The behavior check itself can start as a simple Haiku prompt ("Does this response exhibit contemplative reflection? Y/N") — imperfect but sufficient to catch gross regressions.

Fix — 4-6 hours:

1. Create tests/behavioral/test_contemplative_quality.py with a 5-item test set:

- Input: a grief expression. Expected behavior: reflective, non-directive, no premature reframing.

- Input: an anxiety spiral. Expected behavior: present-anchoring, no diagnosis.

- Input: a user correction ("stop doing X"). Expected behavior: correction is acknowledged, behavior changes in the same response.

- Input: returning user opener. Expected behavior: prior thread is referenced without being forced.

- Input: explicit boundary test ("tell me I'll be okay"). Expected behavior: the mirror does not make promises.

2. Each test invokes the SI Lambda in a test environment, passes the response to a Haiku 4.5 evaluator with a 1-shot behavioral rubric, and records pass/fail.

3. The test suite runs in CI on every system prompt change (not every PR — only system prompt and skills files trigger it).

4. Gate: if >1 of 5 behavioral tests fail, the system prompt change requires explicit Harnoor review before merge.
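The harness shape described in steps 1 to 4 can be sketched as follows. The Lambda call and the Haiku evaluator are passed in as plain callables so the sketch runs without network access; `BehavioralCase`, `run_suite`, and `needs_review` are hypothetical names, and the bracketed user messages are placeholders because the actual test inputs have not been authored yet.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class BehavioralCase:
    name: str
    user_message: str       # placeholder input; real prompts are authored separately
    expected_behavior: str  # plain-language descriptor handed to the judge


# Assumed signatures: invoke(message) -> response text;
# judge(response, expected_behavior) -> True when the behavior is present.
Invoke = Callable[[str], str]
Judge = Callable[[str, str], bool]

MAX_FAILURES = 1  # step 4: more than 1 failure out of 5 requires review

CASES = [
    BehavioralCase("grief", "<grief expression>",
                   "reflective, non-directive, no premature reframing"),
    BehavioralCase("anxiety", "<anxiety spiral>",
                   "present-anchoring, no diagnosis"),
    BehavioralCase("correction", "stop doing X",
                   "correction is acknowledged, behavior changes in the same response"),
    BehavioralCase("returning_user", "<returning user opener>",
                   "prior thread is referenced without being forced"),
    BehavioralCase("boundary", "tell me I'll be okay",
                   "the mirror does not make promises"),
]


def run_suite(cases: List[BehavioralCase], invoke: Invoke, judge: Judge) -> Dict[str, bool]:
    """Run every case once and return a name -> passed mapping."""
    return {case.name: judge(invoke(case.user_message), case.expected_behavior)
            for case in cases}


def needs_review(results: Dict[str, bool], max_failures: int = MAX_FAILURES) -> bool:
    """True when the number of failed cases exceeds the gate threshold."""
    failures = sum(1 for passed in results.values() if not passed)
    return failures > max_failures
```

Keeping `invoke` and `judge` injectable means the same suite can run against a stub in unit tests and against the test-environment Lambda plus a Haiku rubric in CI. Because an LLM judge is not deterministic, repeated runs per case may be worth considering before treating a single failure as signal.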

Why this is under 1 day. The Lambda test harness already exists (582 structural tests use it). Adding a behavioral test file is incremental. The Haiku evaluator call is 5 lines. The 5-item test set is authored in ~2 hours. The CI gate condition is an if: paths-filter addition to the GitHub Actions workflow.

Why this is Recommendation 2 not 3. Section 1.2 shows that all three of CC's quality regressions were detectable only through behavioral evaluation; structural tests of the kind SI runs today would have caught none of them. SI is about to scale system prompt complexity (variant proliferation, skills injection, memory injection), and the risk of an undetected behavioral regression grows with each prompt addition.

Blast radius: New test file tests/behavioral/test_contemplative_quality.py. CI workflow YAML (one path filter condition). Zero production code changes. Zero SI production impact.

Effort: 4-6 hours (LOW — test harness exists; behavioral tests are incremental additions)

Priority: HIGH — closes the behavioral regression detection gap before SI prompt complexity increases

Dependencies: Test environment Lambda endpoint. Haiku 4.5 access (already present in SI infrastructure). No new infrastructure.

File as T062.

Sources: theregister.com/2026/04/23/anthropic_says_it_has_fixed/ (fetched 2026-04-26, quality regression detection via evaluation suite); baseline memo Part C context (582 structural tests, no behavioral tests, read 2026-04-26); Section 1.2 this memo (reasoning-effort/cache/verbosity regression sequence).

---

Recommendation AP — Add Skills Path Verification to T059 Scope Before Inline Shell Audit (TITAN)

Problem. T059 (enable disableSkillShellExecution + audit 13 skills for inline shell) was filed in cycle 16 with the instruction to "grep all 13 skill files under C:\Users\Harnoor\.claude\." The cycle 17 filesystem scan reveals that C:\Users\Harnoor\.claude\skills\ does not exist as a directory. The 13 skills referenced in prior cycles are presumed to be installed under the TITAN drive path (F:/TITAN/) or a custom skills path configured in settings.json. If T059 executes the grep against ~/.claude/skills/ and finds no files (because the directory is empty or absent), it will incorrectly conclude the audit is clean — a false negative that leaves the actual skill files unaudited.

Fix — 20 minutes (annotation only):

1. Prepend a pre-flight step to T059: "Before running the inline shell grep audit, read ~/.claude/settings.json and check for a skillsDirectory field or any equivalent setting that relocates skills (the exact field name is unconfirmed). If skills are stored outside ~/.claude/skills/, run the grep against the actual skills path. Confirm the path exists and contains the 13 expected skill files before proceeding. If ~/.claude/skills/ is absent and no config override is found, escalate to Harnoor: the skills path must be identified before T059 can proceed safely."

2. Add a note to T059's entry: "Skills directory location unconfirmed — ~/.claude/skills/ is absent per cycle 17 filesystem scan (2026-04-26). Path verification is a prerequisite for the inline shell audit. See Recommendation AP / T063."
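The pre-flight step can be sketched as a check that fails loudly instead of letting an empty grep pass as a clean audit. Several details are assumptions: `skillsDirectory` is the field name used in the annotation above and is not confirmed to exist in settings.json, each skill is assumed to be at least one Markdown file, and `resolve_skills_dir` and `preflight` are hypothetical names.

```python
import json
from pathlib import Path

EXPECTED_SKILL_COUNT = 13  # number of skills T059 expects to audit


def resolve_skills_dir(claude_home: Path, key: str = "skillsDirectory") -> Path:
    """Return the directory to audit, or raise if it cannot be confirmed.

    `key` is a hypothetical settings field; adjust once the real one is known.
    """
    settings_path = claude_home / "settings.json"
    override = None
    if settings_path.is_file():
        settings = json.loads(settings_path.read_text(encoding="utf-8"))
        override = settings.get(key)
    candidate = Path(override) if override else claude_home / "skills"
    if not candidate.is_dir():
        raise FileNotFoundError(
            f"skills path not found: {candidate}; escalate before running T059"
        )
    return candidate


def preflight(claude_home: Path, expected: int = EXPECTED_SKILL_COUNT) -> Path:
    """Confirm the skills path exists and holds at least the expected files."""
    skills_dir = resolve_skills_dir(claude_home)
    # Assumption: skills are stored as Markdown files somewhere under the path.
    found = [path for path in skills_dir.rglob("*.md") if path.is_file()]
    if len(found) < expected:
        raise RuntimeError(
            f"expected at least {expected} skill files under {skills_dir}, "
            f"found {len(found)}; do not treat an empty grep as a clean audit"
        )
    return skills_dir
```

Raising on a missing or under-populated directory turns the false-negative case into a visible failure, which is the point of the annotation.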

Why this matters. A false-negative inline shell audit is worse than no audit — it provides false assurance that skill files have been reviewed when they have not. T059 is a security task protecting TITAN's autonomous session boundary. The audit must target the correct path. The annotation costs 20 minutes and prevents the silent failure mode.

Blast radius: Task registry annotation on T059 only. Zero code changes. Zero SI impact.

Effort: 20 minutes (TRIVIAL — annotation only)

Priority: HIGH — prerequisite for T059 correctness; T059 is a security task

Dependencies: Should complete before T059 executes

File as T063.

Sources: Glob scan C:\Users\Harnoor\.claude (executed 2026-04-26; skills/ directory absent confirmed); T059 in TASK-REGISTRY-2026-04-21.md (read 2026-04-26); baseline memo section 1.7 (skills system, read 2026-04-26).

---

4. Anti-Patterns in CC That SI Should NOT Copy (Cumulative: AP-1 through AP-10, no new additions this cycle)

Prior cycles established AP-1 through AP-10. No new CC anti-patterns observed this cycle. One reinforcement note:

AP-3 reinforcement (committed/confident tone that conflicts with contemplative mirror): The Anthropic quality regression postmortem is relevant here. Anthropic's fix to the verbosity regression was to restore Claude Code's "direct, committed, terse" tone — short declarative statements, no hedging, no filler. This is the correct register for a coding assistant. For SI, it is the anti-pattern: a contemplative mirror must hold space, not close it. The verbosity cap (which reduced this tone) damaged CC's coding quality by 3%; the equivalent constraint would damage SI's felt-intelligence by an unmeasured but likely larger fraction. The postmortem confirms AP-3's framing: what makes CC feel intelligent in its domain (declarative confidence) would make SI feel hollow in its domain (declarative confidence closes contemplation). Do not port CC's brevity discipline to SI.

---

5. Contradictions and Uncertainties

Contradiction 1 — Glob/Grep native binary status on Windows (carried from cycle 16). The v2.1.113/v2.1.117 entries describe bfs/ugrep embedding as applying to "native macOS/Linux builds." Windows status unconfirmed. T058 annotation addresses this.

Uncertainty 1 — v2.1.120 regression patch timeline. No v2.1.121 found this cycle. The 8 regressions remain open per gist.github.com/yurukusa/a866b4cd2976486156a00c190c39cef6 (last updated 2026-04-25). T052 ceiling remains active.

Uncertainty 2 — skills_loader.py manifest content (T046). Carried from cycle 16. Pattern 8 cannot advance from PARTIAL to ALIGNED without manifest verification. Highest-value open SI task.

New uncertainty this cycle — Skills path location. The ~/.claude/skills/ directory is absent per filesystem scan. The 13 skills referenced in cycles 12-16 have an unconfirmed storage path. This does not affect prior cycle accuracy (the glob scan in cycle 12 that first confirmed 13 skills may have targeted a different path), but it does affect T059 correctness. T063 addresses this.

---

6. Summary Statistics

| Item | Count |
|------|-------|
| CC versions reviewed (cumulative since baseline) | 22 (v2.1.89 through v2.1.119; v2.1.120 rolled back; v2.1.121 not found) |
| New CC architectural signals this cycle | 2 (quality regression postmortem — full three-event sequence; skills path anomaly discovered in filesystem scan) |
| TITAN operational flags raised this cycle | 1 (T063: skills path verification prerequisite for T059) |
| SI regressions detected | 0 |
| SI positive developments | 0 new; T060 (Pattern 9 close) and T062 (behavioral test stub) filed as concrete forward steps |
| Persistent SI pattern gaps | Pattern 3 (T025), Pattern 4 partial, Pattern 6 partial, Pattern 8 partial (T046 urgent), Pattern 9 partial (T060 open), Pattern 10 partial, Pattern 14 (T051) |
| New recommendations filed | 3 (T061: T056 priority escalation + prompt audit; T062: behavioral regression test stub; T063: skills path verification prerequisite) |
| Anti-patterns documented (cumulative) | 10 (AP-1 through AP-10; AP-3 reinforced this cycle via verbosity postmortem) |
| Open T-numbers with direct SI impact | T025, T028, T037, T038, T040, T041, T046, T047, T048, T051, T053, T056, T060, T061, T062 |
| Open T-numbers with TITAN-only impact | T026, T029, T030, T031, T032, T033, T034, T035, T036, T039, T042, T043, T044, T045, T049, T050, T052, T054, T055, T057, T058, T059, T063 |

---

7. Sources

1. F:/TITAN/plans/advisors/CLAUDE-CODE-ARCHITECTURE-DEEP-DIVE-2026-04-22.md — baseline memo (SCOUT, 2026-04-22; read 2026-04-26)

2. F:/TITAN/plans/advisors/claude-code-audit-2026-04-26-1619.md — cycle 16 prior audit (read 2026-04-26; next T: T061)

3. F:/TITAN/plans/task-registry/TASK-REGISTRY-2026-04-21.md — task registry (read 2026-04-26; last T-number T060)

4. F:/TITAN/plans/audit-cadence.log — audit history (read 2026-04-26; last entry 2026-04-26T18:19:38-04:00 dispatched)

5. github.com/anthropics/claude-code/releases — official release history (fetched 2026-04-26; v2.1.119 confirmed latest, released April 23 2026; no v2.1.121 found)

6. releasebot.io/updates/anthropic/claude-code — release aggregator April 2026 (fetched 2026-04-26; v2.1.119 confirmed latest; full April changelog including v2.1.98-v2.1.119)

7. code.claude.com/docs/en/changelog — official changelog (fetched 2026-04-26; full April 2026 entries confirmed including v2.1.105 PreCompact hook, v2.1.108 /recap, v2.1.111 named agents, v2.1.113 native binary, v2.1.116 /resume 67%, v2.1.117 bfs/ugrep, v2.1.119 hooks duration_ms)

8. code.claude.com/docs/en/whats-new/2026-w14 — Week 14 digest (fetched 2026-04-26; computer use CLI, /powerup, flicker-free, MCP result-size override, PermissionDenied hook)

9. gist.github.com/yurukusa/a866b4cd2976486156a00c190c39cef6 — v2.1.120 regression checklist (last updated 2026-04-25; 8 regressions all open; T052 ceiling active)

10. theregister.com/2026/04/23/anthropic_says_it_has_fixed/ — Anthropic quality regression postmortem (fetched 2026-04-26; reasoning effort, cache bug, verbosity cap; financial remediation issued)

11. Glob scan C:\Users\Harnoor\.claude — confirms skills/ directory absent, hooks/ directory absent, plugins cache present, new project-scoped memory file user_business_structure.md (executed 2026-04-26)