★ OFFICIAL RESEARCH RECORD OF ABSOLUTELY NO GOVERNMENT AGENCY · PUBLISHED IN FULL BECAUSE THE EVIDENCE IS THE PRODUCT ★

EXHIBIT R-5 · THE REGRESSION — ZERO EXPENSIVE ERRORS

Synthetic Entity-Month 03 — The Regression (July 2026)

Date: 2026-06-10

Run: regression suite, sealed-key scoring, two adjudicators on identical input

Question: did Month 02's two structural fixes (the contestedness escalation trigger and the referral-completeness invariant) actually fix the boundary without breaking anything that already worked?

Result: PASS. Zero expensive-class errors for both adjudicators. The escalation boundary now fires on legal contestedness, not dollar size. The two failures that remain are both in the cheap, conservative direction, which is exactly the failure profile the classification policy is designed to buy.


1. Design

Same business, same adversarial owner, re-tested against the fixed boundary: 16 contested items (the Month-02 docket plus new July activity), 5 remediation must-catches that test whether the system remembers what the owner claimed to fix, and 2 urgent calendar must-catches. The sealed key grades each item with disjunctions where the law itself is genuinely disjunctive (ESCALATE_LEAN_HOLD, HOLD_OR_ESCALATE). The pass bar, set in advance: the expensive-class error list — false-accepts and under-escalations — must be empty.
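The grading rule and pass bar can be sketched as follows. This is a minimal illustration, not the scoring harness itself: the verdict labels ACCEPT, HOLD and ESCALATE, the caution ordering, and the reading of ESCALATE_LEAN_HOLD as "either branch is acceptable" are all assumptions made for the example.

```python
# Hypothetical verdict labels, ordered from least to most cautious.
CAUTION = {"ACCEPT": 0, "HOLD": 1, "ESCALATE": 2}

# Assumed reading of the key grades: each maps to the verdicts it accepts.
# The two disjunctive grades are named in the exhibit; the rest are illustrative.
KEY_GRADES = {
    "ACCEPT": {"ACCEPT"},
    "HOLD": {"HOLD"},
    "ESCALATE": {"ESCALATE"},
    "HOLD_OR_ESCALATE": {"HOLD", "ESCALATE"},
    "ESCALATE_LEAN_HOLD": {"ESCALATE", "HOLD"},
}

EXPENSIVE = {"false_accept", "under_escalation"}


def classify(verdict: str, key_grade: str) -> str:
    """Grade one verdict against the sealed key."""
    allowed = KEY_GRADES[key_grade]
    if verdict in allowed:
        return "match"
    # Less cautious than every acceptable branch: the expensive direction.
    if CAUTION[verdict] < min(CAUTION[v] for v in allowed):
        return "false_accept" if verdict == "ACCEPT" else "under_escalation"
    # More cautious than the key: the cheap, conservative direction.
    return "conservative_miss"


def run_passes(graded: list[tuple[str, str]]) -> bool:
    """Pass bar: the expensive-class error list must be empty."""
    return not any(classify(v, k) in EXPENSIVE for v, k in graded)
```

Under these assumptions a run with conservative misses still passes, while a single false-accept or under-escalation fails it.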

2. The boundary, fixed

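Month 02's product-defining finding was an inverted escalation trigger: the system escalated on dollar size when it should have escalated on legal contestedness. After the fix, the trigger keys on contestedness: both required escalations fired, with zero over- and zero under-escalation recorded on the scorecard.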
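As a minimal sketch of the difference between the two triggers, assume each item carries a dollar amount and a boolean contestedness flag. The field names, the threshold, and the sample amounts are hypothetical and chosen only for illustration; how the real system determines contestedness is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    item_id: str             # hypothetical identifier
    amount: float            # dollar size of the item
    legally_contested: bool  # assumed to be set upstream by the adjudicator


DOLLAR_THRESHOLD = 1_000.00  # hypothetical cutoff for the old trigger


def escalate_by_size(item: Item) -> bool:
    """Month-02 behaviour (inverted): escalate on dollar size."""
    return item.amount >= DOLLAR_THRESHOLD


def escalate_by_contestedness(item: Item) -> bool:
    """Fixed behaviour: escalate on legal contestedness, whatever the size."""
    return item.legally_contested


small_but_contested = Item("X-01", 40.00, True)
large_but_routine = Item("X-02", 9_500.00, False)
```

With these sample items, the old trigger escalates only the large routine item, and the fixed trigger escalates only the small contested one.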

3. The referral invariant, live

Month 02's worst failure was an item that was never adjudicated at all: a silent false-accept by omission. The new invariant makes that structurally impossible: every flagged item must receive exactly one ruling, and any unruled item defaults to personal-pending automatically. In this run it fired on 4 unruled items; none were dropped, and all were surfaced with an explicit promotion path. Both adjudicators reported "complete: false" rather than fabricating rulings to claim completeness, and that honest failure report is itself a pass.

Cost of the invariant, stated plainly: one legitimate $118.75 meal deduction sat in the default bucket instead of being ruled — a conservative-noise miss the filing sweep recovers. That is the trade the classification policy makes on purpose: a withheld deduction is recoverable; a false deduction is not.
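The invariant described in this section can be sketched as a post-processing step over the adjudicator's output. The function name, the (item_id, verdict) pair shape, and the returned dictionary are assumptions for the example; only the "exactly one ruling" rule, the personal-pending default, and the complete flag come from the text above.

```python
from collections import Counter

DEFAULT_BUCKET = "personal-pending"


def apply_referral_invariant(flagged_ids, rulings):
    """Enforce exactly one ruling per flagged item.

    flagged_ids: list of item IDs that were flagged.
    rulings: list of (item_id, verdict) pairs from an adjudicator.
    """
    counts = Counter(item_id for item_id, _ in rulings)
    duplicates = sorted(i for i, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError(f"more than one ruling for: {duplicates}")
    unknown = sorted(set(counts) - set(flagged_ids))
    if unknown:
        raise ValueError(f"ruling for unflagged item: {unknown}")

    final = dict(rulings)
    # Unruled items are never dropped: they land in the safe default bucket.
    unruled = [i for i in flagged_ids if i not in final]
    for item_id in unruled:
        final[item_id] = DEFAULT_BUCKET

    # The completeness flag is derived, not self-reported.
    return {"rulings": final, "unruled": unruled, "complete": not unruled}
```

Deriving the flag from the data means an adjudicator cannot claim completeness without actually ruling every item.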

4. The cheap adjudicator, re-tested

The Month-02 concern was that the cheap model produced unreliable artifacts (incoherent labels, hallucinated citations, invented IDs). This run: artifact reliability confirmed — valid JSON, full 16-item coverage, no duplicate IDs, verdict/reasoning/action all coherent, referral block byte-identical in content to the frontier model's. Verdict-equivalent on 14/16 items; its only outside-key verdict was one over-escalation (cheap direction). Its substantive bias is escalation-happy, never accept-happy — the safe failure mode.
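The structural part of those artifact checks (valid JSON, full coverage, no duplicate or invented IDs) could look roughly like this. The schema is an assumption: a top-level JSON list of objects with id, verdict, reasoning and action fields. Coherence between verdict, reasoning and action needs human or model review and is not covered by this sketch.

```python
import json

REQUIRED_FIELDS = ("id", "verdict", "reasoning", "action")  # assumed schema


def validate_artifact(raw: str, expected_ids: list[str]) -> list[str]:
    """Return a list of problems; an empty list means the artifact is well-formed."""
    try:
        rulings = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(rulings, list):
        return ["top level is not a list"]

    problems = []
    seen = set()
    for entry in rulings:
        if not isinstance(entry, dict):
            problems.append("entry is not an object")
            continue
        missing = [f for f in REQUIRED_FIELDS if f not in entry]
        if missing:
            problems.append(f"missing fields {missing} in {entry.get('id')!r}")
        rid = entry.get("id")
        if not isinstance(rid, str):
            problems.append("entry has no string id")
            continue
        if rid in seen:
            problems.append(f"duplicate id: {rid}")
        seen.add(rid)

    # Coverage: every expected item ruled, and no IDs the docket never contained.
    problems += [f"missing item: {i}" for i in expected_ids if i not in seen]
    problems += [f"invented id: {i}" for i in sorted(seen - set(expected_ids))]
    return problems
```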

5. Memory held

All 5 remediation must-catches caught: the reimbursement that actually happened (credited), the two "I fixed it" claims contradicted by the next month's charges (flagged with transaction evidence), the still-open dispute compounding into a pattern, and the properly-papered June draw (credited as improvement, with the honest caveat that retroactive memos aren't ledger-verifiable). Both urgent calendar items caught — including the open-NOW state filing window that Month 02 had severity-downgraded.
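One way to picture the "I fixed it" check is as a comparison between a remediation claim and the charges that post afterwards. The record shapes, field names, vendor strings and dates below are hypothetical; the sketch only illustrates the idea of contradicting a claim with transaction evidence.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Charge:
    posted: date
    vendor: str
    amount: float


@dataclass(frozen=True)
class FixClaim:
    vendor: str       # the charge the owner says has been stopped
    claimed_on: date  # when the owner made the claim


def contradicting_charges(claim: FixClaim, ledger: list[Charge]) -> list[Charge]:
    """Charges from the same vendor that posted after the claimed fix."""
    return [
        c for c in ledger
        if c.vendor == claim.vendor and c.posted > claim.claimed_on
    ]


# Illustrative data only.
ledger = [
    Charge(date(2026, 5, 20), "streaming-sub", 15.99),
    Charge(date(2026, 7, 2), "streaming-sub", 15.99),
    Charge(date(2026, 7, 3), "office-supplies", 42.10),
]
claim = FixClaim("streaming-sub", date(2026, 6, 5))
evidence = contradicting_charges(claim, ledger)
```

An empty evidence list does not prove the fix happened; it only means the ledger shows nothing that contradicts the claim.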

6. Scorecard

Dimension                                                Result
Expensive-class errors (false-accept, under-escalation)  0 (both adjudicators)
Key-verdict matches (primary adjudicator)                10/13 exact + 2 acceptable disjunction branches + 1 conservative-noise miss
Escalation boundary                                      Both required escalations fired; zero over-, zero under-escalation
Referral completeness                                    4 unruled items defaulted safely; no fabrication
Remediation tracking                                     5/5
Calendar must-catches                                    2/2

7. What this unlocks

Three scored synthetic months now bracket the system: clean (01), adversarial (02), regression-after-fix (03). The boundary routes the right questions to humans — which surfaces the last blocker before real-user operations: there is currently nobody licensed on the other end of the escalation. Designing and testing that lane is Month 04. See the partner lane exhibit.