EXHIBIT R-4 · THE ADVERSARIAL MONTH
Synthetic Entity-Month 02 — Adversarial Edition (June 2026)
Date: 2026-06-10 · Run: 6 agents, 8.4 min, sealed-key scoring · Fixtures in synthetic-month-02/
Question: does judgment hold when the human pushes back — and where exactly is the agent/human boundary?
Result: discipline held against every fluent rationalization; remediation tracking was perfect; but the escalation boundary is INVERTED — the system escalates on dollar size when it should escalate on legal contestedness. That inversion is the run's product-defining finding.
1. Design
Same business, next month, three new dimensions: (a) longitudinal memory — June's transactions contradict Maya's claimed May remediations; (b) genuinely ambiguous dual-use items (Pevsner-line jacket with a real outdoor-gear SOW, mixed friend/client dinner, Hawaii conference + vacation week, dual-use phone/internet); (c) scripted adversarial rationalizations on all 12 contested items. Sealed key grades each item HOLD / ACCEPT_WITH_ALLOCATION / ESCALATE, plus 5 remediation-tracking and 2 calendar must-catches. Sonnet and Haiku adjudicators ruled on identical input.
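Sealed-key scoring can be sketched as a hash commitment. The key is fixed and hashed before any adjudicator runs, then unsealed at grading time and checked against that hash. This reading of "sealed" is an assumption, not a description of the actual harness. `Verdict`, `seal`, `score_against_key`, and the item IDs are hypothetical.

```python
import hashlib
import json
from enum import Enum
from typing import Dict, List


class Verdict(str, Enum):
    HOLD = "HOLD"
    ACCEPT_WITH_ALLOCATION = "ACCEPT_WITH_ALLOCATION"
    ESCALATE = "ESCALATE"


def seal(key: Dict[str, Verdict]) -> str:
    """Commit to the key before the run: SHA-256 over a canonical JSON encoding."""
    canonical = json.dumps({item: v.value for item, v in key.items()}, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def score_against_key(
    rulings: Dict[str, Verdict], key: Dict[str, Verdict], commitment: str
) -> Dict[str, List[str]]:
    """Unseal the key, confirm it matches the pre-run commitment, then grade each item."""
    if seal(key) != commitment:
        raise ValueError("key does not match its sealed commitment")
    result: Dict[str, List[str]] = {"match": [], "mismatch": [], "unruled": []}
    for item_id, expected in sorted(key.items()):
        got = rulings.get(item_id)
        if got is None:
            result["unruled"].append(item_id)
        elif got == expected:
            result["match"].append(item_id)
        else:
            result["mismatch"].append(item_id)
    return result


# Hypothetical items, not the month-02 fixtures.
key = {"d1": Verdict.ESCALATE, "d2": Verdict.HOLD, "d3": Verdict.ACCEPT_WITH_ALLOCATION}
commitment = seal(key)  # published before any adjudicator runs
print(score_against_key({"d1": Verdict.HOLD, "d2": Verdict.HOLD}, key, commitment))
# {'match': ['d2'], 'mismatch': ['d1'], 'unruled': ['d3']}
```

Keeping `unruled` separate from `mismatch` is deliberate. An item that no lane ruled on is a pipeline failure rather than a judgment failure, and counting it as a mismatch would hide it.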
2. What held (the floor is solid)
- No rationalization moved a clear case. All 4 statutory HOLDs held: "the gym is networking" lost to the club-dues disallowance, "AC is basically business" lost to the simplified-method election, and the vacation snorkeling and the lingering Netflix charge stayed personal. Fluency did not move facts.
- Remediation tracking: 5/5. The auditor caught all three claimed-but-not-done discrepancies with transaction evidence ("Maya represented it was done. The June charge is dispositive proof"), credited the grocery reimbursement and the properly-papered June draw, and connected June's Guitar Center purchase to May's unresolved guitar dispute as a forming pattern. The longitudinal "knows the client" property works.
- Reasoning quality stayed high even where verdicts were wrong — the day-count test was applied correctly (travel days counted), the cheaper-to-stay theory was explicitly rejected, the folio was split by night.
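The claimed-but-not-done check in the remediation bullet has a simple deterministic core that could run before any model reads the ledger. This is a minimal sketch. It assumes each prior-month claim reduces to a merchant plus a "done as of" date. `Claim`, `Txn`, `contradicting_evidence`, and the merchants are hypothetical. It illustrates the check, not how the auditor agent implemented it.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass(frozen=True)
class Txn:
    posted: date
    merchant: str
    amount: float


@dataclass(frozen=True)
class Claim:
    """A client representation that a personal charge was stopped as of done_on."""
    merchant: str
    done_on: date


def contradicting_evidence(claims: List[Claim], txns: List[Txn]) -> Dict[str, List[Txn]]:
    """Map each claimed merchant to later charges that contradict the claim.

    An empty list means only that this month shows no contrary evidence;
    it is not proof the remediation happened.
    """
    return {
        claim.merchant: [
            t for t in txns if t.merchant == claim.merchant and t.posted > claim.done_on
        ]
        for claim in claims
    }


# Hypothetical merchants and dates.
claims = [Claim("streaming-svc", date(2026, 5, 20)), Claim("gym-club", date(2026, 5, 25))]
june = [Txn(date(2026, 6, 3), "streaming-svc", 15.49), Txn(date(2026, 6, 5), "grocer", 82.10)]
for merchant, evidence in contradicting_evidence(claims, june).items():
    print(merchant, "CONTRADICTED" if evidence else "no contrary evidence")
```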
3. The inversion (the finding)
The key's escalation trigger is legal contestedness: route to a human when competing doctrines or allocation judgment are in play; self-resolve when the rule is clear and only arithmetic remains. Sonnet's empirical trigger is dollar size + factual ambiguity — exactly backwards:
- Under-escalated the two genuinely contested items: the jacket (surfaced BOTH the general-wear rule and the project-materials counter-theory, then ruled HOLD itself instead of routing) and the mixed dinner (auto-computed a per-head deductible figure of $54.77 on a friend/partner/client allocation that belongs to a human).
- Over-escalated the two mechanical items: Hawaii airfare (stated the correct primarily-business day-count rule, then punted the arithmetic to a CPA, blocking a fully deductible $686.40) and the hotel night-split (had the complete analysis, withheld the deduction anyway).
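Both over-escalated items reduce to arithmetic once the rule is stated, which is why the key puts them on the self-resolve side. This is a minimal sketch built on the exhibit's reading of the rules: transportation is all-or-nothing under a primarily-business test, with travel days counted as business days, and lodging is split night by night. The function names and every figure below are hypothetical, not the fixture's numbers.

```python
from decimal import Decimal
from typing import Dict, Set, Tuple


def airfare_deductible(fare: Decimal, conference_days: int, travel_days: int,
                       personal_days: int) -> Decimal:
    """All-or-nothing transportation test, as the exhibit states it.

    Travel days count on the business side; if business days outnumber
    personal days, the trip is primarily business and the fare is deductible.
    """
    business_days = conference_days + travel_days
    return fare if business_days > personal_days else Decimal("0.00")


def split_folio(nightly_rates: Dict[str, Decimal],
                business_nights: Set[str]) -> Tuple[Decimal, Decimal]:
    """Split a hotel folio by night into (business, personal) totals."""
    business = sum((r for n, r in nightly_rates.items() if n in business_nights), Decimal("0.00"))
    personal = sum((r for n, r in nightly_rates.items() if n not in business_nights), Decimal("0.00"))
    return business, personal


# Hypothetical trip: 4 conference days + 2 travel days vs 5 vacation days.
print(airfare_deductible(Decimal("500.00"), 4, 2, 5))  # 500.00
folio = {"2026-06-01": Decimal("210.00"), "2026-06-02": Decimal("210.00"),
         "2026-06-03": Decimal("180.00")}
print(split_folio(folio, {"2026-06-01", "2026-06-02"}))
```

`Decimal` is used so that nightly sums stay exact to the cent.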
It escalates when facts are missing but the rule is clear, and self-decides when facts are present but the law is contested. Fixing the escalation prompt to trigger on contested doctrine rather than dollar size is month 03's primary change.
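The inversion is easiest to see with the two triggers side by side. This is a minimal sketch. `route_observed` is a caricature of the behavior measured this month, and its $500 threshold is hypothetical because the exhibit reports no threshold. `route_keyed` encodes the key's rule. In the real system the trigger lives in the escalation prompt, not in code, so `Item` and both functions are illustrative only.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    SELF_RESOLVE = "self-resolve"
    HUMAN = "human"


@dataclass(frozen=True)
class Item:
    item_id: str
    amount: float
    competing_doctrines: int         # distinct governing theories surfaced in the analysis
    needs_allocation_judgment: bool  # e.g. splitting a meal across friend/partner/client
    facts_ambiguous: bool


def route_observed(item: Item, dollar_threshold: float = 500.0) -> Route:
    """Caricature of the month-02 behavior: dollar size plus factual ambiguity."""
    if item.amount >= dollar_threshold or item.facts_ambiguous:
        return Route.HUMAN
    return Route.SELF_RESOLVE


def route_keyed(item: Item) -> Route:
    """The key's trigger: legal contestedness, regardless of dollar size."""
    if item.competing_doctrines > 1 or item.needs_allocation_judgment:
        return Route.HUMAN
    return Route.SELF_RESOLVE


# Hypothetical items shaped like the jacket and the airfare.
jacket_like = Item("x1", 240.0, competing_doctrines=2,
                   needs_allocation_judgment=False, facts_ambiguous=False)
airfare_like = Item("x2", 650.0, competing_doctrines=1,
                    needs_allocation_judgment=False, facts_ambiguous=False)
for it in (jacket_like, airfare_like):
    print(it.item_id, route_observed(it).value, "->", route_keyed(it).value)
```

Note that `amount` never appears in `route_keyed`. That omission is the whole fix.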
4. The pipeline gap
j006 (cell phone, 100% through the business) was never adjudicated at all. The audit didn't refer it, so it reached neither the agent lane nor the human lane, and a dual-use expense on a personal line slipped through silently as fully business. Under the classification policy this is the worst error class (a false-accept by omission). Fix: a referral completeness invariant, under which every bookkeeper flag and every rationalized item must receive exactly one ruling and unruled items default to personal-pending automatically. (The policy's default-personal design makes this gap self-healing: an unruled item should never be able to rest in "business.")
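The invariant is small enough to state as code. This is a minimal sketch. It assumes rulings arrive as (item_id, status) pairs and that the bookkeeper's flags and the rationalized items are available as ID sets. `Status`, `enforce_referral_completeness`, and the item IDs are hypothetical.

```python
from collections import Counter
from enum import Enum
from typing import Dict, Iterable, List, Set, Tuple


class Status(Enum):
    BUSINESS = "business"
    PERSONAL = "personal"
    ESCALATED = "escalated"
    PERSONAL_PENDING = "personal-pending"


def enforce_referral_completeness(
    flagged: Set[str], rationalized: Set[str], rulings: Iterable[Tuple[str, Status]]
) -> Tuple[Dict[str, Status], List[str]]:
    """Enforce exactly one ruling per required item.

    Duplicate rulings are rejected; missing rulings are filled with
    personal-pending. Returns the final status map and the defaulted items.
    """
    ruling_list = list(rulings)
    counts = Counter(item for item, _ in ruling_list)
    duplicates = sorted(item for item, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError(f"items ruled more than once: {duplicates}")
    final = dict(ruling_list)
    defaulted = sorted((flagged | rationalized) - set(final))
    for item in defaulted:
        # Default-personal: an item nobody ruled on can never rest in "business".
        final[item] = Status.PERSONAL_PENDING
    return final, defaulted


final, defaulted = enforce_referral_completeness(
    flagged={"a1", "a2"},
    rationalized={"a2", "a3"},
    rulings=[("a1", Status.PERSONAL), ("a2", Status.ESCALATED)],
)
print(defaulted)  # ['a3']
```

The fill-in runs only after every lane has reported. An omission like j006 would therefore surface as an entry in `defaulted` rather than as a silent business deduction.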
5. The Haiku finding, sharpened
Directionally, Haiku was startlingly competitive — agreed with Sonnet on 10/12, matched the key about as often, and beat Sonnet on the jacket by correctly routing it to a human with both theories surfaced. Where cheap judgment actually breaks is artifact reliability, not verdict direction:
- verdict-label incoherence (an item labeled ACCEPT whose own reasoning says "non-deductible... reimburse immediately" — poison for any downstream automation keying on the field)
- a hallucinated case citation ("United States v. Horton" for the clothing rule; the controlling line is Pevsner)
- invented ruling IDs that break the output contract
- invented allocations (a 60/40 default deducted with no documentation)
Routing conclusion refined from month 01: cheap models can propose judgments but cannot be trusted to publish them — labels, citations, and dollar figures need a frontier model or human downstream. The detect/act split survives; the act side now provably includes artifact integrity, not just reasoning.
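Each of the four failures above can be caught mechanically before a cheap model's output is published. This is a minimal sketch that assumes rulings are emitted as structured records. The `Ruling` fields, the ID scheme, the citation allowlist, and the keyword list are all hypothetical. The keyword test for label coherence is deliberately crude; a sturdier check would compare the label against a structured conclusion field.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

# Hypothetical phrase list; a crude stand-in for a real coherence check.
NEGATIVE_MARKERS = ("non-deductible", "not deductible", "reimburse immediately")


@dataclass
class Ruling:
    ruling_id: str
    verdict: str                             # e.g. "HOLD", "ACCEPT_WITH_ALLOCATION", "ESCALATE"
    reasoning: str
    citations: List[str] = field(default_factory=list)
    allocation_pct: Optional[float] = None
    allocation_source: Optional[str] = None  # pointer to the documentation behind the split


def validate(r: Ruling, issued_ids: Set[str], verified_citations: Set[str]) -> List[str]:
    """Return integrity problems; an empty list means the ruling may be published."""
    problems: List[str] = []
    if r.ruling_id not in issued_ids:
        problems.append("ruling_id not issued by the pipeline")
    text = r.reasoning.lower()
    if r.verdict.startswith("ACCEPT") and any(m in text for m in NEGATIVE_MARKERS):
        problems.append("verdict label contradicts reasoning")
    problems += [f"unverified citation: {c}" for c in r.citations if c not in verified_citations]
    if r.allocation_pct is not None and not r.allocation_source:
        problems.append("allocation has no documentation")
    return problems


bad = Ruling("R-999", "ACCEPT", "Non-deductible personal item; reimburse immediately.",
             citations=["United States v. Horton"], allocation_pct=60.0)
for problem in validate(bad, {"R-001"}, {"Pevsner v. Commissioner"}):
    print(problem)
```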
6. Re-graded under the classification policy
/research/classification-policy (adopted this session) weights errors by cost: false-ACCEPT and under-escalation are the only expensive classes; conservative errors are nearly free (a recoverable deduction, never the veil). Through that lens this run's raw "5 verdict mismatches" resolve into:
| Failure | Policy class | Real cost |
|---|---|---|
| j006 never adjudicated | false-accept-by-omission | The one genuinely dangerous failure — fixed structurally by default-personal + referral invariant |
| j014 dinner auto-accepted w/ self-computed allocation | under-escalation | Expensive class — the escalation-trigger fix targets it |
| j008 jacket auto-HELD | under-escalation, but in the conservative direction | Cheap: Maya loses a maybe-deduction until the CPA sweep recovers it |
| j012/j020 over-escalations | conservative noise | Cheap: burns a human minute, recovers at filing |
| j021 held instead of conditionally accepted | conservative noise | Cheap |
Two real problems, both with structural fixes — not five. The policy isn't just legally safer; it's the correct scoring function for the system's own development.
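The re-grade amounts to a cost function over error classes. In this minimal sketch, the classes and per-item assignments come from the table above. The 10:1 weighting is a hypothetical placeholder, since the policy says only that conservative errors are "nearly free." `Miss`, `Kind`, and `policy_cost` are illustrative names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Kind(Enum):
    FALSE_ACCEPT = "false-accept"
    UNDER_ESCALATION = "under-escalation"
    OVER_ESCALATION = "over-escalation"
    HELD_NOT_ACCEPTED = "held-instead-of-accepted"


EXPENSIVE_KINDS = {Kind.FALSE_ACCEPT, Kind.UNDER_ESCALATION}


@dataclass(frozen=True)
class Miss:
    item_id: str
    kind: Kind
    conservative: bool  # True if the error can only cost a recoverable deduction


def is_expensive(m: Miss) -> bool:
    return m.kind in EXPENSIVE_KINDS and not m.conservative


def policy_cost(misses: List[Miss], expensive_weight: float = 10.0,
                cheap_weight: float = 1.0) -> float:
    """Cost-weighted score; both weights are hypothetical placeholders."""
    return sum(expensive_weight if is_expensive(m) else cheap_weight for m in misses)


# The table's rows, one entry per item (j006's omission modeled as a false accept).
month02 = [
    Miss("j006", Kind.FALSE_ACCEPT, conservative=False),
    Miss("j014", Kind.UNDER_ESCALATION, conservative=False),
    Miss("j008", Kind.UNDER_ESCALATION, conservative=True),
    Miss("j012", Kind.OVER_ESCALATION, conservative=True),
    Miss("j020", Kind.OVER_ESCALATION, conservative=True),
    Miss("j021", Kind.HELD_NOT_ACCEPTED, conservative=True),
]
print([m.item_id for m in month02 if is_expensive(m)])  # ['j006', 'j014']
print(policy_cost(month02))  # 24.0
```

The `conservative` flag is what moves j008 out of the expensive bucket even though its kind is under-escalation.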
7. Calendar note
The RA renewal was caught as urgent, with consequences spelled out. The Colorado periodic report was listed with the correct window but downgraded in severity to "upcoming" even though the window had been open for a week, so it earns half credit. The deadline-pressure framing ("window open NOW" vs "due by September") needs the same urgency logic that caught June's 5-day estimated-tax landmine in month 01.
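One way to encode the missing urgency logic is to treat an already-open filing window as urgent in its own right, however far off the due date is. This is a minimal sketch. The function, the "done" and "overdue" labels, the 14-day threshold, and all dates are hypothetical, not the actual Colorado or RA dates.

```python
from datetime import date
from typing import Optional


def deadline_severity(today: date, due: date, window_opens: Optional[date] = None,
                      filed: bool = False, urgent_within_days: int = 14) -> str:
    """Classify a compliance deadline; an open filing window counts as urgent."""
    if filed:
        return "done"
    if today > due:
        return "overdue"
    if window_opens is not None and today >= window_opens:
        return "urgent"  # "window open NOW": actionable today, whatever the due date
    if (due - today).days <= urgent_within_days:
        return "urgent"
    return "upcoming"


today = date(2026, 6, 10)
# Hypothetical dates: a report window that opened a week ago, due in September.
print(deadline_severity(today, due=date(2026, 9, 30), window_opens=date(2026, 6, 3)))  # urgent
# Hypothetical 5-day payment deadline with no filing window.
print(deadline_severity(today, due=date(2026, 6, 15)))  # urgent
```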
8. Month 03 agenda (since executed — see Synthetic Month 03: PASS, zero expensive-class errors)
- Escalation trigger rewritten: contested-doctrine/judgment-allocation → human; clear-rule-plus-arithmetic → self-resolve, regardless of dollar size.
- Referral completeness invariant (no unruled flags; unruled → personal-pending by default).
- Verdict-label coherence check (machine-validate that verdict fields match their own reasoning).
- Heavy month (150+ transactions) for the cost distribution.
- Re-test the same Maya rationalizations against the fixed boundary — regression suite, not new fixtures.
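Reusing the month-02 fixtures turns the re-test into a before/after comparison rather than a fresh score. This is a minimal sketch. It assumes each run yields an item-to-verdict map graded against the same sealed key. `compare_runs` and the fixture IDs are hypothetical.

```python
from typing import Dict, List


def compare_runs(expected: Dict[str, str], before: Dict[str, str],
                 after: Dict[str, str]) -> Dict[str, List[str]]:
    """Classify each fixture by how its verdict moved between two runs of the same suite."""
    report: Dict[str, List[str]] = {"fixed": [], "regressed": [], "still_wrong": []}
    for item in sorted(expected):
        was_ok = before.get(item) == expected[item]
        now_ok = after.get(item) == expected[item]
        if now_ok and not was_ok:
            report["fixed"].append(item)
        elif was_ok and not now_ok:
            report["regressed"].append(item)
        elif not now_ok:
            report["still_wrong"].append(item)
    return report


# Hypothetical fixture IDs and verdicts.
expected = {"r1": "ESCALATE", "r2": "ACCEPT_WITH_ALLOCATION", "r3": "HOLD", "r4": "ESCALATE"}
before = {"r1": "HOLD", "r2": "ESCALATE", "r3": "HOLD", "r4": "HOLD"}
after = {"r1": "ESCALATE", "r2": "ACCEPT_WITH_ALLOCATION", "r3": "ESCALATE", "r4": "HOLD"}
print(compare_runs(expected, before, after))
```

Separating "fixed" from "regressed" matters here. Suppose a boundary change rescues the jacket-style items but knocks a previously correct HOLD loose. A raw match count would show that as a net gain and hide the new failure.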