The validity mirage is not a Claude-specific artifact. We tested tropical-mcp's guarded compression policy against xAI's Grok model to determine whether the phenomenon — and the fix — transfer across model architectures.
The validity mirage is structural, not behavioral. It arises from how compression algorithms select which messages to keep — not from how any particular model responds. When the pivot is dropped, any model will confidently answer a question it was never actually asked. Grok is no exception.
Under the recency policy, Grok exhibits the same pivot-loss pattern seen with other models. Surface validity is maintained; task relevance is silently destroyed.
The L2-guarded compression contract holds regardless of which model sits downstream. The mathematical guarantee is about the context window, not the model reading it.
The experiment confirms that the mirage is a compression-layer problem. Switching model providers does not eliminate it — only a guarded policy does.
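The mechanism can be illustrated with a toy transcript. This is a minimal sketch, not tropical-mcp's implementation: `recency_keep`, `pinned_keep`, the message layout, and the pivot position are all hypothetical, and `pinned_keep` is only a simplified stand-in for a guarded policy (it reserves one slot for the pivot), not the actual L2-guarded algorithm.

```python
import math

# Hypothetical toy transcript: ten messages, with the pivot (the message that
# defines the task) early in the conversation.
PIVOT_ID = 3
transcript = [{"id": i, "text": f"message {i}"} for i in range(10)]


def recency_keep(messages, fraction):
    """Naive recency: keep only the most recent ceil(fraction * n) messages."""
    k = max(1, math.ceil(fraction * len(messages)))
    return messages[-k:]


def pinned_keep(messages, fraction, pivot_id):
    """Simplified stand-in for a guarded policy (an assumption, not the real
    L2 contract): same budget as recency, but one slot is reserved for the pivot."""
    kept = recency_keep(messages, fraction)
    if any(m["id"] == pivot_id for m in kept):
        return kept
    pivot = next(m for m in messages if m["id"] == pivot_id)
    # Swap the oldest kept message for the pivot; original order is preserved
    # because the pivot precedes every message that recency retained.
    return [pivot] + kept[1:]


recency_ids = [m["id"] for m in recency_keep(transcript, 0.40)]
pinned_ids = [m["id"] for m in pinned_keep(transcript, 0.40, PIVOT_ID)]

print("recency kept:", recency_ids, "pivot kept:", PIVOT_ID in recency_ids)
print("pinned kept: ", pinned_ids, "pivot kept:", PIVOT_ID in pinned_ids)
```

Both outputs are well-formed message lists of the same length, which is why the loss is silent: nothing downstream of the compressor, including the model, can tell from the recency output alone that the task-defining message is gone.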
The Grok experiment applies the same deterministic witness used in the flagship paper to xAI's Grok model. The replay witness is a fixed set of conversation transcripts with known pivot positions, tested at multiple retention fractions.
The committed replay witness from the main research program was run against Grok's API with the same compression policies and retention fractions as the reference run.
Each transcript was compressed once with naive recency and once with the L2-guarded policy. The pivot preservation rate was recorded for each policy.
Experiments ran at retention fractions of 0.65, 0.50, and 0.40 — the same fractions used in the flagship paper's witness.
The portable certificate format was used to record kept and dropped message IDs, enabling direct comparison against the reference run.
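The comparison step can be sketched as follows. The real certificate schema is not described here, so the field names (`policy`, `retention_fraction`, `kept`, `dropped`) and both helper functions are hypothetical; the only thing taken from the text is that a certificate records kept and dropped message IDs.

```python
import json


def make_certificate(policy, fraction, all_ids, kept_ids):
    """Build a certificate-like record of which message IDs were kept or dropped.
    The field names are illustrative assumptions, not the project's schema."""
    kept = sorted(set(kept_ids))
    dropped = sorted(set(all_ids) - set(kept))
    return {
        "policy": policy,
        "retention_fraction": fraction,
        "kept": kept,
        "dropped": dropped,
    }


def diff_certificates(reference, candidate):
    """Return the message IDs whose kept status differs between two runs."""
    ref_kept = set(reference["kept"])
    cand_kept = set(candidate["kept"])
    return {
        "kept_only_in_reference": sorted(ref_kept - cand_kept),
        "kept_only_in_candidate": sorted(cand_kept - ref_kept),
    }


all_ids = list(range(10))
reference = make_certificate("guarded", 0.40, all_ids, [3, 7, 8, 9])
grok_run = make_certificate("guarded", 0.40, all_ids, [3, 7, 8, 9])

# Certificates are plain data, so they survive a JSON round trip unchanged.
assert json.loads(json.dumps(grok_run)) == grok_run

print(diff_certificates(reference, grok_run))
```

Because the compression policy runs before any model sees the context, two runs with the same witness, policy, and retention fraction would be expected to produce an empty diff regardless of the downstream model. A non-empty diff would point to a difference in the compression step, not in the model.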
The qualitative result is clear: under the guarded policy, pivot preservation stays at 1.0 at every tested retention fraction, while naive recency degrades at 0.50 and collapses to 0.0 at 0.40. The precise per-transcript breakdown is available in the full paper and the committed replay artifacts.
| Retention Fraction | Naive Recency Pivot Preserved | L2 Guarded Pivot Preserved | Mirage Detected |
|---|---|---|---|
| 0.65 | 1.0 | 1.0 | No |
| 0.50 | Partial | 1.0 | Partial |
| 0.40 | 0.0 | 1.0 | Yes |
These are qualitative summaries consistent with the flagship paper's witness data. Exact per-transcript values are in the committed replay artifacts.
The Grok experiment is documented in the research paper and its raw artifacts are available alongside the main replay witness.
Research paper (PDF): the full paper, including cross-model validation methodology, witness design, and the mathematical proof of the guarded compression contract.
Evidence: replay summaries, portable certificates, and raw validation logs for the committed witness. Compare against your own local verification run.
Repository: the full artifact surface, including raw replay data, CSV summaries, and certificate JSON files for both the main and Grok experiment runs: github.com/jack-chaudier/dreams