This page is the curated research-facing entry point for the validation summary, replay witness, and
certificate artifact. Use it before dropping into raw markdown or JSON.
The research program lives in MirageKit, the public artifact surface lives in dreams, and the
evaluation MCP implementation lives in tropical-mcp.
The replay rates below are exact proportions on a deterministic witness with n=3 variants for each combination of policy and retention fraction.
The strongest public evidence right now is structural: under compression pressure, recency can preserve
answerability while silently losing the governing pivot, whereas the guarded policy preserves the protected
arc on the committed replay witness. The goal here is not to look exhaustive but to make the current
public record inspectable in one pass.
Implementation checks
Build, tests, and functional validation
The implementation repo is published with the raw outputs needed to inspect software quality and the
installable wheel path, not just a summary claim.
ruff check . and mypy src/tropical_mcp run on the canonical implementation repo
pytest publishes a pass count in the mirrored validation log
uv build verifies wheel + sdist packaging
./scripts/validate_installed_wheel.sh confirms the built wheel still validates after install
uv run tropical-mcp-full-validate exercises the MCP-facing validation path
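The checks listed above can be chained into one local pass. The following is a minimal sketch, not the repo's own tooling: it assumes the commands are run from the root of a tropical-mcp checkout exactly as written in the list (the repo may wrap some of them differently, for example behind uv run), and the names CHECKS and run_checks are illustrative.

```python
import subprocess
from typing import Callable, Sequence

# Commands copied from the checklist above, in the order listed.
# Assumption: each is invoked from the repository root exactly as written.
CHECKS: list[list[str]] = [
    ["ruff", "check", "."],
    ["mypy", "src/tropical_mcp"],
    ["pytest"],
    ["uv", "build"],
    ["./scripts/validate_installed_wheel.sh"],
    ["uv", "run", "tropical-mcp-full-validate"],
]


def _run(cmd: Sequence[str]) -> int:
    """Run one command in the current directory and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def run_checks(
    checks: Sequence[Sequence[str]] = CHECKS,
    runner: Callable[[Sequence[str]], int] = _run,
) -> list[tuple[str, int]]:
    """Run each check in order, stopping at the first non-zero exit code."""
    results: list[tuple[str, int]] = []
    for cmd in checks:
        code = runner(cmd)
        results.append((" ".join(cmd), code))
        if code != 0:
            break  # later steps depend on earlier ones passing
    return results
```

Calling run_checks() returns one (command, exit code) pair per step that ran, so a failing step appears as the last entry. The runner argument exists only so the sequencing can be exercised without invoking the real tools.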
At retention fractions 0.65, 0.5, and 0.4, the committed replay keeps
the primary arc intact under l2_guarded while recency collapses it. Paper-level model counts and
incident counts live in the PDFs and are intentionally separate from this witness.
l2_guarded pivot preservation stays at 1.0
recency pivot preservation falls to 0.0
The overview page witness cards are rendered from the same committed replay data published here
The witness is intentionally small so every reported value remains inspectable
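To make the "exact proportion" reading concrete: with n=3 variants, a preservation rate can only be 0, 1/3, 2/3, or 1. The sketch below shows how such a rate could be computed from kept-ID sets. The ID values and the function name preservation_rate are hypothetical and are not taken from the committed replay data.

```python
from fractions import Fraction


def preservation_rate(kept_by_variant, pivot_id):
    """Exact fraction of variants whose kept set contains the pivot."""
    hits = sum(1 for kept in kept_by_variant if pivot_id in kept)
    return Fraction(hits, len(kept_by_variant))


# Hypothetical kept-ID sets for n=3 variants at one retention fraction.
guarded = [{"pivot", "a", "b"}, {"pivot", "b", "c"}, {"pivot", "a", "c"}]
recency = [{"b", "c", "d"}, {"c", "d", "e"}, {"d", "e", "f"}]

guarded_rate = preservation_rate(guarded, "pivot")
recency_rate = preservation_rate(recency, "pivot")
print(float(guarded_rate), float(recency_rate))  # 1.0 0.0
```

Using Fraction keeps the value exact, so a reported 1.0 means all three variants kept the pivot and 0.0 means none did.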
The public certificate captures the recency-vs-guarded kept and dropped IDs, audit flags, and
contract/protection status in a portable shape that can be compared against a local verification run.
Pivot and protected IDs are recorded from the full context
Each policy exposes its kept and dropped IDs separately
Feasibility, breach, and contract status stay visible in the audit trail
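Comparing the public certificate against a local run amounts to a structural diff of two JSON-like records. The sketch below is one way to do that under stated assumptions: the field names (pivot_id, protected_ids, kept_ids, dropped_ids, contract_satisfied) and the ID values are hypothetical placeholders, not the published certificate schema, and diff_certificates is an illustrative helper rather than part of tropical-mcp.

```python
import copy
from typing import Any


def diff_certificates(public: Any, local: Any, path: str = "") -> list[str]:
    """Return dotted paths at which two JSON-like values differ."""
    if isinstance(public, dict) and isinstance(local, dict):
        diffs: list[str] = []
        for key in sorted(set(public) | set(local)):
            child = f"{path}.{key}" if path else str(key)
            if key not in public or key not in local:
                diffs.append(child)  # key present on one side only
            else:
                diffs.extend(diff_certificates(public[key], local[key], child))
        return diffs
    # Leaves (including ID lists) are compared as whole values.
    return [] if public == local else [path or "<root>"]


# Hypothetical certificate shape, for illustration only.
public_cert = {
    "pivot_id": "m3",
    "protected_ids": ["m1", "m3"],
    "policies": {
        "recency": {
            "kept_ids": ["m4", "m5"],
            "dropped_ids": ["m1", "m2", "m3"],
            "contract_satisfied": False,
        },
        "l2_guarded": {
            "kept_ids": ["m1", "m3"],
            "dropped_ids": ["m2", "m4", "m5"],
            "contract_satisfied": True,
        },
    },
}

local_cert = copy.deepcopy(public_cert)
local_cert["policies"]["recency"]["kept_ids"] = ["m5"]  # simulate a mismatch

print(diff_certificates(public_cert, local_cert))  # ['policies.recency.kept_ids']
```

An empty result means the local certificate matches the public one field for field. Any returned path points at the first place to look in the audit trail.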
Suggested reading order
Start with the flagship paper, then this dossier, then the live replay witness. Drop into the raw logs or
JSON only when you want to inspect the committed record directly.
Reproduce
Verification paths
Install and register tropical-mcp, then choose between the minimal smoke path and the fuller reviewer
workflow. The public certificate on this page is meant to be compared against the latter, not treated as a
substitute for it.
Smoke test: confirm the runtime with runtime_info(), then run compact_auto(...) and certificate(...) on a small explicit witness payload.
Research workflow: run diagnose(...) to inspect feasible slots, capture a context_anchor(...), then compare compact_auto(...), certificate(...), and telemetry_summary(...).
Compare the resulting certificate and replay behavior against the public artifacts on this page.
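The two paths differ only in which tools are called and in what order. The sketch below records that ordering. The tool names come from this page, but their arguments are not specified here, so each step takes a caller-supplied payload instead of guessed parameters. call_tool is a hypothetical stand-in for whatever MCP client function sends a tool name and arguments to the registered tropical-mcp server.

```python
from typing import Any, Callable, Mapping, Sequence

# Tool names as listed on this page, in the order described.
SMOKE_STEPS = ["runtime_info", "compact_auto", "certificate"]
RESEARCH_STEPS = [
    "diagnose",
    "context_anchor",
    "compact_auto",
    "certificate",
    "telemetry_summary",
]

# Hypothetical client signature: (tool name, arguments) -> result.
ToolCaller = Callable[[str, dict[str, Any]], Any]


def run_path(
    call_tool: ToolCaller,
    steps: Sequence[str],
    payloads: Mapping[str, dict[str, Any]],
) -> dict[str, Any]:
    """Call each tool in order with its caller-supplied payload."""
    results: dict[str, Any] = {}
    for name in steps:
        # Tools without an entry in payloads are called with no arguments.
        results[name] = call_tool(name, dict(payloads.get(name, {})))
    return results
```

The certificate entry of the returned mapping is the value to compare against the public artifact. The payloads for compact_auto, certificate, and the other tools should follow the tropical-mcp documentation rather than this sketch.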