MirageKit Evidence Dossier | replay witness, certificate, and validation logs

Validation Surface

What the public bundle currently supports

The strongest public evidence right now is structural. Under compression pressure, a recency policy can preserve answerability while silently losing the governing pivot; the guarded policy preserves the protected arc on the committed replay witness. The goal here is not to look exhaustive but to make the current public record inspectable in one pass.

At A Glance

Current public record

Strongest claim: Naive recency can preserve answerability while losing pivot integrity; the guarded policy keeps the protected arc on the committed witness.
Witness scope: The rendered replay rates are exact proportions on a deterministic witness with n=3 variants for each combination of policy and retention fraction.
Public artifacts: Lint, typing, test, build, installed-wheel, and functional-validation logs, replay summaries, and a portable certificate are all published directly from committed files.
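
Because the witness has n=3 variants per cell, each rendered rate can only take one of four values. The short sketch below enumerates them to show why the rates read as exact proportions and not as estimates. The helper name attainable_rates is illustrative and is not part of tropical-mcp.

```python
from fractions import Fraction


def attainable_rates(n: int) -> list[Fraction]:
    """Return every rate k/n that a witness with n variants can report."""
    return [Fraction(k, n) for k in range(n + 1)]


# With n=3, a cell can only show 0, 1/3, 2/3, or 1.
rates = attainable_rates(3)
print([str(r) for r in rates])  # ['0', '1/3', '2/3', '1']
```

A reported 1.0 therefore means all three variants preserved the pivot, and 0.0 means none did.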

Version map
DOI-backed archive: dreams v0.1.1
Mirrored implementation release: tropical-mcp v0.2.1

Implementation checks

Build, tests, and functional validation

The implementation repo is published with the raw outputs needed to inspect software quality and the installable wheel path, not just a summary claim.

ruff check . and mypy src/tropical_mcp run on the canonical implementation repo
pytest publishes a pass count in the mirrored validation log
uv build verifies wheel + sdist packaging
./scripts/validate_installed_wheel.sh confirms the built wheel still validates after install
uv run tropical-mcp-full-validate exercises the MCP-facing validation path
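
The checks above can be chained locally. The following is a minimal sketch, assuming it is run from the root of the implementation repo with ruff, mypy, pytest, and uv already installed. The commands are the ones listed above; the wrapper names (CHECKS, run_checks) and the stop-at-first-failure behavior are assumptions for illustration, not part of the published tooling.

```python
import subprocess
from typing import Callable, Sequence

# Commands exactly as listed in this dossier, in the order given.
CHECKS: list[list[str]] = [
    ["ruff", "check", "."],
    ["mypy", "src/tropical_mcp"],
    ["pytest"],
    ["uv", "build"],
    ["./scripts/validate_installed_wheel.sh"],
    ["uv", "run", "tropical-mcp-full-validate"],
]


def _run(cmd: Sequence[str]) -> int:
    """Run one command and return its exit code without raising on failure."""
    return subprocess.run(list(cmd), check=False).returncode


def run_checks(
    runner: Callable[[Sequence[str]], int] = _run,
) -> list[tuple[str, int]]:
    """Run each check in order, stopping after the first non-zero exit code."""
    results: list[tuple[str, int]] = []
    for cmd in CHECKS:
        code = runner(cmd)
        results.append((" ".join(cmd), code))
        if code != 0:
            break
    return results
```

Calling run_checks() returns (command, exit code) pairs that can be compared by eye against the published logs. The runner argument exists only so the sequence can be exercised without the tools installed.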

Ruff log ↗ Mypy log ↗ Pytest log ↗ Build log ↗ Installed-wheel log ↗ Validation summary ↗ Full validation report ↗

Replay witness

Guarded compaction preserves the pivot

At retention fractions 0.65, 0.5, and 0.4, the committed replay keeps the primary arc intact under l2_guarded while recency collapses it. Paper-level model counts and incident counts live in the PDFs and are intentionally separate from this witness.

l2_guarded pivot preservation stays at 1.0
recency pivot preservation falls to 0.0
The overview page witness cards are rendered from the same committed replay data published here
The witness is intentionally small so every reported value remains inspectable
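
To make the failure mode concrete, here is a toy illustration of how a recency-only policy can drop an early pivot that a protection-aware policy keeps. It is a sketch under loud assumptions: it is not the l2_guarded algorithm, it budgets by message count where the real tool uses tokens, and the Msg type, the ten-message history, and the choice of m1 and m2 as the protected arc are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Msg:
    id: str
    protected: bool = False  # hypothetical flag marking the pivot arc


def keep_count(n_messages: int, fraction: float) -> int:
    """Number of messages to keep at a given retention fraction (at least 1)."""
    return max(1, int(n_messages * fraction))


def recency(msgs: list[Msg], fraction: float) -> list[str]:
    """Keep only the newest messages, ignoring protection."""
    k = keep_count(len(msgs), fraction)
    return [m.id for m in msgs[-k:]]


def guarded(msgs: list[Msg], fraction: float) -> list[str]:
    """Keep protected messages first, then fill the rest by recency."""
    k = keep_count(len(msgs), fraction)
    protected = [m for m in msgs if m.protected]
    if len(protected) > k:
        raise ValueError("budget too small to keep the protected arc")
    newest_first = [m for m in reversed(msgs) if not m.protected]
    kept = {m.id for m in protected + newest_first[: k - len(protected)]}
    return [m.id for m in msgs if m.id in kept]  # restore original order


# Toy history: m1 is the pivot and m2 depends on it.
history = [Msg(f"m{i}", protected=i in (1, 2)) for i in range(10)]

for fraction in (0.65, 0.5, 0.4):
    r, g = recency(history, fraction), guarded(history, fraction)
    print(fraction, "recency keeps pivot:", "m1" in r, "| guarded keeps pivot:", "m1" in g)
```

Both policies keep the same number of messages at each fraction; they differ only in which ones survive, which is the structural point the witness is meant to show.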

Open live witness view ↖ Replay summary CSV ↗ Replay summary JSON ↗

Portable artifact

Certificate snapshot

The public certificate captures the recency-vs-guarded kept and dropped IDs, audit flags, and contract/protection status in a portable shape that can be compared against a local verification run.

Pivot and protected IDs are recorded from the full context
Each policy exposes its kept and dropped IDs separately
Feasibility, breach, and contract status stay visible in the audit trail
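
Comparing the public certificate against a local verification run amounts to a structural diff of two JSON documents. The sketch below is schema-agnostic and reports the dotted paths where the two differ. The function name diff_paths and the sample field names (kept_ids, dropped_ids) are hypothetical; the real certificate layout should be read from the published Certificate JSON.

```python
from typing import Any


def diff_paths(public: Any, local: Any, path: str = "") -> list[str]:
    """Return dotted paths where two parsed JSON documents differ."""
    if isinstance(public, dict) and isinstance(local, dict):
        out: list[str] = []
        for key in sorted(set(public) | set(local)):
            child = f"{path}.{key}" if path else str(key)
            if key not in public or key not in local:
                out.append(child)  # key present on one side only
            else:
                out.extend(diff_paths(public[key], local[key], child))
        return out
    # Lists and scalars are compared as whole values.
    return [] if public == local else [path or "<root>"]


# Hypothetical certificate fragments, for illustration only.
public_cert = {
    "recency": {"kept_ids": ["m8", "m9"], "dropped_ids": ["m1"]},
    "l2_guarded": {"kept_ids": ["m1", "m9"], "dropped_ids": ["m8"]},
}
local_cert = {
    "recency": {"kept_ids": ["m8", "m9"], "dropped_ids": ["m1"]},
    "l2_guarded": {"kept_ids": ["m1", "m8"], "dropped_ids": ["m9"]},
}

print(diff_paths(public_cert, local_cert))
# ['l2_guarded.dropped_ids', 'l2_guarded.kept_ids']
```

In practice both arguments would come from json.load on the downloaded certificate and on the output of a local run; an empty list means the two agree.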

Certificate JSON ↗ Verification snippet ↗

Suggested reading order: Start with the flagship paper, then this dossier, then the live replay witness. Drop into the raw logs or JSON only when you want to inspect the committed record directly.

Reproduce

Verification paths

Install and register tropical-mcp, then choose between the minimal smoke path and the fuller reviewer workflow. The public certificate on this page is meant to be compared against the output of the fuller reviewer workflow, not treated as a substitute for running it.

Smoke test: confirm the runtime with runtime_info(), then run compact_auto(...) and certificate(...) on a small explicit witness payload.
Research workflow: run diagnose(...) to inspect feasible slots, capture a context_anchor(...), then compare compact_auto(...), certificate(...), and telemetry_summary(...).
Compare the resulting certificate and replay behavior against the public artifacts on this page.

Codex quick-start ↗ Return to overview ↖

verify

# minimal smoke path
runtime_info()
compact_auto(
  messages=messages,
  token_budget=45,
  k_target=2,
  mode="adaptive",
)
certificate(
  messages=messages,
  token_budget=45,
  k=2,
)

# fuller reviewer workflow
runtime_info()
diagnose(messages=messages, k_max=2)
context_anchor(messages=messages, k=0)
compact_auto(
  messages=messages,
  token_budget=45,
  k_target=2,
  mode="adaptive",
)
certificate(
  messages=messages,
  token_budget=45,
  k=2,
)
telemetry_summary(limit=5)