MirageKit Research Program

Evidence Dossier

The committed public record behind the claim.

This page is the curated research-facing entry point for the validation summary, replay witness, and certificate artifact. Use it before dropping into raw markdown or JSON.

The research program lives in MirageKit, the public artifact surface lives in dreams, and the evaluation MCP implementation lives in tropical-mcp. The replay rates below are exact proportions on a deterministic witness with n=3 variants per policy and retention fraction, so each rate is a count out of three rather than a statistical estimate.

Validation Surface

What the public bundle currently supports

The strongest public evidence right now is structural. Under compression pressure, a recency policy can preserve answerability while silently losing the governing pivot, whereas the guarded policy preserves the protected arc on the committed replay witness. The goal here is not to look exhaustive. It is to make the current public record inspectable in one pass.

01
Implementation checks

Build, tests, and functional validation

The implementation repo is published with the raw outputs needed to inspect software quality and the installable wheel path, not just a summary claim.

  • ruff check . and mypy src/tropical_mcp run on the canonical implementation repo
  • pytest publishes a pass count in the mirrored validation log
  • uv build verifies wheel + sdist packaging
  • ./scripts/validate_installed_wheel.sh confirms the built wheel still validates after install
  • uv run tropical-mcp-full-validate exercises the MCP-facing validation path
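
The checks above can be chained into a single local pass. The following is a minimal sketch, not the project's own tooling: the command strings are copied from the list above (running bare `pytest` from the repo root is an assumption), while the `run_checks` helper, its `runner` parameter, and the stop-on-first-failure behaviour are hypothetical and added for illustration.

```python
import shlex
import subprocess
from typing import Callable, List, Tuple

# Commands copied from the checklist above; run them from the tropical-mcp repo root.
CHECKS: List[str] = [
    "ruff check .",
    "mypy src/tropical_mcp",
    "pytest",
    "uv build",
    "./scripts/validate_installed_wheel.sh",
    "uv run tropical-mcp-full-validate",
]


def _default_runner(argv: List[str]) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(argv, check=False).returncode


def run_checks(
    commands: List[str],
    runner: Callable[[List[str]], int] = _default_runner,
) -> List[Tuple[str, int]]:
    """Run commands in order, stopping at the first non-zero exit code.

    Returns (command, exit_code) pairs for every command that was run.
    """
    results: List[Tuple[str, int]] = []
    for command in commands:
        code = runner(shlex.split(command))
        results.append((command, code))
        if code != 0:
            break
    return results
```

Calling `run_checks(CHECKS)` inside a checkout runs the six steps in order. Passing a custom `runner` makes it possible to dry-run the sequence without executing anything.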

02
Replay witness

Guarded compaction preserves the pivot

At retention fractions 0.65, 0.5, and 0.4, the committed replay keeps the primary arc intact under l2_guarded while recency collapses it. Paper-level model counts and incident counts live in the PDFs and are intentionally separate from this witness.

  • l2_guarded pivot preservation stays at 1.0
  • recency pivot preservation falls to 0.0
  • The overview page witness cards are rendered from the same committed replay data published here
  • The witness is intentionally small so every reported value remains inspectable
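
To make the reported rates concrete: with n=3 variants per policy and retention fraction, each rate is a count out of three. The sketch below assumes that pivot preservation means the share of variants whose kept set still contains the pivot ID. That definition, the record layout, and the message IDs are hypothetical illustrations, not the committed replay schema or data.

```python
from fractions import Fraction
from typing import Iterable, Set


def pivot_preservation(kept_sets: Iterable[Set[str]], pivot_id: str) -> Fraction:
    """Exact share of variants whose kept IDs still include the pivot."""
    kept_list = list(kept_sets)
    if not kept_list:
        raise ValueError("need at least one variant")
    hits = sum(1 for kept in kept_list if pivot_id in kept)
    return Fraction(hits, len(kept_list))


# Toy data for one retention fraction (illustrative only, n=3 variants per policy).
pivot = "m2"
guarded_kept = [{"m2", "m5", "m6"}, {"m2", "m4", "m6"}, {"m2", "m3", "m6"}]
recency_kept = [{"m4", "m5", "m6"}, {"m4", "m5", "m6"}, {"m3", "m5", "m6"}]

guarded_rate = pivot_preservation(guarded_kept, pivot)
recency_rate = pivot_preservation(recency_kept, pivot)
print(float(guarded_rate), float(recency_rate))  # 1.0 0.0
```

Using `Fraction` keeps the result exact, which matches the way the witness reports proportions instead of estimates.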

03
Portable artifact

Certificate snapshot

The public certificate captures the recency-vs-guarded kept and dropped IDs, audit flags, and contract/protection status in a portable shape that can be compared against a local verification run.

  • Pivot and protected IDs are recorded from the full context
  • Each policy exposes its kept and dropped IDs separately
  • Feasibility, breach, and contract status stay visible in the audit trail
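
Comparing the public certificate with a local run comes down to a field-by-field diff. The sketch below shows one way to do that under assumed names: the keys `policies`, `kept_ids`, and `dropped_ids` are hypothetical stand-ins, and the toy certificates are invented for illustration, so substitute whatever the published certificate JSON actually uses.

```python
from typing import Any, Dict, List


def diff_certificates(public: Dict[str, Any], local: Dict[str, Any]) -> List[str]:
    """Return human-readable mismatches between two certificate-like dicts."""
    mismatches: List[str] = []
    public_policies = public.get("policies", {})
    local_policies = local.get("policies", {})
    for policy in sorted(set(public_policies) | set(local_policies)):
        pub = public_policies.get(policy)
        loc = local_policies.get(policy)
        if pub is None or loc is None:
            mismatches.append(f"{policy}: present in only one certificate")
            continue
        for field in ("kept_ids", "dropped_ids"):
            # Compare as sets so ordering differences are not reported.
            if set(pub.get(field, [])) != set(loc.get(field, [])):
                mismatches.append(f"{policy}.{field} differs")
    return mismatches


# Toy certificates (hypothetical shape and IDs, not the published artifact).
public_cert = {
    "policies": {
        "recency": {"kept_ids": ["m5", "m6"], "dropped_ids": ["m2", "m4"]},
        "l2_guarded": {"kept_ids": ["m2", "m6"], "dropped_ids": ["m4", "m5"]},
    }
}
local_cert = {
    "policies": {
        "recency": {"kept_ids": ["m6", "m5"], "dropped_ids": ["m4", "m2"]},
        "l2_guarded": {"kept_ids": ["m2", "m6"], "dropped_ids": ["m4", "m5"]},
    }
}

mismatches = diff_certificates(public_cert, local_cert)
print(mismatches)  # [] because only the ordering differs
```

An empty list means the local run agrees with the public record on the compared fields. Audit flags and contract status could be checked the same way once their field names are known.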

Suggested reading order

Start with the flagship paper, then this dossier, then the live replay witness. Drop into the raw logs or JSON only when you want to inspect the committed record directly.

Reproduce

Verification paths

Install and register tropical-mcp, then choose between the minimal smoke path and the fuller reviewer workflow. The public certificate on this page is meant to be compared against the output of the fuller reviewer workflow, not treated as a substitute for running it.

  • Smoke test: confirm the runtime with runtime_info(), then run compact_auto(...) and certificate(...) on a small explicit witness payload.
  • Research workflow: run diagnose(...) to inspect feasible slots, capture a context_anchor(...), then compare compact_auto(...), certificate(...), and telemetry_summary(...).
  • Compare the resulting certificate and replay behavior against the public artifacts on this page.
verify
# minimal smoke path (messages is a small explicit witness payload)
runtime_info()
compact_auto(
  messages=messages,
  token_budget=45,
  k_target=2,
  mode="adaptive",
)
certificate(
  messages=messages,
  token_budget=45,
  k=2,
)

# fuller reviewer workflow
runtime_info()
diagnose(messages=messages, k_max=2)
context_anchor(messages=messages, k=0)
compact_auto(
  messages=messages,
  token_budget=45,
  k_target=2,
  mode="adaptive",
)
certificate(
  messages=messages,
  token_budget=45,
  k=2,
)
telemetry_summary(limit=5)