MirageKit Research Program

MirageKit

Context compression can look valid while the governing task has drifted.

When LLM agents compress long conversations, they can silently lose track of which task they're solving while still producing confident answers. We call this a validity mirage. MirageKit is the research program; dreams is the public showcase for papers and artifacts; the evaluation MCP lives in the separate tropical-mcp repository.

Read this like a research packet: a flagship paper, a deterministic replay witness (n=3 per policy and retention fraction), mirrored validation logs, a certificate artifact, and two verification paths (a three-call smoke test and a fuller reviewer workflow).

Start with the flagship paper, inspect the replayed witness, then run the local verify path to reproduce the divergence yourself.

Research Program

MirageKit

The papers, theory, witness, and evaluation framing for validity mirage behavior.

Showcase Repo

dreams

This website, working-paper bundle, and committed public artifacts live here.

Evaluation Implementation

tropical-mcp

The source-available MCP server you register in Codex or Claude-style clients to evaluate guarded compaction directly.

First public release (working-paper stage, 2026) · dreams = public evidence surface · tropical-mcp = source-available evaluation MCP · DOI-backed archive = dreams v0.1.1 · mirrored implementation release = tropical-mcp v0.2.1.

Start Here

Read The Research

Papers + Results

Use this repo for the working papers, replay artifacts, and committed evidence that support the current claims.

View The Implementation

Evaluate tropical-mcp

The runnable MCP server is published in its own repository so install docs, changelog, tests, and examples stay close to the code.

Understand The Demo

Interactive Proof Path

Move from the replay cards to the witness, certificate, and source papers. Every number on the page comes from committed artifacts.

Verify The Tool

Smoke Test + Research Workflow

Use the three-call smoke test for a fast implementation check, or extend to the fuller reviewer workflow when you want diagnostics, anchors, and telemetry.

Evaluate + Verify

tropical-mcp is the evaluation implementation. Use dreams for the paper set, replay witness, public certificate, and the broader research narrative.

1. Register the MCP
2. Run the minimal smoke test
3. Expand to the research workflow
01
Codex Registration

Register tropical-mcp

Clone the evaluation repo, then register the MCP in Codex so the tool calls stay explicit and auditable.

codex shell
git clone https://github.com/jack-chaudier/tropical-mcp.git ~/tropical-mcp
codex mcp add tropical-mcp \
  --env TROPICAL_MCP_CLIENT=codex -- \
  uv --directory ~/tropical-mcp run tropical-mcp
codex mcp list

Expected signal: codex mcp list should show tropical-mcp as an available server.

02
Three-Call Smoke Test

Minimal Verification

Use a small explicit payload so the pivot and predecessor structure remain visible at a glance. This verifies the packaged MCP surface; it is not the full research workflow.

payload
messages = [
  {
    "id": "goal",
    "role": "user",
    "content": "Build a long-running coding agent workflow for Codex.",
    "role_hint": "pivot",
  },
  {
    "id": "constraint_stdio",
    "role": "user",
    "content": "Use stdio transport and never emit JSON-RPC data to stdout logs.",
    "role_hint": "predecessor",
  },
  {
    "id": "constraint_clients",
    "role": "user",
    "content": "Support Codex and Claude-style clients through explicit MCP tool calls.",
    "role_hint": "predecessor",
  },
  {
    "id": "status",
    "role": "assistant",
    "content": "I am wiring the verification flow and docs.",
    "role_hint": "noise",
  },
]
verify
runtime_info()
compact_auto(
  messages=messages,
  token_budget=45,
  k_target=2,
  mode="adaptive",
)
certificate(
  messages=messages,
  token_budget=45,
  k=2,
)

Expected signal: compact_auto(...) should prefer the guarded policy on this witness payload, and certificate(...) should preserve a portable audit of the same comparison.

For a fuller reviewer pass, continue with diagnose(...), context_anchor(...), and telemetry_summary(...) so feasibility, protected chunks, and retention behavior remain explicit in the audit trail.

Expected signals

runtime_info() should report the resolved client and telemetry path. compact_auto(...) should choose l2_guarded on the sample. certificate(...) should show the kept and dropped message IDs for the recency and guarded policies side by side.

Recommended research workflow

After the smoke test, the fuller review sequence is runtime_info(), diagnose(...), context_anchor(...), compact_auto(...), certificate(...), and telemetry_summary(...). That path keeps witness feasibility, protected predecessors, and retained context visible instead of inferring them after the fact.
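The sequence above can be sketched as a driver loop. Here `call(tool, **kwargs)` is a hypothetical stand-in for however your MCP client dispatches tool calls; it is not a tropical-mcp API, and the arguments passed to diagnose and context_anchor are assumptions rather than documented signatures.

```python
# Hypothetical reviewer-workflow driver. `call` stands in for your MCP client's
# dispatch mechanism; it is NOT a tropical-mcp API, and the per-tool arguments
# beyond compact_auto/certificate are assumptions.
def review_sequence(call, messages, token_budget, k):
    audit = []
    audit.append(("runtime_info", call("runtime_info")))
    audit.append(("diagnose", call("diagnose", messages=messages)))
    audit.append(("context_anchor", call("context_anchor", messages=messages)))
    audit.append(("compact_auto", call(
        "compact_auto", messages=messages,
        token_budget=token_budget, k_target=k, mode="adaptive")))
    audit.append(("certificate", call(
        "certificate", messages=messages, token_budget=token_budget, k=k)))
    audit.append(("telemetry_summary", call("telemetry_summary")))
    return audit  # ordered (tool, result) pairs for the audit trail
```

Recording results as an ordered list mirrors the audit-trail goal: each tool's output is kept in the order it was produced, so feasibility and retention decisions stay inspectable after the fact.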

Source of truth

Implementation docs and the Codex example bundle live in the tropical-mcp repository. The rendered evidence surface for this site lives in the evidence dossier, which links directly to the replay witness, validation summary, and certificate artifact when you want to inspect the underlying files.

License boundary

tropical-mcp is currently source-available for academic and internal evaluation. For redistribution, derivative, or commercial rights, see the repository license or contact the author.

What happens when you compress a conversation?
Retention Budget

[Interactive slider: replay checkpoints at 40%, 50%, 65%, 80%, and 100% retention. At the 100% checkpoint, full context is retained, both policies remain aligned, and the regime reads "Aligned".]

Observed replay checkpoints from committed artifacts. No synthetic interpolation.

Naive Recency

Keep the most recent messages, drop the oldest.

vs

Tropical L2 · Guarded

Keep messages that the current task depends on, even if old.
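The two policies above can be contrasted in a few lines. This is an illustrative sketch only: the message schema (id, tokens, deps) is made up for the example and is not the tropical-mcp format, and the guarded policy is simplified to a single dependency hop.

```python
# Minimal sketch of the two compaction policies compared on this page.
# The message schema (id, tokens, deps) is illustrative, not tropical-mcp's.

def naive_recency(messages, budget):
    """Keep the newest messages that fit the budget; drop the oldest."""
    kept, used = [], 0
    for msg in reversed(messages):
        if used + msg["tokens"] <= budget:
            kept.append(msg["id"])
            used += msg["tokens"]
    return set(kept)

def guarded(messages, budget, pivot):
    """Reserve the pivot and its dependencies first, then fill by recency."""
    by_id = {m["id"]: m for m in messages}
    required = {pivot} | set(by_id[pivot]["deps"])
    kept, used = set(), 0
    for msg in messages:  # protected chunks first (assumed to fit the budget)
        if msg["id"] in required:
            kept.add(msg["id"])
            used += msg["tokens"]
    for msg in reversed(messages):  # then recency fills whatever remains
        if msg["id"] not in kept and used + msg["tokens"] <= budget:
            kept.add(msg["id"])
            used += msg["tokens"]
    return kept

messages = [
    {"id": "goal", "tokens": 10, "deps": ["constraint"]},
    {"id": "constraint", "tokens": 10, "deps": []},
    {"id": "noise_1", "tokens": 10, "deps": []},
    {"id": "noise_2", "tokens": 10, "deps": []},
]
# At a tight budget, recency keeps only the newest noise and drops the old
# constraint; guarded protects the goal and the constraint it depends on.
print(naive_recency(messages, 20))
print(guarded(messages, 20, "goal"))
```

On this toy payload the recency policy retains only {noise_1, noise_2}, while the guarded policy retains {goal, constraint}: both fit the budget, but only one preserves what the task depends on.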

Math Snapshot

Core Contract

d_pre >= k

Guarded compaction is certified safe only when the pivot retains its required predecessor depth.

Frontier Feasibility

W[k] = -infinity -> infeasible

If the k-slot frontier is negative infinity, no valid completion exists for that retained context.
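Taken together, the core contract and the frontier condition amount to two simple gates. The sketch below is illustrative, not the tropical-mcp internals; `W` here is just a list of frontier values indexed by retained-slot count, with made-up numbers.

```python
# Illustrative sketch of the two gates above; not the tropical-mcp internals.
NEG_INF = float("-inf")

def contract_holds(d_pre, k):
    """Core contract: certify only when the pivot keeps predecessor depth d_pre >= k."""
    return d_pre >= k

def frontier_feasible(W, k):
    """Frontier feasibility: W[k] == -infinity means no valid completion exists."""
    return W[k] != NEG_INF

# Example frontier values per retained-slot count (made-up numbers):
W = [0.0, -1.2, NEG_INF]
print(contract_holds(d_pre=2, k=2))  # True: depth requirement met
print(frontier_feasible(W, k=1))     # True
print(frontier_feasible(W, k=2))     # False: infeasible at k = 2
```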

Raw Validity

raw = max(primary_full, decoy_full)

Answerability alone can stay high even when pivot identity has silently changed.

Mirage Gap

delta = raw - pivot_preservation

Large positive gap indicates a validity mirage regime rather than true semantic stability.
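The raw-validity and mirage-gap formulas above compose into a one-line check. The numeric scores below are illustrative only, not values taken from the committed artifacts.

```python
# Illustrative mirage-gap check; the scores are made-up, not artifact values.

def raw_validity(primary_full, decoy_full):
    """Answerability can stay high if either the primary or a decoy task scores well."""
    return max(primary_full, decoy_full)

def mirage_gap(primary_full, decoy_full, pivot_preservation):
    """delta = raw - pivot_preservation; a large positive delta flags a mirage regime."""
    return raw_validity(primary_full, decoy_full) - pivot_preservation

# A compaction that still answers confidently (decoy score 0.9) while pivot
# integrity has collapsed (0.2) shows a large gap:
delta = mirage_gap(primary_full=0.4, decoy_full=0.9, pivot_preservation=0.2)
print(round(delta, 2))  # 0.7
```

The point of the example is the failure shape: raw validity is carried entirely by the decoy score, so answerability looks fine even though the pivot has effectively been lost.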

Evidence Boundaries

Current evidence combines a small committed replay witness with broader paper-level studies. The strongest claim in this demo is structural: naive recency can preserve answerability while losing pivot integrity.