This site is a research artifact, not a product mockup. Hanabi serves as the controlled teamwork environment; the actual claim is about whether an AI teammate can time coordination cues well enough to help without creating unnecessary communication overhead.
Use the live tab to inspect one real decision trace from a seeded game.
Use the benchmark tab to rerun all policy conditions over repeated deterministic seeds.
Use the final tab to inspect the formulas, seed list, and exportable evidence snapshot behind the displayed result.
Selective awareness is only interesting if the benchmark keeps three checks visible at the same time. First, the selective policy should stay close to, or better than, the performance-only baseline. Second, the system should reduce hints that add little or no new coordination value. Third, lower redundancy is not enough if one partner ends up carrying most of the signaling work; the signaling burden should stay balanced.
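The three checks above can be sketched as predicates over a per-policy benchmark summary. The field names and thresholds below are illustrative assumptions for the sketch, not the app's actual data model.

```typescript
// Hypothetical shape of a benchmark summary for one policy condition.
interface PolicySummary {
  meanScore: number;                   // average Hanabi score over repeated seeds
  redundantHintRate: number;           // fraction of hints adding no new information
  hintShareByPlayer: [number, number]; // fraction of all hints given by each partner
}

// The three checks, with placeholder tolerances chosen for illustration.
function passesChecks(selective: PolicySummary, baseline: PolicySummary): boolean {
  // 1. Selective stays close to, or beats, the performance-only baseline.
  const scoreOk = selective.meanScore >= baseline.meanScore - 1.0;
  // 2. Selective reduces redundant hints relative to the baseline.
  const redundancyOk = selective.redundantHintRate < baseline.redundantHintRate;
  // 3. The signaling burden stays roughly balanced between partners.
  const [a, b] = selective.hintShareByPlayer;
  const balanceOk = Math.abs(a - b) <= 0.3;
  return scoreOk && redundancyOk && balanceOk;
}
```

Keeping all three as explicit predicates makes it harder to claim success on one axis while silently regressing on another.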
The simulator runs from deterministic seeds, the benchmark averages are computed from repeated runs, and the final screen exposes the formulas and an exportable evidence snapshot instead of relying on presentation-only text.
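A minimal sketch of that seed-and-average pipeline, assuming a small seeded PRNG such as mulberry32 (a common choice for reproducible browser simulations; the app's actual generator may differ):

```typescript
// mulberry32: a compact deterministic PRNG. Same seed, same sequence.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // uniform in [0, 1)
  };
}

// Average one metric over a fixed list of seeds. Because each run is
// driven only by its seeded RNG, the benchmark average is reproducible.
function meanOverSeeds(seeds: number[], runGame: (rng: () => number) => number): number {
  const scores = seeds.map((s) => runGame(mulberry32(s)));
  return scores.reduce((sum, x) => sum + x, 0) / scores.length;
}
```

The `runGame` callback stands in for a full simulated Hanabi game returning a final score; it is a placeholder name for this sketch.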
Press Step or Autoplay to run the current game.
Switch between the current decision, the partner estimate, the metrics, and the turn log instead of reading everything at once.
The app will show the live action rationale here, including the partner-state estimate and whether a cue was redundant.
Run the benchmark to populate the summary, the full table, and the cross-play matrix.
Run a game or benchmark first to populate the computed evidence below.
Saved entries preserve the computed summary so the result can be re-checked later from the same metrics and seed configuration.
This application runs a full two-player Hanabi simulation in the browser using standard rules, deterministic seeded decks, live agent policies, and turn-by-turn state updates. It is not a static dashboard. The agents actually act inside the simulator and can be benchmarked under repeated seeds.
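As an illustration of what a deterministic seeded deck means here, a standard 50-card Hanabi deck can be built and shuffled entirely from a caller-supplied seeded RNG. The types and function names below are assumptions for the sketch, not the app's real code.

```typescript
type Card = { color: number; rank: number };

// Build the standard Hanabi deck (5 colors; per color: three 1s, two each
// of 2-4, one 5) and shuffle it with a deterministic RNG in [0, 1).
function buildDeck(rng: () => number): Card[] {
  const copiesPerRank = [3, 2, 2, 2, 1]; // ranks 1..5
  const deck: Card[] = [];
  for (let color = 0; color < 5; color++) {
    copiesPerRank.forEach((n, rankIdx) => {
      for (let i = 0; i < n; i++) deck.push({ color, rank: rankIdx + 1 });
    });
  }
  // Fisher-Yates shuffle driven by the seeded RNG, so identical seeds
  // always reproduce identical decks.
  for (let i = deck.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));
    [deck[i], deck[j]] = [deck[j], deck[i]];
  }
  return deck;
}
```

Because the shuffle consumes only the seeded RNG, replaying a seed replays the exact game, which is what makes per-seed decision traces inspectable.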
The storage layer is only used to persist evidence snapshots. The core game logic, policy actions, and benchmark metrics are computed from the simulation itself rather than loaded from a fixed dataset.
The selective agent is not a trained neural model and not an NLP-style Markov chain. The Hanabi state changes come directly from the game rules. On each turn, the policy evaluates play, discard, and hint actions from the current state.
A hint scores higher when it reveals a playable card, makes a safe discard explicit, protects a critical card, or meaningfully reduces partner uncertainty. The score drops when the hint is redundant, costly, or shifts too much signaling burden onto one player.
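That scoring rule might look roughly like the following. The weights, thresholds, and field names are invented for illustration; they are not the project's real implementation.

```typescript
// Hypothetical summary of what one candidate hint would accomplish.
interface HintEffect {
  revealsPlayable: boolean;    // makes an immediately playable card known
  marksSafeDiscard: boolean;   // makes a safe discard explicit
  protectsCritical: boolean;   // flags a card that must not be discarded
  uncertaintyReduced: number;  // new information conveyed; 0 means redundant
  hinterShareOfHints: number;  // this player's recent fraction of all hints
}

function scoreHint(e: HintEffect, hintTokensLeft: number): number {
  let score = 0;
  if (e.revealsPlayable) score += 3;
  if (e.marksSafeDiscard) score += 1.5;
  if (e.protectsCritical) score += 2;
  score += e.uncertaintyReduced;               // reward genuinely new information
  if (e.uncertaintyReduced === 0) score -= 2;  // redundancy penalty
  if (hintTokensLeft <= 1) score -= 1;         // hints cost more when tokens are scarce
  if (e.hinterShareOfHints > 0.7) score -= 1;  // discourage one-sided signaling
  return score;
}
```

The point of the sketch is the shape of the trade-off: value terms for coordination gains, penalty terms for redundancy, cost, and burden imbalance.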
This demo focuses on a bounded Hanabi benchmark with a unified pipeline, four-policy comparison, and controlled simulation testing. A full probabilistic formulation would be closer to a POMDP than a simple Markov chain, but that is beyond the scope of this project.
It should be framed as a simulation-first benchmark rather than a replacement for a hosted human-agent evaluation protocol.