Project SYNAPSE

Selective Coordination in Human-AI Teams

Team SYNAPSE: Ishan Shivansh Bangroo and Abolfazl Ansari

A reproducible Hanabi benchmark for studying when an AI teammate should speak up, stay quiet, and coordinate without creating redundant signaling overhead.

Problem: AI teammates can over-talk, under-talk, or mistime coordination cues.
Method: Run one live game, then compare all four policies over many seeds.
Goal: Keep score strong while reducing redundant coordination overhead.
Research question: Does selective awareness improve coordination without shifting burden elsewhere?
Suggested reading path: Start with one live run to inspect a single decision. Then use the benchmark to compare policies across repeated seeds. Finish on the evaluation screen to review formulas, saved evidence, and reproducibility.
Simulation, Storage, and Interpretation
Checking whether saved evidence can persist beyond the browser preview.
Step 1
Understand the claim before running anything

This site is a research artifact, not a product mockup. Hanabi is the controlled teamwork environment. The actual claim is about whether an AI teammate can time coordination cues well enough to help without creating unnecessary communication overhead.

A
Watch one live run

Use the live tab to inspect one real decision trace from a seeded game.

B
Compare all policies

Use the benchmark tab to rerun all policy conditions over repeated deterministic seeds.

C
Verify reproducibility

Use the final tab to inspect the formulas, seed list, and exportable evidence snapshot behind the displayed result.

Success Rule
What counts as a credible result

Selective awareness is only interesting if the benchmark keeps three checks visible at the same time.

Score stays competitive

The selective policy should remain close to or better than the performance-only baseline.

Redundant cueing drops

The system should reduce hints that add little or no new coordination value.

Burden does not just shift

Lower redundancy is not enough if one partner ends up carrying most of the signaling work.
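The burden check above can be made concrete with a simple balance metric. This is an illustrative sketch, not the app's actual formula: `burden_share` is a hypothetical helper that takes per-player hint counts and reports how lopsided the signaling work is.

```python
def burden_share(hints_by_player: dict) -> float:
    """Fraction of all hints given by the busiest partner.
    Roughly 0.5 means balanced signaling in a two-player game;
    values near 1.0 mean one player carries almost all of it.
    (Illustrative metric, not the benchmark's actual formula.)"""
    total = sum(hints_by_player.values())
    if total == 0:
        return 0.0
    return max(hints_by_player.values()) / total

# A lower redundancy rate only counts if burden stays balanced:
# burden_share({"ai": 9, "human": 8})  -> ~0.53 (balanced)
# burden_share({"ai": 15, "human": 2}) -> ~0.88 (burden shifted)
```

Under this framing, "burden does not just shift" means the selective policy should not push this ratio meaningfully higher than the baseline's.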

Trust check

The simulator runs from deterministic seeds, the benchmark averages are computed from repeated runs, and the final screen exposes the formulas and exportable evidence instead of relying on presentation-only text.
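The seed-and-average pipeline behind the trust check can be pictured with a minimal sketch. The deck composition is standard Hanabi (five colors, ranks 1-1-1-2-2-3-3-4-4-5); `play_game` is a stand-in for the simulator's actual entry point, whose real API may differ.

```python
import random
import statistics

def seeded_deck(seed: int) -> list:
    """Build the standard 50-card Hanabi deck and shuffle it deterministically."""
    counts = {1: 3, 2: 2, 3: 2, 4: 2, 5: 1}
    deck = [(color, rank)
            for color in ("red", "yellow", "green", "blue", "white")
            for rank, n in counts.items()
            for _ in range(n)]
    random.Random(seed).shuffle(deck)  # same seed -> identical deck every run
    return deck

def benchmark(play_game, policies, seeds):
    """Average each policy's final score over the same repeated seeds,
    so conditions are compared on identical decks."""
    return {name: statistics.mean(play_game(policy, s) for s in seeds)
            for name, policy in policies.items()}
```

Because every policy sees the same seed list, differences in the averages come from the policies themselves rather than from lucky deals.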

Step 2B
Watch one decision unfold

Press Step or Autoplay to run the current game.

Score: 0
Info tokens: 8
Fuse tokens: 3
Deck remaining: 40
Turn: 0
One thing to focus on at a time

Switch between the current decision, the partner estimate, the metrics, and the turn log instead of reading everything at once.

Awaiting run: no action yet

The app will show the live action rationale here, including the partner-state estimate and whether a cue was redundant.

Step 3B
Inspect one benchmark view at a time

Run the benchmark to populate the summary, the full table, and the cross-play matrix.

The computed result summary appears here after a run.
Step 4B
Reproducibility check and interpretation

Run a game or benchmark first to populate the computed evidence below.

No evidence selected yet.
Computed note
The reproducibility note will summarize the seed configuration and how the displayed metric values were computed.
Optional interpretation
Ask for an interpretation after running a game or benchmark. Any answer here is derived from the computed evidence shown above.
Saved snapshots

Saved entries preserve the computed summary so the result can be re-checked later from the same metrics and seed configuration.
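A snapshot of this kind might look like the sketch below. The field names are illustrative, not the app's actual schema; the point is that the saved entry bundles the seeds, the computed metrics, and the formula notes, and survives a round-trip through JSON storage.

```python
import json

def make_snapshot(seeds, metrics, formula_notes):
    """Bundle everything needed to re-check a result later.
    Field names are illustrative, not the app's actual schema."""
    return {
        "seeds": list(seeds),       # exact seeds used for the runs
        "metrics": metrics,         # computed averages, not raw UI text
        "formulas": formula_notes,  # how each displayed metric was derived
    }

snap = make_snapshot([1, 2, 3],
                     {"mean_score": 18.4},
                     {"mean_score": "mean of final scores over seeds"})
restored = json.loads(json.dumps(snap))  # survives a round-trip to storage
```

Anyone holding the snapshot can rerun the same seeds and confirm the stored metrics, which is what makes the saved entry evidence rather than a screenshot.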

What the system computes

This application runs a full two-player Hanabi simulation in the browser using standard rules, deterministic seeded decks, live agent policies, and turn-by-turn state updates. It is not a static dashboard. The agents actually act inside the simulator and can be benchmarked under repeated seeds.

The storage layer is only used to persist evidence snapshots. The core game logic, policy actions, and benchmark metrics are computed from the simulation itself rather than loaded from a fixed dataset.

How selective awareness decides

The selective agent is not a trained neural model and not an NLP-style Markov chain. The Hanabi state changes come directly from the game rules. On each turn, the policy evaluates play, discard, and hint actions from the current state.

A hint scores higher when it reveals a playable card, makes a safe discard explicit, protects a critical card, or meaningfully reduces partner uncertainty. The score drops when the hint is redundant, costly, or shifts too much signaling burden onto one player.
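A toy version of that scoring rule might look like the following. The weights are invented for illustration; the real policy's values and inputs may differ, but the structure mirrors the description above: bonuses for useful hints, penalties for redundancy, token cost, and burden shifting.

```python
def score_hint(reveals_playable: bool, marks_safe_discard: bool,
               protects_critical: bool, uncertainty_reduced: float,
               is_redundant: bool, token_cost: int,
               burden_penalty: float) -> float:
    """Toy sketch of selective hint scoring. Weights are illustrative."""
    score = 0.0
    if reveals_playable:
        score += 3.0               # partner can act on this immediately
    if marks_safe_discard:
        score += 1.5               # makes a safe discard explicit
    if protects_critical:
        score += 2.0               # saves a card that cannot be redrawn
    score += uncertainty_reduced   # e.g. how much partner uncertainty it removes
    if is_redundant:
        score -= 4.0               # hint repeats what the partner already knows
    score -= 0.5 * token_cost      # hints spend a shared info token
    score -= burden_penalty        # grows when one player over-hints
    return score
```

With weights like these, a redundant hint scores below the same hint delivered when the partner still lacks the information, which is exactly the behavior the selective policy is meant to exhibit.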

Scope boundary

This demo focuses on a bounded Hanabi benchmark with a unified pipeline, four-policy comparison, and controlled simulation testing. A full probabilistic formulation would be closer to a POMDP than a simple Markov chain, but that is beyond the scope of this project.

It should be framed as a simulation-first benchmark rather than a replacement for a hosted human-agent evaluation protocol.