chini-018-polling-station
Election Day Polling Station
One precinct, eight booths, three machines, a thousand voters, and the printer for ballot paper just jammed.
Source: Election administration literature, voter wait-time research, post-2020 polling reform reports
Prompt
Design the voter flow for a single polling precinct on a presidential election day.

Functional:
- Voter arrives, checks in at one of 4 poll-book stations (ID + signature). Issued a ballot.
- Voter takes ballot to one of 8 privacy booths to mark.
- Voter feeds marked ballot into one of 3 scanner/tabulator machines.
- Provisional ballots (registration mismatch) routed to a separate provisional table, sealed in envelope, NOT scanned.

Non-functional:
- A morning rush (4x baseline arrival) must NOT cause average wait to exceed 30 minutes. Booths/scanners must absorb.
- If a tabulator fails, ballots must be securely stored in the emergency-ballot bin for later scanning, NOT discarded or rerouted insecurely.
- If poll-book network goes down, check-in must continue via paper backup with reconciliation later. The line cannot stop.

Return a Chinilla CanvasState.
Components: poll books, booths, tabulators, provisional table, emergency bin.
Behaviors: queue (waiting line), split (regular vs provisional), ratelimit (booth turnover), circuitbreaker (network failover), storage (emergency bin).
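The 30-minute wait requirement can be sanity-checked with a rough fluid approximation before drawing any canvas. The sketch below is not part of the benchmark: the server counts (4, 8, 3) come from the prompt, but the per-voter service times and the arrival rates are made-up placeholders, and the names `Stage`, `bottleneck`, and `worst_wait_minutes` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    servers: int        # from the prompt
    service_min: float  # ASSUMPTION: mean minutes one voter occupies one server

    @property
    def capacity_per_hour(self) -> float:
        return self.servers * 60.0 / self.service_min


# Server counts are from the prompt; service times are illustrative guesses.
STAGES = [
    Stage("poll_book", servers=4, service_min=2.0),
    Stage("booth", servers=8, service_min=5.0),
    Stage("tabulator", servers=3, service_min=1.0),
]


def bottleneck(stages):
    """Return the stage with the lowest hourly throughput."""
    return min(stages, key=lambda s: s.capacity_per_hour)


def worst_wait_minutes(arrivals_per_hour, rush_hours, stages):
    """Fluid approximation of the wait seen by the last voter of the rush.

    Backlog grows at (arrival rate - bottleneck capacity) for the whole rush,
    then drains at the bottleneck capacity. Randomness is ignored.
    """
    cap = bottleneck(stages).capacity_per_hour
    backlog = max(0.0, arrivals_per_hour - cap) * rush_hours
    return 60.0 * backlog / cap


if __name__ == "__main__":
    # ASSUMPTION: baseline of 30 voters/hour, so the 4x rush is 120/hour for 2 hours.
    print(bottleneck(STAGES).name)
    print(worst_wait_minutes(120, 2, STAGES))
```

Under these placeholder numbers the booths are the bottleneck, which is consistent with the prompt's remark that booths and scanners must absorb the rush. In this model the wait grows linearly through the rush, so the average over rush arrivals is about half the worst case.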
Constraints
- Max components: 13
- Required behaviors: queue, split, circuitbreaker
- Monthly budget: $8000
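These three constraints are easy to check mechanically before submitting. The following is a minimal sketch, assuming a design is represented as a list of dicts with the hypothetical keys `name`, `behavior`, and `monthly_cost`; this is not the actual CanvasState schema, which the page does not show.

```python
MAX_COMPONENTS = 13
REQUIRED_BEHAVIORS = {"queue", "split", "circuitbreaker"}
MONTHLY_BUDGET_USD = 8000


def check_design(components):
    """Return a list of constraint violations (empty list means none found).

    `components` is a list of dicts with hypothetical keys
    'name', 'behavior', and 'monthly_cost'.
    """
    problems = []
    if len(components) > MAX_COMPONENTS:
        problems.append(f"too many components: {len(components)} > {MAX_COMPONENTS}")
    missing = REQUIRED_BEHAVIORS - {c["behavior"] for c in components}
    if missing:
        problems.append(f"missing behaviors: {sorted(missing)}")
    total = sum(c["monthly_cost"] for c in components)
    if total > MONTHLY_BUDGET_USD:
        problems.append(f"over budget: ${total} > ${MONTHLY_BUDGET_USD}")
    return problems


if __name__ == "__main__":
    # Illustrative design with invented costs; it omits the circuit breaker.
    design = [
        {"name": "check-in line", "behavior": "queue", "monthly_cost": 500},
        {"name": "regular vs provisional", "behavior": "split", "monthly_cost": 300},
    ]
    print(check_design(design))
```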
Stress scenarios
- Steady turnout (baseline): Normal voter arrival, all systems up.
- Pre-work rush (spike): 4x arrival rate from 7-9am. Wait times must hold.
- Tabulator down (outage): One scanner offline. Ballots must route to emergency bin, not discarded.
- Poll-book network outage (outage): Check-in network down. Paper backup must keep the line moving.
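The poll-book outage scenario is the one the circuitbreaker behavior is meant to cover. As a rough illustration of the idea (not Chinilla's implementation), the sketch below trips to paper check-in after a few consecutive network failures and replays the paper log once the network is back. The class name, the threshold, and the `network_lookup` callable are all hypothetical, and the breaker is simplified: it has no timed half-open state and only closes again on a successful `reconcile`.

```python
class PollBookBreaker:
    """Simplified circuit breaker: fail over to paper, reconcile later."""

    def __init__(self, failure_threshold=3):
        self.failure_threshold = failure_threshold  # ASSUMPTION: illustrative value
        self.consecutive_failures = 0
        self.paper_log = []  # voter IDs checked in on paper, awaiting reconciliation

    @property
    def is_open(self):
        return self.consecutive_failures >= self.failure_threshold

    def check_in(self, voter_id, network_lookup):
        """Check a voter in; never blocks the line on a network failure."""
        if not self.is_open:
            try:
                network_lookup(voter_id)
                self.consecutive_failures = 0
                return "electronic"
            except ConnectionError:
                self.consecutive_failures += 1
        # Network failed or breaker is open: record on paper and keep moving.
        self.paper_log.append(voter_id)
        return "paper"

    def reconcile(self, network_lookup):
        """Replay paper entries; the log is cleared only if every replay succeeds."""
        for voter_id in self.paper_log:
            network_lookup(voter_id)  # raises if the network is still down
        count = len(self.paper_log)
        self.paper_log.clear()
        self.consecutive_failures = 0  # close the breaker
        return count
```

The design choice worth noting is that a failed lookup still checks the voter in on paper, so no voter is dropped while the breaker is counting failures.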
Pass criteria (overall)
- Min stability score: 65
- Max drop rate: 5.0%
- Min delivery rate: 92.0%
- Max errors: 5
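Read as thresholds, the four criteria amount to a simple conjunction. The sketch below shows that reading only; the field names are hypothetical, the sample metrics are invented, and the real scorer may apply additional checks that this page does not describe.

```python
PASS_CRITERIA = {
    "min_stability": 65,
    "max_drop_rate_pct": 5.0,
    "min_delivery_rate_pct": 92.0,
    "max_errors": 5,
}


def meets_criteria(run, criteria=PASS_CRITERIA):
    """Return True if a run's overall metrics satisfy all four thresholds.

    `run` is a dict with hypothetical keys 'stability', 'drop_rate_pct',
    'delivery_rate_pct', and 'errors'.
    """
    return (
        run["stability"] >= criteria["min_stability"]
        and run["drop_rate_pct"] <= criteria["max_drop_rate_pct"]
        and run["delivery_rate_pct"] >= criteria["min_delivery_rate_pct"]
        and run["errors"] <= criteria["max_errors"]
    )


if __name__ == "__main__":
    # Invented metrics for illustration only.
    example = {"stability": 70, "drop_rate_pct": 2.0, "delivery_rate_pct": 91.0, "errors": 0}
    print(meets_criteria(example))
```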
Submit your run
Submissions go through the chini-bench CLI. It calls your model with your key, scores the result locally, and posts to the leaderboard. Nothing leaves your machine except the canvas it produces.
End-to-end:
pip install git+https://github.com/collapseindex/chini-bench-cli.git
export OPENROUTER_API_KEY=...
chini-bench run chini-018-polling-station \
--provider openrouter --model google/gemini-2.0-flash-001 \
--as alice

Or inspect the prompt first:

chini-bench prompt chini-018-polling-station

Providers: openai · anthropic · google · openrouter · ollama
Leaderboard
| Rank | Submitter | Model | Score | Stability | Delivery | Design | Pass |
|---|---|---|---|---|---|---|---|
| #1 | rl_v06_run2 | rl_policy custom single-shot | 89 | 85.0 | 100.0 | 75.0 | ✗ |
| #2 | rl_v06_run2 | rl_policy custom single-shot | 88 | 83.0 | 100.0 | 75.0 | ✗ |
| #3 | rl_v06_run2 | rl_policy custom single-shot | 86 | 86.0 | 100.0 | 50.0 | ✗ |
| #4 | rl_v06_run2 | rl_policy custom single-shot | 85 | 70.0 | 100.0 | 100.0 | ✗ |
| #5 | rl_v06_run2 | rl_policy custom single-shot | 84 | 83.0 | 88.0 | 75.0 | ✗ |
| #6 | rl_v06_run2 | rl_policy custom single-shot | 84 | 83.0 | 100.0 | 50.0 | ✗ |
| #7 | alex | anthropic/claude-sonnet-4.6 default single-shot | 83 | 72.0 | 91.0 | 100.0 | ✗ |
| #8 | rl_v06_run2 | rl_policy custom single-shot | 83 | 80.0 | 100.0 | 50.0 | ✗ |
| #9 | rl_v06_run2 | rl_policy custom single-shot | 82 | 86.0 | 100.0 | 50.0 | ✗ |
| #10 | alex | x-ai/grok-4.20 default reflexion | 81 | 84.0 | 75.0 | 100.0 | ✗ |
| #11 | rl_v06_run1 | rl_policy custom single-shot | 80 | 70.0 | 100.0 | 60.0 | ✗ |
| #12 | rl_v06_run1 | rl_policy custom single-shot | 80 | 74.0 | 96.0 | 100.0 | ✗ |
| #13 | rl_v06_run2 | rl_policy custom single-shot | 80 | 72.0 | 81.0 | 85.0 | ✗ |
| #14 | rl_v06_run1 | rl_policy custom single-shot | 79 | 57.0 | 100.0 | 100.0 | ✗ |
| #15 | rl_v06_run2 | rl_policy custom single-shot | 79 | 80.0 | 100.0 | 50.0 | ✗ |
| #16 | rl_v06_run2 | rl_policy custom single-shot | 79 | 81.0 | 100.0 | 50.0 | ✗ |
| #17 | alex | openai/gpt-5.4 default single-shot | 78 | 68.0 | 94.0 | 75.0 | ✗ |
| #18 | rl_v06_run2 | rl_policy custom single-shot | 78 | 81.0 | 59.0 | 85.0 | ✗ |
| #19 | rl_v06_run2 | rl_policy custom single-shot | 78 | 78.0 | 100.0 | 50.0 | ✗ |
| #20 | rl_v06_run2 | rl_policy custom single-shot | 78 | 58.0 | 96.0 | 100.0 | ✗ |
| #21 | rl_v06_run1 | rl_policy custom single-shot | 77 | 68.0 | 100.0 | 60.0 | ✗ |
| #22 | rl_v06_run2 | rl_policy custom single-shot | 77 | 84.0 | 63.0 | 85.0 | ✗ |
| #23 | rl_v06_run2 | rl_policy custom single-shot | 77 | 78.0 | 85.0 | 75.0 | ✗ |
| #24 | rl_v06_run2 | rl_policy custom single-shot | 76 | 80.0 | 100.0 | 50.0 | ✗ |
| #25 | rl_v06_run2 | rl_policy custom single-shot | 76 | 70.0 | 69.0 | 85.0 | ✗ |
| #26 | rl_v06_run1 | rl_policy custom single-shot | 75 | 68.0 | 100.0 | 75.0 | ✗ |
| #27 | rl_v06_run2 | rl_policy custom single-shot | 75 | 59.0 | 100.0 | 85.0 | ✗ |
| #28 | rl_v06_run2 | rl_policy custom single-shot | 75 | 64.0 | 100.0 | 75.0 | ✗ |
| #29 | alex | x-ai/grok-4.20 default single-shot | 74 | 70.0 | 63.0 | 100.0 | ✗ |
| #30 | rl_v06_run2 | rl_policy custom single-shot | 74 | 71.0 | 100.0 | 50.0 | ✗ |
| #31 | rl_v06_run1 | rl_policy custom single-shot | 73 | 69.0 | 100.0 | 50.0 | ✗ |
| #32 | rl_v06_run2 | rl_policy custom single-shot | 73 | 68.0 | 100.0 | 50.0 | ✗ |
| #33 | rl_v06_run2 | rl_policy custom single-shot | 73 | 63.0 | 75.0 | 85.0 | ✗ |
| #34 | rl_v06_run2 | rl_policy custom single-shot | 73 | 72.0 | 68.0 | 100.0 | ✗ |
| #35 | rl_v06_run2 | rl_policy custom single-shot | 73 | 68.0 | 100.0 | 50.0 | ✗ |
| #36 | rl_v06_run1 | rl_policy custom single-shot | 72 | 58.0 | 82.0 | 85.0 | ✗ |
| #37 | rl_v06_run1 | rl_policy custom single-shot | 72 | 76.0 | 47.0 | 100.0 | ✗ |
| #38 | rl_v06_run2 | rl_policy custom single-shot | 71 | 55.0 | 97.0 | 85.0 | ✗ |
| #39 | rl_v06_run2 | rl_policy custom single-shot | 71 | 65.0 | 61.0 | 85.0 | ✗ |
| #40 | rl_v06_run1 | rl_policy custom single-shot | 70 | 74.0 | 82.0 | 50.0 | ✗ |
| #41 | rl_v06_run1 | rl_policy custom single-shot | 70 | 82.0 | 29.0 | 100.0 | ✗ |
| #42 | rl_v06_run1 | rl_policy custom single-shot | 70 | 70.0 | 100.0 | 50.0 | ✗ |
| #43 | rl_v06_run2 | rl_policy custom single-shot | 70 | 78.0 | 75.0 | 50.0 | ✗ |
| #44 | rl_v06_run2 | rl_policy custom single-shot | 70 | 60.0 | 85.0 | 100.0 | ✗ |
| #45 | rl_v06_run1 | rl_policy custom single-shot | 69 | 37.0 | 100.0 | 85.0 | ✗ |
| #46 | rl_v06_run1 | rl_policy custom single-shot | 68 | 65.0 | 100.0 | 50.0 | ✗ |
| #47 | rl_v06_run2 | rl_policy custom single-shot | 68 | 65.0 | 100.0 | 50.0 | ✗ |
| #48 | rl_v06_run2 | rl_policy custom single-shot | 65 | 66.0 | 46.0 | 85.0 | ✗ |
| #49 | rl_v06_run2 | rl_policy custom single-shot | 63 | 67.0 | 33.0 | 85.0 | ✗ |
| #50 | alex | anthropic/claude-sonnet-4.6 default reflexion | 58 | 36.0 | 100.0 | 100.0 | ✗ |
| #51 | rl_v06_run2 | rl_policy custom single-shot | 58 | 86.0 | 20.0 | 50.0 | ✗ |
| #52 | rl_v06_run2 | rl_policy custom single-shot | 58 | 31.0 | 100.0 | 75.0 | ✗ |
| #53 | alex | google/gemini-3.1-pro-preview default reflexion | 57 | 63.0 | 31.0 | 100.0 | ✗ |
| #54 | rl_v06_run2 | rl_policy custom single-shot | 51 | 62.0 | 0.0 | 45.0 | ✗ |
| #55 | rl_v06_run1 | rl_policy custom single-shot | 49 | 17.0 | 78.0 | 100.0 | ✗ |
| #56 | alex | google/gemini-3.1-pro-preview default single-shot | 45 | 15.0 | 58.0 | 100.0 | ✗ |
| #57 | rl_v06_run2 | rl_policy custom single-shot | 45 | 65.0 | 0.0 | 60.0 | ✗ |
| #58 | alex | openai/gpt-5.4 default reflexion | 36 | 10.0 | 57.0 | 100.0 | ✗ |
| #59 | rl_v06_run2 | rl_policy custom single-shot | 26 | 0.0 | 20.0 | 100.0 | ✗ |
Per-scenario breakdown of the top run
| Scenario | Health | Drop rate | Delivered | Pass |
|---|---|---|---|---|
| baseline | 85.0 | 0.0% | 288 | ✓ |
| morning-rush | 85.0 | 0.0% | 1056 | ✓ |
| tabulator-fail | 85.0 | 0.0% | 264 | ✓ |
| pollbook-down | 85.0 | 0.0% | 264 | ✓ |