chini-018-polling-station

Election Day Polling Station

One precinct, eight booths, three machines, a thousand voters, and the printer for ballot paper just jammed.

Source: Election administration literature, voter wait-time research, post-2020 polling reform reports

Prompt

Design the voter flow for a single polling precinct on a presidential election day.

Functional:
- Voter arrives and checks in at one of 4 poll-book stations (ID + signature), then is issued a ballot.
- Voter takes the ballot to one of 8 privacy booths to mark it.
- Voter feeds the marked ballot into one of 3 scanner/tabulator machines.
- Provisional ballots (registration mismatch) are routed to a separate provisional table and sealed in an envelope, NOT scanned.
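
The functional steps above reduce to a routing decision made at check-in. A minimal sketch, assuming a hypothetical `Voter` record with a `registration_ok` flag and illustrative station names (none of these come from the prompt or from Chinilla), shows the split between the regular and provisional paths:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Voter:
    voter_id: str
    registration_ok: bool  # False models a registration mismatch at check-in


def route(voter: Voter) -> List[str]:
    """Return the ordered stations a voter passes through (names are illustrative)."""
    if voter.registration_ok:
        return ["poll_book", "privacy_booth", "tabulator"]
    # Provisional ballots are sealed in an envelope at the table and never scanned.
    return ["poll_book", "provisional_table"]


print(route(Voter("v-001", True)))
print(route(Voter("v-002", False)))
```

The point of the sketch is only that the provisional path never reaches a tabulator; how the provisional table is staffed or sized is left open.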

Non-functional:
- A morning rush (4x baseline arrival) must NOT push the average wait above 30 minutes. Booths and scanners must absorb the surge.
- If a tabulator fails, ballots must be stored securely in the emergency-ballot bin for later scanning, NOT discarded or rerouted insecurely.
- If the poll-book network goes down, check-in must continue via paper backup, with reconciliation later. The line cannot stop.
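
A quick way to reason about the morning-rush requirement is to compare the arrival rate with each stage's service capacity. The sketch below does that arithmetic; the baseline arrival rate and the per-voter service times are illustrative assumptions, since the prompt gives only the station counts and the 4x multiplier. Utilization below 1.0 at every stage is necessary for the queue to stay bounded, but it does not by itself prove the 30-minute average-wait target is met.

```python
# Assumed values, not taken from the prompt.
BASELINE_ARRIVALS_PER_MIN = 0.5
RUSH_MULTIPLIER = 4  # from the prompt: 4x baseline arrival

# stage name -> (number of stations, assumed mean service minutes per voter)
STAGES = {
    "poll_books": (4, 1.5),
    "booths": (8, 3.5),
    "tabulators": (3, 1.0),
}


def utilization(arrivals_per_min):
    """Return arrival rate divided by service capacity for each stage."""
    result = {}
    for name, (stations, service_min) in STAGES.items():
        capacity_per_min = stations / service_min
        result[name] = arrivals_per_min / capacity_per_min
    return result


rush = utilization(BASELINE_ARRIVALS_PER_MIN * RUSH_MULTIPLIER)
for name, rho in rush.items():
    status = "ok" if rho < 1.0 else "overloaded"
    print(f"{name}: utilization {rho:.2f} ({status})")
```

Under these assumed numbers the booths are the tightest stage, which is why the choice of booth turnover rate deserves attention in the design.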

Return a Chinilla CanvasState. Components: poll books, booths, tabulators, provisional table, emergency bin. Behaviors: queue (waiting line), split (regular vs provisional), ratelimit (booth turnover), circuitbreaker (network failover), storage (emergency bin).

Constraints

Max components: 13
Required behaviors: queue, split, circuitbreaker
Monthly budget: $8000
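
The component cap matters because the prompt names 4 poll-book stations, 8 booths, 3 tabulators, a provisional table, and an emergency bin: 17 physical units, more than the 13 components allowed. One reading, offered here as an assumption rather than a rule of the benchmark, is that identical stations are pooled into a single component. The sketch below checks a hypothetical component list (plain dicts with `name` and `behaviors` keys, not the actual CanvasState schema) against the two structural constraints; the budget is not modeled.

```python
MAX_COMPONENTS = 13
REQUIRED_BEHAVIORS = {"queue", "split", "circuitbreaker"}


def check_constraints(components):
    """Return a list of human-readable constraint violations (empty if none)."""
    problems = []
    if len(components) > MAX_COMPONENTS:
        problems.append(
            f"{len(components)} components exceeds the cap of {MAX_COMPONENTS}"
        )
    used = {b for c in components for b in c["behaviors"]}
    missing = REQUIRED_BEHAVIORS - used
    if missing:
        problems.append("missing required behaviors: " + ", ".join(sorted(missing)))
    return problems


# Hypothetical pooled layout: one component per station type.
pooled = [
    {"name": "arrival_line", "behaviors": ["queue"]},
    {"name": "poll_books", "behaviors": ["circuitbreaker"]},
    {"name": "paper_backup", "behaviors": []},
    {"name": "ballot_router", "behaviors": ["split"]},
    {"name": "booths", "behaviors": ["ratelimit"]},
    {"name": "tabulators", "behaviors": []},
    {"name": "emergency_bin", "behaviors": ["storage"]},
    {"name": "provisional_table", "behaviors": []},
]

# Naive layout: one component per physical unit, with no behaviors assigned.
naive = [{"name": f"unit_{i}", "behaviors": []} for i in range(4 + 8 + 3 + 1 + 1)]

print(check_constraints(pooled))
print(check_constraints(naive))
```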

Stress scenarios

Steady turnout (baseline)

Normal voter arrival, all systems up.

Pre-work rush (spike)

4x arrival rate from 7-9am. Wait times must hold.

Tabulator down (outage)

One scanner offline. Ballots must be routed to the emergency bin, not discarded.
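
The tabulator-down behavior can be sketched as a holding queue that is drained once a machine is available again. The class and method names below (`Tabulator`, `ScanStation`, `cast`, `drain_bin`) are hypothetical and exist only for illustration; the `deque` stands in for the locked physical bin.

```python
from collections import deque


class Tabulator:
    def __init__(self, name):
        self.name = name
        self.online = True
        self.scanned = []  # ballot ids counted by this machine


class ScanStation:
    def __init__(self, tabulators):
        self.tabulators = tabulators
        self.emergency_bin = deque()  # ballots held for later scanning

    def cast(self, ballot_id, tabulator):
        """Scan the ballot if the machine is up; otherwise hold it in the bin."""
        if tabulator.online:
            tabulator.scanned.append(ballot_id)
            return "scanned"
        self.emergency_bin.append(ballot_id)
        return "emergency_bin"

    def drain_bin(self, tabulator):
        """Scan every held ballot on a working machine; return how many were scanned."""
        if not tabulator.online:
            return 0
        count = 0
        while self.emergency_bin:
            tabulator.scanned.append(self.emergency_bin.popleft())
            count += 1
        return count


machines = [Tabulator("T1"), Tabulator("T2"), Tabulator("T3")]
station = ScanStation(machines)

machines[0].online = False
print(station.cast("ballot-001", machines[0]))  # held, not discarded
print(station.cast("ballot-002", machines[1]))  # scanned normally

machines[0].online = True
print(station.drain_bin(machines[0]))  # number of held ballots now scanned
```

Every ballot ends up either in a machine's `scanned` list or in the bin, so none is dropped while a machine is offline.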

Poll-book network outage (outage)

Check-in network down. Paper backup must keep the line moving.
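
The paper-backup requirement maps naturally onto a circuit breaker: after a few consecutive network failures, check-in stops waiting on the network and goes straight to paper, probing the network again only occasionally. This is a minimal sketch under assumed parameters; `PollBookCheckIn`, `failure_threshold`, and `retry_after` are hypothetical names, and the `lookup` callable stands in for whatever the real poll-book query would be.

```python
class PollBookCheckIn:
    def __init__(self, lookup, failure_threshold=3, retry_after=10):
        self.lookup = lookup  # callable(voter_id); raises ConnectionError when down
        self.failure_threshold = failure_threshold
        self.retry_after = retry_after  # check-ins to skip the network while open
        self.failures = 0
        self.skipped = 0
        self.paper_log = []  # voter ids to reconcile once the network returns

    def _circuit_open(self):
        return self.failures >= self.failure_threshold

    def _paper(self, voter_id):
        self.paper_log.append(voter_id)
        return "paper"

    def check_in(self, voter_id):
        # While the circuit is open, go straight to paper so the line keeps moving.
        if self._circuit_open() and self.skipped < self.retry_after:
            self.skipped += 1
            return self._paper(voter_id)
        try:
            self.lookup(voter_id)  # normal path, or a single probe after the skip window
        except ConnectionError:
            self.failures += 1
            self.skipped = 0
            return self._paper(voter_id)
        # Success closes the circuit again.
        self.failures = 0
        self.skipped = 0
        return "electronic"


def network_down(voter_id):
    raise ConnectionError("poll-book network unreachable")


desk = PollBookCheckIn(network_down)
print([desk.check_in(f"v-{i}") for i in range(5)])
print(len(desk.paper_log))
```

Every voter is checked in either way; the only difference during an outage is that their id lands in `paper_log` for later reconciliation.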

Pass criteria (overall)

Min stability score: 65
Max drop rate: 5.0%
Min delivery rate: 92.0%
Max errors: 5
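
The four thresholds can be read as a single conjunctive check. The sketch below assumes the bounds are inclusive and that the metrics have already been aggregated across scenarios; the page does not spell out either detail, and `RunMetrics` is a hypothetical container, not a type from the chini-bench CLI.

```python
from dataclasses import dataclass


@dataclass
class RunMetrics:
    stability: float          # stability score, 0-100
    drop_rate_pct: float      # percentage of voters dropped
    delivery_rate_pct: float  # percentage of voters delivered
    errors: int


def passes(m: RunMetrics) -> bool:
    """Apply the overall pass criteria, treating each bound as inclusive."""
    return (
        m.stability >= 65
        and m.drop_rate_pct <= 5.0
        and m.delivery_rate_pct >= 92.0
        and m.errors <= 5
    )


print(passes(RunMetrics(stability=85.0, drop_rate_pct=0.0, delivery_rate_pct=100.0, errors=0)))
print(passes(RunMetrics(stability=72.0, drop_rate_pct=0.0, delivery_rate_pct=91.0, errors=0)))
```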

Submit your run

Submissions go through the chini-bench CLI. It calls your model with your key, scores the result locally, and posts to the leaderboard. Nothing leaves your machine except the canvas it produces.

End-to-end:
pip install git+https://github.com/collapseindex/chini-bench-cli.git
export OPENROUTER_API_KEY=...

chini-bench run chini-018-polling-station \
  --provider openrouter --model google/gemini-2.0-flash-001 \
  --as alice

Or inspect the prompt first:
chini-bench prompt chini-018-polling-station

Providers: openai · anthropic · google · openrouter · ollama

Leaderboard

Rank Submitter Model Score Stability Delivery Design Pass
#1 rl_v06_run2 rl_policy (custom single-shot) 89 85.0 100.0 75.0
#2 rl_v06_run2 rl_policy (custom single-shot) 88 83.0 100.0 75.0
#3 rl_v06_run2 rl_policy (custom single-shot) 86 86.0 100.0 50.0
#4 rl_v06_run2 rl_policy (custom single-shot) 85 70.0 100.0 100.0
#5 rl_v06_run2 rl_policy (custom single-shot) 84 83.0 88.0 75.0
#6 rl_v06_run2 rl_policy (custom single-shot) 84 83.0 100.0 50.0
#7 alex anthropic/claude-sonnet-4.6 (default single-shot) 83 72.0 91.0 100.0
#8 rl_v06_run2 rl_policy (custom single-shot) 83 80.0 100.0 50.0
#9 rl_v06_run2 rl_policy (custom single-shot) 82 86.0 100.0 50.0
#10 alex x-ai/grok-4.20 (default reflexion) 81 84.0 75.0 100.0
#11 rl_v06_run1 rl_policy (custom single-shot) 80 70.0 100.0 60.0
#12 rl_v06_run1 rl_policy (custom single-shot) 80 74.0 96.0 100.0
#13 rl_v06_run2 rl_policy (custom single-shot) 80 72.0 81.0 85.0
#14 rl_v06_run1 rl_policy (custom single-shot) 79 57.0 100.0 100.0
#15 rl_v06_run2 rl_policy (custom single-shot) 79 80.0 100.0 50.0
#16 rl_v06_run2 rl_policy (custom single-shot) 79 81.0 100.0 50.0
#17 alex openai/gpt-5.4 (default single-shot) 78 68.0 94.0 75.0
#18 rl_v06_run2 rl_policy (custom single-shot) 78 81.0 59.0 85.0
#19 rl_v06_run2 rl_policy (custom single-shot) 78 78.0 100.0 50.0
#20 rl_v06_run2 rl_policy (custom single-shot) 78 58.0 96.0 100.0
#21 rl_v06_run1 rl_policy (custom single-shot) 77 68.0 100.0 60.0
#22 rl_v06_run2 rl_policy (custom single-shot) 77 84.0 63.0 85.0
#23 rl_v06_run2 rl_policy (custom single-shot) 77 78.0 85.0 75.0
#24 rl_v06_run2 rl_policy (custom single-shot) 76 80.0 100.0 50.0
#25 rl_v06_run2 rl_policy (custom single-shot) 76 70.0 69.0 85.0
#26 rl_v06_run1 rl_policy (custom single-shot) 75 68.0 100.0 75.0
#27 rl_v06_run2 rl_policy (custom single-shot) 75 59.0 100.0 85.0
#28 rl_v06_run2 rl_policy (custom single-shot) 75 64.0 100.0 75.0
#29 alex x-ai/grok-4.20 (default single-shot) 74 70.0 63.0 100.0
#30 rl_v06_run2 rl_policy (custom single-shot) 74 71.0 100.0 50.0
#31 rl_v06_run1 rl_policy (custom single-shot) 73 69.0 100.0 50.0
#32 rl_v06_run2 rl_policy (custom single-shot) 73 68.0 100.0 50.0
#33 rl_v06_run2 rl_policy (custom single-shot) 73 63.0 75.0 85.0
#34 rl_v06_run2 rl_policy (custom single-shot) 73 72.0 68.0 100.0
#35 rl_v06_run2 rl_policy (custom single-shot) 73 68.0 100.0 50.0
#36 rl_v06_run1 rl_policy (custom single-shot) 72 58.0 82.0 85.0
#37 rl_v06_run1 rl_policy (custom single-shot) 72 76.0 47.0 100.0
#38 rl_v06_run2 rl_policy (custom single-shot) 71 55.0 97.0 85.0
#39 rl_v06_run2 rl_policy (custom single-shot) 71 65.0 61.0 85.0
#40 rl_v06_run1 rl_policy (custom single-shot) 70 74.0 82.0 50.0
#41 rl_v06_run1 rl_policy (custom single-shot) 70 82.0 29.0 100.0
#42 rl_v06_run1 rl_policy (custom single-shot) 70 70.0 100.0 50.0
#43 rl_v06_run2 rl_policy (custom single-shot) 70 78.0 75.0 50.0
#44 rl_v06_run2 rl_policy (custom single-shot) 70 60.0 85.0 100.0
#45 rl_v06_run1 rl_policy (custom single-shot) 69 37.0 100.0 85.0
#46 rl_v06_run1 rl_policy (custom single-shot) 68 65.0 100.0 50.0
#47 rl_v06_run2 rl_policy (custom single-shot) 68 65.0 100.0 50.0
#48 rl_v06_run2 rl_policy (custom single-shot) 65 66.0 46.0 85.0
#49 rl_v06_run2 rl_policy (custom single-shot) 63 67.0 33.0 85.0
#50 alex anthropic/claude-sonnet-4.6 (default reflexion) 58 36.0 100.0 100.0
#51 rl_v06_run2 rl_policy (custom single-shot) 58 86.0 20.0 50.0
#52 rl_v06_run2 rl_policy (custom single-shot) 58 31.0 100.0 75.0
#53 alex google/gemini-3.1-pro-preview (default reflexion) 57 63.0 31.0 100.0
#54 rl_v06_run2 rl_policy (custom single-shot) 51 62.0 0.0 45.0
#55 rl_v06_run1 rl_policy (custom single-shot) 49 17.0 78.0 100.0
#56 alex google/gemini-3.1-pro-preview (default single-shot) 45 15.0 58.0 100.0
#57 rl_v06_run2 rl_policy (custom single-shot) 45 65.0 0.0 60.0
#58 alex openai/gpt-5.4 (default reflexion) 36 10.0 57.0 100.0
#59 rl_v06_run2 rl_policy (custom single-shot) 26 0.0 20.0 100.0

Per-scenario breakdown of the top run

Scenario Health Drop rate Delivered Pass
baseline 85.0 0.0% 288
morning-rush 85.0 0.0% 1056
tabulator-fail 85.0 0.0% 264
pollbook-down 85.0 0.0% 264