chini-018-polling-station

Election Day Polling Station

One precinct, eight booths, three machines, a thousand voters, and the ballot-paper printer just jammed.

Source: Election administration literature, voter wait-time research, post-2020 polling reform reports

Prompt

Design the voter flow for a single polling precinct on a presidential election day.

Functional:
- Voter arrives and checks in at one of 4 poll-book stations (ID + signature), then is issued a ballot.
- Voter takes ballot to one of 8 privacy booths to mark.
- Voter feeds marked ballot into one of 3 scanner/tabulator machines.
- Provisional ballots (registration mismatch) are routed to a separate provisional table and sealed in an envelope; they are NOT scanned.
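The split in this flow happens at check-in: a registration mismatch sends the voter to the provisional table instead of down the booth-and-tabulator path. Below is a minimal Python sketch of that routing. `Voter`, `registration_matches` and the station names are hypothetical labels for illustration, not part of the Chinilla format.

```python
from dataclasses import dataclass


@dataclass
class Voter:
    voter_id: int
    # Hypothetical flag: True when the poll book matches the voter's registration.
    registration_matches: bool


def route(voter: Voter) -> list[str]:
    """Return the ordered stations a voter visits after the regular/provisional split."""
    path = ["poll_book"]  # every voter checks in first (ID + signature)
    if voter.registration_matches:
        # Regular ballot: mark it in a privacy booth, then feed it to a tabulator.
        path += ["booth", "tabulator"]
    else:
        # Registration mismatch: the provisional table seals the ballot in an
        # envelope, and it never reaches a tabulator.
        path.append("provisional_table")
    return path


print(route(Voter(1, registration_matches=True)))   # ['poll_book', 'booth', 'tabulator']
print(route(Voter(2, registration_matches=False)))  # ['poll_book', 'provisional_table']
```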

Non-functional:
- A morning rush (4x baseline arrival) must NOT push the average wait above 30 minutes; booths and scanners must absorb the surge.
- If a tabulator fails, ballots must be securely stored in the emergency-ballot bin for later scanning, NOT discarded or rerouted insecurely.
- If the poll-book network goes down, check-in must continue on the paper backup, with reconciliation afterward. The line cannot stop.
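Whether 4x arrivals stay under 30 minutes depends on service times, which the prompt does not give. One way to sanity-check a design is to treat the poll books, booths and tabulators as M/M/c queues in series and add up their Erlang C queueing delays. This is a swapped-in steady-state approximation, not the benchmark's scorer. Every rate below is an assumption; only the station counts come from the prompt.

```python
import math


def erlang_c_wait(arrival_rate: float, service_rate: float, servers: int) -> float:
    """Mean queueing delay Wq of an M/M/c station (same time unit as the rates).

    Returns math.inf when utilization >= 1, i.e. the line grows without bound.
    """
    offered = arrival_rate / service_rate  # offered load a = lambda / mu (Erlangs)
    rho = offered / servers                # per-server utilization
    if rho >= 1:
        return math.inf
    # Erlang C: probability that an arriving voter has to wait at all.
    tail = offered**servers / math.factorial(servers) / (1 - rho)
    head = sum(offered**k / math.factorial(k) for k in range(servers))
    p_wait = tail / (head + tail)
    return p_wait / (servers * service_rate - arrival_rate)


# Hypothetical numbers for illustration only; the prompt gives station counts, not rates.
BASELINE_PER_MIN = 0.4
STATIONS = {  # name: (servers from the prompt, assumed mean service time in minutes)
    "poll_book": (4, 2.0),
    "booth": (8, 4.0),
    "tabulator": (3, 0.5),
}

for multiplier in (1, 4):  # baseline vs. the 4x morning rush
    lam = BASELINE_PER_MIN * multiplier
    wait = sum(erlang_c_wait(lam, 1 / minutes, c) for c, minutes in STATIONS.values())
    print(f"{multiplier}x arrivals: mean time in line {wait:.1f} min (limit 30)")
```

With these made-up rates the 4x rush keeps every station below full utilization. Raise `BASELINE_PER_MIN` to 0.5 and the 4x rate reaches 2 voters per minute. That is exactly the capacity of 4 poll books at 2 minutes each (and of 8 booths at 4 minutes each), so `erlang_c_wait` returns `inf`. A two-hour rush is a transient, so a simulation would track the actual build-up more faithfully. This sketch also ignores the provisional voters who leave the booth path.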

Return a Chinilla CanvasState. Components: poll books, booths, tabulators, provisional table, emergency bin. Behaviors: queue (waiting line), split (regular vs provisional), ratelimit (booth turnover), circuitbreaker (network failover), storage (emergency bin).
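The circuitbreaker behavior is what keeps check-in moving when the poll-book network drops. Below is a minimal sketch of the classic closed/open/half-open breaker with a paper fallback and a reconciliation queue. It assumes a hypothetical `network_checkin(voter_id)` callable that raises `ConnectionError` on failure, and the threshold and cooldown defaults are arbitrary.

```python
import time
from collections import deque


class PollBookBreaker:
    """Hypothetical circuit breaker in front of the networked poll book.

    CLOSED: check-ins go over the network. After `failure_threshold` consecutive
    failures it OPENs and every check-in goes to paper. After `reset_after_s`
    it lets one trial call through (HALF-OPEN); success closes it again.
    """

    def __init__(self, failure_threshold=3, reset_after_s=60.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None     # None means CLOSED
        self.paper_log = deque()  # paper check-ins awaiting reconciliation

    def check_in(self, voter_id, network_checkin):
        if self.opened_at is not None and self.clock() - self.opened_at < self.reset_after_s:
            return self._paper(voter_id)  # OPEN: don't even try the network
        try:
            result = network_checkin(voter_id)  # CLOSED, or the HALF-OPEN trial call
        except ConnectionError:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # trip, or re-trip after a failed trial
            return self._paper(voter_id)  # the line never stops
        self.failures = 0
        self.opened_at = None  # success closes the breaker
        return result

    def _paper(self, voter_id):
        self.paper_log.append(voter_id)
        return "paper"

    def reconcile(self, network_checkin):
        """Replay paper check-ins once the network is back; returns how many synced."""
        synced = 0
        while self.paper_log:
            network_checkin(self.paper_log[0])  # an exception leaves the entry queued
            self.paper_log.popleft()
            synced += 1
        return synced
```

Injecting `clock` lets the cooldown be tested without sleeping. `reconcile` stops at the first failed replay and leaves the remaining paper entries queued, so nothing recorded on paper is lost.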

Constraints

Max components: 13
Required behaviors: queue, split, circuitbreaker
Monthly budget: $8000
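The prompt lists 4 poll-book stations, 8 booths and 3 tabulators, which is 15 stations. Modeling each one as its own component would therefore already break the 13-component cap, so station pools presumably have to be grouped. Below is a minimal pre-submission check, assuming a hypothetical `(name, behavior, monthly_cost)` tuple per component. That shape is a stand-in, not the Chinilla CanvasState schema, and the costs are placeholders, not real prices.

```python
REQUIRED_BEHAVIORS = {"queue", "split", "circuitbreaker"}
MAX_COMPONENTS = 13
MONTHLY_BUDGET = 8000  # dollars per month


def constraint_violations(components):
    """Return human-readable constraint violations (empty list means OK).

    components: list of (name, behavior, monthly_cost) tuples. This shape is a
    hypothetical stand-in, not the Chinilla CanvasState schema.
    """
    problems = []
    if len(components) > MAX_COMPONENTS:
        problems.append(f"{len(components)} components > {MAX_COMPONENTS}")
    missing = REQUIRED_BEHAVIORS - {behavior for _, behavior, _ in components}
    if missing:
        problems.append(f"missing behaviors: {sorted(missing)}")
    total_cost = sum(cost for _, _, cost in components)
    if total_cost > MONTHLY_BUDGET:
        problems.append(f"monthly cost ${total_cost} > ${MONTHLY_BUDGET}")
    return problems


# One component per station pool; costs are placeholders, not real prices.
design = [
    ("voter_line", "queue", 500),
    ("poll_books", "circuitbreaker", 1500),
    ("paper_backup", "storage", 200),
    ("ballot_router", "split", 300),
    ("booths", "ratelimit", 1600),
    ("tabulators", "queue", 2400),
    ("provisional_table", "storage", 400),
    ("emergency_bin", "storage", 300),
]
print(constraint_violations(design))  # []
```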

Stress scenarios

Steady turnout (baseline)

Normal voter arrival, all systems up.

Pre-work rush (spike)

4x arrival rate from 7 to 9 a.m. Wait times must hold.

Tabulator down (outage)

One scanner offline. Ballots must be routed to the emergency bin, not discarded.

Poll-book network outage (outage)

Check-in network down. Paper backup must keep the line moving.
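In the tabulator-down scenario, the property that matters is that every ballot ends up either scanned or held in the emergency bin. Below is a minimal sketch, assuming voters are directed to tabulators round-robin (the prompt does not say how machines are assigned). The class and method names are hypothetical.

```python
from collections import deque
from itertools import cycle


class TabulatorPool:
    """Hypothetical model: voters are sent to tabulators round-robin; a ballot
    whose tabulator is down goes into the emergency bin instead of being
    discarded or carried elsewhere."""

    def __init__(self, names):
        self.up = {name: True for name in names}
        self.scanned = {name: [] for name in names}
        self.emergency_bin = deque()  # storage behavior: held for later scanning
        self._next = cycle(names)

    def feed(self, ballot):
        name = next(self._next)
        if self.up[name]:
            self.scanned[name].append(ballot)
            return name
        self.emergency_bin.append(ballot)  # failed machine: secure the ballot
        return "emergency_bin"

    def scan_emergency_bin(self, name):
        """Once `name` is working again, scan held ballots in the order received."""
        if not self.up[name]:
            raise RuntimeError(f"{name} is still down")
        while self.emergency_bin:
            self.scanned[name].append(self.emergency_bin.popleft())


pool = TabulatorPool(["t1", "t2", "t3"])
pool.up["t2"] = False  # the tabulator-down scenario
print([pool.feed(ballot) for ballot in range(6)])
# ['t1', 'emergency_bin', 't3', 't1', 'emergency_bin', 't3']
pool.up["t2"] = True
pool.scan_emergency_bin("t2")
print(pool.scanned["t2"])  # [1, 4]
```

Sending a ballot to the bin rather than to another machine mirrors the prompt's rule against insecure rerouting; the invariant to check is that scanned ballots plus binned ballots always equal ballots fed.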

Pass criteria (overall)

Min stability score: 65
Max drop rate: 5.0%
Min delivery rate: 92.0%
Max errors: 5
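The overall pass criteria combine into a single gate. Below is a minimal sketch, assuming every threshold must hold for a run to pass and that the bounds are inclusive (a stability of exactly 65 passes). The `RunMetrics` fields are hypothetical names, not the CLI's output format.

```python
from dataclasses import dataclass


@dataclass
class RunMetrics:
    stability: float      # score, 0-100
    drop_rate: float      # percent
    delivery_rate: float  # percent
    errors: int


def failed_criteria(m: RunMetrics) -> list[str]:
    """Names of the overall pass criteria a run misses (bounds assumed inclusive)."""
    checks = {
        "stability": m.stability >= 65,
        "drop_rate": m.drop_rate <= 5.0,
        "delivery_rate": m.delivery_rate >= 92.0,
        "errors": m.errors <= 5,
    }
    return [name for name, ok in checks.items() if not ok]


print(failed_criteria(RunMetrics(stability=72.0, drop_rate=1.0, delivery_rate=91.0, errors=0)))
# ['delivery_rate']
```

The leaderboard shows stability and delivery but not drop rate or error count, so a row's ✗ cannot be fully explained from the table alone.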

Submit your run

Submissions go through the chini-bench CLI. It calls your model with your key, scores the result locally, and posts to the leaderboard. Nothing leaves your machine except the canvas it produces.

End-to-end:

pip install git+https://github.com/collapseindex/chini-bench-cli.git
export OPENROUTER_API_KEY=...

chini-bench run chini-018-polling-station \
  --provider openrouter --model google/gemini-2.0-flash-001 \
  --as alice

Or inspect the prompt first:

chini-bench prompt chini-018-polling-station

Providers: openai · anthropic · google · openrouter · ollama

Leaderboard

Rank	Submitter	Model	Score	Stability	Delivery	Design	Pass
#1	rl_v06_run2	rl_policy custom single-shot	89	85.0	100.0	75.0	✗
#2	rl_v06_run2	rl_policy custom single-shot	88	83.0	100.0	75.0	✗
#3	rl_v06_run2	rl_policy custom single-shot	86	86.0	100.0	50.0	✗
#4	rl_v06_run2	rl_policy custom single-shot	85	70.0	100.0	100.0	✗
#5	rl_v06_run2	rl_policy custom single-shot	84	83.0	88.0	75.0	✗
#6	rl_v06_run2	rl_policy custom single-shot	84	83.0	100.0	50.0	✗
#7	alex	anthropic/claude-sonnet-4.6 default single-shot	83	72.0	91.0	100.0	✗
#8	rl_v06_run2	rl_policy custom single-shot	83	80.0	100.0	50.0	✗
#9	rl_v06_run2	rl_policy custom single-shot	82	86.0	100.0	50.0	✗
#10	alex	x-ai/grok-4.20 default reflexion	81	84.0	75.0	100.0	✗
#11	rl_v06_run1	rl_policy custom single-shot	80	70.0	100.0	60.0	✗
#12	rl_v06_run1	rl_policy custom single-shot	80	74.0	96.0	100.0	✗
#13	rl_v06_run2	rl_policy custom single-shot	80	72.0	81.0	85.0	✗
#14	rl_v06_run1	rl_policy custom single-shot	79	57.0	100.0	100.0	✗
#15	rl_v06_run2	rl_policy custom single-shot	79	80.0	100.0	50.0	✗
#16	rl_v06_run2	rl_policy custom single-shot	79	81.0	100.0	50.0	✗
#17	alex	openai/gpt-5.4 default single-shot	78	68.0	94.0	75.0	✗
#18	rl_v06_run2	rl_policy custom single-shot	78	81.0	59.0	85.0	✗
#19	rl_v06_run2	rl_policy custom single-shot	78	78.0	100.0	50.0	✗
#20	rl_v06_run2	rl_policy custom single-shot	78	58.0	96.0	100.0	✗
#21	rl_v06_run1	rl_policy custom single-shot	77	68.0	100.0	60.0	✗
#22	rl_v06_run2	rl_policy custom single-shot	77	84.0	63.0	85.0	✗
#23	rl_v06_run2	rl_policy custom single-shot	77	78.0	85.0	75.0	✗
#24	rl_v06_run2	rl_policy custom single-shot	76	80.0	100.0	50.0	✗
#25	rl_v06_run2	rl_policy custom single-shot	76	70.0	69.0	85.0	✗
#26	rl_v06_run1	rl_policy custom single-shot	75	68.0	100.0	75.0	✗
#27	rl_v06_run2	rl_policy custom single-shot	75	59.0	100.0	85.0	✗
#28	rl_v06_run2	rl_policy custom single-shot	75	64.0	100.0	75.0	✗
#29	alex	x-ai/grok-4.20 default single-shot	74	70.0	63.0	100.0	✗
#30	rl_v06_run2	rl_policy custom single-shot	74	71.0	100.0	50.0	✗
#31	rl_v06_run1	rl_policy custom single-shot	73	69.0	100.0	50.0	✗
#32	rl_v06_run2	rl_policy custom single-shot	73	68.0	100.0	50.0	✗
#33	rl_v06_run2	rl_policy custom single-shot	73	63.0	75.0	85.0	✗
#34	rl_v06_run2	rl_policy custom single-shot	73	72.0	68.0	100.0	✗
#35	rl_v06_run2	rl_policy custom single-shot	73	68.0	100.0	50.0	✗
#36	rl_v06_run1	rl_policy custom single-shot	72	58.0	82.0	85.0	✗
#37	rl_v06_run1	rl_policy custom single-shot	72	76.0	47.0	100.0	✗
#38	rl_v06_run2	rl_policy custom single-shot	71	55.0	97.0	85.0	✗
#39	rl_v06_run2	rl_policy custom single-shot	71	65.0	61.0	85.0	✗
#40	rl_v06_run1	rl_policy custom single-shot	70	74.0	82.0	50.0	✗
#41	rl_v06_run1	rl_policy custom single-shot	70	82.0	29.0	100.0	✗
#42	rl_v06_run1	rl_policy custom single-shot	70	70.0	100.0	50.0	✗
#43	rl_v06_run2	rl_policy custom single-shot	70	78.0	75.0	50.0	✗
#44	rl_v06_run2	rl_policy custom single-shot	70	60.0	85.0	100.0	✗
#45	rl_v06_run1	rl_policy custom single-shot	69	37.0	100.0	85.0	✗
#46	rl_v06_run1	rl_policy custom single-shot	68	65.0	100.0	50.0	✗
#47	rl_v06_run2	rl_policy custom single-shot	68	65.0	100.0	50.0	✗
#48	rl_v06_run2	rl_policy custom single-shot	65	66.0	46.0	85.0	✗
#49	rl_v06_run2	rl_policy custom single-shot	63	67.0	33.0	85.0	✗
#50	alex	anthropic/claude-sonnet-4.6 default reflexion	58	36.0	100.0	100.0	✗
#51	rl_v06_run2	rl_policy custom single-shot	58	86.0	20.0	50.0	✗
#52	rl_v06_run2	rl_policy custom single-shot	58	31.0	100.0	75.0	✗
#53	alex	google/gemini-3.1-pro-preview default reflexion	57	63.0	31.0	100.0	✗
#54	rl_v06_run2	rl_policy custom single-shot	51	62.0	0.0	45.0	✗
#55	rl_v06_run1	rl_policy custom single-shot	49	17.0	78.0	100.0	✗
#56	alex	google/gemini-3.1-pro-preview default single-shot	45	15.0	58.0	100.0	✗
#57	rl_v06_run2	rl_policy custom single-shot	45	65.0	0.0	60.0	✗
#58	alex	openai/gpt-5.4 default reflexion	36	10.0	57.0	100.0	✗
#59	rl_v06_run2	rl_policy custom single-shot	26	0.0	20.0	100.0	✗

Per-scenario breakdown of the top run

Scenario	Health	Drop rate	Delivered	Pass
baseline	85.0	0.0%	288	✓
morning-rush	85.0	0.0%	1056	✓
tabulator-fail	85.0	0.0%	264	✓
pollbook-down	85.0	0.0%	264	✓
