chini-019-vaccine-rollout

County Vaccine Rollout

Cold chain from a -70C freezer to a 95-year-old's deltoid. Don't waste a single dose.

Source: Public health logistics, COVID vaccine distribution post-mortems, cold-chain pharmaceutical management

Prompt

Design a county-level vaccine distribution system from central storage to patient arms across 8 clinics.

Functional:
- Doses arrive in shipments at the central freezer (-70C). Thawed doses have a 6-hour viable window once moved to clinic refrigerators.
- Clinics submit daily forecasts; the central scheduler ships doses to match. Patient appointments are pre-booked.
- Each clinic has 4 vaccinator stations, an observation area (15-min post-shot wait), and a daily appointment cap.
- Walk-ins are accepted only at end of day, to use thawed doses that would otherwise expire.
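One way to enforce the 6-hour window and the end-of-day walk-in rule is to track each thawed lot's expiry and release only doses that would otherwise be wasted. A minimal Python sketch: `ThawedLot`, `walk_in_allowance`, `admit_walk_in` and the age-threshold eligibility rule are hypothetical names and assumptions, not part of Chinilla or chini-bench.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# From the spec: thawed doses stay viable for 6 hours once in a clinic fridge.
VIABLE_WINDOW = timedelta(hours=6)


@dataclass
class ThawedLot:
    """Hypothetical record for doses moved into a clinic fridge together."""
    lot_id: str
    doses: int
    moved_at: datetime  # when the lot entered the clinic refrigerator

    @property
    def expires_at(self) -> datetime:
        return self.moved_at + VIABLE_WINDOW

    def usable_at(self, t: datetime) -> bool:
        return t < self.expires_at


def walk_in_allowance(lots, booked_remaining, now, closing_time):
    """Doses still viable now that expire by closing time and are not needed
    for the remaining booked appointments (simplification: booked patients
    are served from the soonest-expiring lots first)."""
    expiring = sum(
        lot.doses
        for lot in lots
        if lot.usable_at(now) and lot.expires_at <= closing_time
    )
    return max(0, expiring - booked_remaining)


def admit_walk_in(now, end_of_day_start, patient_age, min_eligible_age, allowance):
    if now < end_of_day_start:
        return False  # walk-ins are accepted only at end of day
    if patient_age < min_eligible_age:
        return False  # eligibility check is never skipped for surplus doses
    return allowance > 0
```

Booked patients are served first; the allowance only opens once the end-of-day window starts, so a no-show spike turns surplus doses into walk-in slots instead of waste.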

Non-functional:
- A high-priority surge (4x demand for an eligible age group) must NOT cause cold-chain violations; shipment cadence must adapt instead.
- If a clinic refrigerator fails, doses must be redistributed to nearby clinics within the 6-hour window, NOT used past expiration.
- If the appointment no-show rate spikes, the walk-in protocol must absorb the surplus doses to prevent waste WITHOUT skipping the eligibility check.
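The refrigerator-failover requirement combines a circuit breaker (stop sending doses to a failing fridge) with a redistribution step bounded by the remaining viable window. A minimal sketch under stated assumptions: the failure threshold, the manual reset, the greedy nearest-first placement and the 60-minute minimum use time at the receiving clinic are all hypothetical choices.

```python
from dataclasses import dataclass


@dataclass
class FridgeBreaker:
    """Illustrative circuit breaker for one clinic refrigerator."""
    failure_threshold: int = 3  # assumed: consecutive bad health checks before tripping
    failures: int = 0
    is_open: bool = False  # open = clinic takes no new shipments

    def record(self, healthy: bool) -> None:
        if healthy:
            self.failures = 0  # an open breaker stays open until reset()
        else:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.is_open = True

    def reset(self) -> None:
        # Assumed manual close after the fridge is repaired and verified.
        self.failures, self.is_open = 0, False

    def accepts_shipments(self) -> bool:
        return not self.is_open


def redistribute(doses, minutes_left, neighbors, min_use_minutes=60):
    """Greedy, nearest-first placement of a failed clinic's doses.

    neighbors: list of (clinic_name, travel_minutes, spare_capacity).
    A neighbor qualifies only if the doses arrive with at least
    min_use_minutes of viability left. Returns (plan, discarded): doses that
    cannot be placed are discarded, never administered past expiration.
    """
    plan, remaining = {}, doses
    for name, travel, spare in sorted(neighbors, key=lambda n: n[1]):
        if remaining == 0:
            break
        if minutes_left - travel < min_use_minutes:
            continue  # doses would arrive too close to expiry
        moved = min(remaining, spare)
        if moved > 0:
            plan[name] = moved
            remaining -= moved
    return plan, remaining
```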

Return a Chinilla CanvasState. Components: freezer, scheduler, clinics, stations, walk-in protocol. Behaviors: queue (appointment book), batch (shipment cadence), ratelimit (daily caps), circuitbreaker (refrigerator failover), split (priority vs walk-in routing), filter (eligibility check).
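The `queue` and `ratelimit` behaviors map naturally onto an appointment book capped per clinic per day. A minimal sketch, assuming a FIFO book and a fixed per-date quota; `DailyCap` and `book` are hypothetical names, not the Chinilla behavior implementations.

```python
from collections import defaultdict, deque
from datetime import date


class DailyCap:
    """Fixed-window limit: at most `cap` appointments on any single date."""

    def __init__(self, cap: int):
        self.cap = cap
        self.booked = defaultdict(int)  # date -> appointments already granted

    def try_acquire(self, day: date) -> bool:
        if self.booked[day] >= self.cap:
            return False
        self.booked[day] += 1
        return True


def book(appointments: deque, cap: DailyCap, patient_id: str, day: date) -> bool:
    """Enqueue a booking only if the clinic's cap for that date allows it."""
    if not cap.try_acquire(day):
        return False  # caller offers another date or clinic instead
    appointments.append((day, patient_id))
    return True
```

Keying the quota by appointment date, rather than resetting a counter at midnight, keeps bookings made days in advance from overrunning a single day's cap.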

Constraints

Max components: 13
Required behaviors: queue, ratelimit, circuitbreaker
Monthly budget: $250000
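Before submitting, the three hard constraints can be checked mechanically. A sketch assuming a simplified canvas shape (a `components` list whose entries carry `behavior` and `monthly_cost`); the real Chinilla CanvasState schema is not documented here, so these field names are hypothetical.

```python
MAX_COMPONENTS = 13
REQUIRED_BEHAVIORS = {"queue", "ratelimit", "circuitbreaker"}
MONTHLY_BUDGET = 250_000


def check_constraints(canvas: dict) -> list[str]:
    """Return a list of constraint violations (empty means all satisfied)."""
    comps = canvas.get("components", [])  # hypothetical field name
    problems = []
    if len(comps) > MAX_COMPONENTS:
        problems.append(f"too many components: {len(comps)} > {MAX_COMPONENTS}")
    used = {c.get("behavior") for c in comps}
    missing = REQUIRED_BEHAVIORS - used
    if missing:
        problems.append("missing behaviors: " + ", ".join(sorted(missing)))
    cost = sum(c.get("monthly_cost", 0) for c in comps)
    if cost > MONTHLY_BUDGET:
        problems.append(f"over budget: ${cost:,} > ${MONTHLY_BUDGET:,}")
    return problems
```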

Stress scenarios

Steady week

baseline

Normal demand, clinics operating, cold chain intact.

Eligibility expansion

spike

A new age group becomes eligible and demand rises 4x. Shipments must adapt without a cold-chain violation.

Clinic refrigerator fails

outage

One clinic loses cold storage. Doses must move within 6 hours or be wasted.

Late shipment from freezer

latency

Central freezer shipment delayed. Clinic schedules must hold without dumping appointments.
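For the eligibility-expansion scenario, one way to make shipment cadence adapt is to cap each shipment at what a clinic can administer inside the 6-hour window and ship more often as demand grows, rather than shipping larger lots. A sketch: the 4 stations and 6-hour window come from the prompt; the per-station throughput, opening hours and even split across shipments are assumptions.

```python
import math

VIABLE_HOURS = 6         # from the prompt
STATIONS_PER_CLINIC = 4  # from the prompt


def shipment_plan(daily_forecast, shots_per_station_hour, open_hours):
    """Return (shipments_per_day, doses_per_shipment) for one clinic."""
    # Largest lot the clinic can administer before it expires.
    window_hours = min(VIABLE_HOURS, open_hours)
    max_lot = STATIONS_PER_CLINIC * shots_per_station_hour * window_hours
    # Daily throughput ceiling: demand above this waits in the queue, unshipped.
    daily_capacity = STATIONS_PER_CLINIC * shots_per_station_hour * open_hours
    doses = min(daily_forecast, daily_capacity)
    if doses == 0:
        return 0, 0
    shipments = math.ceil(doses / max_lot)
    return shipments, math.ceil(doses / shipments)
```

At an assumed 10 shots per station-hour over a 12-hour day, a 4x jump from 100 to 400 daily doses moves the clinic from one shipment of 100 to two of 200, so no lot holds more doses than can be used inside its window.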

Pass criteria (overall)

Min stability score: 60
Max drop rate: 8.0%
Min delivery rate: 88.0%
Max errors: 6
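The overall pass test reads as a conjunction of four thresholds. A sketch assuming inclusive bounds and that the scorer exposes these four metrics directly; how per-scenario results are aggregated into them is not specified here, so `RunMetrics` is a hypothetical container.

```python
from dataclasses import dataclass


@dataclass
class RunMetrics:
    stability: float      # stability score, 0-100
    drop_rate: float      # percent of requests dropped
    delivery_rate: float  # percent of requests delivered
    errors: int


def passes(m: RunMetrics) -> bool:
    # All four thresholds from "Pass criteria (overall)" must hold at once.
    return (
        m.stability >= 60
        and m.drop_rate <= 8.0
        and m.delivery_rate >= 88.0
        and m.errors <= 6
    )
```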

Submit your run

Submissions go through the chini-bench CLI. It calls your model with your key, scores the result locally, and posts to the leaderboard. Nothing leaves your machine except the canvas it produces.

End-to-end:

pip install git+https://github.com/collapseindex/chini-bench-cli.git
export OPENROUTER_API_KEY=...

chini-bench run chini-019-vaccine-rollout \
  --provider openrouter --model google/gemini-2.0-flash-001 \
  --as alice

Or inspect the prompt first:

chini-bench prompt chini-019-vaccine-rollout

Providers: openai · anthropic · google · openrouter · ollama

Leaderboard

Rank	Submitter	Model	Score	Stability	Delivery	Design	Pass
#1	rl_v06_run2	rl_policy custom single-shot	89	86.0	100.0	75.0	✗
#2	rl_v06_run2	rl_policy custom single-shot	88	75.0	100.0	100.0	✓
#3	rl_v06_run1	rl_policy custom single-shot	86	84.0	97.0	75.0	✗
#4	rl_v06_run2	rl_policy custom single-shot	86	83.0	100.0	75.0	✗
#5	rl_v06_run2	rl_policy custom single-shot	86	72.0	100.0	100.0	✓
#6	rl_v06_run2	rl_policy custom single-shot	86	86.0	100.0	50.0	✗
#7	rl_v06_run2	rl_policy custom single-shot	86	71.0	100.0	100.0	✗
#8	alex	x-ai/grok-4.20 default reflexion	85	69.0	100.0	100.0	✓
#9	rl_v06_run1	rl_policy custom single-shot	85	85.0	100.0	50.0	✗
#10	rl_v06_run1	rl_policy custom single-shot	83	74.0	87.0	85.0	✓
#11	rl_v06_run2	rl_policy custom single-shot	83	66.0	100.0	85.0	✓
#12	alex	google/gemini-3.1-pro-preview default single-shot	82	64.0	100.0	100.0	✗
#13	rl_v06_run2	rl_policy custom single-shot	82	83.0	90.0	75.0	✗
#14	rl_v06_run1	rl_policy custom single-shot	81	83.0	83.0	75.0	✗
#15	rl_v06_run2	rl_policy custom single-shot	81	62.0	99.0	100.0	✗
#16	alex	google/gemini-3.1-pro-preview default reflexion	80	69.0	84.0	100.0	✗
#17	rl_v06_run1	rl_policy custom single-shot	80	83.0	100.0	50.0	✗
#18	rl_v06_run2	rl_policy custom single-shot	80	79.0	67.0	100.0	✗
#19	rl_v06_run2	rl_policy custom single-shot	79	76.0	88.0	75.0	✗
#20	rl_v06_run2	rl_policy custom single-shot	79	65.0	100.0	85.0	✗
#21	rl_v06_run2	rl_policy custom single-shot	78	68.0	81.0	100.0	✗
#22	rl_v06_run2	rl_policy custom single-shot	75	64.0	100.0	75.0	✗
#23	rl_v06_run2	rl_policy custom single-shot	75	72.0	100.0	50.0	✗
#24	alex	openai/gpt-5.4 default reflexion	74	59.0	100.0	100.0	✗
#25	rl_v06_run1	rl_policy custom single-shot	74	54.0	99.0	100.0	✗
#26	rl_v06_run1	rl_policy custom single-shot	74	79.0	92.0	50.0	✗
#27	rl_v06_run1	rl_policy custom single-shot	74	71.0	100.0	50.0	✗
#28	rl_v06_run2	rl_policy custom single-shot	74	62.0	75.0	85.0	✗
#29	rl_v06_run2	rl_policy custom single-shot	73	80.0	57.0	75.0	✗
#30	rl_v06_run1	rl_policy custom single-shot	72	70.0	100.0	50.0	✗
#31	rl_v06_run1	rl_policy custom single-shot	72	44.0	100.0	85.0	✗
#32	rl_v06_run2	rl_policy custom single-shot	71	49.0	100.0	75.0	✗
#33	rl_v06_run2	rl_policy custom single-shot	71	66.0	60.0	85.0	✗
#34	rl_v06_run2	rl_policy custom single-shot	70	57.0	73.0	100.0	✗
#35	rl_v06_run2	rl_policy custom single-shot	65	58.0	65.0	75.0	✗
#36	rl_v06_run2	rl_policy custom single-shot	65	53.0	60.0	100.0	✗
#37	rl_v06_run2	rl_policy custom single-shot	65	54.0	60.0	100.0	✗
#38	rl_v06_run2	rl_policy custom single-shot	65	66.0	53.0	60.0	✗
#39	alex	anthropic/claude-sonnet-4.6 default single-shot	64	69.0	33.0	100.0	✗
#40	rl_v06_run2	rl_policy custom single-shot	62	69.0	25.0	100.0	✗
#41	rl_v06_run2	rl_policy custom single-shot	62	50.0	55.0	100.0	✗
#42	rl_v06_run2	rl_policy custom single-shot	62	66.0	31.0	100.0	✗
#43	rl_v06_run2	rl_policy custom single-shot	62	24.0	100.0	100.0	✗
#44	rl_v06_run2	rl_policy custom single-shot	59	46.0	52.0	100.0	✗
#45	rl_v06_run1	rl_policy custom single-shot	57	13.0	100.0	100.0	✗
#46	rl_v06_run2	rl_policy custom single-shot	57	22.0	100.0	100.0	✗
#47	rl_v06_run2	rl_policy custom single-shot	57	49.0	41.0	85.0	✗
#48	rl_v06_run2	rl_policy custom single-shot	55	52.0	31.0	100.0	✗
#49	rl_v06_run2	rl_policy custom single-shot	55	15.0	91.0	85.0	✗
#50	alex	openai/gpt-5.4 default single-shot	50	0.0	100.0	100.0	✗
#51	rl_v06_run1	rl_policy custom single-shot	50	0.0	100.0	100.0	✗
#52	rl_v06_run2	rl_policy custom single-shot	50	0.0	100.0	100.0	✗
#53	rl_v06_run1	rl_policy custom single-shot	48	45.0	18.0	85.0	✗
#54	rl_v06_run1	rl_policy custom single-shot	46	0.0	100.0	75.0	✗
#55	rl_v06_run2	rl_policy custom single-shot	46	18.0	55.0	100.0	✗
#56	rl_v06_run2	rl_policy custom single-shot	45	4.0	75.0	100.0	✗
#57	rl_v06_run2	rl_policy custom single-shot	42	13.0	50.0	85.0	✗
#58	rl_v06_run1	rl_policy custom single-shot	39	0.0	62.0	100.0	✗
#59	rl_v06_run2	rl_policy custom single-shot	39	0.0	64.0	100.0	✗
#60	rl_v06_run2	rl_policy custom single-shot	35	0.0	50.0	100.0	✗
#61	alex	x-ai/grok-4.20 default single-shot	29	0.0	31.0	100.0	✗
#62	alex	anthropic/claude-sonnet-4.6 default reflexion	14	0.0	0.0	75.0	✗

Per-scenario breakdown of the top run

Scenario	Health	Drop rate	Delivered	Pass
baseline	86.0	0.0%	96	✓
demand-surge	85.0	0.0%	352	✓
fridge-fail	86.0	0.0%	88	✓
shipment-delay	86.0	0.0%	88	✓
