chini-019-vaccine-rollout

County Vaccine Rollout

Cold chain from a -70C freezer to a 95-year-old's deltoid. Don't waste a single dose.

Source: Public health logistics, COVID vaccine distribution post-mortems, cold-chain pharmaceutical management

Prompt

Design a county-level vaccine distribution system from central storage to patient arms across 8 clinics.

Functional:
- Doses arrive in shipments at central freezer (-70C). Thawed doses have 6-hour viable window once moved to clinic refrigerators.
- Clinics submit daily forecasts; central scheduler ships doses to match. Patient appointments are pre-booked.
- Each clinic has 4 vaccinator stations, observation area (15-min post-shot wait), and a daily appointment cap.
- Walk-ins accepted only at end-of-day to use thawed doses that would otherwise expire.
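
The 6-hour window and the end-of-day walk-in rule can be sketched as a small bookkeeping model. This is only an illustration under stated assumptions: `ThawedBatch` and `walk_in_slots` are hypothetical names, not part of Chinilla or the chini-bench CLI, and the rule "walk-ins get whatever viable doses the remaining appointments do not need" is one reading of the prompt.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# From the prompt: thawed doses stay viable for 6 hours in a clinic refrigerator.
VIABLE_WINDOW = timedelta(hours=6)


@dataclass
class ThawedBatch:
    """Hypothetical record for doses moved from the freezer to a clinic."""
    doses: int
    thawed_at: datetime

    @property
    def expires_at(self) -> datetime:
        return self.thawed_at + VIABLE_WINDOW

    def is_viable(self, now: datetime) -> bool:
        # Doses are treated as unusable at or after the expiry instant.
        return now < self.expires_at


def walk_in_slots(batch: ThawedBatch, now: datetime, booked_remaining: int) -> int:
    """Viable doses left over after the remaining booked appointments (assumed rule)."""
    if not batch.is_viable(now):
        return 0
    return max(0, batch.doses - booked_remaining)
```

For example, a batch of 10 doses thawed at 10:00 with 7 appointments still booked leaves 3 walk-in slots at 15:00 and none from 16:00 onward.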

Non-functional:
- A high-priority surge (4x demand for an eligible age group) must NOT cause cold-chain violations. Shipment cadence adapts.
- If a clinic refrigerator fails, doses must be redistributed to nearby clinics within the 6-hour window, NOT used past expiration.
- If appointment no-show rate spikes, walk-in protocol must absorb to prevent waste WITHOUT skipping the eligibility check.
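
The refrigerator-failover requirement can be illustrated with a greedy redistribution plan. This is a rough sketch, not the benchmark's scoring logic: `plan_redistribution`, the `(name, travel_minutes, spare_capacity)` tuples, and the 30-minute administration margin are all assumptions made for the example. Doses that cannot reach a clinic in time are reported as unplaced rather than shipped past expiration.

```python
from datetime import datetime, timedelta

VIABLE_WINDOW = timedelta(hours=6)  # from the prompt


def plan_redistribution(doses, thawed_at, now, neighbors,
                        margin=timedelta(minutes=30)):
    """Assign doses from a failed clinic to nearby clinics, nearest first.

    neighbors: list of (name, travel_minutes, spare_capacity) tuples (hypothetical shape).
    margin: assumed minimum time needed to administer doses after arrival.
    Returns (plan, unplaced) where plan maps clinic name to doses sent.
    """
    deadline = thawed_at + VIABLE_WINDOW
    plan = {}
    remaining = doses
    for name, travel_minutes, spare in sorted(neighbors, key=lambda n: n[1]):
        if remaining == 0:
            break
        arrival = now + timedelta(minutes=travel_minutes)
        if arrival + margin > deadline:
            continue  # would arrive too late to be used inside the window
        sent = min(remaining, spare)
        if sent > 0:
            plan[name] = sent
            remaining -= sent
    return plan, remaining
```

With doses thawed at 08:00 and a failure at 12:00, a clinic 100 minutes away is skipped under the assumed margin, because the doses would arrive at 13:40 and could not be used before the 14:00 deadline.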

Return a Chinilla CanvasState. Components: freezer, scheduler, clinics, stations, walk-in protocol. Behaviors: queue (appointment book), batch (shipment cadence), ratelimit (daily caps), circuitbreaker (refrigerator failover), split (priority vs walk-in routing), filter (eligibility check).

Constraints

Max components: 13
Required behaviors: queue, ratelimit, circuitbreaker
Monthly budget: $250,000
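
A quick pre-submission check against these constraints might look like the sketch below. The component representation (a list of dicts with `name`, `behaviors`, and `monthly_cost` keys) is purely hypothetical; the actual CanvasState schema and the way the benchmark computes cost are not described here.

```python
MAX_COMPONENTS = 13
REQUIRED_BEHAVIORS = {"queue", "ratelimit", "circuitbreaker"}
MONTHLY_BUDGET = 250_000  # dollars


def check_constraints(components):
    """Return a list of human-readable constraint violations (empty if none).

    components: list of dicts with hypothetical keys
    'name', 'behaviors' (list of str) and 'monthly_cost' (number).
    """
    problems = []
    if len(components) > MAX_COMPONENTS:
        problems.append(f"too many components: {len(components)} > {MAX_COMPONENTS}")

    used = set()
    for component in components:
        used.update(component.get("behaviors", []))
    missing = REQUIRED_BEHAVIORS - used
    if missing:
        problems.append("missing required behaviors: " + ", ".join(sorted(missing)))

    total_cost = sum(component.get("monthly_cost", 0) for component in components)
    if total_cost > MONTHLY_BUDGET:
        problems.append(f"over budget: {total_cost} > {MONTHLY_BUDGET}")
    return problems
```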

Stress scenarios

- Steady week (baseline): Normal demand, clinics operating, cold chain intact.
- Eligibility expansion (spike): New age group eligible, demand 4x. Shipments must adapt without cold-chain violation.
- Clinic refrigerator fails (outage): One clinic loses cold storage. Doses must move within 6 hours or be wasted.
- Late shipment from freezer (latency): Central freezer shipment delayed. Clinic schedules must hold without dumping appointments.

Pass criteria (overall)

Min stability score: 60
Max drop rate: 8.0%
Min delivery rate: 88.0%
Max errors: 6
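
The four thresholds combine into a single pass/fail check. The sketch below assumes the comparisons are inclusive and that the inputs are already aggregated across scenarios; neither detail is stated on this page, and `passes` is a hypothetical helper, not a chini-bench function.

```python
def passes(stability: float, drop_rate_pct: float,
           delivery_rate_pct: float, errors: int) -> bool:
    """Apply the overall pass criteria (inclusive bounds are an assumption)."""
    return (
        stability >= 60            # min stability score
        and drop_rate_pct <= 8.0   # max drop rate, in percent
        and delivery_rate_pct >= 88.0  # min delivery rate, in percent
        and errors <= 6            # max errors
    )
```

Under these assumptions, a run with stability 86.0, a 0.0% drop rate, 100.0% delivery, and no errors would pass, while one with stability 59.0 would not.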

Submit your run

Submissions go through the chini-bench CLI. It calls your model with your key, scores the result locally, and posts to the leaderboard. Nothing leaves your machine except the canvas it produces.

End-to-end:
pip install git+https://github.com/collapseindex/chini-bench-cli.git
export OPENROUTER_API_KEY=...

chini-bench run chini-019-vaccine-rollout \
  --provider openrouter --model google/gemini-2.0-flash-001 \
  --as alice
Or inspect the prompt first:
chini-bench prompt chini-019-vaccine-rollout
Providers: openai · anthropic · google · openrouter · ollama

Leaderboard

Rank | Submitter | Model | Score | Stability | Delivery | Design | Pass
#1 | rl_v06_run2 | rl_policy (custom single-shot) | 89 | 86.0 | 100.0 | 75.0 |
#2 | rl_v06_run2 | rl_policy (custom single-shot) | 88 | 75.0 | 100.0 | 100.0 |
#3 | rl_v06_run1 | rl_policy (custom single-shot) | 86 | 84.0 | 97.0 | 75.0 |
#4 | rl_v06_run2 | rl_policy (custom single-shot) | 86 | 83.0 | 100.0 | 75.0 |
#5 | rl_v06_run2 | rl_policy (custom single-shot) | 86 | 72.0 | 100.0 | 100.0 |
#6 | rl_v06_run2 | rl_policy (custom single-shot) | 86 | 86.0 | 100.0 | 50.0 |
#7 | rl_v06_run2 | rl_policy (custom single-shot) | 86 | 71.0 | 100.0 | 100.0 |
#8 | alex | x-ai/grok-4.20 (default reflexion) | 85 | 69.0 | 100.0 | 100.0 |
#9 | rl_v06_run1 | rl_policy (custom single-shot) | 85 | 85.0 | 100.0 | 50.0 |
#10 | rl_v06_run1 | rl_policy (custom single-shot) | 83 | 74.0 | 87.0 | 85.0 |
#11 | rl_v06_run2 | rl_policy (custom single-shot) | 83 | 66.0 | 100.0 | 85.0 |
#12 | alex | google/gemini-3.1-pro-preview (default single-shot) | 82 | 64.0 | 100.0 | 100.0 |
#13 | rl_v06_run2 | rl_policy (custom single-shot) | 82 | 83.0 | 90.0 | 75.0 |
#14 | rl_v06_run1 | rl_policy (custom single-shot) | 81 | 83.0 | 83.0 | 75.0 |
#15 | rl_v06_run2 | rl_policy (custom single-shot) | 81 | 62.0 | 99.0 | 100.0 |
#16 | alex | google/gemini-3.1-pro-preview (default reflexion) | 80 | 69.0 | 84.0 | 100.0 |
#17 | rl_v06_run1 | rl_policy (custom single-shot) | 80 | 83.0 | 100.0 | 50.0 |
#18 | rl_v06_run2 | rl_policy (custom single-shot) | 80 | 79.0 | 67.0 | 100.0 |
#19 | rl_v06_run2 | rl_policy (custom single-shot) | 79 | 76.0 | 88.0 | 75.0 |
#20 | rl_v06_run2 | rl_policy (custom single-shot) | 79 | 65.0 | 100.0 | 85.0 |
#21 | rl_v06_run2 | rl_policy (custom single-shot) | 78 | 68.0 | 81.0 | 100.0 |
#22 | rl_v06_run2 | rl_policy (custom single-shot) | 75 | 64.0 | 100.0 | 75.0 |
#23 | rl_v06_run2 | rl_policy (custom single-shot) | 75 | 72.0 | 100.0 | 50.0 |
#24 | alex | openai/gpt-5.4 (default reflexion) | 74 | 59.0 | 100.0 | 100.0 |
#25 | rl_v06_run1 | rl_policy (custom single-shot) | 74 | 54.0 | 99.0 | 100.0 |
#26 | rl_v06_run1 | rl_policy (custom single-shot) | 74 | 79.0 | 92.0 | 50.0 |
#27 | rl_v06_run1 | rl_policy (custom single-shot) | 74 | 71.0 | 100.0 | 50.0 |
#28 | rl_v06_run2 | rl_policy (custom single-shot) | 74 | 62.0 | 75.0 | 85.0 |
#29 | rl_v06_run2 | rl_policy (custom single-shot) | 73 | 80.0 | 57.0 | 75.0 |
#30 | rl_v06_run1 | rl_policy (custom single-shot) | 72 | 70.0 | 100.0 | 50.0 |
#31 | rl_v06_run1 | rl_policy (custom single-shot) | 72 | 44.0 | 100.0 | 85.0 |
#32 | rl_v06_run2 | rl_policy (custom single-shot) | 71 | 49.0 | 100.0 | 75.0 |
#33 | rl_v06_run2 | rl_policy (custom single-shot) | 71 | 66.0 | 60.0 | 85.0 |
#34 | rl_v06_run2 | rl_policy (custom single-shot) | 70 | 57.0 | 73.0 | 100.0 |
#35 | rl_v06_run2 | rl_policy (custom single-shot) | 65 | 58.0 | 65.0 | 75.0 |
#36 | rl_v06_run2 | rl_policy (custom single-shot) | 65 | 53.0 | 60.0 | 100.0 |
#37 | rl_v06_run2 | rl_policy (custom single-shot) | 65 | 54.0 | 60.0 | 100.0 |
#38 | rl_v06_run2 | rl_policy (custom single-shot) | 65 | 66.0 | 53.0 | 60.0 |
#39 | alex | anthropic/claude-sonnet-4.6 (default single-shot) | 64 | 69.0 | 33.0 | 100.0 |
#40 | rl_v06_run2 | rl_policy (custom single-shot) | 62 | 69.0 | 25.0 | 100.0 |
#41 | rl_v06_run2 | rl_policy (custom single-shot) | 62 | 50.0 | 55.0 | 100.0 |
#42 | rl_v06_run2 | rl_policy (custom single-shot) | 62 | 66.0 | 31.0 | 100.0 |
#43 | rl_v06_run2 | rl_policy (custom single-shot) | 62 | 24.0 | 100.0 | 100.0 |
#44 | rl_v06_run2 | rl_policy (custom single-shot) | 59 | 46.0 | 52.0 | 100.0 |
#45 | rl_v06_run1 | rl_policy (custom single-shot) | 57 | 13.0 | 100.0 | 100.0 |
#46 | rl_v06_run2 | rl_policy (custom single-shot) | 57 | 22.0 | 100.0 | 100.0 |
#47 | rl_v06_run2 | rl_policy (custom single-shot) | 57 | 49.0 | 41.0 | 85.0 |
#48 | rl_v06_run2 | rl_policy (custom single-shot) | 55 | 52.0 | 31.0 | 100.0 |
#49 | rl_v06_run2 | rl_policy (custom single-shot) | 55 | 15.0 | 91.0 | 85.0 |
#50 | alex | openai/gpt-5.4 (default single-shot) | 50 | 0.0 | 100.0 | 100.0 |
#51 | rl_v06_run1 | rl_policy (custom single-shot) | 50 | 0.0 | 100.0 | 100.0 |
#52 | rl_v06_run2 | rl_policy (custom single-shot) | 50 | 0.0 | 100.0 | 100.0 |
#53 | rl_v06_run1 | rl_policy (custom single-shot) | 48 | 45.0 | 18.0 | 85.0 |
#54 | rl_v06_run1 | rl_policy (custom single-shot) | 46 | 0.0 | 100.0 | 75.0 |
#55 | rl_v06_run2 | rl_policy (custom single-shot) | 46 | 18.0 | 55.0 | 100.0 |
#56 | rl_v06_run2 | rl_policy (custom single-shot) | 45 | 4.0 | 75.0 | 100.0 |
#57 | rl_v06_run2 | rl_policy (custom single-shot) | 42 | 13.0 | 50.0 | 85.0 |
#58 | rl_v06_run1 | rl_policy (custom single-shot) | 39 | 0.0 | 62.0 | 100.0 |
#59 | rl_v06_run2 | rl_policy (custom single-shot) | 39 | 0.0 | 64.0 | 100.0 |
#60 | rl_v06_run2 | rl_policy (custom single-shot) | 35 | 0.0 | 50.0 | 100.0 |
#61 | alex | x-ai/grok-4.20 (default single-shot) | 29 | 0.0 | 31.0 | 100.0 |
#62 | alex | anthropic/claude-sonnet-4.6 (default reflexion) | 14 | 0.0 | 0.0 | 75.0 |

Per-scenario breakdown of the top run

Scenario | Health | Drop rate | Delivered | Pass
baseline | 86.0 | 0.0% | 96 |
demand-surge | 85.0 | 0.0% | 352 |
fridge-fail | 86.0 | 0.0% | 88 |
shipment-delay | 86.0 | 0.0% | 88 |