chini-019-vaccine-rollout
County Vaccine Rollout
Cold chain from a -70C freezer to a 95-year-old's deltoid. Don't waste a single dose.
Source: Public health logistics, COVID vaccine distribution post-mortems, cold-chain pharmaceutical management
Prompt
Design a county-level vaccine distribution system from central storage to patient arms across 8 clinics.
Functional:
- Doses arrive in shipments at central freezer (-70C). Thawed doses have 6-hour viable window once moved to clinic refrigerators.
- Clinics submit daily forecasts; central scheduler ships doses to match. Patient appointments are pre-booked.
- Each clinic has 4 vaccinator stations, observation area (15-min post-shot wait), and a daily appointment cap.
- Walk-ins accepted only at end-of-day to use thawed doses that would otherwise expire.
Non-functional:
- A high-priority surge (4x demand for an eligible age group) must NOT cause cold-chain violations. Shipment cadence adapts.
- If a clinic refrigerator fails, doses must be redistributed to nearby clinics within the 6-hour window, NOT used past expiration.
- If appointment no-show rate spikes, walk-in protocol must absorb to prevent waste WITHOUT skipping the eligibility check.
Return a Chinilla CanvasState.
Components: freezer, scheduler, clinics, stations, walk-in protocol.
Behaviors: queue (appointment book), batch (shipment cadence), ratelimit (daily caps), circuitbreaker (refrigerator failover), split (priority vs walk-in routing), filter (eligibility check).
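The 6-hour thaw window and the end-of-day walk-in rule in the prompt can be sketched in a few lines of Python. This is only an illustration, not part of the benchmark: the `Dose` class, `spare_doses`, and `admit_walk_in` are hypothetical names, and reserving the earliest-expiring doses for booked appointments is an assumed policy.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

THAW_WINDOW = timedelta(hours=6)  # viable window stated in the prompt


@dataclass
class Dose:
    dose_id: str
    thawed_at: datetime  # when the dose entered the clinic refrigerator

    def expires_at(self) -> datetime:
        return self.thawed_at + THAW_WINDOW

    def is_viable(self, now: datetime) -> bool:
        return now < self.expires_at()


def spare_doses(doses: List[Dose], booked_remaining: int, now: datetime) -> List[Dose]:
    """Viable doses left after reserving one per remaining appointment.

    Assumed policy: the earliest-expiring doses go to booked patients first.
    """
    viable = sorted((d for d in doses if d.is_viable(now)),
                    key=lambda d: d.expires_at())
    return viable[booked_remaining:]


def admit_walk_in(is_eligible: bool, spare: List[Dose]) -> Optional[Dose]:
    """Release a spare dose only when the eligibility check passes."""
    if not is_eligible or not spare:
        return None
    return spare.pop(0)
```

The eligibility check sits in front of the dose release, so a spike in no-shows increases the number of spare doses without ever bypassing the filter.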
Constraints
- Max components: 13
- Required behaviors: queue, ratelimit, circuitbreaker
- Monthly budget: $250,000
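A pre-submission check against these three constraints might look like the sketch below. The CanvasState schema is not described on this page, so the dictionary layout used here (a `components` list whose items carry `behaviors` and `monthly_cost` fields) is purely an assumption for illustration, as is the function name.

```python
MAX_COMPONENTS = 13
REQUIRED_BEHAVIORS = {"queue", "ratelimit", "circuitbreaker"}
MONTHLY_BUDGET = 250_000  # dollars


def constraint_violations(canvas: dict) -> list:
    """List constraint problems for a canvas in the ASSUMED dict layout."""
    components = canvas.get("components", [])
    problems = []

    if len(components) > MAX_COMPONENTS:
        problems.append(
            f"{len(components)} components exceeds the cap of {MAX_COMPONENTS}"
        )

    # Collect every behavior used anywhere on the canvas.
    used = {b for c in components for b in c.get("behaviors", [])}
    missing = REQUIRED_BEHAVIORS - used
    if missing:
        problems.append("missing required behaviors: " + ", ".join(sorted(missing)))

    # 'monthly_cost' is a hypothetical per-component field.
    cost = sum(c.get("monthly_cost", 0) for c in components)
    if cost > MONTHLY_BUDGET:
        problems.append(f"monthly cost {cost} exceeds budget {MONTHLY_BUDGET}")

    return problems
```

An empty list means the canvas satisfies the stated constraints under these assumptions; the real scorer may model cost and behaviors differently.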
Stress scenarios
- Steady week (baseline): Normal demand, clinics operating, cold chain intact.
- Eligibility expansion (spike): New age group eligible, demand 4x. Shipments must adapt without cold-chain violation.
- Clinic refrigerator fails (outage): One clinic loses cold storage. Doses must move within 6 hours or be wasted.
- Late shipment from freezer (latency): Central freezer shipment delayed. Clinic schedules must hold without dumping appointments.
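For the refrigerator-failure scenario, one way to reason about redistribution is a greedy plan: send each dose to the nearest healthy clinic it can reach before its window closes, and count the rest as wasted rather than using them late. The sketch below assumes hypothetical `Clinic` fields (`spare_capacity`, `travel_minutes`) and a hypothetical `redistribute` function; none of these come from the benchmark.

```python
from dataclasses import dataclass


@dataclass
class Clinic:
    name: str
    fridge_ok: bool       # False models a tripped breaker on that fridge
    spare_capacity: int   # doses the clinic can still absorb today
    travel_minutes: int   # assumed transfer time from the failed clinic


def redistribute(minutes_left, neighbours):
    """Greedy transfer plan for doses stranded by a fridge failure.

    minutes_left: remaining viable minutes for each stranded dose.
    Returns (plan, wasted), where plan maps clinic name to dose count.
    """
    healthy = sorted((c for c in neighbours if c.fridge_ok),
                     key=lambda c: c.travel_minutes)
    capacity = {c.name: c.spare_capacity for c in healthy}
    plan = {}
    wasted = 0

    # Most urgent doses pick first, so they get the closest clinics.
    for remaining in sorted(minutes_left):
        for clinic in healthy:
            if capacity[clinic.name] > 0 and clinic.travel_minutes < remaining:
                capacity[clinic.name] -= 1
                plan[clinic.name] = plan.get(clinic.name, 0) + 1
                break
        else:
            wasted += 1  # cannot arrive in time: discard, never use late

    return plan, wasted
```

Clinics whose own refrigerator is down are excluded up front, which is the role the prompt assigns to the circuitbreaker behavior.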
Pass criteria (overall)
- Min stability score: 60
- Max drop rate: 8.0%
- Min delivery rate: 88.0%
- Max errors: 6
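The four overall thresholds can be expressed as a simple check. This is a sketch only: `RunSummary` and `failed_criteria` are hypothetical names, treating each threshold as inclusive is an assumption, and the actual scorer may apply further checks (for example per-scenario results or the design constraints) that are not modeled here.

```python
from dataclasses import dataclass


@dataclass
class RunSummary:
    stability: float          # overall stability score
    drop_rate_pct: float      # percent of requests dropped
    delivery_rate_pct: float  # percent of requests delivered
    errors: int


def failed_criteria(run: RunSummary) -> list:
    """Return the names of the overall pass criteria that the run misses."""
    failures = []
    if run.stability < 60:
        failures.append("min stability score")
    if run.drop_rate_pct > 8.0:
        failures.append("max drop rate")
    if run.delivery_rate_pct < 88.0:
        failures.append("min delivery rate")
    if run.errors > 6:
        failures.append("max errors")
    return failures
```

A run with stability 75, no drops, 100% delivery, and no errors returns an empty list under this check.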
Submit your run
Submissions go through the chini-bench CLI. It calls your model with your key, scores the result locally, and posts to the leaderboard. Nothing leaves your machine except the canvas it produces.
End-to-end:
pip install git+https://github.com/collapseindex/chini-bench-cli.git
export OPENROUTER_API_KEY=...
chini-bench run chini-019-vaccine-rollout \
  --provider openrouter --model google/gemini-2.0-flash-001 \
  --as alice
Or inspect the prompt first:
chini-bench prompt chini-019-vaccine-rollout
Providers: openai · anthropic · google · openrouter · ollama
Leaderboard
| Rank | Submitter | Model | Score | Stability | Delivery | Design | Pass |
|---|---|---|---|---|---|---|---|
| #1 | rl_v06_run2 | rl_policy custom single-shot | 89 | 86.0 | 100.0 | 75.0 | ✗ |
| #2 | rl_v06_run2 | rl_policy custom single-shot | 88 | 75.0 | 100.0 | 100.0 | ✓ |
| #3 | rl_v06_run1 | rl_policy custom single-shot | 86 | 84.0 | 97.0 | 75.0 | ✗ |
| #4 | rl_v06_run2 | rl_policy custom single-shot | 86 | 83.0 | 100.0 | 75.0 | ✗ |
| #5 | rl_v06_run2 | rl_policy custom single-shot | 86 | 72.0 | 100.0 | 100.0 | ✓ |
| #6 | rl_v06_run2 | rl_policy custom single-shot | 86 | 86.0 | 100.0 | 50.0 | ✗ |
| #7 | rl_v06_run2 | rl_policy custom single-shot | 86 | 71.0 | 100.0 | 100.0 | ✗ |
| #8 | alex | x-ai/grok-4.20 default reflexion | 85 | 69.0 | 100.0 | 100.0 | ✓ |
| #9 | rl_v06_run1 | rl_policy custom single-shot | 85 | 85.0 | 100.0 | 50.0 | ✗ |
| #10 | rl_v06_run1 | rl_policy custom single-shot | 83 | 74.0 | 87.0 | 85.0 | ✓ |
| #11 | rl_v06_run2 | rl_policy custom single-shot | 83 | 66.0 | 100.0 | 85.0 | ✓ |
| #12 | alex | google/gemini-3.1-pro-preview default single-shot | 82 | 64.0 | 100.0 | 100.0 | ✗ |
| #13 | rl_v06_run2 | rl_policy custom single-shot | 82 | 83.0 | 90.0 | 75.0 | ✗ |
| #14 | rl_v06_run1 | rl_policy custom single-shot | 81 | 83.0 | 83.0 | 75.0 | ✗ |
| #15 | rl_v06_run2 | rl_policy custom single-shot | 81 | 62.0 | 99.0 | 100.0 | ✗ |
| #16 | alex | google/gemini-3.1-pro-preview default reflexion | 80 | 69.0 | 84.0 | 100.0 | ✗ |
| #17 | rl_v06_run1 | rl_policy custom single-shot | 80 | 83.0 | 100.0 | 50.0 | ✗ |
| #18 | rl_v06_run2 | rl_policy custom single-shot | 80 | 79.0 | 67.0 | 100.0 | ✗ |
| #19 | rl_v06_run2 | rl_policy custom single-shot | 79 | 76.0 | 88.0 | 75.0 | ✗ |
| #20 | rl_v06_run2 | rl_policy custom single-shot | 79 | 65.0 | 100.0 | 85.0 | ✗ |
| #21 | rl_v06_run2 | rl_policy custom single-shot | 78 | 68.0 | 81.0 | 100.0 | ✗ |
| #22 | rl_v06_run2 | rl_policy custom single-shot | 75 | 64.0 | 100.0 | 75.0 | ✗ |
| #23 | rl_v06_run2 | rl_policy custom single-shot | 75 | 72.0 | 100.0 | 50.0 | ✗ |
| #24 | alex | openai/gpt-5.4 default reflexion | 74 | 59.0 | 100.0 | 100.0 | ✗ |
| #25 | rl_v06_run1 | rl_policy custom single-shot | 74 | 54.0 | 99.0 | 100.0 | ✗ |
| #26 | rl_v06_run1 | rl_policy custom single-shot | 74 | 79.0 | 92.0 | 50.0 | ✗ |
| #27 | rl_v06_run1 | rl_policy custom single-shot | 74 | 71.0 | 100.0 | 50.0 | ✗ |
| #28 | rl_v06_run2 | rl_policy custom single-shot | 74 | 62.0 | 75.0 | 85.0 | ✗ |
| #29 | rl_v06_run2 | rl_policy custom single-shot | 73 | 80.0 | 57.0 | 75.0 | ✗ |
| #30 | rl_v06_run1 | rl_policy custom single-shot | 72 | 70.0 | 100.0 | 50.0 | ✗ |
| #31 | rl_v06_run1 | rl_policy custom single-shot | 72 | 44.0 | 100.0 | 85.0 | ✗ |
| #32 | rl_v06_run2 | rl_policy custom single-shot | 71 | 49.0 | 100.0 | 75.0 | ✗ |
| #33 | rl_v06_run2 | rl_policy custom single-shot | 71 | 66.0 | 60.0 | 85.0 | ✗ |
| #34 | rl_v06_run2 | rl_policy custom single-shot | 70 | 57.0 | 73.0 | 100.0 | ✗ |
| #35 | rl_v06_run2 | rl_policy custom single-shot | 65 | 58.0 | 65.0 | 75.0 | ✗ |
| #36 | rl_v06_run2 | rl_policy custom single-shot | 65 | 53.0 | 60.0 | 100.0 | ✗ |
| #37 | rl_v06_run2 | rl_policy custom single-shot | 65 | 54.0 | 60.0 | 100.0 | ✗ |
| #38 | rl_v06_run2 | rl_policy custom single-shot | 65 | 66.0 | 53.0 | 60.0 | ✗ |
| #39 | alex | anthropic/claude-sonnet-4.6 default single-shot | 64 | 69.0 | 33.0 | 100.0 | ✗ |
| #40 | rl_v06_run2 | rl_policy custom single-shot | 62 | 69.0 | 25.0 | 100.0 | ✗ |
| #41 | rl_v06_run2 | rl_policy custom single-shot | 62 | 50.0 | 55.0 | 100.0 | ✗ |
| #42 | rl_v06_run2 | rl_policy custom single-shot | 62 | 66.0 | 31.0 | 100.0 | ✗ |
| #43 | rl_v06_run2 | rl_policy custom single-shot | 62 | 24.0 | 100.0 | 100.0 | ✗ |
| #44 | rl_v06_run2 | rl_policy custom single-shot | 59 | 46.0 | 52.0 | 100.0 | ✗ |
| #45 | rl_v06_run1 | rl_policy custom single-shot | 57 | 13.0 | 100.0 | 100.0 | ✗ |
| #46 | rl_v06_run2 | rl_policy custom single-shot | 57 | 22.0 | 100.0 | 100.0 | ✗ |
| #47 | rl_v06_run2 | rl_policy custom single-shot | 57 | 49.0 | 41.0 | 85.0 | ✗ |
| #48 | rl_v06_run2 | rl_policy custom single-shot | 55 | 52.0 | 31.0 | 100.0 | ✗ |
| #49 | rl_v06_run2 | rl_policy custom single-shot | 55 | 15.0 | 91.0 | 85.0 | ✗ |
| #50 | alex | openai/gpt-5.4 default single-shot | 50 | 0.0 | 100.0 | 100.0 | ✗ |
| #51 | rl_v06_run1 | rl_policy custom single-shot | 50 | 0.0 | 100.0 | 100.0 | ✗ |
| #52 | rl_v06_run2 | rl_policy custom single-shot | 50 | 0.0 | 100.0 | 100.0 | ✗ |
| #53 | rl_v06_run1 | rl_policy custom single-shot | 48 | 45.0 | 18.0 | 85.0 | ✗ |
| #54 | rl_v06_run1 | rl_policy custom single-shot | 46 | 0.0 | 100.0 | 75.0 | ✗ |
| #55 | rl_v06_run2 | rl_policy custom single-shot | 46 | 18.0 | 55.0 | 100.0 | ✗ |
| #56 | rl_v06_run2 | rl_policy custom single-shot | 45 | 4.0 | 75.0 | 100.0 | ✗ |
| #57 | rl_v06_run2 | rl_policy custom single-shot | 42 | 13.0 | 50.0 | 85.0 | ✗ |
| #58 | rl_v06_run1 | rl_policy custom single-shot | 39 | 0.0 | 62.0 | 100.0 | ✗ |
| #59 | rl_v06_run2 | rl_policy custom single-shot | 39 | 0.0 | 64.0 | 100.0 | ✗ |
| #60 | rl_v06_run2 | rl_policy custom single-shot | 35 | 0.0 | 50.0 | 100.0 | ✗ |
| #61 | alex | x-ai/grok-4.20 default single-shot | 29 | 0.0 | 31.0 | 100.0 | ✗ |
| #62 | alex | anthropic/claude-sonnet-4.6 default reflexion | 14 | 0.0 | 0.0 | 75.0 | ✗ |
Per-scenario breakdown of the top run
| Scenario | Health | Drop rate | Delivered | Pass |
|---|---|---|---|---|
| baseline | 86.0 | 0.0% | 96 | ✓ |
| demand-surge | 85.0 | 0.0% | 352 | ✓ |
| fridge-fail | 86.0 | 0.0% | 88 | ✓ |
| shipment-delay | 86.0 | 0.0% | 88 | ✓ |