chini-020-disaster-shelter

Disaster Shelter Intake

500 evacuees in 12 hours, finite cots, dietary restrictions, medical needs, families that must not be split.

Source: FEMA shelter operations, Red Cross intake protocols, post-Hurricane lessons learned

Prompt

Design the intake and resource allocation flow for a 500-person disaster shelter activated for a hurricane evacuation.

Functional:
- Evacuee arrives at the door. Intake records: family unit, medical needs, dietary restrictions, mobility status, pets.
- Routed to one of 4 sleeping zones: family, single adult, medical (oxygen/insulin/dialysis), accessibility.
- Resources: cots, blankets, meal service (3x daily), medical station, charging stations, pet area.
- Family units cannot be split across zones. Medical-need evacuees get priority for medical zone cots.
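
The routing rules above can be sketched as a small classifier that assigns one zone to an entire family unit, so the unit is never split. This is a minimal illustration, not the benchmark's reference design: the `Evacuee`, `FamilyUnit`, and `route_unit` names are hypothetical, and the precedence (medical first, then accessibility, then family, then single adult) is an assumption the prompt does not spell out.

```python
from dataclasses import dataclass, field


@dataclass
class Evacuee:
    name: str
    medical_need: bool = False       # oxygen / insulin / dialysis
    limited_mobility: bool = False


@dataclass
class FamilyUnit:
    members: list[Evacuee] = field(default_factory=list)


def route_unit(unit: FamilyUnit) -> str:
    """Pick ONE zone for the whole unit so its members stay together.

    Assumed precedence: a single medical-need member sends the whole
    unit to the medical zone, then accessibility, then family.
    """
    if any(m.medical_need for m in unit.members):
        return "medical"
    if any(m.limited_mobility for m in unit.members):
        return "accessibility"
    if len(unit.members) > 1:
        return "family"
    return "single_adult"


household = FamilyUnit([Evacuee("parent"), Evacuee("child", medical_need=True)])
print(route_unit(household))  # medical
```

Routing at the unit level rather than per person is what makes the no-split rule hold by construction; the capacity checks would sit downstream of this step.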

Non-functional:
- A late surge (4x the arrival rate in the last 4 hours before the storm hits) must NOT cause families to be split or medical evacuees to be turned away.
- If meal service runs short on a dietary restriction (kosher, halal, allergen-free), the system must source from a neighboring shelter or document the gap, NOT serve a non-compliant meal.
- If the medical zone hits capacity, the scheduler must convert overflow space rather than turn away an insulin-dependent evacuee.

Return a Chinilla CanvasState. Components: intake desk, classifier, zones, meal service, medical station, overflow logic. Behaviors: split (zone routing), filter (dietary check), ratelimit (zone capacity), circuitbreaker (overflow conversion), queue (cot wait), batch (meal cadence).

Constraints

Max components: 14
Required behaviors: split, filter, circuitbreaker
Monthly budget: $180,000
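
A quick pre-submission check of the first two constraints might look like the sketch below. The CanvasState schema is not shown on this page, so the list-of-dicts shape with `id` and `behavior` keys is purely a hypothetical stand-in, as is the `check_constraints` helper. The monthly budget is left out because the cost model is not described here.

```python
MAX_COMPONENTS = 14
REQUIRED_BEHAVIORS = {"split", "filter", "circuitbreaker"}


def check_constraints(components: list[dict]) -> list[str]:
    """Return human-readable violations; an empty list means no violations."""
    problems = []
    if len(components) > MAX_COMPONENTS:
        problems.append(
            f"{len(components)} components exceeds max of {MAX_COMPONENTS}"
        )
    used = {c.get("behavior") for c in components}
    missing = REQUIRED_BEHAVIORS - used
    if missing:
        problems.append("missing required behaviors: " + ", ".join(sorted(missing)))
    return problems


# Hypothetical canvas shape, for illustration only.
canvas = [
    {"id": "intake-desk", "behavior": "queue"},
    {"id": "classifier", "behavior": "split"},
    {"id": "dietary-check", "behavior": "filter"},
    {"id": "medical-overflow", "behavior": "circuitbreaker"},
]
print(check_constraints(canvas))  # []
```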

Stress scenarios

Steady arrivals (baseline)

Normal evacuee flow over 12 hours, mixed needs.

Pre-landfall surge (spike)

Arrivals run at 4x the normal rate in the final hours. Families must not be split, and medical evacuees must not be turned away.
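
To get a feel for the load this scenario implies, the surge can be worked out as simple rate arithmetic. The sketch assumes the 500 evacuees are the 12-hour total and that the final 4 hours run at 4x the rate of the first 8. Both are readings of the prompt, not a description of how the benchmark's simulator generates arrivals.

```python
TOTAL_EVACUEES = 500
WINDOW_HOURS = 12
SURGE_HOURS = 4
SURGE_MULTIPLIER = 4

normal_hours = WINDOW_HOURS - SURGE_HOURS  # 8 hours at the base rate

# total = normal_hours * r + SURGE_HOURS * SURGE_MULTIPLIER * r, solved for r
base_rate = TOTAL_EVACUEES / (normal_hours + SURGE_HOURS * SURGE_MULTIPLIER)
surge_rate = SURGE_MULTIPLIER * base_rate
surge_share = SURGE_HOURS * surge_rate / TOTAL_EVACUEES

print(f"base rate:   {base_rate:.1f} evacuees/hour")   # 20.8
print(f"surge rate:  {surge_rate:.1f} evacuees/hour")  # 83.3
print(f"surge share: {surge_share:.0%} of all arrivals")  # 67%
```

Under these assumptions about two thirds of all arrivals land in the last third of the window, which is why zone capacity has to be reserved ahead of the surge rather than filled first-come, first-served.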

Medical zone full (outage)

Medical zone at capacity. Overflow must be converted, not refused.
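
The convert-not-refuse rule can be sketched as an admission function that has no "refused" outcome. The `MedicalZone` class and its field names are hypothetical, and the "escalate" result (hand off to a human once convertible space is also gone) is an assumption; the prompt does not say what happens at that point.

```python
from dataclasses import dataclass


@dataclass
class MedicalZone:
    capacity: int      # medical cots currently set up in the zone
    convertible: int   # cots' worth of overflow space that can be converted
    occupied: int = 0

    def admit(self) -> str:
        """Admit one medical-need evacuee; there is no refusal outcome."""
        if self.occupied < self.capacity:
            self.occupied += 1
            return "admitted"
        if self.convertible > 0:
            # Zone is full: convert one cot of overflow space and use it.
            self.convertible -= 1
            self.capacity += 1
            self.occupied += 1
            return "admitted_overflow"
        # Nothing left to convert: flag for staff instead of turning away.
        return "escalate"


zone = MedicalZone(capacity=2, convertible=1)
print([zone.admit() for _ in range(4)])
# ['admitted', 'admitted', 'admitted_overflow', 'escalate']
```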

Halal meals short (outage)

Supply for a dietary restriction runs low. The system must source externally or document the gap, not serve a non-compliant meal.
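
The dietary rule amounts to a fallback chain with no path that serves a non-compliant meal. The sketch below uses hypothetical names (`serve_meal`, plain dicts for stock, a list for the gap log) and assumes one meal is drawn per call.

```python
def serve_meal(
    restriction: str,
    local_stock: dict[str, int],
    neighbor_stock: dict[str, int],
    gap_log: list[str],
) -> str:
    """Serve a compliant meal, source it from a neighbor, or document the gap."""
    if local_stock.get(restriction, 0) > 0:
        local_stock[restriction] -= 1
        return "served_local"
    if neighbor_stock.get(restriction, 0) > 0:
        neighbor_stock[restriction] -= 1
        return "served_from_neighbor"
    # No compliant meal anywhere: record the gap, never substitute.
    gap_log.append(restriction)
    return "gap_documented"


local = {"halal": 1}
neighbor = {"halal": 1}
gaps: list[str] = []
print([serve_meal("halal", local, neighbor, gaps) for _ in range(3)])
# ['served_local', 'served_from_neighbor', 'gap_documented']
print(gaps)  # ['halal']
```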

Pass criteria (overall)

Min stability score: 60
Max drop rate: 8.0%
Min delivery rate: 88.0%
Max errors: 7
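
The four thresholds combine into a single pass/fail check. In the sketch below the `RunResult` type and `passes` function are hypothetical, the comparisons are assumed to be inclusive, and the example numbers are made up for illustration rather than taken from any leaderboard run.

```python
from dataclasses import dataclass

MIN_STABILITY = 60
MAX_DROP_RATE_PCT = 8.0
MIN_DELIVERY_RATE_PCT = 88.0
MAX_ERRORS = 7


@dataclass
class RunResult:
    stability: float
    drop_rate_pct: float
    delivery_rate_pct: float
    errors: int


def passes(r: RunResult) -> bool:
    """True only if every overall threshold is met (inclusive bounds assumed)."""
    return (
        r.stability >= MIN_STABILITY
        and r.drop_rate_pct <= MAX_DROP_RATE_PCT
        and r.delivery_rate_pct >= MIN_DELIVERY_RATE_PCT
        and r.errors <= MAX_ERRORS
    )


# Illustrative values only.
print(passes(RunResult(stability=75.0, drop_rate_pct=2.5, delivery_rate_pct=93.0, errors=3)))  # True
print(passes(RunResult(stability=75.0, drop_rate_pct=2.5, delivery_rate_pct=80.0, errors=3)))  # False
```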

Submit your run

Submissions go through the chini-bench CLI. It calls your model with your key, scores the result locally, and posts to the leaderboard. Nothing leaves your machine except the canvas it produces.

End-to-end:
pip install git+https://github.com/collapseindex/chini-bench-cli.git
export OPENROUTER_API_KEY=...

chini-bench run chini-020-disaster-shelter \
  --provider openrouter --model google/gemini-2.0-flash-001 \
  --as alice
Or inspect the prompt first:
chini-bench prompt chini-020-disaster-shelter
Providers: openai · anthropic · google · openrouter · ollama

Leaderboard

Rank  Submitter    Model                                                Score  Stability  Delivery  Design  Pass
#1    rl_v06_run1  rl_policy (custom single-shot)                       88     80.0       94.0      60.0
#2    rl_v06_run2  rl_policy (custom single-shot)                       88     79.0       96.0      85.0
#3    rl_v06_run1  rl_policy (custom single-shot)                       87     74.0       100.0     60.0
#4    rl_v06_run1  rl_policy (custom single-shot)                       87     80.0       89.0      60.0
#5    rl_v06_run1  rl_policy (custom single-shot)                       86     71.0       100.0     60.0
#6    rl_v06_run2  rl_policy (custom single-shot)                       86     80.0       100.0     85.0
#7    rl_v06_run1  rl_policy (custom single-shot)                       85     77.0       88.0      85.0
#8    rl_v06_run1  rl_policy (custom single-shot)                       84     73.0       98.0      85.0
#9    rl_v06_run2  rl_policy (custom single-shot)                       84     79.0       95.0      85.0
#10   rl_v06_run2  rl_policy (custom single-shot)                       84     80.0       93.0      60.0
#11   alex         openai/gpt-5.4 (default single-shot)                 83     78.0       81.0      100.0
#12   alex         google/gemini-3.1-pro-preview (default single-shot)  83     71.0       91.0      100.0
#13   rl_v06_run2  rl_policy (custom single-shot)                       83     73.0       100.0     85.0
#14   rl_v06_run2  rl_policy (custom single-shot)                       83     80.0       89.0      60.0
#15   rl_v06_run1  rl_policy (custom single-shot)                       82     80.0       87.0      85.0
#16   rl_v06_run2  rl_policy (custom single-shot)                       82     74.0       97.0      60.0
#17   rl_v06_run2  rl_policy (custom single-shot)                       82     79.0       88.0      60.0
#18   alex         anthropic/claude-sonnet-4.6 (default single-shot)    81     80.0       70.0      100.0
#19   rl_v06_run2  rl_policy (custom single-shot)                       81     67.0       96.0      60.0
#20   rl_v06_run2  rl_policy (custom single-shot)                       81     70.0       100.0     60.0
#21   rl_v06_run2  rl_policy (custom single-shot)                       81     77.0       87.0      60.0
#22   rl_v06_run1  rl_policy (custom single-shot)                       80     77.0       72.0      60.0
#23   rl_v06_run2  rl_policy (custom single-shot)                       80     80.0       78.0      60.0
#24   rl_v06_run1  rl_policy (custom single-shot)                       79     76.0       69.0      70.0
#25   rl_v06_run2  rl_policy (custom single-shot)                       79     66.0       86.0      85.0
#26   rl_v06_run2  rl_policy (custom single-shot)                       79     78.0       68.0      85.0
#27   rl_v06_run2  rl_policy (custom single-shot)                       79     78.0       66.0      70.0
#28   rl_v06_run2  rl_policy (custom single-shot)                       78     64.0       86.0      75.0
#29   rl_v06_run2  rl_policy (custom single-shot)                       77     79.0       72.0      60.0
#30   rl_v06_run2  rl_policy (custom single-shot)                       77     54.0       100.0     60.0
#31   rl_v06_run1  rl_policy (custom single-shot)                       76     70.0       87.0      85.0
#32   rl_v06_run1  rl_policy (custom single-shot)                       76     54.0       98.0      60.0
#33   rl_v06_run1  rl_policy (custom single-shot)                       76     73.0       94.0      50.0
#34   rl_v06_run1  rl_policy (custom single-shot)                       76     66.0       98.0      85.0
#35   rl_v06_run1  rl_policy (custom single-shot)                       76     77.0       71.0      60.0
#36   rl_v06_run2  rl_policy (custom single-shot)                       75     71.0       77.0      85.0
#37   rl_v06_run1  rl_policy (custom single-shot)                       74     78.0       51.0      60.0
#38   rl_v06_run2  rl_policy (custom single-shot)                       74     57.0       97.0      60.0
#39   rl_v06_run2  rl_policy (custom single-shot)                       73     69.0       75.0      85.0
#40   rl_v06_run1  rl_policy (custom single-shot)                       72     66.0       100.0     50.0
#41   rl_v06_run1  rl_policy (custom single-shot)                       71     68.0       100.0     50.0
#42   rl_v06_run1  rl_policy (custom single-shot)                       69     65.0       67.0      85.0
#43   rl_v06_run2  rl_policy (custom single-shot)                       69     65.0       67.0      60.0
#44   rl_v06_run2  rl_policy (custom single-shot)                       67     63.0       63.0      60.0
#45   alex         google/gemini-3.1-pro-preview (default reflexion)    66     63.0       61.0      100.0
#46   rl_v06_run2  rl_policy (custom single-shot)                       65     57.0       66.0      60.0
#47   rl_v06_run2  rl_policy (custom single-shot)                       65     57.0       55.0      60.0
#48   rl_v06_run1  rl_policy (custom single-shot)                       64     71.0       67.0      50.0
#49   rl_v06_run2  rl_policy (custom single-shot)                       64     43.0       75.0      85.0
#50   alex         x-ai/grok-4.20 (default reflexion)                   63     58.0       46.0      100.0
#51   rl_v06_run2  rl_policy (custom single-shot)                       61     60.0       50.0      85.0
#52   rl_v06_run2  rl_policy (custom single-shot)                       61     47.0       58.0      60.0
#53   rl_v06_run1  rl_policy (custom single-shot)                       59     21.0       94.0      100.0
#54   rl_v06_run2  rl_policy (custom single-shot)                       59     72.0       23.0      60.0
#55   alex         openai/gpt-5.4 (default reflexion)                   57     26.0       100.0     100.0
#56   rl_v06_run1  rl_policy (custom single-shot)                       56     38.0       57.0      100.0
#57   rl_v06_run2  rl_policy (custom single-shot)                       56     62.0       15.0      60.0
#58   rl_v06_run2  rl_policy (custom single-shot)                       56     63.0       26.0      60.0
#59   alex         x-ai/grok-4.20 (default single-shot)                 55     73.0       0.0       75.0
#60   rl_v06_run1  rl_policy (custom single-shot)                       55     17.0       100.0     60.0
#61   rl_v06_run2  rl_policy (custom single-shot)                       54     44.0       53.0      85.0
#62   rl_v06_run2  rl_policy (custom single-shot)                       54     39.0       48.0      100.0
#63   rl_v06_run2  rl_policy (custom single-shot)                       50     49.0       19.0      100.0
#64   rl_v06_run2  rl_policy (custom single-shot)                       49     52.0       10.0      70.0
#65   rl_v06_run2  rl_policy (custom single-shot)                       47     15.0       64.0      60.0
#66   rl_v06_run2  rl_policy (custom single-shot)                       46     22.0       64.0      75.0
#67   alex         anthropic/claude-sonnet-4.6 (default reflexion)      44     0.0        100.0     100.0
#68   rl_v06_run1  rl_policy (custom single-shot)                       44     0.0        91.0      85.0
#69   rl_v06_run2  rl_policy (custom single-shot)                       43     15.0       52.0      100.0
#70   rl_v06_run1  rl_policy (custom single-shot)                       42     21.0       82.0      60.0

Per-scenario breakdown of the top run
Scenario          Health  Drop rate  Delivered  Pass
baseline          84.0    1.0%       519
late-surge        83.0    1.3%       1894
medical-overflow  76.0    0.0%       308
meal-shortfall    76.0    0.0%       308