chini-015-er-triage

Emergency Department Triage

Five severity levels, finite beds, one CT scanner. The wrong queue means someone dies.

Source: Emergency medicine literature, ESI (Emergency Severity Index) protocol, hospital ops research

Prompt

Design the patient flow for a mid-size hospital emergency department.

Functional:
- Patient arrives at the door, screened by triage nurse, assigned ESI level 1-5 (1 = critical, 5 = minor).
- ESI 1-2 routes directly to a resus bay (4 beds). ESI 3-5 routes to a regular bay (12 beds) or fast-track (6 beds for ESI 4-5).
- Diagnostic resources: CT (1), X-ray (2), labs (shared). Shared across all bays.
- Disposition: admit (to inpatient floor, may board in ED if no upstream beds), discharge, or transfer.
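
The severity routing in the functional list can be sketched as a small function. This is only an illustration, not the Chinilla CanvasState format: the `Bay` enum, the `route` function, and the `fast_track_free` flag are hypothetical names, and the rule that ESI 4-5 prefers fast-track whenever it is free is an assumption (the prompt only says those patients may use either area).

```python
from enum import Enum


class Bay(Enum):
    RESUS = "resus"            # 4 beds, ESI 1-2
    REGULAR = "regular"        # 12 beds, ESI 3-5
    FAST_TRACK = "fast_track"  # 6 beds, ESI 4-5 only


def route(esi: int, fast_track_free: bool = True) -> Bay:
    """Pick a bay type for a triaged patient (1 = critical, 5 = minor)."""
    if not 1 <= esi <= 5:
        raise ValueError(f"ESI level must be 1-5, got {esi}")
    if esi <= 2:
        return Bay.RESUS
    # Assumption: ESI 4-5 prefers fast-track and falls back to a regular bay.
    if esi >= 4 and fast_track_free:
        return Bay.FAST_TRACK
    return Bay.REGULAR
```

In a canvas this would correspond to a single split behavior placed directly after triage.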

Non-functional:
- A mass-casualty event (4x arrival rate) must NOT cause ESI 1-2 patients to wait. Lower-severity flow must be paced or diverted.
- If the CT scanner is down, patients needing imaging must be queued for transport to an imaging center, not blocked from triage.
- If the inpatient floor is full, ED boarding cannot starve incoming critical patients of beds.

Return a Chinilla CanvasState. Components: door, triage, bays, resources, disposition. Behaviors: split (severity routing), queue (waiting room, boarding), ratelimit (low-severity pacing), circuitbreaker (CT failover), batch (lab orders).

Constraints

Max components: 14
Required behaviors: split, queue, ratelimit
Monthly budget: $1,200,000

Stress scenarios

Steady arrivals (baseline)

Normal mix of ESI levels, all resources up.

Bus accident (spike)

Arrivals 4x baseline, severity skewed high. ESI 1-2 must NOT wait.
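
One way to pace lower-severity flow during a surge while never delaying ESI 1-2 is a token bucket with a critical bypass. The sketch below is a generic rate limiter, not the Chinilla `ratelimit` behavior itself; the class name, the rate and burst parameters, and the use of minutes as the time unit are all assumptions for illustration.

```python
class LowSeverityPacer:
    """Token bucket that paces ESI 3-5 admissions; ESI 1-2 always bypasses it."""

    def __init__(self, rate_per_min: float, burst: int):
        self.rate = rate_per_min      # tokens added per minute (assumed unit)
        self.capacity = float(burst)  # maximum tokens the bucket can hold
        self.tokens = float(burst)    # start with a full bucket
        self.last = 0.0               # time of the last refill, in minutes

    def admit(self, esi: int, now: float) -> bool:
        """Return True if the patient may proceed to a bay right now."""
        if esi <= 2:
            return True  # critical patients never consume tokens or wait
        # Refill in proportion to elapsed time, capped at the burst size.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # caller keeps the patient in the waiting room or diverts
```

A rejected lower-severity patient stays in the waiting-room queue, so the limiter shapes flow rather than dropping arrivals.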

CT scanner offline (outage)

CT down for maintenance. Imaging needs reroute, triage must keep flowing.
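
The CT failover can be illustrated with a plain circuit breaker: after repeated failures, imaging requests go to a transport queue instead of blocking, and the scanner is probed again after a cooldown. This is a minimal sketch of the general pattern; `CTBreaker`, the threshold, the cooldown, and the destination strings are hypothetical and do not come from the Chinilla `circuitbreaker` behavior.

```python
from typing import Optional


class CTBreaker:
    """Circuit breaker deciding where an imaging request should go."""

    def __init__(self, failure_threshold: int = 3, cooldown_min: float = 30.0):
        self.failure_threshold = failure_threshold
        self.cooldown = cooldown_min
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the breaker is closed

    def record(self, ok: bool, now: float) -> None:
        """Record the outcome of one CT attempt at time `now` (minutes)."""
        if ok:
            self.failures = 0
            self.opened_at = None  # scanner is healthy again: close the breaker
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = now  # open (or re-open) the breaker

    def destination(self, now: float) -> str:
        """Return 'ct' or 'transport_queue' for a new imaging request."""
        if self.opened_at is None:
            return "ct"
        if now - self.opened_at >= self.cooldown:
            return "ct"  # half-open: let one request probe the scanner
        return "transport_queue"
```

Because the breaker sits on the imaging path only, triage and bay assignment keep flowing while the scanner is down.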

Inpatient floor full (boarding)

Admits can't move upstairs, board in ED. Door must keep accepting critical patients.
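
A simple way to keep boarders from starving critical arrivals is to hold back part of a bed pool for critical patients only. The sketch below assumes such a reservation policy. `BayPool`, `reserved_for_critical`, and the numbers in the usage lines are hypothetical, and the prompt does not prescribe this mechanism.

```python
from dataclasses import dataclass


@dataclass
class BayPool:
    """Bed pool in which some beds can only be taken by critical patients."""

    capacity: int
    reserved_for_critical: int = 0
    occupied: int = 0

    def try_occupy(self, critical: bool) -> bool:
        """Take a bed if policy allows it; return whether a bed was taken."""
        free = self.capacity - self.occupied
        # Non-critical patients (including boarders) must leave the reserve free.
        needed = 1 if critical else self.reserved_for_critical + 1
        if free >= needed:
            self.occupied += 1
            return True
        return False

    def release(self) -> None:
        """Free one bed, for example when a boarder moves upstairs."""
        if self.occupied > 0:
            self.occupied -= 1


# Usage with assumed numbers: 4 beds, 2 held back for critical arrivals.
pool = BayPool(capacity=4, reserved_for_critical=2)
boarders_admitted = [pool.try_occupy(critical=False) for _ in range(3)]
critical_admitted = pool.try_occupy(critical=True)
```

A boarder that is refused a bed would wait in the boarding queue, which keeps the door open for ESI 1-2 arrivals.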

Pass criteria (overall)

Min stability score: 60
Max drop rate: 10.0%
Min delivery rate: 85.0%
Max errors: 8
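
The four thresholds above can be checked with a few comparisons. The sketch below is not the chini-bench scorer: `PassCriteria` and `passes` are hypothetical names, the bounds are assumed to be inclusive, and how the CLI aggregates per-scenario metrics into overall values is not stated on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PassCriteria:
    min_stability: float = 60.0
    max_drop_rate_pct: float = 10.0
    min_delivery_rate_pct: float = 85.0
    max_errors: int = 8


def passes(
    stability: float,
    drop_rate_pct: float,
    delivery_rate_pct: float,
    errors: int,
    criteria: PassCriteria = PassCriteria(),
) -> bool:
    """Return True if all four overall thresholds are met (bounds assumed inclusive)."""
    return (
        stability >= criteria.min_stability
        and drop_rate_pct <= criteria.max_drop_rate_pct
        and delivery_rate_pct >= criteria.min_delivery_rate_pct
        and errors <= criteria.max_errors
    )
```

For example, a run with stability 59.0 fails under this check even if its delivery rate is 100.0%, because every threshold must hold at once.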

Submit your run

Submissions go through the chini-bench CLI. It calls your model with your key, scores the result locally, and posts to the leaderboard. Nothing leaves your machine except the canvas it produces.

End-to-end:
pip install git+https://github.com/collapseindex/chini-bench-cli.git
export OPENROUTER_API_KEY=...

chini-bench run chini-015-er-triage \
  --provider openrouter --model google/gemini-2.0-flash-001 \
  --as alice
Or inspect the prompt first:
chini-bench prompt chini-015-er-triage
Providers: openai · anthropic · google · openrouter · ollama

Leaderboard

Rank | Submitter | Model | Score | Stability | Delivery | Design | Pass
#1 | rl_v06_run2 | rl_policy (custom single-shot) | 92 | 83.0 | 100.0 | 75.0 |
#2 | rl_v06_run2 | rl_policy (custom single-shot) | 90 | 79.0 | 100.0 | 85.0 |
#3 | rl_v06_run2 | rl_policy (custom single-shot) | 89 | 77.0 | 100.0 | 85.0 |
#4 | rl_v06_run1 | rl_policy (custom single-shot) | 88 | 83.0 | 100.0 | 50.0 |
#5 | rl_v06_run2 | rl_policy (custom single-shot) | 88 | 83.0 | 92.0 | 75.0 |
#6 | rl_v06_run2 | rl_policy (custom single-shot) | 88 | 80.0 | 94.0 | 85.0 |
#7 | rl_v06_run1 | rl_policy (custom single-shot) | 87 | 73.0 | 100.0 | 75.0 |
#8 | rl_v06_run1 | rl_policy (custom single-shot) | 87 | 81.0 | 100.0 | 60.0 |
#9 | rl_v06_run2 | rl_policy (custom single-shot) | 87 | 83.0 | 100.0 | 75.0 |
#10 | rl_v06_run2 | rl_policy (custom single-shot) | 87 | 74.0 | 100.0 | 85.0 |
#11 | rl_v06_run1 | rl_policy (custom single-shot) | 86 | 80.0 | 88.0 | 75.0 |
#12 | rl_v06_run1 | rl_policy (custom single-shot) | 86 | 75.0 | 100.0 | 75.0 |
#13 | rl_v06_run2 | rl_policy (custom single-shot) | 86 | 72.0 | 100.0 | 75.0 |
#14 | rl_v06_run2 | rl_policy (custom single-shot) | 86 | 71.0 | 100.0 | 85.0 |
#15 | rl_v06_run2 | rl_policy (custom single-shot) | 86 | 83.0 | 94.0 | 60.0 |
#16 | rl_v06_run1 | rl_policy (custom single-shot) | 85 | 72.0 | 95.0 | 75.0 |
#17 | rl_v06_run1 | rl_policy (custom single-shot) | 85 | 78.0 | 100.0 | 60.0 |
#18 | rl_v06_run2 | rl_policy (custom single-shot) | 85 | 79.0 | 98.0 | 60.0 |
#19 | rl_v06_run2 | rl_policy (custom single-shot) | 85 | 83.0 | 92.0 | 50.0 |
#20 | alex | anthropic/claude-sonnet-4.6 (default single-shot) | 84 | 73.0 | 100.0 | 100.0 |
#21 | rl_v06_run1 | rl_policy (custom single-shot) | 84 | 67.0 | 100.0 | 85.0 |
#22 | rl_v06_run2 | rl_policy (custom single-shot) | 84 | 76.0 | 100.0 | 60.0 |
#23 | rl_v06_run1 | rl_policy (custom single-shot) | 83 | 74.0 | 98.0 | 60.0 |
#24 | rl_v06_run2 | rl_policy (custom single-shot) | 83 | 74.0 | 100.0 | 60.0 |
#25 | rl_v06_run2 | rl_policy (custom single-shot) | 83 | 73.0 | 100.0 | 60.0 |
#26 | rl_v06_run2 | rl_policy (custom single-shot) | 83 | 77.0 | 100.0 | 75.0 |
#27 | alex | openai/gpt-5.4 (default single-shot) | 80 | 59.0 | 100.0 | 100.0 |
#28 | rl_v06_run1 | rl_policy (custom single-shot) | 80 | 68.0 | 100.0 | 60.0 |
#29 | rl_v06_run2 | rl_policy (custom single-shot) | 80 | 59.0 | 100.0 | 85.0 |
#30 | alex | x-ai/grok-4.20 (default single-shot) | 79 | 66.0 | 100.0 | 100.0 |
#31 | rl_v06_run1 | rl_policy (custom single-shot) | 78 | 70.0 | 100.0 | 50.0 |
#32 | rl_v06_run1 | rl_policy (custom single-shot) | 75 | 70.0 | 65.0 | 75.0 |
#33 | rl_v06_run1 | rl_policy (custom single-shot) | 74 | 70.0 | 100.0 | 50.0 |
#34 | rl_v06_run2 | rl_policy (custom single-shot) | 74 | 79.0 | 60.0 | 60.0 |
#35 | rl_v06_run1 | rl_policy (custom single-shot) | 73 | 69.0 | 100.0 | 50.0 |
#36 | rl_v06_run2 | rl_policy (custom single-shot) | 73 | 56.0 | 82.0 | 85.0 |
#37 | rl_v06_run1 | rl_policy (custom single-shot) | 72 | 73.0 | 52.0 | 85.0 |
#38 | rl_v06_run2 | rl_policy (custom single-shot) | 72 | 67.0 | 73.0 | 60.0 |
#39 | rl_v06_run2 | rl_policy (custom single-shot) | 71 | 57.0 | 88.0 | 60.0 |
#40 | rl_v06_run2 | rl_policy (custom single-shot) | 71 | 73.0 | 47.0 | 85.0 |
#41 | rl_v06_run1 | rl_policy (custom single-shot) | 70 | 57.0 | 100.0 | 60.0 |
#42 | rl_v06_run2 | rl_policy (custom single-shot) | 69 | 73.0 | 55.0 | 60.0 |
#43 | rl_v06_run1 | rl_policy (custom single-shot) | 68 | 49.0 | 84.0 | 85.0 |
#44 | alex | openai/gpt-5.4 (default reflexion) | 67 | 50.0 | 100.0 | 100.0 |
#45 | rl_v06_run2 | rl_policy (custom single-shot) | 67 | 42.0 | 100.0 | 75.0 |
#46 | rl_v06_run2 | rl_policy (custom single-shot) | 66 | 49.0 | 71.0 | 85.0 |
#47 | rl_v06_run2 | rl_policy (custom single-shot) | 64 | 71.0 | 42.0 | 60.0 |
#48 | rl_v06_run2 | rl_policy (custom single-shot) | 64 | 58.0 | 62.0 | 60.0 |
#49 | alex | google/gemini-3.1-pro-preview (default reflexion) | 63 | 25.0 | 100.0 | 100.0 |
#50 | rl_v06_run1 | rl_policy (custom single-shot) | 63 | 78.0 | 39.0 | 50.0 |
#51 | rl_v06_run2 | rl_policy (custom single-shot) | 62 | 34.0 | 100.0 | 60.0 |
#52 | alex | google/gemini-3.1-pro-preview (default single-shot) | 61 | 22.0 | 100.0 | 100.0 |
#53 | rl_v06_run2 | rl_policy (custom single-shot) | 61 | 61.0 | 49.0 | 60.0 |
#54 | rl_v06_run1 | rl_policy (custom single-shot) | 58 | 15.0 | 100.0 | 85.0 |
#55 | alex | x-ai/grok-4.20 (default reflexion) | 56 | 44.0 | 61.0 | 100.0 |
#56 | rl_v06_run2 | rl_policy (custom single-shot) | 52 | 27.0 | 75.0 | 60.0 |
#57 | rl_v06_run2 | rl_policy (custom single-shot) | 52 | 19.0 | 100.0 | 60.0 |
#58 | rl_v06_run2 | rl_policy (custom single-shot) | 50 | 27.0 | 69.0 | 60.0 |
#59 | rl_v06_run2 | rl_policy (custom single-shot) | 46 | 10.0 | 70.0 | 85.0 |
#60 | alex | anthropic/claude-sonnet-4.6 (default reflexion) | 44 | 0.0 | 100.0 | 100.0 |
#61 | rl_v06_run2 | rl_policy (custom single-shot) | 44 | 19.0 | 47.0 | 85.0 |
#62 | rl_v06_run2 | rl_policy (custom single-shot) | 44 | 71.0 | 0.0 | 60.0 |
#63 | rl_v06_run2 | rl_policy (custom single-shot) | 39 | 14.0 | 39.0 | 85.0 |

Per-scenario breakdown of the top run

Scenario | Health | Drop rate | Delivered | Pass
baseline | 85.0 | 0.0% | 288 |
mass-casualty | 85.0 | 0.0% | 1056 |
ct-down | 75.0 | 0.0% | 220 |
boarding | 85.0 | 0.0% | 264 |