chini-015-er-triage

Emergency Department Triage

Five severity levels, finite beds, one CT scanner. The wrong queue means someone dies.

Source: Emergency medicine literature, ESI (Emergency Severity Index) protocol, hospital ops research

Prompt

Design the patient flow for a mid-size hospital emergency department.

Functional:
- A patient arrives at the door, is screened by a triage nurse, and is assigned an ESI level from 1 to 5 (1 = critical, 5 = minor).
- ESI 1-2 routes directly to a resus bay (4 beds). ESI 3-5 routes to a regular bay (12 beds) or fast-track (6 beds for ESI 4-5).
- Diagnostic resources, shared across all bays: CT (1 scanner), X-ray (2 units), and labs.
- Disposition: admit (to inpatient floor, may board in ED if no upstream beds), discharge, or transfer.
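As a rough illustration of the severity split, here is a minimal Python sketch that places a patient in a care area based on free-bed counts. The class `BedPool`, the function `route`, and the overflow policy (ESI 1-2 spill into a regular bay rather than wait, ESI 4-5 prefer fast-track, and anyone without a free bed goes to the waiting-room queue) are illustrative assumptions, not part of the Chinilla schema or the ESI protocol.

```python
from dataclasses import dataclass


@dataclass
class BedPool:
    """Free beds per care area; defaults are the capacities from the prompt."""
    resus: int = 4
    regular: int = 12
    fast_track: int = 6


def route(esi: int, beds: BedPool) -> str:
    """Occupy a bed for the patient and return its area, or 'waiting_room'.

    Hypothetical policy: ESI 1-2 overflow into regular bays rather than wait,
    ESI 3 uses regular bays only, ESI 4-5 prefer fast-track.
    """
    if esi not in range(1, 6):
        raise ValueError(f"ESI level must be 1-5, got {esi}")
    if esi <= 2:
        preferences = ["resus", "regular"]
    elif esi == 3:
        preferences = ["regular"]
    else:
        preferences = ["fast_track", "regular"]
    for area in preferences:
        free = getattr(beds, area)
        if free > 0:
            setattr(beds, area, free - 1)  # occupy one bed in that area
            return area
    return "waiting_room"  # no free bed: hold in the queue


beds = BedPool()
print([route(1, beds) for _ in range(5)])
# ['resus', 'resus', 'resus', 'resus', 'regular']
```

Letting ESI 4-5 overflow into regular bays keeps fast-track patients moving but can crowd out ESI 3; a stricter design might reserve some regular bays for ESI 3 and above.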

Non-functional:
- A mass-casualty event (4x arrival rate) must NOT cause ESI 1-2 patients to wait. Lower-severity flow must be paced or diverted.
- If the CT scanner is down, patients needing imaging must be queued for transport to an imaging center, not blocked from triage.
- If the inpatient floor is full, ED boarding must not starve incoming critical patients of beds.
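One way to meet the mass-casualty requirement is to apply the rate limit only to ESI 3-5 and let ESI 1-2 bypass it entirely. The sketch below assumes a classic token bucket; `TokenBucket`, `admit`, and the `waiting_room` outcome are hypothetical names, and the injectable `clock` exists only to make the behavior testable.

```python
import time


class TokenBucket:
    """Allows about `rate` events per second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def admit(esi: int, low_acuity_limiter: TokenBucket) -> str:
    """ESI 1-2 are never paced; ESI 3-5 consume a token or wait."""
    if esi <= 2:
        return "admit"
    return "admit" if low_acuity_limiter.allow() else "waiting_room"
```

During a surge the bucket caps how fast ESI 3-5 patients enter bays, while an ESI 1 arrival is admitted immediately no matter how many tokens remain.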

Return a Chinilla CanvasState. Components: door, triage, bays, resources, disposition. Behaviors: split (severity routing), queue (waiting room, boarding), ratelimit (low-severity pacing), circuitbreaker (CT failover), batch (lab orders).
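The `batch` behavior for lab orders can be pictured as a size-or-timeout batcher. This is a minimal sketch under assumed thresholds (`max_size`, `max_wait`) with a hypothetical `stat` flag so that orders for ESI 1-2 patients skip batching; none of these names come from Chinilla.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LabBatcher:
    """Releases lab orders in batches of `max_size`, or sooner once the
    oldest pending order has waited `max_wait` seconds."""
    max_size: int = 8
    max_wait: float = 300.0
    pending: list[str] = field(default_factory=list)
    oldest_at: float | None = None

    def add(self, order: str, now: float, stat: bool = False) -> list[str] | None:
        if stat:
            return [order]  # stat orders (e.g. ESI 1-2) bypass batching
        if not self.pending:
            self.oldest_at = now
        self.pending.append(order)
        return self._maybe_flush(now)

    def tick(self, now: float) -> list[str] | None:
        """Call periodically so a partial batch is not held indefinitely."""
        return self._maybe_flush(now)

    def _maybe_flush(self, now: float) -> list[str] | None:
        if not self.pending:
            return None
        full = len(self.pending) >= self.max_size
        stale = now - self.oldest_at >= self.max_wait
        if full or stale:
            batch, self.pending, self.oldest_at = self.pending, [], None
            return batch
        return None
```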

Constraints

Max components: 14
Required behaviors: split, queue, ratelimit
Monthly budget: $1,200,000
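Before submitting, it can help to check a design against these limits locally. The sketch below assumes each component is a plain dict with hypothetical `name`, `behavior`, and `monthly_cost` keys; the real CanvasState format may differ, so treat the field names as placeholders.

```python
MAX_COMPONENTS = 14
REQUIRED_BEHAVIORS = {"split", "queue", "ratelimit"}
MONTHLY_BUDGET = 1_200_000


def check_constraints(components: list[dict]) -> list[str]:
    """Return a list of constraint violations (empty means the design fits).

    Each component is assumed to be a dict with hypothetical keys
    'name', 'behavior', and 'monthly_cost'.
    """
    problems = []
    if len(components) > MAX_COMPONENTS:
        problems.append(f"too many components: {len(components)} > {MAX_COMPONENTS}")
    present = {c.get("behavior") for c in components}
    missing = REQUIRED_BEHAVIORS - present
    if missing:
        problems.append(f"missing required behaviors: {sorted(missing)}")
    cost = sum(c.get("monthly_cost", 0) for c in components)
    if cost > MONTHLY_BUDGET:
        problems.append(f"over budget: ${cost:,} > ${MONTHLY_BUDGET:,}")
    return problems
```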

Stress scenarios

Steady arrivals (type: baseline)

Normal mix of ESI levels, all resources up.

Bus accident (type: spike)

Arrivals at 4x baseline, with severity skewed high. ESI 1-2 must NOT wait.

CT scanner offline (type: outage)

The CT scanner is down for maintenance. Imaging must be rerouted while triage keeps flowing.
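The CT failover maps onto a circuit breaker: after repeated scanner failures the breaker opens and imaging requests are queued for transport instead of waiting on the CT, while triage itself is untouched. This is a minimal sketch; the thresholds, `CircuitBreaker`, `route_imaging`, and the `transport_queue` label are assumptions. In the half-open state this simplified version lets every request probe the scanner rather than exactly one.

```python
import time


class CircuitBreaker:
    """Closed -> open after `failure_threshold` consecutive failures;
    half-open once `reset_timeout` seconds have passed; closed on success."""

    def __init__(self, failure_threshold=3, reset_timeout=600.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the breaker is closed

    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half_open"
        return "open"

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.state() == "half_open" or self.failures >= self.failure_threshold:
            self.opened_at = self.clock()  # (re)open and restart the timeout


def route_imaging(breaker: CircuitBreaker) -> str:
    """Use the in-house CT unless the breaker is open; otherwise queue the
    patient for transport to an outside imaging center."""
    return "transport_queue" if breaker.state() == "open" else "ct"
```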

Inpatient floor full (type: latency)

Admitted patients cannot move upstairs and must board in the ED. The door must keep accepting critical patients.
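A simple way to keep boarding from starving critical arrivals is to cap how many ED bays boarders may hold and never board in resus bays. The sketch assumes a hypothetical `BoardingManager` with a `max_boarding_bays` cap; extra boarders move to a separate boarding queue (for example, a hallway area), which is a design assumption rather than something the prompt specifies.

```python
from collections import deque


class BoardingManager:
    """Tracks admitted patients waiting for an inpatient bed.

    Boarders may hold at most `max_boarding_bays` regular bays; later
    boarders go to a separate queue so their bay is released. Resus bays
    are never used for boarding.
    """

    def __init__(self, max_boarding_bays: int = 4):
        self.max_boarding_bays = max_boarding_bays
        self.in_bays = []     # boarders still occupying an ED bay
        self.queue = deque()  # boarders moved out of ED bays

    def board(self, patient: str) -> bool:
        """Register a boarder; return True if their ED bay is released."""
        if len(self.in_bays) < self.max_boarding_bays:
            self.in_bays.append(patient)
            return False
        self.queue.append(patient)
        return True

    def inpatient_bed_opened(self):
        """Send one boarder upstairs, preferring one who holds an ED bay so
        that bay frees up for new arrivals. Returns (patient, bay_freed)."""
        if self.in_bays:
            return self.in_bays.pop(0), True
        if self.queue:
            return self.queue.popleft(), False
        return None, False
```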

Pass criteria (overall)

Min stability score: 60
Max drop rate: 10.0%
Min delivery rate: 85.0%
Max errors: 8
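The thresholds above can be expressed as a small check. This sketch only compares already-aggregated figures; how chini-bench combines per-scenario results into these overall numbers is not described here, and treating each threshold as inclusive is an assumption. `PassCriteria` and `failed_criteria` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PassCriteria:
    min_stability: float = 60.0
    max_drop_rate: float = 10.0      # percent
    min_delivery_rate: float = 85.0  # percent
    max_errors: int = 8


def failed_criteria(stability: float, drop_rate: float, delivery_rate: float,
                    errors: int, c: PassCriteria = PassCriteria()) -> list[str]:
    """Return the names of the criteria a run misses (empty list means pass)."""
    failures = []
    if stability < c.min_stability:
        failures.append("stability")
    if drop_rate > c.max_drop_rate:
        failures.append("drop_rate")
    if delivery_rate < c.min_delivery_rate:
        failures.append("delivery_rate")
    if errors > c.max_errors:
        failures.append("errors")
    return failures
```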

Submit your run

Submissions go through the chini-bench CLI. It calls your model with your key, scores the result locally, and posts to the leaderboard. Nothing leaves your machine except the canvas it produces.

End-to-end:

pip install git+https://github.com/collapseindex/chini-bench-cli.git
export OPENROUTER_API_KEY=...

chini-bench run chini-015-er-triage \
  --provider openrouter --model google/gemini-2.0-flash-001 \
  --as alice

Or inspect the prompt first:

chini-bench prompt chini-015-er-triage

Providers: openai · anthropic · google · openrouter · ollama

Leaderboard

Rank	Submitter	Model	Score	Stability	Delivery	Design	Pass
#1	rl_v06_run2	rl_policy custom single-shot	92	83.0	100.0	75.0	✓
#2	rl_v06_run2	rl_policy custom single-shot	90	79.0	100.0	85.0	✓
#3	rl_v06_run2	rl_policy custom single-shot	89	77.0	100.0	85.0	✓
#4	rl_v06_run1	rl_policy custom single-shot	88	83.0	100.0	50.0	✗
#5	rl_v06_run2	rl_policy custom single-shot	88	83.0	92.0	75.0	✗
#6	rl_v06_run2	rl_policy custom single-shot	88	80.0	94.0	85.0	✗
#7	rl_v06_run1	rl_policy custom single-shot	87	73.0	100.0	75.0	✓
#8	rl_v06_run1	rl_policy custom single-shot	87	81.0	100.0	60.0	✗
#9	rl_v06_run2	rl_policy custom single-shot	87	83.0	100.0	75.0	✗
#10	rl_v06_run2	rl_policy custom single-shot	87	74.0	100.0	85.0	✓
#11	rl_v06_run1	rl_policy custom single-shot	86	80.0	88.0	75.0	✗
#12	rl_v06_run1	rl_policy custom single-shot	86	75.0	100.0	75.0	✗
#13	rl_v06_run2	rl_policy custom single-shot	86	72.0	100.0	75.0	✓
#14	rl_v06_run2	rl_policy custom single-shot	86	71.0	100.0	85.0	✓
#15	rl_v06_run2	rl_policy custom single-shot	86	83.0	94.0	60.0	✗
#16	rl_v06_run1	rl_policy custom single-shot	85	72.0	95.0	75.0	✗
#17	rl_v06_run1	rl_policy custom single-shot	85	78.0	100.0	60.0	✗
#18	rl_v06_run2	rl_policy custom single-shot	85	79.0	98.0	60.0	✗
#19	rl_v06_run2	rl_policy custom single-shot	85	83.0	92.0	50.0	✗
#20	alex	anthropic/claude-sonnet-4.6 default single-shot	84	73.0	100.0	100.0	✗
#21	rl_v06_run1	rl_policy custom single-shot	84	67.0	100.0	85.0	✓
#22	rl_v06_run2	rl_policy custom single-shot	84	76.0	100.0	60.0	✗
#23	rl_v06_run1	rl_policy custom single-shot	83	74.0	98.0	60.0	✗
#24	rl_v06_run2	rl_policy custom single-shot	83	74.0	100.0	60.0	✗
#25	rl_v06_run2	rl_policy custom single-shot	83	73.0	100.0	60.0	✗
#26	rl_v06_run2	rl_policy custom single-shot	83	77.0	100.0	75.0	✗
#27	alex	openai/gpt-5.4 default single-shot	80	59.0	100.0	100.0	✗
#28	rl_v06_run1	rl_policy custom single-shot	80	68.0	100.0	60.0	✗
#29	rl_v06_run2	rl_policy custom single-shot	80	59.0	100.0	85.0	✗
#30	alex	x-ai/grok-4.20 default single-shot	79	66.0	100.0	100.0	✗
#31	rl_v06_run1	rl_policy custom single-shot	78	70.0	100.0	50.0	✗
#32	rl_v06_run1	rl_policy custom single-shot	75	70.0	65.0	75.0	✗
#33	rl_v06_run1	rl_policy custom single-shot	74	70.0	100.0	50.0	✗
#34	rl_v06_run2	rl_policy custom single-shot	74	79.0	60.0	60.0	✗
#35	rl_v06_run1	rl_policy custom single-shot	73	69.0	100.0	50.0	✗
#36	rl_v06_run2	rl_policy custom single-shot	73	56.0	82.0	85.0	✗
#37	rl_v06_run1	rl_policy custom single-shot	72	73.0	52.0	85.0	✗
#38	rl_v06_run2	rl_policy custom single-shot	72	67.0	73.0	60.0	✗
#39	rl_v06_run2	rl_policy custom single-shot	71	57.0	88.0	60.0	✗
#40	rl_v06_run2	rl_policy custom single-shot	71	73.0	47.0	85.0	✗
#41	rl_v06_run1	rl_policy custom single-shot	70	57.0	100.0	60.0	✗
#42	rl_v06_run2	rl_policy custom single-shot	69	73.0	55.0	60.0	✗
#43	rl_v06_run1	rl_policy custom single-shot	68	49.0	84.0	85.0	✗
#44	alex	openai/gpt-5.4 default reflexion	67	50.0	100.0	100.0	✗
#45	rl_v06_run2	rl_policy custom single-shot	67	42.0	100.0	75.0	✗
#46	rl_v06_run2	rl_policy custom single-shot	66	49.0	71.0	85.0	✗
#47	rl_v06_run2	rl_policy custom single-shot	64	71.0	42.0	60.0	✗
#48	rl_v06_run2	rl_policy custom single-shot	64	58.0	62.0	60.0	✗
#49	alex	google/gemini-3.1-pro-preview default reflexion	63	25.0	100.0	100.0	✗
#50	rl_v06_run1	rl_policy custom single-shot	63	78.0	39.0	50.0	✗
#51	rl_v06_run2	rl_policy custom single-shot	62	34.0	100.0	60.0	✗
#52	alex	google/gemini-3.1-pro-preview default single-shot	61	22.0	100.0	100.0	✗
#53	rl_v06_run2	rl_policy custom single-shot	61	61.0	49.0	60.0	✗
#54	rl_v06_run1	rl_policy custom single-shot	58	15.0	100.0	85.0	✗
#55	alex	x-ai/grok-4.20 default reflexion	56	44.0	61.0	100.0	✗
#56	rl_v06_run2	rl_policy custom single-shot	52	27.0	75.0	60.0	✗
#57	rl_v06_run2	rl_policy custom single-shot	52	19.0	100.0	60.0	✗
#58	rl_v06_run2	rl_policy custom single-shot	50	27.0	69.0	60.0	✗
#59	rl_v06_run2	rl_policy custom single-shot	46	10.0	70.0	85.0	✗
#60	alex	anthropic/claude-sonnet-4.6 default reflexion	44	0.0	100.0	100.0	✗
#61	rl_v06_run2	rl_policy custom single-shot	44	19.0	47.0	85.0	✗
#62	rl_v06_run2	rl_policy custom single-shot	44	71.0	0.0	60.0	✗
#63	rl_v06_run2	rl_policy custom single-shot	39	14.0	39.0	85.0	✗

Per-scenario breakdown of the top run

Scenario	Health	Drop rate	Delivered	Pass
baseline	85.0	0.0%	288	✓
mass-casualty	85.0	0.0%	1056	✓
ct-down	75.0	0.0%	220	✓
boarding	85.0	0.0%	264	✓
