chini-027-911-dispatch

911 Dispatch

Cardiac arrest at 9:01am, fender bender at 9:02am, fire at 9:03am. Three calls, two ambulances, one decision per second.

Source: Civic operations, priority queueing, emergency services literature

Prompt

Design the dispatch system for a metropolitan 911 center handling police, fire, and EMS.

Functional:
- Calls arrive via 911. Operator triages by type and severity.
- Severity tags: Priority 1 (life-threatening, dispatch immediately), Priority 2 (urgent, dispatch within 5 min), Priority 3 (routine, queue acceptable).
- Resources: ambulances, fire engines, police units. Each is in one of three states: available, dispatched, unavailable.
- Hospital diversion: when a hospital is at capacity, EMS must reroute to next-nearest, costing minutes.
- Mutual aid: when local resources exhaust, request from neighboring jurisdiction (slow, costly, last resort).
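The triage and resource rules above can be pictured as a priority heap keyed by severity and arrival order. The sketch below is illustrative only: `Call`, `DispatchQueue`, and the unit identifiers are hypothetical names, not part of the CanvasState schema or the chini-bench API, and a real dispatcher would also weigh unit location.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class Call:
    priority: int                     # 1 = life-threatening, 2 = urgent, 3 = routine
    seq: int                          # arrival order, breaks ties within a priority
    kind: str = field(compare=False)  # "ems", "fire", or "police"


class DispatchQueue:
    """Hypothetical sketch: the lowest priority number is always served first."""

    def __init__(self, units):
        # units maps a unit id to (kind, state); state is one of
        # "available", "dispatched", "unavailable", as in the prompt.
        self.units = dict(units)
        self._heap = []
        self._seq = itertools.count()

    def submit(self, priority, kind):
        heapq.heappush(self._heap, Call(priority, next(self._seq), kind))

    def dispatch_next(self):
        """Pop the most urgent call and assign an available unit of its kind.

        Returns (call, unit_id), or (call, None) when no matching unit is
        free, which is the point where mutual aid would be requested.
        Returns None when the queue is empty.
        """
        if not self._heap:
            return None
        call = heapq.heappop(self._heap)
        for unit_id, (kind, state) in self.units.items():
            if kind == call.kind and state == "available":
                self.units[unit_id] = (kind, "dispatched")
                return call, unit_id
        return call, None
```

With one ambulance and one police unit, a P3 fender bender submitted before a P1 cardiac arrest is still dispatched after it, because the heap orders on `(priority, seq)` rather than on arrival time alone.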

Non-functional:
- 90th percentile dispatch decision under 60 seconds. P1 calls must never queue behind P3.
- Mass casualty event (3-4x call surge) cannot starve baseline P1 response. Triage must adapt.
- Dispatch radio failure must fall back to phone or in-person without losing active assignments.
- A spurious call (prank, butt-dial) must not consume an ambulance.
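The radio-failure requirement maps naturally onto the required `circuitbreaker` behavior. The following is a minimal sketch under stated assumptions: `ChannelBreaker`, its `max_failures` threshold, and the callable `primary`/`fallback` channels are all hypothetical, and the half-open recovery state of a full circuit breaker is left out for brevity.

```python
class ChannelBreaker:
    """Hypothetical circuit breaker around the primary dispatch channel."""

    def __init__(self, primary, fallback, max_failures=3):
        self.primary = primary            # e.g. a function that sends over radio
        self.fallback = fallback          # e.g. a function that sends over phone
        self.max_failures = max_failures  # assumed threshold, not from the prompt
        self.failures = 0
        self.assignments = {}             # call id -> unit id, independent of channel

    @property
    def open(self):
        # Once open, the primary channel is skipped entirely.
        return self.failures >= self.max_failures

    def send(self, call_id, unit_id):
        # Record the assignment first so a channel failure cannot lose it.
        self.assignments[call_id] = unit_id
        if not self.open:
            try:
                self.primary(call_id, unit_id)
                self.failures = 0
                return "primary"
            except ConnectionError:
                self.failures += 1
        self.fallback(call_id, unit_id)
        return "fallback"
```

Keeping `assignments` outside either channel is the design choice that matters here: the record of who is going where survives regardless of which transport delivered the message.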

Return a CanvasState modeling priority queueing, resource pools, hospital availability, and mutual-aid escalation.

Constraints

Max components: 14
Required behaviors: queue, circuitbreaker, ratelimit, split
Monthly budget: $380000

Stress scenarios

Normal night

baseline

Standard call volume, mix of P1/P2/P3, no failures.

Mass casualty event

spike

Multi-vehicle crash on freeway. 3.5x call volume in 20 minutes. P1 cannot queue.

Dispatch radio failure

outage

Primary radio system fails. Fall back to phone without losing active assignments.

Receiving hospital on divert

latency

Nearest hospital ED at capacity. EMS reroutes, adds transport time.
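The divert scenario amounts to choosing the nearest hospital that is still accepting patients. A minimal sketch, assuming each hospital is described by a hypothetical `(name, minutes_away, on_divert)` tuple; the names and travel times in the usage example are invented for illustration:

```python
def pick_hospital(hospitals):
    """Return the nearest hospital not on divert, or None if all are diverting.

    hospitals: iterable of (name, minutes_away, on_divert) tuples.
    """
    accepting = [h for h in hospitals if not h[2]]
    if not accepting:
        return None
    # Nearest by transport time among those still accepting patients.
    return min(accepting, key=lambda h: h[1])


# Illustrative data only: the nearest ED is on divert, so EMS goes further.
hospitals = [
    ("General", 6, True),
    ("St. Mary", 11, False),
    ("County", 15, False),
]
choice = pick_hospital(hospitals)
extra_minutes = choice[1] - min(h[1] for h in hospitals)
```

Here `extra_minutes` is the added transport time the scenario description refers to.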

Pass criteria (overall)

Min stability score: 70
Max drop rate: 5.0%
Min delivery rate: 92.0%
Max errors: 5

Submit your run

Submissions go through the chini-bench CLI. It calls your model with your key, scores the result locally, and posts to the leaderboard. Apart from the request to your model provider, nothing leaves your machine except the canvas the model produces.

End-to-end:

pip install git+https://github.com/collapseindex/chini-bench-cli.git
export OPENROUTER_API_KEY=...

chini-bench run chini-027-911-dispatch \
  --provider openrouter --model google/gemini-2.0-flash-001 \
  --as alice

Or inspect the prompt first:

chini-bench prompt chini-027-911-dispatch

Providers: openai · anthropic · google · openrouter · ollama

Leaderboard

Rank	Submitter	Model	Score	Stability	Delivery	Design	Pass
#1	alex	x-ai/grok-4.20 default single-shot	87	75.0	83.0	100.0	✗
#2	alex	openai/gpt-5.4 default single-shot	85	61.0	100.0	100.0	✗
#3	alex	anthropic/claude-sonnet-4.6 default reflexion	81	64.0	100.0	100.0	✗
#4	alex	google/gemini-3.1-pro-preview default single-shot	74	43.0	75.0	100.0	✗
#5	alex	openai/gpt-5.4 default reflexion	65	17.0	100.0	100.0	✗
#6	alex	anthropic/claude-sonnet-4.6 default single-shot	61	6.0	75.0	100.0	✗
#7	alex	x-ai/grok-4.20 default reflexion	55	0.0	75.0	100.0	✗
#8	alex	google/gemini-3.1-pro-preview default reflexion	44	3.0	26.0	100.0	✗

Per-scenario breakdown of the top run

Scenario	Health	Drop rate	Delivered	Pass
baseline	76.0	0.6%	225	✓
mass-casualty	79.0	1.0%	637	✓
radio-down	67.0	0.7%	96	✗
hospital-divert	77.0	0.5%	184	✓
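The page does not spell out how the per-scenario Pass column is derived. The sketch below is a guess, not the benchmark's scoring code: it assumes a scenario passes when its Health meets the minimum stability score and its drop rate stays within the maximum. That assumption reproduces the Pass column above, but the delivery-rate and error thresholds are not modeled.

```python
MIN_STABILITY = 70.0  # "Min stability score" from the pass criteria
MAX_DROP_RATE = 5.0   # "Max drop rate", in percent

# (scenario, health, drop rate %) copied from the top run's breakdown.
top_run = [
    ("baseline", 76.0, 0.6),
    ("mass-casualty", 79.0, 1.0),
    ("radio-down", 67.0, 0.7),
    ("hospital-divert", 77.0, 0.5),
]


def scenario_passes(health, drop_rate):
    """Assumed rule: both thresholds must hold for a scenario to pass."""
    return health >= MIN_STABILITY and drop_rate <= MAX_DROP_RATE


results = {name: scenario_passes(h, d) for name, h, d in top_run}
failed = [name for name, ok in results.items() if not ok]
```

Under this reading, `radio-down` is the only failing scenario, with a Health of 67.0 against a minimum of 70.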
