chini-011-cafe-morning-rush

Cafe Morning Rush

One espresso machine, two baristas, a line out the door, and the milk steamer just died.

Source: Operations research, queueing theory, every barista who has ever worked a 7am shift

Prompt

Design the workflow for a small specialty cafe handling a 7am-9am rush.

Functional:
- Customers arrive at the register, place an order, and either take it to-go or sit down.
- Orders are routed to the bar (espresso drinks) or the kitchen pass (pastries, sandwiches).
- Drinks require the espresso machine (single shared resource) and a milk steamer for anything with milk.
- Completed orders are called out by name and handed off.
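The drink-vs-food routing above is the `split` behavior. Here is a minimal Python sketch, assuming a hypothetical menu (`BAR_ITEMS`, `MILK_ITEMS`, `KITCHEN_ITEMS`) and a hypothetical `split_order` helper. Neither is part of the Chinilla CanvasState format. The sketch also flags whether an order needs the milk steamer, which the steamer failover requirement depends on.

```python
from dataclasses import dataclass, field

# Hypothetical menu tags for illustration; a real cafe would define its own.
BAR_ITEMS = {"espresso", "americano", "latte", "cappuccino", "flat white"}
MILK_ITEMS = {"latte", "cappuccino", "flat white"}
KITCHEN_ITEMS = {"croissant", "muffin", "sandwich"}


@dataclass
class Routed:
    bar: list = field(default_factory=list)      # items that need the espresso machine
    kitchen: list = field(default_factory=list)  # pastries and sandwiches for the pass
    needs_steamer: bool = False                  # True if any bar item has milk


def split_order(items):
    """Split one order into bar and kitchen-pass tickets (hypothetical helper)."""
    routed = Routed()
    for item in items:
        if item in BAR_ITEMS:
            routed.bar.append(item)
            if item in MILK_ITEMS:
                routed.needs_steamer = True
        elif item in KITCHEN_ITEMS:
            routed.kitchen.append(item)
        else:
            raise ValueError(f"unknown menu item: {item}")
    return routed


order = split_order(["latte", "croissant", "espresso"])
print(order.bar, order.kitchen, order.needs_steamer)
# ['latte', 'espresso'] ['croissant'] True
```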

Non-functional:
- A 4x arrival burst at 8am must not drive customers to walk out; keep the drop rate low.
- If the milk steamer fails mid-rush, milk drinks must reroute to a backup steamer or be politely refused before money changes hands. The shop cannot just stop serving.
- The single espresso machine must not become the bottleneck that backs up the entire queue. Batch where possible.
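The steamer requirement amounts to a circuit breaker that is checked at the register, before payment. The sketch below rests on stated assumptions: `SteamerBreaker`, `accept_milk_order`, the three-failure threshold, and the 120-second cooldown are hypothetical choices, not values from the challenge. The caller passes in times as plain seconds, which keeps the logic easy to test.

```python
class SteamerBreaker:
    """Consecutive-failure circuit breaker around one milk steamer (hypothetical)."""

    def __init__(self, failure_threshold=3, cooldown_s=120.0):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.opened_at = None  # None means the breaker is closed

    def allows(self, now):
        if self.opened_at is None:
            return True
        # After the cooldown, allow trial orders through (half-open);
        # the next recorded result closes or re-opens the breaker.
        return now - self.opened_at >= self.cooldown_s

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self, now):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = now  # trip (or re-trip) the breaker


def accept_milk_order(primary, backup, now):
    """Decide at the register, before payment, which steamer gets the drink."""
    if primary.allows(now):
        return "primary"
    if backup.allows(now):
        return "backup"
    return "refuse"  # decline milk drinks; black coffee and food still sell


primary, backup = SteamerBreaker(), SteamerBreaker()
for t in (0, 5, 10):
    primary.record_failure(now=t)  # three failed steams trip the primary
print(accept_milk_order(primary, backup, now=15))  # backup
```

Because the decision happens at order time, a refused milk drink never reaches the bar. That is the "politely refused before money changes hands" path.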

Return a Chinilla CanvasState. Components are people, machines, and physical stations. Even though the components are physical, the behaviors are the same primitives: queue (the line), retry (re-pull a bad shot), ratelimit (cap drink complexity at peak), circuitbreaker (steamer failover), storage (pastry case), split (drink vs food routing).
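One plausible reading of the `ratelimit` behavior, "cap drink complexity at peak", is a token bucket on complex drinks. The benchmark may model it differently. `TokenBucket`, the rate of 2 complex drinks per minute, and the burst of 3 are all hypothetical. Time is measured in minutes.

```python
class TokenBucket:
    """Allow `rate` complex drinks per minute on average, with bursts up to `capacity`."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start full so the first burst is allowed
        self.last = 0.0

    def try_take(self, now):
        # Refill for the time elapsed since the last check, capped at capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


complex_drinks = TokenBucket(rate=2.0, capacity=3)  # hypothetical peak cap
print([complex_drinks.try_take(0.0) for _ in range(4)])  # [True, True, True, False]
```

When `try_take` returns False, the register could suggest a simpler drink rather than queue a long build behind the single espresso machine. That policy is also an assumption.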

Constraints

Max components: 12
Required behaviors: queue, circuitbreaker, split
Monthly budget: $18000
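Before submitting, you can check the component cap and the required behaviors. The design below is a hypothetical list of dicts, not the real CanvasState schema, which this page does not show. The budget is not checked because the cost model is not described here.

```python
MAX_COMPONENTS = 12
REQUIRED_BEHAVIORS = {"queue", "circuitbreaker", "split"}

# Hypothetical design; the real CanvasState schema is not shown on this page.
design = [
    {"name": "register line", "behavior": "queue"},
    {"name": "order router", "behavior": "split"},
    {"name": "espresso machine", "behavior": "retry"},
    {"name": "primary steamer", "behavior": "circuitbreaker"},
    {"name": "backup steamer", "behavior": None},
    {"name": "pastry case", "behavior": "storage"},
    {"name": "peak drink cap", "behavior": "ratelimit"},
]


def check_constraints(components):
    """Return a list of constraint violations (empty list means OK)."""
    problems = []
    if len(components) > MAX_COMPONENTS:
        problems.append(f"{len(components)} components exceeds the cap of {MAX_COMPONENTS}")
    present = {c["behavior"] for c in components if c["behavior"]}
    missing = REQUIRED_BEHAVIORS - present
    if missing:
        problems.append(f"missing required behaviors: {sorted(missing)}")
    return problems


print(check_constraints(design))  # []
```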

Stress scenarios

Steady morning

baseline

Normal arrival rate, no failures. Most orders are drinks, some food.

8am rush

spike

Arrival rate quadruples for the peak window. Line should not collapse.

Milk steamer fails

outage

Primary steamer dies mid-rush. Milk drinks must reroute or be rejected at order time.

Espresso machine warming up

latency

Espresso machine takes longer than usual per shot. The bar must absorb the slowdown without backing up the register.
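The scenarios are easiest to reason about through the single espresso machine's utilization, ρ = λ/μ (arrival rate over service rate). When ρ > 1, the line grows by roughly (λ - μ) orders per minute. Every rate below is hypothetical: 1 drink/min at baseline, 4 drinks/min in the 8am rush, 2 drinks/min pulling one at a time, 3 drinks/min with batching, and 1.5 drinks/min while the machine is slow. The 30-minute peak window is also an assumption.

```python
def utilization(arrivals_per_min, service_per_min):
    """rho = lambda / mu; above 1.0 the machine cannot keep up."""
    return arrivals_per_min / service_per_min


def backlog_after(arrivals_per_min, service_per_min, minutes):
    """Fluid approximation: excess arrivals pile up linearly; never negative."""
    return max(0.0, (arrivals_per_min - service_per_min) * minutes)


# Hypothetical rates in drinks per minute; none of these come from the challenge.
BASELINE, RUSH = 1.0, 4.0
SINGLE, BATCHED, SLOW = 2.0, 3.0, 1.5
PEAK_MINUTES = 30

print(utilization(BASELINE, SINGLE))                # 0.5
print(backlog_after(RUSH, SINGLE, PEAK_MINUTES))    # 60.0
print(backlog_after(RUSH, BATCHED, PEAK_MINUTES))   # 30.0
print(backlog_after(BASELINE, SLOW, PEAK_MINUTES))  # 0.0
```

Under these assumed numbers, batching halves the rush backlog but does not eliminate it, so the register queue and a peak ratelimit still have to absorb the overflow. The slow machine on its own stays stable at baseline load (ρ ≈ 0.67).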

Pass criteria (overall)

Min stability score: 65
Max drop rate: 8.0%
Min delivery rate: 88.0%
Max errors: 6
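The overall pass criteria can be expressed as a single function. This sketch assumes that "Min" and "Max" are inclusive bounds and that each metric uses the scale shown: stability from 0 to 100, rates in percent. `RunResult` and `failed_criteria` are hypothetical names, and the real scorer may apply further checks, for example per scenario.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    stability: float      # overall stability score, 0-100
    drop_rate_pct: float  # percent of customers dropped
    delivery_pct: float   # percent of orders delivered
    errors: int


def failed_criteria(r):
    """Return the names of overall criteria the run misses (empty list = pass)."""
    checks = {
        "stability": r.stability >= 65,
        "drop rate": r.drop_rate_pct <= 8.0,
        "delivery": r.delivery_pct >= 88.0,
        "errors": r.errors <= 6,
    }
    return [name for name, ok in checks.items() if not ok]


# Hypothetical run values for illustration.
print(failed_criteria(RunResult(80.0, 1.5, 100.0, 2)))  # []
print(failed_criteria(RunResult(61.0, 2.0, 100.0, 0)))  # ['stability']
```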

Submit your run

Submissions go through the chini-bench CLI. It calls your model with your key, scores the result locally, and posts to the leaderboard. Nothing leaves your machine except the canvas it produces.

End-to-end:

pip install git+https://github.com/collapseindex/chini-bench-cli.git
export OPENROUTER_API_KEY=...

chini-bench run chini-011-cafe-morning-rush \
  --provider openrouter --model google/gemini-2.0-flash-001 \
  --as alice

Or inspect the prompt first:

chini-bench prompt chini-011-cafe-morning-rush

Providers: openai · anthropic · google · openrouter · ollama

Leaderboard

Rank	Submitter	Model	Score	Stability	Delivery	Design	Pass
#1	rl_v06_run2	rl_policy custom single-shot	91	80.0	100.0	85.0	✓
#2	alex	anthropic/claude-sonnet-4.6 default single-shot	90	78.0	100.0	100.0	✓
#3	alex	x-ai/grok-4.20 default single-shot	89	75.0	100.0	100.0	✓
#4	alex	openai/gpt-5.4 default single-shot	89	82.0	100.0	100.0	✗
#5	rl_v06_run1	rl_policy custom single-shot	89	77.0	98.0	85.0	✓
#6	rl_v06_run2	rl_policy custom single-shot	87	82.0	100.0	50.0	✗
#7	rl_v06_run2	rl_policy custom single-shot	87	78.0	96.0	85.0	✗
#8	alex	google/gemini-3.1-pro-preview default reflexion	86	76.0	90.0	100.0	✓
#9	rl_v06_run2	rl_policy custom single-shot	86	76.0	90.0	85.0	✓
#10	rl_v06_run2	rl_policy custom single-shot	86	73.0	100.0	85.0	✗
#11	rl_v06_run2	rl_policy custom single-shot	86	77.0	90.0	85.0	✓
#12	rl_v06_run2	rl_policy custom single-shot	85	69.0	97.0	100.0	✗
#13	rl_v06_run1	rl_policy custom single-shot	84	75.0	100.0	60.0	✗
#14	rl_v06_run2	rl_policy custom single-shot	84	70.0	100.0	60.0	✗
#15	rl_v06_run2	rl_policy custom single-shot	84	81.0	87.0	100.0	✗
#16	alex	x-ai/grok-4.20 default reflexion	82	61.0	100.0	100.0	✗
#17	rl_v06_run2	rl_policy custom single-shot	82	68.0	97.0	60.0	✗
#18	rl_v06_run2	rl_policy custom single-shot	82	74.0	81.0	100.0	✗
#19	rl_v06_run2	rl_policy custom single-shot	82	71.0	100.0	60.0	✗
#20	rl_v06_run1	rl_policy custom single-shot	81	68.0	100.0	100.0	✗
#21	rl_v06_run1	rl_policy custom single-shot	81	75.0	100.0	75.0	✗
#22	rl_v06_run2	rl_policy custom single-shot	81	63.0	100.0	75.0	✗
#23	rl_v06_run2	rl_policy custom single-shot	80	72.0	100.0	50.0	✗
#24	rl_v06_run1	rl_policy custom single-shot	79	79.0	88.0	50.0	✗
#25	rl_v06_run1	rl_policy custom single-shot	79	66.0	83.0	100.0	✗
#26	rl_v06_run1	rl_policy custom single-shot	78	71.0	73.0	85.0	✗
#27	rl_v06_run2	rl_policy custom single-shot	78	80.0	63.0	85.0	✗
#28	rl_v06_run1	rl_policy custom single-shot	77	69.0	75.0	85.0	✗
#29	rl_v06_run2	rl_policy custom single-shot	77	72.0	77.0	75.0	✗
#30	rl_v06_run1	rl_policy custom single-shot	76	60.0	91.0	85.0	✗
#31	rl_v06_run2	rl_policy custom single-shot	76	70.0	70.0	85.0	✗
#32	rl_v06_run1	rl_policy custom single-shot	75	61.0	78.0	85.0	✗
#33	rl_v06_run2	rl_policy custom single-shot	74	58.0	93.0	60.0	✗
#34	rl_v06_run2	rl_policy custom single-shot	74	73.0	83.0	50.0	✗
#35	rl_v06_run2	rl_policy custom single-shot	74	58.0	79.0	85.0	✗
#36	rl_v06_run1	rl_policy custom single-shot	73	47.0	92.0	85.0	✗
#37	rl_v06_run2	rl_policy custom single-shot	73	68.0	63.0	85.0	✗
#38	rl_v06_run2	rl_policy custom single-shot	69	57.0	67.0	85.0	✗
#39	alex	google/gemini-3.1-pro-preview default single-shot	68	49.0	73.0	100.0	✗
#40	rl_v06_run2	rl_policy custom single-shot	68	44.0	81.0	100.0	✗
#41	rl_v06_run2	rl_policy custom single-shot	68	55.0	73.0	60.0	✗
#42	rl_v06_run2	rl_policy custom single-shot	67	62.0	63.0	75.0	✗
#43	rl_v06_run2	rl_policy custom single-shot	67	46.0	74.0	100.0	✗
#44	rl_v06_run2	rl_policy custom single-shot	67	52.0	68.0	100.0	✗
#45	rl_v06_run2	rl_policy custom single-shot	66	72.0	40.0	85.0	✗
#46	rl_v06_run2	rl_policy custom single-shot	66	73.0	44.0	85.0	✗
#47	rl_v06_run2	rl_policy custom single-shot	66	59.0	63.0	75.0	✗
#48	rl_v06_run2	rl_policy custom single-shot	66	55.0	67.0	60.0	✗
#49	rl_v06_run1	rl_policy custom single-shot	64	32.0	84.0	100.0	✗
#50	rl_v06_run2	rl_policy custom single-shot	64	40.0	74.0	100.0	✗
#51	rl_v06_run2	rl_policy custom single-shot	64	61.0	46.0	85.0	✗
#52	rl_v06_run1	rl_policy custom single-shot	60	70.0	47.0	50.0	✗
#53	alex	anthropic/claude-sonnet-4.6 default reflexion	56	8.0	100.0	100.0	✗
#54	rl_v06_run1	rl_policy custom single-shot	56	79.0	25.0	50.0	✗
#55	rl_v06_run2	rl_policy custom single-shot	55	36.0	53.0	100.0	✗
#56	rl_v06_run2	rl_policy custom single-shot	55	49.0	37.0	100.0	✗
#57	alex	openai/gpt-5.4 default reflexion	53	0.0	93.0	100.0	✗
#58	rl_v06_run2	rl_policy custom single-shot	51	43.0	33.0	100.0	✗
#59	rl_v06_run2	rl_policy custom single-shot	50	50.0	22.0	85.0	✗

Per-scenario breakdown of the top run

Scenario	Health	Drop rate	Delivered	Pass
baseline	83.0	0.4%	381	✓
eight-am-spike	80.0	1.6%	1370	✓
steamer-down	74.0	0.0%	293	✓
slow-machine	83.0	0.5%	349	✓
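As a quick sanity check on the table above, you can compute the worst per-scenario drop rate and the total number of orders delivered across all four scenarios. The numbers are copied from the table and nothing else is assumed.

```python
# Rows copied from the per-scenario table: (scenario, health, drop rate %, delivered).
top_run = [
    ("baseline", 83.0, 0.4, 381),
    ("eight-am-spike", 80.0, 1.6, 1370),
    ("steamer-down", 74.0, 0.0, 293),
    ("slow-machine", 83.0, 0.5, 349),
]

worst = max(top_run, key=lambda row: row[2])       # highest drop rate
total_delivered = sum(row[3] for row in top_run)   # orders delivered in all scenarios
print(worst[0], worst[2], total_delivered)  # eight-am-spike 1.6 2393
```

Even the worst scenario, the 8am spike at 1.6%, sits well under the 8.0% overall drop-rate ceiling.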
