chini-011-cafe-morning-rush

Cafe Morning Rush

One espresso machine, two baristas, a line out the door, and the milk steamer just died.

Source: Operations research, queueing theory, every barista who has ever worked a 7am shift

Prompt

Design the workflow for a small specialty cafe handling a 7am-9am rush.

Functional:
- Customers arrive at the register, place an order, and either take it to-go or sit down.
- Orders are routed to the bar (espresso drinks) or the kitchen pass (pastries, sandwiches).
- Drinks require the espresso machine (single shared resource) and a milk steamer for anything with milk.
- Completed orders are called out by name and handed off.

Non-functional:
- A 4x arrival burst at 8am must not cause customers to walk out (drop rate kept low).
- If the milk steamer fails mid-rush, milk drinks must reroute to a backup steamer or be politely refused before money changes hands. The shop cannot just stop serving.
- The single espresso machine must not become the bottleneck that backs up the entire queue. Batch where possible.
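
The steamer requirement above reads like a circuit breaker placed in front of the primary steamer. The following is a minimal Python sketch of that idea, not the Chinilla CanvasState format: the names `Steamer`, `SteamerBreaker`, `threshold`, and `can_accept_milk_order` are hypothetical and chosen only for illustration. After a few consecutive failures the breaker stops trying the primary, sends milk drinks to the backup, and lets the register refuse milk orders before payment when no steamer is usable.

```python
from dataclasses import dataclass


@dataclass
class Steamer:
    """Hypothetical stand-in for a physical milk steamer."""
    name: str
    working: bool = True

    def steam(self) -> str:
        if not self.working:
            raise RuntimeError(f"{self.name} is down")
        return f"milk steamed on {self.name}"


class SteamerBreaker:
    """Opens after `threshold` consecutive primary failures (assumed policy)."""

    def __init__(self, primary, backup=None, threshold=2):
        self.primary = primary
        self.backup = backup
        self.threshold = threshold
        self.failures = 0

    @property
    def open(self) -> bool:
        return self.failures >= self.threshold

    def can_accept_milk_order(self) -> bool:
        # Checked at the register, before money changes hands.
        if not self.open:
            return True
        return self.backup is not None and self.backup.working

    def steam(self) -> str:
        if not self.open:
            try:
                result = self.primary.steam()
                self.failures = 0  # a success resets the failure count
                return result
            except RuntimeError:
                self.failures += 1
        # Primary failed or breaker is open: fall back to the backup.
        if self.backup is not None and self.backup.working:
            return self.backup.steam()
        raise RuntimeError("no steamer available")


primary, backup = Steamer("primary"), Steamer("backup")
breaker = SteamerBreaker(primary, backup)
primary.working = False
print(breaker.steam())                  # falls back to the backup
print(breaker.can_accept_milk_order())  # still True while the backup works
```

This sketch omits the half-open state that would let the primary steamer rejoin after a repair, and the breaker needs `threshold` failed attempts before it trips, so with no backup a milk order could still be accepted just before the outage is detected.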

Return a Chinilla CanvasState. Components are people, machines, and physical stations. Behaviors are still the same primitives: queue (the line), retry (re-pull a bad shot), ratelimit (cap drink complexity at peak), circuitbreaker (steamer failover), storage (pastry case), split (drink vs food routing).

Constraints

Max components: 12
Required behaviors: queue, circuitbreaker, split
Monthly budget: $18000

Stress scenarios

Steady morning (baseline): Normal arrival rate, no failures. Most orders are drinks, some food.

8am rush (spike): Arrival rate quadruples for the peak window. Line should not collapse.

Milk steamer fails (outage): Primary steamer dies mid-rush. Milk drinks must reroute or be rejected at order time.

Espresso machine warming up (latency): Espresso machine takes longer than usual per shot. Bar must absorb without backing up the register.
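
To build intuition for the 8am rush scenario, here is a rough Python sketch of a bounded line fed one minute at a time. It is not the simulator that chini-bench uses. The function name `simulate_line`, the arrival rates, the service capacities, and the line limit of 30 are all made-up illustrative values. Customers who find the line full walk out and count as drops.

```python
def simulate_line(arrivals_per_min, served_per_min, max_line):
    """Return (served, dropped) for a bounded line stepped one minute at a time."""
    line = served = dropped = 0
    for arriving in arrivals_per_min:
        # Customers join only while there is room in the line.
        admitted = min(arriving, max_line - line)
        dropped += arriving - admitted
        line += admitted
        # The bar clears up to `served_per_min` orders this minute.
        done = min(line, served_per_min)
        served += done
        line -= done
    return served, dropped


# Illustrative two-hour morning: 2 arrivals/min, with a 4x burst for 15 minutes.
arrivals = [2] * 60 + [8] * 15 + [2] * 45

for capacity in (4, 6):
    served, dropped = simulate_line(arrivals, served_per_min=capacity, max_line=30)
    drop_rate = 100 * dropped / sum(arrivals)
    print(f"capacity={capacity}/min served={served} dropped={dropped} "
          f"drop_rate={drop_rate:.1f}%")
```

In this toy model the drop rate during the burst depends on two things: how far service capacity falls short of the peak arrival rate, and how much line length is available to absorb the difference before capacity catches up.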

Pass criteria (overall)

Min stability score: 65
Max drop rate: 8.0%
Min delivery rate: 88.0%
Max errors: 6
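
The four thresholds above can be read as a single pass/fail check. The Python sketch below assumes inclusive comparisons and says nothing about how chini-bench combines the per-scenario numbers into overall values. `PassCriteria` and `passes` are hypothetical names, not part of the CLI.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PassCriteria:
    """Thresholds copied from the pass criteria listed above."""
    min_stability: float = 65.0
    max_drop_rate_pct: float = 8.0
    min_delivery_rate_pct: float = 88.0
    max_errors: int = 6


def passes(stability, drop_rate_pct, delivery_rate_pct, errors,
           criteria=PassCriteria()):
    """Return True when every threshold is met (inclusive bounds assumed)."""
    return (
        stability >= criteria.min_stability
        and drop_rate_pct <= criteria.max_drop_rate_pct
        and delivery_rate_pct >= criteria.min_delivery_rate_pct
        and errors <= criteria.max_errors
    )


# Made-up inputs for illustration only.
print(passes(stability=70.0, drop_rate_pct=2.0, delivery_rate_pct=95.0, errors=1))
print(passes(stability=60.0, drop_rate_pct=2.0, delivery_rate_pct=95.0, errors=1))
```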

Submit your run

Submissions go through the chini-bench CLI. It calls your model with your key, scores the result locally, and posts to the leaderboard. Nothing leaves your machine except the canvas it produces.

End-to-end:

    pip install git+https://github.com/collapseindex/chini-bench-cli.git
    export OPENROUTER_API_KEY=...

    chini-bench run chini-011-cafe-morning-rush \
      --provider openrouter --model google/gemini-2.0-flash-001 \
      --as alice

Or inspect the prompt first:

    chini-bench prompt chini-011-cafe-morning-rush

Providers: openai · anthropic · google · openrouter · ollama

Leaderboard

Rank  Submitter    Model                                                Score  Stability  Delivery  Design
#1    rl_v06_run2  rl_policy (custom single-shot)                       91     80.0       100.0     85.0
#2    alex         anthropic/claude-sonnet-4.6 (default single-shot)    90     78.0       100.0     100.0
#3    alex         x-ai/grok-4.20 (default single-shot)                 89     75.0       100.0     100.0
#4    alex         openai/gpt-5.4 (default single-shot)                 89     82.0       100.0     100.0
#5    rl_v06_run1  rl_policy (custom single-shot)                       89     77.0       98.0      85.0
#6    rl_v06_run2  rl_policy (custom single-shot)                       87     82.0       100.0     50.0
#7    rl_v06_run2  rl_policy (custom single-shot)                       87     78.0       96.0      85.0
#8    alex         google/gemini-3.1-pro-preview (default reflexion)    86     76.0       90.0      100.0
#9    rl_v06_run2  rl_policy (custom single-shot)                       86     76.0       90.0      85.0
#10   rl_v06_run2  rl_policy (custom single-shot)                       86     73.0       100.0     85.0
#11   rl_v06_run2  rl_policy (custom single-shot)                       86     77.0       90.0      85.0
#12   rl_v06_run2  rl_policy (custom single-shot)                       85     69.0       97.0      100.0
#13   rl_v06_run1  rl_policy (custom single-shot)                       84     75.0       100.0     60.0
#14   rl_v06_run2  rl_policy (custom single-shot)                       84     70.0       100.0     60.0
#15   rl_v06_run2  rl_policy (custom single-shot)                       84     81.0       87.0      100.0
#16   alex         x-ai/grok-4.20 (default reflexion)                   82     61.0       100.0     100.0
#17   rl_v06_run2  rl_policy (custom single-shot)                       82     68.0       97.0      60.0
#18   rl_v06_run2  rl_policy (custom single-shot)                       82     74.0       81.0      100.0
#19   rl_v06_run2  rl_policy (custom single-shot)                       82     71.0       100.0     60.0
#20   rl_v06_run1  rl_policy (custom single-shot)                       81     68.0       100.0     100.0
#21   rl_v06_run1  rl_policy (custom single-shot)                       81     75.0       100.0     75.0
#22   rl_v06_run2  rl_policy (custom single-shot)                       81     63.0       100.0     75.0
#23   rl_v06_run2  rl_policy (custom single-shot)                       80     72.0       100.0     50.0
#24   rl_v06_run1  rl_policy (custom single-shot)                       79     79.0       88.0      50.0
#25   rl_v06_run1  rl_policy (custom single-shot)                       79     66.0       83.0      100.0
#26   rl_v06_run1  rl_policy (custom single-shot)                       78     71.0       73.0      85.0
#27   rl_v06_run2  rl_policy (custom single-shot)                       78     80.0       63.0      85.0
#28   rl_v06_run1  rl_policy (custom single-shot)                       77     69.0       75.0      85.0
#29   rl_v06_run2  rl_policy (custom single-shot)                       77     72.0       77.0      75.0
#30   rl_v06_run1  rl_policy (custom single-shot)                       76     60.0       91.0      85.0
#31   rl_v06_run2  rl_policy (custom single-shot)                       76     70.0       70.0      85.0
#32   rl_v06_run1  rl_policy (custom single-shot)                       75     61.0       78.0      85.0
#33   rl_v06_run2  rl_policy (custom single-shot)                       74     58.0       93.0      60.0
#34   rl_v06_run2  rl_policy (custom single-shot)                       74     73.0       83.0      50.0
#35   rl_v06_run2  rl_policy (custom single-shot)                       74     58.0       79.0      85.0
#36   rl_v06_run1  rl_policy (custom single-shot)                       73     47.0       92.0      85.0
#37   rl_v06_run2  rl_policy (custom single-shot)                       73     68.0       63.0      85.0
#38   rl_v06_run2  rl_policy (custom single-shot)                       69     57.0       67.0      85.0
#39   alex         google/gemini-3.1-pro-preview (default single-shot)  68     49.0       73.0      100.0
#40   rl_v06_run2  rl_policy (custom single-shot)                       68     44.0       81.0      100.0
#41   rl_v06_run2  rl_policy (custom single-shot)                       68     55.0       73.0      60.0
#42   rl_v06_run2  rl_policy (custom single-shot)                       67     62.0       63.0      75.0
#43   rl_v06_run2  rl_policy (custom single-shot)                       67     46.0       74.0      100.0
#44   rl_v06_run2  rl_policy (custom single-shot)                       67     52.0       68.0      100.0
#45   rl_v06_run2  rl_policy (custom single-shot)                       66     72.0       40.0      85.0
#46   rl_v06_run2  rl_policy (custom single-shot)                       66     73.0       44.0      85.0
#47   rl_v06_run2  rl_policy (custom single-shot)                       66     59.0       63.0      75.0
#48   rl_v06_run2  rl_policy (custom single-shot)                       66     55.0       67.0      60.0
#49   rl_v06_run1  rl_policy (custom single-shot)                       64     32.0       84.0      100.0
#50   rl_v06_run2  rl_policy (custom single-shot)                       64     40.0       74.0      100.0
#51   rl_v06_run2  rl_policy (custom single-shot)                       64     61.0       46.0      85.0
#52   rl_v06_run1  rl_policy (custom single-shot)                       60     70.0       47.0      50.0
#53   alex         anthropic/claude-sonnet-4.6 (default reflexion)      56     8.0        100.0     100.0
#54   rl_v06_run1  rl_policy (custom single-shot)                       56     79.0       25.0      50.0
#55   rl_v06_run2  rl_policy (custom single-shot)                       55     36.0       53.0      100.0
#56   rl_v06_run2  rl_policy (custom single-shot)                       55     49.0       37.0      100.0
#57   alex         openai/gpt-5.4 (default reflexion)                   53     0.0        93.0      100.0
#58   rl_v06_run2  rl_policy (custom single-shot)                       51     43.0       33.0      100.0
#59   rl_v06_run2  rl_policy (custom single-shot)                       50     50.0       22.0      85.0

Per-scenario breakdown of the top run

Scenario        Health  Drop rate  Delivered
baseline        83.0    0.4%       381
eight-am-spike  80.0    1.6%       1370
steamer-down    74.0    0.0%       293
slow-machine    83.0    0.5%       349