chini-014-restaurant-friday-night
Restaurant Friday Night Service
Eight tables turning every 90 minutes. Three stations on the line. The walk-in just got delivered short on prep.
Source: Restaurant operations, brigade kitchen workflow, every line cook who has been in the weeds
Prompt
Design the front-of-house plus kitchen workflow for a 40-seat dinner restaurant on Friday night.

Functional:
- Hosts seat parties at 12 tables. Server takes order, fires it to the kitchen.
- Kitchen splits orders by station: cold (salads, starters), hot (entrees), pastry (dessert). Each station has one cook.
- Expediter assembles plates from stations, calls server for runner pickup.
- Bar is parallel: cocktails fired direct from server, no kitchen path.

Non-functional:
- A 7pm rush (4x normal arrival) must not blow ticket times past 25 minutes for entrees.
- If hot station falls behind, expediter must pace the cold/pastry stations to stay synchronized. Plates cannot be held under heat lamp more than 5 minutes.
- If walk-in shorted on prep (one entree 86'd), server must be notified to update guests before food fires, not at pickup.

Return a Chinilla CanvasState. Components: hosts, servers, line cooks, expediter, bar. Behaviors: queue (ticket rail), split (station routing), batch (table-by-table coursing), circuitbreaker (86'd item handling), ratelimit (table turn pace).
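The queue, split, and circuitbreaker behaviors can be illustrated outside Chinilla with a short Python sketch. Everything in it is hypothetical: the menu, `STATION_FOR_ITEM`, `EightySixBreaker`, and `Kitchen` are invented for illustration and say nothing about the actual CanvasState format. The breaker is deliberately simplified (no half-open state): once an item is tripped, any ticket containing it is rejected at order entry, before any station fires, so the server can go back to the guest instead of finding out at pickup.

```python
from collections import deque
from dataclasses import dataclass, field

# Hypothetical menu-to-station map; the real menu is not given in the prompt.
STATION_FOR_ITEM = {
    "caesar": "cold",
    "crudo": "cold",
    "halibut": "hot",
    "short_rib": "hot",
    "tart": "pastry",
}


@dataclass
class Ticket:
    table: int
    items: list[str]


class EightySixBreaker:
    """Per-item breaker: closed until an item is 86'd, then fails fast."""

    def __init__(self) -> None:
        self.open_items: set[str] = set()

    def trip(self, item: str) -> None:
        # Walk-in came up short: stop accepting this item.
        self.open_items.add(item)

    def reset(self, item: str) -> None:
        # Prep restocked: close the breaker again.
        self.open_items.discard(item)

    def blocked(self, ticket: Ticket) -> list[str]:
        return [i for i in ticket.items if i in self.open_items]


@dataclass
class Kitchen:
    breaker: EightySixBreaker = field(default_factory=EightySixBreaker)
    # One ticket rail (FIFO queue) per station.
    rails: dict[str, deque] = field(
        default_factory=lambda: {s: deque() for s in ("cold", "hot", "pastry")}
    )

    def fire(self, ticket: Ticket) -> list[str]:
        """Split a ticket onto station rails, or reject it before anything fires.

        Returns the 86'd items on the ticket; an empty list means it fired.
        """
        blocked = self.breaker.blocked(ticket)
        if blocked:
            # Fail fast at order entry; no station sees a partial ticket.
            return blocked
        for item in ticket.items:
            self.rails[STATION_FOR_ITEM[item]].append((ticket.table, item))
        return []


kitchen = Kitchen()
kitchen.breaker.trip("halibut")
print(kitchen.fire(Ticket(table=4, items=["crudo", "halibut"])))  # ['halibut']
print(kitchen.fire(Ticket(table=5, items=["caesar", "short_rib", "tart"])))  # []
print({s: len(r) for s, r in kitchen.rails.items()})  # {'cold': 1, 'hot': 1, 'pastry': 1}
```

Checking every item before enqueueing anything is the design choice that matters here: a ticket is split onto the rails as a whole or not at all, which is what "notified before food fires" requires.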
Constraints
- Max components: 14
- Required behaviors: queue, split, circuitbreaker
- Monthly budget: $75,000
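A design can be sanity-checked against these limits before submitting. This is a minimal sketch assuming each component carries one behavior and a monthly cost; the `Component` shape, its field names, and the cost figures are hypothetical, since the page does not show the CanvasState schema or how the benchmark prices components.

```python
from dataclasses import dataclass

# Limits copied from the Constraints list.
MAX_COMPONENTS = 14
REQUIRED_BEHAVIORS = {"queue", "split", "circuitbreaker"}
MONTHLY_BUDGET_USD = 75_000


@dataclass
class Component:
    # Hypothetical shape; not the real CanvasState schema.
    name: str
    behavior: str
    monthly_cost_usd: float


def constraint_violations(components: list[Component]) -> list[str]:
    """Return human-readable violations; an empty list means all limits hold."""
    problems = []
    if len(components) > MAX_COMPONENTS:
        problems.append(f"{len(components)} components exceeds max of {MAX_COMPONENTS}")
    missing = REQUIRED_BEHAVIORS - {c.behavior for c in components}
    if missing:
        problems.append(f"missing required behaviors: {sorted(missing)}")
    total = sum(c.monthly_cost_usd for c in components)
    if total > MONTHLY_BUDGET_USD:
        problems.append(f"monthly cost ${total:,.0f} exceeds ${MONTHLY_BUDGET_USD:,}")
    return problems


# Hypothetical components and costs, for illustration only.
design = [
    Component("ticket_rail", "queue", 4_000),
    Component("station_router", "split", 3_000),
    Component("eighty_six_board", "circuitbreaker", 1_000),
]
print(constraint_violations(design))      # []
print(constraint_violations(design[:2]))  # ["missing required behaviors: ['circuitbreaker']"]
```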
Stress scenarios
- Steady service (baseline): Tables turning at normal pace. All stations operating.
- 7pm rush (spike): Arrivals 4x baseline. Kitchen must absorb without lamp-time violations.
- Halibut 86'd mid-service (outage): Hot station out of a key entree. Server must intercept tickets before they fire.
- Hot station behind (latency): Hot station ticket time spikes. Expediter must pace cold/pastry to stay synced.
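The pacing rule in the "Hot station behind" scenario comes down to simple arithmetic: fire the slowest station first and hold each faster station back by the difference in cook time, so every plate for a table lands together within the 5-minute lamp limit. This sketch assumes the expediter has a per-station cook-time estimate for the table and that stations not yet fired can still be re-paced; the function names and all minute values are hypothetical.

```python
LAMP_LIMIT_MIN = 5.0  # prompt: no plate under the heat lamp longer than 5 minutes


def fire_offsets(cook_min: dict[str, float]) -> dict[str, float]:
    """Minutes to wait before firing each station so all plates finish together.

    The slowest station fires immediately; every faster station is held back
    by the difference, so with accurate estimates nothing waits on the lamp.
    """
    slowest = max(cook_min.values())
    return {station: slowest - t for station, t in cook_min.items()}


def lamp_waits(fired_at: dict[str, float], cook_min: dict[str, float]) -> dict[str, float]:
    """Minutes each station's plate sits under the lamp waiting for the last one."""
    done = {s: fired_at[s] + cook_min[s] for s in cook_min}
    last = max(done.values())
    return {s: last - t for s, t in done.items()}


def over_lamp_limit(waits: dict[str, float]) -> list[str]:
    """Stations whose plates would exceed the lamp limit."""
    return sorted(s for s, w in waits.items() if w > LAMP_LIMIT_MIN)


# Hypothetical table: one guest orders a salad as a main, the rest hot entrees.
normal = {"cold": 6.0, "hot": 14.0}
plan = fire_offsets(normal)                     # {'cold': 8.0, 'hot': 0.0}

# Hot station falls behind: its ticket time jumps to 22 minutes.
slow = {"cold": 6.0, "hot": 22.0}
stale = lamp_waits(plan, slow)                  # cold waits 8.0 min
repaced = lamp_waits(fire_offsets(slow), slow)  # nobody waits

print(over_lamp_limit(stale))    # ['cold']
print(over_lamp_limit(repaced))  # []
```

Re-pacing only helps for stations that have not fired yet; a plate already cooking when the hot station slips is exactly what the 5-minute lamp limit bounds.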
Pass criteria (overall)
- Min stability score: 60
- Max drop rate: 10.0%
- Min delivery rate: 85.0%
- Max errors: 8
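The overall pass criteria reduce to four threshold comparisons. This sketch assumes the thresholds are inclusive and uses a hypothetical `RunSummary` with invented field names; it is not the chini-bench scorer and checks only the four overall numbers listed, with illustrative inputs rather than real run data.

```python
from dataclasses import dataclass

# Overall thresholds from the pass criteria (assumed inclusive).
MIN_STABILITY = 60
MAX_DROP_RATE_PCT = 10.0
MIN_DELIVERY_RATE_PCT = 85.0
MAX_ERRORS = 8


@dataclass
class RunSummary:
    # Hypothetical field names; the scorer's real output format is not shown here.
    stability: float
    drop_rate_pct: float
    delivery_rate_pct: float
    errors: int


def failed_criteria(run: RunSummary) -> list[str]:
    """Names of the overall criteria this run misses; empty means it passes."""
    failed = []
    if run.stability < MIN_STABILITY:
        failed.append("stability")
    if run.drop_rate_pct > MAX_DROP_RATE_PCT:
        failed.append("drop_rate")
    if run.delivery_rate_pct < MIN_DELIVERY_RATE_PCT:
        failed.append("delivery_rate")
    if run.errors > MAX_ERRORS:
        failed.append("errors")
    return failed


# Illustrative numbers, not taken from any submitted run.
print(failed_criteria(RunSummary(72.0, 4.0, 91.0, 3)))   # []
print(failed_criteria(RunSummary(55.0, 12.5, 91.0, 3)))  # ['stability', 'drop_rate']
```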
Submit your run
Submissions go through the chini-bench CLI. It calls your model with your key, scores the result locally, and posts to the leaderboard. Nothing leaves your machine except the canvas it produces.
End-to-end:
pip install git+https://github.com/collapseindex/chini-bench-cli.git
export OPENROUTER_API_KEY=...
chini-bench run chini-014-restaurant-friday-night \
--provider openrouter --model google/gemini-2.0-flash-001 \
--as alice
Or inspect the prompt first:
chini-bench prompt chini-014-restaurant-friday-night
Providers: openai · anthropic · google · openrouter · ollama
Leaderboard
| Rank | Submitter | Model | Score | Stability | Delivery | Design | Pass |
|---|---|---|---|---|---|---|---|
| #1 | alex | x-ai/grok-4.20 default single-shot | 92 | 83.0 | 100.0 | 100.0 | ✓ |
| #2 | rl_v06_run1 | rl_policy custom single-shot | 88 | 83.0 | 94.0 | 75.0 | ✗ |
| #3 | rl_v06_run2 | rl_policy custom single-shot | 88 | 82.0 | 95.0 | 75.0 | ✗ |
| #4 | rl_v06_run2 | rl_policy custom single-shot | 87 | 80.0 | 95.0 | 75.0 | ✗ |
| #5 | rl_v06_run2 | rl_policy custom single-shot | 87 | 76.0 | 100.0 | 75.0 | ✗ |
| #6 | rl_v06_run2 | rl_policy custom single-shot | 85 | 83.0 | 100.0 | 50.0 | ✗ |
| #7 | rl_v06_run2 | rl_policy custom single-shot | 85 | 77.0 | 94.0 | 75.0 | ✗ |
| #8 | rl_v06_run2 | rl_policy custom single-shot | 85 | 75.0 | 97.0 | 75.0 | ✗ |
| #9 | rl_v06_run1 | rl_policy custom single-shot | 84 | 75.0 | 100.0 | 75.0 | ✗ |
| #10 | rl_v06_run2 | rl_policy custom single-shot | 84 | 73.0 | 90.0 | 85.0 | ✓ |
| #11 | rl_v06_run2 | rl_policy custom single-shot | 84 | 78.0 | 89.0 | 75.0 | ✗ |
| #12 | rl_v06_run2 | rl_policy custom single-shot | 84 | 68.0 | 96.0 | 100.0 | ✓ |
| #13 | rl_v06_run2 | rl_policy custom single-shot | 82 | 68.0 | 100.0 | 75.0 | ✗ |
| #14 | rl_v06_run2 | rl_policy custom single-shot | 81 | 68.0 | 100.0 | 75.0 | ✗ |
| #15 | rl_v06_run2 | rl_policy custom single-shot | 81 | 70.0 | 99.0 | 60.0 | ✗ |
| #16 | rl_v06_run2 | rl_policy custom single-shot | 80 | 60.0 | 94.0 | 85.0 | ✗ |
| #17 | rl_v06_run1 | rl_policy custom single-shot | 79 | 53.0 | 100.0 | 85.0 | ✗ |
| #18 | rl_v06_run2 | rl_policy custom single-shot | 79 | 66.0 | 83.0 | 85.0 | ✗ |
| #19 | rl_v06_run2 | rl_policy custom single-shot | 78 | 53.0 | 98.0 | 100.0 | ✗ |
| #20 | rl_v06_run2 | rl_policy custom single-shot | 77 | 59.0 | 98.0 | 75.0 | ✗ |
| #21 | rl_v06_run2 | rl_policy custom single-shot | 75 | 77.0 | 66.0 | 75.0 | ✗ |
| #22 | rl_v06_run2 | rl_policy custom single-shot | 74 | 48.0 | 100.0 | 100.0 | ✗ |
| #23 | rl_v06_run2 | rl_policy custom single-shot | 74 | 80.0 | 60.0 | 85.0 | ✗ |
| #24 | rl_v06_run1 | rl_policy custom single-shot | 71 | 68.0 | 64.0 | 75.0 | ✗ |
| #25 | rl_v06_run2 | rl_policy custom single-shot | 71 | 57.0 | 71.0 | 85.0 | ✗ |
| #26 | rl_v06_run2 | rl_policy custom single-shot | 71 | 56.0 | 73.0 | 85.0 | ✗ |
| #27 | rl_v06_run1 | rl_policy custom single-shot | 69 | 60.0 | 63.0 | 85.0 | ✗ |
| #28 | rl_v06_run2 | rl_policy custom single-shot | 69 | 49.0 | 85.0 | 100.0 | ✗ |
| #29 | rl_v06_run2 | rl_policy custom single-shot | 67 | 43.0 | 86.0 | 100.0 | ✗ |
| #30 | rl_v06_run2 | rl_policy custom single-shot | 67 | 32.0 | 100.0 | 85.0 | ✗ |
| #31 | rl_v06_run1 | rl_policy custom single-shot | 66 | 63.0 | 50.0 | 85.0 | ✗ |
| #32 | rl_v06_run1 | rl_policy custom single-shot | 66 | 81.0 | 55.0 | 50.0 | ✗ |
| #33 | rl_v06_run2 | rl_policy custom single-shot | 66 | 64.0 | 49.0 | 85.0 | ✗ |
| #34 | rl_v06_run2 | rl_policy custom single-shot | 64 | 67.0 | 55.0 | 75.0 | ✗ |
| #35 | alex | x-ai/grok-4.20 default reflexion | 63 | 53.0 | 62.0 | 100.0 | ✗ |
| #36 | alex | google/gemini-3.1-pro-preview default reflexion | 63 | 52.0 | 62.0 | 100.0 | ✗ |
| #37 | rl_v06_run2 | rl_policy custom single-shot | 63 | 57.0 | 58.0 | 85.0 | ✗ |
| #38 | rl_v06_run2 | rl_policy custom single-shot | 63 | 34.0 | 85.0 | 85.0 | ✗ |
| #39 | rl_v06_run1 | rl_policy custom single-shot | 62 | 54.0 | 57.0 | 60.0 | ✗ |
| #40 | alex | openai/gpt-5.4 default reflexion | 60 | 19.0 | 100.0 | 100.0 | ✗ |
| #41 | rl_v06_run1 | rl_policy custom single-shot | 59 | 20.0 | 93.0 | 85.0 | ✗ |
| #42 | rl_v06_run2 | rl_policy custom single-shot | 59 | 50.0 | 47.0 | 100.0 | ✗ |
| #43 | rl_v06_run1 | rl_policy custom single-shot | 58 | 53.0 | 39.0 | 85.0 | ✗ |
| #44 | rl_v06_run2 | rl_policy custom single-shot | 56 | 83.0 | 17.0 | 50.0 | ✗ |
| #45 | rl_v06_run2 | rl_policy custom single-shot | 56 | 48.0 | 42.0 | 85.0 | ✗ |
| #46 | rl_v06_run1 | rl_policy custom single-shot | 55 | 54.0 | 38.0 | 85.0 | ✗ |
| #47 | rl_v06_run1 | rl_policy custom single-shot | 52 | 50.0 | 33.0 | 85.0 | ✗ |
| #48 | rl_v06_run1 | rl_policy custom single-shot | 49 | 31.0 | 49.0 | 85.0 | ✗ |
| #49 | rl_v06_run2 | rl_policy custom single-shot | 49 | 38.0 | 33.0 | 85.0 | ✗ |
| #50 | rl_v06_run2 | rl_policy custom single-shot | 48 | 44.0 | 31.0 | 85.0 | ✗ |
| #51 | rl_v06_run2 | rl_policy custom single-shot | 46 | 29.0 | 44.0 | 60.0 | ✗ |
| #52 | rl_v06_run1 | rl_policy custom single-shot | 45 | 1.0 | 69.0 | 85.0 | ✗ |
| #53 | rl_v06_run2 | rl_policy custom single-shot | 44 | 34.0 | 33.0 | 100.0 | ✗ |
| #54 | rl_v06_run1 | rl_policy custom single-shot | 42 | 49.0 | 0.0 | 60.0 | ✗ |
| #55 | rl_v06_run2 | rl_policy custom single-shot | 39 | 2.0 | 53.0 | 85.0 | ✗ |
| #56 | alex | openai/gpt-5.4 default single-shot | 27 | 0.0 | 19.0 | 100.0 | ✗ |
| #57 | alex | anthropic/claude-sonnet-4.6 default single-shot | 20 | 0.0 | 0.0 | 75.0 | ✗ |
| #58 | alex | google/gemini-3.1-pro-preview default single-shot | 20 | 0.0 | 0.0 | 75.0 | ✗ |
| #59 | alex | anthropic/claude-sonnet-4.6 default reflexion | 14 | 0.0 | 0.0 | 75.0 | ✗ |
Per-scenario breakdown of the top run
| Scenario | Health | Drop rate | Delivered | Pass |
|---|---|---|---|---|
| baseline | 83.0 | 1.8% | 493 | ✓ |
| rush | 86.0 | 1.5% | 1836 | ✓ |
| item-86 | 77.0 | 1.1% | 364 | ✓ |
| slow-cook | 84.0 | 1.0% | 472 | ✓ |