chini-014-restaurant-friday-night

Restaurant Friday Night Service

Twelve tables turning every 90 minutes. Three stations on the line. The walk-in just got delivered short on prep.

Source: Restaurant operations, brigade kitchen workflow, every line cook who has been in the weeds

Prompt

Design the front-of-house plus kitchen workflow for a 40-seat dinner restaurant on Friday night.

Functional:
- Hosts seat parties at 12 tables. Server takes order, fires it to the kitchen.
- Kitchen splits orders by station: cold (salads, starters), hot (entrees), pastry (dessert). Each station has one cook.
- Expediter assembles plates from stations, calls server for runner pickup.
- Bar is parallel: cocktails fired direct from server, no kitchen path.

Non-functional:
- A 7pm rush (4x normal arrival) must not blow ticket times past 25 minutes for entrees.
- If hot station falls behind, expediter must pace the cold/pastry stations to stay synchronized. Plates cannot be held under heat lamp more than 5 minutes.
- If walk-in shorted on prep (one entree 86'd), server must be notified to update guests before food fires, not at pickup.

Return a Chinilla CanvasState. Components: hosts, servers, line cooks, expediter, bar. Behaviors: queue (ticket rail), split (station routing), batch (table-by-table coursing), circuitbreaker (86'd item handling), ratelimit (table turn pace).
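
To make the split and circuitbreaker behaviors concrete, here is a minimal Python sketch of station routing with an 86 check at order time. It is not the Chinilla CanvasState format; the Kitchen class, the fire method, and the dish names are hypothetical and exist only for illustration. Only the three station names and the rule that the server is told before food fires come from the prompt.

```python
from dataclasses import dataclass, field

# Hypothetical menu-to-station map. Station names come from the prompt;
# the dishes are made up for illustration.
STATION_OF = {
    "caesar salad": "cold",
    "halibut": "hot",
    "steak": "hot",
    "creme brulee": "pastry",
}


@dataclass
class Kitchen:
    eighty_sixed: set = field(default_factory=set)
    rails: dict = field(
        default_factory=lambda: {"cold": [], "hot": [], "pastry": []}
    )

    def fire(self, table: int, items: list) -> list:
        """Route a ticket to the station rails, or reject it before it fires.

        Returns the unavailable items; an empty list means the ticket fired.
        """
        blocked = [item for item in items if item in self.eighty_sixed]
        if blocked:
            # Circuit-breaker behavior: fail fast at order time so the server
            # can update the guests. Nothing reaches any station.
            return blocked
        for item in items:
            # Split behavior: each item goes to its own station's rail.
            self.rails[STATION_OF[item]].append((table, item))
        return []


kitchen = Kitchen()
kitchen.eighty_sixed.add("halibut")
rejected = kitchen.fire(table=4, items=["caesar salad", "halibut"])
accepted = kitchen.fire(table=5, items=["caesar salad", "steak", "creme brulee"])
```

In this sketch the whole ticket is held back when any item is 86'd, so the salad for table 4 never starts while the server renegotiates the entree. Firing the available items anyway would be an equally valid design choice.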

Constraints

Max components: 14
Required behaviors: queue, split, circuitbreaker
Monthly budget: $75000
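
The first two constraints can be checked mechanically before a design is submitted. The sketch below assumes a purely hypothetical canvas shape (a dict with a "components" list, each component carrying a "behaviors" list); the real CanvasState schema is not shown on this page and may differ. The budget check is left out because the cost model is not described here.

```python
MAX_COMPONENTS = 14
REQUIRED_BEHAVIORS = {"queue", "split", "circuitbreaker"}


def constraint_problems(canvas: dict) -> list:
    """Return human-readable constraint violations; empty means none found.

    The canvas layout used here is an assumption, not the official schema.
    """
    components = canvas.get("components", [])
    problems = []
    if len(components) > MAX_COMPONENTS:
        problems.append(
            f"too many components: {len(components)} > {MAX_COMPONENTS}"
        )
    # Collect every behavior used anywhere on the canvas.
    used = {b for c in components for b in c.get("behaviors", [])}
    missing = REQUIRED_BEHAVIORS - used
    if missing:
        problems.append("missing behaviors: " + ", ".join(sorted(missing)))
    return problems


draft = {
    "components": [
        {"id": "ticket-rail", "behaviors": ["queue"]},
        {"id": "station-router", "behaviors": ["split"]},
    ]
}
problems = constraint_problems(draft)
```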

Stress scenarios

Steady service

baseline

Tables turning at normal pace. All stations operating.

7pm rush

spike

Arrivals 4x baseline. Kitchen must absorb without lamp-time violations.

Halibut 86'd mid-service

outage

Hot station out of a key entree. Server must intercept tickets before they fire.

Hot station behind

latency

Hot station ticket time spikes. Expediter must pace cold/pastry to stay synced.
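
One way to picture the pacing requirement: if the expediter knows roughly how long each station needs for a table, the faster stations can be told to start late so every plate reaches the pass together. The Python sketch below uses made-up cook times and hypothetical function names; only the 5-minute hold limit comes from the prompt.

```python
LAMP_LIMIT_MIN = 5  # from the prompt: max hold under the heat lamp


def pace(cook_minutes: dict) -> dict:
    """Per-station start delays so all plates finish with the slowest station."""
    slowest = max(cook_minutes.values())
    return {station: slowest - m for station, m in cook_minutes.items()}


def lamp_violations(cook_minutes: dict, delays: dict) -> list:
    """Stations whose plate would wait past the limit for the last plate."""
    finish = {s: delays.get(s, 0) + m for s, m in cook_minutes.items()}
    last = max(finish.values())
    return [s for s, t in finish.items() if last - t > LAMP_LIMIT_MIN]


# Illustrative estimates in minutes, with the hot station running behind.
cook = {"cold": 4, "hot": 18, "pastry": 6}

unpaced = lamp_violations(cook, delays={})       # everyone starts at once
paced = lamp_violations(cook, delays=pace(cook))  # fast stations start late
```

With these numbers the unpaced ticket leaves the cold and pastry plates waiting 14 and 12 minutes, while the paced ticket has no hold at all. A real expediter would re-estimate as the hot station's times drift rather than compute the delays once.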

Pass criteria (overall)

Min stability score: 60
Max drop rate: 10.0%
Min delivery rate: 85.0%
Max errors: 8
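
Read as a checklist, the four thresholds above can be expressed as a small predicate. This is a sketch rather than the benchmark's scorer: the RunResult type and its field names are hypothetical, the sample numbers are illustrative, and treating each threshold as inclusive (a stability of exactly 60 passes) is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunResult:
    stability: float      # stability score
    drop_rate: float      # percent of work dropped
    delivery_rate: float  # percent of work delivered
    errors: int


def failed_criteria(r: RunResult) -> list:
    """Names of the overall pass criteria this run misses (empty = pass)."""
    failures = []
    if r.stability < 60:
        failures.append("stability")
    if r.drop_rate > 10.0:
        failures.append("drop rate")
    if r.delivery_rate < 85.0:
        failures.append("delivery rate")
    if r.errors > 8:
        failures.append("errors")
    return failures


# Illustrative numbers, not taken from any leaderboard entry.
good = failed_criteria(RunResult(83.0, 1.5, 100.0, 0))
bad = failed_criteria(RunResult(59.0, 12.0, 84.0, 9))
```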

Submit your run

Submissions go through the chini-bench CLI. It calls your model with your own API key, scores the result locally, and posts to the leaderboard. Apart from the request to your model provider, only the canvas it produces leaves your machine.

End-to-end:

pip install git+https://github.com/collapseindex/chini-bench-cli.git
export OPENROUTER_API_KEY=...

chini-bench run chini-014-restaurant-friday-night \
  --provider openrouter --model google/gemini-2.0-flash-001 \
  --as alice

Or inspect the prompt first:

chini-bench prompt chini-014-restaurant-friday-night

Providers: openai · anthropic · google · openrouter · ollama

Leaderboard

Rank	Submitter	Model	Score	Stability	Delivery	Design	Pass
#1	alex	x-ai/grok-4.20 default single-shot	92	83.0	100.0	100.0	✓
#2	rl_v06_run1	rl_policy custom single-shot	88	83.0	94.0	75.0	✗
#3	rl_v06_run2	rl_policy custom single-shot	88	82.0	95.0	75.0	✗
#4	rl_v06_run2	rl_policy custom single-shot	87	80.0	95.0	75.0	✗
#5	rl_v06_run2	rl_policy custom single-shot	87	76.0	100.0	75.0	✗
#6	rl_v06_run2	rl_policy custom single-shot	85	83.0	100.0	50.0	✗
#7	rl_v06_run2	rl_policy custom single-shot	85	77.0	94.0	75.0	✗
#8	rl_v06_run2	rl_policy custom single-shot	85	75.0	97.0	75.0	✗
#9	rl_v06_run1	rl_policy custom single-shot	84	75.0	100.0	75.0	✗
#10	rl_v06_run2	rl_policy custom single-shot	84	73.0	90.0	85.0	✓
#11	rl_v06_run2	rl_policy custom single-shot	84	78.0	89.0	75.0	✗
#12	rl_v06_run2	rl_policy custom single-shot	84	68.0	96.0	100.0	✓
#13	rl_v06_run2	rl_policy custom single-shot	82	68.0	100.0	75.0	✗
#14	rl_v06_run2	rl_policy custom single-shot	81	68.0	100.0	75.0	✗
#15	rl_v06_run2	rl_policy custom single-shot	81	70.0	99.0	60.0	✗
#16	rl_v06_run2	rl_policy custom single-shot	80	60.0	94.0	85.0	✗
#17	rl_v06_run1	rl_policy custom single-shot	79	53.0	100.0	85.0	✗
#18	rl_v06_run2	rl_policy custom single-shot	79	66.0	83.0	85.0	✗
#19	rl_v06_run2	rl_policy custom single-shot	78	53.0	98.0	100.0	✗
#20	rl_v06_run2	rl_policy custom single-shot	77	59.0	98.0	75.0	✗
#21	rl_v06_run2	rl_policy custom single-shot	75	77.0	66.0	75.0	✗
#22	rl_v06_run2	rl_policy custom single-shot	74	48.0	100.0	100.0	✗
#23	rl_v06_run2	rl_policy custom single-shot	74	80.0	60.0	85.0	✗
#24	rl_v06_run1	rl_policy custom single-shot	71	68.0	64.0	75.0	✗
#25	rl_v06_run2	rl_policy custom single-shot	71	57.0	71.0	85.0	✗
#26	rl_v06_run2	rl_policy custom single-shot	71	56.0	73.0	85.0	✗
#27	rl_v06_run1	rl_policy custom single-shot	69	60.0	63.0	85.0	✗
#28	rl_v06_run2	rl_policy custom single-shot	69	49.0	85.0	100.0	✗
#29	rl_v06_run2	rl_policy custom single-shot	67	43.0	86.0	100.0	✗
#30	rl_v06_run2	rl_policy custom single-shot	67	32.0	100.0	85.0	✗
#31	rl_v06_run1	rl_policy custom single-shot	66	63.0	50.0	85.0	✗
#32	rl_v06_run1	rl_policy custom single-shot	66	81.0	55.0	50.0	✗
#33	rl_v06_run2	rl_policy custom single-shot	66	64.0	49.0	85.0	✗
#34	rl_v06_run2	rl_policy custom single-shot	64	67.0	55.0	75.0	✗
#35	alex	x-ai/grok-4.20 default reflexion	63	53.0	62.0	100.0	✗
#36	alex	google/gemini-3.1-pro-preview default reflexion	63	52.0	62.0	100.0	✗
#37	rl_v06_run2	rl_policy custom single-shot	63	57.0	58.0	85.0	✗
#38	rl_v06_run2	rl_policy custom single-shot	63	34.0	85.0	85.0	✗
#39	rl_v06_run1	rl_policy custom single-shot	62	54.0	57.0	60.0	✗
#40	alex	openai/gpt-5.4 default reflexion	60	19.0	100.0	100.0	✗
#41	rl_v06_run1	rl_policy custom single-shot	59	20.0	93.0	85.0	✗
#42	rl_v06_run2	rl_policy custom single-shot	59	50.0	47.0	100.0	✗
#43	rl_v06_run1	rl_policy custom single-shot	58	53.0	39.0	85.0	✗
#44	rl_v06_run2	rl_policy custom single-shot	56	83.0	17.0	50.0	✗
#45	rl_v06_run2	rl_policy custom single-shot	56	48.0	42.0	85.0	✗
#46	rl_v06_run1	rl_policy custom single-shot	55	54.0	38.0	85.0	✗
#47	rl_v06_run1	rl_policy custom single-shot	52	50.0	33.0	85.0	✗
#48	rl_v06_run1	rl_policy custom single-shot	49	31.0	49.0	85.0	✗
#49	rl_v06_run2	rl_policy custom single-shot	49	38.0	33.0	85.0	✗
#50	rl_v06_run2	rl_policy custom single-shot	48	44.0	31.0	85.0	✗
#51	rl_v06_run2	rl_policy custom single-shot	46	29.0	44.0	60.0	✗
#52	rl_v06_run1	rl_policy custom single-shot	45	1.0	69.0	85.0	✗
#53	rl_v06_run2	rl_policy custom single-shot	44	34.0	33.0	100.0	✗
#54	rl_v06_run1	rl_policy custom single-shot	42	49.0	0.0	60.0	✗
#55	rl_v06_run2	rl_policy custom single-shot	39	2.0	53.0	85.0	✗
#56	alex	openai/gpt-5.4 default single-shot	27	0.0	19.0	100.0	✗
#57	alex	anthropic/claude-sonnet-4.6 default single-shot	20	0.0	0.0	75.0	✗
#58	alex	google/gemini-3.1-pro-preview default single-shot	20	0.0	0.0	75.0	✗
#59	alex	anthropic/claude-sonnet-4.6 default reflexion	14	0.0	0.0	75.0	✗

Per-scenario breakdown of the top run

Scenario	Health	Drop rate	Delivered	Pass
baseline	83.0	1.8%	493	✓
rush	86.0	1.5%	1836	✓
item-86	77.0	1.1%	364	✓
slow-cook	84.0	1.0%	472	✓
