chini-013-pottery-studio

Pottery Studio Firing Schedule

Two kilns, twenty members, four firing stages, one electrical limit. Don't crack the work.

Source: small ceramics studio operations, kiln scheduling literature, and the author's friend who runs a studio in Brooklyn

Prompt

Design the firing pipeline for a member-run pottery studio.

Functional:
- Members drop greenware on the bisque shelf when ready. Each piece must move bisque -> glaze application (member work) -> glaze firing -> pickup.
- Two kilns: one bisque-only, one glaze-only. Each kiln cycle is 18 hours and runs only when full.
- Members can only glaze pieces that have been bisqued. The bisque shelf and glaze shelf are physically separate.
- The studio places finished work on the pickup rack, labeled with the member's name.
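
As a rough illustration of the queue and batch behaviors described above, here is a minimal Python sketch of a shelf that only releases a load when the kiln would be full. The `Kiln` and `Shelf` classes and the `capacity` value are hypothetical; the brief gives the 18-hour cycle but no load size, and this is not the Chinilla CanvasState format.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Kiln:
    name: str
    capacity: int          # hypothetical: pieces per load, not given in the brief
    cycle_hours: int = 18  # from the brief


@dataclass
class Shelf:
    pieces: deque = field(default_factory=deque)

    def add(self, piece: str) -> None:
        """Queue a piece at the back of the shelf."""
        self.pieces.append(piece)

    def take_full_load(self, kiln: Kiln) -> list:
        """Return one full load in FIFO order, or [] if the kiln would be underloaded."""
        if len(self.pieces) < kiln.capacity:
            return []
        return [self.pieces.popleft() for _ in range(kiln.capacity)]
```

With a capacity of 4, a shelf holding three pieces releases nothing, and the fourth piece triggers a full load in arrival order.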

Non-functional:
- A member-event week (4x normal output) must not cause kilns to skip safety underloading or members to walk out without their work.
- If one kiln fails, the other must NOT be reused for both stages (cross-contamination ruins glaze). The system rate-limits intake instead.
- The shared 200-amp panel cannot run both kilns at once. The schedule must enforce this.
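
One way to picture the electrical constraint is a schedule that serializes firings on the shared panel. The sketch below is an assumption about how such a schedule could work, not the benchmark's own model: `schedule_firings` is a hypothetical helper, time is measured in whole hours, and cooldown or unloading time is not modeled.

```python
CYCLE_HOURS = 18  # from the brief


def schedule_firings(requests, cycle_hours=CYCLE_HOURS):
    """Serialize firings so only one kiln draws from the panel at a time.

    requests: list of (ready_hour, kiln_name) tuples.
    Returns a list of (start_hour, end_hour, kiln_name) with no overlaps.
    """
    panel_free_at = 0
    plan = []
    # Serve requests in order of readiness; ties break on kiln name.
    for ready_hour, kiln in sorted(requests):
        start = max(ready_hour, panel_free_at)  # wait for the panel if it is busy
        end = start + cycle_hours
        plan.append((start, end, kiln))
        panel_free_at = end
    return plan
```

For example, a glaze load that is ready at hour 5 while the bisque kiln is firing from hour 0 would start at hour 18, once the panel is free.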

Return a Chinilla CanvasState. Components are people, kilns, shelves, and the schedule. Behaviors: queue (shelves), batch (kiln cycles), ratelimit (electrical cap), circuitbreaker (kiln failover refusal), split (bisque vs glaze routing).

Constraints

Max components: 12
Required behaviors: queue, batch, ratelimit
Monthly budget: $6000

Stress scenarios

Normal week

baseline

Steady member output, both kilns healthy.

Open studio event

spike

Member volume 4x for the week. Kilns must batch, and shelf overflow must be prevented.

Bisque kiln fails

outage

Bisque kiln down. System must rate-limit intake, not reroute through glaze kiln.
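
The outage rule can be sketched as two small pieces: a fixed stage-to-kiln mapping that refuses any fallback, and an intake gate whose limit shrinks while the bisque kiln is down. Everything here is illustrative; `kiln_for`, `IntakeGate`, the kiln identifiers, and both limits are hypothetical names and values, not part of the benchmark.

```python
from dataclasses import dataclass
from typing import Optional

# Fixed stage-to-kiln mapping from the brief: no kiln ever serves both stages.
KILN_FOR_STAGE = {"bisque": "bisque_kiln", "glaze": "glaze_kiln"}


def kiln_for(stage: str, healthy: set) -> Optional[str]:
    """Return the stage's dedicated kiln if healthy, else None (never the other kiln)."""
    kiln = KILN_FOR_STAGE[stage]
    return kiln if kiln in healthy else None


@dataclass
class IntakeGate:
    normal_limit: int    # hypothetical: pieces accepted per day when healthy
    degraded_limit: int  # hypothetical: pieces accepted per day during an outage
    accepted_today: int = 0

    def try_accept(self, bisque_kiln_up: bool) -> bool:
        """Accept a piece of greenware unless today's limit is already reached."""
        limit = self.normal_limit if bisque_kiln_up else self.degraded_limit
        if self.accepted_today >= limit:
            return False
        self.accepted_today += 1
        return True
```

Returning `None` instead of the other kiln keeps the refusal explicit, so the caller has to hold the work on the shelf or turn it away at intake.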

Long cone-10 cycle

latency

Glaze firing extended for cone 10. Downstream shelf must absorb without dumping work.

Pass criteria (overall)

Min stability score: 60
Max drop rate: 10.0%
Min delivery rate: 85.0%
Max errors: 6
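
Read literally, the four thresholds above combine into a single check like the sketch below. The `RunResult` type and its field names are hypothetical, and the real scorer may apply further rules: the leaderboard shows only stability and delivery, not drop rate or error count, and it lists some runs above both displayed thresholds as failing.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    stability: float          # stability score
    drop_rate_pct: float      # percent of work dropped
    delivery_rate_pct: float  # percent of work delivered
    errors: int               # error count


def meets_overall_criteria(r: RunResult) -> bool:
    """Apply the four overall thresholds exactly as listed in the brief."""
    return (
        r.stability >= 60
        and r.drop_rate_pct <= 10.0
        and r.delivery_rate_pct >= 85.0
        and r.errors <= 6
    )
```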

Submit your run

Submissions go through the chini-bench CLI. It calls your model with your key, scores the result locally, and posts to the leaderboard. Nothing leaves your machine except the canvas it produces.

End-to-end:

pip install git+https://github.com/collapseindex/chini-bench-cli.git
export OPENROUTER_API_KEY=...

chini-bench run chini-013-pottery-studio \
  --provider openrouter --model google/gemini-2.0-flash-001 \
  --as alice

Or inspect the prompt first:

chini-bench prompt chini-013-pottery-studio

Providers: openai · anthropic · google · openrouter · ollama

Leaderboard

Rank	Submitter	Model	Score	Stability	Delivery	Design	Pass
#1	rl_v06_run2	rl_policy custom single-shot	91	82.0	96.0	75.0	✓
#2	rl_v06_run2	rl_policy custom single-shot	90	83.0	92.0	75.0	✓
#3	rl_v06_run2	rl_policy custom single-shot	90	83.0	92.0	75.0	✓
#4	rl_v06_run1	rl_policy custom single-shot	89	84.0	88.0	75.0	✗
#5	rl_v06_run2	rl_policy custom single-shot	89	84.0	88.0	75.0	✗
#6	rl_v06_run2	rl_policy custom single-shot	89	84.0	88.0	75.0	✗
#7	rl_v06_run2	rl_policy custom single-shot	89	84.0	88.0	75.0	✗
#8	rl_v06_run2	rl_policy custom single-shot	88	83.0	88.0	75.0	✗
#9	rl_v06_run2	rl_policy custom single-shot	87	71.0	100.0	75.0	✓
#10	rl_v06_run1	rl_policy custom single-shot	86	83.0	83.0	75.0	✗
#11	rl_v06_run1	rl_policy custom single-shot	85	80.0	83.0	85.0	✗
#12	rl_v06_run2	rl_policy custom single-shot	85	80.0	82.0	85.0	✗
#13	rl_v06_run2	rl_policy custom single-shot	84	66.0	97.0	85.0	✗
#14	rl_v06_run2	rl_policy custom single-shot	84	83.0	75.0	75.0	✗
#15	rl_v06_run2	rl_policy custom single-shot	80	74.0	76.0	85.0	✗
#16	rl_v06_run2	rl_policy custom single-shot	79	73.0	74.0	85.0	✗
#17	rl_v06_run2	rl_policy custom single-shot	78	72.0	72.0	85.0	✗
#18	rl_v06_run2	rl_policy custom single-shot	77	71.0	72.0	85.0	✗
#19	rl_v06_run2	rl_policy custom single-shot	77	71.0	71.0	85.0	✗
#20	rl_v06_run1	rl_policy custom single-shot	76	68.0	72.0	85.0	✗
#21	rl_v06_run1	rl_policy custom single-shot	75	81.0	75.0	50.0	✗
#22	rl_v06_run1	rl_policy custom single-shot	71	56.0	75.0	85.0	✗
#23	rl_v06_run2	rl_policy custom single-shot	71	63.0	66.0	85.0	✗
#24	rl_v06_run2	rl_policy custom single-shot	68	61.0	59.0	85.0	✗
#25	rl_v06_run2	rl_policy custom single-shot	64	57.0	53.0	85.0	✗
#26	rl_v06_run2	rl_policy custom single-shot	64	40.0	75.0	85.0	✗
#27	rl_v06_run2	rl_policy custom single-shot	64	49.0	64.0	85.0	✗
#28	rl_v06_run1	rl_policy custom single-shot	62	49.0	58.0	85.0	✗
#29	rl_v06_run1	rl_policy custom single-shot	61	52.0	49.0	75.0	✗
#30	alex	x-ai/grok-4.20 default reflexion	59	15.0	100.0	100.0	✗
#31	rl_v06_run1	rl_policy custom single-shot	59	41.0	58.0	75.0	✗
#32	rl_v06_run1	rl_policy custom single-shot	58	50.0	44.0	85.0	✗
#33	rl_v06_run2	rl_policy custom single-shot	54	37.0	49.0	85.0	✗
#34	alex	anthropic/claude-sonnet-4.6 default reflexion	51	0.0	100.0	100.0	✗
#35	rl_v06_run1	rl_policy custom single-shot	49	33.0	40.0	85.0	✗
#36	alex	openai/gpt-5.4 default single-shot	47	61.0	0.0	75.0	✗
#37	alex	google/gemini-3.1-pro-preview default reflexion	47	45.0	25.0	100.0	✗
#38	alex	google/gemini-3.1-pro-preview default single-shot	45	56.0	0.0	75.0	✗
#39	rl_v06_run2	rl_policy custom single-shot	45	29.0	35.0	85.0	✗
#40	alex	x-ai/grok-4.20 default single-shot	29	21.0	0.0	75.0	✗
#41	alex	anthropic/claude-sonnet-4.6 default single-shot	24	9.0	0.0	75.0	✗
#42	alex	openai/gpt-5.4 default reflexion	24	17.0	0.0	75.0	✗

Per-scenario breakdown of the top run

Scenario	Health	Drop rate	Delivered	Pass
baseline	84.0	0.0%	240	✓
member-event	83.0	0.0%	864	✓
kiln-down	76.0	0.0%	180	✓
slow-cycle	84.0	0.0%	240	✓
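
To put the delivered counts in proportion, the short calculation below expresses each scenario as a multiple of the baseline. The numbers come from the table above; the table does not state what unit "Delivered" counts, so the ratios are only relative.

```python
# Delivered counts copied from the per-scenario table above.
delivered = {
    "baseline": 240,
    "member-event": 864,
    "kiln-down": 180,
    "slow-cycle": 240,
}

# Express each scenario as a multiple of the baseline count.
ratios = {name: count / delivered["baseline"] for name, count in delivered.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x baseline")
```

The member-event scenario delivers 3.6 times the baseline count, and the kiln-down scenario delivers 0.75 times the baseline.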
