chini-013-pottery-studio

Pottery Studio Firing Schedule

Two kilns, twenty members, four firing stages, one electrical limit. Don't crack the work.

Source: Small ceramics studio operations, kiln scheduling literature, the author's friend who runs a studio in Brooklyn

Prompt

Design the firing pipeline for a member-run pottery studio.

Functional:
- Members drop greenware on the bisque shelf when ready. Each piece must move bisque -> glaze application (member work) -> glaze firing -> pickup.
- Two kilns: one bisque-only, one glaze-only. Each kiln cycle is 18 hours and runs only when full.
- Members can only glaze pieces that have been bisqued. The bisque shelf and glaze shelf are physically separate.
- The studio places finished work on the pickup rack, labeled with the member's name.

Non-functional:
- A member-event week (4x normal output) must not cause kilns to skip safety underloading or members to walk out without their work.
- If one kiln fails, the other must NOT be reused for both stages (cross-contamination ruins glaze). The system rate-limits intake instead.
- The shared 200-amp panel cannot run both kilns at once. Schedule must enforce this.
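The three non-functional rules above can be read as one scheduling decision: at most one kiln starts at a time, a kiln only ever fires its own stage, and a failed bisque kiln throttles intake rather than borrowing the glaze kiln. The sketch below illustrates that reading in Python. It is not the reference solution: `Kiln`, `next_firing`, `intake_allowed`, the load size of 20 pieces, and the glaze-first tie-break are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Kiln:
    stage: str          # "bisque" or "glaze"; a kiln is never used for the other stage
    capacity: int       # pieces per full load (assumed; not given in the prompt)
    healthy: bool = True


def next_firing(kilns: List[Kiln], waiting: Dict[str, int], panel_busy: bool) -> Optional[Kiln]:
    """Return the one kiln allowed to start now, or None."""
    if panel_busy:
        return None  # shared panel: a second kiln may not start mid-cycle
    # A kiln is ready only if it is healthy and its own shelf holds a full load.
    ready = [k for k in kilns if k.healthy and waiting.get(k.stage, 0) >= k.capacity]
    if not ready:
        return None
    # Assumed tie-break: glaze first, so finished work reaches the pickup rack.
    return max(ready, key=lambda k: k.stage == "glaze")


def intake_allowed(kilns: List[Kiln], waiting: Dict[str, int], shelf_limit: int) -> bool:
    """Throttle greenware intake while the bisque kiln is down (no rerouting)."""
    bisque_ok = any(k.healthy for k in kilns if k.stage == "bisque")
    return bisque_ok or waiting.get("bisque", 0) < shelf_limit
```

Because `next_firing` matches each kiln only against its own shelf, a bisque outage simply leaves the bisque shelf undrained; nothing in the function can send greenware to the glaze kiln.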

Return a Chinilla CanvasState. Components are people, kilns, shelves, the schedule. Behaviors: queue (shelves), batch (kiln cycles), ratelimit (electrical cap), circuitbreaker (kiln failover refusal), split (bisque vs glaze routing).

Constraints

Max components: 12
Required behaviors: queue, batch, ratelimit
Monthly budget: $6000
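A candidate design can be checked against these three limits before it is submitted. The sketch below assumes each component is a plain dict with hypothetical `behavior` and `monthly_cost` keys; the real CanvasState schema is not shown on this page, so treat the data shape as a placeholder.

```python
from typing import Dict, List

MAX_COMPONENTS = 12
REQUIRED_BEHAVIORS = {"queue", "batch", "ratelimit"}
MONTHLY_BUDGET_USD = 6000


def check_constraints(components: List[Dict]) -> List[str]:
    """Return human-readable violations; an empty list means all limits hold."""
    problems = []
    if len(components) > MAX_COMPONENTS:
        problems.append(f"{len(components)} components exceeds max of {MAX_COMPONENTS}")
    # "behavior" is an assumed key name, one behavior per component.
    used = {c.get("behavior") for c in components}
    missing = REQUIRED_BEHAVIORS - used
    if missing:
        problems.append("missing required behaviors: " + ", ".join(sorted(missing)))
    # "monthly_cost" is an assumed key name; absent costs count as zero.
    cost = sum(c.get("monthly_cost", 0) for c in components)
    if cost > MONTHLY_BUDGET_USD:
        problems.append(f"monthly cost ${cost} exceeds budget ${MONTHLY_BUDGET_USD}")
    return problems
```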

Stress scenarios

- Normal week (baseline): Steady member output, both kilns healthy.
- Open studio event (spike): Member volume 4x for the week. Kilns must batch, shelf overflow must be prevented.
- Bisque kiln fails (outage): Bisque kiln down. System must rate-limit intake, not reroute through glaze kiln.
- Long cone-10 cycle (latency): Glaze firing extended for cone 10. Downstream shelf must absorb without dumping work.
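The spike and latency scenarios both come down to how many firings fit in a week when only one kiln can run at a time. The arithmetic below is a rough sketch: it assumes back-to-back cycles with no loading or cooling gaps, measures demand in full kiln loads, and uses a 24-hour cone-10 cycle purely as an illustrative figure. None of those numbers come from the problem statement except the 18-hour cycle.

```python
import math

HOURS_PER_WEEK = 7 * 24
CYCLE_HOURS = 18  # per the prompt


def weekly_cycle_budget(cycle_hours: float = CYCLE_HOURS) -> int:
    """Max firings per week when the panel allows only one kiln at a time."""
    return math.floor(HOURS_PER_WEEK / cycle_hours)


def finished_loads_per_week(cycle_hours: float = CYCLE_HOURS) -> int:
    """Each finished load needs one bisque firing and one glaze firing."""
    return weekly_cycle_budget(cycle_hours) // 2


def backlog_after_spike(normal_loads: float, capacity_loads: float, multiplier: float = 4.0) -> float:
    """Loads still on the shelves after one spike week (never negative)."""
    return max(0.0, normal_loads * multiplier - capacity_loads)


print(weekly_cycle_budget())         # 9 firings fit in 168 hours
print(finished_loads_per_week())     # 4 complete bisque + glaze pairs
print(weekly_cycle_budget(24))       # 7 firings with an assumed 24-hour cycle
print(backlog_after_spike(1.5, 4))   # 2.0 loads left over in this made-up case
```

Under these assumptions the shelves have to hold whatever `backlog_after_spike` reports, which is one way to size the queue components.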

Pass criteria (overall)

Min stability score: 60
Max drop rate: 10.0%
Min delivery rate: 85.0%
Max errors: 6
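The four thresholds combine into a single pass/fail check. The sketch below assumes the comparisons are inclusive (a stability score of exactly 60 passes) and that all four must hold at once; the `RunMetrics` type and its field names are hypothetical, not the scorer's actual output format.

```python
from dataclasses import dataclass


@dataclass
class RunMetrics:
    stability: float          # 0 to 100
    drop_rate_pct: float      # percent of work dropped
    delivery_rate_pct: float  # percent of work delivered
    errors: int


def passes(m: RunMetrics) -> bool:
    """True only when every overall pass criterion is met (inclusive bounds assumed)."""
    return (
        m.stability >= 60
        and m.drop_rate_pct <= 10.0
        and m.delivery_rate_pct >= 85.0
        and m.errors <= 6
    )
```

For example, a run with strong stability but a delivery rate below 85% fails under this reading, regardless of its other numbers.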

Submit your run

Submissions go through the chini-bench CLI. It calls your model with your key, scores the result locally, and posts to the leaderboard. Nothing leaves your machine except the canvas it produces.

End-to-end:
pip install git+https://github.com/collapseindex/chini-bench-cli.git
export OPENROUTER_API_KEY=...

chini-bench run chini-013-pottery-studio \
  --provider openrouter --model google/gemini-2.0-flash-001 \
  --as alice
Or inspect the prompt first:
chini-bench prompt chini-013-pottery-studio
Providers: openai · anthropic · google · openrouter · ollama

Leaderboard

Rank  Submitter    Model                                                Score  Stability  Delivery  Design  Pass
#1    rl_v06_run2  rl_policy (custom single-shot)                       91     82.0       96.0      75.0
#2    rl_v06_run2  rl_policy (custom single-shot)                       90     83.0       92.0      75.0
#3    rl_v06_run2  rl_policy (custom single-shot)                       90     83.0       92.0      75.0
#4    rl_v06_run1  rl_policy (custom single-shot)                       89     84.0       88.0      75.0
#5    rl_v06_run2  rl_policy (custom single-shot)                       89     84.0       88.0      75.0
#6    rl_v06_run2  rl_policy (custom single-shot)                       89     84.0       88.0      75.0
#7    rl_v06_run2  rl_policy (custom single-shot)                       89     84.0       88.0      75.0
#8    rl_v06_run2  rl_policy (custom single-shot)                       88     83.0       88.0      75.0
#9    rl_v06_run2  rl_policy (custom single-shot)                       87     71.0       100.0     75.0
#10   rl_v06_run1  rl_policy (custom single-shot)                       86     83.0       83.0      75.0
#11   rl_v06_run1  rl_policy (custom single-shot)                       85     80.0       83.0      85.0
#12   rl_v06_run2  rl_policy (custom single-shot)                       85     80.0       82.0      85.0
#13   rl_v06_run2  rl_policy (custom single-shot)                       84     66.0       97.0      85.0
#14   rl_v06_run2  rl_policy (custom single-shot)                       84     83.0       75.0      75.0
#15   rl_v06_run2  rl_policy (custom single-shot)                       80     74.0       76.0      85.0
#16   rl_v06_run2  rl_policy (custom single-shot)                       79     73.0       74.0      85.0
#17   rl_v06_run2  rl_policy (custom single-shot)                       78     72.0       72.0      85.0
#18   rl_v06_run2  rl_policy (custom single-shot)                       77     71.0       72.0      85.0
#19   rl_v06_run2  rl_policy (custom single-shot)                       77     71.0       71.0      85.0
#20   rl_v06_run1  rl_policy (custom single-shot)                       76     68.0       72.0      85.0
#21   rl_v06_run1  rl_policy (custom single-shot)                       75     81.0       75.0      50.0
#22   rl_v06_run1  rl_policy (custom single-shot)                       71     56.0       75.0      85.0
#23   rl_v06_run2  rl_policy (custom single-shot)                       71     63.0       66.0      85.0
#24   rl_v06_run2  rl_policy (custom single-shot)                       68     61.0       59.0      85.0
#25   rl_v06_run2  rl_policy (custom single-shot)                       64     57.0       53.0      85.0
#26   rl_v06_run2  rl_policy (custom single-shot)                       64     40.0       75.0      85.0
#27   rl_v06_run2  rl_policy (custom single-shot)                       64     49.0       64.0      85.0
#28   rl_v06_run1  rl_policy (custom single-shot)                       62     49.0       58.0      85.0
#29   rl_v06_run1  rl_policy (custom single-shot)                       61     52.0       49.0      75.0
#30   alex         x-ai/grok-4.20 (default reflexion)                   59     15.0       100.0     100.0
#31   rl_v06_run1  rl_policy (custom single-shot)                       59     41.0       58.0      75.0
#32   rl_v06_run1  rl_policy (custom single-shot)                       58     50.0       44.0      85.0
#33   rl_v06_run2  rl_policy (custom single-shot)                       54     37.0       49.0      85.0
#34   alex         anthropic/claude-sonnet-4.6 (default reflexion)      51     0.0        100.0     100.0
#35   rl_v06_run1  rl_policy (custom single-shot)                       49     33.0       40.0      85.0
#36   alex         openai/gpt-5.4 (default single-shot)                 47     61.0       0.0       75.0
#37   alex         google/gemini-3.1-pro-preview (default reflexion)    47     45.0       25.0      100.0
#38   alex         google/gemini-3.1-pro-preview (default single-shot)  45     56.0       0.0       75.0
#39   rl_v06_run2  rl_policy (custom single-shot)                       45     29.0       35.0      85.0
#40   alex         x-ai/grok-4.20 (default single-shot)                 29     21.0       0.0       75.0
#41   alex         anthropic/claude-sonnet-4.6 (default single-shot)    24     9.0        0.0       75.0
#42   alex         openai/gpt-5.4 (default reflexion)                   24     17.0       0.0       75.0

Per-scenario breakdown of the top run
Scenario Health Drop rate Delivered Pass
baseline 84.0 0.0% 240
member-event 83.0 0.0% 864
kiln-down 76.0 0.0% 180
slow-cycle 84.0 0.0% 240