chini-012-energy-drink-habit

Quitting the Energy Drink Habit

A craving is a packet. Willpower is backpressure. Design the system that keeps you off the 4pm Red Bull.

Source: Behavioral psychology, habit-loop literature, and a personal problem the author refuses to discuss further

Prompt

Design a personal system to taper someone off a 3-can-per-day energy drink habit over 30 days without crashing their workday.

Functional:
- Cravings arrive throughout the day (model them as packets). Each craving must be routed to a healthy substitute (water, walk, snack, deep breath) OR, in capped quantity, an actual drink.
- A cap on real-drink consumption per day. The cap shrinks weekly.
- Triggers are tracked: 9am wake, 2pm crash, 8pm gym. Each trigger emits a craving packet.
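
The functional requirements above can be sketched as a shrinking daily cap plus a router. This is a minimal illustration, not the Chinilla model: the schedule (3, 2, 1, 0 cans by week), the 0.8 intensity threshold, and every name below are assumptions made for the example.

```python
from dataclasses import dataclass

START_CAP = 3  # assumed starting cap: the current 3-can habit
SUBSTITUTES = ("water", "walk", "snack", "deep breath")


def daily_cap(day: int) -> int:
    """Cap for a 1-indexed day; drops by one can each week (assumed schedule)."""
    if day < 1:
        raise ValueError("day is 1-indexed")
    return max(0, START_CAP - (day - 1) // 7)


@dataclass
class DayRouter:
    """Routes the craving packets of a single day."""
    day: int
    drinks_used: int = 0
    handled: int = 0

    def route(self, intensity: float) -> str:
        """Send one craving to a real drink (if the cap allows) or a substitute."""
        self.handled += 1
        # Assumption: only strong cravings (>= 0.8) may spend a drink token.
        if intensity >= 0.8 and self.drinks_used < daily_cap(self.day):
            self.drinks_used += 1
            return "drink"
        # Rotate through substitutes so no single one absorbs everything.
        return SUBSTITUTES[self.handled % len(SUBSTITUTES)]


router = DayRouter(day=10)  # week 2, so the assumed cap is 2
routes = [router.route(0.9) for _ in range(12)]  # 4x the three daily triggers
print(routes.count("drink"))  # 2
```

Even when every craving is strong, the router hands out at most `daily_cap(day)` drinks and sends the rest to substitutes.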

Non-functional:
- A bad day (work stress at 4x the normal craving rate) must not blow the daily cap. The system rate-limits and substitutes.
- If the planned substitute is unavailable (out of LaCroix, gym closed), the system must fail gracefully to a different substitute, not directly to a drink.
- The system must not be so restrictive that the user just abandons it. Some real drinks are allowed; the goal is taper, not cold-turkey.
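
The graceful-failure requirement above can be illustrated with a small failover chain. This is a simplified sketch under stated assumptions: the class name, the two-failure threshold, and the "defer" outcome are hypothetical, and a full circuit breaker would also reset after a cool-down, which is left out here.

```python
class SubstituteBreaker:
    """Tiny circuit breaker: stop trying a substitute after repeated failures."""

    def __init__(self, name: str, max_failures: int = 2):
        self.name = name
        self.max_failures = max_failures  # assumed threshold
        self.failures = 0

    @property
    def open(self) -> bool:
        """True once the substitute has failed too often and should be skipped."""
        return self.failures >= self.max_failures

    def attempt(self, available: bool) -> bool:
        """Return True if this substitute handles the craving."""
        if self.open:
            return False
        if available:
            self.failures = 0
            return True
        self.failures += 1
        return False


def resolve(chain, availability) -> str:
    """Try each substitute in order. The last resort is 'defer', never a drink."""
    for breaker in chain:
        if breaker.attempt(availability.get(breaker.name, False)):
            return breaker.name
    return "defer"  # hold the craving for a later retry


chain = [SubstituteBreaker(n) for n in ("lacroix", "walk", "snack", "deep breath")]
stash_empty = {"lacroix": False, "walk": True, "snack": True, "deep breath": True}
print(resolve(chain, stash_empty))  # walk
```

The design point is that a drink never appears in the fallback chain: real drinks are granted only by the cap, so an outage cannot turn into one.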

Return a Chinilla CanvasState. Components are routines, substitutes, the user, and the cap. Behaviors are the same primitives: queue (urge backlog), ratelimit (daily cap), circuitbreaker (substitute failover), retry (try substitute again before caving), storage (snack stash), split (route by craving intensity).

Constraints

- Max components: 10
- Required behaviors: ratelimit, circuitbreaker, split
- Monthly budget: $200

Stress scenarios

- Normal day (baseline): Three trigger windows, baseline craving rate. System should keep within daily cap.
- Bad work day (spike): Cravings 4x baseline. Cap must hold, substitutes must absorb the rest.
- Primary substitute unavailable (outage): LaCroix stash is empty. System must reroute to walk/snack/breath, not collapse to a drink.
- Walk takes longer than planned (latency): Substitute resolution time spikes (long meeting, no break). System must hold without dumping the queue.
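
The latency scenario can be pictured with a toy tick-based queue. The sketch below is not the Chinilla simulator; the single-server model, the tick granularity, and the function name are assumptions chosen to show why a slow substitute needs a backlog large enough to hold cravings instead of dropping them.

```python
from collections import deque


def simulate(arrivals, service_time, capacity):
    """Single-server queue in discrete ticks.

    arrivals[t] is the number of cravings arriving at tick t.
    Returns (started, dropped, backlog): cravings whose substitute was
    started, cravings dropped because the queue was full, and cravings
    still waiting at the end.
    """
    queue = deque()
    busy_until = 0
    started = dropped = 0
    for t, n in enumerate(arrivals):
        for _ in range(n):
            if len(queue) < capacity:
                queue.append(t)
            else:
                dropped += 1  # queue full: this craving is lost
        if queue and t >= busy_until:
            queue.popleft()
            busy_until = t + service_time
            started += 1
    return started, dropped, len(queue)


steady = [1] * 10  # one craving per tick
print(simulate(steady, service_time=1, capacity=5))   # (10, 0, 0)
print(simulate(steady, service_time=3, capacity=5))   # (4, 2, 4)
print(simulate(steady, service_time=3, capacity=10))  # (4, 0, 6)
```

With the substitute three times slower, the small queue overflows and drops cravings, while the larger one holds them as backlog to be worked off later.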

Pass criteria (overall)

- Min stability score: 60
- Max drop rate: 15.0%
- Min delivery rate: 80.0%
- Max errors: 8
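
As a rough illustration of how the four thresholds above combine, the sketch below checks a run against them. The `RunMetrics` container and its field names are hypothetical (they are not taken from chini-bench), the sample values are made up, and treating the thresholds as inclusive is an assumption.

```python
from dataclasses import dataclass

# Thresholds copied from the pass criteria above.
MIN_STABILITY = 60.0
MAX_DROP_RATE = 15.0      # percent
MIN_DELIVERY_RATE = 80.0  # percent
MAX_ERRORS = 8


@dataclass(frozen=True)
class RunMetrics:
    """Hypothetical container for one run's overall metrics."""
    stability: float
    drop_rate: float      # percent
    delivery_rate: float  # percent
    errors: int


def failed_criteria(m: RunMetrics) -> list[str]:
    """Return the names of the criteria a run misses (empty list means pass)."""
    checks = {
        "stability": m.stability >= MIN_STABILITY,
        "drop_rate": m.drop_rate <= MAX_DROP_RATE,
        "delivery_rate": m.delivery_rate >= MIN_DELIVERY_RATE,
        "errors": m.errors <= MAX_ERRORS,
    }
    return [name for name, ok in checks.items() if not ok]


# Illustrative values only, not real leaderboard runs.
print(failed_criteria(RunMetrics(83.0, 0.8, 100.0, 0)))  # []
print(failed_criteria(RunMetrics(58.0, 20.0, 92.0, 9)))  # ['stability', 'drop_rate', 'errors']
```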

Submit your run

Submissions go through the chini-bench CLI. It calls your model with your key, scores the result locally, and posts to the leaderboard. Nothing leaves your machine except the canvas it produces.

End-to-end:
pip install git+https://github.com/collapseindex/chini-bench-cli.git
export OPENROUTER_API_KEY=...

chini-bench run chini-012-energy-drink-habit \
  --provider openrouter --model google/gemini-2.0-flash-001 \
  --as alice
Or inspect the prompt first:
chini-bench prompt chini-012-energy-drink-habit
Providers: openai · anthropic · google · openrouter · ollama

Leaderboard

Rank  Submitter    Model                                                Score  Stability  Delivery  Design  Pass
#1    alex         openai/gpt-5.4 (default single-shot)                 92     83.0       100.0     100.0
#2    rl_v06_run2  rl_policy (custom single-shot)                       89     81.0       96.0      60.0
#3    rl_v06_run2  rl_policy (custom single-shot)                       88     80.0       92.0      100.0
#4    rl_v06_run2  rl_policy (custom single-shot)                       87     73.0       100.0     75.0
#5    rl_v06_run2  rl_policy (custom single-shot)                       87     73.0       100.0     60.0
#6    rl_v06_run2  rl_policy (custom single-shot)                       87     80.0       89.0      60.0
#7    rl_v06_run2  rl_policy (custom single-shot)                       86     83.0       100.0     85.0
#8    rl_v06_run2  rl_policy (custom single-shot)                       85     72.0       100.0     75.0
#9    rl_v06_run1  rl_policy (custom single-shot)                       84     80.0       89.0      60.0
#10   rl_v06_run2  rl_policy (custom single-shot)                       84     68.0       100.0     85.0
#11   rl_v06_run2  rl_policy (custom single-shot)                       84     67.0       100.0     60.0
#12   rl_v06_run2  rl_policy (custom single-shot)                       84     79.0       83.0      60.0
#13   alex         google/gemini-3.1-pro-preview (default single-shot)  83     66.0       100.0     100.0
#14   rl_v06_run2  rl_policy (custom single-shot)                       83     74.0       100.0     60.0
#15   rl_v06_run2  rl_policy (custom single-shot)                       83     71.0       90.0      75.0
#16   rl_v06_run2  rl_policy (custom single-shot)                       82     66.0       96.0      60.0
#17   rl_v06_run2  rl_policy (custom single-shot)                       82     69.0       93.0      60.0
#18   rl_v06_run2  rl_policy (custom single-shot)                       81     76.0       78.0      60.0
#19   rl_v06_run2  rl_policy (custom single-shot)                       81     75.0       77.0      60.0
#20   rl_v06_run2  rl_policy (custom single-shot)                       81     69.0       100.0     100.0
#21   rl_v06_run2  rl_policy (custom single-shot)                       81     73.0       83.0      100.0
#22   rl_v06_run2  rl_policy (custom single-shot)                       81     69.0       87.0      75.0
#23   rl_v06_run2  rl_policy (custom single-shot)                       80     71.0       82.0      75.0
#24   rl_v06_run1  rl_policy (custom single-shot)                       79     77.0       69.0      60.0
#25   rl_v06_run1  rl_policy (custom single-shot)                       78     83.0       92.0      50.0
#26   rl_v06_run2  rl_policy (custom single-shot)                       78     79.0       100.0     50.0
#27   rl_v06_run1  rl_policy (custom single-shot)                       77     53.0       100.0     75.0
#28   rl_v06_run2  rl_policy (custom single-shot)                       77     58.0       92.0      60.0
#29   rl_v06_run2  rl_policy (custom single-shot)                       77     84.0       62.0      60.0
#30   rl_v06_run2  rl_policy (custom single-shot)                       76     82.0       88.0      50.0
#31   rl_v06_run2  rl_policy (custom single-shot)                       75     60.0       96.0      60.0
#32   rl_v06_run2  rl_policy (custom single-shot)                       72     61.0       73.0      60.0
#33   rl_v06_run1  rl_policy (custom single-shot)                       71     65.0       61.0      60.0
#34   rl_v06_run2  rl_policy (custom single-shot)                       71     62.0       65.0      60.0
#35   rl_v06_run1  rl_policy (custom single-shot)                       70     68.0       100.0     50.0
#36   rl_v06_run2  rl_policy (custom single-shot)                       70     60.0       68.0      75.0
#37   alex         google/gemini-3.1-pro-preview (default reflexion)    67     61.0       66.0      100.0
#38   rl_v06_run1  rl_policy (custom single-shot)                       66     43.0       100.0     75.0
#39   alex         x-ai/grok-4.20 (default reflexion)                   65     53.0       73.0      100.0
#40   rl_v06_run1  rl_policy (custom single-shot)                       65     58.0       67.0      60.0
#41   alex         openai/gpt-5.4 (default reflexion)                   63     38.0       100.0     100.0
#42   rl_v06_run2  rl_policy (custom single-shot)                       63     61.0       55.0      60.0
#43   rl_v06_run1  rl_policy (custom single-shot)                       59     43.0       57.0      75.0
#44   alex         x-ai/grok-4.20 (default single-shot)                 58     19.0       94.0      100.0
#45   rl_v06_run1  rl_policy (custom single-shot)                       58     46.0       50.0      75.0
#46   rl_v06_run2  rl_policy (custom single-shot)                       57     47.0       45.0      60.0
#47   rl_v06_run1  rl_policy (custom single-shot)                       55     32.0       78.0      60.0
#48   rl_v06_run1  rl_policy (custom single-shot)                       53     43.0       51.0      100.0
#49   rl_v06_run2  rl_policy (custom single-shot)                       51     47.0       38.0      100.0
#50   rl_v06_run2  rl_policy (custom single-shot)                       50     38.0       35.0      60.0
#51   rl_v06_run2  rl_policy (custom single-shot)                       49     1.0        96.0      75.0
#52   rl_v06_run1  rl_policy (custom single-shot)                       40     0.0        100.0     60.0
#53   rl_v06_run2  rl_policy (custom single-shot)                       39     27.0       18.0      85.0
#54   alex         anthropic/claude-sonnet-4.6 (default single-shot)    29     18.0       0.0       75.0
#55   alex         anthropic/claude-sonnet-4.6 (default reflexion)      14     0.0        0.0       75.0

Per-scenario breakdown of the top run

Scenario        Health  Drop rate  Delivered  Pass
baseline        83.0    0.8%       372
stress-day      85.0    1.6%       1274
substitute-out  82.0    0.0%       192
delayed-relief  83.0    0.8%       372