chini-017-couch-to-5k

Couch to 5K

Three runs a week, nine weeks, one knee that hurts on Wednesdays. Get to the 5K without quitting.

Source: Couch to 5K running program, behavior change literature, the eternal hope of the new year

Prompt

Design a personal running progression system to take a sedentary adult from zero to a 5K run in 9 weeks without injury or dropout.

Functional:
- Three runs per week. Each session has a target structure (walk/jog intervals, then continuous jog, then 5K continuous).
- Recovery days between runs. Sleep, hydration, soreness check before each session.
- Weekly progression: intervals lengthen, walks shorten. Milestones: first 10-min jog, first 20-min, first 5K.
- Optional cross-training (bike, swim) on recovery days when feeling good.
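The weekly progression rule (intervals lengthen, walks shorten) can be sketched as a simple ramp. This is only an illustration: the start and end durations, the linear shape, and the `interval_plan` name are assumptions, not the actual Couch to 5K schedule and not anything the benchmark prescribes.

```python
# Hypothetical linear ramp; all durations below are assumed for illustration.
TOTAL_WEEKS = 9
START_JOG_S = 60      # assumed week-1 jog interval (seconds)
END_JOG_S = 1800      # assumed final continuous jog (seconds)
START_WALK_S = 90     # assumed week-1 walk interval (seconds)


def interval_plan(week: int) -> tuple[int, int]:
    """Return (jog_seconds, walk_seconds) per interval for the given week."""
    if not 1 <= week <= TOTAL_WEEKS:
        raise ValueError(f"week must be in 1..{TOTAL_WEEKS}")
    # Fraction of the program completed: 0.0 in week 1, 1.0 in the last week.
    frac = (week - 1) / (TOTAL_WEEKS - 1)
    jog = round(START_JOG_S + frac * (END_JOG_S - START_JOG_S))
    walk = round(START_WALK_S * (1 - frac))  # walks shrink to zero
    return jog, walk


if __name__ == "__main__":
    for w in range(1, TOTAL_WEEKS + 1):
        print(w, interval_plan(w))
```

A walk interval of zero in the final week stands for a continuous jog.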

Non-functional:
- A bad-week event (work travel, sickness, life chaos = 4x friction) must not cause program abandonment. The system auto-defers and repeats the prior week instead of skipping.
- If knee pain or excessive soreness is reported, the session must downgrade (more walk, less jog) before the user gets injured. It must not skip the run entirely (loss of habit).
- Missing two consecutive sessions must trigger a recovery week, not a guilt-driven catch-up double.
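The three non-functional rules above amount to a small decision procedure. A minimal sketch follows, assuming a priority order (recovery week first, then pain downgrade, then bad-week deferral) that the prompt does not specify; the `Action` enum and `next_action` function are hypothetical names.

```python
from enum import Enum


class Action(Enum):
    PROCEED = "proceed"              # run the session as planned
    DOWNGRADE = "downgrade"          # more walk, less jog; never a skipped run
    REPEAT_WEEK = "repeat_week"      # defer: repeat the prior week
    RECOVERY_WEEK = "recovery_week"  # no catch-up double


def next_action(knee_pain: bool, consecutive_missed: int, bad_week: bool) -> Action:
    """Pick the fallback path for the next session.

    The ordering of the checks is an assumption made for this sketch.
    """
    if consecutive_missed >= 2:
        return Action.RECOVERY_WEEK
    if knee_pain:
        return Action.DOWNGRADE
    if bad_week:
        return Action.REPEAT_WEEK
    return Action.PROCEED
```

No branch returns a skipped session, which mirrors the requirement that pain and bad weeks degrade the plan rather than break the habit.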

Return a Chinilla CanvasState. Components: scheduler, sessions, recovery checks, progression rules, fallback paths. Behaviors: queue (session backlog), ratelimit (max sessions/week), circuitbreaker (pain trigger downgrade), retry (repeat-week deferral), split (training vs recovery routing).

Constraints

Max components: 11
Required behaviors: ratelimit, circuitbreaker, split
Monthly budget: $30
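A design can be checked against the component and behavior constraints above before submitting. The sketch below assumes a canvas shaped as a dict with a `components` list whose items carry a `behaviors` list; that shape and the `check_constraints` name are hypothetical, since the real CanvasState schema is not shown here. The monthly budget is left out because the cost model is not described.

```python
MAX_COMPONENTS = 11
REQUIRED_BEHAVIORS = {"ratelimit", "circuitbreaker", "split"}


def check_constraints(canvas: dict) -> list[str]:
    """Return a list of constraint violations (empty means OK).

    Assumes a hypothetical shape: {"components": [{"behaviors": [...]}, ...]}.
    """
    components = canvas.get("components", [])
    # Collect every behavior used anywhere on the canvas.
    used = {b for c in components for b in c.get("behaviors", [])}
    problems = []
    if len(components) > MAX_COMPONENTS:
        problems.append(f"too many components: {len(components)} > {MAX_COMPONENTS}")
    missing = REQUIRED_BEHAVIORS - used
    if missing:
        problems.append("missing behaviors: " + ", ".join(sorted(missing)))
    return problems
```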

Stress scenarios

Normal week (baseline): Three sessions land on schedule, recovery clean.

Travel + cold week (spike): Friction 4x normal. System must defer without breaking the program.

Knee pain reported (outage): Pain trigger fires. Session must downgrade, not be skipped outright.

Slow recovery between runs (latency): Soreness lingers. Scheduler must extend gaps without dumping the queue.

Pass criteria (overall)

Min stability score: 60
Max drop rate: 12.0%
Min delivery rate: 82.0%
Max errors: 6
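The overall pass criteria can be read as four threshold checks that must all hold. A minimal sketch, assuming the bounds are inclusive and that a run result exposes these four numbers; the `RunResult` fields and the `passes` function are hypothetical and are not the chini-bench scorer.

```python
from dataclasses import dataclass

# Thresholds copied from the pass criteria above.
MIN_STABILITY = 60.0
MAX_DROP_RATE_PCT = 12.0
MIN_DELIVERY_RATE_PCT = 82.0
MAX_ERRORS = 6


@dataclass
class RunResult:
    stability: float          # stability score, 0..100
    drop_rate_pct: float      # percentage of dropped work
    delivery_rate_pct: float  # percentage delivered
    errors: int


def passes(r: RunResult) -> bool:
    """True when every threshold is met (inclusive bounds are assumed)."""
    return (
        r.stability >= MIN_STABILITY
        and r.drop_rate_pct <= MAX_DROP_RATE_PCT
        and r.delivery_rate_pct >= MIN_DELIVERY_RATE_PCT
        and r.errors <= MAX_ERRORS
    )
```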

Submit your run

Submissions go through the chini-bench CLI. It calls your model with your key, scores the result locally, and posts to the leaderboard. Nothing leaves your machine except the canvas it produces.

End-to-end:
pip install git+https://github.com/collapseindex/chini-bench-cli.git
export OPENROUTER_API_KEY=...

chini-bench run chini-017-couch-to-5k \
  --provider openrouter --model google/gemini-2.0-flash-001 \
  --as alice
Or inspect the prompt first:
chini-bench prompt chini-017-couch-to-5k
Providers: openai · anthropic · google · openrouter · ollama

Leaderboard

Rank Submitter Model Score Stability Delivery Design Pass
#1 rl_v06_run2 rl_policy (custom single-shot) 82 75.0 100.0 75.0
#2 rl_v06_run1 rl_policy (custom single-shot) 81 76.0 88.0 85.0
#3 rl_v06_run1 rl_policy (custom single-shot) 81 78.0 86.0 60.0
#4 rl_v06_run1 rl_policy (custom single-shot) 80 68.0 100.0 50.0
#5 rl_v06_run2 rl_policy (custom single-shot) 80 83.0 100.0 50.0
#6 rl_v06_run1 rl_policy (custom single-shot) 78 72.0 74.0 60.0
#7 rl_v06_run2 rl_policy (custom single-shot) 78 55.0 100.0 60.0
#8 rl_v06_run2 rl_policy (custom single-shot) 77 60.0 96.0 60.0
#9 rl_v06_run2 rl_policy (custom single-shot) 77 66.0 93.0 75.0
#10 rl_v06_run2 rl_policy (custom single-shot) 77 83.0 88.0 50.0
#11 rl_v06_run2 rl_policy (custom single-shot) 77 73.0 80.0 85.0
#12 alex x-ai/grok-4.20 (default single-shot) 76 80.0 54.0 100.0
#13 rl_v06_run1 rl_policy (custom single-shot) 76 74.0 100.0 50.0
#14 rl_v06_run2 rl_policy (custom single-shot) 76 52.0 100.0 75.0
#15 rl_v06_run2 rl_policy (custom single-shot) 76 75.0 85.0 75.0
#16 rl_v06_run2 rl_policy (custom single-shot) 76 54.0 100.0 85.0
#17 rl_v06_run2 rl_policy (custom single-shot) 75 50.0 100.0 75.0
#18 rl_v06_run2 rl_policy (custom single-shot) 75 77.0 67.0 60.0
#19 rl_v06_run1 rl_policy (custom single-shot) 74 70.0 100.0 50.0
#20 rl_v06_run2 rl_policy (custom single-shot) 74 81.0 69.0 50.0
#21 rl_v06_run2 rl_policy (custom single-shot) 74 73.0 84.0 60.0
#22 rl_v06_run1 rl_policy (custom single-shot) 73 73.0 92.0 50.0
#23 rl_v06_run2 rl_policy (custom single-shot) 73 53.0 88.0 100.0
#24 rl_v06_run2 rl_policy (custom single-shot) 73 75.0 50.0 60.0
#25 rl_v06_run2 rl_policy (custom single-shot) 73 70.0 73.0 60.0
#26 rl_v06_run1 rl_policy (custom single-shot) 72 59.0 75.0 60.0
#27 rl_v06_run1 rl_policy (custom single-shot) 72 81.0 75.0 50.0
#28 alex google/gemini-3.1-pro-preview (default single-shot) 70 39.0 100.0 100.0
#29 rl_v06_run1 rl_policy (custom single-shot) 68 74.0 75.0 50.0
#30 rl_v06_run2 rl_policy (custom single-shot) 68 62.0 69.0 60.0
#31 rl_v06_run2 rl_policy (custom single-shot) 68 53.0 75.0 85.0
#32 rl_v06_run2 rl_policy (custom single-shot) 68 83.0 58.0 50.0
#33 rl_v06_run2 rl_policy (custom single-shot) 68 66.0 49.0 60.0
#34 rl_v06_run1 rl_policy (custom single-shot) 67 60.0 85.0 75.0
#35 rl_v06_run2 rl_policy (custom single-shot) 67 73.0 54.0 75.0
#36 rl_v06_run2 rl_policy (custom single-shot) 67 67.0 59.0 85.0
#37 rl_v06_run1 rl_policy (custom single-shot) 66 72.0 75.0 50.0
#38 rl_v06_run2 rl_policy (custom single-shot) 66 34.0 100.0 60.0
#39 rl_v06_run2 rl_policy (custom single-shot) 66 57.0 59.0 60.0
#40 rl_v06_run2 rl_policy (custom single-shot) 63 68.0 67.0 50.0
#41 rl_v06_run1 rl_policy (custom single-shot) 61 57.0 41.0 85.0
#42 rl_v06_run2 rl_policy (custom single-shot) 59 56.0 48.0 75.0
#43 rl_v06_run2 rl_policy (custom single-shot) 58 70.0 24.0 75.0
#44 rl_v06_run2 rl_policy (custom single-shot) 57 54.0 46.0 75.0
#45 rl_v06_run2 rl_policy (custom single-shot) 56 27.0 88.0 85.0
#46 rl_v06_run2 rl_policy (custom single-shot) 56 68.0 44.0 50.0
#47 rl_v06_run2 rl_policy (custom single-shot) 51 84.0 0.0 25.0
#48 alex openai/gpt-5.4 (default single-shot) 50 60.0 0.0 75.0
#49 rl_v06_run1 rl_policy (custom single-shot) 50 0.0 100.0 100.0
#50 alex anthropic/claude-sonnet-4.6 (default single-shot) 49 58.0 0.0 75.0
#51 rl_v06_run1 rl_policy (custom single-shot) 42 6.0 75.0 100.0
#52 alex google/gemini-3.1-pro-preview (default reflexion) 33 40.0 0.0 75.0
#53 rl_v06_run2 rl_policy (custom single-shot) 27 0.0 35.0 85.0
#54 alex x-ai/grok-4.20 (default reflexion) 22 18.0 0.0 75.0
#55 alex openai/gpt-5.4 (default reflexion) 15 0.0 0.0 75.0
#56 alex anthropic/claude-sonnet-4.6 (default reflexion) 10 0.0 0.0 75.0

Per-scenario breakdown of the top run
Scenario Health Drop rate Delivered Pass
baseline 76.0 4.1% 402
bad-week 77.0 4.3% 1437
knee-pain 67.0 4.8% 367
long-recovery 78.0 5.2% 388