chini-017-couch-to-5k

Couch to 5K

Three runs a week, nine weeks, one knee that hurts on Wednesdays. Get to the 5K without quitting.

Source: Couch to 5K running program, behavior change literature, the eternal hope of the new year

Prompt

Design a personal running progression system to take a sedentary adult from zero to a 5K run in 9 weeks without injury or dropout.

Functional:
- Three runs per week. Each session has a target structure (walk/jog intervals, then continuous jog, then 5K continuous).
- Recovery days between runs. Sleep, hydration, soreness check before each session.
- Weekly progression: intervals lengthen, walks shorten. Milestones: first 10-min jog, first 20-min, first 5K.
- Optional cross-training (bike, swim) on recovery days when feeling good.
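
The weekly progression above can be sketched as a small schedule generator. This is only an illustration: the interval lengths are placeholder values interpolated linearly, not the official Couch to 5K plan, and `Session`, `week_plan`, and `first_week_reaching` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional

TOTAL_WEEKS = 9      # from the prompt
RUNS_PER_WEEK = 3    # from the prompt


@dataclass(frozen=True)
class Session:
    week: int
    jog_min: float   # minutes of jogging per interval
    walk_min: float  # minutes of walking per interval
    repeats: int     # number of jog/walk intervals in the session


def week_plan(week: int, session_min: float = 30.0) -> List[Session]:
    """Return the three sessions for a given week (1..9).

    Assumption: jog intervals grow linearly from 1 minute to the full
    session, and walk intervals shrink linearly from 2 minutes to 0.
    """
    if not 1 <= week <= TOTAL_WEEKS:
        raise ValueError("week must be between 1 and 9")
    t = (week - 1) / (TOTAL_WEEKS - 1)          # 0.0 in week 1, 1.0 in week 9
    jog = round(1.0 + t * (session_min - 1.0), 1)
    walk = round(2.0 * (1.0 - t), 1)
    repeats = max(1, int(session_min // (jog + walk)))
    return [Session(week, jog, walk, repeats) for _ in range(RUNS_PER_WEEK)]


def first_week_reaching(jog_minutes: float) -> Optional[int]:
    """First week whose jog interval is at least `jog_minutes` (a milestone)."""
    for week in range(1, TOTAL_WEEKS + 1):
        if week_plan(week)[0].jog_min >= jog_minutes:
            return week
    return None
```

Under these placeholder numbers, intervals lengthen and walks shorten every week, and the milestone weeks fall out of `first_week_reaching(10)` and `first_week_reaching(20)`.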

Non-functional:
- A bad-week event (work travel, sickness, life chaos = 4x friction) must not cause program abandonment. The system auto-defers and repeats the prior week instead of skipping ahead.
- If knee pain or excessive soreness is reported, the session must downgrade (more walk, less jog) before the user gets injured. It must not skip the run entirely, because that breaks the habit.
- Missing two consecutive sessions must trigger a recovery week, not a guilt-driven catch-up double.
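
The three non-functional rules above can be read as a small decision function. The sketch below is one possible reading, not the benchmark's reference design: the `CheckIn` fields, the soreness scale, the `SORENESS_LIMIT` threshold, and the order in which the rules are checked are all assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    RUN_AS_PLANNED = "run_as_planned"
    DOWNGRADE = "downgrade"          # more walk, less jog; still a session
    REPEAT_WEEK = "repeat_week"      # defer by redoing the prior week
    RECOVERY_WEEK = "recovery_week"  # after two consecutive missed sessions


@dataclass
class CheckIn:
    knee_pain: bool = False
    soreness: int = 0            # hypothetical scale: 0 (none) to 10 (severe)
    bad_week: bool = False       # travel, sickness, life chaos
    consecutive_missed: int = 0  # sessions missed in a row


SORENESS_LIMIT = 6  # assumed threshold for "excessive soreness"


def decide(check: CheckIn) -> Action:
    """Pick a fallback path. Note there is deliberately no 'skip' action."""
    if check.consecutive_missed >= 2:
        return Action.RECOVERY_WEEK      # never a catch-up double
    if check.knee_pain or check.soreness >= SORENESS_LIMIT:
        return Action.DOWNGRADE          # pain trips the breaker
    if check.bad_week:
        return Action.REPEAT_WEEK        # defer instead of abandoning
    return Action.RUN_AS_PLANNED
```

For example, `decide(CheckIn(knee_pain=True))` yields `Action.DOWNGRADE`, so the habit of showing up is kept even when the jog is reduced.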

Return a Chinilla CanvasState. Components: scheduler, sessions, recovery checks, progression rules, fallback paths. Behaviors: queue (session backlog), ratelimit (max sessions/week), circuitbreaker (pain trigger downgrade), retry (repeat-week deferral), split (training vs recovery routing).

Constraints

Max components: 11
Required behaviors: ratelimit, circuitbreaker, split
Monthly budget: $30

Stress scenarios

- Normal week (baseline): Three sessions land on schedule, recovery clean.
- Travel + cold week (spike): Friction 4x normal. System must defer without breaking the program.
- Knee pain reported (outage): Pain trigger fires. Session must downgrade, not be skipped outright.
- Slow recovery between runs (latency): Soreness lingers. Scheduler must extend gaps without dumping the queue.
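
To make "extend gaps without dumping the queue" concrete, here is a minimal sketch of a scheduler that postpones sessions on sore days while keeping every queued session. The function name `plan_dates`, the session labels, and the two-day default gap are assumptions for illustration only.

```python
from datetime import date, timedelta
from typing import Iterable, List, Set, Tuple


def plan_dates(
    backlog: Iterable[str],
    first_day: date,
    sore_days: Set[date],
    gap_days: int = 2,
) -> List[Tuple[str, date]]:
    """Assign a date to every queued session, oldest first.

    A session that would land on a sore day is pushed back one day at a
    time; it is never dropped from the backlog.
    """
    planned = []
    day = first_day
    for session in backlog:
        while day in sore_days:
            day += timedelta(days=1)      # extend the gap, keep the session
        planned.append((session, day))
        day += timedelta(days=gap_days)   # normal recovery gap
    return planned


# Usage: one lingering-soreness day shifts the rest of the week.
plan = plan_dates(
    ["w3-run1", "w3-run2", "w3-run3"],
    first_day=date(2025, 1, 6),
    sore_days={date(2025, 1, 8)},
)
```

All three sessions remain in `plan`; only their dates move.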

Pass criteria (overall)

Min stability score: 60
Max drop rate: 12.0%
Min delivery rate: 82.0%
Max errors: 6
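
As a rough illustration, the four thresholds above can be checked like this. The page does not say how the scorer combines them, so requiring all four at once is an assumption, and `RunResult` and `passes` are hypothetical names rather than part of chini-bench.

```python
from dataclasses import dataclass

# Thresholds copied from the pass criteria above.
MIN_STABILITY = 60.0
MAX_DROP_RATE_PCT = 12.0
MIN_DELIVERY_RATE_PCT = 82.0
MAX_ERRORS = 6


@dataclass
class RunResult:
    stability: float          # stability score
    drop_rate_pct: float      # percent of sessions dropped
    delivery_rate_pct: float  # percent of sessions delivered
    errors: int


def passes(result: RunResult) -> bool:
    """Assumed rule: a run passes only if every threshold is met."""
    return (
        result.stability >= MIN_STABILITY
        and result.drop_rate_pct <= MAX_DROP_RATE_PCT
        and result.delivery_rate_pct >= MIN_DELIVERY_RATE_PCT
        and result.errors <= MAX_ERRORS
    )
```

The values passed to `passes` would come from a scored run; the numbers used to exercise it here should be treated as made-up inputs, not leaderboard data.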

Submit your run

Submissions go through the chini-bench CLI. It calls your model with your key, scores the result locally, and posts to the leaderboard. Nothing leaves your machine except the canvas it produces.

End-to-end:

pip install git+https://github.com/collapseindex/chini-bench-cli.git
export OPENROUTER_API_KEY=...

chini-bench run chini-017-couch-to-5k \
  --provider openrouter --model google/gemini-2.0-flash-001 \
  --as alice

Or inspect the prompt first:

chini-bench prompt chini-017-couch-to-5k

Providers: openai · anthropic · google · openrouter · ollama

Leaderboard

Rank	Submitter	Model	Score	Stability	Delivery	Design	Pass
#1	rl_v06_run2	rl_policy custom single-shot	82	75.0	100.0	75.0	✗
#2	rl_v06_run1	rl_policy custom single-shot	81	76.0	88.0	85.0	✗
#3	rl_v06_run1	rl_policy custom single-shot	81	78.0	86.0	60.0	✗
#4	rl_v06_run1	rl_policy custom single-shot	80	68.0	100.0	50.0	✗
#5	rl_v06_run2	rl_policy custom single-shot	80	83.0	100.0	50.0	✗
#6	rl_v06_run1	rl_policy custom single-shot	78	72.0	74.0	60.0	✗
#7	rl_v06_run2	rl_policy custom single-shot	78	55.0	100.0	60.0	✗
#8	rl_v06_run2	rl_policy custom single-shot	77	60.0	96.0	60.0	✗
#9	rl_v06_run2	rl_policy custom single-shot	77	66.0	93.0	75.0	✗
#10	rl_v06_run2	rl_policy custom single-shot	77	83.0	88.0	50.0	✗
#11	rl_v06_run2	rl_policy custom single-shot	77	73.0	80.0	85.0	✗
#12	alex	x-ai/grok-4.20 default single-shot	76	80.0	54.0	100.0	✗
#13	rl_v06_run1	rl_policy custom single-shot	76	74.0	100.0	50.0	✗
#14	rl_v06_run2	rl_policy custom single-shot	76	52.0	100.0	75.0	✗
#15	rl_v06_run2	rl_policy custom single-shot	76	75.0	85.0	75.0	✗
#16	rl_v06_run2	rl_policy custom single-shot	76	54.0	100.0	85.0	✗
#17	rl_v06_run2	rl_policy custom single-shot	75	50.0	100.0	75.0	✗
#18	rl_v06_run2	rl_policy custom single-shot	75	77.0	67.0	60.0	✗
#19	rl_v06_run1	rl_policy custom single-shot	74	70.0	100.0	50.0	✗
#20	rl_v06_run2	rl_policy custom single-shot	74	81.0	69.0	50.0	✗
#21	rl_v06_run2	rl_policy custom single-shot	74	73.0	84.0	60.0	✗
#22	rl_v06_run1	rl_policy custom single-shot	73	73.0	92.0	50.0	✗
#23	rl_v06_run2	rl_policy custom single-shot	73	53.0	88.0	100.0	✗
#24	rl_v06_run2	rl_policy custom single-shot	73	75.0	50.0	60.0	✗
#25	rl_v06_run2	rl_policy custom single-shot	73	70.0	73.0	60.0	✗
#26	rl_v06_run1	rl_policy custom single-shot	72	59.0	75.0	60.0	✗
#27	rl_v06_run1	rl_policy custom single-shot	72	81.0	75.0	50.0	✗
#28	alex	google/gemini-3.1-pro-preview default single-shot	70	39.0	100.0	100.0	✗
#29	rl_v06_run1	rl_policy custom single-shot	68	74.0	75.0	50.0	✗
#30	rl_v06_run2	rl_policy custom single-shot	68	62.0	69.0	60.0	✗
#31	rl_v06_run2	rl_policy custom single-shot	68	53.0	75.0	85.0	✗
#32	rl_v06_run2	rl_policy custom single-shot	68	83.0	58.0	50.0	✗
#33	rl_v06_run2	rl_policy custom single-shot	68	66.0	49.0	60.0	✗
#34	rl_v06_run1	rl_policy custom single-shot	67	60.0	85.0	75.0	✗
#35	rl_v06_run2	rl_policy custom single-shot	67	73.0	54.0	75.0	✗
#36	rl_v06_run2	rl_policy custom single-shot	67	67.0	59.0	85.0	✗
#37	rl_v06_run1	rl_policy custom single-shot	66	72.0	75.0	50.0	✗
#38	rl_v06_run2	rl_policy custom single-shot	66	34.0	100.0	60.0	✗
#39	rl_v06_run2	rl_policy custom single-shot	66	57.0	59.0	60.0	✗
#40	rl_v06_run2	rl_policy custom single-shot	63	68.0	67.0	50.0	✗
#41	rl_v06_run1	rl_policy custom single-shot	61	57.0	41.0	85.0	✗
#42	rl_v06_run2	rl_policy custom single-shot	59	56.0	48.0	75.0	✗
#43	rl_v06_run2	rl_policy custom single-shot	58	70.0	24.0	75.0	✗
#44	rl_v06_run2	rl_policy custom single-shot	57	54.0	46.0	75.0	✗
#45	rl_v06_run2	rl_policy custom single-shot	56	27.0	88.0	85.0	✗
#46	rl_v06_run2	rl_policy custom single-shot	56	68.0	44.0	50.0	✗
#47	rl_v06_run2	rl_policy custom single-shot	51	84.0	0.0	25.0	✗
#48	alex	openai/gpt-5.4 default single-shot	50	60.0	0.0	75.0	✗
#49	rl_v06_run1	rl_policy custom single-shot	50	0.0	100.0	100.0	✗
#50	alex	anthropic/claude-sonnet-4.6 default single-shot	49	58.0	0.0	75.0	✗
#51	rl_v06_run1	rl_policy custom single-shot	42	6.0	75.0	100.0	✗
#52	alex	google/gemini-3.1-pro-preview default reflexion	33	40.0	0.0	75.0	✗
#53	rl_v06_run2	rl_policy custom single-shot	27	0.0	35.0	85.0	✗
#54	alex	x-ai/grok-4.20 default reflexion	22	18.0	0.0	75.0	✗
#55	alex	openai/gpt-5.4 default reflexion	15	0.0	0.0	75.0	✗
#56	alex	anthropic/claude-sonnet-4.6 default reflexion	10	0.0	0.0	75.0	✗

Per-scenario breakdown of the top run

Scenario	Health	Drop rate	Delivered	Pass
baseline	76.0	4.1%	402	✓
bad-week	77.0	4.3%	1437	✓
knee-pain	67.0	4.8%	367	✓
long-recovery	78.0	5.2%	388	✓
