chini-003-twitter-timeline

Social Timeline (Twitter-style fanout)

Generate a personalized timeline for millions of users. Don't melt when a celebrity posts.

Source: Classic system-design interview corpus (Twitter / Instagram timeline)

Prompt

Design the timeline-generation system for a social network.

Functional:
- POST /tweet appends a post by a user.
- GET /timeline returns the most recent posts from accounts a user follows.
- Reads vastly outnumber writes (~100:1).

Non-functional:
- A celebrity post (high-fanout write) must NOT degrade timeline reads for other users.
- Survive a 10x read spike (a viral event).
- A timeline-cache failure should degrade gracefully (slower reads), not drop posts.

Return a Chinilla CanvasState. You will likely need a write path, a read path, a cache, and some kind of fanout/queue between them. Include a rate-limit behavior on the write side.
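One common way to meet the celebrity requirement is a hybrid fanout: ordinary posts are pushed through a queue into per-user cached timelines, while posts from high-follower accounts are pulled at read time. The sketch below is an in-memory illustration of that idea only. It is not the Chinilla CanvasState format, and `TimelineService`, `CELEBRITY_THRESHOLD`, and the method names are hypothetical.

```python
from collections import defaultdict, deque
from itertools import count

CELEBRITY_THRESHOLD = 3   # hypothetical cutoff; a real system would use a far larger value
TIMELINE_LEN = 50         # hypothetical timeline page size


class TimelineService:
    def __init__(self):
        self.followers = defaultdict(set)   # author -> users who follow them
        self.following = defaultdict(set)   # user -> authors they follow
        self.posts = defaultdict(list)      # durable store: author -> [(seq, author, text)]
        self.cache = defaultdict(list)      # precomputed timelines: user -> [(seq, author, text)]
        self.fanout_queue = deque()         # decouples the write path from fanout work
        self._seq = count()                 # monotonically increasing post id

    def follow(self, user, author):
        self.following[user].add(author)
        self.followers[author].add(user)

    def is_celebrity(self, author):
        return len(self.followers[author]) >= CELEBRITY_THRESHOLD

    def post(self, author, text):
        entry = (next(self._seq), author, text)
        self.posts[author].append(entry)          # storage write always happens first
        if not self.is_celebrity(author):
            self.fanout_queue.append(entry)       # fanout-on-write, done asynchronously
        return entry                              # celebrity posts skip fanout entirely

    def drain_fanout(self):
        # Stand-in for a background worker consuming the queue.
        while self.fanout_queue:
            entry = self.fanout_queue.popleft()
            for follower in self.followers[entry[1]]:
                self.cache[follower].append(entry)

    def timeline(self, user, limit=TIMELINE_LEN):
        # Start from the cached timeline, keyed by seq to avoid duplicates.
        merged = {e[0]: e for e in self.cache[user]}
        # Fanout-on-read: pull recent posts from followed celebrities.
        for author in self.following[user]:
            if self.is_celebrity(author):
                for e in self.posts[author][-limit:]:
                    merged[e[0]] = e
        return sorted(merged.values(), reverse=True)[:limit]
```

With this split, a celebrity post costs one storage write regardless of follower count, and the extra work moves to the readers who follow that account. The sketch does not backfill timelines when a user follows someone new, which a real design would need to handle.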

Constraints

Max components: 14
Required behaviors: queue, ratelimit, storage
Monthly budget: $1500
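The required ratelimit behavior on the write side could, for example, follow a token-bucket policy. The sketch below illustrates that policy in plain Python; it says nothing about how Chinilla models a ratelimit component, and the `TokenBucket` class and its parameters are hypothetical.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity      # start full so an initial burst is allowed
        self.clock = clock          # injectable clock, useful for tests
        self.last = clock()

    def allow(self, cost=1.0):
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False                # caller should reject or defer the write
```

A write handler would call `allow()` before accepting a post and reject or defer the request when it returns `False`.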

Stress scenarios

Baseline reads + writes

baseline

Mixed traffic at normal rates.

10x viral spike

spike

Read traffic explodes 10x. Reads must survive even if write path slows.

Timeline cache failure

outage

Cache layer drops out. Reads must fall back, not error.
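The fallback this scenario asks for can be sketched as a read function that treats a cache error like a cache miss and rebuilds the timeline from storage. This is an illustration under assumptions: `cache_get`, `store_query`, and `CacheDown` are hypothetical stand-ins for whatever cache and storage clients a real design would use.

```python
class CacheDown(Exception):
    """Hypothetical error raised when the timeline cache is unreachable."""


def read_timeline(user, cache_get, store_query, limit=50):
    """Return (posts, source). Falls back to storage instead of erroring."""
    try:
        cached = cache_get(user)
        if cached is not None:
            return cached[:limit], "cache"
    except CacheDown:
        pass  # degrade gracefully: slower read, no error surfaced to the caller
    # Cache miss or cache outage: build the timeline from the durable store.
    return store_query(user, limit), "store"
```

Returning the source alongside the posts makes it easy to measure how much traffic the storage layer absorbs during an outage.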

Noisy network

cascade

Process-time jitter at 25%. Tests resilience to flaky internal calls.
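One conventional way to tolerate flaky internal calls is a bounded retry with exponential backoff and random jitter. The sketch below is a general illustration, not something the benchmark prescribes; `call_with_retry` and its parameters are hypothetical, and it assumes a slow call surfaces as a `TimeoutError`.

```python
import random
import time


def call_with_retry(fn, attempts=3, base_delay=0.05, sleep=time.sleep, rng=random.random):
    """Call fn(), retrying on TimeoutError up to `attempts` times in total."""
    for attempt in range(attempts):
        try:
            return fn()
        except TimeoutError:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller decide how to degrade
            # Full jitter: wait a random fraction of an exponentially growing cap.
            sleep(rng() * base_delay * (2 ** attempt))
```

Keeping `attempts` small matters here, since unbounded retries can amplify load on a component that is already slow.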

Pass criteria (overall)

Min stability score: 70
Max drop rate: 5.0%
Min delivery rate: 90.0%
Max errors: 5
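The four thresholds above can be read as a simple conjunction. The sketch below is only an illustration of that reading: it assumes the bounds are inclusive and that each is checked independently, neither of which the page confirms, and `RunMetrics` and `failed_criteria` are hypothetical names, not part of chini-bench.

```python
from dataclasses import dataclass


@dataclass
class RunMetrics:
    stability: float          # stability score, 0-100
    drop_rate_pct: float      # dropped requests, percent
    delivery_rate_pct: float  # delivered requests, percent
    errors: int               # error count


def failed_criteria(m):
    """Return the names of the criteria a run misses (empty list means pass)."""
    checks = {
        "stability": m.stability >= 70,
        "drop_rate": m.drop_rate_pct <= 5.0,
        "delivery_rate": m.delivery_rate_pct >= 90.0,
        "errors": m.errors <= 5,
    }
    return [name for name, ok in checks.items() if not ok]
```

For example, with made-up numbers, `failed_criteria(RunMetrics(73.0, 5.3, 95.0, 0))` reports only `drop_rate`, because 5.3% exceeds the 5.0% ceiling.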

Submit your run

Submissions go through the chini-bench CLI. It calls your model with your own API key, scores the result locally, and posts to the leaderboard. Apart from the request to your model provider, nothing leaves your machine except the canvas the model produces.

End-to-end:

pip install git+https://github.com/collapseindex/chini-bench-cli.git
export OPENROUTER_API_KEY=...

chini-bench run chini-003-twitter-timeline \
  --provider openrouter --model google/gemini-2.0-flash-001 \
  --as alice

Or inspect the prompt first:

chini-bench prompt chini-003-twitter-timeline

Providers: openai · anthropic · google · openrouter · ollama

Leaderboard

Rank	Submitter	Model	Score	Stability	Delivery	Design	Pass
#1	rl_v06_run2	rl_policy custom single-shot	89	77.0	100.0	75.0	✗
#2	rl_v06_run2	rl_policy custom single-shot	87	74.0	100.0	75.0	✗
#3	rl_v06_run1	rl_policy custom single-shot	86	72.0	100.0	75.0	✗
#4	rl_v06_run2	rl_policy custom single-shot	86	81.0	84.0	75.0	✗
#5	rl_v06_run2	rl_policy custom single-shot	86	81.0	100.0	25.0	✗
#6	rl_v06_run2	rl_policy custom single-shot	85	69.0	100.0	75.0	✗
#7	rl_v06_run2	rl_policy custom single-shot	85	75.0	100.0	50.0	✗
#8	rl_v06_run2	rl_policy custom single-shot	85	83.0	85.0	75.0	✗
#9	rl_v06_run2	rl_policy custom single-shot	85	80.0	100.0	25.0	✗
#10	rl_v06_run2	rl_policy custom single-shot	85	80.0	83.0	75.0	✗
#11	rl_v06_run2	rl_policy custom single-shot	84	79.0	81.0	75.0	✗
#12	rl_v06_run2	rl_policy custom single-shot	84	80.0	88.0	75.0	✗
#13	rl_v06_run1	rl_policy custom single-shot	82	80.0	90.0	25.0	✗
#14	rl_v06_run2	rl_policy custom single-shot	82	74.0	92.0	50.0	✗
#15	rl_v06_run1	rl_policy custom single-shot	81	80.0	88.0	25.0	✗
#16	rl_v06_run2	rl_policy custom single-shot	81	66.0	100.0	50.0	✗
#17	rl_v06_run2	rl_policy custom single-shot	81	66.0	100.0	50.0	✗
#18	rl_v06_run2	rl_policy custom single-shot	81	81.0	84.0	25.0	✗
#19	rl_v06_run2	rl_policy custom single-shot	81	71.0	100.0	25.0	✗
#20	rl_v06_run2	rl_policy custom single-shot	81	77.0	75.0	75.0	✗
#21	rl_v06_run2	rl_policy custom single-shot	81	80.0	70.0	75.0	✗
#22	alex	x-ai/grok-4.20 default reflexion	80	79.0	68.0	100.0	✗
#23	rl_v06_run2	rl_policy custom single-shot	80	78.0	69.0	75.0	✗
#24	rl_v06_run2	rl_policy custom single-shot	80	79.0	68.0	75.0	✗
#25	rl_v06_run2	rl_policy custom single-shot	80	80.0	84.0	25.0	✗
#26	rl_v06_run2	rl_policy custom single-shot	80	79.0	68.0	75.0	✗
#27	rl_v06_run2	rl_policy custom single-shot	80	79.0	68.0	75.0	✗
#28	alex	google/gemini-3.1-pro-preview default single-shot	79	57.0	100.0	100.0	✗
#29	rl_v06_run1	rl_policy custom single-shot	79	79.0	83.0	35.0	✗
#30	rl_v06_run1	rl_policy custom single-shot	79	78.0	68.0	75.0	✗
#31	rl_v06_run1	rl_policy custom single-shot	79	78.0	68.0	75.0	✗
#32	rl_v06_run2	rl_policy custom single-shot	79	78.0	68.0	75.0	✗
#33	rl_v06_run2	rl_policy custom single-shot	79	67.0	100.0	25.0	✗
#34	rl_v06_run2	rl_policy custom single-shot	79	78.0	68.0	75.0	✗
#35	rl_v06_run2	rl_policy custom single-shot	79	77.0	75.0	75.0	✗
#36	alex	openai/gpt-5.4 default single-shot	78	60.0	100.0	100.0	✗
#37	rl_v06_run1	rl_policy custom single-shot	78	76.0	65.0	75.0	✗
#38	rl_v06_run2	rl_policy custom single-shot	78	76.0	65.0	75.0	✗
#39	rl_v06_run2	rl_policy custom single-shot	78	76.0	65.0	75.0	✗
#40	rl_v06_run2	rl_policy custom single-shot	78	77.0	65.0	75.0	✗
#41	rl_v06_run1	rl_policy custom single-shot	77	78.0	68.0	50.0	✗
#42	rl_v06_run1	rl_policy custom single-shot	77	79.0	82.0	25.0	✗
#43	rl_v06_run1	rl_policy custom single-shot	77	64.0	100.0	25.0	✗
#44	rl_v06_run2	rl_policy custom single-shot	77	78.0	68.0	50.0	✗
#45	rl_v06_run2	rl_policy custom single-shot	77	81.0	64.0	50.0	✗
#46	alex	anthropic/claude-sonnet-4.6 default single-shot	76	52.0	100.0	100.0	✗
#47	alex	google/gemini-3.1-pro-preview default reflexion	76	51.0	100.0	100.0	✗
#48	rl_v06_run1	rl_policy custom single-shot	76	77.0	65.0	50.0	✗
#49	rl_v06_run2	rl_policy custom single-shot	74	75.0	64.0	75.0	✗
#50	rl_v06_run2	rl_policy custom single-shot	74	78.0	68.0	25.0	✗
#51	rl_v06_run1	rl_policy custom single-shot	73	62.0	83.0	50.0	✗
#52	rl_v06_run1	rl_policy custom single-shot	73	77.0	66.0	25.0	✗
#53	rl_v06_run1	rl_policy custom single-shot	73	77.0	66.0	25.0	✗
#54	rl_v06_run2	rl_policy custom single-shot	73	76.0	65.0	25.0	✗
#55	rl_v06_run2	rl_policy custom single-shot	73	77.0	65.0	25.0	✗
#56	rl_v06_run2	rl_policy custom single-shot	73	77.0	66.0	25.0	✗
#57	alex	anthropic/claude-sonnet-4.6 default reflexion	68	44.0	100.0	100.0	✗
#58	rl_v06_run2	rl_policy custom single-shot	68	80.0	34.0	75.0	✗
#59	rl_v06_run1	rl_policy custom single-shot	67	62.0	54.0	75.0	✗
#60	rl_v06_run1	rl_policy custom single-shot	64	62.0	53.0	50.0	✗
#61	alex	x-ai/grok-4.20 default single-shot	42	16.0	47.0	100.0	✗
#62	alex	openai/gpt-5.4 default reflexion	24	9.0	0.0	75.0	✗

Per-scenario breakdown of the top run

Scenario	Health	Drop rate	Delivered	Pass
baseline	81.0	0.0%	280	✓
viral-spike	82.0	0.0%	2520	✓
cache-failure	72.0	0.0%	45	✓
fanout-jitter	73.0	5.3%	71	✗
