chini-016-inbox-zero

Inbox Zero Maintenance

300 emails a day, three contexts, two devices, one human attention budget.

Source: Productivity literature, GTD methodology, every knowledge worker drowning in email

Prompt

Design a personal email-processing system to keep a 300-email-per-day inbox at zero by EOD without destroying focus.

Functional:
- Email arrives all day across three contexts: work, personal, newsletters/promo.
- Each email gets one of: archive (no action), reply now (<2 min), defer (snooze with action), delegate (forward + tag), file (reference).
- Two processing windows per day (morning + late afternoon). Outside those windows, email is queued, not read.
- Newsletters auto-route to a read-later bucket, never trigger a notification.
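
To make the functional rules concrete, here is a minimal Python sketch of the routing and window-batching logic. It is not a Chinilla CanvasState and not part of chini-bench; the names `Email`, `Inbox`, `Context`, `Action`, and the `decide` callback are hypothetical, and the sketch assumes each email already carries a context label from some upstream classifier.

```python
from collections import deque
from dataclasses import dataclass, field
from enum import Enum


class Context(Enum):
    WORK = "work"
    PERSONAL = "personal"
    NEWSLETTER = "newsletter"  # covers newsletters/promo


class Action(Enum):
    ARCHIVE = "archive"      # no action
    REPLY_NOW = "reply_now"  # under 2 minutes
    DEFER = "defer"          # snooze with action
    DELEGATE = "delegate"    # forward + tag
    FILE = "file"            # reference


@dataclass
class Email:
    sender: str
    subject: str
    context: Context  # assumed to be set by an upstream classifier


@dataclass
class Inbox:
    window_queue: deque = field(default_factory=deque)  # held until a window opens
    read_later: list = field(default_factory=list)      # silent newsletter bucket

    def receive(self, email: Email) -> None:
        """Outside a window, email is only queued, never read."""
        if email.context is Context.NEWSLETTER:
            self.read_later.append(email)    # filter: shunt, no notification
        else:
            self.window_queue.append(email)  # queue: wait for the next window

    def process_window(self, decide) -> dict:
        """Drain the queue, giving every email exactly one action."""
        buckets = {action: [] for action in Action}
        while self.window_queue:
            email = self.window_queue.popleft()
            buckets[decide(email)].append(email)
        return buckets
```

Here `decide` stands in for the human (or a rule set) choosing one of the five actions per email; calling `process_window` twice a day corresponds to the morning and late-afternoon windows.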

Non-functional:
- A bad day (4x normal volume, e.g. after PTO) must not blow the daily processing budget. The system batches and defers aggressively.
- If a key person emails (boss, partner, named contacts), a notification may break the window-only rule, but notifications are rate-limited to one ping per hour.
- If the user misses an evening window, the morning window must absorb the backlog without consuming the entire morning.
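
The two non-functional mechanisms above, a notification cap and a bounded window, can be sketched roughly as follows. This is an illustration under stated assumptions, not the benchmark's implementation: `KeyContactNotifier` and `plan_window` are hypothetical names, the one-ping-per-hour limit is read as a single global limit (not one per contact), and the window capacity of 400 emails in the usage line is an invented figure.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class KeyContactNotifier:
    key_contacts: set
    min_gap_seconds: float = 3600.0     # one ping per hour
    _last_ping: Optional[float] = None  # timestamp of the last allowed ping

    def should_notify(self, sender: str, now: float) -> bool:
        """Allow a ping only for key contacts, and at most once per gap."""
        if sender not in self.key_contacts:
            return False  # everyone else waits for the next window
        if self._last_ping is not None and now - self._last_ping < self.min_gap_seconds:
            return False  # rate limit: suppress, the email stays queued
        self._last_ping = now
        return True


def plan_window(backlog: int, capacity: int) -> Tuple[int, int]:
    """Split a backlog into (processed_now, deferred) under a hard cap."""
    processed = min(backlog, capacity)
    return processed, backlog - processed


# A 4x day is 4 * 300 = 1200 emails; with an assumed cap of 400 per window,
# the remainder is deferred instead of extending the window.
processed, deferred = plan_window(backlog=4 * 300, capacity=400)
```

Capping the window and deferring the rest is one way to keep a post-PTO or missed-window backlog from consuming the whole morning; how deferred mail is later batched is left open here.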

Return a Chinilla CanvasState. Components: inbox, classifier, windows, action buckets, notifications. Behaviors: split (context routing), queue (window batching), ratelimit (notification cap), batch (bulk processing), filter (newsletter shunt).

Constraints

- Max components: 12
- Required behaviors: split, queue, ratelimit
- Monthly budget: $50
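
As a rough pre-submission sanity check, the component and behavior constraints can be tested against a design before running the benchmark. The sketch below is hypothetical: the real CanvasState schema is not shown on this page, so a design is represented simply as a list of `(component_name, behavior)` pairs, and `check_constraints` is an illustrative helper, not a chini-bench function. The monthly budget is not checked because no cost model is given.

```python
MAX_COMPONENTS = 12
REQUIRED_BEHAVIORS = {"split", "queue", "ratelimit"}


def check_constraints(components):
    """Return a list of constraint violations (empty if the design is OK).

    `components` is a list of (name, behavior) pairs, a simplified
    stand-in for a canvas, not the actual CanvasState format.
    """
    problems = []
    if len(components) > MAX_COMPONENTS:
        problems.append(
            f"too many components: {len(components)} > {MAX_COMPONENTS}"
        )
    used = {behavior for _, behavior in components}
    missing = REQUIRED_BEHAVIORS - used
    if missing:
        problems.append(f"missing required behaviors: {sorted(missing)}")
    return problems


design = [
    ("inbox", "queue"),
    ("classifier", "split"),
    ("newsletter-shunt", "filter"),
    ("key-contact-notifier", "ratelimit"),
    ("bulk-processor", "batch"),
]
violations = check_constraints(design)  # empty list for this design
```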

Stress scenarios

- Normal day (baseline): 300 emails across two windows, mixed contexts.
- Back from vacation (spike): 4x backlog. Morning window must absorb without taking the whole morning.
- Missed evening window (outage): User skipped late-afternoon processing. Backlog hits morning queue.
- Hard-to-classify thread (latency): Long ambiguous threads require human read time. System must not block fresh email.

Pass criteria (overall)

- Min stability score: 60
- Max drop rate: 10.0%
- Min delivery rate: 85.0%
- Max errors: 6
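
The four thresholds combine into a single pass/fail decision, which can be sketched as below. `RunMetrics` and `passes` are hypothetical names, and treating each threshold as inclusive (for example, a stability score of exactly 60 passes) is an assumption; the page does not say how boundary values are scored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunMetrics:
    stability: float          # overall stability score
    drop_rate_pct: float      # percent of emails dropped
    delivery_rate_pct: float  # percent of emails delivered
    errors: int               # error count for the run


def passes(m: RunMetrics) -> bool:
    """Apply the overall pass criteria (thresholds assumed inclusive)."""
    return (
        m.stability >= 60
        and m.drop_rate_pct <= 10.0
        and m.delivery_rate_pct >= 85.0
        and m.errors <= 6
    )


# Stability 88.0 and delivery 100.0 with no drops; the error count of 0
# is an assumed value for illustration.
example = RunMetrics(stability=88.0, drop_rate_pct=0.0,
                     delivery_rate_pct=100.0, errors=0)
result = passes(example)
```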

Submit your run

Submissions go through the chini-bench CLI. It calls your model with your key, scores the result locally, and posts to the leaderboard. Nothing leaves your machine except the canvas it produces.

End-to-end:

    pip install git+https://github.com/collapseindex/chini-bench-cli.git
    export OPENROUTER_API_KEY=...

    chini-bench run chini-016-inbox-zero \
      --provider openrouter --model google/gemini-2.0-flash-001 \
      --as alice --x alice --linkedin alice-builds

Or inspect the prompt first:

    chini-bench prompt chini-016-inbox-zero

Providers: openai · anthropic · google · openrouter · ollama

Leaderboard

Rank  Submitter     Model                          Score  Stability  Delivery  Design  Pass  Links
#1    alex default  anthropic/claude-sonnet-4.6    95     88.0       100.0     75.0          X
#2    alex default  x-ai/grok-4.20                 93     91.0       91.0      75.0          X
#3    alex default  openai/gpt-5.4                 89     78.0       96.0      75.0          X
#4    alex default  google/gemini-3.1-pro-preview  89     75.0       100.0     75.0          X

Per-scenario breakdown of the top run

Scenario       Health  Drop rate  Delivered  Pass
baseline       89.0    0.0%       352
post-pto       94.0    0.0%       1280
missed-window  79.0    0.0%       240
slow-classify  89.0    0.0%       320