# Inbox Zero Maintenance (`chini-016-inbox-zero`)
300 emails a day, three contexts, two devices, one human attention budget.
Source: Productivity literature, GTD methodology, every knowledge worker drowning in email
## Prompt

Design a personal email-processing system to keep a 300-email-per-day inbox at zero by EOD without destroying focus.

Functional:

- Email arrives all day across three contexts: work, personal, newsletters/promo.
- Each email gets one of: archive (no action), reply now (<2 min), defer (snooze with action), delegate (forward + tag), file (reference).
- Two processing windows per day (morning + late afternoon). Outside those windows, email is queued, not read.
- Newsletters auto-route to a read-later bucket, never trigger a notification.

Non-functional:

- A bad day (4x normal volume, e.g. after PTO) must not blow the daily processing budget. The system batches and defers aggressively.
- If a key person emails (boss, partner, named contacts), the notification breaks the window-only rule but rate-limits to one ping per hour.
- If the user misses an evening window, the morning window must absorb the backlog without consuming the entire morning.

Return a Chinilla CanvasState. Components: inbox, classifier, windows, action buckets, notifications. Behaviors: split (context routing), queue (window batching), ratelimit (notification cap), batch (bulk processing), filter (newsletter shunt).
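The routing rules above (newsletter shunt, rate-limited VIP notifications, window queueing) can be sketched in plain Python. This is an illustrative sketch, not the Chinilla CanvasState format; the `Email` fields and VIP addresses are assumptions:

```python
from dataclasses import dataclass, field

# Hypothetical named contacts allowed to break the window-only rule.
VIP = {"boss@example.com", "partner@example.com"}

PING_INTERVAL_S = 3600.0  # at most one VIP ping per hour

@dataclass
class Email:
    sender: str
    context: str  # "work" | "personal" | "newsletter"

@dataclass
class Router:
    last_ping: float = -PING_INTERVAL_S  # allow the first ping immediately
    read_later: list = field(default_factory=list)
    queued: list = field(default_factory=list)

    def route(self, mail: Email, now: float) -> str:
        # filter: newsletters go straight to read-later, never notify.
        if mail.context == "newsletter":
            self.read_later.append(mail)
            return "read-later"
        # ratelimit: VIP mail notifies at most once per hour.
        if mail.sender in VIP and now - self.last_ping >= PING_INTERVAL_S:
            self.last_ping = now
            return "notify"
        # queue: everything else waits for the next processing window.
        self.queued.append(mail)
        return "queued"
```

A second VIP email inside the hour falls through to the ordinary queue rather than pinging again, which is what keeps the break-the-window exception from reintroducing constant interruptions.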
## Constraints

- Max components: 12
- Required behaviors: split, queue, ratelimit
- Monthly budget: $50
## Stress scenarios

| Scenario | Type | Description |
|---|---|---|
| Normal day | baseline | 300 emails across two windows, mixed contexts. |
| Back from vacation | spike | 4x backlog. Morning window must absorb it without taking the whole morning. |
| Missed evening window | outage | User skipped late-afternoon processing. Backlog hits the morning queue. |
| Hard-to-classify thread | latency | Long ambiguous threads require human read time. System must not block fresh email. |
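One way to satisfy the spike and missed-window scenarios is a fixed per-window time budget drained cheapest-action-first, so bulk archives clear quickly and anything over budget rolls to the next window. A rough sketch; the per-action minute costs and the budget are assumptions, not part of the spec:

```python
# Assumed per-action costs in minutes; unknown actions default to 2.0.
COST_MIN = {"archive": 0.1, "file": 0.2, "defer": 0.3, "delegate": 0.5, "reply": 2.0}

def process_window(actions, budget_min=90.0):
    """Drain as many queued actions as the budget allows; roll over the rest.

    batch: cheap bulk actions run first, so a 4x spike of archivable mail
    burns little budget and expensive replies defer if time runs out.
    """
    spent, done, rolled = 0.0, [], []
    for a in sorted(actions, key=lambda a: COST_MIN.get(a, 2.0)):
        cost = COST_MIN.get(a, 2.0)
        if spent + cost <= budget_min:
            spent += cost
            done.append(a)
        else:
            rolled.append(a)  # carried to the next window's queue
    return done, rolled
```

Because the budget is fixed, a post-PTO backlog stretches across several windows instead of consuming the whole morning, which is exactly the behavior the spike and outage scenarios test.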
## Pass criteria (overall)

- Min stability score: 60
- Max drop rate: 10.0%
- Min delivery rate: 85.0%
- Max errors: 6
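The overall gate is a conjunction of the four thresholds above. A minimal sketch of that check; note that the leaderboard shows runs above all four overall thresholds that still fail, so per-scenario checks presumably apply on top:

```python
def passes_overall(stability, drop_rate_pct, delivery_rate_pct, errors):
    """All four overall thresholds must hold simultaneously."""
    return (stability >= 60
            and drop_rate_pct <= 10.0
            and delivery_rate_pct >= 85.0
            and errors <= 6)
```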
## Submit your run
Submissions go through the chini-bench CLI. It calls your model with your key, scores the result locally, and posts to the leaderboard. Nothing leaves your machine except the canvas it produces.
End-to-end:

```shell
pip install git+https://github.com/collapseindex/chini-bench-cli.git
export OPENROUTER_API_KEY=...
chini-bench run chini-016-inbox-zero \
  --provider openrouter --model google/gemini-2.0-flash-001 \
  --as alice --x alice --linkedin alice-builds
```

Or inspect the prompt first:

```shell
chini-bench prompt chini-016-inbox-zero
```

Providers: openai · anthropic · google · openrouter · ollama
## Leaderboard
| Rank | Submitter | Model | Score | Stability | Delivery | Design | Pass | Links |
|---|---|---|---|---|---|---|---|---|
| #1 | alex default | anthropic/claude-sonnet-4.6 | 95 | 88.0 | 100.0 | 75.0 | ✓ | X |
| #2 | alex default | x-ai/grok-4.20 | 93 | 91.0 | 91.0 | 75.0 | ✗ | X |
| #3 | alex default | openai/gpt-5.4 | 89 | 78.0 | 96.0 | 75.0 | ✗ | X |
| #4 | alex default | google/gemini-3.1-pro-preview | 89 | 75.0 | 100.0 | 75.0 | ✓ | X |
### Per-scenario breakdown of the top run
| Scenario | Health | Drop rate | Delivered | Pass |
|---|---|---|---|---|
| baseline | 89.0 | 0.0% | 352 | ✓ |
| post-pto | 94.0 | 0.0% | 1280 | ✓ |
| missed-window | 79.0 | 0.0% | 240 | ✓ |
| slow-classify | 89.0 | 0.0% | 320 | ✓ |