chini-007-payment-webhook

Payment Webhook Receiver

Accept inbound webhooks. Never lose one. Never double-process one.

Source: Classic system-design interview corpus (Stripe / Shopify webhook ingest)

Prompt

Design a webhook-receiver service for a payment processor.

Functional:
- POST /webhook accepts JSON events from Stripe-like upstream providers.
- Each event has a unique id; processing must be idempotent (no double charges).
- Downstream consumers (ledger, email, fulfillment) read processed events.
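
One way to read the idempotency requirement: record each event id in dedup storage and run the side effect only when that id is new. The sketch below is a minimal illustration, not part of the benchmark. It assumes SQLite as the dedup store, and the names `open_store`, `process_once`, and the `seen_events` table are hypothetical.

```python
import sqlite3

def open_store(path=":memory:"):
    """Open the dedup store (assumption: SQLite; any store with unique keys works)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS seen_events (event_id TEXT PRIMARY KEY)"
    )
    return conn

def process_once(conn, event, handler):
    """Run handler(event) only the first time event['id'] is seen.

    Returns True if the handler ran, False if the event was a duplicate.
    If the handler raises, the id is rolled back so a later retry can succeed.
    """
    try:
        with conn:  # one transaction: commit on success, roll back on exception
            conn.execute(
                "INSERT INTO seen_events (event_id) VALUES (?)", (event["id"],)
            )
            handler(event)
    except sqlite3.IntegrityError:
        return False  # primary-key conflict: this id was already processed
    return True
```

Marking the id and running the handler inside one transaction means a crash between the two cannot leave an event marked as seen but unprocessed, provided the handler's own writes go to the same database. A handler with external side effects (for example, sending an email) would need its own idempotency key.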

Non-functional:
- A 3x burst of events during a sale must not lose any.
- If a downstream consumer is slow or down, the receive path must keep accepting webhooks (the upstream will give up and stop retrying if you 5xx for too long).
- Retries from the upstream must be deduped, not double-processed.

Return a Chinilla CanvasState. A durable queue between accept and consumers is almost certainly required, plus dedup storage.
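
To make the "durable queue between accept and consumers" idea concrete, here is a small sketch in which the receive path only persists the event and acknowledges, while each consumer reads at its own pace through a private cursor. It is an illustration under assumptions, not the Chinilla CanvasState format: SQLite stands in for the queue, and `open_queue`, `accept`, `poll`, `ack`, and both table names are hypothetical.

```python
import json
import sqlite3

def open_queue(path=":memory:"):
    """Open the event log. Use a file path for durability; ':memory:' is demo-only."""
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS events (
            seq      INTEGER PRIMARY KEY AUTOINCREMENT,
            event_id TEXT NOT NULL UNIQUE,
            body     TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS cursors (
            consumer TEXT PRIMARY KEY,
            last_seq INTEGER NOT NULL
        );
        """
    )
    return conn

def accept(conn, event):
    """Receive path: persist the event, then acknowledge. Never calls a consumer."""
    with conn:
        # UNIQUE(event_id) + OR IGNORE stores an upstream retry only once.
        conn.execute(
            "INSERT OR IGNORE INTO events (event_id, body) VALUES (?, ?)",
            (event["id"], json.dumps(event)),
        )
    return 200  # duplicates are acknowledged too, so the upstream stops retrying

def poll(conn, consumer, limit=100):
    """Return up to `limit` (seq, event) pairs this consumer has not acknowledged."""
    row = conn.execute(
        "SELECT last_seq FROM cursors WHERE consumer = ?", (consumer,)
    ).fetchone()
    last_seq = row[0] if row else 0
    rows = conn.execute(
        "SELECT seq, body FROM events WHERE seq > ? ORDER BY seq LIMIT ?",
        (last_seq, limit),
    ).fetchall()
    return [(seq, json.loads(body)) for seq, body in rows]

def ack(conn, consumer, seq):
    """Advance the consumer's cursor after it has handled everything up to seq."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO cursors (consumer, last_seq) VALUES (?, ?)",
            (consumer, seq),
        )
```

Because `accept` never touches a consumer, a slow or offline consumer only lets its own backlog grow. When it comes back, `poll` resumes from its last acknowledged position.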

Constraints

Max components: 11
Required behaviors: queue, storage, retry
Monthly budget: $600

Stress scenarios

Baseline events (baseline)
Steady webhook volume.

3x sale burst (spike)
Black Friday sale triples webhook volume.

Consumer outage (outage)
A downstream consumer is offline. Receive path must still accept events.

Pass criteria (overall)

Min stability score: 70
Max drop rate: 4.0%
Min delivery rate: 92.0%
Max errors: 4
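
The four thresholds can be read as a simple gate over a run's overall metrics. The sketch below shows one way to check them. It assumes the bounds are inclusive (the page does not say), and `RunMetrics` and `failed_criteria` are hypothetical names, not part of chini-bench.

```python
from dataclasses import dataclass

# Thresholds from the pass criteria listed for this problem.
MIN_STABILITY = 70
MAX_DROP_RATE = 4.0       # percent
MIN_DELIVERY_RATE = 92.0  # percent
MAX_ERRORS = 4

@dataclass
class RunMetrics:
    stability: float      # stability score
    drop_rate: float      # percent of events dropped
    delivery_rate: float  # percent of events delivered
    errors: int           # error count

def failed_criteria(m):
    """Return the names of the criteria a run misses (empty list means pass).

    Assumption: a value exactly on a threshold passes.
    """
    failures = []
    if m.stability < MIN_STABILITY:
        failures.append("stability")
    if m.drop_rate > MAX_DROP_RATE:
        failures.append("drop_rate")
    if m.delivery_rate < MIN_DELIVERY_RATE:
        failures.append("delivery_rate")
    if m.errors > MAX_ERRORS:
        failures.append("errors")
    return failures
```

For example, with hypothetical inputs, `failed_criteria(RunMetrics(stability=34.0, drop_rate=0.0, delivery_rate=83.0, errors=0))` reports both the stability and the delivery-rate criteria as missed.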

Submit your run

Submissions go through the chini-bench CLI. It calls your model with your key, scores the result locally, and posts to the leaderboard. Nothing leaves your machine except the canvas it produces.

End-to-end:

    pip install git+https://github.com/collapseindex/chini-bench-cli.git
    export OPENROUTER_API_KEY=...

    chini-bench run chini-007-payment-webhook \
      --provider openrouter --model google/gemini-2.0-flash-001 \
      --as alice

Or inspect the prompt first:

    chini-bench prompt chini-007-payment-webhook

Providers: openai · anthropic · google · openrouter · ollama

Leaderboard

Rank  Submitter    Model                                                Score  Stability  Delivery  Design  Pass
#1    rl_v06_run1  rl_policy (custom single-shot)                       94     84.0       100.0     100.0
#2    rl_v06_run1  rl_policy (custom single-shot)                       94     84.0       100.0     100.0
#3    rl_v06_run1  rl_policy (custom single-shot)                       94     85.0       100.0     100.0
#4    rl_v06_run1  rl_policy (custom single-shot)                       94     84.0       100.0     100.0
#5    rl_v06_run1  rl_policy (custom single-shot)                       94     84.0       100.0     100.0
#6    rl_v06_run2  rl_policy (custom single-shot)                       94     85.0       100.0     100.0
#7    rl_v06_run2  rl_policy (custom single-shot)                       94     85.0       100.0     100.0
#8    rl_v06_run1  rl_policy (custom single-shot)                       93     83.0       100.0     100.0
#9    rl_v06_run2  rl_policy (custom single-shot)                       93     83.0       100.0     100.0
#10   rl_v06_run2  rl_policy (custom single-shot)                       93     82.0       100.0     85.0
#11   rl_v06_run2  rl_policy (custom single-shot)                       93     83.0       100.0     100.0
#12   rl_v06_run2  rl_policy (custom single-shot)                       93     83.0       100.0     85.0
#13   rl_v06_run1  rl_policy (custom single-shot)                       92     81.0       100.0     85.0
#14   rl_v06_run1  rl_policy (custom single-shot)                       92     85.0       100.0     75.0
#15   rl_v06_run1  rl_policy (custom single-shot)                       91     83.0       100.0     100.0
#16   rl_v06_run2  rl_policy (custom single-shot)                       91     83.0       100.0     75.0
#17   rl_v06_run2  rl_policy (custom single-shot)                       91     83.0       100.0     85.0
#18   rl_v06_run1  rl_policy (custom single-shot)                       90     82.0       92.0      85.0
#19   rl_v06_run2  rl_policy (custom single-shot)                       90     82.0       92.0      85.0
#20   rl_v06_run2  rl_policy (custom single-shot)                       90     81.0       100.0     75.0
#21   rl_v06_run2  rl_policy (custom single-shot)                       90     82.0       92.0      85.0
#22   rl_v06_run2  rl_policy (custom single-shot)                       90     82.0       100.0     60.0
#23   rl_v06_run1  rl_policy (custom single-shot)                       89     85.0       100.0     75.0
#24   alex         openai/gpt-5.4 (default single-shot)                 88     70.0       100.0     100.0
#25   rl_v06_run2  rl_policy (custom single-shot)                       88     70.0       100.0     100.0
#26   rl_v06_run2  rl_policy (custom single-shot)                       88     83.0       92.0      75.0
#27   rl_v06_run2  rl_policy (custom single-shot)                       88     82.0       100.0     50.0
#28   rl_v06_run2  rl_policy (custom single-shot)                       88     82.0       89.0      100.0
#29   rl_v06_run2  rl_policy (custom single-shot)                       88     82.0       87.0      85.0
#30   rl_v06_run1  rl_policy (custom single-shot)                       87     81.0       87.0      85.0
#31   rl_v06_run2  rl_policy (custom single-shot)                       87     82.0       92.0      60.0
#32   rl_v06_run2  rl_policy (custom single-shot)                       86     82.0       83.0      85.0
#33   rl_v06_run2  rl_policy (custom single-shot)                       86     81.0       83.0      85.0
#34   rl_v06_run2  rl_policy (custom single-shot)                       86     81.0       89.0      60.0
#35   rl_v06_run2  rl_policy (custom single-shot)                       86     82.0       83.0      85.0
#36   rl_v06_run2  rl_policy (custom single-shot)                       86     82.0       89.0      60.0
#37   rl_v06_run2  rl_policy (custom single-shot)                       86     82.0       83.0      85.0
#38   rl_v06_run2  rl_policy (custom single-shot)                       85     81.0       87.0      60.0
#39   rl_v06_run2  rl_policy (custom single-shot)                       85     82.0       80.0      85.0
#40   rl_v06_run2  rl_policy (custom single-shot)                       84     82.0       83.0      85.0
#41   rl_v06_run2  rl_policy (custom single-shot)                       84     83.0       83.0      85.0
#42   rl_v06_run2  rl_policy (custom single-shot)                       84     83.0       83.0      60.0
#43   rl_v06_run2  rl_policy (custom single-shot)                       84     83.0       83.0      60.0
#44   rl_v06_run2  rl_policy (custom single-shot)                       84     78.0       83.0      85.0
#45   rl_v06_run2  rl_policy (custom single-shot)                       82     88.0       67.0      85.0
#46   alex         anthropic/claude-sonnet-4.6 (default single-shot)    80     84.0       67.0      100.0
#47   alex         google/gemini-3.1-pro-preview (default single-shot)  80     84.0       67.0      100.0
#48   rl_v06_run2  rl_policy (custom single-shot)                       80     80.0       83.0      60.0
#49   alex         google/gemini-3.1-pro-preview (default reflexion)    79     81.0       67.0      100.0
#50   rl_v06_run1  rl_policy (custom single-shot)                       79     81.0       67.0      85.0
#51   rl_v06_run2  rl_policy (custom single-shot)                       79     80.0       67.0      85.0
#52   rl_v06_run2  rl_policy (custom single-shot)                       79     81.0       78.0      60.0
#53   rl_v06_run1  rl_policy (custom single-shot)                       78     78.0       67.0      100.0
#54   rl_v06_run2  rl_policy (custom single-shot)                       78     83.0       67.0      75.0
#55   rl_v06_run1  rl_policy (custom single-shot)                       77     82.0       67.0      60.0
#56   rl_v06_run2  rl_policy (custom single-shot)                       77     81.0       67.0      60.0
#57   rl_v06_run2  rl_policy (custom single-shot)                       76     74.0       67.0      85.0
#58   rl_v06_run2  rl_policy (custom single-shot)                       76     80.0       67.0      75.0
#59   alex         anthropic/claude-sonnet-4.6 (default reflexion)      75     64.0       83.0      100.0
#60   rl_v06_run1  rl_policy (custom single-shot)                       75     84.0       67.0      50.0
#61   alex         x-ai/grok-4.20 (default single-shot)                 73     65.0       67.0      100.0
#62   alex         x-ai/grok-4.20 (default reflexion)                   70     59.0       67.0      100.0
#63   alex         openai/gpt-5.4 (default reflexion)                   66     34.0       83.0      100.0

Per-scenario breakdown of the top run

Scenario       Health  Drop rate  Delivered  Pass
baseline       85.0    0.0%       144
sale-burst     83.0    0.0%       432
consumer-down  85.0    0.0%       30