chini-007-payment-webhook

Payment Webhook Receiver

Accept inbound webhooks. Never lose one. Never double-process one.

Source: Classic system-design interview corpus (Stripe / Shopify webhook ingest)

Prompt

Design a webhook-receiver service for a payment processor.

Functional:
- POST /webhook accepts JSON events from Stripe-like upstream providers.
- Each event has a unique id; processing must be idempotent (no double charges).
- Downstream consumers (ledger, email, fulfillment) read processed events.
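One common way to make processing idempotent is to claim each event id atomically in the dedup storage before doing any side effects. The sketch below assumes a relational store and uses an in-memory SQLite table as a stand-in (a real deployment would need durable storage). The table and function names (`processed_events`, `open_dedup_store`, `record_once`) are hypothetical. The `PRIMARY KEY` constraint plus `INSERT OR IGNORE` makes the check-and-claim a single atomic statement, so two concurrent retries cannot both win.

```python
import json
import sqlite3


def open_dedup_store() -> sqlite3.Connection:
    """Hypothetical dedup store; in-memory here, durable in a real system."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE processed_events ("
        " event_id TEXT PRIMARY KEY,"
        " payload TEXT NOT NULL)"
    )
    return conn


def record_once(conn: sqlite3.Connection, event: dict) -> bool:
    """Atomically claim an event id: True on first sight, False for a duplicate."""
    with conn:  # commits the INSERT on success, rolls back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO processed_events (event_id, payload) VALUES (?, ?)",
            (event["id"], json.dumps(event)),
        )
    # rowcount is 1 when the row was inserted, 0 when the id already existed
    return cur.rowcount == 1


if __name__ == "__main__":
    conn = open_dedup_store()
    evt = {"id": "evt_123", "type": "charge.succeeded"}
    print(record_once(conn, evt))  # True: first delivery, process it
    print(record_once(conn, evt))  # False: upstream retry, skip it
```

Only the caller that gets `True` should hand the event to the ledger, email, or fulfillment path.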

Non-functional:
- A 3x burst of events during a sale must not lose any.
- If a downstream consumer is slow or down, the receive path must keep accepting webhooks (the upstream provider gives up and stops retrying if the endpoint returns 5xx for too long).
- Retries from the upstream must be deduped, not double-processed.
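The three non-functional requirements mostly come down to the ordering on the accept path: persist the event to the queue first, acknowledge second, and never call consumers inline. The sketch below is a hypothetical in-process model, not a production handler. `enqueue` stands in for a durable queue write and `seen` for the dedup store; both names are assumptions. A failed enqueue returns 503 without marking the id, so the upstream's retry can still succeed. A crash between the enqueue and the mark could queue the same event twice, so consumers should also dedup on the event id (at-least-once delivery).

```python
from collections import deque
from typing import Callable


class WebhookReceiver:
    """Hypothetical accept path: enqueue durably, then acknowledge."""

    def __init__(self, enqueue: Callable[[dict], None]) -> None:
        self.enqueue = enqueue        # stand-in for a durable queue write; may raise
        self.seen: set[str] = set()   # stand-in for the dedup store

    def handle(self, event: dict) -> int:
        """Return the HTTP status to send back to the upstream provider."""
        event_id = event.get("id")
        if not event_id:
            return 400  # malformed: retrying will not help
        if event_id in self.seen:
            return 200  # duplicate retry: acknowledge so the upstream stops resending
        try:
            self.enqueue(event)  # the only write on the hot path; consumers are not called here
        except Exception:
            return 503  # not persisted: let the upstream retry later
        self.seen.add(event_id)  # mark only after the event is safely queued
        return 200


if __name__ == "__main__":
    queue: deque = deque()
    rx = WebhookReceiver(queue.append)
    print(rx.handle({"id": "evt_1"}))  # 200, queued
    print(rx.handle({"id": "evt_1"}))  # 200, duplicate not re-queued
    print(len(queue))                  # 1
```

Because consumers read from the queue on their own schedule, a burst only grows the queue and a consumer outage only delays delivery; neither pushes 5xx responses back to the upstream.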

Return a Chinilla CanvasState. A durable queue between the accept path and the consumers is almost certainly required, along with dedup storage.

Constraints

Max components: 11
Required behaviors: queue, storage, retry
Monthly budget: $600
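Before submitting, a design can be sanity-checked against these three constraints. The sketch below uses a hypothetical `Component` record with illustrative monthly costs and behavior tags. It is not the real CanvasState schema and does not reproduce how chini-bench prices components.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """Hypothetical component record; not the real CanvasState schema."""
    name: str
    monthly_cost: float  # USD per month (illustrative placeholder values)
    behaviors: set[str] = field(default_factory=set)  # e.g. {"queue", "retry"}


def check_constraints(
    components: list[Component],
    max_components: int = 11,
    required: tuple[str, ...] = ("queue", "storage", "retry"),
    budget: float = 600.0,
) -> list[str]:
    """Return a list of constraint violations; an empty list means the design fits."""
    problems = []
    if len(components) > max_components:
        problems.append(f"{len(components)} components exceeds the limit of {max_components}")
    present = set().union(*(c.behaviors for c in components))
    missing = [b for b in required if b not in present]
    if missing:
        problems.append("missing required behaviors: " + ", ".join(missing))
    total = sum(c.monthly_cost for c in components)
    if total > budget:
        problems.append(f"monthly cost ${total:.0f} exceeds budget ${budget:.0f}")
    return problems


if __name__ == "__main__":
    design = [
        Component("receiver", 50.0),
        Component("dedup-store", 100.0, {"storage"}),
        Component("event-queue", 80.0, {"queue", "retry"}),
        Component("ledger-consumer", 60.0),
    ]
    print(check_constraints(design))  # []
```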

Stress scenarios

Baseline events (baseline)

Steady webhook volume.

3x sale burst (spike)

Black Friday sale triples webhook volume.

Consumer outage (outage)

A downstream consumer is offline. The receive path must still accept events.
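In the outage scenario, retries belong on the consumer side, off the receive path: the receiver keeps returning 2xx while a worker retries delivery to the offline consumer. The sketch below assumes exponential backoff with full jitter and a dead-letter list standing in for a dead-letter queue. The function name, parameters, and defaults are hypothetical, and `sleep` is injectable so the logic can be tested without waiting.

```python
import random
import time
from typing import Callable


def deliver_with_retry(
    event: dict,
    consume: Callable[[dict], None],
    dead_letters: list,
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try to hand one queued event to a consumer; park it if every attempt fails."""
    for attempt in range(max_attempts):
        try:
            consume(event)
            return True
        except Exception:
            if attempt == max_attempts - 1:
                break  # out of attempts; do not sleep again
            # Exponential backoff capped at max_delay, with full jitter
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
    dead_letters.append(event)  # keep the event for later replay instead of dropping it
    return False
```

Many queue systems provide redelivery on their own (for example through visibility timeouts), in which case the worker only needs to leave failed messages unacknowledged.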

Pass criteria (overall)

Min stability score: 70
Max drop rate: 4.0%
Min delivery rate: 92.0%
Max errors: 4
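The four thresholds can be expressed as a simple check. This is a sketch under stated assumptions: the `RunMetrics` field names are hypothetical, "min" and "max" are treated as inclusive bounds, and it does not reproduce chini-bench's actual scorer, which may evaluate scenarios individually or apply further checks.

```python
from dataclasses import dataclass


@dataclass
class RunMetrics:
    """Hypothetical summary of one run; field names are illustrative."""
    stability: float          # stability score, 0-100
    drop_rate_pct: float      # percent of events dropped
    delivery_rate_pct: float  # percent of events delivered
    errors: int


def check_pass(m: RunMetrics) -> list[str]:
    """Compare a run against the published thresholds; an empty list means pass."""
    failures = []
    if m.stability < 70:
        failures.append(f"stability {m.stability} < 70")
    if m.drop_rate_pct > 4.0:
        failures.append(f"drop rate {m.drop_rate_pct}% > 4.0%")
    if m.delivery_rate_pct < 92.0:
        failures.append(f"delivery {m.delivery_rate_pct}% < 92.0%")
    if m.errors > 4:
        failures.append(f"errors {m.errors} > 4")
    return failures
```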

Submit your run

Submissions go through the chini-bench CLI. It calls your model with your key, scores the result locally, and posts to the leaderboard. Nothing leaves your machine except the canvas it produces.

End-to-end:

pip install git+https://github.com/collapseindex/chini-bench-cli.git
export OPENROUTER_API_KEY=...

chini-bench run chini-007-payment-webhook \
  --provider openrouter --model google/gemini-2.0-flash-001 \
  --as alice

Or inspect the prompt first:

chini-bench prompt chini-007-payment-webhook

Providers: openai · anthropic · google · openrouter · ollama

Leaderboard

Rank	Submitter	Model	Score	Stability	Delivery	Design	Pass
#1	rl_v06_run1	rl_policy custom single-shot	94	84.0	100.0	100.0	✓
#2	rl_v06_run1	rl_policy custom single-shot	94	84.0	100.0	100.0	✓
#3	rl_v06_run1	rl_policy custom single-shot	94	85.0	100.0	100.0	✓
#4	rl_v06_run1	rl_policy custom single-shot	94	84.0	100.0	100.0	✓
#5	rl_v06_run1	rl_policy custom single-shot	94	84.0	100.0	100.0	✓
#6	rl_v06_run2	rl_policy custom single-shot	94	85.0	100.0	100.0	✓
#7	rl_v06_run2	rl_policy custom single-shot	94	85.0	100.0	100.0	✓
#8	rl_v06_run1	rl_policy custom single-shot	93	83.0	100.0	100.0	✓
#9	rl_v06_run2	rl_policy custom single-shot	93	83.0	100.0	100.0	✓
#10	rl_v06_run2	rl_policy custom single-shot	93	82.0	100.0	85.0	✓
#11	rl_v06_run2	rl_policy custom single-shot	93	83.0	100.0	100.0	✓
#12	rl_v06_run2	rl_policy custom single-shot	93	83.0	100.0	85.0	✓
#13	rl_v06_run1	rl_policy custom single-shot	92	81.0	100.0	85.0	✓
#14	rl_v06_run1	rl_policy custom single-shot	92	85.0	100.0	75.0	✗
#15	rl_v06_run1	rl_policy custom single-shot	91	83.0	100.0	100.0	✗
#16	rl_v06_run2	rl_policy custom single-shot	91	83.0	100.0	75.0	✗
#17	rl_v06_run2	rl_policy custom single-shot	91	83.0	100.0	85.0	✗
#18	rl_v06_run1	rl_policy custom single-shot	90	82.0	92.0	85.0	✗
#19	rl_v06_run2	rl_policy custom single-shot	90	82.0	92.0	85.0	✗
#20	rl_v06_run2	rl_policy custom single-shot	90	81.0	100.0	75.0	✗
#21	rl_v06_run2	rl_policy custom single-shot	90	82.0	92.0	85.0	✗
#22	rl_v06_run2	rl_policy custom single-shot	90	82.0	100.0	60.0	✗
#23	rl_v06_run1	rl_policy custom single-shot	89	85.0	100.0	75.0	✗
#24	alex	openai/gpt-5.4 default single-shot	88	70.0	100.0	100.0	✓
#25	rl_v06_run2	rl_policy custom single-shot	88	70.0	100.0	100.0	✓
#26	rl_v06_run2	rl_policy custom single-shot	88	83.0	92.0	75.0	✗
#27	rl_v06_run2	rl_policy custom single-shot	88	82.0	100.0	50.0	✗
#28	rl_v06_run2	rl_policy custom single-shot	88	82.0	89.0	100.0	✗
#29	rl_v06_run2	rl_policy custom single-shot	88	82.0	87.0	85.0	✗
#30	rl_v06_run1	rl_policy custom single-shot	87	81.0	87.0	85.0	✗
#31	rl_v06_run2	rl_policy custom single-shot	87	82.0	92.0	60.0	✗
#32	rl_v06_run2	rl_policy custom single-shot	86	82.0	83.0	85.0	✗
#33	rl_v06_run2	rl_policy custom single-shot	86	81.0	83.0	85.0	✗
#34	rl_v06_run2	rl_policy custom single-shot	86	81.0	89.0	60.0	✗
#35	rl_v06_run2	rl_policy custom single-shot	86	82.0	83.0	85.0	✗
#36	rl_v06_run2	rl_policy custom single-shot	86	82.0	89.0	60.0	✗
#37	rl_v06_run2	rl_policy custom single-shot	86	82.0	83.0	85.0	✗
#38	rl_v06_run2	rl_policy custom single-shot	85	81.0	87.0	60.0	✗
#39	rl_v06_run2	rl_policy custom single-shot	85	82.0	80.0	85.0	✗
#40	rl_v06_run2	rl_policy custom single-shot	84	82.0	83.0	85.0	✗
#41	rl_v06_run2	rl_policy custom single-shot	84	83.0	83.0	85.0	✗
#42	rl_v06_run2	rl_policy custom single-shot	84	83.0	83.0	60.0	✗
#43	rl_v06_run2	rl_policy custom single-shot	84	83.0	83.0	60.0	✗
#44	rl_v06_run2	rl_policy custom single-shot	84	78.0	83.0	85.0	✗
#45	rl_v06_run2	rl_policy custom single-shot	82	88.0	67.0	85.0	✗
#46	alex	anthropic/claude-sonnet-4.6 default single-shot	80	84.0	67.0	100.0	✗
#47	alex	google/gemini-3.1-pro-preview default single-shot	80	84.0	67.0	100.0	✗
#48	rl_v06_run2	rl_policy custom single-shot	80	80.0	83.0	60.0	✗
#49	alex	google/gemini-3.1-pro-preview default reflexion	79	81.0	67.0	100.0	✗
#50	rl_v06_run1	rl_policy custom single-shot	79	81.0	67.0	85.0	✗
#51	rl_v06_run2	rl_policy custom single-shot	79	80.0	67.0	85.0	✗
#52	rl_v06_run2	rl_policy custom single-shot	79	81.0	78.0	60.0	✗
#53	rl_v06_run1	rl_policy custom single-shot	78	78.0	67.0	100.0	✗
#54	rl_v06_run2	rl_policy custom single-shot	78	83.0	67.0	75.0	✗
#55	rl_v06_run1	rl_policy custom single-shot	77	82.0	67.0	60.0	✗
#56	rl_v06_run2	rl_policy custom single-shot	77	81.0	67.0	60.0	✗
#57	rl_v06_run2	rl_policy custom single-shot	76	74.0	67.0	85.0	✗
#58	rl_v06_run2	rl_policy custom single-shot	76	80.0	67.0	75.0	✗
#59	alex	anthropic/claude-sonnet-4.6 default reflexion	75	64.0	83.0	100.0	✗
#60	rl_v06_run1	rl_policy custom single-shot	75	84.0	67.0	50.0	✗
#61	alex	x-ai/grok-4.20 default single-shot	73	65.0	67.0	100.0	✗
#62	alex	x-ai/grok-4.20 default reflexion	70	59.0	67.0	100.0	✗
#63	alex	openai/gpt-5.4 default reflexion	66	34.0	83.0	100.0	✗

Per-scenario breakdown of the top run

Scenario	Health	Drop rate	Delivered	Pass
baseline	85.0	0.0%	144	✓
sale-burst	83.0	0.0%	432	✓
consumer-down	85.0	0.0%	30	✓
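The sale-burst row delivers 432 events against 144 at baseline, exactly 3 × 144, with a 0.0% drop rate, so the queue must have absorbed whatever arrived faster than consumers could drain it. A rough fluid approximation estimates that backlog: with baseline arrival rate λ, burst multiplier k, consumer capacity μ, and burst length T, the peak backlog is B = max(0, (kλ − μ)·T), and after the burst it drains in B / (μ − λ), which is finite only if μ > λ. The rates below are illustrative assumptions, not values from the benchmark.

```python
def burst_backlog(base_rate: float, multiplier: float, capacity: float,
                  burst_seconds: float) -> tuple[float, float]:
    """Fluid approximation: return (peak backlog in events, seconds to drain after the burst)."""
    burst_rate = base_rate * multiplier
    # Events pile up only while arrivals exceed consumer capacity
    peak = max(0.0, float(burst_rate - capacity) * burst_seconds)
    if peak == 0.0:
        return 0.0, 0.0
    if capacity <= base_rate:
        return peak, float("inf")  # consumers never catch up, even at baseline load
    return peak, peak / (capacity - base_rate)


if __name__ == "__main__":
    # Illustrative: 10 events/s baseline, 3x burst, consumers handle 20 events/s, 60 s burst
    print(burst_backlog(10, 3, 20, 60))  # (600.0, 60.0)
```

Sizing the queue for the peak backlog, and consumers for capacity above the baseline rate, is what lets a burst finish with zero drops.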
