chini-002-checkout

E-commerce Checkout with Idempotent Payments

Process checkouts without ever charging a customer twice. Survive a downstream payment-API outage.

Source: Classic system-design interview corpus (Stripe / payment gateway design)

Prompt

Design a checkout system that takes a cart and processes payment.

Functional:
- POST /checkout accepts an order, returns an order id.
- Payment is processed via an external payment provider that may time out.
- Every checkout must be idempotent: a retried checkout must NOT charge the user twice.
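
One common way to meet the idempotency requirement is to have the client send an idempotency key with each checkout and to store the first result under that key. The sketch below is a minimal in-memory illustration, not the benchmark's reference design; `CheckoutService`, `orders_by_key`, and `charges` are hypothetical names, and a real system would keep the key mapping in a durable store.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class CheckoutService:
    """Toy in-memory checkout that deduplicates on an idempotency key."""

    # idempotency key -> order id returned for the first request with that key
    orders_by_key: dict = field(default_factory=dict)
    # stand-in for the payment provider's ledger (hypothetical)
    charges: list = field(default_factory=list)

    def checkout(self, idempotency_key: str, cart_total_cents: int) -> str:
        # A retry with a known key returns the stored order and charges nothing.
        if idempotency_key in self.orders_by_key:
            return self.orders_by_key[idempotency_key]
        order_id = f"order-{uuid.uuid4().hex[:8]}"
        self.charges.append((order_id, cart_total_cents))  # the one and only charge
        self.orders_by_key[idempotency_key] = order_id
        return order_id


service = CheckoutService()
first = service.checkout("key-123", 4999)
retry = service.checkout("key-123", 4999)  # same key: same order, no second charge
```

Here `first` and `retry` hold the same order id, and `service.charges` contains a single entry.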

Non-functional:
- Survive a 5x peak (e.g., flash sale) without dropping more than 5% of orders.
- Survive a slow payment provider (added latency) without blocking the whole queue.
- Survive a brief payment-provider outage by queueing or circuit-breaking.
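
The circuit-breaking option in the last requirement can be sketched as a small state machine: after a run of consecutive failures the breaker opens and rejects calls, then allows a trial call once a cooldown has passed. This is an illustrative sketch only; the class name, the thresholds, and the injectable `clock` parameter are assumptions, not part of the Chinilla component model.

```python
import time


class CircuitBreaker:
    """Opens after N consecutive failures; allows a trial call after a cooldown."""

    def __init__(self, failure_threshold=3, reset_after_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock  # injectable so the example can run without sleeping
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        # Half-open: permit a trial call once the cooldown has elapsed.
        return self.clock() - self.opened_at >= self.reset_after_s

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()  # open, or re-open after a failed trial


now = [0.0]  # fake clock for the example
breaker = CircuitBreaker(failure_threshold=2, reset_after_s=10.0, clock=lambda: now[0])
breaker.record_failure()
breaker.record_failure()
blocked = not breaker.allow()   # circuit is open, calls are rejected
now[0] = 10.0
trial_allowed = breaker.allow() # cooldown elapsed, one trial call may go through
```

Callers would check `allow()` before contacting the provider and report the outcome with `record_success()` or `record_failure()`; rejected orders can stay in the queue rather than being dropped.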

Return a Chinilla CanvasState. Include a queue, a retry behavior, and a circuit-breaker on the payment-provider edge.

Constraints

Max components: 12
Required behaviors: queue, retry, circuitbreaker
Monthly budget: $800

Stress scenarios

- Baseline orders (baseline): Steady checkout traffic.
- 5x flash sale (spike): Order volume spikes 5x for the duration.
- Slow payment provider (latency): Payment provider response time grows by 1500ms.
- Payment provider outage (outage): Payment provider is down. System must queue or fail safely (no double-charge).
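
To make "queue or fail safely" concrete, the sketch below drains a queue of pending payments against a provider that is down for its first few calls. Failed attempts are re-queued up to a limit, and every attempt reuses the same key so a retry cannot charge twice. `FlakyProvider`, `drain`, and the attempt limit are hypothetical; a real worker would also wait between attempts.

```python
from collections import deque


class FlakyProvider:
    """Hypothetical provider: down for the first `down_calls` calls, dedups by key."""

    def __init__(self, down_calls: int):
        self.down_calls = down_calls
        self.charged = {}  # key -> amount actually charged

    def charge(self, key: str, amount_cents: int) -> None:
        if self.down_calls > 0:
            self.down_calls -= 1
            raise TimeoutError("provider unavailable")
        self.charged.setdefault(key, amount_cents)  # same key never charges twice


def drain(queue: deque, provider: FlakyProvider, max_attempts: int = 3) -> list:
    """Process queued payments; re-queue on failure, give up after max_attempts."""
    dead_letter = []
    while queue:
        key, amount, attempts = queue.popleft()
        try:
            provider.charge(key, amount)
        except TimeoutError:
            if attempts + 1 < max_attempts:
                queue.append((key, amount, attempts + 1))  # retry later, same key
            else:
                dead_letter.append(key)  # fail safely: no charge was made
    return dead_letter


provider = FlakyProvider(down_calls=3)
pending = deque([("order-a", 1000, 0), ("order-b", 2500, 0)])
failed = drain(pending, provider)
```

With three failed calls spread across two orders, both orders are eventually charged once and `failed` is empty. A longer outage would move orders to `dead_letter` instead of charging them.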

Pass criteria (overall)

Min stability score: 75
Max drop rate: 3.0%
Min delivery rate: 95.0%
Max errors: 3
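
Read as a predicate, the four thresholds combine as follows. This is a sketch of the criteria as listed, assuming the bounds are inclusive and that all four must hold; it is not the chini-bench scorer, and `RunMetrics` with its field names is a hypothetical container. The sample values are illustrative, not taken from any run.

```python
from dataclasses import dataclass


@dataclass
class RunMetrics:
    stability: float          # stability score, 0-100
    drop_rate_pct: float      # percent of orders dropped
    delivery_rate_pct: float  # percent of orders delivered
    errors: int               # error count


def passes(m: RunMetrics) -> bool:
    """True only if every overall pass criterion holds (bounds assumed inclusive)."""
    return (
        m.stability >= 75
        and m.drop_rate_pct <= 3.0
        and m.delivery_rate_pct >= 95.0
        and m.errors <= 3
    )


ok = passes(RunMetrics(stability=80.0, drop_rate_pct=1.0, delivery_rate_pct=97.0, errors=0))
too_unstable = passes(RunMetrics(stability=69.0, drop_rate_pct=0.0, delivery_rate_pct=100.0, errors=0))
```

The second example shows that a run can deliver every order and still fail on stability alone.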

Submit your run

Submissions go through the chini-bench CLI. It calls your model with your key, scores the result locally, and posts to the leaderboard. Apart from the call to your model provider, nothing leaves your machine except the canvas it produces.

End-to-end:

pip install git+https://github.com/collapseindex/chini-bench-cli.git
export OPENROUTER_API_KEY=...

chini-bench run chini-002-checkout \
  --provider openrouter --model google/gemini-2.0-flash-001 \
  --as alice

Or inspect the prompt first:

chini-bench prompt chini-002-checkout

Providers: openai · anthropic · google · openrouter · ollama

Leaderboard

Rank	Submitter	Model	Score	Stability	Delivery	Design	Pass
#1	alex	anthropic/claude-sonnet-4.6 default single-shot	94	87.0	100.0	100.0	✓
#2	alex	openai/gpt-5.4 default single-shot	89	75.0	100.0	100.0	✓
#3	alex	x-ai/grok-4.20 default single-shot	86	69.0	100.0	100.0	✗
#4	alex	google/gemini-3.1-pro-preview default reflexion	84	65.0	100.0	100.0	✗
#5	rl_v06_run1	rl_policy custom single-shot	82	61.0	100.0	100.0	✗
#6	rl_v06_run2	rl_policy custom single-shot	82	70.0	88.0	75.0	✗
#7	alex	google/gemini-3.1-pro-preview default single-shot	80	56.0	100.0	100.0	✗
#8	rl_v06_run2	rl_policy custom single-shot	79	54.0	100.0	75.0	✗
#9	rl_v06_run2	rl_policy custom single-shot	79	72.0	77.0	75.0	✗
#10	rl_v06_run2	rl_policy custom single-shot	78	71.0	73.0	100.0	✗
#11	rl_v06_run2	rl_policy custom single-shot	77	75.0	76.0	75.0	✗
#12	alex	anthropic/claude-sonnet-4.6 default reflexion	76	61.0	100.0	100.0	✗
#13	rl_v06_run1	rl_policy custom single-shot	75	82.0	83.0	25.0	✗
#14	rl_v06_run2	rl_policy custom single-shot	75	81.0	75.0	60.0	✗
#15	rl_v06_run2	rl_policy custom single-shot	74	70.0	63.0	100.0	✗
#16	rl_v06_run1	rl_policy custom single-shot	73	74.0	66.0	75.0	✗
#17	rl_smoke8	rl_policy custom single-shot	72	82.0	75.0	25.0	✗
#18	rl_v06_run1	rl_policy custom single-shot	72	82.0	75.0	25.0	✗
#19	rl_v06_run1	rl_policy custom single-shot	72	83.0	75.0	25.0	✗
#20	rl_v06_run1	rl_policy custom single-shot	72	82.0	75.0	25.0	✗
#21	rl_v06_run1	rl_policy custom single-shot	72	82.0	75.0	25.0	✗
#22	rl_v06_run1	rl_policy custom single-shot	72	82.0	75.0	25.0	✗
#23	rl_v06_run1	rl_policy custom single-shot	72	82.0	75.0	25.0	✗
#24	rl_v06_run2	rl_policy custom single-shot	72	82.0	75.0	25.0	✗
#25	rl_v06_run2	rl_policy custom single-shot	72	83.0	75.0	25.0	✗
#26	rl_smoke8	rl_policy custom single-shot	71	81.0	75.0	25.0	✗
#27	rl_v06_run1	rl_policy custom single-shot	71	81.0	75.0	25.0	✗
#28	rl_v06_run1	rl_policy custom single-shot	71	81.0	75.0	25.0	✗
#29	rl_smoke8	rl_policy custom single-shot	70	66.0	58.0	100.0	✗
#30	rl_v06_run2	rl_policy custom single-shot	70	58.0	67.0	100.0	✗
#31	rl_v06_run2	rl_policy custom single-shot	69	65.0	55.0	60.0	✗
#32	rl_v06_run2	rl_policy custom single-shot	69	69.0	61.0	50.0	✗
#33	rl_v06_run2	rl_policy custom single-shot	69	66.0	56.0	75.0	✗
#34	rl_v06_run2	rl_policy custom single-shot	68	64.0	56.0	75.0	✗
#35	rl_v06_run1	rl_policy custom single-shot	67	64.0	53.0	75.0	✗
#36	rl_v06_run1	rl_policy custom single-shot	67	63.0	52.0	100.0	✗
#37	rl_v06_run2	rl_policy custom single-shot	67	56.0	63.0	75.0	✗
#38	alex	x-ai/grok-4.20 default reflexion	66	63.0	50.0	100.0	✗
#39	rl_v06_run1	rl_policy custom single-shot	66	62.0	51.0	100.0	✗
#40	rl_v06_run2	rl_policy custom single-shot	66	62.0	52.0	100.0	✗
#41	rl_v06_run2	rl_policy custom single-shot	66	62.0	51.0	100.0	✗
#42	rl_v06_run2	rl_policy custom single-shot	65	60.0	51.0	100.0	✗
#43	rl_v06_run2	rl_policy custom single-shot	65	62.0	50.0	75.0	✗
#44	rl_v06_run2	rl_policy custom single-shot	65	60.0	51.0	100.0	✗
#45	rl_v06_run2	rl_policy custom single-shot	65	61.0	49.0	75.0	✗
#46	rl_v06_run2	rl_policy custom single-shot	65	61.0	49.0	75.0	✗
#47	rl_v06_run2	rl_policy custom single-shot	65	60.0	51.0	75.0	✗
#48	rl_v06_run2	rl_policy custom single-shot	65	60.0	51.0	75.0	✗
#49	rl_v06_run2	rl_policy custom single-shot	65	60.0	51.0	75.0	✗
#50	rl_v06_run2	rl_policy custom single-shot	64	58.0	51.0	100.0	✗
#51	rl_v06_run2	rl_policy custom single-shot	63	60.0	47.0	85.0	✗
#52	rl_v06_run2	rl_policy custom single-shot	63	59.0	46.0	100.0	✗
#53	rl_v06_run1	rl_policy custom single-shot	62	62.0	51.0	50.0	✗
#54	rl_v06_run1	rl_policy custom single-shot	62	72.0	37.0	85.0	✗
#55	rl_v06_run2	rl_policy custom single-shot	62	62.0	51.0	60.0	✗
#56	rl_v06_run2	rl_policy custom single-shot	61	60.0	51.0	75.0	✗
#57	rl_v06_run1	rl_policy custom single-shot	60	54.0	45.0	100.0	✗
#58	rl_v06_run2	rl_policy custom single-shot	60	54.0	55.0	75.0	✗
#59	rl_v06_run2	rl_policy custom single-shot	59	55.0	42.0	100.0	✗
#60	rl_v06_run2	rl_policy custom single-shot	59	54.0	41.0	75.0	✗
#61	rl_v06_run2	rl_policy custom single-shot	59	55.0	40.0	85.0	✗
#62	rl_v06_run2	rl_policy custom single-shot	59	59.0	46.0	75.0	✗
#63	rl_v06_run1	rl_policy custom single-shot	58	52.0	41.0	75.0	✗
#64	rl_v06_run2	rl_policy custom single-shot	58	58.0	44.0	60.0	✗
#65	rl_v06_run2	rl_policy custom single-shot	57	52.0	40.0	100.0	✗
#66	rl_v06_run2	rl_policy custom single-shot	56	51.0	36.0	75.0	✗
#67	rl_v06_run2	rl_policy custom single-shot	55	53.0	42.0	50.0	✗
#68	rl_v06_run1	rl_policy custom single-shot	54	54.0	39.0	50.0	✗
#69	rl_v06_run1	rl_policy custom single-shot	54	50.0	43.0	75.0	✗
#70	rl_v06_run2	rl_policy custom single-shot	53	37.0	47.0	75.0	✗
#71	alex	openai/gpt-5.4 default reflexion	36	38.0	0.0	75.0	✗

Per-scenario breakdown of the top run

Scenario	Health	Drop rate	Delivered	Pass
baseline	87.0	0.0%	60	✓
flash-sale	85.0	0.0%	300	✓
payment-slow	87.0	0.0%	15	✓
payment-outage	88.0	0.0%	12	✓
