chini-028-credential-stuffing

Credential Stuffing Defense

An attacker replays 100k stolen credentials against your login endpoint. Block the attack without locking out your 50k real users.

Source: Application security, OWASP authentication threat model

Prompt

Design the authentication path for a consumer web service under credential-stuffing attack.

Functional:
- Users submit email + password to /login. Backend checks against hashed store, returns session token on success.
- Attacker replays a leaked credential dump (email/password pairs) at high volume from a botnet of distributed IPs.
- Some attacker pairs will succeed (real users reuse passwords). Most will fail.
- Real users continue to log in throughout the attack.

Non-functional:
- Block at least 70% of attack volume before it reaches the password-check stage.
- Real-user login success rate must stay above 80% during the attack (no global lockout).
- Defenses available: per-IP rate limiting, per-account rate limiting, CAPTCHA challenge on suspicious patterns, device fingerprinting, breached-password-list rejection, MFA enrollment ramp.
- Cannot rely on a single IP-based block: attacker is distributed.
- Cannot rely on a single account-based block: lockouts would hit real users whose reused passwords appear in the dump.
- Layered defense required: at least two independent gating mechanisms.
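One way to picture the layered requirement is a pre-check gate that runs before the password-check stage. The sketch below is illustrative only and is not the CanvasState format the benchmark expects. The names `SlidingWindowLimiter` and `gate_login`, the limits, and the choice to answer an account-level trip with a challenge instead of a lockout are all assumptions made for the example.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` accepted events per key within `window` seconds."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self._events = defaultdict(deque)  # key -> timestamps of accepted events

    def allow(self, key, now):
        q = self._events[key]
        # Evict timestamps that have aged out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False  # rejected attempts are not recorded in this sketch
        q.append(now)
        return True


def gate_login(ip, email, known_device, ip_limiter, account_limiter, now):
    """Return 'block', 'challenge', or 'allow' before the password check runs.

    Gate 1 (per IP): hard block, which is cheap and hurts noisy bots most.
    Gate 2 (per account): soft response, so the real owner is challenged
    rather than locked out when an attacker targets their email.
    """
    if not ip_limiter.allow(ip, now):
        return "block"
    if not account_limiter.allow(email, now):
        # Hypothetical policy: a previously seen device skips the challenge.
        return "allow" if known_device else "challenge"
    return "allow"


if __name__ == "__main__":
    ip_limiter = SlidingWindowLimiter(limit=2, window=60.0)
    account_limiter = SlidingWindowLimiter(limit=1, window=60.0)
    print(gate_login("198.51.100.1", "a@example.com", False, ip_limiter, account_limiter, 0.0))
    print(gate_login("198.51.100.2", "a@example.com", False, ip_limiter, account_limiter, 1.0))
```

The two gates key on different dimensions (source IP and target account), so a distributed attacker who stays under the per-IP limit can still trip the per-account gate, and the soft response on that gate avoids the global-lockout failure mode.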

Return a CanvasState modeling the login path and layered defenses.

Constraints

Max components: 12
Required behaviors: ratelimit, filter, circuitbreaker
Monthly budget: $4500

Stress scenarios

Normal login traffic

baseline

Standard daytime login volume, no attack.

Credential stuffing flood

adversarial

Distributed botnet replays leaked credentials. Block attack, preserve real users.

Low-and-slow attack

adversarial

Attacker spreads attempts across many IPs at low rate to evade per-IP limits.
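Because every IP stays under its own limit in this scenario, a per-IP counter never fires. A signal aggregated across all sources can still move. The following is a minimal sketch of a failure-ratio circuit breaker over a sliding window; the class name `FailureRatioBreaker`, the thresholds, and the idea of switching to a stricter mode when it trips are assumptions for illustration, not part of the benchmark.

```python
from collections import deque


class FailureRatioBreaker:
    """Trip when the global share of failed logins in the window is too high."""

    def __init__(self, window, min_samples, trip_ratio):
        self.window = window            # seconds of history to keep
        self.min_samples = min_samples  # avoid tripping on a handful of attempts
        self.trip_ratio = trip_ratio    # e.g. 0.5 means half of attempts failing
        self._outcomes = deque()        # (timestamp, success) across all IPs

    def _evict(self, now):
        while self._outcomes and now - self._outcomes[0][0] >= self.window:
            self._outcomes.popleft()

    def record(self, now, success):
        self._evict(now)
        self._outcomes.append((now, success))

    def tripped(self, now):
        self._evict(now)
        total = len(self._outcomes)
        if total < self.min_samples:
            return False
        failures = sum(1 for _, ok in self._outcomes if not ok)
        return failures / total >= self.trip_ratio


if __name__ == "__main__":
    breaker = FailureRatioBreaker(window=60.0, min_samples=10, trip_ratio=0.5)
    for t in range(10):
        breaker.record(float(t), success=True)       # normal traffic
    for t in range(10, 20):
        breaker.record(float(t), success=False)      # one failure per bot IP
    print(breaker.tripped(20.0))
```

When the breaker trips, one plausible response is to require a challenge for unknown devices until the ratio recovers, which keeps known-device logins flowing while the attack is active.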

Pass criteria (overall)

Min stability score: 60
Max drop rate: 50.0%
Min delivery rate: 40.0%
Max errors: 8
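The four thresholds can be read as simple comparisons. The sketch below assumes all four must hold for a run to pass and that drop and delivery rates are percentages; how chini-bench actually aggregates them across scenarios is not stated here, so treat the types `PassCriteria` and `RunMetrics` and the function `failed_checks` as hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PassCriteria:
    min_stability: float = 60.0
    max_drop_rate: float = 50.0      # percent
    min_delivery_rate: float = 40.0  # percent
    max_errors: int = 8


@dataclass(frozen=True)
class RunMetrics:
    stability: float
    drop_rate: float      # percent
    delivery_rate: float  # percent
    errors: int


def failed_checks(metrics, criteria=PassCriteria()):
    """Return the names of thresholds the run misses (empty list means pass)."""
    failures = []
    if metrics.stability < criteria.min_stability:
        failures.append("stability")
    if metrics.drop_rate > criteria.max_drop_rate:
        failures.append("drop_rate")
    if metrics.delivery_rate < criteria.min_delivery_rate:
        failures.append("delivery_rate")
    if metrics.errors > criteria.max_errors:
        failures.append("errors")
    return failures


if __name__ == "__main__":
    # Made-up numbers, not taken from any leaderboard run.
    print(failed_checks(RunMetrics(stability=72.0, drop_rate=12.5, delivery_rate=87.5, errors=3)))
    print(failed_checks(RunMetrics(stability=41.0, drop_rate=62.0, delivery_rate=38.0, errors=9)))
```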

Submit your run

Submissions go through the chini-bench CLI. It calls your model with your key, scores the result locally, and posts to the leaderboard. Nothing leaves your machine except the canvas it produces.

End-to-end:

pip install git+https://github.com/collapseindex/chini-bench-cli.git
export OPENROUTER_API_KEY=...

chini-bench run chini-028-credential-stuffing \
  --provider openrouter --model google/gemini-2.0-flash-001 \
  --as alice

Or inspect the prompt first:

chini-bench prompt chini-028-credential-stuffing

Providers: openai · anthropic · google · openrouter · ollama

Leaderboard

Rank	Submitter	Model	Score	Stability	Delivery	Design	Pass
#1	alex	openai/gpt-5.4 default reflexion	85	23.0	100.0	100.0	✗
#2	alex	x-ai/grok-4.20 default reflexion	83	15.0	100.0	100.0	✗
#3	alex	google/gemini-3.1-pro-preview default reflexion	79	63.0	100.0	75.0	✗
#4	alex	x-ai/grok-4.20 default single-shot	77	47.0	96.0	75.0	✗
#5	alex	openai/gpt-5.4 default single-shot	74	20.0	100.0	75.0	✗
#6	alex	anthropic/claude-sonnet-4.6 default reflexion	74	0.0	100.0	100.0	✗
#7	alex	google/gemini-3.1-pro-preview default single-shot	72	12.0	100.0	75.0	✗
#8	alex	anthropic/claude-sonnet-4.6 default single-shot	65	43.0	100.0	50.0	✗

Per-scenario breakdown of the top run

Scenario	Health	Drop rate	Delivered	Pass
baseline	41.0	10.3%	252	✗
stuffing-attack	13.0	100.0%	835	✗
low-and-slow	15.0	100.0%	486	✗
