chini-028-credential-stuffing

Credential Stuffing Defense

100k stolen credentials replayed against your login. Block the attack without locking out 50k real users.

Source: Application security, OWASP authentication threat model

Prompt

Design the authentication path for a consumer web service under credential-stuffing attack.

Functional:
- Users submit email and password to /login. The backend checks them against the hashed password store and returns a session token on success.
- An attacker replays a leaked credential dump (email/password pairs) at high volume from a botnet spread across many IPs.
- Some attacker pairs will succeed, because real users reuse passwords; most will fail.
- Real users continue to log in throughout the attack.
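A minimal sketch of the baseline password check described above. The prompt says only "hashed store", so the use of scrypt and its cost parameters are assumptions. `USERS`, `SESSIONS`, `register`, and `login` are hypothetical names and are not part of the benchmark.

```python
from __future__ import annotations

import hashlib
import hmac
import secrets

# Hypothetical in-memory stores. The prompt only says "hashed store";
# scrypt and its cost parameters below are assumptions.
USERS: dict[str, tuple[bytes, bytes]] = {}  # email -> (salt, hash)
SESSIONS: dict[str, str] = {}               # session token -> email


def _hash(password: str, salt: bytes) -> bytes:
    # Deliberately slow: this is the stage the prompt wants most
    # attack traffic to never reach.
    return hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)


def register(email: str, password: str) -> None:
    salt = secrets.token_bytes(16)
    USERS[email] = (salt, _hash(password, salt))


def login(email: str, password: str) -> str | None:
    """Return a fresh session token on success, None on failure."""
    record = USERS.get(email)
    if record is None:
        # Returning early is faster than a real check and so leaks whether
        # the email exists; production code would typically hash anyway.
        return None
    salt, stored = record
    # Constant-time comparison of the derived hash.
    if not hmac.compare_digest(_hash(password, salt), stored):
        return None
    token = secrets.token_urlsafe(32)
    SESSIONS[token] = email
    return token
```

Every request that reaches `login` pays for one scrypt derivation. That cost is why the non-functional requirements push most attack traffic to be stopped earlier.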

Non-functional:
- Block at least 70% of attack volume before it reaches the password-check stage.
- Real-user login success rate must stay above 80% during the attack (no global lockout).
- Defenses available: per-IP rate limiting, per-account rate limiting, captcha challenges on suspicious patterns, device fingerprinting, breached-password list rejection, MFA enrollment ramp.
- Cannot rely on a single IP-based block: the attacker is distributed.
- Cannot rely on a single account-based block: it would lock out real users whose reused passwords appear in the dump.
- Layered defense required: at least two independent gating mechanisms.
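One way to meet the "at least two independent gating mechanisms" requirement is to run cheap checks in front of the password hash. The sketch below is an assumption, not the benchmark's reference design. It combines a sliding-window limiter keyed per IP, a breached-password filter, and a second limiter keyed per account that escalates to a captcha instead of locking the account. `LayeredGate`, its limits, and the plaintext breach set are all hypothetical.

```python
from __future__ import annotations

import time
from collections import defaultdict, deque
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"      # proceed to the (expensive) password check
    CAPTCHA = "captcha"  # challenge first; the account is never locked
    BLOCK = "block"      # reject before touching the password store


class SlidingWindowLimiter:
    """Counts attempts per key over the last `window` seconds."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        self.events: dict[str, deque[float]] = defaultdict(deque)

    def hit(self, key: str, now: float) -> bool:
        """Record one attempt for `key`; return True if it is over the limit."""
        q = self.events[key]
        while q and now - q[0] >= self.window:
            q.popleft()  # drop attempts that fell out of the window
        q.append(now)  # over-limit attempts still count, so hammering keeps a key blocked
        return len(q) > self.limit


class LayeredGate:
    """Hypothetical pre-password-check gate with three independent checks."""

    def __init__(self, breached: set[str]) -> None:
        # Limits are illustrative assumptions, not values from the benchmark.
        self.per_ip = SlidingWindowLimiter(limit=20, window=60.0)
        self.per_account = SlidingWindowLimiter(limit=5, window=300.0)
        self.breached = breached  # hypothetical set of known-leaked passwords

    def check(self, ip: str, email: str, password: str,
              now: float | None = None) -> Decision:
        now = time.monotonic() if now is None else now
        if self.per_ip.hit(ip, now):
            return Decision.BLOCK    # gate 1: cut off noisy IPs outright
        if password in self.breached:
            return Decision.BLOCK    # gate 2: refuse known-leaked passwords
        if self.per_account.hit(email, now):
            return Decision.CAPTCHA  # gate 3: escalate instead of locking out
        return Decision.ALLOW


def stacked_block_rate(*rates: float) -> float:
    """Share of traffic stopped by gates in series, if they act independently."""
    passed = 1.0
    for p in rates:
        passed *= 1.0 - p
    return 1.0 - passed
```

With independent gates, the blocked fractions compound. For example, two gates that each stop half the attack give `stacked_block_rate(0.5, 0.5) == 0.75`, just over the 70% target. Gates that tend to catch the same requests stack to less than that. A real user whose password is on the breach list would normally be sent to a reset flow rather than simply rejected.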

Return a CanvasState modeling the login path and layered defenses.

Constraints

Max components: 12
Required behaviors: ratelimit, filter, circuitbreaker
Monthly budget: $4500
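This page does not give the CanvasState schema. The sketch below invents a minimal hypothetical shape to show how the three constraints combine: a list of components, each with a name, a set of behaviors, and a monthly cost. The field names, component names, and costs are all assumptions. Inspect the real prompt with `chini-bench prompt chini-028-credential-stuffing` before relying on any of it.

```python
from __future__ import annotations

from dataclasses import dataclass, field

MAX_COMPONENTS = 12
REQUIRED_BEHAVIORS = {"ratelimit", "filter", "circuitbreaker"}
MONTHLY_BUDGET = 4500.0


@dataclass
class Component:
    # Hypothetical fields; the real CanvasState schema may differ.
    name: str
    behaviors: set[str] = field(default_factory=set)
    monthly_cost: float = 0.0


def constraint_violations(components: list[Component]) -> list[str]:
    """Return a human-readable list of broken constraints (empty if none)."""
    problems = []
    if len(components) > MAX_COMPONENTS:
        problems.append(f"{len(components)} components > {MAX_COMPONENTS}")
    covered = set().union(*(c.behaviors for c in components))
    missing = REQUIRED_BEHAVIORS - covered
    if missing:
        problems.append(f"missing behaviors: {sorted(missing)}")
    cost = sum(c.monthly_cost for c in components)
    if cost > MONTHLY_BUDGET:
        problems.append(f"${cost:,.0f}/month > ${MONTHLY_BUDGET:,.0f}")
    return problems


# Example canvas with made-up names and costs.
canvas = [
    Component("edge-ratelimit", {"ratelimit"}, 400),
    Component("breach-filter", {"filter"}, 300),
    Component("login-breaker", {"circuitbreaker"}, 200),
    Component("auth-service", set(), 1200),
    Component("password-db", set(), 900),
]
print(constraint_violations(canvas))  # []
```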

Stress scenarios

Normal login traffic (baseline)
Standard daytime login volume, no attack.

Credential stuffing flood (adversarial)
A distributed botnet replays leaked credentials. Block the attack while preserving real-user logins.

Low-and-slow attack (adversarial)
The attacker spreads attempts across many IPs at a low per-IP rate to evade per-IP limits.
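In the low-and-slow scenario, per-IP limiters see almost nothing, because each IP stays under its limit. Per-account limiters also see little, since a dump typically tries each account only once. One signal that survives is the aggregate login failure ratio. The sketch below is one assumed way to map the required `circuitbreaker` behavior onto that signal. It is a simplified two-state breaker (closed/open, no half-open probe). When the failure ratio over recent attempts crosses a threshold, the breaker opens and every login gets a captcha rather than a block, so real users still get through. The breaker closes again after a cooldown. `LoginBreaker` and all of its defaults are hypothetical.

```python
from __future__ import annotations

from collections import deque


class LoginBreaker:
    """Opens (captcha for everyone) when the recent failure ratio is too high."""

    def __init__(self, window: int = 1000, min_samples: int = 200,
                 threshold: float = 0.5, cooldown: float = 300.0) -> None:
        self.outcomes: deque[bool] = deque(maxlen=window)  # True = failed login
        self.min_samples = min_samples  # avoid tripping on a handful of typos
        self.threshold = threshold
        self.cooldown = cooldown
        self.opened_at: float | None = None

    def record(self, failed: bool, now: float) -> None:
        """Feed one login outcome; open the breaker if failures dominate."""
        self.outcomes.append(failed)
        if len(self.outcomes) < self.min_samples:
            return
        ratio = sum(self.outcomes) / len(self.outcomes)
        if ratio >= self.threshold:
            self.opened_at = now  # (re)open; keeps extending while the attack lasts

    def require_captcha(self, now: float) -> bool:
        """True while the breaker is open."""
        if self.opened_at is None:
            return False
        if now - self.opened_at >= self.cooldown:
            self.opened_at = None  # cooldown elapsed: close the breaker
            self.outcomes.clear()  # start measuring afresh
            return False
        return True
```

The threshold would need calibrating against the baseline scenario, where some failures (typos, forgotten passwords) are normal.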

Pass criteria (overall)

Min stability score: 60
Max drop rate: 50.0%
Min delivery rate: 40.0%
Max errors: 8
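The overall pass criteria are four thresholds. The minimal sketch below compares them, assuming each metric is a plain number, with drop and delivery rates given as percentages. It also assumes "min" and "max" are inclusive bounds. `RunMetrics` and `failed_criteria` are hypothetical names, not the CLI's scoring code.

```python
from dataclasses import dataclass


@dataclass
class RunMetrics:
    stability: float      # stability score
    drop_rate: float      # percent of requests dropped
    delivery_rate: float  # percent of requests delivered
    errors: int


def failed_criteria(m: RunMetrics) -> list:
    """Return the names of the overall criteria this run misses."""
    failures = []
    if m.stability < 60:
        failures.append("stability")
    if m.drop_rate > 50.0:
        failures.append("drop rate")
    if m.delivery_rate < 40.0:
        failures.append("delivery rate")
    if m.errors > 8:
        failures.append("errors")
    return failures
```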

Submit your run

Submissions go through the chini-bench CLI. It calls your model with your key, scores the result locally, and posts to the leaderboard. Nothing leaves your machine except the canvas it produces.

End-to-end:
pip install git+https://github.com/collapseindex/chini-bench-cli.git
export OPENROUTER_API_KEY=...

chini-bench run chini-028-credential-stuffing \
  --provider openrouter --model google/gemini-2.0-flash-001 \
  --as alice
Or inspect the prompt first:
chini-bench prompt chini-028-credential-stuffing
Providers: openai · anthropic · google · openrouter · ollama

Leaderboard

Rank  Submitter  Model                                                Score  Stability  Delivery  Design
#1    alex       openai/gpt-5.4 (default reflexion)                   85     23.0       100.0     100.0
#2    alex       x-ai/grok-4.20 (default reflexion)                   83     15.0       100.0     100.0
#3    alex       google/gemini-3.1-pro-preview (default reflexion)    79     63.0       100.0     75.0
#4    alex       x-ai/grok-4.20 (default single-shot)                 77     47.0       96.0      75.0
#5    alex       openai/gpt-5.4 (default single-shot)                 74     20.0       100.0     75.0
#6    alex       anthropic/claude-sonnet-4.6 (default reflexion)      74     0.0        100.0     100.0
#7    alex       google/gemini-3.1-pro-preview (default single-shot)  72     12.0       100.0     75.0
#8    alex       anthropic/claude-sonnet-4.6 (default single-shot)    65     43.0       100.0     50.0
Per-scenario breakdown of the top run
Scenario         Health  Drop rate  Delivered
baseline         41.0    10.3%      252
stuffing-attack  13.0    100.0%     835
low-and-slow     15.0    100.0%     486