chini-022-phishing-funnel

Phishing Defense Funnel

10,000 emails an hour. One of them is the spear-phish that gets the CFO's credentials. Find it.

Source: Enterprise email security, anti-phishing playbooks, the eternal war between SOC teams and attackers

Prompt

Design the inbound email defense pipeline for a 5,000-employee company.

Functional:
- Inbound email hits an MX gateway. Pre-filter: SPF/DKIM/DMARC, reputation, known-bad attachment hashes.
- Surviving mail goes through a sandbox detonation layer for attachments and link unfurling. Verdict: clean, suspicious, or malicious.
- Suspicious mail routes to a quarantine with a user-visible release option (with a warning banner). Malicious mail is dropped and an alert is raised.
- VIP accounts (executives, finance) get an extra anomaly check (sender history deviation, unusual urgency phrasing).
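
The routing rules above can be sketched as a single decision function. This is a minimal illustration, not the Chinilla canvas format: the `Email` fields, the `VIP_RECIPIENTS` set, and the returned labels are all hypothetical names. The prompt does not say what happens to mail that fails the pre-filter or to VIP mail that the anomaly check flags, so the sketch assumes rejection and quarantine respectively.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    CLEAN = "clean"
    SUSPICIOUS = "suspicious"
    MALICIOUS = "malicious"


@dataclass
class Email:
    recipient: str
    prefilter_pass: bool      # SPF/DKIM/DMARC, reputation, and hash checks all passed
    verdict: Verdict          # sandbox detonation result
    anomalous: bool = False   # result of the VIP anomaly check, if it ran


# Hypothetical VIP list; a real deployment would load this from a directory.
VIP_RECIPIENTS = {"cfo@example.com"}


def route(email: Email) -> str:
    """Return the destination label for one message."""
    if not email.prefilter_pass:
        return "reject"            # assumption: pre-filter failures never reach the sandbox
    if email.verdict is Verdict.MALICIOUS:
        return "drop_and_alert"
    if email.verdict is Verdict.SUSPICIOUS:
        return "quarantine"
    if email.recipient in VIP_RECIPIENTS and email.anomalous:
        return "quarantine"        # assumption: the prompt does not say where flagged VIP mail goes
    return "deliver"
```

Note that the VIP branch only runs for mail the sandbox already judged clean, which is the case the CFO spear-phish scenario targets.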

Non-functional:
- A 4x volumetric campaign must NOT cause clean business mail to be delayed beyond 2 minutes end-to-end.
- A targeted spear-phish to the CFO must trigger the VIP anomaly path even when SPF/DKIM pass and the sender domain looks legitimate.
- If the sandbox is overloaded, attachments must be held in a soft-quarantine; they must NOT be delivered unscanned just to keep latency down.
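
The overload requirement amounts to a circuit breaker that fails closed. A rough sketch, assuming a fixed in-flight limit stands in for sandbox capacity; the class name `SandboxGate` and its methods are hypothetical, not part of chini-bench:

```python
from collections import deque
from typing import Optional


class SandboxGate:
    """Admit messages to the sandbox up to a capacity; hold the rest."""

    def __init__(self, max_in_flight: int):
        self.max_in_flight = max_in_flight
        self.in_flight = 0
        self.soft_quarantine = deque()  # message ids waiting for a sandbox slot

    def submit(self, message_id: str) -> str:
        if self.in_flight >= self.max_in_flight:
            # Breaker open: hold the message, never deliver it unscanned.
            self.soft_quarantine.append(message_id)
            return "soft_quarantine"
        self.in_flight += 1
        return "sandbox"

    def complete(self) -> Optional[str]:
        """Call when a detonation finishes; returns the held id that takes the free slot, if any."""
        if self.in_flight == 0:
            raise RuntimeError("no detonation in flight")
        self.in_flight -= 1
        if self.soft_quarantine:
            self.in_flight += 1
            return self.soft_quarantine.popleft()
        return None
```

Held messages drain in arrival order as slots free up, so overload adds delay for attachment mail but never opens a bypass around scanning.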

Return a Chinilla CanvasState. Components: MX gateway, pre-filter, sandbox, quarantine, VIP anomaly path, alerting. Behaviors: filter (reputation/auth checks), split (clean/quarantine/drop routing), ratelimit (sandbox capacity), circuitbreaker (sandbox overload soft-quarantine), batch (SOC alerting cadence).

Constraints

Max components: 13
Required behaviors: filter, split, circuitbreaker
Monthly budget: $30,000

Stress scenarios

Normal mail flow (baseline)
Steady inbound volume, mixed clean and spam.

4x phishing campaign (adversarial)
Volumetric campaign. Block phish, deliver clean mail without delay.

CFO spear-phish (adversarial)
Targeted message passes auth checks. VIP anomaly path must catch it without flagging legit exec mail.

Sandbox queue full (latency)
Detonation layer overloaded. Soft-quarantine must hold, not bypass scanning.
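
For the CFO spear-phish scenario, the two signals the prompt names (sender history deviation and unusual urgency phrasing) could be combined into one score. This is a toy sketch under stated assumptions: the term list, the 0.6/0.4 weights, and the 0.7 threshold are invented for illustration and are not taken from the benchmark.

```python
from typing import Mapping

# Hypothetical urgency phrases; a real system would use a trained model.
URGENCY_TERMS = ("urgent", "immediately", "wire transfer", "today", "confidential")


def anomaly_score(sender: str, body: str, history: Mapping[str, int]) -> float:
    """Score in [0, 1]; history maps sender address to prior message count."""
    prior = history.get(sender.lower(), 0)
    novelty = 1.0 / (1 + prior)               # 1.0 for a never-seen sender
    text = body.lower()
    hits = sum(term in text for term in URGENCY_TERMS)
    urgency = min(hits / 2, 1.0)              # saturates at two urgency phrases
    return 0.6 * novelty + 0.4 * urgency


def is_anomalous(sender: str, body: str, history: Mapping[str, int],
                 threshold: float = 0.7) -> bool:
    return anomaly_score(sender, body, history) >= threshold
```

With these weights, urgency alone from a well-known sender stays below the threshold, which is one way to avoid flagging legitimate executive mail; a new sender using urgent language crosses it even when SPF/DKIM pass.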

Pass criteria (overall)

Min stability score: 65
Max drop rate: 5.0%
Min delivery rate: 92.0%
Max errors: 5
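
The four thresholds above can be read as a simple conjunction. The sketch below checks a run summary against them; the `RunSummary` fields are hypothetical names, and how chini-bench aggregates per-scenario numbers into overall values is not stated here, so this only shows the comparison step.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RunSummary:
    stability: float          # overall stability score
    drop_rate_pct: float      # overall drop rate, in percent
    delivery_rate_pct: float  # overall delivery rate, in percent
    errors: int


def failed_criteria(run: RunSummary) -> List[str]:
    """Return the names of the pass criteria the run misses (empty list means pass)."""
    failures = []
    if run.stability < 65:
        failures.append("min stability score")
    if run.drop_rate_pct > 5.0:
        failures.append("max drop rate")
    if run.delivery_rate_pct < 92.0:
        failures.append("min delivery rate")
    if run.errors > 5:
        failures.append("max errors")
    return failures
```

Returning the list of misses rather than a bare boolean makes it easier to see which constraint a design needs to address.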

Submit your run

Submissions go through the chini-bench CLI. It calls your model with your key, scores the result locally, and posts to the leaderboard. Nothing leaves your machine except the canvas it produces.

End-to-end:
pip install git+https://github.com/collapseindex/chini-bench-cli.git
export OPENROUTER_API_KEY=...

chini-bench run chini-022-phishing-funnel \
  --provider openrouter --model google/gemini-2.0-flash-001 \
  --as alice --x alice --linkedin alice-builds
Or inspect the prompt first:
chini-bench prompt chini-022-phishing-funnel
Providers: openai · anthropic · google · openrouter · ollama

Leaderboard

Rank  Submitter     Model                          Score  Stability  Delivery  Design  Pass  Links
#1    alex default  anthropic/claude-sonnet-4.6    92     59.0       100.0     100.0         X
#2    alex default  x-ai/grok-4.20                 88     40.0       100.0     100.0         X
#3    alex default  google/gemini-3.1-pro-preview  82     21.0       87.0      100.0         X
#4    alex default  openai/gpt-5.4                 70     25.0       25.0      100.0         X

Per-scenario breakdown of the top run

Scenario          Health  Drop rate  Delivered  Pass
baseline          68.0    0.9%       513
campaign          47.0    100.0%     1978
spear-phish       45.0    100.0%     932
sandbox-overload  74.0    0.3%       483