chini-029-comment-spam-flood

Comment Spam Flood

An LLM-driven spammer floods your forum with 50k near-human comments. Block them without false-flagging real users.

Source: Trust and safety, content moderation systems

Prompt

Design content moderation for a public discussion platform under a spam flood.

Functional:
- Authenticated users post comments. Each comment is checked, then either published, queued for review, or rejected.
- Real comments arrive at a baseline cadence. The attacker posts LLM-generated near-human spam at high volume across many accounts.
- Some attacker comments will look indistinguishable from real ones. Some real comments are off-topic, low-quality, or angry (false-positive risk).
- Detected attack accounts can be banned, but the ban-then-recreate loop is cheap for the attacker.

Non-functional:
- Keep at least 70% of spam volume from becoming visible to users, without auto-banning real accounts.
- Real-comment publish rate must stay above 80% during an attack (so routing every comment through a global review queue is not an option).
- Defenses: per-account rate-limit, content classifier, novel-text similarity check, account-age gating, link-density filter, manual moderator escalation queue.
- Layered defense required. A single classifier on its own produces too many false positives.
- Cannot rely solely on banning: attacker recreates accounts faster than bans land.
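
One way to make a defense survive the ban-then-recreate loop is to key it on comment content rather than on account identity, so a fresh account does not reset it. The sketch below illustrates the "novel-text similarity check" from the list above using word shingles and Jaccard similarity. The function names, the shingle size, and the choice of Jaccard are all assumptions for illustration, not something the problem prescribes. Purely lexical similarity will likely miss LLM paraphrases, so treat this as one weak signal among several.

```python
import re


def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word shingles of a lowercased, punctuation-stripped text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Short texts become a single shingle (or none, if there are no words).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets; 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def max_similarity(candidate: str, recent: list) -> float:
    """Highest similarity between a new comment and any recently seen comment."""
    cand = shingles(candidate)
    return max((jaccard(cand, shingles(r)) for r in recent), default=0.0)


if __name__ == "__main__":
    recent = ["Great post, check out my site for deals"]
    # Same words, different case and punctuation, posted from a different account.
    print(max_similarity("great post check out my site for deals!", recent))
    print(max_similarity("I disagree with the second paragraph entirely", recent))
```

Because the recent-comment window is shared across all accounts, a recreated account that reuses earlier spam text scores high on its first post.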

Return a CanvasState modeling the comment ingest path, classifier stages, and review escalation.
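
The three outcomes in the prompt (published, queued for review, rejected) can be sketched as a decision function that counts independent weak signals instead of trusting the classifier alone. This is a minimal illustration, not the CanvasState format and not a reference solution: the field names, the Verdict enum, and every threshold below are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    PUBLISH = "publish"
    REVIEW = "review"
    REJECT = "reject"


@dataclass
class Comment:
    account_age_days: float   # age of the posting account
    posts_last_minute: int    # recent posts by this account
    link_count: int           # URLs found in the body
    word_count: int           # words in the body
    spam_score: float         # classifier output in [0, 1]
    max_similarity: float     # similarity to recent comments, in [0, 1]


def moderate(c: Comment) -> Verdict:
    """Combine cheap signals with the classifier. All thresholds are illustrative."""
    # Per-account rate limit: reject the burst post, but do not ban the account.
    if c.posts_last_minute > 5:
        return Verdict.REJECT

    # Count weak signals; no single one is enough to act on.
    suspicion = 0
    if c.account_age_days < 1:
        suspicion += 1
    if c.word_count > 0 and c.link_count / c.word_count > 0.2:
        suspicion += 1
    if c.max_similarity > 0.8:
        suspicion += 1
    if c.spam_score > 0.6:
        suspicion += 1

    # Reject only when the classifier is very confident and another layer agrees.
    if c.spam_score > 0.95 and suspicion >= 2:
        return Verdict.REJECT
    # Two or more weak signals: escalate to a moderator instead of publishing.
    if suspicion >= 2:
        return Verdict.REVIEW
    return Verdict.PUBLISH
```

Under these assumed thresholds, an angry comment from an established account with a classifier score of 0.7 is still published, because only one layer fires. That is the false-positive case the prompt warns about.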

Constraints

Max components: 12
Required behaviors: filter, ratelimit, queue
Monthly budget: $6000

Stress scenarios

Normal forum traffic (baseline): Standard real-user comment volume. No attack.

LLM spam flood (adversarial): The attacker posts near-human spam at 4x the real volume across distributed accounts.

Ban-and-recreate loop (adversarial): The attacker recreates banned accounts faster than bans land. Layered defense required.
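
A quick calculation shows what the prompt's targets imply under the 4x flood. It assumes the 70% target means 70% of spam comments never become visible, and it normalizes real volume to 1 unit. The function name is hypothetical.

```python
def visible_spam_share(spam_multiple: float,
                       spam_block_rate: float,
                       real_publish_rate: float) -> float:
    """Fraction of published comments that are spam, with real volume set to 1."""
    visible_spam = spam_multiple * (1.0 - spam_block_rate)
    visible_real = real_publish_rate
    return visible_spam / (visible_spam + visible_real)


# Spam at 4x real volume, exactly 70% of it blocked, exactly 80% of real comments published.
share = visible_spam_share(spam_multiple=4.0, spam_block_rate=0.70, real_publish_rate=0.80)
print(f"{share:.0%}")  # prints 60%
```

Under these assumptions, meeting both targets only at their minimums still leaves spam as the majority of what users see, so a design would need to clear the 70% bar by a wide margin to keep the feed readable.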

Pass criteria (overall)

Min stability score: 60
Max drop rate: 60.0%
Min delivery rate: 35.0%
Max errors: 8
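
The four thresholds above can be read as a simple gate. The sketch below is not the chini-bench scorer: the RunMetrics type and failed_criteria function are hypothetical, and how the real tool aggregates per-scenario numbers into overall values is not stated here.

```python
from dataclasses import dataclass


@dataclass
class RunMetrics:
    stability: float      # overall stability score
    drop_rate: float      # percent dropped
    delivery_rate: float  # percent delivered
    errors: int           # error count


def failed_criteria(m: RunMetrics) -> list:
    """Return the names of the overall pass criteria that a run misses."""
    failures = []
    if m.stability < 60:
        failures.append("stability")
    if m.drop_rate > 60.0:
        failures.append("drop_rate")
    if m.delivery_rate < 35.0:
        failures.append("delivery_rate")
    if m.errors > 8:
        failures.append("errors")
    return failures


# Made-up numbers for illustration only, not taken from any leaderboard run.
print(failed_criteria(RunMetrics(stability=55.0, drop_rate=70.0, delivery_rate=50.0, errors=9)))
```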

Submit your run

Submissions go through the chini-bench CLI. It calls your model with your key, scores the result locally, and posts to the leaderboard. Nothing leaves your machine except the canvas it produces.

End-to-end:
pip install git+https://github.com/collapseindex/chini-bench-cli.git
export OPENROUTER_API_KEY=...

chini-bench run chini-029-comment-spam-flood \
  --provider openrouter --model google/gemini-2.0-flash-001 \
  --as alice --x alice --linkedin alice-builds
Or inspect the prompt first:
chini-bench prompt chini-029-comment-spam-flood
Providers: openai · anthropic · google · openrouter · ollama

Leaderboard

Rank Submitter Model Score Stability Delivery Design Pass Links
#1 alex default google/gemini-3.1-pro-preview 82 62.0 100.0 75.0 X
#2 alex default x-ai/grok-4.20 81 55.0 100.0 75.0 X
#3 alex default anthropic/claude-sonnet-4.6 80 50.0 100.0 75.0 X
#4 alex default openai/gpt-5.4 76 28.0 100.0 75.0 X
Per-scenario breakdown of the top run
Scenario Health Drop rate Delivered Pass
baseline 78.0 0.0% 264
spam-flood 54.0 100.0% 1084
ban-recreate 54.0 100.0% 809