chini-029-comment-spam-flood

Comment Spam Flood

An LLM-driven spammer floods your forum with 50k near-human comments. Block them without false-flagging real users.

Source: Trust and safety, content moderation systems

Prompt

Design content moderation for a public discussion platform under spam flood.

Functional:
- Authenticated users post comments. Each comment is checked, then either published, queued for review, or rejected.
- Real comments arrive at a baseline cadence. An attacker posts LLM-generated, near-human spam at high volume across many accounts.
- Some attacker comments will be indistinguishable from real ones. Some real comments are off-topic, low-quality, or angry (false-positive risk).
- Detected attack accounts can be banned, but the ban-then-recreate loop is cheap for the attacker.
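The three outcomes can be read as bands on a risk score. Below is a minimal sketch that assumes the upstream checks have already been folded into one risk score in [0, 1]. The thresholds `review_at` and `reject_at` are hypothetical values that the challenge does not specify. The point is that only high-confidence spam is rejected outright, and the ambiguous middle band (angry or off-topic real comments, near-human spam) goes to a human instead of being auto-rejected.

```python
from enum import Enum


class Outcome(Enum):
    PUBLISH = "publish"
    REVIEW = "review"
    REJECT = "reject"


def route(risk: float, review_at: float = 0.5, reject_at: float = 0.9) -> Outcome:
    """Map a combined risk score in [0, 1] to one of the three outcomes.

    review_at and reject_at are hypothetical, tunable thresholds.
    """
    if not 0.0 <= risk <= 1.0:
        raise ValueError(f"risk must be in [0, 1], got {risk}")
    if not 0.0 <= review_at < reject_at <= 1.0:
        raise ValueError("expected 0 <= review_at < reject_at <= 1")
    if risk >= reject_at:
        return Outcome.REJECT   # only high-confidence spam is dropped outright
    if risk >= review_at:
        return Outcome.REVIEW   # ambiguous: escalate to a moderator
    return Outcome.PUBLISH      # low risk: publish immediately


print(route(0.2), route(0.7), route(0.95))
```

The review band should stay narrow. The requirements rule out sending everything to a global review queue, so moderators should only receive what the cheaper layers could not settle.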

Non-functional:
- Keep at least 70% of spam volume from becoming visible to users, without auto-banning real accounts.
- Keep the real-comment publish rate above 80% during the attack (routing everything to a global review queue is not an option).
- Available defenses: per-account rate limit, content classifier, novel-text similarity check, account-age gating, link-density filter, manual moderator escalation queue.
- A layered defense is required: a single classifier alone produces too many false positives.
- Banning alone is not enough: the attacker recreates accounts faster than bans land.
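Two of the listed layers are cheap text heuristics that can run before any classifier. The sketch below shows one possible implementation, not the challenge's own. It approximates the "novel-text similarity check" with word-shingle Jaccard similarity against recently seen comments, and it measures link density as links per word. The shingle size `k`, the URL regex, and the in-memory `recent` list are all assumptions. At forum scale, a linear scan would normally be replaced by MinHash with locality-sensitive hashing.

```python
import re

# Assumed URL pattern; crude but enough for a density heuristic.
URL_RE = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)
WORD_RE = re.compile(r"[a-z0-9']+")


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles of a lowercased, URL-stripped text."""
    words = WORD_RE.findall(URL_RE.sub(" ", text.lower()))
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|; 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def max_similarity(text: str, recent: list[str], k: int = 3) -> float:
    """Highest shingle-Jaccard similarity between text and any recent comment."""
    s = shingles(text, k)
    return max((jaccard(s, shingles(r, k)) for r in recent), default=0.0)


def link_density(text: str) -> float:
    """Links per whitespace-separated token (URLs count as tokens)."""
    n_links = len(URL_RE.findall(text))
    n_words = len(text.split())
    return n_links / n_words if n_words else 0.0


recent = ["Great point, I totally agree with the author on this one."]
print(max_similarity("Great point, I totally agree with the author on this topic.", recent))
print(link_density("cheap pills at https://example.com and https://example.org"))
```

Each signal is weak on its own. Paraphrased LLM spam can score low on similarity, and a real user may legitimately post several links. That is why these signals should feed a combined score rather than reject comments by themselves.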

Return a CanvasState modeling the comment ingest path, classifier stages, and review escalation.

Constraints

Max components: 12
Required behaviors: filter, ratelimit, queue
Monthly budget: $6000
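These three constraints are hard limits, so it can help to check a candidate design against them before submitting. The sketch below assumes a hypothetical in-memory representation: a `Component` with a set of behaviors and a monthly cost. It is not the actual CanvasState schema, whose fields are not described here, and any costs you feed it are your own estimates.

```python
from dataclasses import dataclass, field

# Limits taken from the constraints above.
MAX_COMPONENTS = 12
REQUIRED_BEHAVIORS = {"filter", "ratelimit", "queue"}
MONTHLY_BUDGET_USD = 6000


@dataclass
class Component:
    """Hypothetical stand-in for one node in a design; not the CanvasState schema."""
    name: str
    behaviors: set[str] = field(default_factory=set)
    monthly_cost_usd: float = 0.0


def constraint_violations(design: list[Component]) -> list[str]:
    """Return human-readable violations of the challenge constraints (empty if none)."""
    problems = []
    if len(design) > MAX_COMPONENTS:
        problems.append(f"{len(design)} components exceeds max of {MAX_COMPONENTS}")
    # Union of every behavior any component provides.
    covered = set().union(*(c.behaviors for c in design))
    missing = REQUIRED_BEHAVIORS - covered
    if missing:
        problems.append(f"missing required behaviors: {sorted(missing)}")
    cost = sum(c.monthly_cost_usd for c in design)
    if cost > MONTHLY_BUDGET_USD:
        problems.append(f"monthly cost ${cost:,.0f} exceeds budget ${MONTHLY_BUDGET_USD:,}")
    return problems
```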

Stress scenarios

Normal forum traffic (baseline)

Standard real-user comment volume. No attack.

LLM spam flood (adversarial)

The attacker posts near-human spam at 4x real volume across distributed accounts.
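It is worth working through what the non-functional targets imply under this scenario. With spam at 4x real volume, blocking exactly 70% of it still leaves 1.2 units of visible spam per unit of real traffic. Spam can therefore remain the majority of the visible feed even when both targets are met. The calculation below uses only the numbers stated above and reads the 70% target as the fraction of spam kept from becoming visible:

```python
def visible_spam_share(spam_ratio: float, spam_blocked: float, real_published: float) -> float:
    """Fraction of visible (published) comments that are spam.

    spam_ratio: spam volume per unit of real volume (4.0 in the flood scenario).
    spam_blocked: fraction of spam kept from becoming visible.
    real_published: fraction of real comments that get published.
    """
    visible_spam = spam_ratio * (1.0 - spam_blocked)
    visible_real = real_published
    total = visible_spam + visible_real
    return visible_spam / total if total else 0.0


# Both targets met exactly: 70% of spam blocked, 80% of real comments published.
print(f"{visible_spam_share(4.0, 0.70, 0.80):.1%}")  # 60.0%
# Same spam blocking, every real comment published.
print(f"{visible_spam_share(4.0, 0.70, 1.00):.1%}")  # 54.5%
```

For real comments to outnumber visible spam at the 80% publish floor, you need 4(1 - b) < 0.8, i.e. a blocked fraction b above 80%. The stated 70% is a floor, not a comfortable target.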

Ban-and-recreate loop (adversarial)

The attacker recreates banned accounts faster than the ban hammer lands. A layered defense is required.
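Because each ban is cheap for the attacker to undo, one option is to make fresh accounts cheap for you to tolerate. Tie each account's rate limit to its age, so a recreated account starts with almost no posting budget. Below is a minimal sketch of a per-account token bucket whose capacity and refill rate grow with account age. The tiers and numbers are hypothetical. A real deployment would also keep bucket state in a shared store rather than a Python dict.

```python
import time
from dataclasses import dataclass

# Hypothetical tiers: (minimum account age in seconds, burst capacity, tokens per hour).
AGE_TIERS = [
    (0, 2, 1.0),                # brand-new accounts: tiny budget
    (24 * 3600, 5, 4.0),        # older than one day
    (7 * 24 * 3600, 20, 20.0),  # older than one week
]


def tier_for(age_s: float) -> tuple[int, float]:
    """Return (capacity, refill per hour) for the oldest tier the account qualifies for."""
    capacity, per_hour = AGE_TIERS[0][1], AGE_TIERS[0][2]
    for min_age, cap, rate in AGE_TIERS:
        if age_s >= min_age:
            capacity, per_hour = cap, rate
    return capacity, per_hour


@dataclass
class Bucket:
    tokens: float
    last: float


class AgeGatedLimiter:
    """Per-account token bucket whose limits depend on account age."""

    def __init__(self, clock=time.time):
        self.clock = clock  # Unix seconds by default
        self.buckets: dict[str, Bucket] = {}

    def allow(self, account_id: str, created_at: float) -> bool:
        """Consume one token if available; created_at must use the same clock."""
        now = self.clock()
        capacity, per_hour = tier_for(now - created_at)
        b = self.buckets.get(account_id)
        if b is None:
            b = self.buckets[account_id] = Bucket(tokens=capacity, last=now)
        # Refill for elapsed time, capped at the current tier's capacity.
        b.tokens = min(capacity, b.tokens + (now - b.last) * per_hour / 3600.0)
        b.last = now
        if b.tokens >= 1.0:
            b.tokens -= 1.0
            return True
        return False


# Usage with a fake clock: a fresh account gets a burst of two posts, then is throttled.
now = [0.0]
limiter = AgeGatedLimiter(clock=lambda: now[0])
print([limiter.allow("fresh", created_at=0.0) for _ in range(3)])  # [True, True, False]
```

Under this scheme, a ban followed by recreation drops the account back to the smallest tier. The attacker's throughput then depends on how many accounts they can keep alive, not on how fast they can create new ones. It does nothing against accounts that have already aged past the gate, which is why it is only one layer among several.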

Pass criteria (overall)

Min stability score: 60
Max drop rate: 60.0%
Min delivery rate: 35.0%
Max errors: 8

Submit your run

Submissions go through the chini-bench CLI. It calls your model with your key, scores the result locally, and posts the result to the leaderboard. Apart from the calls to your model provider, nothing leaves your machine except the canvas it produces.

End-to-end:

pip install git+https://github.com/collapseindex/chini-bench-cli.git
export OPENROUTER_API_KEY=...

chini-bench run chini-029-comment-spam-flood \
  --provider openrouter --model google/gemini-2.0-flash-001 \
  --as alice

Or inspect the prompt first:

chini-bench prompt chini-029-comment-spam-flood

Providers: openai · anthropic · google · openrouter · ollama

Leaderboard

Rank	Submitter	Model	Score	Stability	Delivery	Design	Pass
#1	alex	google/gemini-3.1-pro-preview default single-shot	82	62.0	100.0	75.0	✗
#2	alex	google/gemini-3.1-pro-preview default reflexion	82	60.0	100.0	75.0	✗
#3	alex	x-ai/grok-4.20 default single-shot	81	55.0	100.0	75.0	✗
#4	alex	anthropic/claude-sonnet-4.6 default reflexion	81	37.0	100.0	100.0	✗
#5	alex	openai/gpt-5.4 default reflexion	81	7.0	100.0	100.0	✗
#6	alex	anthropic/claude-sonnet-4.6 default single-shot	80	50.0	100.0	75.0	✗
#7	alex	openai/gpt-5.4 default single-shot	76	28.0	100.0	75.0	✗
#8	alex	x-ai/grok-4.20 default reflexion	75	23.0	100.0	75.0	✗

Per-scenario breakdown of the top run

Scenario	Health	Drop rate	Delivered	Pass
baseline	78.0	0.0%	264	✓
spam-flood	54.0	100.0%	1084	✗
ban-recreate	54.0	100.0%	809	✗
