CHINI-bench leaderboard
Total submissions: 3,634 (all time)
Problems solved: 57 / 450 (at least one passing run)
Last 24h submissions: 0
How rankings work
  • One row per user × model. Same person on different models = multiple rows.
  • Sorted by average composite score across every problem the row has run. Tie-breakers: pass rate, then run count. Both numbers are shown side by side so a high average on a thin sample cannot quietly outrank a higher pass rate.
  • Submitting the same problem twice keeps only the most recent run. Re-running the same problem cannot inflate your average.
  • Need 3+ scored runs to enter the ranked table. Newer rows show in the Recent submissions list below until they hit the threshold.
  • Composite numbers are point estimates; with a 30-problem benchmark and one shot per problem, treat differences inside roughly ±3 points as noise rather than a real ordering.
  • Results are scored under the methodology version active at submit time (v0.3 / v0.6 / v0.7). The version is stamped in meta.methodologyVersion on each result. Older runs are not retro-graded under the v0.7 placement-aware design subscore.
  • Click any row to see the per-problem breakdown that produced the average.
  • Use the By model tab to see the same single-shot data aggregated across submitters: how does each model do overall?
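The ranking rules above can be sketched in a few lines of Python. This is an illustrative sketch only, not the site's implementation: the `Run` record, its field names, and the `rank` function are hypothetical, and it assumes that the 3-run threshold is counted after duplicate problems are collapsed and that both tie-breakers prefer the higher value.

```python
from collections import defaultdict
from dataclasses import dataclass

MIN_RUNS = 3  # "Need 3+ scored runs to enter the ranked table"


@dataclass
class Run:
    # Hypothetical record shape; the real result schema is not shown on this page.
    user: str
    model: str
    problem: str
    score: float       # composite score
    passed: bool
    submitted_at: int  # any orderable timestamp; larger means more recent


def rank(runs):
    # Keep only the most recent run per (user, model, problem),
    # so re-running a problem cannot inflate the average.
    latest = {}
    for r in runs:
        key = (r.user, r.model, r.problem)
        if key not in latest or r.submitted_at > latest[key].submitted_at:
            latest[key] = r

    # One row per user x model.
    rows = defaultdict(list)
    for r in latest.values():
        rows[(r.user, r.model)].append(r)

    table = []
    for (user, model), rs in rows.items():
        if len(rs) < MIN_RUNS:
            continue  # stays in the unranked "Recent submissions" list
        table.append({
            "user": user,
            "model": model,
            "avg": sum(r.score for r in rs) / len(rs),
            "pass_rate": sum(r.passed for r in rs) / len(rs),
            "runs": len(rs),
        })

    # Sort by average, then pass rate, then run count (assumed: higher is better).
    table.sort(key=lambda t: (t["avg"], t["pass_rate"], t["runs"]), reverse=True)
    return table
```

Under these assumptions, two rows with the same average are ordered by pass rate, and a row with only two distinct problems never appears in the ranked output regardless of its scores.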
Hardest: chini-025-job-search-pipeline (48 avg)
Easiest: chini-train-train-0200-dp1-infra (96 avg)
Never solved: 393 / 450

Community ranking

Average composite score across every problem the (user × model) has run. Min 3 runs to rank. Click a row for the per-problem breakdown.

Rank User Model Classes Avg Best Runs Pass rate Last run
No account. No queue. Bring your own API key. Pick any handle (1-40 chars: letters, digits, dot, dash, underscore). Submissions are namespaced internally to prevent impersonation of official model identifiers.
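The handle rule above can be checked with a single regular expression. This is a minimal sketch that assumes "letters" means ASCII letters; `is_valid_handle` is a hypothetical helper, not part of any published API, and it does not model the internal namespacing.

```python
import re

# 1-40 characters drawn from letters, digits, dot, dash, underscore.
# Assumption: "letters" is read as ASCII A-Z / a-z.
HANDLE_RE = re.compile(r"[A-Za-z0-9._-]{1,40}")


def is_valid_handle(handle: str) -> bool:
    # fullmatch ensures the whole string conforms, not just a prefix.
    return HANDLE_RE.fullmatch(handle) is not None
```

For example, `is_valid_handle("rl_v07_smoke_b")` returns `True`, while an empty string, a handle containing a space, or a 41-character handle is rejected.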
Submit a run →

Recent submissions (unranked, < 3 runs)

These rows enter the ranked table after 3 scored runs.

User                  Model      Avg  Best  Runs   Last
rl_v07_smoke_b        rl_policy  94   95    2 / 3  17d ago
rl_smoke_v2_persist   rl_policy  92   92    1 / 3  20d ago
rl_smoke8             rl_policy  63   71    2 / 3  20d ago
Want to know how scores are computed? Read the methodology →