AutoBench LLM Leaderboard

Interactive leaderboard for AutoBench, a benchmark in which LLMs rank other LLMs' responses. It reports performance, cost, and latency metrics for each benchmark run.

Current Run: AutoBench Agentic Run 1 - April 2026 (2026-04-16) - 31 models


Overall Model Performance

Models are ranked by AutoBench score. For cost ($ Cents), latency (s), and fail rate (%), lower is better. Iterations is the number of evaluations per model.

Benchmark Correlations: AutoBench scores correlate 82.71% with the Artificial Analysis Intelligence Index, 80.25% with Terminal-Bench Hard, 81.81% with GDPval-AA, and 66.45% with Tau2-Bench Telecom.
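
For readers wondering how a correlation percentage like those above could be derived, here is a minimal sketch. It assumes a plain Pearson correlation over per-model scores, expressed as a percentage. The leaderboard does not state which correlation measure it uses, so this is an assumption, and the function name `pearson_percent` and the sample numbers are hypothetical illustrations, not AutoBench data.

```python
from math import sqrt


def pearson_percent(xs, ys):
    """Pearson correlation of two equal-length score lists, as a percentage.

    Assumption: the leaderboard may use a different measure (e.g. a rank
    correlation); this only illustrates the general idea.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return 100.0 * cov / sqrt(var_x * var_y)


# Hypothetical scores for the same four models on two benchmarks.
benchmark_a = [3.1, 3.6, 4.0, 4.4]
benchmark_b = [41.0, 52.0, 55.0, 63.0]
print(f"{pearson_percent(benchmark_a, benchmark_b):.2f}%")
```

Each position in the two lists must refer to the same model. A value near 100% means the two benchmarks order and space the models similarly.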

Overall Rankings

Gemini-3.1-flash-lite-preview | 3.17 | 2.563 | 129 | 147 | 33.33% | 198