AutoBench LLM Leaderboard

Interactive leaderboard for AutoBench, where LLMs rank LLMs' responses. Includes performance, cost, and latency metrics.

Current Run: AutoBench Run 5 - December 2025 (2025-12-19) - 38 models


Overall Model Performance

Models are ranked by AutoBench score. For cost ($ cents), latency (s), and fail rate (%), lower is better. The Iterations column shows the number of evaluations per model.

Benchmark Correlations: AutoBench scores show a 69.19% correlation with LMArena, 89.52% with the Artificial Analysis Intelligence Index, and 82.64% with MMLU.
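
As a rough illustration of how a correlation figure like the ones above could be computed, the sketch below calculates a Pearson coefficient between two lists of per-model scores. The page does not say which correlation measure AutoBench uses, so Pearson is an assumption here, and the score lists are invented placeholders, not leaderboard data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores for four placeholder models (NOT real leaderboard values).
autobench_scores = [4.1, 3.8, 4.4, 3.5]
other_benchmark_scores = [62.0, 55.0, 70.0, 50.0]

r = pearson(autobench_scores, other_benchmark_scores)
print(f"Correlation: {100 * r:.2f}%")
```

Both lists must cover the same models in the same order. A rank-based measure such as Spearman would be an equally plausible choice if only the ordering of models matters.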

Overall Rankings

Llama-3.3-nemotron-super-49b-v1.5 | 4.476206 | 81.882 | 261.3839264 | 783.8191 | 0.101587302 | 303