AutoBench LLM Leaderboard

Interactive leaderboard for AutoBench, a benchmark in which LLMs rank other LLMs' responses. It reports performance, cost, and latency metrics for each benchmark run.

Current Run: AutoBench Run 5 - December 2025 (2025-12-16) - 35 models


Overall Model Performance

Models are ranked by AutoBench score. Lower is better for cost (USD cents), latency (s), and fail rate (%). The Iterations column shows the number of evaluations per model.

Benchmark Correlations: AutoBench scores correlate 69.19% with LMArena, 89.38% with the Artificial Analysis Intelligence Index, and 82.21% with MMLU.
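
As a rough illustration of how a correlation figure between two benchmarks might be obtained, the sketch below computes a Pearson correlation over the scores that the same set of models received on each benchmark. The page does not state which correlation measure AutoBench uses, so Pearson is an assumption here, and the `pearson` helper and all score values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores for the same four models on two benchmarks.
# These values are made up for illustration and are not leaderboard data.
autobench_scores = [4.5, 4.2, 3.9, 3.6]
other_benchmark_scores = [88.0, 85.0, 80.0, 79.0]

r = pearson(autobench_scores, other_benchmark_scores)
print(f"correlation: {r:.2%}")
```

Only models that appear in both benchmarks can be paired this way, so a reported correlation depends on the overlap between the two model lists.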

Overall Rankings

Llama-3.3-nemotron-super-49b-v1.5 | 4.48 | 81.88 | 261 | 784 | 10.16% | 303