AutoBench LLM Leaderboard

Interactive leaderboard for AutoBench, where LLMs rank LLMs' responses. Includes performance, cost, and latency metrics.

Current Run: AutoBench Run 3 - August 2025 (2025-08-14) - 34 models


Overall Model Performance

Models are ranked by AutoBench score. Lower cost (USD cents), latency (seconds), and fail rate (%) are better. Iterations is the number of evaluations run per model.

Benchmark Correlations: AutoBench scores show 86.85% correlation with LMArena, 92.17% with the Artificial Analysis Intelligence Index, and 75.44% with MMLU.
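
As a rough illustration of how a correlation figure like these could be computed, here is a minimal Python sketch. It assumes a Pearson correlation over per-model scores; the page does not say which coefficient AutoBench actually uses. The function name pearson and the score lists are hypothetical placeholders, not leaderboard data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the mean (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for the same four models on two benchmarks (not real data).
autobench_scores = [4.5, 4.2, 3.9, 3.1]
other_scores = [1400.0, 1350.0, 1300.0, 1180.0]

r = pearson(autobench_scores, other_scores)
print(f"correlation: {r * 100:.2f}%")
```

Both lists must cover the same models in the same order. A rank-based coefficient such as Spearman would be an equally plausible choice when only the ordering of models matters.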

Overall Rankings

llama-3_1-Nemotron-Ultra-253B-v1    4.511567341    4.368    89.99818067    277.6722    19.27%    385