AutoBench LLM Leaderboard

Interactive leaderboard for AutoBench, where LLMs rank LLMs' responses. Includes performance, cost, and latency metrics.

Current Run: AutoBench Agronomy LLM Benchmark - December 2025 (2025-12-10) - 40 models


Overall Model Performance

Models are ranked by AutoBench score. For cost (US cents), latency (seconds), and fail rate (%), lower values are better. Iterations is the number of evaluations run for each model.
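The ranking rule above can be sketched as a sort over leaderboard rows. This is a minimal illustration, not AutoBench's own code. The `ModelResult` fields, the `rank_models` helper, the model names and all numbers are hypothetical. The sketch assumes that a higher AutoBench score ranks first, and it uses an illustrative tie-break on the lower-is-better metrics (cost, then latency).

```python
from dataclasses import dataclass


@dataclass
class ModelResult:
    """One leaderboard row (hypothetical field names)."""
    name: str
    score: float          # AutoBench score; assumed: higher ranks first
    cost_cents: float     # cost in US cents; lower is better
    latency_s: float      # latency in seconds; lower is better
    fail_rate_pct: float  # fail rate in percent; lower is better
    iterations: int       # number of evaluations for this model


def rank_models(rows: list[ModelResult]) -> list[ModelResult]:
    """Order rows by AutoBench score, best first.

    Ties are broken by lower cost, then lower latency
    (an illustrative choice, not documented by AutoBench).
    """
    return sorted(rows, key=lambda r: (-r.score, r.cost_cents, r.latency_s))


if __name__ == "__main__":
    # Hypothetical rows, for illustration only.
    rows = [
        ModelResult("model-a", 4.20, 3.10, 25.0, 0.5, 200),
        ModelResult("model-b", 4.55, 9.80, 60.0, 2.0, 200),
        ModelResult("model-c", 4.20, 1.25, 30.0, 0.0, 200),
    ]
    for rank, r in enumerate(rank_models(rows), start=1):
        print(rank, r.name, r.score, r.cost_cents)
```

In this toy data, the two models tied on score are separated by cost. The cheaper `model-c` is placed ahead of `model-a`.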

Benchmark correlations: AutoBench shows a 77.16% correlation with LMArena, 87.08% with the Artificial Analysis Intelligence Index, and 80.68% with MMLU.
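The page does not say which correlation coefficient produces these percentages. A common choice is the Pearson coefficient, computed over the models that both benchmarks score and expressed as a percentage. The sketch below assumes that approach. The `pearson` helper and the score lists are hypothetical, and both lists must cover the same models in the same order.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two paired lists of benchmark scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from each list's mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical scores for the same four models on two benchmarks.
    autobench = [4.85, 4.60, 4.31, 4.02]
    other = [72.0, 70.5, 61.0, 58.0]
    print(f"{pearson(autobench, other) * 100:.2f}%")
```

A rank-based coefficient such as Spearman's would be an equally plausible choice for leaderboard comparisons. That variant applies the same formula to the models' ranks instead of their raw scores.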

Overall Rankings

Llama-3.3-nemotron-super-49b-v1.5
4.849
5.43
140.66
347.66
1.02%
195