AI Agent Evaluations

Performance of AI coding agents on Nuxt code generation tasks, measured by success rate and average execution time.
Last run date: June 3, 2026

Agent Performance Results

Model                   Agent         Avg duration (s)   Success rate
Claude Opus 4.6         Claude Code   244.59             96%
Claude Opus 4.8         Claude Code   244.66             92%
Claude Sonnet 4.6       Claude Code   238.41             92%
Claude Opus 4.7         Claude Code   190.49             88%
GPT 5.3 Codex (xhigh)   Codex         246.03             88%
Gemini 3.1 Pro Preview  Gemini CLI    345.51             88%
Cursor Composer 2.5     Cursor        253.29             80%
Cursor Composer 2.0     Cursor        221.47             80%
Cursor Composer 1.5     Cursor        201.57             80%
Gemini 3 Pro Preview    Gemini CLI    365.91             76%
GPT 5.4 (xhigh)         Codex         313.65             72%
MiniMax M2.7            OpenCode      259.47             68%
Claude Sonnet 4.5       Claude Code   225.92             60%
Devstral 2              OpenCode      213.26             36%