AI Agent Evaluations

Performance results of AI coding agents on Nuxt code generation tasks, measuring success rate and execution time.
Last run date: February 24, 2026

Agent Performance Results

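| Model                 | Agent       | Total Evals | Success Rate |
|-----------------------|-------------|-------------|--------------|
| Claude Opus 4.6       | Claude Code | 25          | 96%          |
| Claude Sonnet 4.6     | Claude Code | 25          | 92%          |
| GPT 5.3 Codex (xhigh) | Codex       | 25          | 88%          |
| Cursor Composer 1.5   | Cursor      | 25          | 84%          |
| Gemini 3 Pro Preview  | OpenCode    | 25          | 80%          |
| Gemini 3 Pro Preview  | Gemini CLI  | 25          | 80%          |
| Claude Sonnet 4.5     | Claude Code | 25          | 64%          |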
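
Because every agent ran 25 evals, each eval accounts for 4 percentage points of the success rate. The sketch below converts the published rates back into pass counts. It assumes that success rate means passed evals divided by total evals, which the page does not state explicitly; the pass counts are therefore inferred, not reported, and the helper name `passed_evals` is hypothetical.

```python
# Assumption: success rate = passed evals / total evals.
# The pass counts printed below are inferred from the table, not reported by it.

def passed_evals(success_rate_percent: int, total_evals: int) -> int:
    """Return the pass count implied by a whole-number success rate."""
    passed, remainder = divmod(success_rate_percent * total_evals, 100)
    # Reject rates that cannot be produced by this number of evals.
    if remainder:
        raise ValueError("rate is not an exact fraction of total_evals")
    return passed


# (model, agent, total evals, success rate in percent), copied from the results.
RESULTS = [
    ("Claude Opus 4.6", "Claude Code", 25, 96),
    ("Claude Sonnet 4.6", "Claude Code", 25, 92),
    ("GPT 5.3 Codex (xhigh)", "Codex", 25, 88),
    ("Cursor Composer 1.5", "Cursor", 25, 84),
    ("Gemini 3 Pro Preview", "OpenCode", 25, 80),
    ("Gemini 3 Pro Preview", "Gemini CLI", 25, 80),
    ("Claude Sonnet 4.5", "Claude Code", 25, 64),
]

for model, agent, total, rate in RESULTS:
    print(f"{model} ({agent}): {passed_evals(rate, total)}/{total} passed")
```

Under that assumption, the gap between the top two rows (96% and 92%) is a single eval, so small differences between adjacent rows should be read with that granularity in mind.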