AI Agent Evaluations
Performance results of AI coding agents on Nuxt code generation tasks, measuring success rate and execution time.
Agent Performance Results
| Model | Agent | Total Evals | Success Rate |
|---|---|---|---|
| Claude Opus 4.6 | Claude Code | 25 | 96% |
| Claude Sonnet 4.6 | Claude Code | 25 | 92% |
| GPT 5.3 Codex (xhigh) | Codex | 25 | 88% |
| Cursor Composer 1.5 | Cursor | 25 | 84% |
| Gemini 3 Pro Preview | OpenCode | 25 | 80% |
| Gemini 3 Pro Preview | Gemini CLI | 25 | 80% |
| Claude Sonnet 4.5 | Claude Code | 25 | 64% |
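
With 25 evals per configuration, each eval accounts for 4 percentage points, so the reported rates correspond to whole numbers of passing evals. The sketch below recovers those counts from the table. The `EvalResult` type and `impliedPasses` helper are hypothetical names introduced for illustration only; they are not part of any evaluation harness, and the sketch assumes the success rate is simply passes divided by total evals.

```typescript
// Hypothetical shape for one row of the results table (illustrative only).
interface EvalResult {
  model: string;
  agent: string;
  totalEvals: number;
  successRatePct: number; // success rate as reported, in percent
}

// Assumption: success rate = passing evals / total evals.
// Rounding guards against floating-point noise in the multiplication.
function impliedPasses(r: EvalResult): number {
  return Math.round((r.successRatePct / 100) * r.totalEvals);
}

// Rows copied from the table above.
const results: EvalResult[] = [
  { model: "Claude Opus 4.6", agent: "Claude Code", totalEvals: 25, successRatePct: 96 },
  { model: "Claude Sonnet 4.6", agent: "Claude Code", totalEvals: 25, successRatePct: 92 },
  { model: "GPT 5.3 Codex (xhigh)", agent: "Codex", totalEvals: 25, successRatePct: 88 },
  { model: "Cursor Composer 1.5", agent: "Cursor", totalEvals: 25, successRatePct: 84 },
  { model: "Gemini 3 Pro Preview", agent: "OpenCode", totalEvals: 25, successRatePct: 80 },
  { model: "Gemini 3 Pro Preview", agent: "Gemini CLI", totalEvals: 25, successRatePct: 80 },
  { model: "Claude Sonnet 4.5", agent: "Claude Code", totalEvals: 25, successRatePct: 64 },
];

for (const r of results) {
  console.log(`${r.model} (${r.agent}): ${impliedPasses(r)}/${r.totalEvals} passed`);
}
```

Under that assumption, the top result (96%) corresponds to 24 of 25 evals passing, and the lowest (64%) to 16 of 25.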