AI Agent Evaluations
Performance results of AI coding agents on Nuxt code generation tasks, measuring success rate and execution time.
Agent Performance Results
| Model | Agent | Avg Duration | Success Rate |
|---|---|---|---|
| Claude Opus 4.6 | Claude Code | 244.59s | 96% |
| Claude Opus 4.8 | Claude Code | 244.66s | 92% |
| Claude Sonnet 4.6 | Claude Code | 238.41s | 92% |
| Claude Opus 4.7 | Claude Code | 190.49s | 88% |
| GPT 5.3 Codex (xhigh) | Codex | 246.03s | 88% |
| Gemini 3.1 Pro Preview | Gemini CLI | 345.51s | 88% |
| Cursor Composer 2.5 | Cursor | 253.29s | 80% |
| Cursor Composer 2.0 | Cursor | 221.47s | 80% |
| Cursor Composer 1.5 | Cursor | 201.57s | 80% |
| Gemini 3 Pro Preview | Gemini CLI | 365.91s | 76% |
| GPT 5.4 (xhigh) | Codex | 313.65s | 72% |
| MiniMax M2.7 | OpenCode | 259.47s | 68% |
| Claude Sonnet 4.5 | Claude Code | 225.92s | 60% |
| Devstral 2 | OpenCode | 213.26s | 36% |
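
The page does not say how the two metric columns are computed. As a rough illustration only, the sketch below assumes each agent run records a pass/fail flag and a wall-clock duration, and that both metrics are averaged over all runs, including failed ones. The `RunResult` type, the `summarize` helper, and the sample runs are hypothetical and are not taken from the evaluation harness.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    task: str          # hypothetical task identifier
    passed: bool       # whether the generated code passed the task's checks
    duration_s: float  # wall-clock time for the run, in seconds


def summarize(runs: list[RunResult]) -> tuple[float, int]:
    """Return (average duration in seconds, success rate in percent).

    Assumption: both metrics are taken over every run, failed or not.
    """
    if not runs:
        raise ValueError("no runs to summarize")
    avg_duration = sum(r.duration_s for r in runs) / len(runs)
    success_rate = 100 * sum(r.passed for r in runs) / len(runs)
    return round(avg_duration, 2), round(success_rate)


# Made-up runs for illustration only; not data from the table above.
runs = [
    RunResult("task-a", True, 200.0),
    RunResult("task-b", True, 250.0),
    RunResult("task-c", False, 300.0),
    RunResult("task-d", True, 210.0),
]

avg, rate = summarize(runs)
print(f"{avg:.2f}s | {rate}%")
```

If the harness instead averages duration over successful runs only, the `avg_duration` line would need to filter on `r.passed` first.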