Getting Started
Learn how to use BenchmarkMD to test AI agents.
Quick Start
- Go to the Benchmark Tool
- Select an AI agent
- Enter your task
- Click "Run Benchmark"
Understanding Results
Quality Score (0-100)
Measures code quality based on:
- Correctness
- Code style
- Security
- Best practices
Estimated Cost
Calculated from the selected agent's API pricing and covers both input and output tokens.
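As a rough illustration of how an estimate like this can be built, here is a minimal sketch. It assumes prices are quoted in USD per million tokens, with separate input and output rates. The function name `estimate_cost` and the prices in the usage line are hypothetical and are not BenchmarkMD's actual rates or implementation.

```python
def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
) -> float:
    """Hypothetical estimate: each token count times its per-million-token USD price."""
    input_cost = input_tokens / 1_000_000 * input_price_per_mtok
    output_cost = output_tokens / 1_000_000 * output_price_per_mtok
    # The estimate includes both input and output tokens.
    return input_cost + output_cost


# Hypothetical prices: $3.00 per million input tokens, $15.00 per million output tokens.
cost = estimate_cost(20_000, 5_000, 3.00, 15.00)
print(f"${cost:.3f}")
```

Output tokens are priced separately here because many providers charge different rates for input and output.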
Execution Time
How long the agent took to complete the task.
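If execution time is taken to mean wall-clock elapsed time, a measurement could be sketched as follows. This is an assumption: the page does not say how BenchmarkMD times runs. The helper `timed_run` is hypothetical, and the lambda passed to it stands in for an agent run. It uses Python's monotonic `time.perf_counter`, so the result is not affected by system clock adjustments.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def timed_run(task: Callable[[], T]) -> Tuple[T, float]:
    """Run a task and return (result, elapsed seconds) measured with a monotonic clock."""
    start = time.perf_counter()
    result = task()
    elapsed = time.perf_counter() - start
    return result, elapsed


# The lambda is a hypothetical stand-in for an agent completing a task.
result, seconds = timed_run(lambda: sum(range(1_000)))
print(result, f"{seconds:.6f}s")
```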
Pricing
| Plan | Price | Features |
|---|---|---|
| Free | $0 | 3 benchmarks/day |
| Pro | $29/mo | Unlimited benchmarks + API access |
| Enterprise | Custom | Custom audits + support |
FAQ
How accurate are the benchmarks?
Our benchmarks use standardized tasks to ensure a fair comparison across agents.
Can I benchmark my own tasks?
Yes! Enter any task in the text field. The more specific your task, the better the results.
Which agents can I test?
Currently supported: Claude Code, Cursor, GitHub Copilot, Devin, and Bolt.new.