Getting Started

Learn how to use BenchmarkMD to test AI agents.

Quick Start

  1. Go to the Benchmark Tool
  2. Select an AI agent
  3. Enter your task
  4. Click "Run Benchmark"

Understanding Results

Quality Score (0-100)

Measures code quality based on:

  • Correctness
  • Code style
  • Security
  • Best practices
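
As an illustration only, the sketch below shows one way four criterion scores could be combined into a single 0-100 value. The equal weighting and the function name `quality_score` are assumptions made for this example. The actual formula BenchmarkMD uses is not documented here.

```python
def quality_score(correctness: float, code_style: float,
                  security: float, best_practices: float) -> float:
    """Combine four 0-100 criterion scores into one 0-100 score.

    Assumption: every criterion carries equal weight. The real
    weighting used by the tool may differ.
    """
    parts = {
        "correctness": correctness,
        "code_style": code_style,
        "security": security,
        "best_practices": best_practices,
    }
    for name, value in parts.items():
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be between 0 and 100, got {value}")
    # Plain arithmetic mean of the four criteria.
    return sum(parts.values()) / len(parts)


print(quality_score(90, 80, 70, 60))  # prints 75.0
```

Under this assumption, a low score on any single criterion, such as security, lowers the overall score by a quarter of the shortfall.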

Estimated Cost

Estimated from the selected agent's API pricing, counting both input and output tokens.
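
A minimal sketch of how a token-based cost estimate can be computed, assuming pricing is quoted per million tokens with separate input and output rates. The rates, the `TokenPricing` class, and the `estimate_cost` function are hypothetical placeholders, not actual pricing for any agent or part of BenchmarkMD.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPricing:
    """Hypothetical per-million-token prices in USD (not real rates)."""
    input_per_million: float
    output_per_million: float


def estimate_cost(pricing: TokenPricing, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one benchmark run."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = input_tokens / 1_000_000 * pricing.input_per_million
    output_cost = output_tokens / 1_000_000 * pricing.output_per_million
    return input_cost + output_cost


# Placeholder rates chosen only for the arithmetic below.
example_pricing = TokenPricing(input_per_million=3.0, output_per_million=15.0)
cost = estimate_cost(example_pricing, input_tokens=12_000, output_tokens=4_000)
print(f"${cost:.4f}")  # prints $0.0960
```

With these placeholder rates, output tokens account for most of the cost even though there are fewer of them, which is why long generated answers tend to raise the estimate more than long prompts do.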

Execution Time

How long the agent took to complete the task.

Pricing

  Plan        Price     Features
  Free        $0        3 benchmarks/day
  Pro         $29/mo    Unlimited benchmarks + API access
  Enterprise  Custom    Custom audits + support

FAQ

How accurate are the benchmarks?

Our benchmarks use standardized tasks to ensure fair comparison across agents.

Can I benchmark my own tasks?

Yes! Enter any task in the text field. The more specific the task, the better the results.

Which agents can I test?

Currently supported: Claude Code, Cursor, GitHub Copilot, Devin, and Bolt.new.