API Reference

Python tools for running your own benchmarks against AI coding agents.

Installation

pip install benchmarkmd

Benchmark Class

from benchmarkmd import Benchmark

# Initialize
benchmark = Benchmark(agent="claude-code")

# Run a task
results = benchmark.run(task="Create a Python web app")

# Get metrics
print(results.quality_score)  # 0-100
print(results.cost)           # in USD
print(results.time_seconds)   # execution time
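
The three metrics above can be combined into a single value-for-money figure. The following is a minimal sketch and not part of benchmarkmd: `StandInResult` is a hypothetical stand-in that only mirrors the three attribute names shown above, and `quality_per_dollar` is an assumed helper name.

```python
from dataclasses import dataclass


@dataclass
class StandInResult:
    """Hypothetical stand-in mirroring the attributes shown above."""
    quality_score: float  # 0-100
    cost: float           # in USD
    time_seconds: float   # execution time


def quality_per_dollar(result) -> float:
    """Return quality points per USD; infinite when the run cost nothing."""
    if result.cost <= 0:
        return float("inf")
    return result.quality_score / result.cost


# Made-up numbers for illustration only, not real benchmark output.
sample = StandInResult(quality_score=80.0, cost=0.5, time_seconds=42.0)
print(quality_per_dollar(sample))
```

Any object exposing `quality_score` and `cost` can be passed to the helper, so it should also work with the object returned by `benchmark.run(...)` if that object has the attributes listed above.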

Supported Agents

  • claude-code - Anthropic Claude Code
  • cursor - Cursor IDE
  • devin - Cognition Devin
  • copilot - GitHub Copilot
  • bolt - Bolt.new

Methods

Method               Description
run(task)            Execute a benchmark task
measure_cost()       Calculate API costs
analyze_quality()    Score output quality
compare(agents)      Compare multiple agents

Configuration

benchmark = Benchmark(
    agent="claude-code",
    model="claude-3-5-sonnet-20241022",
    max_tokens=4000,
    temperature=0.7,
    timeout=300
)

Environment Variables

BENCHMARKMD_API_KEY=your_api_key
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
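
Before starting a run, it can help to confirm that these variables are set. The sketch below uses only the Python standard library. `missing_keys` is a hypothetical helper, and treating all three keys as required for every agent is an assumption of this example, not something the library documents.

```python
import os

# Names taken from the list above. Whether every key is needed for every
# agent is an assumption of this sketch.
REQUIRED_KEYS = ("BENCHMARKMD_API_KEY", "ANTHROPIC_API_KEY", "OPENAI_API_KEY")


def missing_keys(env=None, required=REQUIRED_KEYS):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


missing = missing_keys()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
```

Passing a plain dictionary as `env` makes the check easy to exercise without touching the real environment.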

Example: Compare Agents

from benchmarkmd import compare

results = compare(
    agents=["claude-code", "cursor", "copilot"],
    task="Build a React todo app"
)

for agent, result in results.items():
    print(f"{agent}: {result.quality_score}/100")
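
The loop above suggests that `compare` returns a mapping from agent name to a result object. Under that assumption, the results can be ranked from best to worst. This is a minimal sketch: `rank_by_quality` is a hypothetical helper, and the placeholder agents and scores are invented purely to make the example runnable.

```python
from types import SimpleNamespace


def rank_by_quality(results):
    """Return (agent, result) pairs sorted by quality_score, highest first."""
    return sorted(
        results.items(),
        key=lambda item: item[1].quality_score,
        reverse=True,
    )


# Placeholder names and scores for illustration only, not real benchmark data.
fake_results = {
    "agent-a": SimpleNamespace(quality_score=61),
    "agent-b": SimpleNamespace(quality_score=88),
    "agent-c": SimpleNamespace(quality_score=74),
}

for agent, result in rank_by_quality(fake_results):
    print(f"{agent}: {result.quality_score}/100")
```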