API Reference - BenchmarkMD

Installation

pip install benchmarkmd

Benchmark Class

from benchmarkmd import Benchmark

# Initialize
benchmark = Benchmark(agent="claude-code")

# Run a task
results = benchmark.run(task="Create a Python web app")

# Get metrics
print(results.quality_score)  # quality score, 0-100
print(results.cost)           # cost in USD
print(results.time_seconds)   # execution time in seconds

Supported Agents

claude-code - Anthropic Claude Code
cursor - Cursor IDE
devin - Cognition Devin
copilot - GitHub Copilot
bolt - Bolt.new

Methods

| Method | Description |
| --- | --- |
| `run(task)` | Execute a benchmark task |
| `measure_cost()` | Calculate API costs |
| `analyze_quality()` | Score output quality |
| `compare(agents)` | Compare multiple agents |

Configuration

benchmark = Benchmark(
    agent="claude-code",
    model="claude-3-5-sonnet-20241022",
    max_tokens=4000,
    temperature=0.7,
    timeout=300
)
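
The reference does not state the valid ranges for these parameters, so it may help to check them before constructing a Benchmark. The following is a minimal sketch: `BenchmarkConfig` is a hypothetical helper, not part of benchmarkmd, and the bounds it enforces (positive `max_tokens`, `temperature` between 0.0 and 1.0, `timeout` in seconds) are assumptions.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class BenchmarkConfig:
    """Hypothetical container for the keyword arguments shown above."""
    agent: str
    model: str
    max_tokens: int = 4000
    temperature: float = 0.7
    timeout: int = 300  # assumed to be seconds

    def __post_init__(self):
        # The bounds below are assumptions, not documented limits.
        if self.max_tokens <= 0:
            raise ValueError("max_tokens must be positive")
        if not 0.0 <= self.temperature <= 1.0:
            raise ValueError("temperature must be between 0.0 and 1.0")
        if self.timeout <= 0:
            raise ValueError("timeout must be positive")


config = BenchmarkConfig(agent="claude-code", model="claude-3-5-sonnet-20241022")
kwargs = asdict(config)  # plain dict of the five settings
print(kwargs)
```

If the constructor accepts exactly these keyword arguments, as the example above suggests, the resulting dict could be passed as `Benchmark(**kwargs)`.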

Environment Variables

BENCHMARKMD_API_KEY=your_api_key
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
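
A missing key is easier to diagnose before a run than during one. The sketch below uses only the standard library to report which of the variables listed above are unset. `missing_env_vars` is a hypothetical helper, and treating all three variables as required is an assumption, since a given agent may need only some of them.

```python
import os

# Names come from the list above; treating all of them as required
# is an assumption of this sketch.
REQUIRED_VARS = ("BENCHMARKMD_API_KEY", "ANTHROPIC_API_KEY", "OPENAI_API_KEY")


def missing_env_vars(required=REQUIRED_VARS, environ=None):
    """Return the names in `required` that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("All expected environment variables are set.")
```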

Example: Compare Agents

from benchmarkmd import compare

results = compare(
    agents=["claude-code", "cursor", "copilot"],
    task="Build a React todo app"
)

for agent, result in results.items():
    print(f"{agent}: {result.quality_score}/100")
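
Once results are collected, ranking them is ordinary dictionary sorting. The sketch below assumes `compare` returns a mapping from agent name to a result object carrying the `quality_score`, `cost`, and `time_seconds` attributes shown earlier. `StubResult`, `rank_by_quality`, the agent names, and all numbers are placeholders for illustration, not real benchmark output.

```python
from dataclasses import dataclass


@dataclass
class StubResult:
    """Hypothetical stand-in for a result object returned by compare()."""
    quality_score: float  # 0-100
    cost: float           # USD
    time_seconds: float   # execution time in seconds


def rank_by_quality(results):
    """Return (agent, result) pairs sorted by quality score, highest first."""
    return sorted(
        results.items(),
        key=lambda item: item[1].quality_score,
        reverse=True,
    )


# Placeholder values only; these are not measured results.
sample = {
    "agent-a": StubResult(quality_score=80.0, cost=0.50, time_seconds=120.0),
    "agent-b": StubResult(quality_score=90.0, cost=0.75, time_seconds=150.0),
    "agent-c": StubResult(quality_score=70.0, cost=0.25, time_seconds=90.0),
}

for agent, result in rank_by_quality(sample):
    print(f"{agent}: {result.quality_score}/100, ${result.cost:.2f}, {result.time_seconds:.0f}s")
```

Sorting on a different attribute, such as `cost`, only requires changing the key function.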