API Reference
Python tools for running your own benchmarks.
Installation
pip install benchmarkmd
Benchmark Class
from benchmarkmd import Benchmark
# Initialize
benchmark = Benchmark(agent="claude-code")
# Run a task
results = benchmark.run(task="Create a Python web app")
# Get metrics
print(results.quality_score) # 0-100
print(results.cost) # in USD
print(results.time_seconds) # execution time
Supported Agents
claude-code - Anthropic Claude Code
cursor - Cursor IDE
devin - Cognition Devin
copilot - GitHub Copilot
bolt - Bolt.new
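One way to catch a mistyped agent identifier before constructing a `Benchmark` is to check it against the list above. This is a minimal sketch, assuming the five identifiers listed are the complete set; `SUPPORTED_AGENTS` and `validate_agent` are hypothetical helpers and not part of benchmarkmd.

```python
# Hypothetical helper, not part of benchmarkmd: the mapping mirrors the
# list of supported agents above.
SUPPORTED_AGENTS = {
    "claude-code": "Anthropic Claude Code",
    "cursor": "Cursor IDE",
    "devin": "Cognition Devin",
    "copilot": "GitHub Copilot",
    "bolt": "Bolt.new",
}


def validate_agent(agent: str) -> str:
    """Return the normalized agent identifier, or raise ValueError."""
    key = agent.strip().lower()
    if key not in SUPPORTED_AGENTS:
        valid = ", ".join(sorted(SUPPORTED_AGENTS))
        raise ValueError(f"Unknown agent {agent!r}; expected one of: {valid}")
    return key
```

The returned string could then be passed as the `agent` argument, for example `Benchmark(agent=validate_agent("cursor"))`.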
Methods
| Method | Description |
|---|---|
| run(task) | Execute a benchmark task |
| measure_cost() | Calculate API costs |
| analyze_quality() | Score output quality |
| compare(agents) | Compare multiple agents |
Configuration
benchmark = Benchmark(
    agent="claude-code",
    model="claude-3-5-sonnet-20241022",
    max_tokens=4000,
    temperature=0.7,
    timeout=300,
)
Environment Variables
BENCHMARKMD_API_KEY=your_api_key
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
Example: Compare Agents
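Before running a comparison across several agents, it may help to confirm that the API keys listed under Environment Variables are set. This is a minimal sketch using only the standard library; `missing_env_vars` is a hypothetical helper, not part of benchmarkmd, and treating all three variables as required is an assumption.

```python
import os

# Variable names taken from the Environment Variables list; which ones are
# actually required depends on the agents being benchmarked (an assumption).
REQUIRED_VARS = ("BENCHMARKMD_API_KEY", "ANTHROPIC_API_KEY", "OPENAI_API_KEY")


def missing_env_vars(names=REQUIRED_VARS, environ=None):
    """Return the names that are unset or empty in the environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]
```

Calling `missing_env_vars()` with no arguments inspects the current process environment and returns an empty list when every variable is present.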
from benchmarkmd import compare
results = compare(
    agents=["claude-code", "cursor", "copilot"],
    task="Build a React todo app",
)
for agent, result in results.items():
    print(f"{agent}: {result.quality_score}/100")
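Since the loop above treats the return value of `compare` as a mapping from agent name to result, the results can also be ranked instead of printed in arbitrary order. The sketch below assumes only the `quality_score`, `cost`, and `time_seconds` attributes shown earlier. `StubResult` and `rank_by_quality` are hypothetical names, and the numbers are placeholders, not measured results.

```python
from dataclasses import dataclass


@dataclass
class StubResult:
    """Hypothetical stand-in exposing the three metrics shown earlier."""
    quality_score: float  # 0-100
    cost: float           # USD
    time_seconds: float


def rank_by_quality(results):
    """Sort (agent, result) pairs by quality descending, then cost ascending."""
    return sorted(
        results.items(),
        key=lambda item: (-item[1].quality_score, item[1].cost),
    )


# Placeholder agents and numbers for illustration only.
sample = {
    "agent-a": StubResult(80.0, 0.50, 120.0),
    "agent-b": StubResult(80.0, 0.30, 95.0),
    "agent-c": StubResult(65.0, 0.10, 60.0),
}

for agent, result in rank_by_quality(sample):
    print(f"{agent}: {result.quality_score}/100, ${result.cost:.2f}")
```

Breaking ties on cost is a design choice for this sketch; sorting on `time_seconds` instead would only require changing the key function.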