Give Your AI Coding Tools Access to Your Test Results

By Alex Gandy · February 11, 2026

AI coding tools can write code and run tests. But when tests fail, they’re working from raw terminal output — no history, no patterns, no context about whether a failure is new or has been happening for weeks.

They can’t tell you that a test has been flipping between pass and fail for the last 30 runs, or that 12 failures in your latest CI run all share the same root cause, or that the file you just changed has 0% test coverage.

That data exists. It’s just not connected to the tools writing your code.

MCP: Structured Data for AI Tools

MCP (Model Context Protocol) lets AI tools call external services through a standard interface. Gaffer’s MCP server exposes your test data as 16 read-only tools that any MCP-compatible editor can call.

The tools break down into five categories:

Test results: list_test_runs, get_test_run_details, get_report, get_report_browser_url

List recent runs, get individual test results with error messages, and retrieve report URLs.

Failure analysis: get_failure_clusters, get_test_history, compare_test_metrics

Group failures by root cause, check if a test has been failing before your changes, compare metrics between commits.

Flaky detection: get_flaky_tests, get_slowest_tests

Find tests with high flip rates (pass/fail oscillation) and tests that are getting slower over time.

Coverage: get_coverage_summary, get_coverage_for_file, get_untested_files, find_uncovered_failure_areas

Overall coverage metrics, per-file coverage, files below a threshold, and files that are both poorly covered and failing.

Project & status: list_projects, get_project_health, get_upload_status

List projects, get health scores, and check whether CI results have finished processing.
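Under the hood, an MCP client invokes each of these through the protocol's standard tools/call method. A minimal sketch of what that JSON-RPC payload looks like for list_test_runs (the limit argument is hypothetical, for illustration only, not confirmed from Gaffer's docs):

```python
import json

# A JSON-RPC 2.0 request using MCP's standard "tools/call" method.
# The tool name comes from the list above; the "limit" argument is
# a hypothetical parameter shown for illustration.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "list_test_runs",
        "arguments": {"limit": 10},
    },
}

print(json.dumps(request, indent=2))
```

Your editor builds and sends these requests for you; the point is that each tool is just a named, typed call that any MCP client can make.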

Setup

For Claude Code:

claude mcp add gaffer -e GAFFER_API_KEY=gaf_your_api_key -- npx -y @gaffer-sh/mcp

For Cursor or Windsurf, add to your MCP config:

{
  "mcpServers": {
    "gaffer": {
      "command": "npx",
      "args": ["-y", "@gaffer-sh/mcp"],
      "env": {
        "GAFFER_API_KEY": "gaf_your_api_key"
      }
    }
  }
}

You’ll need a Gaffer account and test results uploaded from CI. The upload setup takes one CI step.

Three Things It’s Good At

“Why did this test start failing?”

When a test fails and you’re not sure if your changes caused it, get_test_history returns the pass/fail record across recent runs:

// get_test_history
{
  "history": [
    { "status": "failed", "commitSha": "a1b2c3" },
    { "status": "failed", "commitSha": "d4e5f6" },
    { "status": "passed", "commitSha": "g7h8i9" },
    { "status": "passed", "commitSha": "j0k1l2" }
  ],
  "summary": { "passRate": 50.0, "totalRuns": 4 }
}

Two recent failures, two prior passes. The regression likely started at commit d4e5f6, not your change. Without this context, most AI tools would try to “fix” the test based on the error message alone.
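That reasoning is mechanical enough to script. A hedged sketch, assuming the history shape from the example above (newest entries first), that checks whether the current failing streak began before your commit:

```python
def failure_predates(history, your_commit):
    """Return True if the current failing streak began before your_commit,
    i.e. the regression is likely not yours. Assumes entries are ordered
    newest-first with the JSON shape from the get_test_history example."""
    streak = []
    for entry in history:
        if entry["status"] != "failed":
            break
        streak.append(entry["commitSha"])
    # The oldest commit in the streak is where the regression began.
    return bool(streak) and streak[-1] != your_commit

history = [
    {"status": "failed", "commitSha": "a1b2c3"},
    {"status": "failed", "commitSha": "d4e5f6"},
    {"status": "passed", "commitSha": "g7h8i9"},
    {"status": "passed", "commitSha": "j0k1l2"},
]

# If your change is a1b2c3, the streak started earlier, at d4e5f6.
print(failure_predates(history, "a1b2c3"))  # True
```

An AI tool reading this history can skip the "patch the assertion" reflex and go look at what changed in the earlier commit instead.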

“Are these failures related?”

A CI run with 15 failures looks bad. But get_failure_clusters groups them by error similarity:

// get_failure_clusters
{
  "clusters": [
    {
      "representativeError": "Connection refused: localhost:5432",
      "count": 11
    },
    {
      "representativeError": "Expected 200, received 403",
      "count": 4
    }
  ],
  "totalFailures": 15
}

Two root causes, not fifteen. The database isn’t running and there’s a permissions bug. An AI tool with this data fixes two problems instead of flailing at fifteen.
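Gaffer's actual clustering algorithm isn't documented here, but the idea can be illustrated with a naive sketch: normalize away volatile details (ports, counts, IDs) and group failures by the resulting message skeleton.

```python
import re
from collections import defaultdict

def cluster_failures(errors):
    """Group error messages by a normalized skeleton: runs of digits are
    collapsed so 'localhost:5432' and 'localhost:5433' land in the same
    cluster. A naive stand-in for real error-similarity clustering."""
    clusters = defaultdict(list)
    for message in errors:
        skeleton = re.sub(r"\d+", "N", message)
        clusters[skeleton].append(message)
    return clusters

errors = [
    "Connection refused: localhost:5432",
    "Connection refused: localhost:5432",
    "Expected 200, received 403",
]
clusters = cluster_failures(errors)
print(len(clusters))  # 2 distinct root causes
```

Real implementations typically use fuzzier similarity than digit-collapsing, but the payoff is the same: a handful of root causes instead of a wall of failures.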

“What’s not covered?”

After writing a feature, find_uncovered_failure_areas identifies files that are both poorly tested and actively failing — the highest-risk targets:

// find_uncovered_failure_areas
{
  "riskAreas": [
    {
      "filePath": "server/services/billing.ts",
      "coverage": 12,
      "failureCount": 8,
      "riskScore": 704
    },
    {
      "filePath": "server/services/auth.ts",
      "coverage": 45,
      "failureCount": 3,
      "riskScore": 165
    }
  ]
}

Coverage percentage alone doesn’t tell you where to focus. Files with low coverage and frequent failures are where tests add the most value.
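The sample numbers above happen to be consistent with a simple formula, uncovered percentage times failure count, though that exact formula is an inference from the example output, not something Gaffer documents:

```python
def risk_score(coverage_pct, failure_count):
    """Risk as (uncovered %) * (failure count). This formula is inferred
    from the sample riskScore values above, not taken from Gaffer's docs."""
    return (100 - coverage_pct) * failure_count

print(risk_score(12, 8))  # 704, matching billing.ts in the example
print(risk_score(45, 3))  # 165, matching auth.ts in the example
```

Whatever the exact weighting, the principle holds: risk compounds when low coverage and active failures land in the same file.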

What It Doesn’t Do

The MCP server is read-only. It can’t modify your test results, trigger CI runs, or access your source code. It works with the structured data your CI pipeline already produces — JUnit XML, coverage reports, Playwright results — and answers questions about that data.

No source code leaves your machine. No secrets are involved beyond the API key. The tools are scoped to test analytics.

Get Started

The MCP server is @gaffer-sh/mcp on npm. Full setup docs are at /docs/mcp/.

Gaffer’s free tier includes test history and analytics with unlimited projects.

Start Free