Health Score Alerts: Know Before Your Test Suite Degrades

Test suites don’t break overnight. They decay. A test starts flaking on Tuesdays. Pass rate drops from 98% to 95% over two weeks. Three new flaky tests appear across unrelated PRs. No single build triggers alarm bells, but collectively your suite is getting worse. This is the problem health score alerts are designed to catch.

The problem with per-build alerts

Most CI notification setups work like this: a build fails, Slack gets a message, someone investigates. This is reactive monitoring. It answers the question “did this specific build pass?”

That’s fine for catching regressions introduced by a single commit. It’s useless for catching slow decay.

Consider a suite with 200 tests. Pass rate was 99% a month ago. Today it’s 93%. That happened across dozens of builds on multiple branches. No individual failure looked unusual. Each build had one or two failures — below the threshold most teams set for investigation.

Per-build alerts didn’t fire because no single build was bad enough. The suite degraded anyway.

Health score alerts

Gaffer computes a health score for each organization every 4 hours. The formula weighs three factors:

Pass rate (60% weight) — the most direct measure of suite quality
Flaky test percentage (30% weight, inverse) — flaky tests erode trust even when builds pass
Trend direction (10% weight) — whether things are getting better or worse
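
As a rough illustration of how the three weights could combine, here is a minimal sketch. It is not Gaffer's actual formula: the normalization of each factor to a 0-100 scale, the `TREND_POINTS` mapping, and the function name `health_score` are all assumptions made for the example.

```python
# Assumed mapping from trend direction to a 0-100 component (hypothetical values).
TREND_POINTS = {"up": 100.0, "flat": 50.0, "down": 0.0}


def health_score(pass_rate: float, flaky_pct: float, trend: str) -> int:
    """Blend the three factors with 60/30/10 weights into a 0-100 score.

    pass_rate and flaky_pct are percentages in [0, 100]; trend is one of
    the keys in TREND_POINTS.
    """
    if not 0 <= pass_rate <= 100 or not 0 <= flaky_pct <= 100:
        raise ValueError("percentages must be between 0 and 100")
    if trend not in TREND_POINTS:
        raise ValueError(f"unknown trend: {trend!r}")

    # The flaky percentage counts inversely: fewer flaky tests, higher component.
    flaky_component = 100.0 - flaky_pct

    score = 0.6 * pass_rate + 0.3 * flaky_component + 0.1 * TREND_POINTS[trend]
    return round(score)
```

Under these assumptions, a suite with a perfect pass rate, no flaky tests, and an upward trend scores 100, and every flaky test or failed run pulls the number down in proportion to its weight.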

The score maps to labels:

Score	Label
90-100	Excellent
75-89	Healthy
50-74	Needs Attention
25-49	At Risk
0-24	Critical

Health score alerts fire when the label crosses a boundary in the degradation direction. If your organization drops from “Healthy” (82) to “Needs Attention” (71), you get an alert. If it goes from “Needs Attention” (71) back to “Healthy” (78), you don’t — that’s an improvement, not a problem.
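
The one-directional rule can be sketched in a few lines. The label bands come from the table above; the function names `label_for` and `is_degradation` are hypothetical and only illustrate the comparison, not Gaffer's implementation.

```python
# Label bands from the table, ordered best to worst: (minimum score, label).
LABEL_BANDS = [
    (90, "Excellent"),
    (75, "Healthy"),
    (50, "Needs Attention"),
    (25, "At Risk"),
    (0, "Critical"),
]


def label_for(score: int) -> str:
    """Return the label whose band contains the score."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for minimum, label in LABEL_BANDS:
        if score >= minimum:
            return label
    raise AssertionError("unreachable: bands cover 0-100")


def is_degradation(previous_score: int, current_score: int) -> bool:
    """True only when the label moved to a worse band."""
    order = [label for _, label in LABEL_BANDS]
    previous_rank = order.index(label_for(previous_score))
    current_rank = order.index(label_for(current_score))
    # A higher index means a worse label; improvements and same-label
    # moves never alert.
    return current_rank > previous_rank
```

With this sketch, `is_degradation(82, 71)` is true, while `is_degradation(71, 78)` is false because the move is an improvement.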

This is the key distinction. Per-build alerts react to individual events. Health score alerts detect trends that span many builds.

New flaky test alerts

The second alert type catches newly flaky tests. At each analytics computation, Gaffer compares the current flaky test list to the previous one. If new tests appear (tests that started flipping between pass and fail since the last check), an alert fires with the specific test names and their flip rates.

This matters because flaky tests are easiest to diagnose when they’re fresh. A test that started flaking three hours ago probably correlates with a recent deployment or dependency change. A test that’s been flaking for two weeks could be anything.

The alert includes up to 10 new flaky tests, sorted by flakiness score, so you’re looking at the worst offenders first.
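
The comparison and the cap of 10 can be pictured as a set difference followed by a sort. This is a sketch under assumptions: the `FlakyTest` record, its field names, and the function `new_flaky_tests` are hypothetical, and how the flakiness score itself is computed is not covered here.

```python
from typing import NamedTuple


class FlakyTest(NamedTuple):
    name: str
    flip_rate: float        # fraction of runs where the outcome flipped
    flakiness_score: float  # hypothetical ranking value, higher is worse


MAX_TESTS_PER_ALERT = 10


def new_flaky_tests(previous_names: set[str], current: list[FlakyTest]) -> list[FlakyTest]:
    """Return tests that are flaky now but were not in the previous list.

    The result is sorted worst-first and capped at MAX_TESTS_PER_ALERT.
    """
    fresh = [test for test in current if test.name not in previous_names]
    fresh.sort(key=lambda test: test.flakiness_score, reverse=True)
    return fresh[:MAX_TESTS_PER_ALERT]
```

An empty result means nothing new started flaking since the last computation, so no alert would be sent.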

Noise reduction

Alerts are only useful if you read them. Two mechanisms keep the volume low.

Label boundaries, not score changes. You won’t get an alert every time the score fluctuates by a point. The score has to cross from one label to another. An organization at 88 dropping to 85 is a 3-point drop, but it’s still “Healthy”, so no alert fires. An organization at 76 dropping to 74 crosses from “Healthy” to “Needs Attention”, so an alert fires.

24-hour cooldowns. Each alert type (health transitions and new flaky tests) has an independent 24-hour cooldown per organization. If your suite is in freefall and crosses two label boundaries within a day, you’ll get one alert for the first transition. The cooldown prevents alert storms during genuinely bad periods, when the last thing you need is more noise.

The cooldown is set before the notification is queued, not after. This is deliberate — if the queue delivery fails, the team misses one alert rather than receiving duplicates when the process retries.
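
The ordering matters more than the storage details, so here is a minimal in-memory sketch of it. The `AlertGate` class, its method names, and the `enqueue` callback are hypothetical stand-ins; a real system would presumably keep the timestamps in a shared store rather than a dictionary.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional

COOLDOWN = timedelta(hours=24)


class AlertGate:
    """Tracks the last alert time per (organization, alert type)."""

    def __init__(self) -> None:
        self._last_sent: dict[tuple[str, str], datetime] = {}

    def try_send(
        self,
        org_id: str,
        alert_type: str,
        enqueue: Callable[[str, str], None],
        now: Optional[datetime] = None,
    ) -> bool:
        """Return True if the alert was queued, False if suppressed or dropped."""
        now = now or datetime.now(timezone.utc)
        key = (org_id, alert_type)
        last = self._last_sent.get(key)
        if last is not None and now - last < COOLDOWN:
            return False  # still cooling down

        # Record the cooldown BEFORE queueing. If enqueue fails, a retry
        # inside the window is suppressed instead of producing a duplicate.
        self._last_sent[key] = now
        try:
            enqueue(org_id, alert_type)
        except Exception:
            return False  # one missed alert, by design
        return True
```

Because the key includes the alert type, a health transition alert does not block a new flaky test alert for the same organization, which matches the independent cooldowns described above.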

Setting it up

Health score alerts go to Slack channels and webhook endpoints. GitHub is not a destination for health alerts; GitHub notifications are scoped to individual test runs and coverage reports.

Slack

Connect Slack from your organization’s notification settings. Select the channels where health alerts should post. The message includes the label transition, the score change, current pass rate, flaky test count, and trend direction — enough context to decide whether to investigate immediately or schedule time for it.

A health label transition message in Slack looks like:

Health Score Alert: acme-corp changed from Healthy to Needs Attention (82 → 71)

Score:      82 → 71
Pass Rate:  93%
Flaky Tests: 7
Trend:      ↓ down

Webhooks

For custom integrations, configure a webhook endpoint. Health alerts send a health_label_transition event; flaky test alerts send a new_flaky_tests event. Both include the full payload — scores, labels, test names, flip rates — so you can route to PagerDuty, Datadog, or whatever your team uses.
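
A receiving service might branch on the event type to decide where each alert goes. The sketch below is illustrative only: the two event names come from the text above, but the payload field names (`event`, `to_label`) and the routing targets are assumptions, so check the actual payload before relying on them.

```python
import json


def route_event(raw_body: str) -> str:
    """Pick a destination for a webhook body.

    Returns one of "page", "chat", "ticket", or "ignore".
    Field names here are hypothetical, not Gaffer's documented schema.
    """
    payload = json.loads(raw_body)
    event = payload.get("event")  # assumed field name

    if event == "health_label_transition":
        # Assumed field: the label the organization moved to.
        severe = payload.get("to_label") in {"At Risk", "Critical"}
        return "page" if severe else "chat"

    if event == "new_flaky_tests":
        return "ticket"

    # Unknown events are ignored rather than treated as errors.
    return "ignore"
```

Returning a destination name keeps the routing decision separate from the delivery code, so the PagerDuty or Datadog call can live behind whichever branch your team prefers.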

What this looks like in practice

Monday: Your organization’s health score is 88 (“Healthy”). A developer merges a PR that adds a flaky test. The next analytics computation detects it and sends a new flaky test alert to Slack: “should render dashboard” is flipping at a 40% flip rate. A teammate sees it, checks the recent commits, finds the culprit, and adds a waitFor call. Fixed before anyone else notices.

Wednesday: Two more flaky tests appear in a different part of the codebase — a timing issue in the integration test suite. New flaky test alert fires. The team creates a ticket, but other priorities win.

Friday: Pass rate has drifted down to 91%. The two unresolved flaky tests, combined with a few legitimate failures on a feature branch, push the health score from 82 to 72. Label crosses from “Healthy” to “Needs Attention.” Health alert fires.

The team now has three pieces of information: the score dropped, the earlier alerts identify which flaky tests contributed, and the trend is pointing down. They can dedicate Monday morning to stabilization instead of discovering the problem two weeks later during a release freeze.

Health score alerts bridge the gap between per-build CI notifications and manually checking test dashboards — automated trend monitoring that flags degradation before it compounds.

Alerts are available on all Gaffer plans. Configure them in your organization’s notification settings. For more on how flaky tests degrade your suite over time, see “How Much Are Flaky Tests Costing You?”