The build went green by giving up
When a retry hides the bug it was meant to catch

Introduction
The pull request had a green check beside it, so it got merged. That is the whole point of a green check: it is permission to stop paying attention. Two days later the bug that check was supposed to catch took down a checkout flow for forty minutes, and someone finally went back to ask why the test had passed.
It had passed on the third attempt.
The suite was set to retry flaky tests up to three times before reporting failure. Reasonable on paper. Tests flake for dull reasons: a slow container, a port that isn't ready, a network hiccup in the runner. A retry smooths that noise so humans can trust the signal. Except this test was not flaking. It was failing correctly, about one run in three, because the bug it covered was itself intermittent. The retry could not tell the two apart. It treated a flaky test and a real-but-occasional bug the same way: run it again, and if green ever shows up, call the whole thing green.
A retry cannot tell a flaky test from an intermittent bug. It forgives both.
Green on the third try
So the runner did exactly what it was told. Red, shrug, again. Red, shrug, again. On the third try it caught the one-in-three pass, reported success, and the board went clean. The bug rode through the merge with a passing grade stapled to it.
The cruel part is that this failure mode gets worse the better it works. Every time a retry rescues a genuinely flaky test, the team trusts it a little more. The retry count creeps up. Three becomes five, because five keeps the board even greener. And a higher retry count is a wider net for precisely the bugs you most want to catch: the intermittent ones, the heisenbugs, the failures that only surface under load. You are quietly tuning the system to hide its most dangerous category of problem.
The net you tuned to catch nothing
The fix is not to rip out retries. Flaky tests are real, and blocking every merge on them is its own misery. The fix is to make the retry visible. Count it. A test that only passes on the second or third try is not a pass, it is a warning, and it belongs somewhere a human looks. This test needed two retries this week. This one needed three and is getting worse. A green check that quietly required three attempts is a different fact than one that passed first try, and most teams render the two identically.
Flakiness is not noise to be silenced. It is the system telling you which parts of itself are unstable, and the retry button is, very politely, dropping that message in the bin. The instinct to make the board green is the right instinct pointed at the wrong target. You wanted confidence. You bought the appearance of it.
There is an honest version of green and a cosmetic one, and from across the room they look the same. The difference is whether anyone is counting what it cost to get there.
TensorLabs has spent more than one engagement deleting retry logic that turned out to be the only thing standing between a team and a bug they already had. Unglamorous work. Usually the cheapest forty minutes of downtime a team ever buys back.
You might also like
Keep reading from the journal.
June 22, 2026AI
Building Durable AI Agents with Cloudflare Project Think
On June 16, 2026, Cloudflare shipped Agents SDK v0.16.1 and with it Project Think, a durable runtime for long-running agents.
June 19, 2026AI
Build a Text-to-Image Generation Service with the Reve 2.0 API
On June 3, 2026, Reve 2.0 jumped to second place on the text-to-image Arena leaderboard. A leaderboard position is a benchmark, not a product.
June 19, 2026AI
When engagement is really confusion
A founder thought nobody used his analytics dashboard enough. The real bug was that it never told anyone what to do. Showing data cleanly used to be the hard part. Now anyone can wire numbers into a model over a weekend. The clean chart is table stakes. The answer is the product.