There's a story you tell yourself when you write tests for code you already trust: this is preventive; the code already works, and the tests just protect it from later regressions. That was true for ninety-six of the 98 unit tests added this week. The other two found bugs that had been in production all along.
The first was rounding. `scripts/capture-npc.mjs` derives a hero NPC's box collider half-extents from its splat-scene AABB, rounded to three decimal places. The test built an AABB whose half-extent landed exactly on 2.5315, a number that, written down, you'd round up to 2.532 by the half-up convention every schoolchild learns. JavaScript doesn't. `(2.5315).toFixed(3)` returns the string `"2.531"`, and wrapping it in `Number(...)` gives the number 2.531. The reason is floating point: 2.5315 stored in a double is actually 2.531499999…, and `toFixed` rounds the stored value to the nearest three-decimal result. Because the stored value sits just below the midpoint, the nearest result is 2.531, not 2.532. Every collider half-extent the script has ever produced from a value whose double sits just under a midpoint like this has been one millimetre smaller than the author thought. The test now pins the actual behaviour, with a guardrail comment, so a future refactor that swaps to a half-up rounder doesn't silently grow every collider by 1 mm and trigger another round of physics tuning.
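The rounding behaviour above can be reproduced in a few lines. This is a minimal sketch: `roundHalfExtent` is a hypothetical name that mirrors the rounding described here, not the actual code in `scripts/capture-npc.mjs`.

```javascript
// Hypothetical helper: mirrors the rounding described above, not the script's real code.
function roundHalfExtent(value) {
  // toFixed rounds the stored double to the nearest 3-decimal string;
  // Number() then parses that string back into a number.
  return Number(value.toFixed(3));
}

// Ask for more digits than the literal has, to see what the double really holds.
const storedDigits = (2.5315).toFixed(20);
console.log(storedDigits); // begins 2.53149..., just under the 2.5315 midpoint

// Pinned behaviour.
// GUARDRAIL: a decimal half-up rounder would return 2.532 here and grow colliders by 1 mm.
const pinned = roundHalfExtent(2.5315);
console.log(pinned); // 2.531
```

Printing the extra digits is the quickest way to check whether any other suspicious value sits above or below its midpoint before deciding what a test should expect.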
The second was the locale. `/compare`'s HUD reads `[80,500 splats / 1,024 KB / 234 ms]`. The format string passes through `Number.prototype.toLocaleString()`, which respects the runtime locale. The first test assertion expected `"1,234,567 splats"`, which is en-US thousands grouping. The browser returned `"12,34,567 splats"`, which is lakh grouping: the Indian numbering system puts separators at the thousand and lakh positions. Both formats are correct per their locale; both are weird in the wrong context. The dev box lives in en-IN; a page served from Cloudflare Pages is formatted in the runtime locale of whoever loads it. Any number on a polished surface (the title-screen build chip, marketing copy, press-kit boilerplate) needs to pin `'en-US'` explicitly. Internal tools like `/compare` are allowed to follow the visitor's locale; investor-screenshot surfaces are not. The test now checks only that a grouping separator is present, using a regex (`\d[,. ]\d`), instead of pinning one style, so a CI runner with yet another locale doesn't flake the suite.
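The split between pinned and visitor-locale formatting might look like the following sketch. The function names `formatSplatsPinned` and `formatSplatsLocal` are hypothetical, chosen for illustration; the real HUD code may be organised differently.

```javascript
// Hypothetical formatter names, for illustration only.

// Polished surfaces: pin the locale so every visitor sees the same grouping.
function formatSplatsPinned(count) {
  return `${count.toLocaleString('en-US')} splats`;
}

// Internal tools: follow whatever locale the runtime reports.
function formatSplatsLocal(count) {
  return `${count.toLocaleString()} splats`;
}

console.log(formatSplatsPinned(1234567));       // 1,234,567 splats
console.log((1234567).toLocaleString('en-IN')); // 12,34,567 when full ICU data is available
console.log(formatSplatsLocal(1234567));        // depends on the machine running it

// Locale-tolerant assertion: only require some grouping separator between digits.
const separatorPattern = /\d[,. ]\d/;
const pinnedHasSeparator = separatorPattern.test(formatSplatsPinned(1234567));
console.log(pinnedHasSeparator); // true
```

If a CI runner ever uses a locale that groups digits with a character outside `[,. ]`, the character class would need widening; the sketch keeps the regex exactly as quoted above.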
Neither bug is bad. A 1 mm collider error doesn't matter for a chair the player drives past at 30 mph. A press-page revenue figure rendered with the wrong separator wouldn't sink a v1. But the pattern is interesting: both bugs were in code that worked, that had been written carefully, and that had been read by humans without anyone noticing the wrong thing. The tests caught them not because they were testing for them, but because they had to choose an exact expected value, and choosing forced the question.
That's the part of unit testing that doesn't show up in the discussion about coverage percentages. You write the test, you write an assertion, the assertion fails, and then you look closely at the output for the first time. That looking is where the bug surfaces. The test is a forcing function for paying attention, not a separate piece of code that mechanically checks the first piece.
The ninety-six tests that didn't catch anything aren't wasted, either; they're the price of having the two that did. You can't write the locale test in isolation. You write a battery of them, and one happens to be the one that surfaces a bug. The hit rate doesn't matter once the floor is "every test runs in milliseconds." 98 tests, 5.1 seconds, `pnpm test:units`.