A post-deployment validation analysis of a commercial CT lung nodule detection algorithm under six real-world imaging perturbations — the conditions your vendor never tested for.
Vendor benchmarks are measured under controlled acquisition protocols. Your scanners don't run under controlled acquisition protocols. This is what the gap looks like.
Effects were most pronounced in the 3–6mm nodule size range — the exact range where clinical detection decisions carry the most ambiguity and where missed findings have the most downstream consequence.
| Imaging Condition | Sensitivity | Δ from Baseline | Clinical Risk Flag |
|---|---|---|---|
| Baseline (thin-slice, standard kernel) | 84.8% | — | REFERENCE |
| Dose Reduction (~25% mAs) | ~80.8% | −4.0pp | LOW |
| Soft Reconstruction Kernel | 74.3% | −10.5pp | HIGH |
| 5mm Slice Thickness | 71.6% | −13.2pp | HIGH |
| Combined: 5mm + Soft Kernel | ~65–68% (est.) | −17 to −20pp | CRITICAL |
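The deltas in the table follow directly from the reported point estimates. A minimal sketch of that arithmetic, with illustrative risk thresholds (the cutoffs below are assumptions for demonstration, not GammaMetric's actual audit criteria):

```python
# Reported point estimates from the validation table.
BASELINE = 84.8  # sensitivity (%), thin-slice + standard kernel

conditions = {
    "Dose Reduction (~25% mAs)": 80.8,
    "Soft Reconstruction Kernel": 74.3,
    "5mm Slice Thickness": 71.6,
}

def risk_flag(delta_pp: float) -> str:
    """Illustrative thresholds only -- not the actual audit criteria."""
    if delta_pp <= -15.0:
        return "CRITICAL"
    if delta_pp <= -8.0:
        return "HIGH"
    return "LOW"

for name, sens in conditions.items():
    delta = round(sens - BASELINE, 1)
    print(f"{name}: {sens}% ({delta:+}pp) -> {risk_flag(delta)}")
```

Running this reproduces the LOW/HIGH flags shown above; the CRITICAL tier is reached only by the combined-degradation estimate.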
A 13-point sensitivity drop is not an abstract metric. In the 3–6mm nodule range — where these effects are largest — the AI flag rate falls, meaning cases the algorithm would have surfaced are not surfaced. In a workflow where radiologists rely on AI triage, that gap translates directly into increased miss risk for indeterminate nodules. Missed findings in that size range drive delayed diagnosis, upstaged disease at follow-up, and the defensibility exposure that follows. The physics finding and the clinical consequence are the same event.
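To make the translation from sensitivity drop to miss risk concrete, here is a back-of-the-envelope sketch. The case volume is a hypothetical assumption, and the model simplifies to one target finding per nodule-positive study:

```python
def additional_misses(n_positive: int, baseline_sens: float, degraded_sens: float) -> float:
    """Expected extra missed nodule-positive cases when sensitivity drops.

    Simplification: one target finding per positive case; sensitivities
    are percentages, as reported in the table above.
    """
    return n_positive * (baseline_sens - degraded_sens) / 100.0

# Hypothetical volume: 1,000 nodule-positive studies per year,
# read on 5mm slices (71.6%) instead of the thin-slice baseline (84.8%).
extra = additional_misses(1000, 84.8, 71.6)
print(f"~{extra:.0f} additional missed cases per 1,000 positives")  # ~132
```

The exact number depends on local prevalence and reading workflow; the point of the sketch is that a double-digit percentage-point drop scales linearly into triple-digit annual miss counts at routine screening volumes.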
Every engagement is structured around reproducible, documentable outputs — not a vendor sales conversation. These are the artifacts your team receives.
Every AI vendor publishes performance numbers. None of them were generated on your scanners, with your protocols, on your patient population. GammaMetric has no equity stake in any AI vendor, no exclusivity agreements, and no financial incentive to certify anything.
A 13-point sensitivity drop from a single protocol parameter change is not an edge case — it is the predictable consequence of deploying a model trained on curated data into an uncurated environment. The benchmark is not a lie; it is simply not a measurement of your site. And no vendor is incentivized to measure that for you.