Validation Case Study

What Your AI Misses After Deployment

A post-deployment validation analysis of a commercial CT lung nodule detection algorithm under six real-world imaging perturbations — the conditions your vendor never tested for.

Dataset: LIDC-IDRI (154 cases)
Algorithm: MONAI RetinaNet / LUNA16
Conditions Tested: 6 imaging perturbations
Preprint: arXiv:2603.26785
This applies to your site if:
You've deployed a CT detection AI
Lung nodule, pulmonary embolism, incidental finding — any model running on live scanner data that came with a vendor sensitivity claim.
Your protocols aren't standardized across scanners
Mixed kernel selections, variable slice thickness, or protocols that have drifted since the scanner was installed — common at multi-scanner sites.
You're evaluating a vendor update or new algorithm
Model updates reset the performance baseline. Without independent measurement, you have no way to know whether performance improved, degraded, or shifted by subgroup.
01 Summary Statistics

The Performance Gap Hidden in Plain Sight

Vendor benchmarks are measured under controlled acquisition protocols. Your scanners don't run under controlled acquisition protocols. This is what the gap looks like.

84.8%
Baseline sensitivity under reference protocol
Standard thin-slice, standard kernel
−13.2pp
Sensitivity drop at 5mm slice thickness
vs. thin-slice baseline
−10.5pp
Sensitivity drop with soft reconstruction kernel
Single parameter change
~−4pp
Sensitivity impact from dose reduction alone
Lowest-risk perturbation tested
Critical Finding

Effects were most pronounced in the 3–6mm nodule size range — the exact range where clinical detection decisions carry the most ambiguity and where missed findings have the most downstream consequence.

02 Performance Profiles

Sensitivity by Condition & Nodule Size

Sensitivity by Imaging Condition
Absolute sensitivity (%) under each perturbation, shown against the reference baseline. The y-axis begins at 55% — all conditions retain meaningful sensitivity; the chart shows the relative magnitude of degradation, not proximity to zero.
[Bar chart: Baseline 84.8% · Dose Reduction 80.8% · Soft Kernel 74.3% · 5mm Slice 71.6% · Combined (est.) ~66.5%]
Sensitivity by Nodule Size Group
Baseline vs. 5mm slice thickness, across all size cohorts. Values are directional, derived from published analysis. Effect is most pronounced in the 3–6mm range.
Size Group | Baseline | 5mm Slice
<3mm | 58% | 41%
3–6mm | 79% | 61%
6–10mm | 91% | 82%
>10mm | 97% | 93%
Imaging Condition | Sensitivity (%) | Δ from Baseline | Clinical Risk Flag
Baseline (thin-slice, standard kernel) | 84.8% | – | REFERENCE
Dose Reduction (~25% mAs) | ~80.8% | −4.0pp | LOW
Soft Reconstruction Kernel | 74.3% | −10.5pp | HIGH
5mm Slice Thickness | 71.6% | −13.2pp | HIGH
Combined: 5mm + Soft Kernel | ~65–68% | −17 to −20pp (est.) | CRITICAL
Clinical Consequence

A 13-point sensitivity drop is not an abstract metric. In the 3–6mm nodule range — where these effects are largest — the AI flag rate falls, meaning cases the algorithm would have surfaced are not surfaced. In a workflow where radiologists rely on AI triage, that gap translates directly into increased miss risk for indeterminate nodules. Missed findings in that size range drive delayed diagnosis, upstaged disease at follow-up, and the defensibility exposure that follows. The physics finding and the clinical consequence are the same event.

03 Client Deliverables

What a GammaMetric Engagement Produces

Every engagement is structured around reproducible, documentable outputs — not a vendor sales conversation. These are the artifacts your team receives.

Protocol Sensitivity Mapping
Systematic performance profiling of your deployed AI across the acquisition conditions present in your actual scanner fleet — not a vendor reference configuration. A minimal sketch of a slice-thickness sweep follows this list.
  • Sensitivity & FP rate per protocol variant
  • Slice thickness sweep
  • Kernel characterization
  • Dose condition testing
  • Scanner-specific baseline establishment
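To make the deliverable concrete, here is roughly what a slice-thickness sweep can look like in code. This is an illustration under stated assumptions, not GammaMetric's production harness: load_site_cases, run_detector, and match_to_ground_truth are hypothetical stand-ins for site-specific data access, model inference, and nodule matching, while MONAI's Spacingd transform resamples thin-slice source volumes to each test thickness.

```python
# Minimal sketch of a slice-thickness sensitivity sweep.
# load_site_cases(), run_detector(), and match_to_ground_truth()
# are hypothetical site-specific hooks, not a published API.
from monai.transforms import Compose, LoadImaged, Spacingd

SLICE_THICKNESSES_MM = [1.0, 2.5, 5.0]  # axial spacing variants to test

def sensitivity_at_thickness(cases, thickness_mm):
    """Resample each case to the target slice thickness, run the
    deployed detector, and count matched ground-truth nodules."""
    resample = Compose([
        LoadImaged(keys=["image"]),
        # Hold in-plane spacing fixed; vary only the axial (z) spacing.
        Spacingd(keys=["image"], pixdim=(1.0, 1.0, thickness_mm)),
    ])
    hits = total = 0
    for case in cases:
        volume = resample({"image": case["image_path"]})["image"]
        detections = run_detector(volume)                      # hypothetical
        matched = match_to_ground_truth(detections, case["nodules"])
        hits += len(matched)
        total += len(case["nodules"])
    return hits / total

cases = load_site_cases()  # hypothetical: de-identified retrospective cases
for t in SLICE_THICKNESSES_MM:
    print(f"{t:.1f} mm slices: sensitivity = {sensitivity_at_thickness(cases, t):.1%}")
```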
Statistical Validation Report
A peer-review–grade analysis document using established statistical methodology. Suitable for regulatory response, accreditation, or QA documentation. A minimal sketch of the core paired test appears after this list.
  • McNemar tests for paired comparisons
  • Confidence intervals per condition
  • Size-stratified subgroup analysis
  • Effect size and clinical significance framing
  • Pre-/post-update change detection
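Because the same nodules are scored under both the baseline and a perturbed condition, the outcomes are paired, and McNemar's test on the discordant pairs is the appropriate comparison. The sketch below shows the shape of that analysis with statsmodels; the outcome arrays are toy data (1 = detected, 0 = missed), not study results.

```python
# Minimal sketch: McNemar test on paired per-nodule outcomes, plus a
# Wilson confidence interval per condition. Arrays are toy data
# (1 = detected, 0 = missed), one entry per ground-truth nodule.
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar
from statsmodels.stats.proportion import proportion_confint

baseline  = np.array([1, 1, 1, 0, 1, 1, 0, 1, 1, 1])  # reference protocol
perturbed = np.array([1, 0, 1, 0, 1, 0, 0, 1, 1, 1])  # e.g., 5mm slices

# 2x2 table of paired outcomes: rows = baseline, columns = perturbed.
table = np.array([
    [np.sum((baseline == 1) & (perturbed == 1)),
     np.sum((baseline == 1) & (perturbed == 0))],
    [np.sum((baseline == 0) & (perturbed == 1)),
     np.sum((baseline == 0) & (perturbed == 0))],
])

# Exact (binomial) McNemar test, appropriate for small discordant counts.
result = mcnemar(table, exact=True)
print(f"McNemar p-value: {result.pvalue:.4f}")

# Wilson 95% interval for sensitivity under each condition.
for name, hits in [("baseline", baseline), ("perturbed", perturbed)]:
    lo, hi = proportion_confint(hits.sum(), hits.size, method="wilson")
    print(f"{name}: {hits.mean():.1%} (95% CI {lo:.1%}–{hi:.1%})")
```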
Risk Stratification Summary
A concise, radiologist-readable summary flagging which protocol conditions in your workflow carry the highest detection risk — and what to do about them.
  • High/medium/low risk protocol flags
  • Recommended protocol guard rails
  • Vendor comparison if multiple systems deployed
  • Patient population impact estimate
Performance Monitoring Dashboard
Ongoing drift detection for longitudinal AI monitoring — identifying when algorithm updates or scanner changes have altered performance from your established baseline. One form such a check can take is sketched after this list.
  • Baseline sensitivity anchor point
  • Update-triggered re-validation protocol
  • Scanner fleet change tracking
  • Monthly/quarterly report cadence
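As one hedged illustration of the underlying check: each monitoring window's sensitivity is compared against the baseline anchor with a one-sample proportion test, and the window is flagged only when the drop is both statistically significant and clinically meaningful. The window counts and the 5pp threshold below are illustrative assumptions, not recommendations.

```python
# Minimal sketch of a drift check against a baseline sensitivity anchor.
# The 5pp clinical threshold and the toy window counts are assumptions
# for illustration only.
from statsmodels.stats.proportion import proportions_ztest

BASELINE_SENSITIVITY = 0.848   # anchor from the initial validation
CLINICAL_THRESHOLD_PP = 5.0    # flag drops larger than 5 percentage points

def check_window(detected, total):
    """Flag a monitoring window whose sensitivity has drifted from the
    baseline anchor both statistically and clinically."""
    observed = detected / total
    drop_pp = (BASELINE_SENSITIVITY - observed) * 100
    _, pvalue = proportions_ztest(detected, total, value=BASELINE_SENSITIVITY)
    flagged = drop_pp > CLINICAL_THRESHOLD_PP and pvalue < 0.05
    return observed, drop_pp, pvalue, flagged

# Toy quarterly windows: (detected nodules, total ground-truth nodules).
for quarter, (det, tot) in enumerate([(120, 142), (104, 139)], start=1):
    obs, drop, p, flag = check_window(det, tot)
    status = "DRIFT FLAGGED" if flag else "within baseline"
    print(f"Q{quarter}: {obs:.1%} (drop {drop:.1f}pp, p={p:.3f}) -> {status}")
```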
CT Dose Protocol Review
Independent review of your CT acquisition protocols with respect to both dose optimization and AI detection performance — identifying where you're leaving sensitivity on the table unnecessarily. The SSDE arithmetic behind the dose assessment is sketched after this list.
  • DLP / CTDIvol benchmarking by scanner
  • SSDE-based dose assessment
  • Protocol-AI interaction mapping
  • Leapfrog compliance check (optional)
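For context, the SSDE arithmetic itself is compact: the scanner-reported CTDIvol is scaled by a size-dependent conversion factor computed from the patient's water-equivalent diameter. The sketch below uses the exponential-fit coefficients for the 32 cm body phantom from AAPM Report 204 (extended to water-equivalent diameter in Report 220); the input values are illustrative.

```python
# Minimal SSDE sketch: scanner-reported CTDIvol scaled by the
# size-dependent factor for the 32 cm body phantom. Fit coefficients
# are from AAPM Report 204; example inputs are illustrative only.
import math

A_32, B_32 = 3.704369, 0.03671937  # 32 cm phantom exponential fit

def ssde_mgy(ctdivol_mgy: float, water_equiv_diam_cm: float) -> float:
    """Size-specific dose estimate (mGy) from CTDIvol and Dw."""
    conversion = A_32 * math.exp(-B_32 * water_equiv_diam_cm)
    return ctdivol_mgy * conversion

# Example: adult chest, Dw = 24 cm, reported CTDIvol = 8.0 mGy.
print(f"SSDE = {ssde_mgy(8.0, 24.0):.1f} mGy")  # ~12.3 mGy
```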
Vendor Accountability Package
A structured data package for vendor contract negotiations, update accountability, and SLA documentation — grounded in your deployment's measured performance, not their marketing claims.
  • Vendor benchmark vs. site-measured delta
  • Update impact quantification
  • Performance regression documentation
  • Independent physicist attestation letter
04 Engagement Process

How a Validation Engagement Works

01
Site Assessment & Protocol Audit
Review of current scanner protocols, AI vendor documentation, deployed model version, and acquisition parameter ranges actually in use. No PHI required at this stage.
1–2 weeks
02
Phantom & Retrospective Data Collection
Structured data collection plan using site-appropriate phantom sets and de-identified retrospective cases. Collection protocol designed to avoid workflow disruption.
2–4 weeks
03
Statistical Analysis & Findings Generation
Systematic perturbation analysis, condition-by-condition sensitivity profiling, and statistical comparison across protocol variants. All analysis is independent of vendor input.
2–3 weeks
04
Report Delivery & Stakeholder Briefing
Delivery of full validation report, risk stratification summary, and optional presentation to radiology leadership, QA committee, or AI vendor representatives.
1 week
05
Ongoing Monitoring (Optional)
Retainer-based longitudinal monitoring for algorithm update tracking, protocol change impact assessment, and annual re-validation. Structured to meet emerging ACR and CMS AI oversight guidance.
Ongoing / Quarterly
05 Independence

Why Independent Validation Matters

Every AI vendor publishes performance numbers. None of them were generated on your scanners, with your protocols, on your patient population. GammaMetric has no equity stake in any AI vendor, no exclusivity agreements, and no financial incentive to certify anything.

On Vendor Benchmarks

A 13-point sensitivity drop from a single protocol parameter change is not an edge case — it is the predictable consequence of deploying a model trained on curated data into an uncurated environment. The benchmark is not a lie; it is simply not a measurement of your site. And no vendor is incentivized to measure that for you.

GammaMetric
  • No vendor relationships
  • ABR board-eligible physicist
  • Published validation methodology
  • Dozens of scanner installations across CT, MRI, and fluoroscopy
  • Site-specific, reproducible analysis
Vendor Self-Reporting
  • Financial interest in outcome
  • Curated benchmark datasets
  • Protocol conditions not disclosed
  • Update impacts not measured
  • Site-specific validation absent
Find out what your AI is actually doing on your scanners.
gammametric.com