Google’s AI mammography system passes real-world tests in UK screening centers


The UK’s NHS Breast Screening Programme is in a bind. Every mammogram is read by two human readers, with arbitration when they disagree. That double-reading workflow is thorough, but with a 30% shortfall of clinical radiologists today, projected to hit 40% by 2028, something has to give.

Google Research has been poking at this problem for years, and they just published two companion studies in Nature Cancer that put their AI system through its paces across five NHS screening services. The results are encouraging, though I’d stop short of calling it a slam dunk.

Study 1: Can the AI actually read mammograms?

The first study had two phases. Phase 1 was a retrospective evaluation of 125,000 women’s mammograms across five screening services. That’s a big dataset, and they didn’t just look at whether the AI found cancer. They tracked outcomes over 39 months to catch interval cancers — the ones that show up between screenings — and next-round cancers that the original human readers missed. That’s a much more honest test than just comparing against the initial read.

The AI system’s sensitivity and specificity were measured against the original first reader, and they also checked whether the AI correctly localized the abnormality in the breast. This matters because an AI that flags the right case but points to the wrong spot isn’t much use in practice. The results showed the AI performed comparably to human readers, with some variation across screening services due to differences in local workflows and populations.
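
For anyone who wants the two metrics spelled out, here is a minimal Python sketch. The `Case` fields and the toy data are my own illustrative assumptions, not anything taken from the studies; `cancer_confirmed` stands in for whatever ground truth the follow-up period establishes.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Case:
    cancer_confirmed: bool  # hypothetical ground truth after follow-up
    flagged: bool           # the reader's (or AI's) recall decision


def sensitivity_specificity(cases: List[Case]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for one reader over a set of cases."""
    tp = sum(c.cancer_confirmed and c.flagged for c in cases)
    fn = sum(c.cancer_confirmed and not c.flagged for c in cases)
    tn = sum(not c.cancer_confirmed and not c.flagged for c in cases)
    fp = sum(not c.cancer_confirmed and c.flagged for c in cases)
    # Guard against empty classes so the function never divides by zero.
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


# Made-up toy data, purely to exercise the function.
toy_cases = [
    Case(cancer_confirmed=True, flagged=True),
    Case(cancer_confirmed=True, flagged=False),
    Case(cancer_confirmed=False, flagged=False),
    Case(cancer_confirmed=False, flagged=False),
    Case(cancer_confirmed=False, flagged=True),
]
sens, spec = sensitivity_specificity(toy_cases)
```

Sensitivity is the share of confirmed cancers that were flagged; specificity is the share of cancer-free cases that were correctly left alone. A longer follow-up window changes the answer because it moves cases from "cancer-free" to "confirmed cancer" after the fact.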

Phase 2 was a prospective, non-interventional deployment. They plugged the live AI system into real clinical workflows without changing how radiologists worked, so the AI’s output did not feed into clinical decisions. This is the kind of study that actually tells you whether your software will survive contact with reality: appointment scheduling quirks, image format variations, and the rest of the operational mess. The system held up, which is more than many AI tools can claim after a retrospective paper.

Study 2: AI as a second reader

The second study was an end-to-end reader study comparing the standard double-read-plus-arbitration process against a workflow where the AI replaced the second human reader. This is the scenario that could actually fix the staffing problem — let one radiologist and one AI handle each case, with arbitration only when they disagree.
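
The two workflows differ only in who supplies the second opinion. Here is a rough sketch of the decision logic, assuming arbitration happens only on disagreement, as described above. The function name and the `arbitrate` callable are hypothetical, and real screening services may apply different arbitration policies.

```python
from typing import Callable, Tuple


def double_read(first_recall: bool,
                second_recall: bool,
                arbitrate: Callable[[], bool]) -> Tuple[bool, bool]:
    """Combine two independent reads of one mammogram.

    Returns (final_recall, went_to_arbitration). The arbitrator is only
    consulted when the two reads disagree.
    """
    if first_recall == second_recall:
        return first_recall, False
    return arbitrate(), True


# The second read can come from a human or from the AI; the combining
# logic is the same either way. Here the reads disagree, so the
# (hypothetical) arbitrator decides and recalls the patient.
decision, arbitrated = double_read(True, False, arbitrate=lambda: True)
```

The staffing saving comes from the second argument: if an AI produces `second_recall`, a radiologist is only needed for the first read and for whatever ends up in arbitration.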

They ran this on a subset of cases and measured cancer detection rates, recall rates, and workload. The AI-as-second-reader approach didn’t just match the human-only workflow; it slightly improved cancer detection while reducing the number of cases that needed arbitration. The workload reduction was substantial — fewer cases going to arbitration means radiologists spend less time on borderline calls and more time on cases that actually need their expertise.
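
To make the three measurements concrete, here is a small sketch of how they could be computed from per-case outcomes. The tuple layout and the toy numbers are assumptions of mine for illustration; the studies' actual definitions and figures are in the papers.

```python
from typing import Dict, List, Tuple

# One tuple per screened woman: (recalled, cancer_found, arbitrated).
Outcome = Tuple[bool, bool, bool]


def screening_metrics(outcomes: List[Outcome]) -> Dict[str, float]:
    """Summarise recall rate, cancer detection rate, and arbitration rate."""
    n = len(outcomes)
    if n == 0:
        raise ValueError("need at least one screened case")
    recalled = sum(r for r, _, _ in outcomes)
    # A cancer counts as screen-detected only if the woman was recalled.
    detected = sum(r and c for r, c, _ in outcomes)
    arbitrated = sum(a for _, _, a in outcomes)
    return {
        "recall_rate": recalled / n,
        "cdr_per_1000": 1000 * detected / n,
        "arbitration_rate": arbitrated / n,
    }


# Made-up toy outcomes, far too few to mean anything clinically.
toy = [
    (True, True, False),
    (True, False, True),
    (False, False, False),
    (False, False, True),
]
metrics = screening_metrics(toy)
```

Running the same function over the outcomes of each workflow gives a like-for-like comparison: detection should not drop, recall should not balloon, and the arbitration rate is a reasonable proxy for extra radiologist workload.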

I should note that this was a reader study, not a live deployment. The radiologists knew they were in a study, which always introduces some Hawthorne effect. But the results are consistent with what we’ve seen in smaller trials elsewhere.

What this means for the NHS

The NHS is not going to replace radiologists tomorrow. These studies are evidence that the technology can work at scale, but they don’t prove it works when its output actually drives clinical decisions, with all the messiness of real patients, real schedules, and real human judgment calls. Google is careful to say “additional work is needed,” and they’re right.

What these studies do show is that the AI system is ready for the next step: a prospective interventional trial where the AI actually influences clinical decisions. If that pans out, the NHS could start deploying AI as a second reader in screening services that are struggling to staff their double-reading workflow. Given that some services already have to prioritize certain patient groups or delay screenings, this could be a practical lifeline.

My take

I’ve been watching medical AI for a decade now, and mammography screening is one of the few areas where the technology consistently delivers. The problem has never been whether AI can read mammograms — it can, and it’s been doing so for years. The problem is integration into clinical workflows, regulatory approval, and radiologist trust.

These studies address the workflow piece head-on. They tested across multiple sites with different protocols, which is exactly what you need to do before rolling out nationally. The 39-month follow-up is a nice touch — most studies settle for 12 months or less, which misses the slow-growing cancers that AI might catch earlier than humans.

The fairness analysis is also worth noting. They checked whether the AI performed differently across demographic groups, which is a common failure mode for medical AI systems trained on biased data. The results were reassuring, but I’d want to see independent audits before signing off.

If you’re a radiologist worried about your job, don’t be. This system is designed to be a second reader, not a replacement. The shortage of radiologists isn’t going away, and anything that lets existing staff handle more cases while maintaining quality is a win for everyone.

But let’s not get ahead of ourselves. The next step is a prospective trial where the AI actually changes patient management. That’s the hard part. These studies are a strong foundation, but the building isn’t finished yet.
