A new artificial intelligence system developed by Google can decide when to trust AI-based decisions about medical diagnoses and when to refer to a human doctor for a second opinion. Its creators claim it can improve the efficiency of analysing medical scan data, reducing workload by 66 per cent, while maintaining accuracy – but it has yet to be tested in a real clinical environment.
The system, Complementarity-driven Deferral-to-Clinical Workflow (CoDoC), works by helping predictive AI know when it doesn’t know something – heading off issues with the latest AI tools that can make up facts when they don’t have reliable answers.
It is designed to work alongside existing AI systems, which are often used to interpret medical imagery such as chest X-rays or mammograms. For example, if a predictive AI tool is analysing a mammogram, CoDoC will judge whether the perceived confidence of the tool is strong enough to rely on for a diagnosis or whether to involve a human if there is uncertainty.
In a theoretical test of the system conducted by its developers at Google Research and Google DeepMind, the UK AI lab the tech giant bought in 2014, CoDoC reduced the number of false positive interpretations of mammograms by 25 per cent.
CoDoC is trained on data containing a predictive AI tool’s analyses of medical images, along with how confident the tool was that it had analysed each image accurately. These results are compared with a human clinician’s interpretation of the same images and with post-analysis confirmation, via biopsy or another method, of whether a medical issue was found. From this, the system learns how accurate the AI tool is in analysing the images, and how reliable its confidence estimates are, relative to doctors.
It then uses that training to judge whether an AI analysis of a subsequent scan can be trusted, or whether it needs to be checked by a human. “If you use CoDoC together with the AI tool, and the outputs of a real radiologist, and then CoDoC helps decide which opinion to use, the resulting accuracy is better than either the person or the AI tool alone,” says Alan Karthikesalingam at Google Health UK, who worked on the research.
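To make the idea of learned deferral concrete, here is a deliberately simplified sketch in Python. It is not CoDoC's published method: the `Case` record, the fixed confidence bands, the 0.5 decision threshold and both function names are assumptions made for illustration. The sketch looks at past cases where the AI's confidence score, the clinician's reading and the confirmed outcome are all known, then defers to the clinician in any confidence band where the clinician was right more often than the AI.

```python
from dataclasses import dataclass


@dataclass
class Case:
    """One historical scan (hypothetical record layout)."""
    ai_confidence: float      # AI tool's score in [0, 1] that disease is present
    clinician_positive: bool  # clinician's reading of the same scan
    truth_positive: bool      # outcome confirmed later, e.g. by biopsy


def learn_deferral_bins(cases, n_bins=5, threshold=0.5):
    """Return one flag per confidence band: True means defer to the clinician."""
    ai_correct = [0] * n_bins
    doc_correct = [0] * n_bins
    counts = [0] * n_bins
    for c in cases:
        b = min(int(c.ai_confidence * n_bins), n_bins - 1)
        counts[b] += 1
        ai_says_positive = c.ai_confidence >= threshold
        ai_correct[b] += ai_says_positive == c.truth_positive
        doc_correct[b] += c.clinician_positive == c.truth_positive
    # Defer where the clinician outperformed the AI, and (as a cautious
    # assumption) in bands with no training evidence at all.
    return [counts[b] == 0 or doc_correct[b] > ai_correct[b]
            for b in range(n_bins)]


def decide(ai_confidence, defer_bins, threshold=0.5):
    """Apply the learned flags to the AI's score for a new scan."""
    n_bins = len(defer_bins)
    b = min(int(ai_confidence * n_bins), n_bins - 1)
    if defer_bins[b]:
        return "defer to clinician"
    return "AI: positive" if ai_confidence >= threshold else "AI: negative"


# Toy, made-up training cases purely to exercise the functions.
history = [
    Case(0.05, False, False),
    Case(0.10, True, False),
    Case(0.45, True, True),
    Case(0.50, True, True),
    Case(0.55, False, False),
    Case(0.90, False, True),
    Case(0.95, True, True),
]
defer_bins = learn_deferral_bins(history)
print(decide(0.97, defer_bins))
print(decide(0.50, defer_bins))
```

In this toy data the AI is reliable at very low and very high confidence, so only mid-range scores are passed to a human. A real system would need far more data and a more careful statistical treatment than a per-band comparison of correct counts.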
The test was repeated with different mammography datasets, and X-rays for tuberculosis screening, across a number of predictive AI systems, with similar results. “The advantage of CoDoC is that it’s interoperable with a variety of proprietary AI systems,” says Krishnamurthy “Dj” Dvijotham at Google DeepMind.
It is a welcome development, but mammograms and tuberculosis checks involve fewer variables than most diagnostic decisions, says Helen Salisbury at the University of Oxford, so expanding the use of AI to other applications will be challenging.
“For systems where you have no chance to influence, post-hoc, what comes out the black box, it seems like a good idea to add on machine learning,” she says. “Whether it brings AI that’s going to be there with us all day, every day for our routine work any closer, I don’t know.”
Journal reference:
Nature Medicine DOI: 10.1038/s41591-023-02437-x