Over the last couple of years, there has been much discussion about the benefits of artificial intelligence (AI) for improving healthcare. But how much of this is true and how much simply hype? Is the technology really a godsend to radiologists and other healthcare professionals, or is it making their lives more difficult?
There is no doubt that AI-based image recognition technology has improved enormously in recent years. Many researchers and companies are now working on different types of programs aimed at improving the speed, accuracy and cost of cancer screening. Some of these are at an early stage, while others are more advanced, such as Paige.AI’s CE-marked ‘digital pathology’ breast and prostate cancer software, the latter of which was approved by the FDA this week.
Carla Leibowitz, Chief Business Development Officer at Paige, says: “AI can act as a second pair of eyes for pathologists, helping them see spots of cancer on a microscope slide that they may miss with the naked eye. This can translate into better accuracy and ultimately better insights for patients.”
John Shepherd, a professor and researcher at the University of Hawaii, has a special interest in using AI to improve breast cancer prediction. Earlier this month, he and his team published a paper in the journal Radiology showing that deep learning AI can improve prediction of breast cancer risk when added to clinical risk factors.
The team used a dataset of images from 25,000 mammograms from 6,369 women who were screened for breast cancer. Of these women, 1,600 went on to develop screening-detected breast cancer and 351 developed interval invasive breast cancer.
Notably, the AI improved risk prediction for screening-detected breast cancer, but not interval invasive breast cancer.
“This surprised us,” Shepherd told Clinical Omics. “Breast density, or how opaque the breast tissue was in the mammogram, was most predictive of interval cancer risk. We thought AI would pick up on some sort of texture in the breast…it didn’t find a signal like that.”
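It is worth making concrete what “adding AI to clinical risk factors” can look like in practice. The sketch below is purely illustrative: it is not the model from the Radiology paper, and the synthetic data, variable names (age, BMI, family history, an assumed per-image AI malignancy score) and logistic-regression approach are all assumptions. It simply shows one common way an image-derived score can be folded into a clinical risk model and compared against clinical factors alone.

```python
# Hypothetical sketch: combining a deep-learning image score with clinical
# risk factors in one risk model. Synthetic data and coefficients are
# illustrative assumptions, not figures from the Radiology study.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000

# Simulated clinical risk factors and a simulated AI malignancy score.
age = rng.normal(58, 8, n)
bmi = rng.normal(27, 4, n)
family_history = rng.binomial(1, 0.15, n)
ai_score = rng.normal(0, 1, n)

# Simulated outcome: screening-detected cancer, influenced by all factors.
logit = -4 + 0.03 * (age - 58) + 0.02 * (bmi - 27) + 0.6 * family_history + 0.8 * ai_score
cancer = rng.binomial(1, 1 / (1 + np.exp(-logit)))

clinical = np.column_stack([age, bmi, family_history])
combined = np.column_stack([clinical, ai_score])

X_cl_tr, X_cl_te, X_co_tr, X_co_te, y_tr, y_te = train_test_split(
    clinical, combined, cancer, test_size=0.3, random_state=0)

# Clinical-only model vs. clinical factors plus the AI image score.
m_clinical = LogisticRegression(max_iter=1000).fit(X_cl_tr, y_tr)
m_combined = LogisticRegression(max_iter=1000).fit(X_co_tr, y_tr)

print("AUC, clinical factors only:", roc_auc_score(y_te, m_clinical.predict_proba(X_cl_te)[:, 1]))
print("AUC, clinical + AI score:  ", roc_auc_score(y_te, m_combined.predict_proba(X_co_te)[:, 1]))
```

In a real study, the comparison would of course be made on held-out patient data rather than simulated outcomes, but the structure of the question is the same: does the image-derived score add predictive value beyond the clinical factors?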
Shepherd believes AI can have a massive benefit for cancer screening, particularly in resource poor settings. “An AI algorithm can learn from a much larger library than a radiologist can. In some cases, a million images or more. Once trained, the AI could be replicated at really no additional cost except the hardware to run it, while replicating a trained radiologist takes about 8 years of medical school and residency. Thus, it could solve the endemic shortage of highly trained radiologists, which is very severe here in Hawaii, for example,” he told Clinical Omics.
Although he strongly believes in the technology, he cautions that “most studies done to date have reported great promise, but it is early days.”
Indeed, another study published in the BMJ earlier this month casts doubt on the current accuracy of AI programs for breast cancer screening.
Sian Taylor-Phillips, a professor at the University of Warwick in the UK, and colleagues carried out a systematic review of the accuracy of AI image analysis in breast cancer screening programs. Overall, 12 studies covering 131,822 women were included in the analysis.
The researchers found that 34 (94%) of the 36 AI systems evaluated were less accurate than a single radiologist, and all 36 were less accurate than the consensus of two or more radiologists. Three of the included studies did show that around half of low-risk women could be screened out of additional testing, but, concerningly, the AI also screened out up to 10% of the cancers that were picked up by radiologists.
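To put that trade-off in concrete terms, the back-of-the-envelope sketch below applies the “half screened out, up to 10% of cancers missed” figures to a hypothetical screening population. The population size and cancer detection rate are assumptions chosen only for illustration, not figures from the BMJ review.

```python
# Illustrative arithmetic only: the absolute impact of an AI triage step.
# Population size and cancer detection rate are assumed, not taken from the review.
screened_women = 100_000
cancer_rate = 8 / 1000          # assumed screen-detected cancer rate
triaged_out_fraction = 0.5      # roughly half of low-risk women screened out
missed_cancer_fraction = 0.10   # up to 10% of detectable cancers screened out

cancers = screened_women * cancer_rate
reads_saved = screened_women * triaged_out_fraction
cancers_missed = cancers * missed_cancer_fraction

print(f"Radiologist reads avoided:          {reads_saved:,.0f}")
print(f"Detectable cancers in the cohort:   {cancers:,.0f}")
print(f"Cancers screened out by AI triage:  {cancers_missed:,.0f}")
```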
“AI has a lot of potential for use in interpreting medical images, but it’s really important we invest enough time and resources in developing the technology, and fully testing it before using it in practice on patients,” Taylor-Phillips told Clinical Omics.
“We found that this technology is not yet ready for widespread implementation. We need more research evidence, for example measuring how accurate AI is in combination with radiologists in screening practice, rather than in separate test sets.”
Leibowitz agrees that adequate testing is important. “Not all AI systems perform equally, and not all studies are designed well enough to test their effect on diagnostic decisions. Good study design includes curating datasets with diverse data, robust ground truth and challenging cases.”
In the late 1990s, the FDA approved computer-aided detection for mammography. While the technology was not as advanced as current AI-based programs, there was a lot of hope, and hype, at the time about how it could improve diagnostics. However, it was not long before its usefulness came into question, as it was slow and often less accurate than initially promised. Taylor-Phillips says it is important that the field doesn’t make the same mistakes with AI-based diagnostics.
“Research so far suggests that many AI systems produce too many false alarms, indicating women may have cancer when they do not,” she notes.
An important point to bear in mind with AI-based diagnostics is how they will be used in the field. In cancer screening, for example, AI could be deployed in a number of different ways: for pre-screening to remove patients at low risk of cancer, to replace radiologist readers entirely, to replace one radiologist in a two (or more) reader configuration, to support radiologists’ diagnostic decision making, or to add an additional level of screening after standard radiology assessment. Each of these requires a slightly different approach to AI design and testing.
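The differences between these configurations are easiest to see when written out as decision rules. The sketch below is a hypothetical illustration, not any vendor’s workflow: the threshold, function names and arbitration rule are assumptions, and a real screening programme involves far more nuance than a single recall flag.

```python
# Minimal sketch of how an AI flag and radiologist reads could combine into a
# recall decision under different deployment configurations. All rules and the
# threshold are illustrative assumptions.

AI_THRESHOLD = 0.5  # assumed operating point for the AI suspicion score


def ai_flag(ai_score: float) -> bool:
    """Binarize the AI suspicion score at the assumed operating point."""
    return ai_score >= AI_THRESHOLD


def pre_screen(ai_score: float, radiologist_read: bool) -> bool:
    """AI triage: cases below threshold are screened out with no human read;
    flagged cases go to the radiologist, whose call decides recall."""
    if not ai_flag(ai_score):
        return False
    return radiologist_read


def standalone(ai_score: float) -> bool:
    """AI replaces radiologist readers entirely."""
    return ai_flag(ai_score)


def ai_as_second_reader(ai_score: float, first_reader: bool, arbitration: bool) -> bool:
    """AI replaces the second human reader in double reading;
    disagreements go to an arbitrating reader."""
    second = ai_flag(ai_score)
    if first_reader == second:
        return first_reader
    return arbitration


def decision_support(ai_score: float, radiologist_read: bool) -> bool:
    """AI only informs the human; the radiologist's call stands."""
    return radiologist_read


def safety_net(ai_score: float, radiologist_read: bool) -> bool:
    """Additional AI check after standard assessment: AI can add recalls, not remove them."""
    return radiologist_read or ai_flag(ai_score)


# Example: a case the first reader would not recall, but the AI flags.
print(ai_as_second_reader(ai_score=0.72, first_reader=False, arbitration=True))
```

The point is simply that the same AI score feeds into very different clinical decisions depending on the configuration, which is why each configuration needs its own evaluation.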
“It’s imperative to thoroughly test AI systems’ performance and generalizability in datasets that are comparable to what these products would encounter in the real world: different scanners, populations and site-to-site variations. It is also very important to define how these systems are intended to be used, and to study (not just estimate) how diagnosticians perform with and without them,” says Leibowitz. For example, Paige’s prostate cancer detection system is currently being tested in a study led by the University of Oxford in the UK.
“As multiple Paige Prostate studies have shown, pathologists rarely ever call false positives. On the other hand, they sometimes do miss cancer. Therefore, an AI product that rarely misses cancers, and perhaps has a few false positives, can help pathologists not miss anything, and they can easily discard the false positives. We have shown that the pathologist working with Paige Prostate produces better results… than either would alone,” she adds.
One of the problems with the original computer-aided diagnostic programs was that computer power was often not high enough and so the programs were not able to add much to what the radiologists could see with their own eyes. Since 2000, both computer power and AI-based image recognition technology have improved exponentially. An AI-based system is now able to spot patterns that are very difficult to see with the human eye alone.
However, while some AI-based diagnostic aids are at a level that can be used in the clinic, many are not yet at that stage.
“Larger prospective studies are needed and these are expensive,” says Shepherd. Access to large amounts of accurate imaging data is also important, particularly for systems that use deep learning to improve as they are used.
For AI-based diagnostics to become truly widespread in cancer testing, as well as in other areas of medicine, more testing and development are needed. Open communication between developers, healthcare professionals and the public is also needed for widespread adoption.
“As AI continues to proliferate throughout the healthcare system, we must lead with data and transparency. That means continuing to engage in scientific discourse, rigorous validation, and publishing research so that the healthcare ecosystem is well informed about the benefits and potential shortcomings of the available solutions,” concludes Leibowitz.