Researchers from the Technical University of Munich in Germany have developed and validated a deep-learning algorithm that accurately differentiates colon cancer from acute diverticulitis on computed tomography (CT) images.
They write in JAMA Network Open that the deep-learning model “may improve the care of patients with large-bowel wall thickening” when used as a support system by radiologists.
Lead author Sebastian Ziegelmayer and colleagues explain that it is currently difficult to differentiate between colon cancer and acute diverticulitis on contrast-enhanced CT images because the two conditions often share morphologic features such as bowel wall thickening and enlarged local lymph nodes.
Yet, correct differentiation of the two conditions has major clinical implications as their management may vary substantially; colon cancer requires oncologic resection of the diseased bowel and the entire lymph node basin, whereas a limited resection may be sufficient in acute diverticulitis.
“A high level of certainty in surgical planning improves patient stratification and thus limits postoperative complications and potentially decreases mortality rates,” Ziegelmayer and co-authors write.
In recent years, deep-learning algorithms have been successfully applied to other areas of radiology, such as breast and lung cancer detection as well as gastrointestinal imaging. However, for colon cancer, the models have generally been developed for use in histopathology and endoscopy rather than CT images.
To address this, Ziegelmayer and team used CT images from 585 patients (mean age 63 years, 58% men) with histopathologically confirmed colon cancer (n=318) or acute diverticulitis (n=267) to develop a 3-D convolutional neural network—a type of deep-learning algorithm that predicts an outcome based on the input data—for differentiating between the two groups.
The majority (74.4%) of the images were used to train the algorithm, with 15.4% used for validation and the remaining 10.2% comprising the test set. The test set was used to compare the algorithm’s performance with that of 10 radiologists with different levels of experience (three radiology residents with <3 years’ experience, four radiology residents with ≥3 years’ experience, and three board-certified radiologists).
The investigators report that the deep-learning algorithm correctly classified the test set images as colon cancer rather than diverticulitis with a sensitivity of 83.3% and specificity of 86.6%.
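For readers less familiar with the two metrics, the short Python sketch below shows how sensitivity and specificity are conventionally computed when colon cancer is treated as the positive class. The counts are hypothetical and chosen only for illustration; they are not taken from the study, and the function names are assumptions rather than anything the authors used.

```python
def sensitivity(true_pos, false_neg):
    """Share of actual colon cancer cases that were labeled colon cancer."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of actual diverticulitis cases that were labeled diverticulitis."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts for illustration only (NOT the study's data):
# 50 colon cancer cases, 40 of them correctly identified;
# 50 diverticulitis cases, 45 of them correctly identified.
example_sensitivity = sensitivity(true_pos=40, false_neg=10)
example_specificity = specificity(true_neg=45, false_pos=5)

print(f"sensitivity: {example_sensitivity:.1%}")
print(f"specificity: {example_specificity:.1%}")
```

In this framing, a missed cancer lowers sensitivity, while a diverticulitis case mistaken for cancer lowers specificity.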
By comparison, the mean reader sensitivity and specificity for all 10 readers combined were 77.6% and 81.6%, respectively, increasing to 85.5% and 86.6% for the board-certified reader group only. Among the residents, mean sensitivity was 74.2% and mean specificity was 84.2%.
Following their initial image classifications, the readers were presented with the algorithm’s prediction, i.e., the probability of a colon cancer or diverticulitis diagnosis, and were allowed to change or keep their initial assessment for each case. They were not aware of the model’s sensitivity or specificity at that time.
When taking the deep-learning prediction into consideration, mean sensitivity and specificity for the combined reader group increased significantly to 85.6% and 91.3%, respectively.
The algorithm boosted performance significantly regardless of experience, but the greatest improvements occurred among the radiology residents. In this group, sensitivity improved by 9.6 percentage points and specificity improved by 7.2 percentage points. For the board-certified radiologists, the corresponding improvements were 4.5 and 4.7 percentage points.
Put differently, without an AI support system, the false-negative rate was 22.4% for all readers, 25.8% for the residents, and 14.5% for the board-certified radiologists. Artificial intelligence support led to a substantial reduction in the false-negative rate, to 14.3%, 16.1%, and 10.0%, respectively.
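The unaided false-negative rates follow directly from the reader sensitivities reported earlier, since the false-negative rate is the complement of sensitivity. The minimal Python sketch below reproduces that arithmetic from the percentages quoted in this article; the function and variable names are illustrative assumptions.

```python
def false_negative_rate(sensitivity_pct):
    """False-negative rate (%) is the complement of sensitivity (%)."""
    return round(100.0 - sensitivity_pct, 1)


# Mean reader sensitivities without AI support, as quoted in the article.
unaided_sensitivity = {
    "all readers": 77.6,
    "residents": 74.2,
    "board-certified radiologists": 85.5,
}

unaided_fnr = {
    group: false_negative_rate(value)
    for group, value in unaided_sensitivity.items()
}

for group, rate in unaided_fnr.items():
    print(f"{group}: {rate}% false negatives")
```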
Ziegelmayer et al conclude that their model “significantly increased the diagnostic performance of all readers, proving the feasibility of AI-supported image analysis” in this setting.
However, they caution that the model was trained and tested on data from a single institution and therefore may not be broadly generalizable.
They also note that the “proof-of-concept study only included the most common malignant and benign diagnoses for bowel wall thickening; in further studies the model should be adapted for malignant and benign entities in general.”
Finally, the authors suggest that multi-parametric data integration, including laboratory inflammatory markers, vital signs, and other symptoms, could improve the model and should be included in further projects.