Natural language processing (NLP) software correlated key characteristics expressed verbally in mammography reports with pathologic findings, providing an accurate indication for biopsy referral, a study published in Cancer has shown.1

More than 12.1 million mammograms are performed each year in the United States, prompting more than 1.6 million breast biopsies. However, the Centers for Disease Control and Prevention (CDC) estimates that 50% of mammograms yield false-positive results, with approximately 20% of resulting biopsies obtained from cancer-free breasts.

To address this issue, a team at Houston Methodist Cancer Center developed NLP software algorithms to predict the risk of breast cancer based on characteristics identified in mammography reports.

The researchers searched the Methodist Environment for Translational Enhancement and Outcomes Research (METEOR) for mammograms obtained between January 2006 and May 2015 that had Breast Imaging Reporting and Data System (BI-RADS) category 5 readings and available pathology reports.

The NLP algorithm was used to extract imaging findings from free text reports. One-way analysis of variance and the Fisher exact test were used to analyze the correlation between the imaging study features and pathologic breast cancer subtype.
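
As a rough illustration of this workflow (not the study's actual software), the Python sketch below flags descriptors in free-text reports with keyword patterns, cross-tabulates one finding against one subtype, and applies a two-sided Fisher exact test to the resulting 2x2 table. The pattern list, function names, and record format are assumptions made for illustration; the article does not describe the vocabulary or implementation the researchers used, and plain keyword matching does not handle negation such as "no spiculated margins".

```python
import re
from math import comb

# Hypothetical descriptor patterns (an assumption for illustration only;
# the study's real extraction rules are not given in the article).
FINDING_PATTERNS = {
    "spiculated_margins": re.compile(r"\bspiculated\b", re.IGNORECASE),
    "pleomorphic_calcifications": re.compile(r"\bpleomorphic\b", re.IGNORECASE),
    "heterogeneous_calcifications": re.compile(r"\bheterogeneous\b", re.IGNORECASE),
}


def extract_findings(report_text):
    """Return the names of all findings whose keyword occurs in the report.

    Note: this naive matcher ignores negation and context.
    """
    return {name for name, pattern in FINDING_PATTERNS.items()
            if pattern.search(report_text)}


def contingency_table(records, finding, subtype):
    """Build 2x2 counts (a, b, c, d) from (report_text, subtype_label) pairs.

    a: finding present, subtype matches    b: finding present, other subtype
    c: finding absent,  subtype matches    d: finding absent,  other subtype
    """
    a = b = c = d = 0
    for text, label in records:
        has_finding = finding in extract_findings(text)
        is_subtype = label == subtype
        if has_finding and is_subtype:
            a += 1
        elif has_finding:
            b += 1
        elif is_subtype:
            c += 1
        else:
            d += 1
    return a, b, c, d


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test p-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))
```

Calling `contingency_table(records, "spiculated_margins", "ER+")` on a list of report and subtype pairs, then passing the four counts to `fisher_exact_two_sided`, yields a P value of the kind reported in the study; the subtype label strings here are likewise placeholders.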

A free-text search on key characteristics identified reports for 543 patients who met the inclusion criteria. Using the NLP technique to correlate the mammogram findings with the breast cancer subtype recorded in the pathology reports revealed that ER-positive tumors were more likely to show spiculated margins (P = .0008) on mammography, and images of tumors that overexpressed HER2 were more likely to show heterogeneous calcifications (P = .0078) and pleomorphic calcifications (P = .0002).

These results demonstrate that applying the NLP technique to mammography reports can determine potential risk of breast cancer and need for biopsy referral, potentially reducing the number of biopsies obtained from cancer-free breasts. 


1. Patel TA, Puppala M, Ogunti RO, et al. Correlating mammographic and pathologic findings in clinical decision support using natural language processing and data mining methods. Cancer. 2016 Aug 29. doi: 10.1002/cncr.30245. [Epub ahead of print]