A machine-learning tool was able to automatically determine a patient’s metastatic status from unstructured electronic health record data. These findings were published in JCO Clinical Cancer Informatics.

To expedite the process of selecting candidates for clinical trials, researchers from Flatiron Health Inc in New York developed a machine-learning tool that automatically extracted snippets from health record data to classify patients into 5 metastatic categories.

The training set for the model comprised 66,532 records of patients who were highly likely (7964) or likely (9543) to have metastatic disease, highly likely (40,111) or likely (8207) to have nonmetastatic disease, or had unknown status (707).
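
The reported category counts add up to the stated total, and they show that roughly 73% of the training records fell into the 2 nonmetastatic categories. The short sketch below reproduces that arithmetic using only the counts given above; the dictionary keys and variable names are illustrative labels, not identifiers from the study.

```python
# Training-set counts as reported in the article.
# The labels are illustrative, not the study's own category names.
training_counts = {
    "highly likely metastatic": 7964,
    "likely metastatic": 9543,
    "highly likely nonmetastatic": 40111,
    "likely nonmetastatic": 8207,
    "unknown": 707,
}

# The five categories should account for every record in the training set.
total = sum(training_counts.values())
assert total == 66532

# Share of the training set in each category.
shares = {label: n / total for label, n in training_counts.items()}

for label, share in shares.items():
    print(f"{label}: {share:.1%}")
```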

The developed model had an overall sensitivity of 82.4% and overall specificity of 95.5%; positive predictive value was 89.3% and negative predictive value was 94.0%.
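
For readers less familiar with these 4 measures, each can be derived from a 2x2 table of true and false positives and negatives. The sketch below is illustrative only: it assumes the 5 categories are collapsed into a binary metastatic versus nonmetastatic call, and the counts in the usage example are hypothetical, not taken from the study. The class and function names are likewise made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from a binary (metastatic vs nonmetastatic) comparison."""
    tp: int  # model says metastatic, chart review agrees
    fp: int  # model says metastatic, chart review disagrees
    tn: int  # model says nonmetastatic, chart review agrees
    fn: int  # model says nonmetastatic, chart review disagrees


def screening_metrics(c: ConfusionCounts) -> dict:
    """Return the four standard diagnostic-accuracy measures.

    Assumes every denominator is nonzero.
    """
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # true positives among all metastatic
        "specificity": c.tn / (c.tn + c.fp),  # true negatives among all nonmetastatic
        "ppv": c.tp / (c.tp + c.fp),          # correct among predicted metastatic
        "npv": c.tn / (c.tn + c.fn),          # correct among predicted nonmetastatic
    }


# Hypothetical counts for illustration only; NOT data from the study.
example = ConfusionCounts(tp=80, fp=10, tn=190, fn=20)
metrics = screening_metrics(example)

for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Because positive and negative predictive values depend on how common metastatic disease is in the cohort being screened, they can shift between data sets even when sensitivity and specificity stay the same.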

The validation data comprised 200 patients with breast cancer (144), non-small cell lung cancer (23), colorectal cancer (15), prostate cancer (7), bladder cancer (4), renal cell carcinoma (4), and melanoma (3).

In the validation cohort, after user review, the model had a sensitivity of 97.1%, specificity of 98.2%, positive predictive value of 91.9%, and negative predictive value of 99.4%.

Staff at the clinic indicated this automated tool reduced the volume of charts they needed to screen for full trial eligibility.

The model was limited in that it addressed only metastatic status. Additional models will be needed to cover other eligibility criteria.

This machine-learning approach to screening unstructured electronic health record data for clinical trial candidates was found to be highly accurate for the majority of patients and significantly reduced the time-consuming manual screening of health records required of clinic staff.

Disclosure: Some study authors declared affiliations with biotech, pharmaceutical, and/or device companies. Please see the original reference for a full list of authors’ disclosures.


Kirshner J, Cohn K, Dunder S, et al. Automated electronic health record-based tool for identification of patients with metastatic disease to facilitate clinical trial patient ascertainment. JCO Clin Cancer Inform. 2021;5:719-727. doi:10.1200/CCI.20.00180