Automated Identification of Radiotherapy Courses From US Department of Veterans Affairs Administrative Data
Schreyer W; Melson R; Anderson C; Madison C; Katsoulakis E; Thompson RF
JCO Clin Cancer Inform 2025[Dec]; 9 (?): e2500088 PMID41359899
PURPOSE: Radiotherapy is a critically important cancer treatment; however, its details are often poorly represented in electronic health record data sets. In addition, US Veterans' radiation courses are distributed across many medical centers, both within and outside the Veterans Health Administration (VHA), which hinders analysis of radiotherapy treatment across this population. METHODS: We trained and tested a suite of supervised machine learning models to predict radiation course dates from billing and diagnostic codes drawn from a combination of VHA and Centers for Medicare and Medicaid Services (CMS) databases. A separate heuristic algorithm assembles the predicted course dates into complete radiation treatments. RESULTS: Our top model predicts radiation course dates with high accuracy (macro-average of 0.914 across classes). Retrospective application of the model and assembly algorithm to radiation procedure dates for 1,331,342 patients identified 1,526,660 predicted courses of radiotherapy. CONCLUSION: The identified courses were collected into a shared resource to facilitate future VHA-based studies, and our predictive model is available for application to a wider range of non-VHA data sets, particularly those leveraging CMS data.
|*Electronic Health Records[MESH]
|*Neoplasms/radiotherapy/epidemiology[MESH]
|*Radiotherapy/methods[MESH]
|*United States Department of Veterans Affairs/statistics & numerical data[MESH]
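The abstract does not describe the heuristic that assembles date-level predictions into complete courses, nor which metric is macro-averaged. Purely as a hypothetical illustration (not the authors' algorithm), the sketch below shows one common way such an assembly step can work: sort a patient's predicted treatment dates and start a new course whenever the gap since the previous date exceeds a threshold. The function names `assemble_courses` and `macro_average` and the 30-day `MAX_GAP_DAYS` threshold are assumptions made for this example. `macro_average` shows what a macro-average means in general: the unweighted mean of a per-class score, so every class counts equally regardless of its size.

```python
from __future__ import annotations

from datetime import date
from typing import Iterable

# ASSUMPTION: hypothetical maximum gap (days) between consecutive predicted
# treatment dates that still belong to the same course. Not from the paper.
MAX_GAP_DAYS = 30


def assemble_courses(dates: Iterable[date],
                     max_gap_days: int = MAX_GAP_DAYS) -> list[tuple[date, date]]:
    """Group one patient's predicted radiation dates into (start, end) courses.

    Duplicate dates are collapsed, and a new course begins whenever the gap
    since the previous date is larger than max_gap_days.
    """
    ordered = sorted(set(dates))
    courses: list[tuple[date, date]] = []
    if not ordered:
        return courses
    start = prev = ordered[0]
    for d in ordered[1:]:
        if (d - prev).days > max_gap_days:
            courses.append((start, prev))  # close the current course
            start = d                      # open a new one
        prev = d
    courses.append((start, prev))          # close the final course
    return courses


def macro_average(per_class_scores: dict[str, float]) -> float:
    """Unweighted mean of a per-class metric (each class weighted equally)."""
    if not per_class_scores:
        raise ValueError("need at least one class score")
    return sum(per_class_scores.values()) / len(per_class_scores)


if __name__ == "__main__":
    predicted = [date(2020, 1, 6), date(2020, 1, 7), date(2020, 1, 8),
                 date(2020, 6, 1), date(2020, 6, 2)]
    # Two courses: the January run and the June run.
    print(assemble_courses(predicted))
    # Hypothetical class labels and scores, for illustration only.
    print(macro_average({"class_a": 0.5, "class_b": 1.0}))
```

A fixed gap threshold is the simplest choice; a real pipeline would likely tune it or combine it with code-specific rules, which this sketch does not attempt.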