Accurate classification of newly diagnosed prostate cancer patients into low- and high-risk groups at a tertiary care center, where second-opinion patients diagnosed outside the center and patients with complex histories, multiple comorbidities, and advanced disease are common, is a challenge for any automated data extraction pipeline. Accurately classifying information on bone scan receipt in EHRs is also challenging: it requires fusing heterogeneous data and developing different data methods.
In this study, we classified prostate cancer patients into risk categories and assessed adherence to guideline recommendations on the need for a bone scan using both structured and unstructured EHR data. We compared the results of a rule-based NLP model and a deep learning model. We measured adherence to both the NCCN and AUA guidelines in two respects: avoidance of a bone scan for staging in low-risk patients (where a scan counts as overuse) and use of a bone scan for staging in high-risk patients (where a missing scan counts as underuse). We demonstrated the utility of combining multiple data sources captured in diverse formats to assess the efficient and effective use of bone scans for cancer staging among prostate cancer patients.
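The two adherence measures described above can be sketched as simple proportions. This is a minimal illustration and not the study's actual code: the `Patient` class, the `"low"`/`"high"` labels, and the `adherence_rates` function are hypothetical names, and the choice of denominator (all patients in each risk group) is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Patient:
    risk_group: str      # hypothetical labels: "low" or "high"
    had_bone_scan: bool  # scan documented between diagnosis and first treatment


def adherence_rates(patients: List[Patient]) -> Dict[str, Optional[float]]:
    """Return overuse and underuse proportions, or None for an empty group."""
    low = [p for p in patients if p.risk_group == "low"]
    high = [p for p in patients if p.risk_group == "high"]
    # Overuse: low-risk patients who received a bone scan.
    overuse = sum(p.had_bone_scan for p in low) / len(low) if low else None
    # Underuse: high-risk patients who did not receive a bone scan.
    underuse = sum(not p.had_bone_scan for p in high) / len(high) if high else None
    return {"overuse": overuse, "underuse": underuse}
```

Returning `None` for an empty group avoids a division by zero and keeps "no patients in this group" distinct from a rate of zero.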
A graphical outline of our methods for detecting bone scan use from structured and unstructured EHR data can be found in Fig. 1.
Patients were identified in a prostate cancer clinical data warehouse, which is described in detail elsewhere. In brief, data were collected from a tertiary-care academic medical center using the Epic EHR system (Epic Systems, Verona, WI) and managed in an EHR-based relational database. Patients were linked to an internal cancer registry and the California Cancer Registry (CCR) to gather additional information on treatments outside the institute, recurrence, and survival. This study received approval from the institute’s Institutional Review Board (IRB).
The study included patients diagnosed with prostate cancer between January 1, 2008 and December 31, 2017. We excluded patients not receiving primary treatment at our medical center and those missing clinical stage, PSA, and Gleason score. PSA is a serum biomarker protein that identifies patients at risk for prostate cancer. For men who have prostate cancer, serum PSA level is associated with prognosis and is used in risk classification. The Gleason score is a prognostic grade assigned by a pathologist to prostate cancer tissue samples and is also used in risk classification. Patients were also excluded if they did not have a clinical note in the EHR prior to their primary treatment. Patient and clinical demographics were captured at the time of diagnosis. Because guidelines recommend bone scan use after diagnosis and before first treatment, we restricted the data capture procedures to documentation between these dates.
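The inclusion and exclusion criteria above can be expressed as a small eligibility filter. The following is a sketch only: the `Record` fields and function names are hypothetical, it assumes that a patient missing any one of clinical stage, PSA, or Gleason score is excluded, and it treats the diagnosis-to-treatment window as inclusive at both ends.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

STUDY_START = date(2008, 1, 1)
STUDY_END = date(2017, 12, 31)


@dataclass
class Record:
    diagnosis_date: date
    treated_at_center: bool
    first_treatment_date: Optional[date]
    clinical_stage: Optional[str]
    psa: Optional[float]
    gleason: Optional[int]
    note_dates: List[date] = field(default_factory=list)


def is_eligible(r: Record) -> bool:
    """Apply the cohort criteria described in the text (assumptions noted above)."""
    if not (STUDY_START <= r.diagnosis_date <= STUDY_END):
        return False
    if not r.treated_at_center or r.first_treatment_date is None:
        return False
    # Assumption: any missing risk variable leads to exclusion.
    if r.clinical_stage is None or r.psa is None or r.gleason is None:
        return False
    # Require at least one clinical note dated before primary treatment.
    return any(d < r.first_treatment_date for d in r.note_dates)


def in_capture_window(r: Record, event_date: date) -> bool:
    """True if a document falls between diagnosis and first treatment."""
    if r.first_treatment_date is None:
        return False
    return r.diagnosis_date <= event_date <= r.first_treatment_date
```

Keeping eligibility and the capture window as separate functions mirrors the text, where the cohort is defined first and the documentation window is applied afterwards.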
2.3. Risk classification
The NCCN and AUA guidelines classify patients into different groups according to their risk of developing prostate cancer: high risk, intermediate/unfavorable risk, intermediate/favorable risk, and low risk. These classifications are based on clinical tumor stage, PSA value, and pre-treatment biopsy Gleason score (Table 1). NCCN guidelines classify patients into several categories: very low, low, favorable intermediate, unfavorable intermediate, high, very high, regional, and metastatic. The categories regional and metastatic are not applicable to this study. We collapsed the NCCN categories into low- and high-risk groups, since these had historically been used to determine whether a bone scan
Fig. 1. Illustration of our approach to detect if patients underwent a bone scan.
Table 1. Risk classification groups and inclusion criteria by prostate cancer clinical guidelines.
Guidelines Risk group Criteria Number of patients
NCCN High risk Cancer stage T3 or T4 1047
Gleason score ≥ 8
AUA High risk