Natural language processing in the electronic medical record
Automated Assessment of Adherence to Tobacco Treatment Guidelines
This study evaluates an automated system that abstracts data from the coded and free-text fields of the electronic medical record (EMR) to assess clinicians’ adherence to tobacco treatment guidelines. Even in state-of-the-art EMR systems, most data are entered as free-text narrative and are therefore not amenable to currently available automated assessment methods. The system evaluated, MediClass, is an automated EMR classifier that incorporates natural language processing techniques and handles both free-text and coded record data.
This test of the MediClass system applied evidence-based guidelines for delivering smoking-cessation treatment in primary care settings. The study compared the presence or absence of each of the 5A’s of smoking-cessation care as assessed by human abstractors and by MediClass. The 5A’s of smoking-cessation care are:
- Ask if the patient smokes
- Advise the patient to stop
- Assess the patient’s readiness to quit
- Assist the patient in quitting by providing information or medications
- Arrange for appropriate follow up care
A preliminary study allowed the human abstractors to try out the entire process on an initial set of 500 records. Significant differences among abstractors were noted, and follow-up training was conducted. For the final study, the four trained chart abstractors coded a separate set of 500 records, each representing a single primary care visit.
The human abstractors in this study had difficulty applying the 5A’s consistently to all 500 records, despite careful training. MediClass agreed with the gold standard 91% of the time. Estimated sensitivity was 0.97, 0.68, 0.64, and 1.0, and estimated specificity was 0.95, 1.0, 0.96, and 0.82, for the four A’s for which measurement was possible (ask, assess, advise, and assist, respectively).
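Sensitivity, specificity, and overall agreement all come from comparing the system’s output with the gold standard one A at a time. The following is a minimal Python sketch of that arithmetic, assuming each record has already been reduced to a present/absent flag per A; the function names and toy data are illustrative and are not taken from the study.

```python
import math
from typing import Sequence


def confusion_counts(gold: Sequence[bool], predicted: Sequence[bool]) -> tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) for one A, treating the gold standard as truth."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must be the same length")
    tp = sum(g and p for g, p in zip(gold, predicted))
    fp = sum(p and not g for g, p in zip(gold, predicted))
    fn = sum(g and not p for g, p in zip(gold, predicted))
    tn = sum(not g and not p for g, p in zip(gold, predicted))
    return tp, fp, fn, tn


def sensitivity(gold: Sequence[bool], predicted: Sequence[bool]) -> float:
    """TP / (TP + FN): share of gold-positive records the system also flags."""
    tp, _, fn, _ = confusion_counts(gold, predicted)
    # Undefined (NaN) when the A never occurs in the gold standard.
    return tp / (tp + fn) if tp + fn else math.nan


def specificity(gold: Sequence[bool], predicted: Sequence[bool]) -> float:
    """TN / (TN + FP): share of gold-negative records the system leaves unflagged."""
    _, fp, _, tn = confusion_counts(gold, predicted)
    return tn / (tn + fp) if tn + fp else math.nan


def percent_agreement(gold: Sequence[bool], predicted: Sequence[bool]) -> float:
    """(TP + TN) / N: fraction of records where system and gold standard match."""
    tp, fp, fn, tn = confusion_counts(gold, predicted)
    return (tp + tn) / (tp + fp + fn + tn)


# Toy data for one A across ten visits (hypothetical, not study data).
T, F = True, False
gold = [T, T, T, T, F, F, F, F, F, F]
pred = [T, T, T, F, F, F, F, F, F, T]
print(sensitivity(gold, pred))            # 0.75
print(round(specificity(gold, pred), 3))  # 0.833
print(percent_agreement(gold, pred))      # 0.8
```

Returning NaN rather than raising keeps a rarely coded step (such as ‘Arrange’ here) from crashing a per-A report; it simply shows up as not measurable.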
This study demonstrates the feasibility of an automated coding system for processing the entire EMR, enabling automated assessment of smoking-cessation care delivery.
This is a well-designed study. Significant effort went into training the human abstractors. Even with this training, adjudication was still needed to establish a “gold standard” (which makes it more accurately a silver standard). Because people find it hard to agree on this coding task, I concur with the conclusion that MediClass is a practical alternative to expensive, inconsistent manual data abstraction.
Dr Hazlehurst, Dr Sittig, et al. published a study that evaluated the accuracy of MediClass, a natural language processing (NLP) technology, within an electronic medical record system for assessing clinician adherence to tobacco treatment guidelines. The electronic medical records of known smokers at four HMOs were analyzed for the presence or absence of each of the 5A’s of smoking cessation.
Only some of the 5A steps (e.g., identification of smoking status and smoking-cessation prescriptions) are captured as coded data in the electronic medical record. Other steps (e.g., assessment of readiness to change and provision of behavior-change counseling) are recorded in free-text clinician notes. This makes NLP technology essential for assessing clinician adherence to tobacco clinical practice guidelines (CPGs) for treatment and prevention.
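The split between coded and free-text evidence can be illustrated with a toy classifier. This is not MediClass’s method; it is a minimal Python sketch that assumes hypothetical structured codes (`SMOKING_STATUS_RECORDED`, `RX_NICOTINE_PATCH`, `RX_BUPROPION`) and hand-written phrase patterns, and, unlike a real NLP system, it ignores negation, section context, and synonyms.

```python
import re
from dataclasses import dataclass, field

FIVE_AS = ("ask", "advise", "assess", "assist", "arrange")

# Hypothetical structured codes that signal a step directly.
CODED_RULES: dict[str, set[str]] = {
    "ask": {"SMOKING_STATUS_RECORDED"},
    "assist": {"RX_NICOTINE_PATCH", "RX_BUPROPION"},
}

# Hypothetical phrase patterns for steps usually documented only in free text.
# Any readiness statement (even "not ready") counts as an assessment.
TEXT_RULES: dict[str, list[str]] = {
    "advise": [r"\badvised (?:pt|patient) to (?:quit|stop)\b", r"\burged to quit\b"],
    "assess": [r"\bready to quit\b", r"\bwilling to quit\b"],
    "assist": [r"\bquit ?line\b"],
    "arrange": [r"\bfollow[- ]up\b.*\b(?:smoking|quit)"],
}


@dataclass
class Visit:
    coded: set[str] = field(default_factory=set)  # structured EMR entries
    notes: str = ""                               # free-text clinician note


def classify(visit: Visit) -> dict[str, bool]:
    """Mark each A present if a coded entry OR a free-text phrase supports it."""
    text = visit.notes.lower()
    result = {}
    for step in FIVE_AS:
        coded_hit = bool(CODED_RULES.get(step, set()) & visit.coded)
        text_hit = any(re.search(p, text) for p in TEXT_RULES.get(step, []))
        result[step] = coded_hit or text_hit
    return result


visit = Visit(
    coded={"SMOKING_STATUS_RECORDED"},
    notes="Pt smokes 1 ppd. Advised patient to quit; she is not ready to quit yet.",
)
print(classify(visit))
# {'ask': True, 'advise': True, 'assess': True, 'assist': False, 'arrange': False}
```

In this example ‘Ask’ is found only through the coded field, while ‘Advise’ and ‘Assess’ are found only in the note, which is the point the paragraph above makes about needing both sources.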
Initially, the research team developed a controlled set of clinical concepts for MediClass based on common phrases found in the free-text sections of the EMR and codes found in its structured sections. The human coders also completed an extensive training program that concluded with a test. Once this initial design, training, and development were complete, the researchers had the human coders and MediClass code 125 randomly selected electronic medical records from each of the four sites. This allowed the researchers to further ‘train’ MediClass by reviewing the disagreements between the two, enhancing MediClass, and then rerunning the coding to check accuracy. Rerunning also ensured that the changes did not introduce new discrepancies. These records were then removed from the study data. The remaining 500 records were evaluated by MediClass and the human coders, and they form the final results published in the article.
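The refine-and-rerun loop amounts to diffing disagreement sets before and after each change. A minimal Python sketch follows, assuming a single reference code per record and step (for example, a consensus of the human coders); the record IDs, the `Codes` layout, and the function names are hypothetical.

```python
# record_id -> {step: present?}
Codes = dict[str, dict[str, bool]]


def disagreements(reference: Codes, system: Codes) -> set[tuple[str, str]]:
    """(record_id, step) pairs where the system's code differs from the reference."""
    return {
        (rid, step)
        for rid, steps in reference.items()
        for step, value in steps.items()
        if system.get(rid, {}).get(step) != value  # a missing code counts as a disagreement
    }


def refinement_report(reference: Codes, before: Codes, after: Codes) -> dict[str, set[tuple[str, str]]]:
    """Compare disagreements before and after a change to the classifier."""
    old = disagreements(reference, before)
    new = disagreements(reference, after)
    return {
        "fixed": old - new,       # disagreements the change resolved
        "introduced": new - old,  # regressions: new discrepancies to investigate
        "remaining": old & new,   # still unresolved
    }


reference = {"r1": {"ask": True, "advise": True}, "r2": {"ask": True, "advise": False}}
before = {"r1": {"ask": True, "advise": False}, "r2": {"ask": True, "advise": False}}
after = {"r1": {"ask": True, "advise": True}, "r2": {"ask": True, "advise": True}}

report = refinement_report(reference, before, after)
for key, pairs in report.items():
    print(key, sorted(pairs))
# fixed [('r1', 'advise')]
# introduced [('r2', 'advise')]
# remaining []
```

A non-empty `introduced` set is exactly the situation the rerun was meant to catch: a rule change that fixes one record while breaking another.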
The researchers found that for the ‘Ask’ and ‘Assist’ steps, where the data are most commonly codified, mean agreement among the human coders was statistically indistinguishable from mean agreement between the humans and MediClass (Student’s t-test, p>0.05). For the ‘Assess’ and ‘Advise’ steps, where the data are most commonly found in free text, the humans agreed with each other more often than they did with MediClass (Student’s t-test, p<0.01). ‘Arrange’ was coded too infrequently by the human abstractors to allow comparison and was dropped from the analysis. The researchers then created a ‘gold standard’ from the majority opinion of the human coders and computed the accuracy of MediClass against it. In this analysis, MediClass agreed with the gold standard 91% of the time.
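The comparisons above rest on two quantities: mean agreement among human coders versus between humans and the system, and a majority-vote gold standard. The review does not say exactly how the means were formed, so the following Python sketch assumes simple pairwise percent agreement for one A across a set of records; the names and toy data are hypothetical. With four coders a 2-2 split is possible, so ties are marked `None` for adjudication rather than resolved arbitrarily.

```python
from itertools import combinations
from statistics import mean
from typing import Optional


def agreement(a: list[bool], b: list[bool]) -> float:
    """Fraction of records on which two coders give the same code."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def mean_human_human(coders: list[list[bool]]) -> float:
    """Mean agreement over every pair of human coders."""
    return mean(agreement(a, b) for a, b in combinations(coders, 2))


def mean_human_system(coders: list[list[bool]], system: list[bool]) -> float:
    """Mean agreement between each human coder and the system."""
    return mean(agreement(c, system) for c in coders)


def majority_gold(coders: list[list[bool]]) -> list[Optional[bool]]:
    """Per-record majority vote; None marks a tie that needs adjudication."""
    gold: list[Optional[bool]] = []
    for votes in zip(*coders):
        yes = sum(votes)
        no = len(votes) - yes
        gold.append(True if yes > no else False if no > yes else None)
    return gold


def agreement_with_gold(gold: list[Optional[bool]], system: list[bool]) -> float:
    """System accuracy over records whose gold code is decided (not a tie)."""
    pairs = [(g, s) for g, s in zip(gold, system) if g is not None]
    return sum(g == s for g, s in pairs) / len(pairs)


T, F = True, False
coders = [  # four hypothetical coders, four records, one A
    [T, T, F, F],
    [T, T, F, T],
    [T, F, F, F],
    [T, T, T, T],
]
system = [T, T, F, F]
print(round(mean_human_human(coders), 3))  # 0.583
print(mean_human_system(coders, system))   # 0.75
gold = majority_gold(coders)
print(gold)                                # [True, True, False, None]
print(agreement_with_gold(gold, system))   # 1.0
```

The per-pair agreement values (rather than just their means) are what a two-sample test such as `scipy.stats.ttest_ind` would take as input if one wanted to reproduce a comparison like the study’s.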
The researchers did a thorough job of analyzing, discussing, and accounting for the lower specificity on ‘Assist’ and the lower sensitivity on ‘Advise’, which were attributable to 12 specific medical records. They concluded that using a natural language processing application to automate processing of the entire EMR to evaluate quality measures is practical and has the potential to provide high-quality measurement capabilities.
A very well-designed study that acknowledges the important need for both codified and free-text data within currently available electronic medical record systems. The authors chose a relatively simple but well-defined clinical practice guideline as the basis for their study, one that involves both codified and free-text data. They also used an innovative and thoughtful study design to counter the misconception that free-text data in electronic medical records cannot be successfully used to measure and assess adherence and/or quality. Though desirable, it is not practical in the short term for all data in electronic medical record systems to be codified. This study should encourage all of us not to see free-text data in our electronic medical record systems as a barrier to measuring health care performance, quality, and outcomes. Although this smoking-cessation CPG example may not be as complex as other CPGs, I think the study design and findings are largely generalizable and demonstrate the value of well-designed NLP technology for measuring health care quality.