Methods and dimensions of electronic health record data quality assessment: enabling reuse for clinical research
The purpose of this paper is to review the methods and dimensions of data quality assessment with regard to the secondary use of electronic health record (EHR) data for research. The goal is to develop a knowledge base of the methods used to establish whether EHR data are suitable for specific research goals.
A literature review was performed of clinical research publications discussing data quality assessment methods for EHR data. The search used standard electronic bibliographic tools. Through iterative review of the retrieved articles, broad dimensions of data quality and general categories of assessment strategies were derived.
Most of the literature reviewed focused on structured data alone (73%) or on a combination of structured and unstructured data (22%). Five dimensions of data quality were derived from the literature: completeness, correctness, concordance, plausibility, and currency. Likewise, the common methods of data quality assessment were grouped into seven categories: gold standard, data element agreement, element presence, data source agreement, distribution comparison, validity check, and log review.
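Two of the assessment methods listed above, element presence (commonly used for completeness) and validity checks (commonly used for plausibility), can be sketched on a toy structured extract. This is a minimal illustration, not the reviewed paper's procedure: the record fields, the example values, and the plausible ranges are all hypothetical, and a real study would set such bounds with clinical input.

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical EHR extract: field names and values are illustrative only.
@dataclass
class PatientRecord:
    patient_id: str
    birth_year: Optional[int]
    systolic_bp: Optional[float]  # mmHg
    hba1c: Optional[float]        # percent


def element_presence(records: List[PatientRecord], field: str) -> float:
    """Completeness: fraction of records in which `field` is populated."""
    if not records:
        return 0.0
    present = sum(1 for r in records if getattr(r, field) is not None)
    return present / len(records)


# Assumed plausibility bounds (hypothetical, for illustration only).
PLAUSIBLE_RANGES = {
    "systolic_bp": (50.0, 300.0),
    "hba1c": (3.0, 20.0),
}


def validity_check(records: List[PatientRecord], field: str) -> List[str]:
    """Plausibility: IDs of records whose populated value lies outside the assumed range."""
    low, high = PLAUSIBLE_RANGES[field]
    return [
        r.patient_id
        for r in records
        if getattr(r, field) is not None and not (low <= getattr(r, field) <= high)
    ]


records = [
    PatientRecord("p1", 1950, 128.0, 6.1),
    PatientRecord("p2", 1972, None, 7.4),
    PatientRecord("p3", 1988, 1200.0, None),  # likely a unit or data entry error
    PatientRecord("p4", None, 115.0, 5.6),
]

print(element_presence(records, "systolic_bp"))  # 0.75
print(validity_check(records, "systolic_bp"))    # ['p3']
```

Note that the two checks answer different questions: a field can be fully populated yet implausible, or plausible wherever present yet largely missing, which is one reason the dimensions are assessed separately.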
Secondary use of clinical data is essential to research. Examination of the methods clinical researchers have used to investigate the quality and suitability of EHR data shows that quality can be difficult to measure. Recommendations are given for researchers interested in reusing EHR data for clinical research: adopt a consistent taxonomy of EHR data quality; remain aware of the task dependence of data quality; integrate work on data quality assessment from other fields; and adopt systematic, empirically driven, statistically based methods of data quality assessment.
The reuse of EHR data is promising for research, but the quality of EHR-derived data is often a problem. This in turn calls for quality assessment methods to determine whether these data are suitable for a given research question.
Secondary use of health data is essential to clinical research as well as to quality improvement and patient safety. It is therefore imperative to assess the quality of EHR-derived data and whether they are suitable for specific research goals. This paper examines how EHR data quality is assessed and recommends practices for researchers making secondary use of EHR data.
Introduction and objectives
Reuse of EHR data for clinical research is of growing interest. However, there are concerns about the quality of these data and their suitability for clinical research, and studies have shown that the quality of EHR data is highly variable. Systematic methods for assessing the quality of EHR data for research are therefore needed. This paper proposes a conceptual model of the dimensions of EHR data quality and summarizes the methods that have been used to assess it.
The authors searched the literature for original studies that applied data quality assessment methods to data derived from EHRs. They abstracted the features of data quality examined and the assessment methods used, and from these derived dimensions of data quality and categories of assessment strategies. Rather than imposing an existing model from another discipline, the paper used an inductive, data-driven approach to propose a model of EHR data quality.
From 27 unique data quality terms, five dimensions of data quality were identified: completeness, correctness, concordance, plausibility, and currency. Using a similar strategy, seven categories of data quality assessment methods were identified: comparison with gold standards, data element agreement, data source agreement, distribution comparison, validity checks, log review, and element presence. A mapping between the data quality dimensions and the assessment methods shows the strength of the relationship between each dimension and each method.
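Comparison with a gold standard is the most direct of these methods: an EHR-derived value is checked against a reference source, such as manual chart review, and agreement is summarized with familiar diagnostic-accuracy statistics. The sketch below assumes a binary EHR flag (for example, "has diagnosis X") and a reference label per patient; the function name, patient IDs, and labels are hypothetical, and sensitivity, specificity, and positive predictive value are one common choice of summary rather than the reviewed paper's prescribed metric.

```python
from typing import Dict


def compare_to_gold_standard(ehr_flags: Dict[str, bool], gold: Dict[str, bool]) -> Dict[str, float]:
    """Agreement of an EHR-derived binary flag with a reference standard,
    computed only over patients present in both sources."""
    shared = ehr_flags.keys() & gold.keys()
    tp = sum(1 for p in shared if ehr_flags[p] and gold[p])
    fp = sum(1 for p in shared if ehr_flags[p] and not gold[p])
    fn = sum(1 for p in shared if not ehr_flags[p] and gold[p])
    tn = sum(1 for p in shared if not ehr_flags[p] and not gold[p])

    def ratio(num: int, den: int) -> float:
        # Undefined ratios (empty denominators) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # reference-positive cases the EHR captures
        "specificity": ratio(tn, tn + fp),  # reference-negative cases the EHR leaves unflagged
        "ppv": ratio(tp, tp + fp),          # EHR flags confirmed by the reference
        "n": len(shared),
    }


# Hypothetical EHR flags and chart-review labels for five patients.
ehr = {"p1": True, "p2": True, "p3": False, "p4": False, "p5": True}
chart_review = {"p1": True, "p2": False, "p3": True, "p4": False, "p5": True}

result = compare_to_gold_standard(ehr, chart_review)
print(result)
```

In this toy data, two of the three chart-confirmed cases are flagged in the EHR and two of the three EHR flags are confirmed. The method's main limitation is the one the review highlights: a true gold standard is rarely available for EHR data.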
This examination of data quality dimensions and assessment methods reveals several patterns and gaps in knowledge: inconsistent data quality terminology, a lack of “gold standards” for data quality assessment, reliance on an intuitive understanding of data quality, and ad hoc assessment methods. To reuse EHR data for clinical research, the clinical research community will need to adopt a consistent taxonomy of data quality, increase awareness of the task dependence of data quality, adopt data quality assessment methods from other fields, and develop standardized, systematic methods for data quality assessment.
The secondary use of EHR data is becoming common and is promising. However, problems with EHR data quality, inconsistency in how data quality is described, and the lack of generalizable assessment methods require researchers to adopt validated, systematic methods of EHR data quality assessment.
This is a good summary of the data quality issues involved in reusing EHR data for clinical research. The review examined not only the features and terms used for EHR data quality in the current literature but also the methods used to assess data quality. The five dimensions of data quality and the seven categories of assessment methods are well summarized. I found the authors' recommendation to build a taxonomy of data quality, which would enable structured discourse and contextualized assessment methodologies, particularly helpful for making EHR data suitable for clinical research.
- Weiskopf, N. G., & Weng, C. (2013). Methods and dimensions of electronic health record data quality assessment: enabling reuse for clinical research. Journal of the American Medical Informatics Association, 20(1), 144-151.