EHR Data Quality

Assessing the quality of electronic health record (EHR) data helps ensure its “fitness for use” in secondary research.

Introduction

Electronic health records (EHRs) collect clinical data to improve patient care. EHR data also have many potential secondary uses, such as billing, clinical quality assessment, and research. Because EHR data are not collected as systematically as research data, there are concerns about the validity and reproducibility of research that uses them.

Hierarchy of Data Quality Assessment

In general, individual EHR data constructs, or dimensions, are assessed for their veracity (verification) and validity.^[1] Veracity evaluates the internal validity of data, such as conformance to metadata constraints and local knowledge. External validity assesses data against external gold standards or knowledge.

Wang and Strong proposed four categories to classify data quality constructs:^[2] intrinsic, contextual, representational, and accessibility. Intrinsic data quality includes concordance, correctness, and plausibility. Contextual data quality consists of completeness, currency, granularity, and signal-to-noise. Representational data quality comprises fragmentation and structuredness.^[3]

More recent harmonization of data quality assessments suggests use of “conformance,” which falls under the category of representational data quality.^[1]^[4]

Data Quality Construct Categories

Intrinsic data quality: quality of data independent of its intended use, such as accuracy, believability, objectivity, and reputation of data
Contextual data quality: quality of data for its intended task
Representational data quality: format (concise and consistent representation) and meaning (interpretability) of data
Accessibility: data accessible (available, retrievable, up-to-date) to data consumers

Data Quality Constructs/Dimensions

Completeness: complete representation of a patient in the EHR, with sufficient data for each patient, each health concept, and each measurement time
Correctness: accuracy of EHR elements in representation of the patient
Concordance (consistency): agreement between EHR data elements
Plausibility (validity): agreement between EHR data elements and general medical knowledge
1. Uniqueness plausibility: individual patient represented by a single, non-duplicated record
2. Atemporal plausibility: data values and distributions agree with internal and external knowledge or reference standards
3. Temporal plausibility: sequences of values conform to expected temporal properties
Currency: measurements updated recently enough to represent the patient's state at a given point in time
Granularity: sufficient information in representing a health concept
Fragmentation: a complete concept composed of EHR data elements stored in multiple locations
Signal-to-noise: “information of interest can be distinguished from irrelevant data”^[3]
Structuredness: “data recorded in a format that enables reliable extraction”^[3]
Conformance: compliance of EHR data elements with “internal or external formatting, relational, or computational definitions.”^[1] Conformance can be further categorized into value, relational, and computational conformance:
1. Value conformance: agreement of EHR data elements with a constraint-driven data architecture (e.g., data type, acceptable range of values)
2. Relational conformance: agreement of EHR data with the structural constraints of the database architecture (e.g., primary and foreign key relationships)
3. Computational conformance: agreement between a calculated value and the value entered elsewhere (e.g., BMI calculated from weight and height agrees with the entered BMI)
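
The three kinds of conformance can be illustrated with a minimal sketch. It is not taken from the cited framework: the record layout, the field names (`patient_id`, `sex`, `birth_date`, `weight_kg`, `height_m`, `bmi`), the allowed value set, and the BMI tolerance are all assumptions chosen for illustration.

```python
from datetime import date

# Hypothetical records; field names and values are illustrative assumptions.
patients = [
    {"patient_id": 1, "sex": "F", "birth_date": date(1980, 5, 17)},
    {"patient_id": 2, "sex": "X9", "birth_date": date(1975, 1, 2)},
]
measurements = [
    {"patient_id": 1, "weight_kg": 70.0, "height_m": 1.75, "bmi": 22.9},
    {"patient_id": 2, "weight_kg": 90.0, "height_m": 1.80, "bmi": 31.5},
    {"patient_id": 3, "weight_kg": 60.0, "height_m": 1.60, "bmi": 23.4},
]

ALLOWED_SEX = {"F", "M", "U"}  # assumed value set, not a standard vocabulary


def value_conformance(rows):
    """Return patient_ids whose fields violate data type or value-set constraints."""
    bad = []
    for r in rows:
        ok = isinstance(r["birth_date"], date) and r["sex"] in ALLOWED_SEX
        if not ok:
            bad.append(r["patient_id"])
    return bad


def relational_conformance(child_rows, parent_rows):
    """Return foreign keys in child_rows that match no primary key in parent_rows."""
    parent_ids = {p["patient_id"] for p in parent_rows}
    return sorted({c["patient_id"] for c in child_rows} - parent_ids)


def computational_conformance(rows, tolerance=0.1):
    """Return patient_ids whose entered BMI differs from weight / height**2."""
    bad = []
    for r in rows:
        computed = r["weight_kg"] / r["height_m"] ** 2
        if abs(computed - r["bmi"]) > tolerance:
            bad.append(r["patient_id"])
    return bad


print(value_conformance(patients))              # sex code outside the value set
print(relational_conformance(measurements, patients))  # orphaned measurements
print(computational_conformance(measurements))  # entered BMI disagrees with formula
```

With this sample data, patient 2 fails value conformance, the measurement for patient 3 fails relational conformance because no such patient record exists, and patient 2 fails computational conformance. In practice the tolerance and the value set would come from the local data dictionary.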

Methods of Data Quality Assessment

Data quality is task-dependent: the operationalization of constructs and the method(s) used to assess them should depend on the dataset and the intended research. A literature review by Weiskopf and Weng identified 7 general methods used to assess data quality:^[5]

Gold standard: comparison of EHR data elements to a gold standard (e.g., data from one or multiple alternative sources)
Data element agreement: concordance between data elements within EHR
Element presence: presence of desired or expected data
Data source agreement: concordance between EHR data elements and data from another source
Distribution comparison: comparison of aggregated data to external sources (e.g., disease prevalence)
Validity check: face validity (plausibility) of value
Log review: review of metadata
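
Several of these methods reduce to simple checks over a data extract. The following is a rough sketch of four of them, not an implementation from the cited review: the field names, the heart-rate range, the insulin/diagnosis agreement rule, and the reference prevalence are hypothetical and would need to be replaced by task-specific definitions.

```python
# Hypothetical extract; field names, ranges, and the reference rate are
# illustrative assumptions, not values from the cited literature.
records = [
    {"patient_id": 1, "heart_rate": 72, "has_diabetes_dx": True, "on_insulin": True},
    {"patient_id": 2, "heart_rate": None, "has_diabetes_dx": False, "on_insulin": True},
    {"patient_id": 3, "heart_rate": 410, "has_diabetes_dx": False, "on_insulin": False},
    {"patient_id": 4, "heart_rate": 88, "has_diabetes_dx": False, "on_insulin": False},
]


def element_presence(rows, field):
    """Element presence: fraction of rows in which the field is recorded."""
    present = sum(1 for r in rows if r[field] is not None)
    return present / len(rows)


def validity_check(rows, field, low, high):
    """Validity check: ids whose recorded value falls outside [low, high]."""
    return [r["patient_id"] for r in rows
            if r[field] is not None and not (low <= r[field] <= high)]


def data_element_agreement(rows):
    """Data element agreement: ids on insulin without a diabetes diagnosis.

    The rule itself is an assumed example of two elements that should agree.
    """
    return [r["patient_id"] for r in rows
            if r["on_insulin"] and not r["has_diabetes_dx"]]


def distribution_comparison(rows, field, reference_rate):
    """Distribution comparison: observed rate minus an external reference rate."""
    observed = sum(1 for r in rows if r[field]) / len(rows)
    return observed - reference_rate


print(element_presence(records, "heart_rate"))
print(validity_check(records, "heart_rate", low=20, high=300))
print(data_element_agreement(records))
print(distribution_comparison(records, "has_diabetes_dx", reference_rate=0.10))
```

Here heart rate is present in three of four records, patient 3 fails the validity check, and patient 2 shows disagreement between medication and diagnosis. Gold standard comparison, data source agreement, and log review need a second data source or audit metadata, so they are left out of the sketch.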

References

[1] Kahn MG, Callahan TJ, Barnard J, et al. A harmonized data quality assessment terminology and framework for the secondary use of electronic health record data. EGEMS (Wash DC). 2016;4(1):1244. Published 2016 Sep 11. doi:10.13063/2327-9214.1244.
[2] Wang RY, Strong DM. Beyond accuracy: what data quality means to data consumers. J Manag Inf Syst. 1996;12(4):5-34. doi:10.1080/07421222.1996.11518099.
[3] Weiskopf NG, Bakken S, Hripcsak G, Weng C. A data quality assessment guideline for electronic health record data reuse. EGEMS (Wash DC). 2017;5(1):14. Published 2017 Sep 4. doi:10.5334/egems.218.
[4] Lee K, Weiskopf N, Pathak J. A framework for data quality assessment in clinical research datasets. AMIA Annu Symp Proc. 2018;2017:1080-1089. Published 2018 Apr 16.
[5] Weiskopf NG, Weng C. Methods and dimensions of electronic health record data quality assessment: enabling reuse for clinical research. J Am Med Inform Assoc. 2013;20(1):144-151. doi:10.1136/amiajnl-2011-000681.

Submitted by Frances Hsu.
