EHR Data Quality

From Clinfowiki
Revision as of 00:58, 28 October 2021 by Hsufr (Talk | contribs)

Assessing the quality of electronic health record (EHR) data helps establish its “fitness for use” in secondary research.

Introduction

Electronic health records collect clinical data to improve patient care. These data have many potential secondary uses, such as billing, clinical quality assessment, and research. However, EHR data are not collected as systematically as research data, which raises concerns about the validity and reproducibility of research that uses them.

Hierarchy of Data Quality Assessment

EHR data quality constructs (dimensions) are assessed for veracity (verification) and validity (validation).[1] Verification evaluates internal validity, such as conformance to metadata constraints and local knowledge. Validation evaluates external validity by comparing data against external gold standards or knowledge.

Constructs of data quality fall under four categories proposed by Wang and Strong: intrinsic, contextual, representational, and accessibility.[2] Intrinsic data quality includes concordance, correctness, and plausibility. Contextual data quality consists of completeness, currency, granularity, and signal-to-noise. Representational data quality comprises fragmentation and structuredness.[3]

A more recent harmonization of data quality assessment terminology suggests the use of “conformance,” which falls under the category of representational data quality.[1][4]

Data Quality Construct Categories

  • Intrinsic data quality: quality of data independent of its intended use; includes accuracy, believability, objectivity, and reputation of data
  • Contextual data quality: quality of data for its intended task
  • Representational data quality: format (concise and consistent representation) and meaning (interpretability)
  • Accessibility: data are accessible (available, retrievable, and up to date) to data consumers

Data Quality Constructs/Dimensions

  • Completeness: complete representation of a patient in the EHR with sufficient data for each patient, each health concept, and for each measurement time
  • Correctness: accuracy with which EHR data elements represent the patient
  • Concordance (consistency): agreement between EHR data elements
  • Plausibility (validity): agreement between EHR data elements and general medical knowledge
    1. Uniqueness plausibility: individual patient represented by a single, non-duplicated record
    2. Atemporal plausibility: data values and distributions agree with internal and external knowledge or reference standards
    3. Temporal plausibility: sequence of values conforms to expected temporal properties
  • Currency: measurements are updated often enough to represent the patient's state at a given point in time
  • Granularity: sufficient detail to represent a health concept
  • Fragmentation: a single concept whose EHR data elements are stored in multiple locations
  • Signal-to-noise: “information of interest can be distinguished from irrelevant data”[3]
  • Structuredness: “data recorded in a format that enables reliable extraction”[3]
  • Conformance: compliance of EHR data elements with “internal or external formatting, relational, or computational definitions.”[1] Conformance can be further divided into value, relational, and computational conformance
    1. Value conformance: agreement of EHR data elements with the constraint-driven data architecture (e.g., data type, acceptable range of values)
    2. Relational conformance: agreement of EHR data with the structural constraints of the database architecture (e.g., primary and foreign key relationships)
    3. Computational conformance: agreement between a calculated value and the corresponding value entered elsewhere (e.g., BMI calculated from weight and height agrees with the entered BMI)
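
The three kinds of conformance above can be illustrated with a minimal Python sketch. The `Vitals` record, its field names, the acceptable ranges, and the BMI tolerance are all assumptions made for illustration; they do not come from the cited frameworks, and a real assessment would take them from the local data dictionary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Vitals:
    """Hypothetical vitals record; field names are illustrative only."""
    patient_id: str
    weight_kg: Optional[float]
    height_m: Optional[float]
    bmi: Optional[float]


# Assumed acceptable ranges; real limits belong in the local data dictionary.
RANGES = {"weight_kg": (0.2, 650.0), "height_m": (0.2, 2.8), "bmi": (5.0, 200.0)}


def value_conformance(v: Vitals) -> list[str]:
    """Value conformance: check data type and acceptable range per field."""
    issues = []
    for field, (low, high) in RANGES.items():
        value = getattr(v, field)
        if value is None:
            continue  # missing values are a completeness issue, not conformance
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            issues.append(f"{field}: not numeric")
        elif not low <= value <= high:
            issues.append(f"{field}: {value} outside [{low}, {high}]")
    return issues


def relational_conformance(v: Vitals, known_patient_ids: set[str]) -> bool:
    """Relational conformance: foreign-key style check against the patient table."""
    return v.patient_id in known_patient_ids


def computational_conformance(v: Vitals, tolerance: float = 0.5) -> Optional[bool]:
    """Computational conformance: entered BMI vs. BMI recomputed as kg / m^2.

    Returns None when an input is missing, so the check does not apply.
    """
    if v.weight_kg is None or v.height_m is None or v.bmi is None or v.height_m <= 0:
        return None
    computed = v.weight_kg / (v.height_m ** 2)
    return abs(computed - v.bmi) <= tolerance
```

A height entered in centimeters (for example `175.0` in a field expected to hold meters) would fail both the value check and the computational check in this sketch, which shows how the same underlying error can surface under more than one construct.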

Methods of Data Quality Assessment

Data quality is task-dependent: how each construct is operationalized, and which methods are used to assess it, should depend on the dataset and the intended study. A literature review by Weiskopf and Weng identified seven general methods used to assess data quality.[5]

  • Gold standard: comparison of EHR data element to gold standard (e.g., data from one or multiple alternative sources)
  • Data element agreement: concordance between data elements within EHR
  • Element presence: presence of desired or expected data
  • Data source agreement: concordance between an EHR data element and data from another source
  • Distribution comparison: comparison of aggregated data to external sources (e.g., disease prevalence)
  • Validity check: face validity (plausibility) of value
  • Log review: review of metadata
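
Four of the methods above (element presence, validity check, data element agreement, and distribution comparison) can be sketched in a few lines of Python. This is an illustrative sketch only: the record layout, field names, blood pressure limits, the medication/diagnosis rule, and the reference prevalence are hypothetical placeholders, not values taken from the cited review.

```python
from typing import Any

Record = dict[str, Any]  # hypothetical flat patient record


def element_presence(records: list[Record], field: str) -> float:
    """Element presence: fraction of records where the field is recorded."""
    if not records:
        return 0.0
    return sum(r.get(field) is not None for r in records) / len(records)


def validity_check(records: list[Record], field: str,
                   low: float, high: float) -> list[Record]:
    """Validity check: records whose recorded value lies outside [low, high]."""
    return [r for r in records
            if r.get(field) is not None and not low <= r[field] <= high]


def data_element_agreement(records: list[Record], medication_flag: str,
                           diagnosis_flag: str) -> list[Record]:
    """Data element agreement: medication recorded without the matching diagnosis.

    A disagreement is a prompt for review, not proof of an error, because a
    drug may be prescribed for another indication.
    """
    return [r for r in records
            if r.get(medication_flag) and not r.get(diagnosis_flag)]


def distribution_comparison(records: list[Record], flag: str,
                            reference_prevalence: float) -> float:
    """Distribution comparison: |observed prevalence - external reference|."""
    if not records:
        raise ValueError("no records to compare")
    observed = sum(bool(r.get(flag)) for r in records) / len(records)
    return abs(observed - reference_prevalence)
```

Each function returns either a summary number or the offending records rather than a pass/fail verdict, since the acceptable threshold depends on the intended study.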

References

  1. Kahn MG, Callahan TJ, Barnard J, et al. A harmonized data quality assessment terminology and framework for the secondary use of electronic health record data. EGEMS (Wash DC). 2016;4(1):1244. Published 2016 Sep 11. doi:10.13063/2327-9214.1244.
  2. Wang RY, Strong DM. Beyond accuracy: what data quality means to data consumers. J Manag Inf Syst. 1996;12(4):5-34. doi:10.1080/07421222.1996.11518099.
  3. Weiskopf NG, Bakken S, Hripcsak G, Weng C. A data quality assessment guideline for electronic health record data reuse. EGEMS (Wash DC). 2017;5(1):14. Published 2017 Sep 4. doi:10.5334/egems.218.
  4. Lee K, Weiskopf N, Pathak J. A framework for data quality assessment in clinical research datasets. AMIA Annu Symp Proc. 2018;2017:1080-1089. Published 2018 Apr 16.
  5. Weiskopf NG, Weng C. Methods and dimensions of electronic health record data quality assessment: enabling reuse for clinical research. J Am Med Inform Assoc. 2013;20(1):144-151. doi:10.1136/amiajnl-2011-000681.


Submitted by Frances Hsu.