Difference between revisions of "Health Record De-identification and Anonymization"

From Clinfowiki

Revision as of 00:15, 5 May 2024

The secondary use of health data is central to biomedical research. For instance, clinical trial datasets made available to the wider scientific community after the end of a trial contain considerable amounts of data that were not analyzed as part of the published results and can be used for meta-analyses[1]. Given how time- and resource-intensive clinical data collection is, failing to use the data fully is wasteful. As another example, there is an ever-growing need for health records to train deep learning models for health AI. However, data availability is the foremost barrier to developing higher-performing models, which require large, annotated training datasets[2]. Health data is not only expensive, proprietary, and siloed, but also heavily regulated to protect patient privacy. One way to circumvent the data availability issue is to generate high-fidelity synthetic electronic health records (EHRs), but the latest research is limited in scope to structured data[3]. Unstructured data is rich with insights, captures granular details, and comprises the majority (around 80%) of EHR data[4]. De-identification or anonymization of health data is therefore a key and rapidly evolving area of health data analytics. Indeed, enabling and supporting biomedical research is the purpose of de-identification and anonymization most frequently cited by researchers in the field[5].