Results from simulated data sets: probabilistic record linkage outperforms deterministic record linkage

From Clinfowiki
Revision as of 00:03, 5 November 2015 by Ooabiri (Talk | contribs)


Tromp, M., Ravelli, A. C., Bonsel, G. J., Hasman, A., & Reitsma, J. B. (2011)


Introduction

Record linkage is possible when a set of partially identifying variables from different sources is combined to uniquely identify a patient. Two frequently applied strategies are deterministic record linkage (DRL) and probabilistic record linkage (PRL). With DRL, all linkage variables, or a predefined subset of them, have to agree for a pair to be considered a link. The PRL approach uses probability-based weights for agreement or disagreement between the paired variables to decide whether two records match. Two types of errors can occur in linked records: a false nonlink is the failure to link two records that truly belong to the same person, and a false link is the linking of two records that belong to different persons. [1]
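The two decision rules described above can be sketched in a few lines of Python. This is only an illustration and is not taken from the article: the variable names, the m and u probabilities (the chance that a variable agrees for the same person and for different persons, respectively), the log2 agreement weights, and the threshold of 5.0 are all assumed values chosen for the example.

```python
import math

# Hypothetical linking variables with illustrative probabilities (assumptions):
# m = P(variable agrees | both records belong to the same person)
# u = P(variable agrees | records belong to different persons)
M_U = {
    "birth_date": (0.97, 0.01),
    "postal_code": (0.90, 0.05),
    "sex": (0.99, 0.50),
    "surname": (0.95, 0.02),
}

def deterministic_link(rec_a, rec_b, required=None):
    """DRL: link only if every required variable agrees (default: all variables)."""
    required = set(M_U) if required is None else required
    return all(rec_a[v] == rec_b[v] for v in required)

def probabilistic_weight(rec_a, rec_b):
    """PRL: sum an agreement or disagreement weight for each variable."""
    total = 0.0
    for v, (m, u) in M_U.items():
        if rec_a[v] == rec_b[v]:
            total += math.log2(m / u)              # positive weight for agreement
        else:
            total += math.log2((1 - m) / (1 - u))  # negative weight for disagreement
    return total

def probabilistic_link(rec_a, rec_b, threshold=5.0):
    """Link when the total weight reaches the (assumed) threshold."""
    return probabilistic_weight(rec_a, rec_b) >= threshold

# Same person, but the postal code was registered differently in the second source.
a = {"birth_date": "1980-02-01", "postal_code": "1011AB", "sex": "F", "surname": "Jansen"}
b = {"birth_date": "1980-02-01", "postal_code": "1012CD", "sex": "F", "surname": "Jansen"}

full_drl = deterministic_link(a, b)   # False: one variable disagrees
prl = probabilistic_link(a, b)        # True: the other three agreements outweigh it
```

In this example the full deterministic rule produces a false nonlink, while the weighted rule tolerates the single disagreement. A deterministic rule restricted to a subset, such as required={"birth_date", "sex", "surname"}, would also link the pair, but only if that subset is chosen in advance.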

Method

The authors used simulated data sets that mimicked a range of realistic scenarios by varying the amount of registration error and the discriminating power of the linking variables. They created a total of 10 scenarios and, for each scenario, compared the results of the two linkage strategies. Two data sets with four linking variables and a specified sample size were used in the study. They simulated 100 data sets and compared the true linkage status with the linkage result obtained from each of the two methods. [1]
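A minimal sketch of this kind of evaluation is given below. It does not reproduce the authors' simulation: the function names, the way errors are injected, and all parameter values are assumptions made for illustration. The error_rate parameter stands in for registration errors, and n_values (the number of distinct values each variable can take) stands in for discriminating power.

```python
import random

def simulate_pairs(n_pairs, error_rate, n_values=50, n_vars=4, seed=0):
    """Build record pairs with known true status; every other pair is a true match."""
    rng = random.Random(seed)
    pairs, truth = [], []
    for i in range(n_pairs):
        person = [rng.randrange(n_values) for _ in range(n_vars)]
        if i % 2 == 0:
            # Same person: copy the record, re-drawing each variable with
            # probability error_rate to imitate a registration error.
            other = [rng.randrange(n_values) if rng.random() < error_rate else x
                     for x in person]
            truth.append(True)
        else:
            # Different person: an independently drawn record.
            other = [rng.randrange(n_values) for _ in range(n_vars)]
            truth.append(False)
        pairs.append((person, other))
    return pairs, truth

def count_errors(pairs, truth, decide):
    """Compare a linkage decision with the true status of each pair."""
    false_nonlinks = sum(1 for (a, b), t in zip(pairs, truth) if t and not decide(a, b))
    false_links = sum(1 for (a, b), t in zip(pairs, truth) if not t and decide(a, b))
    return false_nonlinks, false_links

def full_agreement(a, b):
    """Full deterministic rule: all linking variables must agree."""
    return a == b

pairs, truth = simulate_pairs(n_pairs=200, error_rate=0.3)
false_nonlinks, false_links = count_errors(pairs, truth, full_agreement)
```

Any other decision function with the same two-argument signature, for example a weighted probabilistic rule, can be passed as decide to compare strategies on the same simulated pairs.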

Results

The probabilistic strategy outperformed the deterministic strategy in all scenarios. In linking situations with few or no errors and a powerful discriminating key, the simple full deterministic strategy performed as well as the probabilistic approach. The full deterministic strategy produced the lowest number of false positive links, at the expense of missing a considerable number of matches. [1]

Conclusion

PRL was found to be more flexible, and it provides data about the quality of the linkage process that can help minimize linkage errors for a given data set. [1]

Remarks about the article

The article is an important study that can help organizations decide which strategy to adopt, given the characteristics of the data to be linked and the inherent shortcomings of both strategies. However, its structure makes it difficult to read, and it does not clearly spell out a conclusion except in the abstract.

Related Topics

Master patient index

Performance of probabilistic method to detect duplicate individual case safety reports

Matching identifiers in electronic health records: implications for duplicate records and patient safety

Improving record linkage performance in the presence of missing linkage data

Reference

  1. Tromp, M., Ravelli, A. C., Bonsel, G. J., Hasman, A., & Reitsma, J. B. (2011). Results from simulated data sets: probabilistic record linkage outperforms deterministic record linkage. http://www.jclinepi.com/article/S0895-4356(10)00225-8/pdf/