Analysis of a Probabilistic Record Linkage Technique without Human Review

From Clinfowiki

Revision as of 00:46, 19 October 2015

Article by: Grannis, S. J., Overhage, J. M., Hui, S., & McDonald, C. J. (2003)


Introduction

Record linkage is the process of combining information about an individual, family, or entity from two or more databases. One approach is probabilistic linkage without human review. In this approach, an algorithm computes a match likelihood score for each record pair and compares it with a predetermined threshold: pairs scoring above the threshold are declared links, and pairs scoring below it are declared non-links. [1]
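As a minimal sketch of the decision rule just described, the hypothetical Python function `classify_pairs` below labels scored record pairs against a fixed threshold. The pair identifiers, scores, and threshold value are invented for illustration, and treating a score exactly equal to the threshold as a non-link is an assumption, since the rule only covers scores above or below it.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]  # hypothetical (record ID in source A, record ID in source B)


def classify_pairs(scores: Dict[Pair, float], threshold: float) -> Dict[Pair, str]:
    """Label each record pair as a link or non-link by comparing its score with a threshold."""
    labels: Dict[Pair, str] = {}
    for pair, score in scores.items():
        # Assumption: only scores strictly above the threshold become links.
        labels[pair] = "link" if score > threshold else "non-link"
    return labels


# Usage with invented scores:
labels = classify_pairs({("A1", "B7"): 14.2, ("A2", "B3"): -3.5}, threshold=5.0)
```

Here `("A1", "B7")` is labeled a link and `("A2", "B3")` a non-link. No reviewer inspects borderline pairs, which is what makes the choice of threshold so important in this approach.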

Method

The authors compared the performance of a deterministic method (from a previous study) with that of an unsupervised probabilistic method, using the same gold-standard datasets from two hospital registries. In this study, the authors generated a match likelihood score for each record pair using the Fellegi-Sunter model, which sums the component weights of each identifier in the record pair. Each pair was then labeled as linked or non-linked. To avoid human review, the authors used an estimator based on the Expectation Maximization (EM) algorithm to establish a single true-link threshold.
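The scoring and unsupervised estimation steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: each identifier comparison is reduced to a binary agree/disagree flag, identifiers are assumed conditionally independent given link status, and the function names (`em_fellegi_sunter`, `fs_score`, `posterior_threshold`), the starting values, and the rule that places the threshold where the estimated posterior link probability reaches 0.5 are all hypothetical choices. The paper's own threshold estimator may differ.

```python
import math
from typing import List, Sequence, Tuple

EPS = 1e-6  # keeps probabilities strictly inside (0, 1) so the logarithms stay finite


def _clamp(x: float) -> float:
    return min(max(x, EPS), 1 - EPS)


def em_fellegi_sunter(
    patterns: Sequence[Sequence[int]],
    iters: int = 100,
    p: float = 0.1,
    m0: float = 0.9,
    u0: float = 0.1,
) -> Tuple[float, List[float], List[float]]:
    """Estimate p (share of true links), m_k = P(agree | link) and
    u_k = P(agree | non-link) from unlabeled 0/1 agreement vectors."""
    n_fields = len(patterns[0])
    m = [m0] * n_fields
    u = [u0] * n_fields
    for _ in range(iters):
        # E-step: posterior probability that each record pair is a true link.
        g = []
        for gamma in patterns:
            like_m, like_u = p, 1.0 - p
            for agree, mk, uk in zip(gamma, m, u):
                like_m *= mk if agree else 1.0 - mk
                like_u *= uk if agree else 1.0 - uk
            g.append(like_m / (like_m + like_u))
        # M-step: re-estimate parameters from the posterior-weighted counts.
        w_link = max(sum(g), EPS)
        w_nonlink = max(len(g) - sum(g), EPS)
        p = _clamp(sum(g) / len(g))
        m = [_clamp(sum(gj * gamma[k] for gj, gamma in zip(g, patterns)) / w_link)
             for k in range(n_fields)]
        u = [_clamp(sum((1.0 - gj) * gamma[k] for gj, gamma in zip(g, patterns)) / w_nonlink)
             for k in range(n_fields)]
    return p, m, u


def fs_score(gamma: Sequence[int], m: Sequence[float], u: Sequence[float]) -> float:
    """Fellegi-Sunter match weight: sum of per-identifier log2 likelihood ratios."""
    return sum(
        math.log2(mk / uk) if agree else math.log2((1.0 - mk) / (1.0 - uk))
        for agree, mk, uk in zip(gamma, m, u)
    )


def posterior_threshold(p: float) -> float:
    """Score above which the estimated P(link | pattern) exceeds 0.5 (illustrative rule)."""
    return math.log2((1.0 - p) / p)
```

Because the summed weight equals log2 of the likelihood ratio P(pattern | link) / P(pattern | non-link) under the independence assumption, the posterior link probability exceeds 0.5 exactly when the score exceeds log2((1 - p) / p). On synthetic agreement vectors (invented, not study data), a full-agreement pattern such as `[1, 1, 1]` should score above `posterior_threshold(p)` and a full-disagreement pattern below it.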

Results

For registry A, the authors reported true-link and identifier agreement of 99.98% with manual review and 99.80% with the EM estimator; for registry B, the corresponding values were 99.99% and 99.89%. The authors also reported that the probabilistic method outperformed the deterministic method, improving sensitivity by about 6 to 7 percent with a minimal decrease in specificity.
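Sensitivity and specificity in this setting can be computed from counts of true links and true non-links in a gold-standard dataset. The sketch below is illustrative only; the function name `sensitivity_specificity` is hypothetical, and the counts in the usage line are invented rather than figures from the study.

```python
from typing import Tuple


def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> Tuple[float, float]:
    """Compute sensitivity and specificity of a linkage method against a gold standard."""
    sensitivity = tp / (tp + fn)  # share of true links that the method linked
    specificity = tn / (tn + fp)  # share of true non-links that the method left unlinked
    return sensitivity, specificity


# Usage with invented counts:
sens, spec = sensitivity_specificity(tp=90, fn=10, tn=990, fp=10)
```

With these counts the sensitivity is 0.9 and the specificity is 0.99. Comparing the deterministic and probabilistic methods amounts to computing both values for each method against the same gold standard.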


Conclusion

In record linkage settings where human review is impractical or impossible, the EM algorithm accurately estimated the linkage parameters.

Remarks about the article

The methodology used in this study is limited to small datasets. It is also limited in that the authors did not take minor spelling variations and typographical errors in the data into consideration. It would also have been helpful for the authors to provide a website where reviewers and critics could reproduce their results or run their algorithm on sample datasets to test the reported accuracy.


Related Topics

Master patient index

Performance of probabilistic method to detect duplicate individual case safety reports

Matching identifiers in electronic health records: implications for duplicate records and patient safety

Reference

  1. Grannis, S. J., Overhage, J. M., Hui, S., & McDonald, C. J. (2003). Analysis of a Probabilistic Record Linkage Technique without Human Review. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1479910/