Patient Matching Algorithms

From Clinfowiki

What is patient matching

The US does not currently have a National Patient Health Identification Number, though there is mounting support within the health information industry to develop one.[1] Because no national master patient index exists, healthcare organizations and vendors use patient or record matching algorithms to identify patients, match records, and build and maintain their internal master patient indexes.[2]


IHE (Integrating the Healthcare Enterprise) has developed two profiles for use by HIEs for querying patient data: Patient Identifier Cross Referencing (PIX)/Patient Demographics Query (PDQ) and Cross-Community Patient Discovery (XCPD). These profiles govern the transmission of a patient’s demographic information, the query for matching patient data at the target organization, and the formulation of a response to that query. Yet no standard set of patient identifying or demographic data is currently mandated for identifying patients at the time of service or for record matching within and across healthcare information systems. Organizations rely on internal patient access policies and data governance principles to maintain the fidelity of their internal master patient index. The risks and failure rates of current patient matching algorithms are underrecognized. With an error rate of less than 8% considered the industry standard, contemporary patient matching algorithms fall well short of the 0.1% error rate advocated by the HIT Standards Committee.[2][3]

Health Information Management professionals and patient safety advocates have long recognized the importance of strong patient identification methodology. With the promotion of data exchange and interoperability fostered by the ONC’s Meaningful Use incentive program, the need to address the gaps in patient identification and matching has gained increased focus across the HIT industry.

Many in the healthcare IT industry[4] (but not all[5]) have advocated for the development of the National Patient Health Identification Number (PHIN) as a panacea for the patient matching challenges. The 1996 Health Insurance Portability and Accountability Act (HIPAA) directed Health and Human Services (HHS) to develop a PHIN to support the legislation’s primary goal of portable and private healthcare records. Funding for this HHS work was withdrawn by Congress in 1998. Quashing of PHIN development was precipitated by privacy rights advocates and libertarians concerned about the undue intrusion on the lives of US citizens and the threat of hacking and identity theft.

In the context of this void, health information exchange organizations, EHR vendors, healthcare delivery organizations, and payers have developed algorithms, either proprietary or open-source, to perform patient matching. And while patient matching is a cornerstone of contemporary interoperability and data sharing strategies, there is relatively little appreciation for how complex and error-prone the process really is. Moreover, there is insufficient governance or standardization around the data used by these algorithms. In the last year, the ONC convened a patient identification and matching initiative to identify opportunities around this critical area of health information technology.

The fundamentals

Record matching algorithms are often embedded into core systems and are largely assumed to be sufficiently accurate. Before the push to exchange data across disparate systems, that assumption may have been safe. But as regional HIEs have grown into national players and as vendors have consolidated their customer base, the volumes of patients and records to be matched have grown exponentially. This growth is challenging algorithms that had heretofore functioned well within a local framework or for a tightly fused integrated delivery network (IDN).[6] One integrated delivery network reported 90% matching within individual instances of a vendor EHR, but when matching was attempted across the IDN's 17 different instances of the same vendor's EHR, the accuracy of the patient matching dropped to 50-60%.[2] Additionally, in an increasingly multicultural and mobile society with fluid names and addresses, any one patient's demographics may change many times over a lifetime.[3] In a 2015 opinion piece, the author reports that the Harris County, Texas Hospital District’s database contains medical records for close to 2,500 people named Maria Garcia, and 231 of them have the same birth date.[7] This example brings the issue of accuracy in patient matching into focus.

How does the accuracy of patient matching algorithms affect key priorities of US healthcare reform?

Safety and Quality:

Patient matching algorithms dictate the integrity and fidelity of the information contained within any healthcare record. As robust data exchange expands, mismatched records and missing data jeopardize the very improvements in safety and quality that data liquidity and portability are intended to foster. While the absence of data during a patient encounter diminishes the efficiency and cost-effectiveness of the care provided, inaccurate or misattributed information has the potential to cause egregious errors. False-positive matches link the wrong patients' records together. False-negative matches miss an important linkage between a patient and some part of his or her record.[8]
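The two error types can be quantified once a sample of record pairs has been manually reviewed. The following is a minimal Python sketch; the function name and the input shape (a list of predicted/true pairs) are assumptions made for illustration and do not come from any of the cited reports.

```python
def matching_error_rates(decisions):
    """Compute false-positive and false-negative rates for pairwise matching.

    decisions: list of (predicted_match, true_match) booleans, one per record pair.
    """
    decisions = list(decisions)
    true_matches = sum(1 for _, truth in decisions if truth)
    true_non_matches = len(decisions) - true_matches
    # False positive: the algorithm linked two records that belong to different patients.
    false_pos = sum(1 for pred, truth in decisions if pred and not truth)
    # False negative: the algorithm failed to link two records of the same patient.
    false_neg = sum(1 for pred, truth in decisions if not pred and truth)
    return {
        "false_positive_rate": false_pos / true_non_matches if true_non_matches else 0.0,
        "false_negative_rate": false_neg / true_matches if true_matches else 0.0,
    }
```

Reporting the two rates separately is a deliberate choice here, since a wrongly merged record and a missed linkage carry different clinical risks.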


Overlaid records can result in information being disclosed to the wrong individual or being denied to an individual who has the right to see their own healthcare information.


Remediation of inaccurate patient matching is costly to healthcare organizations and health information exchanges alike. Most organizations employ teams of staff to work error queues of suspected mismatches. The process to rectify record duplication and erroneous merging is labor-intensive and expensive. Matching errors in one organization can replicate exponentially when those same mismatched records are conveyed via HIE to downstream consumers of the data. Organizations report that it costs between $60 and $90 to remediate one incorrect record match.[2] Exacerbating the problem, while most EHRs accommodate merging records, many do not easily support undoing an incorrect merge, making that remediation task particularly costly.
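To give a sense of scale, the per-record remediation cost can be multiplied out for a given volume. This is a back-of-the-envelope Python sketch: the $60 and $90 figures come from the report cited above, while the function name and the match volume used in the usage note are hypothetical.

```python
def remediation_cost_range(matches_attempted, error_rate, cost_low=60.0, cost_high=90.0):
    """Return (low, high) estimated remediation cost in dollars.

    matches_attempted and error_rate are inputs the caller must supply;
    the default per-record costs are the $60 and $90 figures reported in [2].
    """
    incorrect_matches = round(matches_attempted * error_rate)
    return incorrect_matches * cost_low, incorrect_matches * cost_high
```

For example, a hypothetical organization attempting 100,000 matches at an 8% error rate would face 8,000 incorrect matches, or roughly $480,000 to $720,000 in remediation work under these assumptions.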

Types of patient matching algorithms and other related terms

Matching algorithms can be embedded within health information systems, developed in-house, or supplied by third-party patient matching vendors who specialize in this service as a bolt-on module.[9]

Basic Algorithms:

  • also called deterministic matching
  • the algorithm compares selected elements within specified fields to identify one-for-one character matches
  • can include phonetic matches and wildcards
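A basic deterministic comparison can be sketched in a few lines of Python. This is an illustration only: the field names, the `normalize` helper, and the choice of shell-style wildcards from the standard library's `fnmatch` module are assumptions, and phonetic matching is left out.

```python
from fnmatch import fnmatchcase

# Hypothetical field names; real systems choose their own identifying fields.
KEY_FIELDS = ("last_name", "first_name", "dob")

def normalize(value):
    """Upper-case and strip surrounding whitespace so formatting noise is ignored."""
    return value.strip().upper()

def deterministic_match(rec_a, rec_b, fields=KEY_FIELDS):
    """Return True only if every selected field matches character for character."""
    return all(normalize(rec_a[f]) == normalize(rec_b[f]) for f in fields)

def wildcard_search(records, field, pattern):
    """Return records whose field matches a shell-style pattern such as 'GARC*'."""
    return [r for r in records if fnmatchcase(normalize(r[field]), pattern.upper())]
```

Because every field must agree exactly, a single typographical error in any one field is enough to produce a false negative.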

Intermediate Algorithms:

  • can incorporate fuzzy logic
  • assign a scoring system to the various elements of the match
  • a match on an element that is more likely to be unique, such as a Social Security number, is weighted more heavily than a match on a first or last name, which may be less specific
  • A match of a value that is very common is weighted less than a match of a rare value
  • example - a match for last name “Smith” is weighted less than a match for last name of “Sigurdottir.”
  • can account for possible nicknames
  • makes allowances for frequently transposed digits or other typographical errors
  • example - first name “GOERGE” will tentatively match patient records with the first name “GEORGE”.
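The ideas in this list (field weights, down-weighting of common values, nicknames, and tolerance for typographical errors) can be combined in a small Python sketch. Every number and table below is made up for illustration, and the similarity measure is the standard library's `difflib.SequenceMatcher`, which stands in for whatever fuzzy comparison a real product uses.

```python
from difflib import SequenceMatcher

# All weights, multipliers, and the nickname table are illustrative assumptions.
FIELD_WEIGHTS = {"ssn": 10.0, "last_name": 4.0, "first_name": 2.0}
COMMON_LAST_NAMES = {"SMITH": 0.5}   # multiplier below 1 down-weights a common value
NICKNAMES = {"BILL": "WILLIAM", "BOB": "ROBERT"}

def canonical(name):
    """Upper-case a name and map a known nickname to its formal form."""
    name = name.strip().upper()
    return NICKNAMES.get(name, name)

def similarity(a, b):
    """Similarity in [0, 1]; 1.0 means identical after canonicalization."""
    return SequenceMatcher(None, canonical(a), canonical(b)).ratio()

def match_score(rec_a, rec_b, fuzzy_threshold=0.8):
    """Sum weighted field agreements; fields missing on either side add nothing."""
    score = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if not a or not b:
            continue
        if field == "ssn":
            sim = 1.0 if a == b else 0.0   # identifiers are compared exactly
        else:
            sim = similarity(a, b)         # names tolerate small typos
        if sim < fuzzy_threshold:
            continue
        rarity = COMMON_LAST_NAMES.get(canonical(a), 1.0) if field == "last_name" else 1.0
        score += weight * sim * rarity
    return score
```

With these assumed values, two records that agree on the last name “Smith” score lower than two that agree on “Sigurdottir”, and “GOERGE” still contributes a partial score against “GEORGE”.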

Advanced Algorithms:

  • use probabilistic and mathematical models to determine likelihood of a match
  • incorporate machine learning and artificial intelligence
  • matching algorithm will evolve to optimize matching depending on the regional variations in names
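The article does not name a specific probabilistic model. One commonly used approach in record linkage is Fellegi-Sunter style log-likelihood weighting, sketched below in Python under that assumption. The m and u probabilities and the field names are invented for illustration; in practice they would be estimated from the organization's own data.

```python
import math

# Hypothetical probabilities per field:
#   m = P(field agrees | records belong to the same patient)
#   u = P(field agrees | records belong to different patients)
M_U = {
    "last_name": (0.95, 0.01),
    "dob": (0.98, 0.003),
    "zip": (0.80, 0.05),
}

def match_weight(agreements):
    """Sum log2 likelihood ratios over fields.

    agreements maps each field name in M_U to True (agrees) or False (disagrees).
    Higher totals mean the pair is more likely to be the same patient.
    """
    total = 0.0
    for field, (m, u) in M_U.items():
        if agreements[field]:
            total += math.log2(m / u)              # agreement adds evidence for a match
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement adds evidence against
    return total
```

A field whose values rarely agree by chance (a small u) contributes a large positive weight when it does agree, which is the probabilistic counterpart of weighting rare values more heavily.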

Other related terms:

  • auto-linking: records are automatically matched and overlaid by the system without human oversight
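One way to limit the risk of auto-linking is to reserve it for high-confidence scores and route borderline pairs to a human review queue. The Python sketch below assumes a numeric match score already exists; the function name and both thresholds are hypothetical and would need tuning against reviewed data.

```python
def triage(score, auto_link_at=12.0, review_at=6.0):
    """Classify a match score into one of three outcomes.

    The thresholds are placeholder values, not recommendations.
    """
    if score >= auto_link_at:
        return "auto-link"       # linked without human oversight
    if score >= review_at:
        return "manual-review"   # sent to a staff error queue
    return "no-match"
```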

Data attributes commonly used by record matching algorithms and their associated risks

Data Attribute Area of Risk[2][3]
All data
  • While there are HL7 standards for ADT transactions, there are no standards for the format of the data within each of the HL7 segments
Patient name - first and last
  • cultural factors influence compilation of children’s last names
  • newborns' names change on subsequent admissions
  • hyphenated names and names with apostrophes are problematic and lack consistent formatting
  • handling of the middle name field: middle initial versus entire name
Date of birth
  • patients may change their date of birth to seem younger or older
  • gender may change with gender reassignment surgery
Social Security Number
  • newborns and foreign visitors do not have a SSN
  • patients are often unwilling to share SSN
  • SSNs may be stolen
Phone numbers
  • often change, particularly as the population moves increasingly to mobile phones
Addresses with or without zip code
  • tend to be in flux with more risk in urban compared to rural areas
Mother’s maiden name or father’s first name
  • These will improve matching accuracy but many information systems cannot handle this type of data element in the ADT feed
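Several of the formatting risks listed above (hyphens and apostrophes in names, inconsistent phone and date formats) can be reduced by normalizing values before comparison. The Python sketch below is one possible set of rules, not a standard; the function names, the list of accepted date formats, and the 10-digit US phone assumption are all illustrative.

```python
import re
from datetime import datetime

def normalize_name(name):
    """Upper-case, drop apostrophes and periods, and treat hyphens as spaces."""
    name = name.upper().replace("'", "").replace("’", "").replace(".", "")
    name = name.replace("-", " ")
    return " ".join(name.split())   # collapse repeated whitespace

def normalize_phone(phone):
    """Keep digits only; assumes 10-digit US numbers and drops a leading country code."""
    digits = re.sub(r"\D", "", phone)
    return digits[-10:]

def normalize_dob(dob):
    """Convert a few assumed input formats to ISO 8601 (YYYY-MM-DD)."""
    for fmt in ("%Y-%m-%d", "%m/%d/%Y", "%Y%m%d"):
        try:
            return datetime.strptime(dob.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {dob!r}")
```

Normalization of this kind only removes formatting differences; it cannot correct a value that was entered wrongly or has genuinely changed.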

Other Areas of Risk Associated with Patient Matching Algorithms:

Beyond the specific risks associated with individual data elements, there are operational reasons for patient matching errors even using the most robust algorithm.

Data governance and data integrity at registration are key factors in the accuracy of patient matching. Typographical errors and transpositions, as well as empty fields and false data, can pollute the fidelity of the elements used in patient matching algorithms. Additionally, personnel employed in registration areas and at office front desks often have a high turnover rate and are relatively poorly trained. There is growing support for creating a certification program for registrars through the National Association of Healthcare Access Management (NAHAM). Kiosks have been proposed as one useful tool to allow patients to validate their own data at the time of registration, but some have raised concerns about the intentional falsification of data and the attribution or provenance of the data entered into a kiosk and fed into registration systems.
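Some of these data-entry problems, such as empty fields and obviously false values, can be flagged automatically at registration. The Python sketch below is a minimal illustration; the required field names and the list of placeholder SSNs are assumptions, and a real check would follow the organization's own data governance policy.

```python
# Hypothetical policy values chosen for illustration.
REQUIRED_FIELDS = ("first_name", "last_name", "dob")
PLACEHOLDER_SSNS = {"000000000", "111111111", "123456789", "999999999"}

def registration_issues(record):
    """Return a list of human-readable problems found in one registration record."""
    issues = []
    for field in REQUIRED_FIELDS:
        if not str(record.get(field) or "").strip():
            issues.append(f"missing {field}")
    # Strip punctuation so '999-99-9999' and '999999999' are treated alike.
    ssn = "".join(ch for ch in str(record.get("ssn") or "") if ch.isdigit())
    if ssn and (len(ssn) != 9 or ssn in PLACEHOLDER_SSNS):
        issues.append("suspect ssn")
    return issues
```

An absent SSN is deliberately not flagged here, since newborns and foreign visitors may legitimately have none.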

Recommended best practices for record matching:

  1. Expand and deepen understanding within the Health Care and HIT community of the strengths and weaknesses of specific patient matching algorithms.
  2. Industry-wide, there is some consensus that basic deterministic algorithms should be abandoned in favor of more advanced probabilistic algorithms.
  3. While there is lack of consensus on the need to standardize matching algorithms, there is consensus for the standardization of the data sets fed into these algorithms including:
    • standardizing the format of the data within HL7 segments and
    • more rigorous ASC X12 standards for patient demographics
  4. Establish CEHRT certification standards for vendors on the performance of patient matching algorithms and on the processes of merging and unmerging erroneously matched records.
  5. Stronger data integrity and data governance policies for all organizations, and specifically for organizations that enter into data sharing agreements.
  6. Better collaboration among participants of regional HIEs to harmonize their patient matching and data-gathering standards.


  1. CHIME Issues National Patient ID Challenge. Press Release. March 17, 2015. Accessed 4/27/2015.
  2. Patient Identification and Matching Final Report. Prepared for the Office of the National Coordinator for Health Information Technology. February 7, 2014.
  3. HIT Standards Committee Patient Matching Power Team. Letter to National Coordinator for HIT. August 17, 2011.
  4. Identity Crisis? Approaches to Patient Identification in a National Health Information Network. Rand Corporation.
  5. Wheatley V. National Patient Identifier: Why Patient-Matching Technology May be a Better Solution. HISTalk. March 3, 2014. Accessed 4/27/2015.
  6. Landsbach G, Just BH. Five Risky HIE Practices that Threaten Data Integrity. Journal of AHIMA 84, no.11 (November–December 2013): 40-42.
  7. Healthcare Informatics. Why You Should Care About Patient-Matching Algorithms. February 4, 2015. Accessed 4/28/2015.
  8. Fernandes L. Patient Identification in Three Acts. Journal of AHIMA 79, no.4 (April 2008): 46-49.
  9. Just BH. Record-Matching Integrity: An Algorithm Primer. Health Data Management. October 23, 2012.

Submitted by Karen Pinsky