The registry case finding engine: An automated tool to identify cancer cases from unstructured, free-text pathology reports and clinical notes


Hanauer DA, Miela G, Chinnaiyan AM, Chang AE, Blayney DW. The registry case finding engine: an automated tool to identify cancer cases from unstructured, free-text pathology reports and clinical notes. J Am Coll Surg 2007;205:690–697.

Question

Can an automated search engine be used to extract relevant data from surgical pathology reports and clinical notes, to populate a regional cancer registry and complement manual case-finding performed by registry personnel?

Background and Rationale

Maintaining a registry of all cancer cases is a legal requirement in many jurisdictions and is necessary for a cancer center to be accredited by the American College of Surgeons. Cases to be entered into the registry are typically identified by trained cancer registrars, who manually review surgical pathology reports, clinic visit notes, and tumor board presentations, an expensive and labor-intensive process. Investigators at the University of Michigan sought to improve the efficiency and completeness of case ascertainment, particularly since that institution processes more than 60,000 surgical pathology reports and over 63,000 outpatient oncology visits per year. They created a search engine called CaFE (the Registry Case Finding Engine) and compared its accuracy and completeness with those of human coders.

Methods

CaFE used modifiable lists of terminology: positive and negative words and phrases, each either case-sensitive or case-insensitive, meant to include or exclude potential cancer cases, respectively. Surgical pathology reports and oncology clinic notes from the institution’s EHR were processed by CaFE, and the output was compared with the manual extractions performed by certified tumor registrars (CTRs), which were considered the reference standard. Of note, these pathology reports and clinic notes were written as free text, although a minority also had SNOMED codes assigned to them. CaFE was designed to be highly sensitive, minimizing false negatives (overlooked cancer cases) at the expense of lower specificity (more false positives).
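The term-list approach described above can be illustrated with a short Python sketch. This is not the CaFE implementation, whose internal rules are not given in this review. The class and function names, the example terms, and in particular the assumption that negative phrases are masked out before positive terms are searched are all hypothetical, chosen only to show how case-sensitive and case-insensitive lists might interact.

```python
import re
from dataclasses import dataclass, field


@dataclass
class TermLists:
    """Hypothetical, user-editable term lists (not the actual CaFE lists)."""
    positive_cs: list = field(default_factory=list)  # case-sensitive include terms
    positive_ci: list = field(default_factory=list)  # case-insensitive include terms
    negative_cs: list = field(default_factory=list)  # case-sensitive exclude phrases
    negative_ci: list = field(default_factory=list)  # case-insensitive exclude phrases


def _compile(terms, case_sensitive):
    """Build one whole-word regex for a list of terms, or None if the list is empty."""
    if not terms:
        return None
    # Longest terms first so that longer phrases win over their own substrings.
    ordered = sorted(terms, key=len, reverse=True)
    alternation = "|".join(re.escape(t) for t in ordered)
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.compile(r"\b(?:" + alternation + r")\b", flags)


def flag_report(text, lists):
    """Return the positive terms found in `text` after masking negative phrases.

    Assumption: a negative phrase suppresses only the words it covers,
    so a report is still flagged if a positive term occurs elsewhere.
    """
    for terms, cs in ((lists.negative_cs, True), (lists.negative_ci, False)):
        pattern = _compile(terms, cs)
        if pattern:
            text = pattern.sub(" ", text)
    hits = []
    for terms, cs in ((lists.positive_cs, True), (lists.positive_ci, False)):
        pattern = _compile(terms, cs)
        if pattern:
            hits.extend(m.group(0) for m in pattern.finditer(text))
    return hits


# Illustrative lists only: "ALL" is case-sensitive so the ordinary word "all" is ignored.
example_lists = TermLists(
    positive_cs=["ALL"],
    positive_ci=["carcinoma", "malignancy"],
    negative_ci=["no evidence of carcinoma", "negative for malignancy"],
)
```

Under these assumptions, `flag_report("Invasive ductal carcinoma identified.", example_lists)` returns a non-empty list and the report would be queued for registrar review, while a report containing only "No evidence of carcinoma in all margins." returns an empty list. Masking, rather than rejecting the whole report on any negative phrase, is one way to favor sensitivity over specificity.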

Main Results

Using an input of 2220 surgical pathology reports, plus clinic visit notes from an additional 476 patients, the CaFE system was compared with human coders. For the pathology reports, sensitivity was 100% (zero false negatives), specificity was 85%, and positive predictive value (PPV) was 78.8%. For the clinic notes, sensitivity was 100%, and specificity and PPV were both 73.7%. The investigators found that 25–40% of the false positives were due to incorrect inclusion of basal cell and squamous cell carcinomas of the skin, tumors that are not traditionally included in cancer registries. Other false positives were due to incorrect SNOMED codes, a mention of cancer in the family history, and the use of ambiguous modifiers, among other causes. CaFE was estimated to reduce the case-finding workload of the registrars by up to 50%, and it resulted in 19% more cases being accessioned. The system compared favorably with other commercially available and institutional products.
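For readers less familiar with the reported metrics, the sketch below shows how sensitivity, specificity, and PPV follow from a two-by-two comparison against the reference standard. The function name is hypothetical, and the counts in the usage note are invented for illustration; they are not the study's data, which this review does not report at the level of individual cells.

```python
def screening_metrics(tp, fp, tn, fn):
    """Compute sensitivity, specificity, and PPV from confusion-matrix counts.

    tp: cases flagged by the tool and confirmed by registrars
    fp: cases flagged by the tool but rejected by registrars
    tn: non-cases correctly left unflagged
    fn: true cases the tool missed
    A metric whose denominator is zero is returned as None.
    """
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),  # share of true cases that were flagged
        "specificity": ratio(tn, tn + fp),  # share of non-cases left unflagged
        "ppv": ratio(tp, tp + fp),          # share of flagged items that were true cases
    }


# Hypothetical counts, NOT taken from the study.
example = screening_metrics(tp=50, fp=10, tn=40, fn=0)
```

With these made-up counts, zero false negatives gives a sensitivity of 1.0 regardless of how many false positives occur, which is the trade-off the CaFE designers accepted: every false positive lowers specificity and PPV and must be cleared by a registrar.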

Comments

Failure to identify a cancer case and enter it into a registry can have legal, economic, clinical, and public health consequences. The CaFE system enhanced the ability of the University of Michigan CTRs to identify cases, although their overall workload increased: CaFE did not perform the actual abstraction process, and the registrars still had to manually review all of the positives to exclude the false positives. However, the interface was user-friendly (see the screenshot in the article), which sped up this review. The investigators and registrars alike seemed satisfied with these trade-offs, since the final result was a more accurate and complete cancer registry database. It is interesting to note that the need for such an engine would be eliminated if all of the records had coded data fields for the diagnosis. Nonetheless, CaFE performed admirably, with no false negatives, demonstrating that technology can compensate for the lack of rigor associated with free-text entry in the medical record.

Robert S. Miller, MD