Recognition and Evaluation of Clinical Section Headings in Clinical Documents Using Token-Based Formulation with Conditional Random Fields

From Clinfowiki
Revision as of 16:07, 6 October 2015 by URumana (Talk | contribs)



Electronic health record (EHR) systems have been developed “to recognize clinical concepts mentioned in EHRs.”[1] However, recognizing those concepts reliably, and making them useful once recognized, remains an inherent challenge. This paper attempts to ease that challenge with a new tool that uses a “machine learning approach based on the conditional random fields (CRF) model.”[1] The proposed method is a section heading recognition system for clinical documents.
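To make the idea of a token-based formulation concrete, here is a minimal sketch of how heading recognition can be posed as a per-token labeling task, which is the kind of input a CRF sequence labeler consumes. The B-HEAD / I-HEAD / O tag names, the whitespace tokenizer, and the `bio_labels` helper are all assumptions made for illustration; this review does not state the label scheme or tokenizer used in the paper.

```python
def bio_labels(lines, heading_lines):
    """Assign one tag per token (hypothetical scheme, not taken from the paper).

    B-HEAD marks the first token of a heading, I-HEAD marks the remaining
    heading tokens, and O marks every token outside a heading.
    """
    labeled = []
    for i, line in enumerate(lines):
        # Whitespace tokenization is a simplifying assumption.
        for j, token in enumerate(line.split()):
            if i in heading_lines:
                tag = "B-HEAD" if j == 0 else "I-HEAD"
            else:
                tag = "O"
            labeled.append((token, tag))
    return labeled


# Toy clinical note: line 0 is a section heading, line 1 is body text.
note = ["Chief Complaint:", "chest pain for two days"]
labeled = bio_labels(note, heading_lines={0})
for token, tag in labeled:
    print(f"{token}\t{tag}")
```

A sequence model trained on such token and tag pairs predicts a tag for every token of an unseen note, and headings are then read off as runs of B-HEAD followed by I-HEAD tags.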


The team created both the tool and a dataset for testing it, and the paper outlines the construction of the tool step by step. The tool is built on an algorithm based on the conditional random fields (CRF) model, which is described as “an undirected graphical model that is trained to maximize a conditional probability of random variables.”[1] All of the tool's features are also described. To test the new recognition tool, its performance is compared with that of tools built using other methods. The comparison metrics are “precision,” “recall,” and the “F-measure,” each defined by a formula in the paper.[1]
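The three metrics can be sketched in a few lines. This is a minimal illustration that uses the standard definitions: precision is the fraction of predicted headings that are correct, recall is the fraction of true headings that were found, and the F-measure is their harmonic mean. It assumes a balanced F-measure (F1) and exact matching on (line number, heading text) pairs. The paper's precise matching criteria are not given in this review, and the data below are invented for illustration only.

```python
def precision_recall_f(predicted, gold):
    """Return (precision, recall, F1) for two collections of heading annotations."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical annotations: (line number, heading text). Not data from the paper.
gold = {(0, "Chief Complaint"), (5, "Medications"), (9, "Allergies"), (12, "Plan")}
pred = {(0, "Chief Complaint"), (5, "Medications"), (7, "History")}

p, r, f = precision_recall_f(pred, gold)
print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

In this toy case two of the three predictions are correct and two of the four true headings are found, which shows how a system can score higher on precision than on recall, or the reverse.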


In the performance comparison, the new tool outperformed the others on most of the datasets tested. However, the other tools, particularly the dictionary-based tool, did better on one measure. Specifically, the dictionary-based tool achieved higher recall on two datasets, while the new token-based tool achieved higher precision and F-score.


The concept is relatively novel: few papers address it, as the paper itself acknowledges. The paper does a good job of discussing the errors involved and some potential problems. Because the team had to create the datasets, there may be some unintentional skew, which could keep certain results from being completely trustworthy.

Nevertheless, technology aimed at improving data access and mining is important to explore. At the time this article was published, 50% of EHR data was stored as free text.[1] Free text serves user satisfaction well, but it negatively impacts the information extraction process.

Related Articles


  1. Recognition and Evaluation of Clinical Section Headings in Clinical Documents Using Token-Based Formulation with Conditional Random Fields. BioMed Research International, 2015.