Natural language processing in EHRs

From Clinfowiki
Revision as of 20:52, 22 November 2009 by Wmadhavi (Talk | contribs)


The term Natural Language Processing describes the function of software or hardware components in a computer system that analyze or synthesize spoken or written language. The term ‘natural’ distinguishes human speech and writing from more formal languages, such as mathematical notation or programming languages, where the vocabulary and syntax are comparatively restricted. (Jackson & Moulinier, 2007)

One of the goals of using EHRs in healthcare is to represent health information in a structured form so that it can be re-used for decision support and quality assurance. In practice, however, EHRs rely heavily on free text fields to facilitate rapid data entry by clinicians. These free text items are a major limitation because the vast amount of clinical information locked in them cannot be reliably accessed. Natural language processing systems potentially offer a solution: they can extract individual words and also represent well-defined relations among them.
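A minimal sketch can illustrate the kind of extraction described above. The vocabularies and the note text here are hypothetical toy examples; a real clinical NLP system would draw on a large lexicon and far richer grammar:

```python
import re

# Tiny hypothetical vocabularies; a real system would use a large
# clinical lexicon rather than hand-picked word lists.
SYMPTOMS = {"pain", "cough", "dyspnea"}
LOCATIONS = {"chest", "abdomen", "head"}

def extract_findings(note: str) -> list[dict]:
    """Find simple '<location> <symptom>' word pairs in free text."""
    findings = []
    for loc, sym in re.findall(r"\b(\w+)\s+(\w+)\b", note.lower()):
        if loc in LOCATIONS and sym in SYMPTOMS:
            findings.append({"location": loc, "symptom": sym})
    return findings

print(extract_findings("Patient reports chest pain and a dry cough."))
# → [{'location': 'chest', 'symptom': 'pain'}]
```

Even this crude pattern turns a fragment of free text into a structured record that could be queried, which is the re-use that free text alone prevents.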

Understanding natural language involves three components (Friedman & Hripcsak, 1999):

1. Syntax, or understanding the structure of sentences
2. Semantics, or understanding the meaning of words and how they are combined to form the meaning of a sentence
3. Domain knowledge, or information about the subject matter

Natural language processing has made good progress in medicine because its domain is restricted: clinical text is a sub-language with less variety, ambiguity and complexity than general language, and it involves only the specific information and relations relevant to the subject. For example, there are informational categories associated with body location (e.g., chest), with symptom (e.g., pain) and with severity (e.g., severe). The systems that have been developed use different amounts of syntactic, semantic and domain knowledge.
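The idea of informational categories in a sub-language can be sketched as a word-to-category mapping. The word lists below are illustrative assumptions, not the vocabulary of any actual system:

```python
# Hypothetical mapping from sub-language words to informational
# categories (severity, body location, symptom).
CATEGORIES = {
    "severe": "severity", "mild": "severity",
    "chest": "body_location", "abdomen": "body_location",
    "pain": "symptom", "cough": "symptom",
}

def categorize(phrase: str) -> dict:
    """Assign each recognized word its informational category."""
    result = {}
    for word in phrase.lower().split():
        if word in CATEGORIES:
            result[CATEGORIES[word]] = word
    return result

print(categorize("severe chest pain"))
# → {'severity': 'severe', 'body_location': 'chest', 'symptom': 'pain'}
```

Because the sub-language is restricted, even such a shallow semantic lookup recovers most of the phrase's meaning; general language would demand far more syntax and disambiguation.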

For example, some of the systems that are well known in the clinical world are LSP (the Linguistic String Project), a pioneer in general English language processing that has been adapted to medical text, and MENELAS, created by a consortium that aimed to provide better access to patient discharge summaries. Both LSP and MENELAS use comprehensive syntactic and semantic knowledge about the structure of the complete sentence. The MedLEE system operates as an independent module of the clinical information system at New York Presbyterian Hospital and is used daily; it was the first NLP system used in actual patient care that was shown to improve care, and it has also been integrated into voice recognition systems. MedLEE relies heavily on general semantic patterns interleaved with some syntax and also includes knowledge of the entire sentence (Friedman, Hripcsak, & Shablinsky, 1998).

Once information is extracted from text, it can be stored in any well-defined format; for example, information stored in XML can be used in web reports (Friedman & Hripcsak, 1999). Besides free text, natural language processing can also be used in conjunction with continuous voice recognition systems in EHRs. Integrating a voice recognition system with a natural language processor substantially enhances the functionality of the voice system: physicians can dictate notes in their usual fashion while the natural language processor translates the report into a structured, encoded form in the background (Friedman & Hripcsak, 1999).
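Serializing an extracted finding into XML, as mentioned above, can be sketched with the standard library. The finding and its element names here are hypothetical:

```python
import xml.etree.ElementTree as ET

# Hypothetical structured output from an NLP extraction step.
finding = {"symptom": "pain", "body_location": "chest", "severity": "severe"}

# Store the extracted information in a well-defined XML format
# so it can be re-used, e.g., in web reports.
root = ET.Element("finding")
for category, value in finding.items():
    ET.SubElement(root, category).text = value

print(ET.tostring(root, encoding="unicode"))
# → <finding><symptom>pain</symptom><body_location>chest</body_location><severity>severe</severity></finding>
```

Once in XML, the same finding can be styled into a web report, queried, or passed to a decision-support rule without re-reading the free text.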

Another application of natural language processing in EHRs is linking from the EHR to on-line information resources. This is done by parsing the plain text reports from Web based EHRs and using the results to identify clinical findings in the text; the findings are then used to provide automated links to on-line information resources via Infobuttons (Janetzki, Allen, & Cimino, 2004). A study evaluating the automated detection of clinical conditions described in narrative reports found that the natural language processor was not distinguishable from physicians in how it interpreted narrative reports, and that it was superior to the other comparison subjects in the study, which included internists, radiologists, laypersons and other computer methods. This suggests that natural language processing can extract clinical information from narrative reports in a manner that supports automated decision support and clinical research (Hripcsak, Friedman, Alderson, DuMouchel, Johnson, & Clayton, 1995).
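The finding-to-resource linking step can be sketched as a simple lookup. The mapping and the example.org URLs are placeholders; real infobutton services use standardized terminologies and context-aware link resolution:

```python
# Hypothetical mapping from a recognized clinical finding to an
# on-line information resource (placeholder URLs).
RESOURCE_LINKS = {
    "pneumonia": "https://example.org/topics/pneumonia",
    "chest pain": "https://example.org/topics/chest-pain",
}

def infobutton_links(findings: list[str]) -> dict[str, str]:
    """Return a resource link for each finding that has one."""
    return {f: RESOURCE_LINKS[f] for f in findings if f in RESOURCE_LINKS}

print(infobutton_links(["chest pain", "fatigue"]))
# → {'chest pain': 'https://example.org/topics/chest-pain'}
```

The NLP step does the hard work of identifying the findings; once they are named in a controlled way, attaching links is straightforward.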

References

Friedman, C., & Hripcsak, G. (1999). Natural language processing and its future in medicine. Journal of the Association of American Medical Colleges.

Friedman, C., Hripcsak, G., & Shablinsky, I. (1998). An evaluation of natural language processing methodologies. AMIA.

Hripcsak, G., Friedman, C., Alderson, P. O., DuMouchel, W., Johnson, S., & Clayton, P. (1995). Unlocking clinical data from narrative reports: a study of natural language processing. Annals of Internal Medicine.

Jackson, P., & Moulinier, I. (2007). Natural Language Processing for Online Applications. John Benjamins Publishing Co.

Janetzki, V., Allen, M., & Cimino, J. J. (2004). Using natural language processing to link from medical text to on-line information resources. MEDINFO.

Submitted by Madhavi Bharadwaj