Semantic MEDLINE

From Clinfowiki


Semantic MEDLINE is under development at the Lister Hill National Center for Biomedical Communications, a research division of the National Library of Medicine. The goal of Semantic MEDLINE is to “support more effective biomedical information management” by combining existing technologies (document retrieval, natural language processing, automatic summarization, and visualization) into a single application [1]. Semantic MEDLINE can be accessed after applying for a free UMLS license.

Traditional IR vs. Semantic Search

Searching the medical literature through traditional information retrieval (IR) systems amounts to entering a string of text and searching the database for that exact string. Although Boolean operators can combine several strings, they are a poor substitute for searching the literature by ideas and meaning.

Rather than searching the MEDLINE database with fixed text strings, Semantic MEDLINE attempts to improve information retrieval by harnessing several technologies to search by concepts and ideas.
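To make the contrast concrete, here is a minimal Python sketch. The two citation titles, the synonym table, and the concept label are hypothetical illustrations, not MEDLINE data or UMLS content. The point is only that exact-string matching misses a citation expressing the same idea in different words, while matching on a shared concept finds it.

```python
# Hypothetical citation titles; neither comes from MEDLINE.
CITATIONS = {
    1: "Aspirin after myocardial infarction",
    2: "Aspirin use following a heart attack",
}

# Hypothetical synonym table mapping surface strings to one concept label.
# In Semantic MEDLINE this role is played by the UMLS Metathesaurus.
CONCEPT_OF = {
    "myocardial infarction": "MYOCARDIAL_INFARCTION",
    "heart attack": "MYOCARDIAL_INFARCTION",
}


def string_search(query: str) -> list[int]:
    """Traditional IR: return citations containing the exact query string."""
    q = query.lower()
    return [cid for cid, title in CITATIONS.items() if q in title.lower()]


def concept_search(query: str) -> list[int]:
    """Concept search: return citations mentioning any synonym of the query's concept."""
    concept = CONCEPT_OF.get(query.lower())
    if concept is None:
        return string_search(query)  # fall back when the query is not a known term
    synonyms = [term for term, c in CONCEPT_OF.items() if c == concept]
    return [
        cid
        for cid, title in CITATIONS.items()
        if any(s in title.lower() for s in synonyms)
    ]


print(string_search("myocardial infarction"))   # [1]
print(concept_search("myocardial infarction"))  # [1, 2]
```

The string search returns only the first citation, while the concept search also returns the second because "heart attack" and "myocardial infarction" share a concept.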


Semantic MEDLINE relies on the MEDLINE database, the SemRep natural language processing (NLP) system, and the UMLS to process and deliver semantic search results.

MEDLINE database [2]

Maintained by the National Library of Medicine, this database is one of the primary biomedical research repositories and is often the main resource for providers seeking biomedical literature through the PubMed search engine [3]. Semantic MEDLINE searches citations found in the MEDLINE database.
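PubMed can also be queried programmatically through NCBI's E-utilities. The sketch below is offered as an illustration only; it is not a description of how Semantic MEDLINE itself talks to PubMed. It builds an ESearch request URL and makes no network call. The `medline[sb]` subset filter is assumed here to restrict results to MEDLINE-indexed citations, and `esearch_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(query: str, retmax: int = 20, medline_only: bool = True) -> str:
    """Build (but do not send) an ESearch URL for the PubMed database."""
    # Optionally restrict the search to the MEDLINE subset of PubMed.
    term = f"({query}) AND medline[sb]" if medline_only else query
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH}?{urlencode(params)}"


print(esearch_url("digoxin overdose"))
```

Fetching the URL (for example with `urllib.request.urlopen`) would return PubMed identifiers as JSON, which a downstream step could then process.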

SemRep Natural Language Processing System and SemMedDB [4][5]

SemRep is a natural language processing system that parses input text into subject-relation-object triples in order to extract meaning from it. Semantic MEDLINE uses SemRep to parse citations in the MEDLINE database.

The following sample extraction is available on the SemRep website:

Input text: “We used hemofiltration to treat a patient with digoxin overdose that was complicated by refractory hyperkalemia.”

Subject-relation-object triples extracted:
Digoxin overdose-PROCESS_OF-Patients 
hyperkalemia-COMPLICATES-Digoxin overdose 
Hemofiltration-TREATS(INFER)-Digoxin overdose

In this example, SemRep parses the supplied text and distills it into subject-relation-object triples that represent the ideas in the text. SemRep is used to parse MEDLINE/PubMed citations, and the resulting triples are stored in SemMedDB for ease and speed of use.
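The triple representation maps naturally onto a small record type. The sketch below stores the three triples from the example above and indexes each one under both of its arguments, so every triple touching a concept can be looked up directly. `Predication` and `PredicationStore` are hypothetical names; they do not reproduce the actual SemMedDB schema [5], and the `inferred` flag is an assumption standing in for the "(INFER)" marker.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Predication:
    subject: str
    relation: str
    obj: str
    inferred: bool = False  # assumption: models the "(INFER)" marker in SemRep output


class PredicationStore:
    """Hypothetical in-memory stand-in for a database of SemRep triples."""

    def __init__(self) -> None:
        self._by_concept: dict[str, list[Predication]] = defaultdict(list)

    def add(self, p: Predication) -> None:
        # Index the triple under its subject and, if different, its object.
        self._by_concept[p.subject].append(p)
        if p.obj != p.subject:
            self._by_concept[p.obj].append(p)

    def about(self, concept: str) -> list[Predication]:
        """Return every stored triple in which the concept is an argument."""
        return list(self._by_concept.get(concept, []))


store = PredicationStore()
for p in [
    Predication("Digoxin overdose", "PROCESS_OF", "Patients"),
    Predication("hyperkalemia", "COMPLICATES", "Digoxin overdose"),
    Predication("Hemofiltration", "TREATS", "Digoxin overdose", inferred=True),
]:
    store.add(p)

for p in store.about("Digoxin overdose"):
    print(p.subject, p.relation, p.obj)
```

Looking up "Digoxin overdose" returns all three triples, because the concept appears as subject in one and as object in the other two.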

Unified Medical Language System (UMLS) [6][7]

The UMLS consists of three knowledge sources: the SPECIALIST Lexicon and Lexical Tools, the Metathesaurus, and the Semantic Network. The last two are central to Semantic MEDLINE's functionality:

The Metathesaurus [8]

The Metathesaurus maps terms from different medical terminologies, such as ICD-9, ICD-10, and SNOMED, to shared concepts. Terms encountered in input text are mapped to Metathesaurus concepts, and these concepts then serve as the subject and object, or arguments, of the triples extracted by SemRep.
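The mapping step can be pictured as a lookup from source-vocabulary terms to a shared concept. In the hypothetical table below, spellings that different terminologies might use for one idea all point to a single concept identifier. The vocabulary names, terms, and identifiers (`CONCEPT_0001` and so on) are invented placeholders, not real UMLS CUIs or Metathesaurus content.

```python
from typing import Optional

# (source vocabulary, term, concept id). All entries are hypothetical.
SOURCE_TERMS = [
    ("VOCAB_A", "Hyperkalemia", "CONCEPT_0001"),
    ("VOCAB_B", "Hyperkalaemia", "CONCEPT_0001"),
    ("VOCAB_C", "High serum potassium", "CONCEPT_0001"),
    ("VOCAB_A", "Digoxin overdose", "CONCEPT_0002"),
]

# Index by case-folded term so lookups ignore capitalization.
_INDEX = {term.casefold(): concept for _, term, concept in SOURCE_TERMS}


def map_term(text: str) -> Optional[str]:
    """Return the concept id for a term found in any source, or None."""
    return _INDEX.get(text.strip().casefold())


def normalize_triple(subject: str, relation: str, obj: str) -> tuple[Optional[str], str, Optional[str]]:
    """Replace both arguments of a triple with concept ids (None if unmapped)."""
    return map_term(subject), relation, map_term(obj)


print(normalize_triple("hyperkalaemia", "COMPLICATES", "digoxin overdose"))
```

Once arguments are normalized this way, triples extracted from citations that use "hyperkalemia", "hyperkalaemia", or "high serum potassium" all refer to the same concept.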

The Semantic Network [9]

The Semantic Network is a collection of relationships between semantic types; it is used to map the relationships found between the arguments in SemRep’s triples.
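One way to picture the Semantic Network's role is as a set of permitted (semantic type, relation, semantic type) combinations against which a candidate triple can be checked. This is an assumption-laden sketch: the type names resemble UMLS semantic types, but the concept-to-type assignments and the small permitted set are hypothetical and far smaller than the real Semantic Network. Relation names follow the SemRep example above.

```python
# Hypothetical semantic-type assignments for concepts in the SemRep example.
SEMANTIC_TYPE = {
    "Hemofiltration": "Therapeutic or Preventive Procedure",
    "Digoxin overdose": "Injury or Poisoning",
    "hyperkalemia": "Disease or Syndrome",
    "Patients": "Patient or Disabled Group",
}

# Hypothetical subset of permitted (subject type, relation, object type) combinations.
ALLOWED = {
    ("Therapeutic or Preventive Procedure", "TREATS", "Injury or Poisoning"),
    ("Therapeutic or Preventive Procedure", "TREATS", "Disease or Syndrome"),
    ("Disease or Syndrome", "COMPLICATES", "Injury or Poisoning"),
    ("Injury or Poisoning", "PROCESS_OF", "Patient or Disabled Group"),
}


def is_licensed(subject: str, relation: str, obj: str) -> bool:
    """True if the arguments' semantic types and the relation form a permitted combination."""
    s_type = SEMANTIC_TYPE.get(subject)
    o_type = SEMANTIC_TYPE.get(obj)
    if s_type is None or o_type is None:
        return False  # a concept without a known type cannot be checked
    return (s_type, relation, o_type) in ALLOWED


print(is_licensed("Hemofiltration", "TREATS", "Digoxin overdose"))  # True
print(is_licensed("Digoxin overdose", "TREATS", "Hemofiltration"))  # False
```

The reversed triple is rejected because an injury cannot "treat" a procedure. A type-level check of this kind can filter out implausible relationships.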

How Semantic MEDLINE Delivers Semantic Search Results

Semantic MEDLINE searches PubMed for the entered text string and then, using SemRep natural language processing, builds semantic predications: apparent relationships between arguments (subject and object) identified in the retrieved citations. These semantic predications are then mapped and presented graphically to the user. Semantic MEDLINE can display both hierarchical and network representations of the relationships identified. These visual results emphasize the semantic nature of the search and give users a broad overview of their results in a single image. This method of searching offers researchers several advantages:

  1. Semantic search more closely mimics the process of biomedical information retrieval, in which a user is searching for ideas and relationships more so than for exact strings of text.
  2. Semantic MEDLINE allows users to visualize relationships identified in their search results. This visualization creates an opportunity to identify relationships that may not be immediately apparent while examining search results from traditional IR systems.
  3. Users can further filter Semantic MEDLINE results based on the intended meaning of their search, not the text string they happened to choose. This allows more direct and efficient access to the desired citations, rather than prolonged sorting through citations that are only tangentially related to the initial search.
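The network view and meaning-based filtering described above can be sketched by aggregating predications from many citations into a weighted graph. Each node is a concept, each edge is a relation, and the edge weight counts the distinct citations asserting it. Filtering then keeps only edges with a chosen relation, or only edges touching a chosen concept. The citation IDs and predications are hypothetical, and this sketch does not reproduce Semantic MEDLINE's actual summarization or visualization methods.

```python
from collections import Counter

# Hypothetical (citation id, subject, relation, object) predications.
PREDICATIONS = [
    ("c1", "Hemofiltration", "TREATS", "Digoxin overdose"),
    ("c1", "Hemofiltration", "TREATS", "Digoxin overdose"),  # repeated in the same citation
    ("c2", "Hemofiltration", "TREATS", "Digoxin overdose"),
    ("c2", "hyperkalemia", "COMPLICATES", "Digoxin overdose"),
    ("c3", "Digoxin immune Fab", "TREATS", "Digoxin overdose"),
    ("c3", "Digoxin overdose", "PROCESS_OF", "Patients"),
]


def build_graph(predications):
    """Count how many distinct citations assert each (subject, relation, object) edge."""
    distinct = {(cid, s, r, o) for cid, s, r, o in predications}
    return Counter((s, r, o) for _, s, r, o in distinct)


def filter_edges(graph, relation=None, concept=None):
    """Keep edges matching the relation and/or having the concept as an argument."""
    return {
        (s, r, o): n
        for (s, r, o), n in graph.items()
        if (relation is None or r == relation)
        and (concept is None or concept in (s, o))
    }


graph = build_graph(PREDICATIONS)
for (s, r, o), n in sorted(filter_edges(graph, relation="TREATS").items()):
    print(f"{s} -[{r} x{n}]-> {o}")
```

Counting distinct citations rather than raw mentions keeps one heavily repetitive abstract from dominating the graph. Filtering on `relation="TREATS"` answers a question about treatments regardless of the words each citation used.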

Excellent visual aids, along with a clinical example of searching Semantic MEDLINE, can be found in the Rindflesch paper [1], which is available in the public domain.

Submitted by Philip Hagedorn


  1. Rindflesch, T.C., et al., Semantic MEDLINE: An advanced information management application for biomedicine. Information Services and Use, 2011. 31(1-2): p. 15-21.
  2. Fact Sheet: MEDLINE®. [Fact Sheets] 2016 2016-02-12; Available from:
  3. Fact Sheet: PubMed®: MEDLINE® Retrieval on the World Wide Web. [Fact Sheets] 2015 2015-08-26; Available from:
  4. Semantic Knowledge Representation. 2015; Available from:
  5. Kilicoglu, H., et al., SemMedDB: a PubMed-scale repository of biomedical semantic predications. Bioinformatics, 2012. 28(23): p. 3158-60.
  6. Humphreys, B.L., et al., The Unified Medical Language System. 1998.
  7. Lindberg, D.A., B.L. Humphreys, and A.T. McCray, The Unified Medical Language System. Methods Inf Med, 1993: p. 281-91.
  8. The Metathesaurus. [Training Material and Manuals] 2015 2015-08-29; Available from:
  9. The Semantic Network. [Training Material and Manuals] 2015 2015-08-29; Available from: