Bioinformatics linkage of heterogeneous clinical and genomic information in support of personalized medicine

Bioinformatics Linkage of Heterogeneous Clinical and Genomic Information in Support of Personalized Medicine

Frey LJ, Maojo V, Mitchell JA

Methods Inf Med. 2007;46 Suppl 1:98-105

The authors review the complexities of modeling clinical medicine together with the relationships among genotype, phenotype, and environment, with the goal of better diagnostics and therapies in support of personalized medicine. Different data repositories will need to be integrated to understand the significance and cost-effectiveness of specific biomarkers. The paper reviews research on linking bioinformatics with clinical data, surveys current applications, and discusses issues in bridging the gap between genotype data and clinical practice.

Biological data repositories are growing rapidly and vary in consistency and completeness. For researchers to integrate this information and extract usable patterns from it, the representation of genotype-phenotype information must be standardized to overcome differences in semantics, experiments, techniques, and procedures. Common data models are needed to represent the scientific domain knowledge about the data in a meaningful, standard way and to create objects that are reusable or easier to map across systems. Ontologies to bridge the existing gaps between heterogeneous data sources are also lacking. Currently no worldwide standard exists for genotype-phenotype data models for information storage and exchange.

Many initiatives are underway to combine clinical and genomic data for research, clinical trials, and treatment. The paper gives overviews of several, including initiatives at the National Cancer Institute, the Advancing Clinico-Genomic Trials on Cancer (ACGT) project of the European Commission, the NIH, an international consortium effort called the Polymorphism Markup Language, and several other organizations.

Comments: The authors provide an overview of several issues that need to be overcome to integrate genotype-phenotype data. A common domain model and ontologies are discussed as integration requirements between the various sources to store and exchange data “in scientifically meaningful and productive systems” to support research and personalized medicine.

Personalized medicine has long been a topic of interest for researchers and others working in medicine. In the scientific literature it is known as pharmacogenomics or pharmacogenetics (1). Personalized medicine essentially involves using a patient's genomic data, such as genotype and gene expression, together with other clinical information to stratify disease, select medication, provide a therapy, or implement a preventive measure that is particularly suited to that patient at the time of administration. Although the approach is highly appealing, personalized medicine is still in its early stages and is used only on a limited basis. Current use involves tests such as cytochrome P450 genotyping and the thiopurine methyltransferase test to predict a likely response or an adverse reaction to specific medications (2).

One of the major hindrances to the application of personalized medicine is the large amount of heterogeneous data available. This heterogeneous clinical and genomic information has to be integrated before personalized medicine can be applied successfully. The amount of genomic and clinical data generated by research projects is enormous, and it is usually stored in databases of different types: some store information about DNA or genes, others about genetic disorders (e.g. OMIM), and still others about proteins and RNA. Another issue raised by Frey, Maojo, and Mitchell in the paper under review is that these databases differ in hardware, software applications, semantics, scientific approach, cultural environment, cognitive biases, and so on. The authors suggest that these differences must be resolved before any integration is attempted. Besides the applications described by the authors, a number of other projects have been initiated.

One such project is INFOGENMED, which is funded by the European Commission. It aims to build a software environment for accessing and integrating genetic and medical information, allowing unified access to multiple heterogeneous genetic and medical databases over the Internet. The project also integrates genetic and medical terminologies in a vocabulary server. Biomedical database integration is performed by an ontology-based system, OntoFusion (3, 4).

In 2006 Maojo et al. described another effort to integrate distributed and heterogeneous medical and genetic information for use in clinical trials and research. The paper describes a domain-independent approach to biomedical data integration in which the authors used ontology-based methods for mapping and unification. The work showed that both mapping and unification can be improved by the use of domain ontologies. A number of private clinical databases and public genomic and disease databases (OMIM, Prosite, etc.) were integrated (5).

Another project funded by the European Commission is ARMEDA II. The project used a virtual repository approach, in which the virtual repository contains only metadata and gives users the impression of working with a local repository that integrates data from different sources. Like the system above, it uses a mapping and unification approach. The authors of that paper tested the system with two tumor databases, one containing hospital information and the other containing the genetic data associated with the tumor samples (6).
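The mapping and unification idea behind a virtual repository can be illustrated with a minimal Python sketch. This is not the ARMEDA II implementation; the field names, the shared vocabulary, the sample rows, and the `Source` and `VirtualRepository` classes are all hypothetical and exist only to show the two steps: each source is first mapped onto shared concept names, and mapped records are then unified on a common key.

```python
from dataclasses import dataclass, field

# Hypothetical local schemas: each source names the same concepts differently.
CLINICAL_ROWS = [
    {"pat_id": "P1", "dx": "tumor type A", "age_at_dx": 54},
    {"pat_id": "P2", "dx": "tumor type B", "age_at_dx": 61},
]
GENETIC_ROWS = [
    {"sample_patient": "P1", "gene_symbol": "GENE1", "variant": "v1"},
    {"sample_patient": "P2", "gene_symbol": "GENE2", "variant": "v2"},
]

# Mapping step: local field name -> shared (virtual) concept name.
CLINICAL_MAP = {"pat_id": "patient", "dx": "diagnosis", "age_at_dx": "age"}
GENETIC_MAP = {"sample_patient": "patient", "gene_symbol": "gene", "variant": "variant"}


@dataclass
class Source:
    rows: list
    mapping: dict

    def records(self):
        """Yield each row re-expressed in the shared vocabulary."""
        for row in self.rows:
            yield {self.mapping[k]: v for k, v in row.items() if k in self.mapping}


@dataclass
class VirtualRepository:
    """Holds only metadata (the mappings); the data stays in the sources."""
    sources: list = field(default_factory=list)

    def query(self, key="patient"):
        """Unification step: merge mapped records that share the same key."""
        unified = {}
        for source in self.sources:
            for rec in source.records():
                unified.setdefault(rec[key], {}).update(rec)
        return unified


repo = VirtualRepository([Source(CLINICAL_ROWS, CLINICAL_MAP),
                          Source(GENETIC_ROWS, GENETIC_MAP)])
view = repo.query()
```

In this sketch `view["P1"]` combines the hospital fields and the genetic fields for the same patient, even though neither source was copied or restructured. A real system would derive the mappings from a domain ontology rather than from hand-written dictionaries.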

A number of other platforms are currently used to integrate information from various databases. One is SRS (Sequence Retrieval System), which gives fast and easy access to diverse biological data (genetic, protein, molecular, cellular, clinical, etc.) from both public and proprietary sources. SRS is widely used for data integration and sharing by pharmaceutical companies such as Johnson and Johnson and AstraZeneca (7, 8). Other commonly used systems include the Distributed Annotation System, a client-server system used by Ensembl that integrates information from multiple servers; BioMOBY, which integrates distributed and heterogeneous bioinformatics web services and makes data interoperability easier; and myGrid, which has developed initial tools for distributed services in biomedicine (9).

In addition, the authors of the paper under review point out that the data need to be normalized and that common data models and coding systems must be developed to standardize the representation of genotype-phenotype information. A number of efforts are under way to improve this representation.

In 2007 A. Shabo and D. Dotan proposed Clinical Genomics Level 7 (CGL7), a set of web service applications for clinical genomics decision support that follow the HL7 Clinical Genomics standard for the representation and exchange of clinical genomics data. The clinical genomics elements are manipulated by representing them in memory as Java objects (conversion between XML and Java is done by an open source API), which allows the resulting object graphs to be processed efficiently. CGL7 is thus a pilot middleware aimed at closing gaps in the current clinical and genomic infrastructure. Because it complies with HL7 standards, it allows genotype-phenotype relations to be created dynamically as new data and knowledge become available (10).
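The idea of holding exchanged XML as an in-memory object graph, updating it, and serializing it again can be sketched briefly. CGL7 itself works with Java objects and the HL7 Clinical Genomics schema; the sketch below uses Python's standard library instead, and the XML element names, attributes, and the `Locus` and `GenotypeReport` classes are hypothetical and do not follow the HL7 schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical XML; element and attribute names are illustrative only and
# do NOT follow the actual HL7 Clinical Genomics schema.
DOC = """
<genotypeReport patient="P1">
  <locus gene="GENE1">
    <allele name="a1"/>
    <allele name="a2"/>
    <phenotype text="reduced enzyme activity"/>
  </locus>
</genotypeReport>
"""


@dataclass
class Locus:
    gene: str
    alleles: list
    phenotypes: list


@dataclass
class GenotypeReport:
    patient: str
    loci: list


def parse_report(xml_text):
    """Convert the XML document into an in-memory object graph."""
    root = ET.fromstring(xml_text.strip())
    loci = [
        Locus(
            gene=node.get("gene"),
            alleles=[a.get("name") for a in node.findall("allele")],
            phenotypes=[p.get("text") for p in node.findall("phenotype")],
        )
        for node in root.findall("locus")
    ]
    return GenotypeReport(patient=root.get("patient"), loci=loci)


def to_xml(report):
    """Serialize the object graph back to XML for exchange."""
    root = ET.Element("genotypeReport", patient=report.patient)
    for locus in report.loci:
        node = ET.SubElement(root, "locus", gene=locus.gene)
        for name in locus.alleles:
            ET.SubElement(node, "allele", name=name)
        for text in locus.phenotypes:
            ET.SubElement(node, "phenotype", text=text)
    return ET.tostring(root, encoding="unicode")


report = parse_report(DOC)
# Attach a genotype-phenotype relation as new knowledge becomes available.
report.loci[0].phenotypes.append("hypothetical new annotation")
round_trip = parse_report(to_xml(report))
```

Here `round_trip` equals `report`, so the added relation survives serialization. That round trip is the property a middleware layer of this kind depends on.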

Another effort, the Prognochip project, aims to develop an interoperable and efficient clinical information system for the integration of heterogeneous data. The middleware layer of this project aims at uniform information representation using XML/RDF technologies (11).

Beyond these considerations, the authors point out that ontologies provide the semantics required to bridge the gap between heterogeneous data sources, as well as a formal language for information retrieval. Ontologies are essential for information integration and knowledge-based systems in a variety of disciplines, and they let users define and share domain-specific vocabularies. A number of ontologies, vocabularies, and terminologies are in use at institutions worldwide, including Thesaurus, Metathesaurus, and UMLS. Each of these incorporates other terminologies; the Metathesaurus, for example, includes Gene Ontology, SNOMED CT, and LOINC. Clinical and genomic information can be represented and combined by mapping between these terminologies. Coding systems such as MeSH and ICD-9 can also be used for this task. In genetics the most commonly used ontology is Gene Ontology, whereas vocabularies used in clinical domains include UMLS, ICD-9-CM, LOINC, GALEN, and MeSH (12, 13).
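Mapping between terminologies can be pictured as a crosswalk that sends codes from different vocabularies to one shared concept identifier, so that records coded in different systems can be grouped together. The following Python sketch makes that concrete under stated assumptions: the vocabulary names, codes, and concept identifiers are placeholders, not real UMLS, ICD, or Gene Ontology entries, and `group_by_concept` is a hypothetical helper.

```python
from collections import defaultdict

# All identifiers below are placeholders, not real codes from any terminology.
CROSSWALK = {
    ("VOCAB_A", "A-100"): "CONCEPT:0001",  # clinical code in one system
    ("VOCAB_B", "B-7"): "CONCEPT:0001",    # same concept, different system
    ("VOCAB_G", "G-55"): "CONCEPT:0002",   # genomic annotation term
}

RECORDS = [
    {"source": "hospital", "vocab": "VOCAB_A", "code": "A-100", "patient": "P1"},
    {"source": "registry", "vocab": "VOCAB_B", "code": "B-7", "patient": "P2"},
    {"source": "lab", "vocab": "VOCAB_G", "code": "G-55", "patient": "P1"},
    {"source": "lab", "vocab": "VOCAB_G", "code": "G-99", "patient": "P3"},
]


def group_by_concept(records, crosswalk):
    """Group patients by shared concept; keep unmapped records for review."""
    grouped = defaultdict(list)
    unmapped = []
    for rec in records:
        concept = crosswalk.get((rec["vocab"], rec["code"]))
        if concept is None:
            unmapped.append(rec)  # no mapping exists yet for this code
        else:
            grouped[concept].append(rec["patient"])
    return dict(grouped), unmapped


grouped, unmapped = group_by_concept(RECORDS, CROSSWALK)
```

Records from the hospital and the registry end up under the same concept despite using different code systems. The unmapped list shows where the terminology mapping still has gaps.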

With enormous amounts of clinical and genomic data being generated, and with the integration efforts discussed above, personalized medicine has now begun to grow, and it will be only a matter of a few years before it completely replaces the traditional approach to medicine. The key to the growth and application of personalized medicine is the integration of clinical and genomic information, supported by efforts to develop systems for rapid integration of clinical and genetic data and for sharing data from such clinico-genomic repositories.






5. Maojo V, et al. Designing new methodologies for integrating biomedical information in clinical trials. Methods of Information in Medicine. 2006;45(2):180-183.


7. .


9. Wilkinson MD, Links M. BioMOBY: an open source biological web services proposal. Briefings in Bioinformatics. 2002;3(4):331-332.

10.