From Clinfowiki

A phenome-wide association study (PheWAS) is a quantitative research technique that asks: which diseases or traits are associated with a given gene? This contrasts with a genome-wide association study (GWAS), which asks: which genes are associated with a given disease? The phenome is the full set of physical and biochemical properties (phenotypes) of a living organism.


PheWAS is attractive to geneticists, informaticians, epidemiologists, and others because it exploits information-rich environments. Two such sources of clinical information are genomic databases and electronic health records (EHRs). Scientists believe that human disease results from complex interactions between genes and environmental risk factors, and that variants in a few (<20) susceptibility genes are responsible for >50% of this disease burden. Hence, given a specific, limited set of genes (or polymorphisms), scientists can test for associations with phenomic markers such as CPT codes, medications, and ICD-9 codes.
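The scan described above, one variant tested against every phenotype code in the records, can be sketched as follows. This is a minimal illustration and not the method of any particular study: the function names, the toy subjects, and the choice of a Pearson chi-square test on a 2x2 carrier-by-code table are all assumptions made for the example.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 d.f.) for the table [[a, b], [c, d]].

    Returns (chi2, p_value).
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 0.0, 1.0  # an empty margin means no test is possible
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom the upper tail equals erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def phewas_scan(carriers, phenotypes):
    """Test one variant against every phenotype code seen in the records.

    carriers:   dict subject_id -> bool (carries the variant allele)
    phenotypes: dict subject_id -> set of phenotype codes
    Returns a list of (code, chi2, p_value), smallest p-value first.
    """
    all_codes = set().union(*phenotypes.values())
    results = []
    for code in sorted(all_codes):
        a = b = c = d = 0
        for subject, is_carrier in carriers.items():
            has_code = code in phenotypes.get(subject, set())
            if is_carrier and has_code:
                a += 1  # carrier with the code
            elif is_carrier:
                b += 1  # carrier without the code
            elif has_code:
                c += 1  # non-carrier with the code
            else:
                d += 1  # non-carrier without the code
        chi2, p = chi_square_2x2(a, b, c, d)
        results.append((code, chi2, p))
    return sorted(results, key=lambda r: r[2])


# Toy data (hypothetical subjects; "555" and "401" are ICD-9 categories
# for regional enteritis and essential hypertension).
carriers = {"s1": True, "s2": True, "s3": True, "s4": True,
            "s5": False, "s6": False, "s7": False, "s8": False}
phenotypes = {"s1": {"555", "401"}, "s2": {"555", "401"},
              "s3": {"555"}, "s4": {"555"},
              "s5": {"401"}, "s6": {"401"},
              "s7": set(), "s8": set()}

results = phewas_scan(carriers, phenotypes)
for code, chi2, p in results:
    print(code, round(chi2, 3), round(p, 4))
```

With cell counts this small a chi-square approximation is unreliable, so a real analysis would likely prefer an exact test and would also need to account for the number of codes tested.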


Studied in combination, PheWAS and GWAS can add new insights to the body of clinical knowledge. For example, suppose GWAS identifies a single nucleotide polymorphism (SNP), rs17234657, that is associated with infection; one might conclude that the SNP increases the host's susceptibility to infection. PheWAS, in contrast, may identify new putative associations by interrogating phenomic markers within the EHR. It can thereby reveal an alternative mechanism: rs17234657 is associated with an increase in autoimmune disease, and the treatment used for that disease (immunosuppressive medication) is the cause of the infection.

Article review

Denny et al. describe a proof-of-concept PheWAS. They selected 5 SNPs with known disease associations. The primary outcome of the study was replication of known SNP-disease associations for 7 conditions: atrial fibrillation, coronary artery disease, carotid artery stenosis, multiple sclerosis, Crohn’s disease, rheumatoid arthritis, and systemic lupus erythematosus. BioVU, the Vanderbilt University DNA repository, provided the genomic data for the study population of ~6000 subjects. Genetic data from BioVU are linked to a de-identified EHR called “the synthetic derivative.” A modified ICD-9 coding system in the synthetic derivative was used to link genetic data to the phenome. The PheWAS algorithm detected 4 of the 7 known SNP-disease associations (p<0.01). Most interestingly, the algorithm also picked up 19 previously unknown associations between the SNPs and diseases among the ICD-9 codes used. Although most of these new associations had limited statistical power, the increasing size and availability of genetic banks should strengthen the signals generated by PheWAS. According to the study authors, the 3 known associations went undetected because of false-positive ICD-9 codes in the dataset, where billing codes were recorded as a “hypothetical reason for a test.”
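To make the phenotype-linking step concrete, the sketch below rolls raw ICD-9 billing codes up into coarser groups and derives a case set for each group. It does not reproduce the modified ICD-9 coding system used in the study: grouping by the part of the code before the decimal point is an assumed stand-in, and `min_count` is a hypothetical parameter showing one possible way to damp false positives from a single billing code entered as the reason for a test.

```python
from collections import Counter, defaultdict


def icd9_group(code):
    """Roll a billing code up to the part before the decimal point.

    Assumed stand-in grouping, e.g. "555.9" -> "555".
    """
    return code.split(".")[0]


def assign_cases(billing_records, min_count=1):
    """Derive case sets per code group from (subject_id, icd9_code) pairs.

    A subject counts as a case for a group only if at least `min_count`
    of their billing codes fall in that group (hypothetical rule).
    Returns dict group -> set of subject_ids.
    """
    counts = defaultdict(Counter)
    for subject, code in billing_records:
        counts[icd9_group(code)][subject] += 1
    return {
        group: {s for s, n in per_subject.items() if n >= min_count}
        for group, per_subject in counts.items()
    }


# Toy billing records (hypothetical subjects).
records = [
    ("s1", "555.0"), ("s1", "555.9"),    # two regional enteritis codes
    ("s2", "555.9"),                     # a single code, possibly a rule-out
    ("s3", "427.31"), ("s3", "427.31"),  # atrial fibrillation, billed twice
]

lenient = assign_cases(records)
strict = assign_cases(records, min_count=2)
print(lenient)
print(strict)
```

Raising `min_count` trades sensitivity for specificity: subject `s2` is a case for group `555` under the lenient rule but not under the strict one.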


Finally, scientists are beginning to explore the increasingly abundant genomic databases using data commonly found in the EHR. As an emerging domain, PheWAS has broad appeal for personalized medicine. Not only will individuals know their phenome and genome; physicians will soon be able to counsel patients and recommend therapies that adjust environmental risk factors at the individual level. Current limitations should be overcome as the science and methodology become more widely available and easier to use.


  1. Phenomics-
  2. Ghebranious N, McCarty C, Wilke R. Clinical phenome scanning. Personalized Medicine 2007;4(2):175-182.
  3. Yang Q, Khoury M, Friedman J, Little J, Flanders D. How many genes underlie the occurrence of common complex diseases in the population? International Journal of Epidemiology 2005;34:1129-1137.
  4. Denny J, Ritchie M, Basford M, et al. PheWAS: demonstrating the feasibility of a phenome-wide scan to discover gene-disease associations. Bioinformatics 2010;26(9):1205-1210.