Projects

Phenotyping: Cohort discovery using EHR data

Phenotyping is the practice of developing algorithms that identify specific phenotypic traits in an individual. Each algorithm combines multiple variables, which lets researchers identify traits accurately and use them in analyses. Best practice materials and data standardization tools have been developed to aid with phenotyping protocols and collaboration. The Phenotyping Workgroup also seeks to advance the science of de-identification, transportable phenotyping methods, structure and standards, and portable components of algorithms and methods. Within the eMERGE Network, computable phenotype algorithm development focuses on both disease-related and pharmacogenomic-related phenotypes.

Creating Phenotypes

A variety of data can be extracted from electronic health records (EHRs) in both structured and unstructured formats, including billing codes, laboratory results, medication data, and free text such as doctors' notes and reports, which natural language processing (NLP) searches for key words and information. The ability to pull the correct patient records with a phenotype algorithm depends on narrowing the search criteria until they target the 'true' case. See the figure below and learn more here.

EHR data structure and accurate phenotyping. (a) Electronic health record (EHR) data can be structured or unstructured. Structured data are easy to retrieve whereas unstructured data require additional tools to be used for phenotyping, such as natural language processing (NLP). (b) Accurate phenotyping often requires extracting information from billing codes, prescriptions, laboratory tests and clinical notes. This information can be either structured or unstructured. ICD-9, International Classification of Diseases, Ninth Revision.

Figure and caption courtesy of: Wei, W.-Q. & Denny, J.C. Extracting research-quality phenotypes from electronic health records to support precision medicine. Genome Medicine, 2015 April 30; 7:41. PMID: 25937834
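
To make the idea of combining several EHR data sources concrete, here is a minimal Python sketch of a rule-based phenotype. It is not an eMERGE or PheKB algorithm. The `PatientRecord` class, the field names, the choice of type 2 diabetes as the example trait, and the "at least two sources" rule are all assumptions made for illustration. The keyword match stands in for real NLP and does not handle negation or context.

```python
from dataclasses import dataclass, field


@dataclass
class PatientRecord:
    """Hypothetical, simplified view of one patient's EHR data."""
    patient_id: str
    icd9_codes: list = field(default_factory=list)    # structured billing codes
    medications: list = field(default_factory=list)   # structured prescriptions
    hba1c_values: list = field(default_factory=list)  # structured lab results (%)
    notes: list = field(default_factory=list)         # unstructured clinical text


def evidence_sources(rec):
    """Return the set of data sources that support the illustrative phenotype."""
    found = set()
    # ICD-9 250.xx is the diabetes mellitus code family.
    if any(code.startswith("250") for code in rec.icd9_codes):
        found.add("billing")
    # Illustrative single-drug check; a real algorithm would use a drug list.
    if any(med.lower() == "metformin" for med in rec.medications):
        found.add("medication")
    # Illustrative lab threshold (HbA1c of 6.5% or higher).
    if any(value >= 6.5 for value in rec.hba1c_values):
        found.add("lab")
    # Naive keyword match standing in for NLP (no negation handling).
    if any("type 2 diabetes" in note.lower() for note in rec.notes):
        found.add("notes")
    return found


def is_case(rec, min_sources=2):
    """Call a patient a case when enough independent sources agree."""
    return len(evidence_sources(rec)) >= min_sources


cohort = [
    PatientRecord("A", icd9_codes=["250.00"], medications=["Metformin"]),
    PatientRecord("B", icd9_codes=["250.00"]),
    PatientRecord("C", hba1c_values=[7.1], notes=["History of type 2 diabetes."]),
]
cases = [rec.patient_id for rec in cohort if is_case(rec)]
print(cases)  # ['A', 'C']
```

Patient B has only a billing code and is excluded. Requiring agreement between sources is one common way to narrow the criteria toward the 'true' case, at the cost of missing patients whose records are sparse.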

Phenotype Algorithms

Current phenotyping efforts span multiple diseases in both pediatric and adult participants, including adult familial hypercholesterolemia, colorectal cancer, migraines, epilepsy, chronic rhinosinusitis, chronic kidney disease, and hearing loss, with many more in development.

* Click here to view a listing of published eMERGE Phenotypes on PheKB.

Phenotype Data

* Click here to view a listing of eMERGE Studies currently available in dbGaP.
* Click here to view a listing of phenotype data collected for eMERGE studies.

Tools

eRC | eMERGE Record Counter: provides exploratory data figures for research planning and feasibility assessment
PheKB | Phenotype Knowledgebase: offers a collaborative environment to build and validate electronic phenotype algorithms
PheWAS | PheWAS Catalog: functions as a platform for analysis of phenotypes against single gene variants
SPHINX | Sequence and Phenotype Integration Exchange: a tool for exploring data for hypothesis generation, especially around the drug response implications of genetic variation across the eMERGE PGx cohort
