ClinGen: The Clinical Genome Resource - Heidi Rehm
DESCRIPTION
Recently, three NIH-funded efforts were aligned with the National Center for Biotechnology Information’s (NCBI) ClinVar database under the collaborative Clinical Genome Resource Program (ClinGen - http://www.iccg.org/about-the-iccg/clingen). ClinGen is developing interconnected resources for the community to improve our understanding of genomic variation and optimize its use in genomic medicine. A unique aspect of ClinGen is that it represents a strong public-academic-private partnership that relies on collaboration among NIH and academic and commercial genetic testing laboratories. The project includes the development of standards for variant interpretation as well as data submission and sharing. ClinVar, launched in April 2013, is a cornerstone of the project, serving as the primary site for deposition and retrieval of variant data and annotations. As of February 1, 2014, ClinVar contains 73,487 submissions across 18,702 genes (66,956 unique variants) with interpretations from OMIM, GeneReviews, 60 laboratories, and 23 locus-specific databases (LSDBs). The dataset includes 5,454 variant submissions (2,095 unique variants) from the Sharing Clinical Reports Project (SCRP - http://sharingclinicalreports.org) on BRCA1/2 and 4,100 copy number variants from the International Standards for Cytogenomic Arrays consortium. New policies and data structures are being considered to support controlled access to patient-level data. ClinGen is currently working with many laboratories and LSDBs to support robust mechanisms to share their data in an ongoing manner and increase the content of structured data and supporting evidence. Other parts of the project include computational and machine-learning approaches for identifying clinically relevant variants, and the development of expert working groups across many clinical domains to support consensus-driven, evidence-based curation of gene-disease associations and genomic variant interpretations.
Groups have already been formed in the areas of cardiovascular disease, hereditary cancer, metabolic disease, rasopathies, congenital muscular dystrophy, and developmental delay. The project is also interfacing with a large and diverse community of stakeholders, including professional organizations, patient advocacy groups, regulatory agencies, research consortia, and other projects from both national and international sites; this is facilitated by working with the existing International Collaboration for Clinical Genomics (ICCG - http://www.iccg.org). This talk will give an overview of the ClinGen resource and progress made to date.
TRANSCRIPT
Heidi L. Rehm, PhD
Partners Healthcare and Harvard Medical School
ClinGen Clinical Genome Resource
The Problem
> 100 million genomic variants in humans
>20,000 genes
Most we don’t understand
[Figure: Histogram of Pathogenic Variants from Diagnostic Testing of 15,000 Probands (cardiomyopathy, hearing loss, rasopathies, aortopathies, somatic and hereditary cancer, pulmonary disorders, skin disorders, other genetic syndromes). X-axis: Number of Probands; Y-axis: Number of Variants.]
Labeled variants: Lung Cancer KRAS G12C and EGFR L858R; GJB2 35delG; GJB2 M34T; PTPN11 N308D; MYBPC3 R502W
68% of pathogenic/likely pathogenic variants are seen only once
96% of variants are seen <10 times
[Figure: pie charts. Positive 25%, Negative 61%, Inconclusive 14%; VUS 31%, Benign 52%, Path 17%]
ClinGen: The Clinical Genome Resource
Launched Sept 2013
NCBI ClinVar Leads: Melissa Landrum, Jennifer Lee, Donna Maglott, George Riley, Steve Sherry
U41 Grant PIs: David Ledbetter, Christa Martin, Bob Nussbaum, Heidi Rehm
U01 PIs: Jonathan Berg, Jim Evans, David Ledbetter, Mike Watson
U01 PIs: Carlos Bustamante, Sharon Plon
NHGRI Program Directors: Lisa Brooks, Erin Ramos
ClinGen Working Groups (WG)
• Clinical Domain WGs. Chairs: Jonathan Berg & Sharon Plon
– Cancer co-chairs: Matthew Ferber, Ken Offit, Sharon Plon
– Cardiovascular co-chairs: Euan Ashley, Birgit Funke, Ray Hershberger
– Metabolic co-chairs: Rong Mao, Robert Steiner, David Valle
– Pharmacogenomic co-chairs: Teri Klein, Howard McLeod
• Actionability WG. Chair: Jim Evans
• Informatics WG. Chair: Carlos Bustamante
• EHR WG. Chair: Marc Williams
• ClinVar IT Standards and Data Submission WG. Chairs: Sandy Aronson & Karen Eilbeck
• Gene Curation WG. Chairs: Jonathan Berg & Christa Martin
• Sequence Variant WG. Chairs: Sherri Bale, Heidi Rehm, & Madhuri Hegde
• Structural Variant WG. Chairs: Swaroop Aradhya & Erik Thorland
• ELSI and Genetic Counseling WG. Chairs: Andy Faucett & Kelly Ormond
• Education, Engagement, Access WG. Chair: Andy Faucett
• Phenotyping WG. Chair: David Miller
ClinVar vs. ClinGen?
• ClinVar is a database
• ClinGen, The Clinical Genome Resource, is an NIH-funded program supporting a wide range of activities, encompassing both support for ClinVar (funded through NCBI) and other projects
• ClinGen includes 3 grants primarily funded by the National Human Genome Research Institute at NIH
Goals of ClinGen
To raise the quality of patient care by:
• Standardizing the annotation and interpretation of genomic variants
• Sharing variant and case level data through a centralized database for clinical and research use
• Developing machine-learning algorithms to improve the throughput of variant interpretation
• Implementing an evidence-based expert consensus process for curating genes and variants
• Assessing the actionability of genes and variants and supporting their use in clinical care systems
Variant Databases
[Figure: landscape of variant data sources; labels follow]
• Public LSDBs (>600)
• PharmGKB
• Population Databases: EVS, 1000G, dbSNP
• Medical Literature
• OMIM
• COSMIC
• HGMD ($$$)
• Clinical Lab Databases
• Research Lab Databases
Annotations: Largely absent from the public domain; Largely without standardized assertions
Review of Published Pathogenic Variants Found in WGS
[Figure: variant filtering flowchart; labels follow in extraction order]
3-5 million variants
~20,000 Coding/Splice Variants
20-40 “Pathogenic” Variants
Published as Disease-Causing
Genes
<1%
Rare CDS/Splice Variants
LOF in Disease Associated Genes
10-20 Variants
Review evidence for gene-disease association and LOF role
Review evidence for variant pathogenicity
92% Excluded
67% Excluded
Weak disease association 65%
Not medically relevant 33%
Somatic 2%
MedSeq Project: PI: Robert Green
Acknowledgements: Heather McLaughlin, Kalotina Machini, Ozge Ceyhan Birsoy, Matt Lebo, Danielle Metterville
Rating System for Gene Dosage
Highest -- 3, 2, 1, 0, unlikely dosage sensitive -- Lowest
Proposed Revisions to ACMG Guideline for Interpretation of Sequence Variants
ACMG: Sue Richards (Chair), Heidi Rehm (Co-chair), Sherri Bale, Wayne Grody, David Bick, Madhuri Hegde, Soma Das, Elaine Spector
AMP: Elaine Lyon, Julie Gastier-Foster
CAP: Karl Voelkerding, Nazneen Aziz
On behalf of the ACMG Laboratory Quality Assurance Committee
Terminology
Mendelian disease variant terminology
• Pathogenic
• Likely pathogenic ← (≥90% confidence)
• Uncertain significance (VUS)
• Likely benign
• Benign
Replace terms “mutation” and “polymorphism” with “variant”
Defined other areas that need variant terminology:
Complex traits, Pharmacogenetics, Cancer
Columns: Benign (Strong, Supporting); Pathogenic (Supporting, Moderate, Strong, Very Strong)
Population Data
• MAF is too high for disorder BS1 OR observation in controls inconsistent with disease penetrance BS2
• Absent in 1000G and ESP PM2
• Prevalence in affecteds statistically increased over controls PS4
Computational And Predictive Data
• Multiple lines of computational evidence suggest no impact on gene/gene product BP4
• Missense in gene where only truncating variants cause disease BP1
• In-frame indels in a repetitive region without a known function BP3
• Multiple lines of computational evidence support a deleterious effect on the gene/gene product PP3
• Novel missense change at an amino acid residue where a different pathogenic missense change has been seen before PM5
• In-frame indels in a non-repeat region or stop-loss variants PM4
• Same amino acid change as an established pathogenic variant PS1
• Truncating variant in a gene where LOF is a known mechanism of disease PVS1
Functional Data
• Well-established functional studies show no deleterious effect BS3
• Missense in gene with low rate of benign missense variants and path. missenses common PP2
• Located in a mutational hot spot and/or known functional domain PM1
• Well-established functional studies show a deleterious effect PS3
Segregation Data
• Non-segregation with disease BS4
• Co-segregation with disease in multiple affected family members PP1 (strength increases with increased segregation data)
De novo Data
• De novo (without paternity & maternity confirmed) PM6
• De novo (paternity & maternity confirmed) PS2
Allelic Data
• Observed in trans with a dominant variant BP2 OR observed in cis with a pathogenic variant BP2
• For recessive disorders, detected in trans with a pathogenic variant PM3
Other Database
• Reputable database = benign BP6
• Reputable database = pathogenic PP5
Other Data
• Found in case with an alternate cause BP5
• Patient’s phenotype or FH highly specific for gene PP4
The Scoring Rules for Classification
Pathogenic criteria: Very Strong: PVS1; Strong: PS1-PS4; Moderate: PM1-PM6; Supporting: PP1-PP5
Benign criteria: Stand-Alone: BA1; Strong: BS1-BS4; Supporting: BP1-BP6
Pathogenic
1. 1 Very Strong AND
– 1 Strong OR
– ≥2 (Moderate OR Supporting)
2. 2 Strong
3. 1 Strong AND
– ≥3 Moderate OR
– ≥2 Moderate and 2 Supporting OR
– ≥1 Moderate and 4 Supporting
Likely Pathogenic
1. 1 Very strong or Strong AND
– ≥1 Moderate OR
– ≥2 Supporting
2. ≥3 Moderate
3. ≥2 Moderate AND 2 Supporting
4. ≥1 Moderate AND 4 Supporting
Benign
• 1 Stand Alone OR
• ≥2 Strong
Likely Benign
• 1 Strong and ≥1 Supporting OR
• >2 Supporting
Uncertain Significance
If other criteria are unmet or arguments for benign and pathogenic are equal in strength
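The scoring rules above can be read as a small decision procedure. The following Python sketch is one possible reading of the slide, not the guideline's reference implementation. The function name `classify` is hypothetical, counts such as "2 Supporting" are assumed to mean "at least 2", "≥2 (Moderate OR Supporting)" is assumed to mean two criteria drawn from either level, the ">2 Supporting" rule is taken literally, and any case where both a pathogenic and a benign rule fire is simplified to uncertain significance.

```python
from collections import Counter

# Strength level by criterion prefix, taken from the slide's key
# (PVS1, PS1-PS4, PM1-PM6, PP1-PP5; BA1, BS1-BS4, BP1-BP6).
PATHOGENIC_STRENGTH = {"PVS": "very_strong", "PS": "strong",
                       "PM": "moderate", "PP": "supporting"}
BENIGN_STRENGTH = {"BA": "stand_alone", "BS": "strong", "BP": "supporting"}


def _count(codes, table):
    """Count criteria per strength level, e.g. 'PM2' counts as 'moderate'."""
    counts = Counter()
    for code in codes:
        prefix = code.rstrip("0123456789")
        if prefix in table:
            counts[table[prefix]] += 1
    return counts


def classify(codes):
    """Map a collection of met criteria codes to a five-tier category."""
    codes = set(codes)  # each criterion counts once
    p = _count(codes, PATHOGENIC_STRENGTH)
    b = _count(codes, BENIGN_STRENGTH)
    vs, s = p["very_strong"], p["strong"]
    m, sup = p["moderate"], p["supporting"]

    pathogenic = (
        (vs >= 1 and (s >= 1 or m + sup >= 2))
        or s >= 2
        or (s == 1 and (m >= 3 or (m >= 2 and sup >= 2) or (m >= 1 and sup >= 4)))
    )
    likely_pathogenic = (
        ((vs >= 1 or s >= 1) and (m >= 1 or sup >= 2))
        or m >= 3
        or (m >= 2 and sup >= 2)
        or (m >= 1 and sup >= 4)
    )
    benign = b["stand_alone"] >= 1 or b["strong"] >= 2
    # ">2 Supporting" is implemented exactly as printed on the slide.
    likely_benign = (b["strong"] >= 1 and b["supporting"] >= 1) or b["supporting"] > 2

    path_call = ("Pathogenic" if pathogenic
                 else "Likely pathogenic" if likely_pathogenic else None)
    benign_call = ("Benign" if benign
                   else "Likely benign" if likely_benign else None)

    # Assumption: conflicting evidence is reported as uncertain significance.
    if path_call and benign_call:
        return "Uncertain significance"
    return path_call or benign_call or "Uncertain significance"


print(classify(["PVS1", "PS3"]))
print(classify(["PVS1", "PM2"]))
print(classify(["BS1", "BP4"]))
```

Under these assumptions, a truncating variant with strong functional evidence (PVS1 plus PS3) meets the first Pathogenic rule, while PVS1 with a single moderate criterion only reaches Likely pathogenic.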
www.ncbi.nlm.nih.gov/clinvar
Data Flows in ClinGen
[Figure: data flow diagram. Labels: ClinGenDB; ClinVar Data; Expert Curated Variants; Case-level Data; Variant-level Data; Curation Interface; Locus-Specific Databases; Clinical Labs; Clinics; Patients; Researchers; Sharing Clinical Reports Project; Free-the-Data Campaign; Patient Registries; Unpublished or Literature Citations; InSiGHT; CFTR2; PharmGKB]
Submitter                                              Variants   Genes
Clinical Labs
  Harvard Medical School and Partners Healthcare           6996     155
  Emory Genetics Laboratory                                5252     507
  International Standards For Cytogenomic Arrays           4134   17711
  University of Chicago                                    3687     462
  Sharing Clinical Reports Project                         2045       2
  GeneDx                                                   1436      40
  ARUP Laboratories                                        1417       7
  LabCorp                                                  1391     140
  University Pennsylvania Genetic Diagnostic Lab             68       1
  American College of Med Genetics and Genomics              23       1
  Ambry Genetics                                             10       1
  Subtotal                                                26459
General Databases
  OMIM                                                    24443    3360
  GeneReviews                                              3738     406
  Subtotal                                                28181
LSDB/Researcher – Assertions Submitted
  Breast Cancer Information Core (BIC)                     3793       2
  InSiGHT                                                  2360       4
  Juha Muilu Group; FIMM, Finland (FIMM)                    840      39
  ClinSeq Project                                           425      35
  Martin Pollak (Nephrology, BIDMC, Harvard)                234      39
  CFTR2                                                     133       1
  Subtotal                                                 7785
LSDB/Researcher – No Assertions
  111 Submitters                                          50063   >6957
ClinVar – 117,115 submissions/104,217 unique variants
50,063 variants without assertions from 111 submitters
62,425 variants with assertions from >3360 genes
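As a quick consistency check on the submitter table above, the per-submitter variant counts can be summed and compared with the subtotals on the slide. This is only an arithmetic sketch: the numbers are copied from the slide, and the dictionary layout and the helper name `subtotal_matches` are illustrative choices, not part of ClinVar.

```python
# Variant counts per submitter group and the subtotal reported on the slide.
groups = {
    "Clinical Labs": (
        [6996, 5252, 4134, 3687, 2045, 1436, 1417, 1391, 68, 23, 10],
        26459,
    ),
    "General Databases": ([24443, 3738], 28181),
    "LSDB/Researcher, assertions submitted": (
        [3793, 2360, 840, 425, 234, 133],
        7785,
    ),
}


def subtotal_matches(rows, reported):
    """Return True when the listed rows add up to the reported subtotal."""
    return sum(rows) == reported


for name, (rows, reported) in groups.items():
    print(name, sum(rows), subtotal_matches(rows, reported))

# The three subtotals together give the count of variants with assertions.
variants_with_assertions = sum(reported for _, reported in groups.values())
print(variants_with_assertions)  # 62425
```

The three subtotals add up to the 62,425 variants with assertions quoted on the slide.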
The Sharing Clinical Reports Project and Free-the-Data Campaign for BRCA1 and BRCA2
Goal: Improve the care and safety of patients through data sharing
Method: Request clinical lab reports from clinics and patients
Status: >60 clinics and >200 patients have submitted de-identified reports leading to 4278 variants collected
sharingclinicalreports.org
Acknowledgements:
Bob Nussbaum (UCSF)
Danielle Metterville (ICCG)
Laura Swaminathan
George Riley (NCBI)
Larry Brody (BIC)
Sharon Terry (Genetic Alliance)
Genetic Alliance Staff and SC
www.free-the-data.org
ClinVar BRCA1/2 Variants
Total 9703 variants
– Pathogenic (2232)
– Likely pathogenic (26)
– Uncertain significance (2191)
– Likely benign (565)
– Benign (169)
– Conflicting interpretations (397)
– Literature reference only (4223)
[Figure: pie chart. Legend: Pathogenic, Uncertain Significance, Likely Benign, Benign, Not Provided, Conflicting, Likely Pathogenic]
ClinVar Pilot Project
Comparison of three laboratories’ classifications for variants in 12 RASopathy genes: BRAF, CBL, HRAS, KRAS, MAP2K1, MAP2K2, NRAS, PTPN11, RAF1, SHOC2, SOS1, SPRED1
Scope                        Number of alleles
Total submitted to ClinVar   997
Multiple assertions          269
20% discrepant
53 discrepancies:
• 60% differed based upon likelihood (Benign vs LB, P vs LP)
• 34% differed VUS vs Likely Pathogenic/Likely Benign
• 6% differed VUS vs Pathogenic
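The discrepancy categories on this slide can be expressed as a simple bucketing of two laboratories' classifications for the same variant. The sketch below is illustrative only: the function name `discrepancy_type` and the bucket labels are hypothetical, pairs the slide does not mention fall into "other", and the percentage assumes the 20% figure is 53 discrepancies out of the 269 alleles with multiple assertions.

```python
VUS = "Uncertain significance"

# Pairs that differ only in likelihood (Benign vs LB, P vs LP).
LIKELIHOOD_PAIRS = (
    frozenset(("Benign", "Likely benign")),
    frozenset(("Pathogenic", "Likely pathogenic")),
)


def discrepancy_type(a, b):
    """Bucket two classifications of one variant into the slide's categories."""
    pair = frozenset((a, b))
    if len(pair) == 1:
        return "concordant"
    if pair in LIKELIHOOD_PAIRS:
        return "likelihood"
    if VUS in pair:
        other = next(iter(pair - {VUS}))
        if other.startswith("Likely"):
            return "VUS vs likely"
        if other == "Pathogenic":
            return "VUS vs pathogenic"
    return "other"


# Assumed denominator: alleles with multiple assertions.
discrepant_percent = round(100 * 53 / 269)

print(discrepancy_type("Pathogenic", "Likely pathogenic"))
print(discrepancy_type(VUS, "Likely benign"))
print(discrepant_percent)
```

With these inputs, 53 of 269 rounds to 20, which is consistent with the "20% discrepant" label.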
Summary Assertions in ClinVar
Clinical Assertions
ClinVar Evidence Tab
ClinVar Review Levels
• Practice Guideline (e.g. 23 CF)
• Expert Panel (e.g. InSiGHT and CFTR2)
• Multi-Source Consistency: Evidence and Methods Provided
• Single-Source: Evidence and Methods Provided
• No stars
1. Assertions without evidence and method provided
2. Literature references without assertions
3. Inconsistency in assertions
Mendelian Categories: Pathogenic, Likely pathogenic, Uncertain significance, Likely benign, Benign
ClinVar Expert Panel Designation (3 stars)
• Download submission form on ClinVar website
• Panel should include multiple institutions and expertise
– medical specialists in disease area
– medical geneticists
– clinical laboratory diagnosticians/ molecular pathologists
– researchers relevant to the disease, gene, functional assays and statistical analyses
• Process for COI review and updating assertions
• Publications or links that describe annotation process
• Information provided is reviewed by ClinGen Executive Committee and posted on ClinVar w/designation
Proposal to Develop Level 2 Environment for
Submitting and Accessing Case-Level Data
ICCG Annual Conference
June 10-12, 2014, Bethesda, MD
www.iccg.org
clinicalgenome.org
ClinGen Acknowledgements
Jonathan Berg
Carlos Bustamante
Jim Evans
David Ledbetter
Christa Martin
Robert Nussbaum
Sharon Plon
Heidi Rehm
Michael Watson
Erica Anderson
Swaroop Aradhya
Sandy Aronson
Euan Ashley
Larry Babb
Erin Baldwin
Sherri Bale
Louisa Baroudi
Les Biesecker
Chris Bizon
David Borland
Rhonda Brandon
Lisa Brooks
Michael Brudno
Damien Bruno
Atul Butte
Hailin Chen
Mike Cherry
Eugene Clark
Soma Das
Johan den Dunnen
Edwin Dodson
Karen Eilbeck
Marni Falk
Andy Faucett
Xin Feng
Mike Feolo
Matthew Ferber
Penelope Freire
Birgit Funke
Monica Giovanni
Katrina Goddard
Robert Green
Marc Greenblatt
Robert Greenes
Ada Hamosh
Bret Heale
Madhuri Hegde
Ray Hershberger
Lucia Hindorff
Sibel Kantarci
Hutton Kearney
Melissa Kelly
Muin Khoury
Eric Klee
Patti Krautscheid
Joel Krier
Danuta Krotoski
Shashi Kulkarni
Melissa Landrum
Matthew Lebo
Charles Lee
Jennifer Lee
Elaine Lyon
Subha Madhavan
Donna Maglott
Teri Manolio
Rong Mao
Daniel Masys
Peter McGarvey
Dominic McMullan
Danielle Metterville
Laura Milko
David Miller
Aleksander Milosavljevic
Rosario Monge
Stephen Montgomery
Michael Murray
Rakesh Nagarajan
Preetha Nandi
Teja Nelakuditi
Elke Norwig-Eastaugh
Brendon O’Fallon
Kelly Ormond
Daniel Pineda-Alvaraz
Erin Ramos
Darlene Reithmaier
Erin Riggs
George Riley
Peter Robinson
Wendy Rubinstein
Shawn Rynearson
Cody Sam
Avni Santani
Neil Sarkar
Melissa Savage
Jeffery Schloss
Charles Schmitt
Sheri Schully
Alan Scott
Chad Shaw
Steve Sherry
Weronika Sikora-Wohlfield
Bethanny Smith Packard
Tam Sneddon
Sarah South
Marsha Speevak
Justin Starren
Jim Stavropoulos
Greer Stephens
Christopher Tan
Peter Tarczy-Hornoch
Erik Thorland
Stuart Tinker
David Valle
Steven Van Vooren
Matthew Varugheese
Yekaterina Vaydylevich
Lisa Vincent
Karen Wain
Meredith Weaver
Kirk Wilhelmsen
Patrick Willems
Marc Williams
Eli Williams