Variant Effect Prediction Training Course

31st October - 3rd November 2016

Astoria Capsis Hotel Heraklion, Crete

Greece

The Human Variome Project and the Scientific Organising Committee wish to express their gratitude to the following sponsors for their support of this event.

Program

Monday 31st October - The Basics

11.30 – 12.45 REGISTRATION - hotel foyer

13.00 - 14.45 PLENARY SESSION 1 Room: Ariadni

13.00 - 13.15 Welcome & Introduction

Johan T. den Dunnen Leiden Univ. Medical Center, Leiden, Netherlands

13.15 - 14.00 Variants in the genome, position and possible consequences

Joanne Traeger-Synodinos Dept. of Medical Genetics, National and Kapodistrian University of Athens, Greece

14.00 - 14.45 Sequencing technology; Sanger, NGS and single molecule

Henk Buermans Dept. of Human Genetics; Leiden Genome Technology Center, Leiden, The Netherlands

14.45 - 15.15 Coffee Break

15.15 - 16.00 PLENARY SESSION 2 Room: Ariadni

15.15 - 16.00 Calling DNA variants

Steven Laurie CNAG (RD Connect), Barcelona, Spain

16.00 - 17.30 CONCURRENT PRACTICALS 1

Alamut (Interactive Biosoftware)

Andre Blavier & Alexandre Hatzoglou

Room: Ariadni

CONCURRENT PRACTICALS 2

VarAFT & UMD Predictor

Jean-Pierre Desvignes & Christophe Béroud

Room: Cellar

18.00 - 19.30 WELCOME RECEPTION – Astoria Capsis Hotel

Tuesday 1st November - Gathering Information

8.30 - 10.30 PLENARY SESSION 3 Room: Ariadni

8.30 – 9.15 DNA variants - the big databases

Robert Kuhn UCSC Genome Browser, UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA

9.15 - 9.45 Gene variant databases & sharing information

Johan den Dunnen Leiden Univ. Medical Center, Leiden, The Netherlands

9.45 - 10.30 Human Phenotype Ontology (HPO)

Sebastian Köhler Institute of Medical Genetics, Charité Universitätsklinikum Berlin, Germany

10.30 - 11.00 Coffee Break

11.00 - 12.30 PLENARY SESSION 4 Room: Ariadni

11.00 - 11.45 The Ensembl genome browser and its possibilities

Helen Sparrow Ensembl, EMBL - EBI, Cambridge, UK

11.45 - 12.30 Prioritize: annotate and filter variants

Christophe Béroud INSERM UMR_S910, Aix Marseille University, France

12.30 - 14.00 Lunch Break

14.00 - 15.30 CONCURRENT PRACTICALS 3

Ensembl genome browser

Helen Sparrow

Room: Ariadni

CONCURRENT PRACTICALS 4

The RD-Connect platform

Steve Laurie

Room: Cellar

CONCURRENT PRACTICALS 5

Alamut (Interactive Biosoftware) (repeated)

Andre Blavier & Alexandre Hatzoglou

Room: Mouses

15.30 - 16.00 Coffee Break

16.00 - 17.30 CONCURRENT PRACTICALS 6

Ensembl genome browser (repeated)

Helen Sparrow

Room: Ariadni

CONCURRENT PRACTICALS 7

The RD-Connect platform (repeated)

Steve Laurie

Room: Cellar

CONCURRENT PRACTICALS 8

Sophia Genetics

Gaetano Bonifacio & Nicole Grieder

Room: Mouses

17.30 DAY END - Evening at leisure

Wednesday 2nd November - Predicting Consequences & Variant Classification

8.30 - 10.30 PLENARY SESSION 5 Room: Ariadni

8.30 - 9.00 Summary session

Andreas Laner MGZ - Medical Genetics Centre, Munich, Germany

9.00 - 9.45 Potential consequences on RNA level

Andreas Laner MGZ - Medical Genetics Centre, Munich, Germany

9.45 - 10.30 HGVS nomenclature - describing variants

Johan den Dunnen Leiden Univ. Medical Center, Leiden, The Netherlands

10.30 - 11.00 Coffee Break

11.00 - 12.30 PLENARY SESSION 6 Room: Ariadni

11.00 - 11.45 Variant annotation

Helen Sparrow (EBI) Ensembl, EMBL - EBI, Cambridge, UK

11.45 - 12.30 The UCSC genome browser and its possibilities

Robert Kuhn UC Santa Cruz Genomics Institute

12.30 - 14.00 Lunch Break

14.00 - 15.30 CONCURRENT PRACTICALS 9

UCSC genome browser

Robert Kuhn

Room: Ariadni

CONCURRENT PRACTICALS 10

VarAFT & UMD Predictor (repeated)

Jean-Pierre Desvignes & Christophe Béroud

Room: Cellar

CONCURRENT PRACTICALS 11

HPO (Phenomizer) and WES/WGS analysis using Exomiser

Sebastian Köhler

Room: Mouses

15.30 - 16.00 Coffee Break

16.00 - 17.30 CONCURRENT PRACTICALS 12

UCSC genome browser (repeated)

Robert Kuhn

Room: Ariadni

CONCURRENT PRACTICALS 13

Sophia Genetics (repeated)

Gaetano Bonifacio & Nicole Grieder

Room: Cellar

CONCURRENT PRACTICALS 14

HPO (Phenomizer) and WES/WGS analysis using Exomiser (repeated)

Sebastian Köhler

Room: Mouses

17.30 - 18.45 Free time

18.45 for 19.00

CONFERENCE DINNER Meet in hotel lobby at 18.45 for short walk to the restaurant or meet us directly there by 19.00 (see map provided)

Thursday 3rd November - Functional Testing & Reporting

8.30 - 10.20 PLENARY SESSION 7 Room: Ariadni

8.30 – 9.15 Variant classification

Andreas Laner MGZ - Medical Genetics Centre, Munich, Germany

9.15 - 10.20 PRESENTATIONS FROM ABSTRACTS

9.15 - 9.35 Next generation sequencing in the diagnosis of inherited cardiac disorders

Sara Benedetti Clinical Molecular Biology Laboratory, IRCCS San Raffaele Scientific Institute, via Olgettina 60, 20132 Milano, Italy

9.35 - 9.55 Full-length CYP2D6 diplotyping using PacBio RSII long reads for better drug dosage and response management

Henk P.J. Buermans Dept. of Human Genetics; Leiden Genome Technology Center, Leiden, The Netherlands

9.55 - 10.15 Targeted RNASeq helps to improve prediction of the effect of identified variants/mutations

Bernd Dworniczak Department of Human Genetics, University Hospital Muenster, Vesaliusweg 12, Muenster, Germany

10.15 - 10.20 Discussion

10.20 - 10.50 Coffee Break

PLENARY SESSION 8 Room: Ariadni

10.50 - 11.35 NGS in diagnostics

Anna Beret-Pages MGZ - Medical Genetics Centre, Munich, Germany

11.35 - 12.30 Future developments & meeting evaluation

Johan den Dunnen Leiden Univ. Medical Center, Leiden, The Netherlands

12.30 COURSE END

Abstracts

PLENARY SESSION 1

Variants in the genome, position and possible consequences

Joanne Traeger-Synodinos

Dept. of Medical Genetics, National and Kapodistrian University of Athens, Greece

Databases recording the genetic variation of the human genome include thousands of entries. These numbers have increased exponentially in the last few years due to advances in the technologies available for genetic and genomic analysis, most notably NGS. One big challenge is the precise interpretation of genome variants in a diagnostic setting to support prognostic, therapeutic and reproductive advice with respect to human phenotypes, especially rare Mendelian diseases. There are many categories of variants at the DNA level, including single nucleotide substitutions within genes and their flanking (regulatory) regions, lesions involving 20 bp or less (micro-deletions, micro-insertions and combined micro-insertions/micro-deletions or indels), repeat variations, gross aberrations (deletions, insertions and duplications extending from tens of bp up to thousands, which disrupt and/or remove an entire gene or even a group of contiguous genes) and finally complex rearrangements (inversions, translocations and complex indels) involving extensive chromosome regions. This presentation will summarize the potential consequences of variants, mainly focusing on nucleotide variants as these represent the category most abundantly generated by NGS technologies. Additionally, examples of unusual and rare variants will be presented that highlight many of the complexities associated with variant interpretation.

Sequencing technology; Sanger, NGS and single molecule

Henk Buermans

Dept. of Human Genetics; Leiden Genome Technology Center, Leiden, The Netherlands

Impressive progress has been made in the field of Next Generation Sequencing (NGS). Through advancements in the fields of molecular biology and technical engineering, parallelization of the sequencing reaction has profoundly increased the total number of produced sequence reads per run. Current sequencing platforms allow for an unprecedented view into complex mixtures of RNA and DNA samples. NGS is currently evolving into a molecular microscope finding its way into virtually every field of biomedical research.

The technical background of the different commercially available NGS platforms will be covered with respect to template generation and the sequencing reaction, with focus on differences, strong/weak points and possible sequencing errors.

PLENARY SESSION 2

Calling DNA Variants

Steven Laurie

CNAG (RD Connect), Barcelona, Spain

Accuracy of identification of SNVs and short InDels from short-read next generation sequencing data is affected by many variables, independent of the quality of the sample analysed. Two key processes are alignment to the reference genome, and identification of positions that are different from the reference sequence (i.e. variants). Here I will show the results obtained from benchmarking a variety of state-of-the-art aligner/variant caller combinations, and discuss their successes and limitations, and how variant calling is likely to improve in the future.
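
The second process named in this abstract, identifying positions that differ from the reference, can be illustrated with a deliberately simplified sketch. It assumes reads that are already aligned without gaps, and the function name `call_snvs` and both thresholds are hypothetical choices for illustration only. The variant callers benchmarked in the talk additionally model base qualities, mapping qualities and indels.

```python
from collections import Counter


def call_snvs(reference, aligned_reads, min_depth=4, min_alt_fraction=0.25):
    """Return (position, ref_base, alt_base, alt_fraction) for naive SNV calls.

    aligned_reads is a list of (start, sequence) pairs, where start is the
    0-based reference position of the first base of the read. Only gapless
    alignments are supported in this toy example.
    """
    # Build a pileup: one base counter per reference position.
    pileup = [Counter() for _ in reference]
    for start, sequence in aligned_reads:
        for offset, base in enumerate(sequence):
            position = start + offset
            if 0 <= position < len(reference):
                pileup[position][base] += 1

    calls = []
    for position, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to make a call at this position
        ref_base = reference[position]
        for base, count in sorted(counts.items()):
            if base != ref_base and count / depth >= min_alt_fraction:
                calls.append((position, ref_base, base, count / depth))
    return calls


reference = "ACGTACGTAC"
reads = [
    (0, "ACGTAC"), (0, "ACGAAC"), (2, "GAACGT"),
    (2, "GTACGT"), (3, "AACGTA"), (3, "TACGTA"),
]
calls = call_snvs(reference, reads)
print(calls)
```

In this made-up data set half of the six reads covering position 3 carry an A instead of the reference T, which is the pattern expected for a heterozygous SNV.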

PLENARY SESSION 3

DNA variants - the big databases

Robert Kuhn

UCSC Genome Browser, UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA

There are two major types of variation databases: copy-number (CNV) databases, addressing large-scale variants, and single-nucleotide (SNV or SNP) databases. Technically, the term "polymorphism" refers to common variants, but the acronym "SNP" is widely used to include all single-nucleotide variants, irrespective of allele frequency. This presentation will discuss variant resources both large and small and both benign and pathogenic. For example, the Database of Genomic Variants (DGV) and ExAC annotate large CNVs from "normal" individuals, while DECIPHER and ClinGen represent large variants that are associated with disease. Similarly, for short SNVs, common variants are annotated by dbSNP and HapMap and typically removed by a diagnostic pipeline, while disease-associated SNVs are found in many databases, including OMIM Allelic Variants, LOVD, and UniProt. UCSC's pipeline for processing dbSNP variants makes separate tracks for Common SNPs and SNPs flagged as potentially clinically relevant to assist in the evaluation of new variants.

Gene variant databases & sharing information

Johan den Dunnen

Leiden Univ. Medical Center, Leiden, The Netherlands

It seems so simple: DNA diagnostics is based on sharing data on genes, variants and phenotypes. Without sharing, DNA diagnostics is not possible. When we do not share, we do not offer optimal care to the patients and their families. One would therefore expect that (i) sharing is the standard, and (ii) excellent well-funded databases are available displaying all information known.

Unfortunately, reality is quite different. Sharing is not the standard, far from it. Although funding agencies and journals try to force improvements, developments are slow. Many databases, especially the gene variant databases (GVDBs or LSDBs), struggle to survive because of a lack of funding and/or of active database curators. As a consequence, available knowledge is spread over a range of databases. On one hand there are the general databases (like HGMD, dbSNP, EVA, ExAC, COSMIC, OMIM, etc), “a mile wide / an inch deep”, collecting large-scale genome-wide information. On the other hand there are the gene variant databases (like ClinVar, LOVD, UMD, etc), “an inch wide / a mile deep”, collecting individual genetic information including detailed phenotype data. In my presentation I will give a brief overview of the available databases, their differences, the type of information they contain and the quality to expect. In addition I will stress the importance of sharing data; the focus should be on the interest of the patient, not on personal interest.

Human Phenotype Ontology (HPO)

Sebastian Köhler

Institute of Medical Genetics, Charité Universitätsklinikum Berlin, Germany

Future medicine will require a precise understanding of genotype-phenotype relationships. For this purpose, precise disease groupings and patient cohort definition based on clinical features and symptoms will be an essential part.

I am going to introduce the Human Phenotype Ontology (HPO) and give a brief overview of the history of this resource. An important part of the HPO are logical definitions that enable phenotype analyses across different species. I will present active areas of development in the HPO, e.g. its opening to non-clinical experts. In order to increase HPO's usability and impact, we are assigning plain language synonyms and aim to translate the HPO into several languages in a crowd-sourcing approach.

I will show how the HPO can be a resource to capture phenotype information and how it can be used to align phenotype with genotype information. More details on the computational part will be shown in the practical session in the following days.

PLENARY SESSION 4

The Ensembl genome browser and its possibilities

Helen Sparrow

Ensembl, EMBL - EBI, Cambridge, UK

Viewing the data (EBI): This session will provide a brief introduction to Ensembl, a freely available project offering one of the most comprehensive and integrated genomic resources. We currently have over 80 species, including human - our most highly accessed genome, whether in its latest assembly (GRCh38) or previous ones (GRCh37 and NCBI36).

Ensembl annotates genes and transcripts based on biological evidence, generates gene trees (both protein coding and non-coding) and whole genome alignments. To annotate other genomic features such as SNPs, CNVs and regulatory elements, Ensembl draws on major biological projects including 1000 Genomes, ENCODE, Roadmap Epigenomics, and Blueprint epigenome. We also integrate data from reference databases such as dbSNP, the NHGRI-EBI GWAS catalogue and OMIM. These data can be accessed through our web browser, APIs (Perl and REST), MySQL and FTP dumps, and our toolkit (e.g. our popular VEP, BioMart, BLAST/BLAT).

Variants prioritization: annotation and filtration steps

Jean-Pierre Desvignes1, David Salgado1 and Christophe Béroud1,2

1 Aix-Marseille University, INSERM, GMGF, Marseille, France; 2 APHM, Hôpital Timone Enfants, Laboratoire de Génétique Moléculaire, Marseille, France

High-throughput sequencing technologies are now fundamental for the identification of disease-causing mutations in human genetic diseases, both in research and clinical testing contexts. More than 1,000 genes have been identified between 2010 and 2014 thanks to the early adoption of Whole Exome Sequencing (WES) technologies. However, despite this encouraging figure, the success rate of clinical exome diagnosis remains low (between 23% and 26%). This is due to several factors, such as technical factors, mutation types, the bioinformatics tools and methods used to generate VCF files, wrong variant annotation and non-optimal filtration practices. In this presentation, we will describe the critical steps of the variant annotation and filtration processes that highlight a handful of potential disease-causing mutations for downstream analysis. We will review the key annotation elements to gather at multiple levels for each mutation, and which systems are designed to help in collecting this critical information. We will also describe filtration options, their efficiency and limits, and provide a generic filtration workflow. Finally, we will demonstrate this workflow in action and highlight potential pitfalls through a use case.
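
As a rough illustration of what a generic filtration workflow can look like, the sketch below chains three common filters: population allele frequency, predicted consequence, and a crude inheritance check. It is not the workflow presented in the talk. The `Variant` fields, the `DAMAGING` set, the gene names and the 1% frequency cut-off are all assumptions made for this example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    consequence: str      # e.g. "missense", "synonymous", "stop_gained"
    population_af: float  # allele frequency in a reference population
    genotype: str         # "het" or "hom"


# Hypothetical set of consequences treated as potentially damaging.
DAMAGING = {"missense", "stop_gained", "frameshift", "splice_site"}


def filter_variants(variants, max_af=0.01, inheritance="dominant"):
    """Keep rare, potentially damaging variants compatible with the model."""
    # Step 1: remove variants that are common in the reference population.
    rare = [v for v in variants if v.population_af <= max_af]
    # Step 2: keep only consequences considered potentially damaging.
    damaging = [v for v in rare if v.consequence in DAMAGING]
    if inheritance == "recessive":
        # Step 3: keep homozygous variants, or genes with at least two
        # heterozygous candidates (possible compound heterozygosity).
        het_per_gene = Counter(v.gene for v in damaging if v.genotype == "het")
        return [
            v for v in damaging
            if v.genotype == "hom" or het_per_gene[v.gene] >= 2
        ]
    return damaging


variants = [
    Variant("GENE_A", "missense", 0.0001, "het"),
    Variant("GENE_A", "synonymous", 0.0002, "het"),
    Variant("GENE_B", "stop_gained", 0.15, "het"),
    Variant("GENE_C", "frameshift", 0.0, "hom"),
    Variant("GENE_D", "missense", 0.001, "het"),
    Variant("GENE_D", "splice_site", 0.0005, "het"),
]
dominant_genes = [v.gene for v in filter_variants(variants)]
recessive_genes = [v.gene for v in filter_variants(variants, inheritance="recessive")]
print(dominant_genes, recessive_genes)
```

The order of the steps does not change the result here, but applying the cheapest and most selective filter first keeps the later steps small on real exome data.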

PLENARY SESSION 5

Potential consequences on RNA level

Andreas Laner

MGZ - Medical Genetics Centre, Munich, Germany

Pathogenic DNA variants are classically thought of as truncating variants (e.g. PTC, indels, single or multi exon deletion/duplication, canonical +/- 1 or 2 splice sites etc.) or missense substitutions altering the biological function of the gene product. However, studies during the last two decades suggest that approximately 15% of disease-causing variants act by altering RNA (1), even though the frequency of these variants varies considerably between individual genes. This is a rather conservative estimate, as research has only recently begun to routinely assess e.g. splicing abnormalities, and there is evidence that many unclassified genetic variants might turn out to result in splicing aberrations or other consequences on the RNA level. This presentation provides an overview and discussion of different functional mechanisms leading to potentially deleterious consequences on the RNA level (for a review of mechanisms see Diederichs et al., 2016 (2) and Scotti et al., 2015 (3)).

Topics covered in this presentation include variants affecting:

• Splice acceptor and splice donor sites

• Branch sites

• ESE and ESS(ESR) sites

• mRNA stability

• micro-RNA binding

• Translational folding / Codon usage

HGVS nomenclature - describing variants

Johan den Dunnen

Clinical Genetics & Human Genetics, Leiden University Medical Center (LUMC), Leiden, The Netherlands

The consistent and unambiguous description of sequence variants is essential to report and exchange information on the analysis of a genome. In particular, DNA diagnostics critically depends on accurate and standardized description and sharing of the variants detected. The sequence variant nomenclature system proposed in 2000 by the Human Genome Variation Society has been widely adopted and has developed into an internationally accepted standard. The recommendations are currently commissioned through the Sequence Variant Description Working Group (SVD-WG) operating under the auspices of three international organizations: the Human Genome Variation Society (HGVS), the Human Variome Project (HVP), and the Human Genome Organization (HUGO). Requests for modifications and extensions go through the SVD-WG following a standard procedure including a community consultation step. Version numbers are assigned to the nomenclature system to allow users to specify the version used in their variant descriptions. In my presentation I will summarise the current recommendations, HGVS version 15.11, focussing on the changes/additions that were made recently. Most focus has been on removing inconsistencies and tightening definitions allowing automatic data processing.

An extensive version of the recommendations is available online at https://www.HGVS.org/varnomen.
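
To show why tightened definitions matter for automatic data processing, here is a minimal sketch that parses only the simplest kind of description, a DNA-level substitution such as `NM_004006.2:c.4375C>T`. The regular expression and the function name `parse_substitution` are assumptions for this example. The pattern covers a small fraction of the nomenclature and is not a validator for the full recommendations.

```python
import re

# Matches e.g. "NM_004006.2:c.4375C>T" or "NM_004006.2:c.93+1G>T".
# Assumption: a versioned RefSeq-style accession and a single-base
# substitution; deletions, duplications, insertions etc. are not covered.
SUBSTITUTION = re.compile(
    r"^(?P<reference>[A-Z]{2}_\d+\.\d+):"    # reference sequence and version
    r"(?P<kind>[cgmn])\."                     # coordinate system prefix
    r"(?P<position>[-*]?\d+(?:[+-]\d+)?)"     # position, optional intron offset
    r"(?P<ref>[ACGT])>(?P<alt>[ACGT])$"       # reference base > new base
)


def parse_substitution(description):
    """Return the parts of a simple substitution as a dict, or None."""
    match = SUBSTITUTION.match(description)
    return match.groupdict() if match else None


coding = parse_substitution("NM_004006.2:c.4375C>T")
intronic = parse_substitution("NM_004006.2:c.93+1G>T")
malformed = parse_substitution("NM_004006.2:c.4375C-T")
print(coding, intronic, malformed)
```

A description that does not follow the expected pattern is rejected rather than guessed at, which is the behaviour an automated pipeline needs.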

PLENARY SESSION 6

Variant annotation

Helen Sparrow (EBI)

Ensembl, EMBL - EBI, Cambridge, UK

Variant annotation (EBI) - An introduction to Ensembl’s VEP: The VEP determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions. The VEP accepts various variant input formats to find out the:

• Genes and transcripts affected by the variants

• Location of the variants (e.g. upstream of a transcript, in coding sequence, in non-coding RNA, in regulatory regions)

• Consequence of your variants on the protein sequence (e.g. stop gained, missense, stop lost, frameshift)

• Known variants that match yours, and associated minor allele frequencies from the 1000 Genomes Project

• SIFT and PolyPhen scores for changes to protein sequence
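
The idea behind protein-level consequences can be sketched for the simplest case, a single-base substitution inside a coding sequence. This is not the VEP's implementation. The function name `coding_consequence` and the toy sequence are hypothetical, and start-codon changes, indels, frameshifts and splice effects are ignored.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def coding_consequence(cds, position, alt):
    """Classify a substitution at a 1-based coding position (c. numbering)."""
    index = position - 1
    within_codon = index % 3
    codon_start = index - within_codon
    ref_codon = cds[codon_start:codon_start + 3]
    # Replace the single affected base inside its codon.
    alt_codon = ref_codon[:within_codon] + alt + ref_codon[within_codon + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop gained"
    if ref_aa == "*":
        return "stop lost"
    return "missense"


cds = "ATGGAGTGGTAA"  # toy coding sequence: Met-Glu-Trp-Stop
results = [
    coding_consequence(cds, 6, "A"),   # GAG > GAA, Glu stays Glu
    coding_consequence(cds, 4, "A"),   # GAG > AAG, Glu becomes Lys
    coding_consequence(cds, 9, "A"),   # TGG > TGA, Trp becomes Stop
    coding_consequence(cds, 10, "C"),  # TAA > CAA, Stop becomes Gln
]
print(results)
```

The same variant can have a different consequence in each transcript of a gene, which is why annotation tools report consequences per transcript rather than per variant.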

The UCSC genome browser and its possibilities

Robert Kuhn

UC Santa Cruz Genomics Institute

Viewing the data with the UCSC Genome Browser

The UCSC Genome Browser is a visualization tool for reference genomes and their annotations. Using the genome coordinates from a reference assembly as the x-axis, anything that can be aligned to the genome may be displayed on the Browser. Co-visualization of diverse datasets allows the researcher to pursue inspiration and curiosity about genomic annotations and molecular scenarios. A wide variety of annotations are available, including large and small variants relative to the reference, histone modifications across many cell lines, gene expression data from human tissue and many others. A new display mode called "multi-region" allows display of exons only, useful for display of results from whole-exome sequencing experiments or RNA-seq.

PLENARY SESSION 7

Variant classification

Andreas Laner

MGZ - Medical Genetics Centre, Munich, Germany

The dramatic progress in sequence technology, lab automatization, and bioinformatic data processing in the last decade has made next-generation sequencing the standard method in molecular diagnostics. Especially after the development of benchtop NGS machines, almost every lab can create vast amounts of high-quality sequence data. However, there are some important hurdles to overcome, especially in the interpretation of sequence variants with a view to providing correct clinical recommendations. Evaluating the pathogenicity of a variant is challenging given the plethora of types of genetic evidence that laboratories need to consider. Deciding how to weigh each type of evidence is difficult, and standards have been set. In 2015, the American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP) published guidelines for the assessment of variants in genes associated with Mendelian diseases (1).

In this presentation, the ACMG classification rules are presented and compared with other published classification systems. Furthermore, possible pitfalls leading to discrepancies in inter- and intra-laboratory classifications are discussed (2, 3).

(1) Richards et al.; Genet. Med. 17, 405–424, 2015

(2) Amendola et al.; Am J Hum Genet 98, 1067–1076, June 2, 2016

(3) Maxwell et al.; Am J Hum Genet 98, 801–817, May 5, 2016

Next generation sequencing in the diagnosis of inherited cardiac disorders

Sara Benedetti1*, Monica Zanussi1, Chiara Di Resta1,2, Alessandra Foglio1, Stefania Merella1,3, Giovanni Pipitone1, Pucci Paolo2, Maurizio Ferrari1,2,4, Paola Carrera1,4.

1 Clinical Molecular Biology Laboratory; 2 Vita-Salute San Raffaele University; 3 Centre for Translational Genomics and Bioinformatics; 4 Unit of Genomics for human disease diagnosis; IRCCS San Raffaele Scientific Institute, via Olgettina 60, 20132 Milano, Italy. * Presenting author

Purpose: Inherited cardiac disorders are characterized by a complex genotype and phenotype picture, with many genes overlapping the different forms of arrhythmogenic and structural diseases. The identification of genetic variants associated with pathological phenotypes is pivotal for correct diagnosis and patient clinical management. Today the use of next-generation sequencing (NGS) in clinical laboratories allows fast sequencing of wide target regions, bringing many advantages in genetic testing of these heterogeneous disorders. We evaluated a commercial NGS panel and set up a bioinformatic pipeline for diagnostic application.

Methodology: NGS analysis of genomic DNA was performed using the TruSight Cardio protocol (Illumina), allowing analysis of 174 genes involved in several cardiac disorders on the MiSeq platform. Moreover, we developed a protocol for the amplification and enrichment of the SCN10A gene, which has recently been associated with Brugada syndrome (BrS) and is not present in the panel.

Results: To validate this NGS approach we performed two runs including 18 patients with known genotype. NGS yielded a coverage >20X in 99.4% of targeted regions, with a sensitivity of 100% and a specificity ≥94%. We subsequently performed analysis of 48 patients with different cardiac phenotypes. The highest diagnostic yield was observed for hypertrophic cardiomyopathy and catecholaminergic ventricular tachycardia (100%), while for BrS we detected a variant in only 25% of patients, reflecting current limitations in clinical definition. In order to increase diagnostic power we decided to add the SCN10A gene (associated with up to 8% of BrS cases) to the NGS panel. Overall, 62% of the identified variants were classified as variants of unknown significance, requiring further investigation to assign a pathogenic role. In addition, 15% of patients carried multiple variants, suggesting a more complex inheritance.

Conclusion: These results underline the potential and the limits of NGS for heterogeneous conditions: NGS allows fast and efficient variant detection to improve patient clinical management and family counseling. However, it also highlights the need for precise clinical definition and better strategies for determining the pathogenic role of variants.

Full-length CYP2D6 diplotyping using PacBio RSII long reads for better drug dosage and response management

Henk P.J. Buermans*1, Rolf H.A.M. Vossen1, Seyed Yahya Anvar1, William G. Allard1, Henk-Jan Guchelaar2, Stefan J. White1, Johan T. den Dunnen1,3, Jesse J. Swen2 and Tahar van der Straaten2

Leiden University Medical Center; 1 Department of Human Genetics; Leiden Genome Technology Center, Einthovenweg 20, 2333ZC, Leiden, the Netherlands; 2 Department of Clinical Pharmacy & Toxicology, Albinusdreef 2, 2333ZA, Leiden, the Netherlands; 3 Department of Clinical Genetics, Einthovenweg 20, 2333ZC, Leiden, the Netherlands. * Corresponding author

The Cytochrome P450 2D6 enzyme, encoded by CYP2D6, is among the most important enzymes involved in drug metabolism. Specific variants in the gene are associated with changes in the enzyme's amount and enzymatic activity, which determines the rate at which drugs get metabolised. Different technologies exist to determine these sequence variants, such as the Roche AmpliChip CYP450 GeneChip, Taqman qPCR or Next Generation Sequencing. However, sequence homology between several cytochrome P450 genes and pseudogene CYP2D7 impairs reliable CYP2D6 genotyping. In addition, phasing, i.e. identifying the linkage of SNVs or haplotypes present in a subject, cannot be accurately determined with these assays.

To circumvent this, we sequenced CYP2D6 for 24 samples with 12 different, clinically relevant haplotypes using the Pacific Biosciences RSII and obtained high-quality, full-length, phased CYP2D6 sequences, enabling accurate variant calling and haplotyping of the entire gene locus, including exonic, intronic and up- and downstream regions. Unphased diplotypes, previously determined with the Roche GeneChip, were confirmed for 21 samples, including a duplication of one of the haplogroup sequences for three of the samples. However, a *5 gene deletion and a tandem duplication of the *2 haplogroup could not be detected, either because the primer recognition sites on these alleles were ablated or because the amplicon for these alleles was too long, resulting in the detection of a single haplogroup sequence for these samples. In total 61 unique variants were detected across the 24 samples, as well as a range of variants that had not previously been associated with the described haplotypes. To further aid genomic analysis using standard reference sequences we have established an LOVD-powered CYP2D6 gene-variant database (http://www.LOVD.nl/CYP2D6) and added all reference haplotypes and data reported here. We conclude that our CYP2D6 genotyping approach produces reliable CYP2D6 diplotypes and reveals information about additional variants, including phasing and copy-number variation.

Targeted RNASeq helps to improve prediction of the effect of identified variants/mutations

Bernd Dworniczak*, Diana Frank, Carolin Dreier, Melina Bockermann, Petra Pennekamp

Department of Human Genetics, University Hospital Muenster, Vesaliusweg 12, Muenster, Germany

Petra Pennekamp, Department of General Pediatrics, University Children's Hospital Muenster, Albert-Schweitzer Campus 1, Muenster, Germany

Next generation sequencing techniques have tremendously improved the chances of identifying sequence variants. However, pinpointing the disease-causing mutation still lags behind for several reasons: inappropriate gene-specific databases, insufficient prediction tools and others. In addition, the situation is complicated because the identified sequence variants are a mixture of severe, disease-causing mutations and a myriad of variants of lesser pathogenicity. Moreover, an unknown number of silent mutations, neutral polymorphisms and sequence variants deeply buried in introns might severely influence splicing of the pre-mRNA molecule. When only the DNA sequence is analysed, this impact on the integrity of the mRNA is completely ignored. In order to catalog the mRNA isoforms derived from our genes of interest, we started to set up RNAseq technologies in our routine lab. To reduce the amount of data, to improve the power of analyses and to identify rare isoforms of transcripts, we use targeted RNAseq to characterize the mRNA molecules derived from the gene panels we are analyzing (e.g. hereditary breast cancer core genes (10 genes), hereditary colon cancer (23 genes), primary ciliary dyskinesia (PCD) (40 genes)). Genes involved in PCD offer the invaluable advantage that the tissue where these genes are normally expressed can be recovered very easily from the nose, either from healthy probands or from patients suffering from PCD. In addition to direct preparation of RNA from these cilia, cilia-carrying cells or tissue can be cultured and manipulated to investigate ciliogenesis.

Data resulting from RNASeq experiments are analyzed by established bioinformatics tools (TopHat, Cufflinks and derivatives thereof). We will show results from our work in progress and we hope to convince people to intensify RNA analyses even in routine labs to uncover hidden mechanisms and/or mutations impacting mRNA splicing and thereby causing human disease.

PLENARY SESSION 8

NGS in diagnostics

Anna Beret-Pages MGZ - Medical Genetics Centre, Munich, Germany

The implementation of Next Generation Sequencing (NGS) technology in a clinical laboratory environment is complex and requires substantial infrastructure and expertise in clinical, scientific, and informatics specialties. The main point that should prevent laboratories from prematurely offering NGS diagnostics is insufficient quality. Since it is not possible to simply translate the rules of classical laboratory tests to NGS, quality criteria must be redefined for the use of this new technology in a clinical setting. During this session, the main topics of quality management for NGS in a clinical setting will be addressed: validation of platforms, tests and informatics pipelines. Quality control procedures to maintain accurate performance must be tightly focused around particular NGS applications, such as germline- or somatic-targeted DNA enrichment as well as cell-free DNA. In a similar manner, the use of reference materials to validate the analytic and informatics processes required for accurate variant calling should be evaluated depending upon the purpose of the bioinformatics pipeline. Correct detection of SNVs, CNVs or mosaicism depends not only on the pipeline parameters but also on the technical platform used. Moreover, limitations and critical issues have to be stated. Limitations such as pseudogenes and homologous sequences influence the sequence analysis, and a broad spectrum of other crucial factors, such as kit enrichment performance, third-party software, the structure of databases, and the lack of reference datasets, may also affect accuracy and quality. In addition, revalidation, gene nomenclature, and variant interpretation bias can compromise the analysis. During this session, examples from our experience, useful suggestions on how to deal with the main issues, and recommendations on how to establish performance specifications will be thoroughly discussed.

Demonstrations

Alamut (Interactive Biosoftware)

Andre Blavier & Alexandre Hatzoglou

In the Alamut practical session we will demonstrate and use the Alamut® Visual software package for variant interpretation.

Using Alamut Visual with practical examples we will review core concepts presented in lecture sessions, including:

- Variant features and their potential impact on transcription, RNA processing, translation, protein function, regulation
- Variant description using the HGVS nomenclature
- Gene variant databases
- In silico predictions: Splicing and protein effects
- NGS data visualisation: BAM alignments and VCF variant files
- Variant sharing

Participants will have the opportunity to install Alamut® Visual on their laptop computer and to use it during the Training Course week.

The Alamut practical session will be run by André Blavier, principal architect of the Alamut software, and Alexandros Hatzoglou, head of software development at Interactive Biosoftware.

Downloads required BEFORE you arrive at the meeting. See website: vep.variome.org

VarAFT & UMD Predictor

Christophe Béroud1,2 and Jean-Pierre Desvignes1

1 Aix-Marseille University, INSERM, GMGF, Marseille, France 2 APHM, Hôpital Timone Enfants, Laboratoire de Génétique Moléculaire, Marseille, France

This practical course will allow users to discover, through use cases, two new bioinformatics systems dedicated to the management of NGS data. The first system is the Variant Annotation and Filtration Tool (VarAFT), a freely available standalone multiplatform application for research with an easy-to-use graphical interface. It provides an overview of experiment quality, annotates variants, and allows the combination and filtration of data stored in VCF, gVCF, ANNOVAR or CLCBIO files. Data from multiple samples or individuals may be combined to address different Mendelian modes of inheritance (autosomal recessive, autosomal dominant, X-linked dominant or recessive, and Y-linked, also known as holandric), population genetics or cancers. The advanced filtration features allow searches at various levels of granularity (mutation, gene, tissue expression), and the incorporation of data from unique systems such as the Human Splicing Finder (HSF - impact on splicing signals) and UMD-Predictor (see below) makes it one of the most efficient NGS data analysis systems. During the course, we will also review the applications of the second system, UMD-Predictor, which is today the most efficient pathogenicity prediction tool for missense and synonymous mutations (evaluated on >140,000 annotated variations), the fastest (3 to 20 times faster) and the most specific, resulting in a shorter list of candidate pathogenic mutations (25% to 50% of that from other tools on average) and thus reducing downstream validation analysis.

Downloads required BEFORE you arrive at the meeting. See website: vep.variome.org

Ensembl genome browser

Helen Sparrow

In this workshop we will learn about genes, transcripts and variants in Ensembl. We will explore ways of viewing variants in the Ensembl genome browser: viewing them on the genome (location tab), finding all variants associated with a gene (gene tab), and searching for details of a specific variant (variant tab).

The RD-Connect platform

Steve Laurie

The RD-Connect Platform is an integrated platform connecting -omics data, clinical information, patient registries and biobanks, with the goal of facilitating and accelerating research into rare diseases.

In this hands-on practical session participants will be able to perform candidate variant filtration and prioritisation on real rare-disease case variant datasets using the RD-Connect Genomics platform. We will also show how phenotypic information is seamlessly integrated, and how internal and external matchmaking can be performed, all of which is freely available to institutions that are willing to contribute and share genomic data pertaining to rare disease cases.

Sophia Genetics

Leveraging the collective knowledge of the largest clinical genomics community to democratize Data-Driven Medicine

Gaetano Bonifacio & Nicole Grieder

Today Sophia Genetics is the global leader in Data-Driven Medicine (DDM). The Sophia DDM® platform facilitates and accelerates patients’ diagnosis. Powered by SOPHiA, the collective artificial intelligence, our core technologies PEPPER™, MUSKAT™ and MOKA™ process and analyse raw genomic data to spot pathogenic variants responsible for diseases. Our intuitive user interface allows you to directly interpret results and generate detailed variant reports. From DNA extraction to data analysis, we understand your requirements and help you validate your NGS tests in the lab. We are both ISO 13485 (Medical Devices Quality Management) and ISO 27001 (Information Security Management) certified. Since inception, Privacy and Security have always been part of our corporate DNA. Over 180 healthcare institutions in more than 30 countries trust Sophia Genetics, performing thousands of genome analyses every month… Come and benefit from the world’s largest clinical genomics community!

UCSC genome browser

Robert Kuhn

UCSC Hands-on Demonstration: Variant Annotation Integrator (twice)

The Variant Annotation Integrator (VAI) is a web-based (though soon to be also command-line) interpreter of single-nucleotide variants and short insertions and deletions (indels). Variants are loaded into the Genome Browser as custom tracks in one of two formats: pgSNP and VCF. The former may be loaded by direct upload, while a VCF file must be saved to a server exposed to the internet and loaded via URL. The VAI uses user-configurable resources such as gene sets, conservation scores and SIFT and PolyPhen predictions to assess the biochemical consequences of variants. Output is in Ensembl's VEP format and classifies variants using Sequence Ontology (SO) nomenclature as missense, splice_junction, etc. VCF files may also be loaded directly from the desktop into the Genome Browser if using the Genome Browser-in-a-Box, circumventing the requirement for HTTP-accessible space and allowing full data privacy.
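For readers unfamiliar with the VCF input mentioned above, the following minimal Python sketch reads the first five fixed columns of a VCF data line (CHROM, POS, ID, REF, ALT) and labels each variant as a single-nucleotide variant or a short indel. It is not part of the VAI; the function names and the example records are made up for illustration, and header lines, multi-allelic handling beyond a simple split, and the remaining columns are ignored.

```python
def parse_vcf_line(line):
    """Split one tab-separated VCF data line into its first five fixed fields."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, var_id, ref, alt = fields[:5]
    return {"chrom": chrom, "pos": int(pos), "id": var_id,
            "ref": ref, "alts": alt.split(",")}

def classify(ref, alt):
    """Coarse variant type based only on REF and ALT allele lengths."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) < len(alt):
        return "insertion"
    if len(ref) > len(alt):
        return "deletion"
    return "substitution"  # equal-length, multi-base change

# Made-up example records (not real variants).
lines = [
    "chr1\t12345\t.\tA\tG\t50\tPASS\t.",
    "chr2\t67890\t.\tAT\tA\t50\tPASS\t.",
    "chr3\t13579\t.\tC\tCTT\t50\tPASS\t.",
]

for line in lines:
    record = parse_vcf_line(line)
    for alt in record["alts"]:
        print(record["chrom"], record["pos"], classify(record["ref"], alt))
```

This length-based labelling is deliberately crude; consequence terms such as missense require gene annotation, which is what tools like the VAI add.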

HPO (Phenomizer) and WES/WGS analysis using Exomiser

Sebastian Köhler

Clinical interpretation of a patient's sequence variants is often a bottleneck. For coding variants, we have previously shown the advantage of simultaneously considering the variant pathogenicity score and the phenotypic relevance of the affected gene to the patient's clinical phenotypes. This combination boosts the ability of computational tools to identify the disease-causing genomic variation.
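The idea of ranking candidates by both scores together can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gene names and scores are invented, and the unweighted mean used here is a placeholder, not the scoring method of Exomiser or PhenIX.

```python
# Invented example data: gene -> (variant pathogenicity score, phenotype relevance score),
# both assumed to lie between 0 and 1.
candidates = {
    "GENE_A": (0.95, 0.20),
    "GENE_B": (0.80, 0.90),
    "GENE_C": (0.40, 0.95),
}

def combined_score(variant_score, phenotype_score):
    """Placeholder combination: unweighted mean of the two scores."""
    return (variant_score + phenotype_score) / 2

# Rank genes from highest to lowest combined score.
ranked = sorted(
    ((gene, combined_score(v, p)) for gene, (v, p) in candidates.items()),
    key=lambda item: item[1],
    reverse=True,
)

for gene, score in ranked:
    print(f"{gene}\t{score:.3f}")
```

In this toy example GENE_A has the highest variant score but drops to last place once its poor phenotype match is taken into account, while GENE_B ranks first.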

The Human Phenotype Ontology (HPO) enables computational and mathematical approaches to estimate how well the patient's phenotypes align with a particular disease, gene, or variant. We are going to see how the Phenomizer works - a tool to help in differential diagnostics. Afterwards, we are going to have a look at PhenIX and Exomiser, both of which are web-based tools that provide several algorithms for the prioritization of genomic variants in the context of phenotype information.

We will analyze an example VCF file and, if you are interested, have a look at a VCF file that you provide.