EBI is an Outstation of the European Molecular Biology Laboratory. Ensembl online training series 2016 Helen Sparrow Ensembl Outreach team EMBL-EBI


EBI is an Outstation of the European Molecular Biology Laboratory.

Ensembl online training series

2016

Helen Sparrow

Ensembl Outreach team

EMBL-EBI

Course Objectives

● What is Ensembl?

● What types of data you can get in Ensembl

● How to navigate the Ensembl browser website

● Where to go for help and documentation

This webinar course

Date: Webinar topic (Instructor)

24th March: Introduction to Ensembl (Emily Perry)
31st March: Ensembl genes (Denise Carvalho-Silva)
7th April: Data export with BioMart (Helen Sparrow)
14th April: Variation data in Ensembl and the Ensembl VEP (Denise Carvalho-Silva)
21st April: Comparing genes and genomes with Ensembl Compara (Helen Sparrow)
28th April: Finding features that regulate genes – the Ensembl Regulatory Build (Emily Perry)
5th May: Uploading your data to Ensembl and advanced ways to access Ensembl data (Ben Moore)

Questions?

• Use the Chat box in the webinar interface

• My Ensembl colleagues will respond

• There’s no threading so please start responses with @username

Emily Perry Denise Carvalho-Silva

Ben Moore

Structure

Presentation: How we produce/process the data

Demo: Viewing the data

Exercises: On the train online course


Module 5: Comparing genes and genomes with Ensembl Compara

Outline

• Comparative genomics: applications

• Protein alignments
    • Gene trees
    • Homology predictions

• Whole genome alignments
    • pairwise
    • multiple

• Shared synteny

Applications of comparative genomics

Comparative genomics allows us to understand:

• vertebrate evolution

• differences between species at the genome level

• gene function based on homology

• the distribution of highly conserved regions

Comparative Genomics in Ensembl

Gene Level

● Protein alignment

● Protein/Gene Trees

● Homologues: Orthologues and Paralogues

● Pan-compara

● Gene families

Whole Genome

● Whole genome Alignments

● Syntenic Regions

Gene Trees & Homologues

• Based on protein alignments

• Representative protein of each Ensembl gene

• Blast+

• Multiple protein alignment with M-coffee

• Build phylogenetic tree with TreeBeST

• Reconciliation with species tree (to infer ancestral nodes)

• Orthologue/Paralogue inference

http://www.ensembl.org/info/docs/compara/homology_method.html

all-vs-all blastp + hcluster_sg

Orthologues and Paralogues

Homology relationships

(Tree diagram: genes c1, h1, c2, h2 and m, with internal nodes marked as speciation or duplication events)

Paralogues: genes that emerged through a duplication event

• c1 and c2

• h1 and h2

Orthologues: genes that emerged through a speciation event

• c1 and h1

• h2 and m

• c2 and m

One-to-one, one-to-many
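The orthologue/paralogue rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the Ensembl Compara implementation: the `Node` class, the function names and the exact tree topology are assumptions made for the example. The tree is chosen so that it agrees with the pairs listed on the slide. Two genes are called orthologues when their last common ancestor is a speciation node, and paralogues when it is a duplication node.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import List, Optional


@dataclass
class Node:
    """A node in a reconciled gene tree (hypothetical minimal structure)."""
    name: str
    event: Optional[str] = None  # "speciation" or "duplication" on internal nodes
    children: List["Node"] = field(default_factory=list)


def leaf_paths(node, path=()):
    """Yield (leaf name, tuple of internal nodes from the root down to the leaf)."""
    if not node.children:
        yield node.name, path
        return
    for child in node.children:
        yield from leaf_paths(child, path + (node,))


def classify_pairs(root):
    """Label every pair of genes by the event at their last common ancestor."""
    paths = dict(leaf_paths(root))
    result = {}
    for a, b in combinations(sorted(paths), 2):
        lca = None
        # Walk both root-to-leaf paths in step; the last shared node is the LCA.
        for x, y in zip(paths[a], paths[b]):
            if x is not y:
                break
            lca = x
        result[(a, b)] = "orthologues" if lca.event == "speciation" else "paralogues"
    return result


# Assumed topology: m splits off first, then a duplication creates copies 1 and 2,
# and a later speciation separates the c and h lineages within each copy.
tree = Node("root", "speciation", [
    Node("m"),
    Node("dup", "duplication", [
        Node("copy1", "speciation", [Node("c1"), Node("h1")]),
        Node("copy2", "speciation", [Node("c2"), Node("h2")]),
    ]),
])

pairs = classify_pairs(tree)
for (a, b), relation in pairs.items():
    print(f"{a} and {b}: {relation}")
```

Under this assumed topology the sketch reproduces the slide's examples: c1/c2 and h1/h2 come out as paralogues, while c1/h1, h2/m and c2/m come out as orthologues.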

Comparative analysis by taxa

Ensembl Compara

Ensembl Metazoa Compara

Pan-taxonomic compara

● Gene trees and homologous genes across a wider taxonomic range of species

● An extended analysis including several vertebrates, protists, plants, bacteria, fungi, and invertebrate metazoa

10 Ensembl Vertebrates

9 Ensembl Plants

7 Ensembl Fungi

18 Ensembl Metazoa

14 Ensembl Protists

137 Ensembl Bacteria

http://ensemblgenomes.org/info/genomes?pan_compara=1


Homologues in BioMart

Dataset: Genes

Filters: Has homologues in species

Attributes: Homologue ID, type, ancestor

Results: table

Gene families

In the gene tab:

● We analyse gene families using every Ensembl isoform

● We import additional UniProt metazoa sequences

● Families are defined by an HMM library based on the PANTHER database

Hands on

• We’re going to look at the human BRCA2 gene to find homologues

• Search the ensembl.org homepage for BRCA2 and go to the gene tab
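The same homologue lookup can also be done programmatically. The sketch below assumes the public Ensembl REST server at rest.ensembl.org and its `homology/symbol/:species/:symbol` endpoint. The helper names are made up for this example, and the response layout used in `fetch_homologues` is an assumption that should be checked against the REST documentation. The fetch function is defined but not called, so the block runs without network access.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

REST_SERVER = "https://rest.ensembl.org"  # public Ensembl REST server (assumed)


def homology_url(symbol, species="human", homology_type="orthologues"):
    """Build a homology lookup URL for a gene symbol (hypothetical helper)."""
    path = f"/homology/symbol/{quote(species)}/{quote(symbol)}"
    query = urlencode({"type": homology_type, "format": "condensed"})
    return f"{REST_SERVER}{path}?{query}"


def fetch_homologues(symbol, species="human", homology_type="orthologues"):
    """Fetch homologues as a list of dicts. Needs network access."""
    request = Request(
        homology_url(symbol, species, homology_type),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(request, timeout=30) as response:
        payload = json.load(response)
    # Assumed layout: {"data": [{"id": ..., "homologies": [...]}]}
    return payload["data"][0]["homologies"]


url = homology_url("BRCA2")
print(url)
```

Calling `fetch_homologues("BRCA2")` would then return one record per predicted orthologue, provided the server and response format match these assumptions.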

Whole genome alignments

• To identify highly conserved regions
    • sequences that evolve slowly
    • regions likely to be functional
    • both coding and non-coding sequences

• To spot problematic gene predictions

• To define syntenic regions

• Types: pairwise and multiple (specified groups)

Whole Genome Alignments

Pairwise alignments

• LASTZ-net

Multi-species Groups

• Pre-selected sets

• EPO (Enredo-Pecan-Ortheus) analysis
    • (11 fish, 7 sauropsids, 39 eutherian, 8 primates)

• Mercator-Pecan analysis
    • For 23 amniota vertebrates (mammals+birds)

http://www.ensembl.org/info/genome/compara/analyses.html#pecan

Constrained Elements

• GERP scores: for every nucleotide in a multi-species alignment we calculate how conserved it is

• Peaks show high sequence conservation

• Constrained elements - blocks of high sequence conservation
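The idea of turning per-nucleotide conservation scores into blocks can be illustrated with a toy example. This is not the GERP algorithm or the Ensembl pipeline: the scores, the threshold and the minimum length below are invented for illustration. The function simply reports contiguous runs of positions whose score stays at or above a cutoff.

```python
def constrained_elements(scores, threshold=2.0, min_length=3):
    """Return (start, end) intervals, 0-based and half-open, where every
    score is >= threshold and the run is at least min_length positions long."""
    elements = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i  # a new run of conserved positions begins here
        elif start is not None:
            if i - start >= min_length:
                elements.append((start, i))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(scores) - start >= min_length:
        elements.append((start, len(scores)))
    return elements


# Made-up per-position conservation scores for an 11-column alignment.
scores = [0.1, 2.5, 3.0, 2.8, 0.2, 2.9, 0.0, 3.1, 3.3, 3.0, 2.2]
elements = constrained_elements(scores)
print(elements)  # [(1, 4), (7, 11)]
```

The single high-scoring position at index 5 is a peak but not an element, because the run is shorter than the assumed minimum length.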

Shared synteny

http://www.ensembl.org/info/docs/compara/analyses.html

100 kb regions with high sequence conservation and conserved gene order

Hands on

• We will look at the human genomic region 2:176087000-176202000, which contains the HoxD cluster, to find alignments and conserved regions.

• The HoxD cluster is involved in limb development and is highly conserved between species.
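Region strings such as the one used in this exercise follow the pattern chromosome:start-end, with 1-based inclusive coordinates. A small sketch of how to parse one and compute its length is shown below. The function names are hypothetical and are not part of any Ensembl library.

```python
import re


def parse_region(region):
    """Split 'chromosome:start-end' into (chromosome, start, end)."""
    match = re.fullmatch(r"([^:]+):(\d+)-(\d+)", region)
    if match is None:
        raise ValueError(f"not a valid region string: {region!r}")
    chromosome = match.group(1)
    start, end = int(match.group(2)), int(match.group(3))
    if start > end:
        raise ValueError("start must not be greater than end")
    return chromosome, start, end


def region_length(region):
    """Length in base pairs; coordinates are 1-based and inclusive."""
    _, start, end = parse_region(region)
    return end - start + 1


chromosome, start, end = parse_region("2:176087000-176202000")
length = region_length("2:176087000-176202000")
print(chromosome, start, end, length)  # 2 176087000 176202000 115001
```

The exercise region is therefore about 115 kb long.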


Next webinar: Finding features that regulate genes – the Ensembl Regulatory Build

28th April, 4pm BST

The Ensembl Regulatory Build incorporates data from sources including ENCODE, Roadmap Epigenomics and Blueprint to predict the positions of features involved in regulating gene expression, such as promoters and enhancers. Learn about how the build works and how to find regulatory features on the genome. Note that these data are currently only available for human and mouse.

Course exercises

http://www.ebi.ac.uk/training/online/course/ensembl-browser-webinar-series-2016

This text will be replaced by a YouTube (link to YouKu too) video of the webinar and a pdf of the slides.

The “next page” will be the exercises.

A link to exercises and their solutions will appear in the page hierarchy.

Get help with the exercises

• Use the exercise solutions in the online course

• Join our Facebook group and discuss the exercises with everybody (see the online course for the link)

• Email us: [email protected]

Help and documentation

Course online: http://www.ebi.ac.uk/training/online/subjects/11

Tutorials: www.ensembl.org/info/website/tutorials

Flash animations: www.youtube.com/user/EnsemblHelpdesk, http://u.youku.com/Ensemblhelpdesk

Email us: [email protected]

Ensembl public mailing lists: [email protected], [email protected]

Follow us

www.facebook.com/Ensembl.org

@Ensembl

www.ensembl.info

Publications

Yates, A. et al. Ensembl 2016. Nucleic Acids Research. http://europepmc.org/articles/4702834

Xosé M. Fernández-Suárez and Michael K. Schuster. Using the Ensembl Genome Server to Browse Genomic Sequence Data. Current Protocols in Bioinformatics 1.15.1-1.15.48 (2010). www.ncbi.nlm.nih.gov/pubmed/20521244

Giulietta M Spudich and Xosé M Fernández-Suárez. Touring Ensembl: A practical guide to genome browsing. BMC Genomics 11:295 (2010). www.biomedcentral.com/1471-2164/11/295

http://www.ensembl.org/info/about/publications.html

Ensembl 2015

Acknowledgements

The Entire Ensembl Team

Funding

Co-funded by the European Union

Questions?

• You can continue to use the chat box

• I will read out loud any further questions and answer on the screen

Emily Perry Denise Carvalho-Silva

Ben Moore