![Page 1: How to access genomic information using Ensembl Damian Smedley and Xosé Fernández Ensembl Project European Bioinformatics Institute Cambridge, UK November](https://reader038.vdocuments.us/reader038/viewer/2022103005/56649e575503460f94b4f8eb/html5/thumbnails/1.jpg)
How to access genomic information
using Ensembl
Damian Smedley and Xosé Fernández
Ensembl Project
European Bioinformatics Institute, Cambridge, UK
November 2004
![Page 2: How to access genomic information using Ensembl Damian Smedley and Xosé Fernández Ensembl Project European Bioinformatics Institute Cambridge, UK November](https://reader038.vdocuments.us/reader038/viewer/2022103005/56649e575503460f94b4f8eb/html5/thumbnails/2.jpg)
Schedule
Today
Introduction to the Ensembl system
Hands-on examples to introduce the system
Evaluating genes and transcripts
Variation in Ensembl (SNPs, haplotypes)
Tomorrow
Data mining with EnsMart
Comparative genomics and proteomics in Ensembl
BioMart
Advanced topics (Upload your own data, DAS)
![Page 4: How to access genomic information using Ensembl Damian Smedley and Xosé Fernández Ensembl Project European Bioinformatics Institute Cambridge, UK November](https://reader038.vdocuments.us/reader038/viewer/2022103005/56649e575503460f94b4f8eb/html5/thumbnails/4.jpg)
Assembly
From 325,109 initial contigs
to 26,720 overlapping clones
Other ordering data
non-redundant, “virtual contig” view
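The step from overlapping clones to a non-redundant view can be pictured as merging overlapping intervals. The sketch below is only an illustration of that idea, not the assembly method used for the human genome; the function name `merge_overlapping` and the clone coordinates are hypothetical.

```python
from typing import List, Tuple

# (start, end) on a single chromosome, 1-based and inclusive.
Interval = Tuple[int, int]


def merge_overlapping(clones: List[Interval]) -> List[Interval]:
    """Collapse overlapping clone intervals into non-redundant spans."""
    merged: List[Interval] = []
    for start, end in sorted(clones):
        if merged and start <= merged[-1][1]:
            # This clone overlaps the current span: extend the span
            # instead of counting the shared sequence twice.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            # No overlap with the previous span: start a new one.
            merged.append((start, end))
    return merged


# Hypothetical clone coordinates, invented for illustration.
clones = [(1, 150), (100, 260), (400, 520), (500, 610)]
virtual_contigs = merge_overlapping(clones)
print(virtual_contigs)
```

Sorting first means each clone only has to be compared with the most recently built span, so the merge is a single pass after the sort.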
![Page 5: How to access genomic information using Ensembl Damian Smedley and Xosé Fernández Ensembl Project European Bioinformatics Institute Cambridge, UK November](https://reader038.vdocuments.us/reader038/viewer/2022103005/56649e575503460f94b4f8eb/html5/thumbnails/5.jpg)
Mapping and Sequencing the human genome
[Figure: flow diagram with the labels "fragment", "map", "WGS", "draft", "finished BAC" and "sequence assembly"]
BACs (bacterial artificial chromosomes), avg size 150 kb
pUCs, avg size 2-4 kb
Cited in the figure: Shizuya et al 1992; Dib et al 1996; Deloukas et al 1998; Bentley et al 2001; Bruls et al 2001; McPherson et al 2001; Montgomery et al 2001; Tilford et al 2001; Osoegawa et al 2001
![Page 6: How to access genomic information using Ensembl Damian Smedley and Xosé Fernández Ensembl Project European Bioinformatics Institute Cambridge, UK November](https://reader038.vdocuments.us/reader038/viewer/2022103005/56649e575503460f94b4f8eb/html5/thumbnails/6.jpg)
Status of the human sequence
finished (red/orange): ~96% (99.999% accurate)
heterochromatin (grey): ~4%
30-40% repetitive elements (e.g. Alpha satellite, Alu repeats)
All known genes, correctly identified (99.74%)
Assembled draft sequence totals 2.85 Gb
![Page 7: How to access genomic information using Ensembl Damian Smedley and Xosé Fernández Ensembl Project European Bioinformatics Institute Cambridge, UK November](https://reader038.vdocuments.us/reader038/viewer/2022103005/56649e575503460f94b4f8eb/html5/thumbnails/7.jpg)
Human genome: Current status
• 22,287 ‘gene loci’ defined, consisting of 19,599 protein-coding genes in the human genome and 2,188 additional DNA segments ‘predicted’ to be protein-coding genes
– 1,183 genes ‘were born’ in the last 60-100 My
– ~30 genes ‘died’ in a similar time period
Finishing the euchromatic sequence of the human genome, Nature 431:931-45 (2004)
![Page 8: How to access genomic information using Ensembl Damian Smedley and Xosé Fernández Ensembl Project European Bioinformatics Institute Cambridge, UK November](https://reader038.vdocuments.us/reader038/viewer/2022103005/56649e575503460f94b4f8eb/html5/thumbnails/8.jpg)
Ensembl - project aims
• funded to provide metazoan genomes to the world
• aims to provide the world’s best automated genome annotation
• a leading group for human and mouse analysis
• all software, data and results freely available
![Page 9: How to access genomic information using Ensembl Damian Smedley and Xosé Fernández Ensembl Project European Bioinformatics Institute Cambridge, UK November](https://reader038.vdocuments.us/reader038/viewer/2022103005/56649e575503460f94b4f8eb/html5/thumbnails/9.jpg)
Ensembl - project background
• group split between EBI and Sanger
• mainly Wellcome Trust funded
• largest dedicated compute in biology in Europe
• developer community > 100 people, including companies
![Page 10: How to access genomic information using Ensembl Damian Smedley and Xosé Fernández Ensembl Project European Bioinformatics Institute Cambridge, UK November](https://reader038.vdocuments.us/reader038/viewer/2022103005/56649e575503460f94b4f8eb/html5/thumbnails/10.jpg)
Ensembl – Open source
Freely available
Community development.
– >51 Ensembl installs worldwide.
– Both public and commercial, e.g. Gramene (CSHL), Fugu-sg (IMCB), Ciona-sg (Temasek)
![Page 11: How to access genomic information using Ensembl Damian Smedley and Xosé Fernández Ensembl Project European Bioinformatics Institute Cambridge, UK November](https://reader038.vdocuments.us/reader038/viewer/2022103005/56649e575503460f94b4f8eb/html5/thumbnails/11.jpg)
Ensembl
[Figure: Ensembl system overview, with boxes labelled Analysis DB, CPU, Final DB, Supporting Databases, SNP and Manual Annotation]
![Page 12: How to access genomic information using Ensembl Damian Smedley and Xosé Fernández Ensembl Project European Bioinformatics Institute Cambridge, UK November](https://reader038.vdocuments.us/reader038/viewer/2022103005/56649e575503460f94b4f8eb/html5/thumbnails/12.jpg)
Genome browsing: why present the whole genome?
• Explore what is in a chromosome region
• See features in and around a specific gene
• Search & retrieve across the whole genome
• Investigate genome organization
• Compare to other genomes
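Exploring what is in a chromosome region amounts to an interval-overlap lookup. The following is a minimal Python sketch of that idea, assuming a hypothetical in-memory list of features; the `Feature` class, the `features_in_region` function and all names and coordinates are invented for illustration and are not Ensembl data or its API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Feature:
    name: str        # hypothetical feature label
    chromosome: str
    start: int       # 1-based, inclusive
    end: int         # 1-based, inclusive


def features_in_region(features: List[Feature], chromosome: str,
                       start: int, end: int) -> List[Feature]:
    """Return the features that overlap chromosome:start-end."""
    return [
        f for f in features
        # Two intervals overlap when each starts before the other ends.
        if f.chromosome == chromosome and f.start <= end and f.end >= start
    ]


# Hypothetical features, invented for illustration.
features = [
    Feature("gene_A", "20", 1_000, 5_000),
    Feature("marker_B", "20", 7_500, 7_600),
    Feature("gene_C", "X", 2_000, 3_000),
]

hits = features_in_region(features, "20", 4_000, 8_000)
print([f.name for f in hits])
```

A linear scan is enough for a toy list; a genome-scale system would typically rely on indexed database queries instead.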
![Page 13: How to access genomic information using Ensembl Damian Smedley and Xosé Fernández Ensembl Project European Bioinformatics Institute Cambridge, UK November](https://reader038.vdocuments.us/reader038/viewer/2022103005/56649e575503460f94b4f8eb/html5/thumbnails/13.jpg)
Genome browsers
• Ensembl – public site + installable system: http://www.ensembl.org
• NCBI Map Viewer: http://www.ncbi.nlm.nih.gov/mapview
• UCSC Human Genome Browser: http://genome.ucsc.edu
![Page 14: How to access genomic information using Ensembl Damian Smedley and Xosé Fernández Ensembl Project European Bioinformatics Institute Cambridge, UK November](https://reader038.vdocuments.us/reader038/viewer/2022103005/56649e575503460f94b4f8eb/html5/thumbnails/14.jpg)
Introduction to the Ensembl web site
Ensembl …
• takes genomic sequence assemblies: human build 34, mouse, rat, Fugu, mosquito
• adds annotation and links (automated process)
• presents all the data on a web site
![Page 15: How to access genomic information using Ensembl Damian Smedley and Xosé Fernández Ensembl Project European Bioinformatics Institute Cambridge, UK November](https://reader038.vdocuments.us/reader038/viewer/2022103005/56649e575503460f94b4f8eb/html5/thumbnails/15.jpg)
Annotation: genes
Known genes
• where?
• genomic structure?
• transcript(s)?
• protein(s)?
• orthologues?
• attach useful links
Novel genes
• how to predict? require evidence
• transcript(s)?
• protein(s)?
• orthologues?
• attach useful links
![Page 16: How to access genomic information using Ensembl Damian Smedley and Xosé Fernández Ensembl Project European Bioinformatics Institute Cambridge, UK November](https://reader038.vdocuments.us/reader038/viewer/2022103005/56649e575503460f94b4f8eb/html5/thumbnails/16.jpg)
Annotation: other features
• markers and SNPs
• cytogenetic bands
• repeated sequences
• ESTs & other sequence records: where do they show sequence similarity?
• regions homologous to other species
![Page 17: How to access genomic information using Ensembl Damian Smedley and Xosé Fernández Ensembl Project European Bioinformatics Institute Cambridge, UK November](https://reader038.vdocuments.us/reader038/viewer/2022103005/56649e575503460f94b4f8eb/html5/thumbnails/17.jpg)
How to get started …
• Species homepage
• Site map
• Map View
• Text search
• BLAST
• SSAHA
• Disease View
![Page 18: How to access genomic information using Ensembl Damian Smedley and Xosé Fernández Ensembl Project European Bioinformatics Institute Cambridge, UK November](https://reader038.vdocuments.us/reader038/viewer/2022103005/56649e575503460f94b4f8eb/html5/thumbnails/18.jpg)
Homepage
![Page 19: How to access genomic information using Ensembl Damian Smedley and Xosé Fernández Ensembl Project European Bioinformatics Institute Cambridge, UK November](https://reader038.vdocuments.us/reader038/viewer/2022103005/56649e575503460f94b4f8eb/html5/thumbnails/19.jpg)
Site map
![Page 20: How to access genomic information using Ensembl Damian Smedley and Xosé Fernández Ensembl Project European Bioinformatics Institute Cambridge, UK November](https://reader038.vdocuments.us/reader038/viewer/2022103005/56649e575503460f94b4f8eb/html5/thumbnails/20.jpg)
MapView
AnchorView
![Page 21: How to access genomic information using Ensembl Damian Smedley and Xosé Fernández Ensembl Project European Bioinformatics Institute Cambridge, UK November](https://reader038.vdocuments.us/reader038/viewer/2022103005/56649e575503460f94b4f8eb/html5/thumbnails/21.jpg)
BLAST and SSAHA
![Page 22: How to access genomic information using Ensembl Damian Smedley and Xosé Fernández Ensembl Project European Bioinformatics Institute Cambridge, UK November](https://reader038.vdocuments.us/reader038/viewer/2022103005/56649e575503460f94b4f8eb/html5/thumbnails/22.jpg)
BLAST and SSAHA
![Page 23: How to access genomic information using Ensembl Damian Smedley and Xosé Fernández Ensembl Project European Bioinformatics Institute Cambridge, UK November](https://reader038.vdocuments.us/reader038/viewer/2022103005/56649e575503460f94b4f8eb/html5/thumbnails/23.jpg)
Regions, maps and markers
MarkerView
SNPView
ContigView
CytoView
SyntenyView
MultiContigView
![Page 24: How to access genomic information using Ensembl Damian Smedley and Xosé Fernández Ensembl Project European Bioinformatics Institute Cambridge, UK November](https://reader038.vdocuments.us/reader038/viewer/2022103005/56649e575503460f94b4f8eb/html5/thumbnails/24.jpg)
Ensembl ContigView
![Page 25: How to access genomic information using Ensembl Damian Smedley and Xosé Fernández Ensembl Project European Bioinformatics Institute Cambridge, UK November](https://reader038.vdocuments.us/reader038/viewer/2022103005/56649e575503460f94b4f8eb/html5/thumbnails/25.jpg)
ContigView close-up
Evidence
Transcripts: red & black (Ensembl predictions), blue (Vega)
Customising & short cuts
Pop-up menu
![Page 26: How to access genomic information using Ensembl Damian Smedley and Xosé Fernández Ensembl Project European Bioinformatics Institute Cambridge, UK November](https://reader038.vdocuments.us/reader038/viewer/2022103005/56649e575503460f94b4f8eb/html5/thumbnails/26.jpg)
ContigView - Chromosome 20 close-up
Manual annotation via Vega
Ensembl predictions
Ensembl EST-based predictions
Forward strand
Reverse strand
Other chromosomes with manual annotation from http://vega.sanger.ac.uk: 6, 7, 9, 10, 13, 14, 20, 22, X
![Page 27: How to access genomic information using Ensembl Damian Smedley and Xosé Fernández Ensembl Project European Bioinformatics Institute Cambridge, UK November](https://reader038.vdocuments.us/reader038/viewer/2022103005/56649e575503460f94b4f8eb/html5/thumbnails/27.jpg)
CytoView
![Page 28: How to access genomic information using Ensembl Damian Smedley and Xosé Fernández Ensembl Project European Bioinformatics Institute Cambridge, UK November](https://reader038.vdocuments.us/reader038/viewer/2022103005/56649e575503460f94b4f8eb/html5/thumbnails/28.jpg)
GeneSNPView
![Page 29: How to access genomic information using Ensembl Damian Smedley and Xosé Fernández Ensembl Project European Bioinformatics Institute Cambridge, UK November](https://reader038.vdocuments.us/reader038/viewer/2022103005/56649e575503460f94b4f8eb/html5/thumbnails/29.jpg)
MarkerView
SNPView
![Page 30: How to access genomic information using Ensembl Damian Smedley and Xosé Fernández Ensembl Project European Bioinformatics Institute Cambridge, UK November](https://reader038.vdocuments.us/reader038/viewer/2022103005/56649e575503460f94b4f8eb/html5/thumbnails/30.jpg)
SyntenyView
![Page 31: How to access genomic information using Ensembl Damian Smedley and Xosé Fernández Ensembl Project European Bioinformatics Institute Cambridge, UK November](https://reader038.vdocuments.us/reader038/viewer/2022103005/56649e575503460f94b4f8eb/html5/thumbnails/31.jpg)
MultiContigView
![Page 32: How to access genomic information using Ensembl Damian Smedley and Xosé Fernández Ensembl Project European Bioinformatics Institute Cambridge, UK November](https://reader038.vdocuments.us/reader038/viewer/2022103005/56649e575503460f94b4f8eb/html5/thumbnails/32.jpg)
Genes & gene products
GeneView
TransView
ExonView
ProteinView
FamilyView
DomainView
GOView
DiseaseView
![Page 33: How to access genomic information using Ensembl Damian Smedley and Xosé Fernández Ensembl Project European Bioinformatics Institute Cambridge, UK November](https://reader038.vdocuments.us/reader038/viewer/2022103005/56649e575503460f94b4f8eb/html5/thumbnails/33.jpg)
Ensembl GeneView
![Page 34: How to access genomic information using Ensembl Damian Smedley and Xosé Fernández Ensembl Project European Bioinformatics Institute Cambridge, UK November](https://reader038.vdocuments.us/reader038/viewer/2022103005/56649e575503460f94b4f8eb/html5/thumbnails/34.jpg)
TransView ExonView
![Page 35: How to access genomic information using Ensembl Damian Smedley and Xosé Fernández Ensembl Project European Bioinformatics Institute Cambridge, UK November](https://reader038.vdocuments.us/reader038/viewer/2022103005/56649e575503460f94b4f8eb/html5/thumbnails/35.jpg)
ProteinView
![Page 36: How to access genomic information using Ensembl Damian Smedley and Xosé Fernández Ensembl Project European Bioinformatics Institute Cambridge, UK November](https://reader038.vdocuments.us/reader038/viewer/2022103005/56649e575503460f94b4f8eb/html5/thumbnails/36.jpg)
FamilyView
![Page 37: How to access genomic information using Ensembl Damian Smedley and Xosé Fernández Ensembl Project European Bioinformatics Institute Cambridge, UK November](https://reader038.vdocuments.us/reader038/viewer/2022103005/56649e575503460f94b4f8eb/html5/thumbnails/37.jpg)
GOView
![Page 38: How to access genomic information using Ensembl Damian Smedley and Xosé Fernández Ensembl Project European Bioinformatics Institute Cambridge, UK November](https://reader038.vdocuments.us/reader038/viewer/2022103005/56649e575503460f94b4f8eb/html5/thumbnails/38.jpg)
DiseaseView
![Page 39: How to access genomic information using Ensembl Damian Smedley and Xosé Fernández Ensembl Project European Bioinformatics Institute Cambridge, UK November](https://reader038.vdocuments.us/reader038/viewer/2022103005/56649e575503460f94b4f8eb/html5/thumbnails/39.jpg)
Data retrieval
EnsMart
Data sets on ftp site
MySQL queries of databases
Perl API access to databases
ExportView
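Direct SQL queries are one of the retrieval routes listed above. As a rough sketch of what a region query can look like, the snippet below uses Python's built-in `sqlite3` module with an in-memory table. The table and column names (`gene`, `stable_id`, `chromosome`, `gene_start`, `gene_end`) and the rows are hypothetical; they do not reproduce the Ensembl MySQL schema or its data.

```python
import sqlite3

# In-memory toy database standing in for a real genome database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gene ("
    "stable_id TEXT, chromosome TEXT, gene_start INTEGER, gene_end INTEGER)"
)

# Hypothetical rows, invented for illustration.
conn.executemany(
    "INSERT INTO gene VALUES (?, ?, ?, ?)",
    [
        ("GENE_A", "20", 1000, 5000),
        ("GENE_B", "20", 8000, 12000),
        ("GENE_C", "X", 2000, 3000),
    ],
)


def genes_in_region(connection, chromosome, start, end):
    """Return identifiers of genes overlapping chromosome:start-end."""
    rows = connection.execute(
        "SELECT stable_id FROM gene "
        "WHERE chromosome = ? AND gene_start <= ? AND gene_end >= ? "
        "ORDER BY gene_start",
        (chromosome, end, start),  # placeholders avoid string-built SQL
    ).fetchall()
    return [row[0] for row in rows]


print(genes_in_region(conn, "20", 4000, 9000))
```

The same overlap condition carries over to other SQL engines, although the real table layout would need to be checked against the schema documentation.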
![Page 40: How to access genomic information using Ensembl Damian Smedley and Xosé Fernández Ensembl Project European Bioinformatics Institute Cambridge, UK November](https://reader038.vdocuments.us/reader038/viewer/2022103005/56649e575503460f94b4f8eb/html5/thumbnails/40.jpg)
ExportView
![Page 41: How to access genomic information using Ensembl Damian Smedley and Xosé Fernández Ensembl Project European Bioinformatics Institute Cambridge, UK November](https://reader038.vdocuments.us/reader038/viewer/2022103005/56649e575503460f94b4f8eb/html5/thumbnails/41.jpg)
EnsMart
![Page 42: How to access genomic information using Ensembl Damian Smedley and Xosé Fernández Ensembl Project European Bioinformatics Institute Cambridge, UK November](https://reader038.vdocuments.us/reader038/viewer/2022103005/56649e575503460f94b4f8eb/html5/thumbnails/42.jpg)
Mouse differences
• Genomic sequence assembly based on whole genome shotgun, with finished ‘stitched’ BACs
• BACs are shown in CytoView (FPC map), but for most no sequence is available
![Page 43: How to access genomic information using Ensembl Damian Smedley and Xosé Fernández Ensembl Project European Bioinformatics Institute Cambridge, UK November](https://reader038.vdocuments.us/reader038/viewer/2022103005/56649e575503460f94b4f8eb/html5/thumbnails/43.jpg)
Mouse CytoView
![Page 44: How to access genomic information using Ensembl Damian Smedley and Xosé Fernández Ensembl Project European Bioinformatics Institute Cambridge, UK November](https://reader038.vdocuments.us/reader038/viewer/2022103005/56649e575503460f94b4f8eb/html5/thumbnails/44.jpg)
Help!
• context sensitive help pages - click
• access other documentation via generic home page
• email the helpdesk: HelpDesk / Suggestions
![Page 46: How to access genomic information using Ensembl Damian Smedley and Xosé Fernández Ensembl Project European Bioinformatics Institute Cambridge, UK November](https://reader038.vdocuments.us/reader038/viewer/2022103005/56649e575503460f94b4f8eb/html5/thumbnails/46.jpg)
Database Schema and Core API
Arne Stabenau
Yuan Chen
Ian Longden
Craig Melsopp
Glenn Proctor
Daniel Ríos
Guy Slater
Distributed Annotation System
Andreas Kähäri
Project Leader
Ewan Birney (EBI)
Tim Hubbard (Sanger)
Ensembl Web Team
James Stalker
Fiona Cunningham
James Smith
Vega Web Team
Patrick Meidl
Steve Trevanion
Analysis and
Annotation Pipeline
Val Curwen
Steve Searle
Dan Andrews
Mario Caccamo
Laura Clarke
Martin Hammond
Jan-Hinnerk Vogel
Kevin Howe
Vivek Iyer
Kerstin Jekosch
Felix Kokocinski
Simon White
User Support
Xosé Mª Fernández
Michael Schuster
Comparative Genomics
Abel Ureta-Vidal
Javier Herrero Sánchez
Jessica Severin
Cara Woodwark
EnsMart & BioMart
Arek Kasprzyk
Damian Keefe
Darin London
Damian Smedley
Ensembl Team
November 2004