Comparative Genomics
1 of 33
Comparative Genomics
2 of 33
Overview
• Orthologues and paralogues
• Protein families
• Genome-wide DNA alignments
• Syntenic blocks
3 of 33
Comparative Genomics
• Allows us to achieve a greater understanding of vertebrate evolution
• Tells us what is common and what is unique between different species at the genome level
• The function of human genes and other genomic regions may be revealed by studying their counterparts in other organisms
• Helps identify both coding and non-coding genes and regulatory elements
4 of 33
Species in Ensembl
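[Figure: vertebrate phylogeny drawn against a geological timescale (Cambrian, Ordovician, Silurian, Devonian, Carboniferous, Permian, Triassic, Jurassic, Cretaceous, Tertiary), with tick marks at 570, 505, 438, 408, 360, 286, 245, 208, 144 and 65 MYBP. Labelled groups: fishes, birds, reptiles, mammals, placentals, monotremes, marsupials, other birds, paleognaths, passerines, crocodiles, turtles, lizards, amphibians, teleosts, sharks, rays, Latimeria, bichir/Polypterus, lungfishes, agnathans, non-vertebrates.]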
5 of 33
Orthologue / Paralogue Prediction Algorithm
(1) Load the longest translation of each gene from all species used in Ensembl.
(2) Run WU-BLASTP + Smith-Waterman of every gene against every other (both self and non-self species) in a genome-wide manner.
(3) Build a graph of gene relations based on Best Reciprocal Hits (BRH) and Blast Score Ratio (BSR) values.
(4) Extract the connected components (=single linkage clusters), each cluster representing a gene family.
(5) For each cluster, build a multiple alignment based on the protein sequences using MUSCLE.
(6) For each aligned cluster, build a phylogenetic tree using PhyML. An unrooted tree is obtained at this stage.
(7) Reconcile each gene tree with the species tree, using RAP, to call duplication events on internal nodes and to root the tree.
(8) From each gene tree, infer gene pairwise relations of orthology and paralogy types.
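Steps (3) and (4) can be illustrated with a small sketch. It is not the Ensembl pipeline code: the input format (a dictionary of BLAST-like scores keyed by gene pair), the BSR definition (score divided by the larger of the two self-scores) and the 0.33 cut-off are all assumptions made for illustration, as are the function names.

```python
MIN_BSR = 0.33  # assumed cut-off, not taken from the slides


def best_hits(scores, species):
    """Best-scoring hit of each gene within every other species."""
    best = {}
    for (query, target), score in scores.items():
        if species[query] == species[target]:
            continue  # reciprocal best hits are defined between species
        key = (query, species[target])
        if key not in best or score > best[key][1]:
            best[key] = (target, score)
    return best


def build_graph(scores, species, min_bsr=MIN_BSR):
    """Link two genes if they are best reciprocal hits or pass the BSR cut-off."""
    best = best_hits(scores, species)
    graph = {gene: set() for gene in species}  # genes without hits stay as singletons
    for (a, b), score in scores.items():
        if a == b:
            continue
        reciprocal = (
            best.get((a, species[b]), (None, 0))[0] == b
            and best.get((b, species[a]), (None, 0))[0] == a
        )
        # Assumed BSR definition: pair score over the larger self-score.
        self_score = max(scores.get((a, a), 0), scores.get((b, b), 0))
        bsr = score / self_score if self_score else 0.0
        if reciprocal or bsr >= min_bsr:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_components(graph):
    """Single-linkage clusters: every connected component is one gene family."""
    seen, clusters = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, cluster = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            cluster.add(node)
            stack.extend(graph[node] - seen)
        clusters.append(cluster)
    return clusters
```

Because the clusters are connected components, a single edge is enough to merge two groups of genes; this is what "single linkage" means in step (4).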
6 of 33
Homologue Relationships
• Orthologues: any gene pairwise relation where the ancestor node is a speciation event
• Paralogues: any gene pairwise relation where the ancestor node is a duplication event
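The two definitions translate directly into a rule on a reconciled gene tree: the relation between two genes is decided by the event at their most recent common ancestor node. The sketch below assumes a hypothetical `Node` class whose internal nodes are already labelled "speciation" or "duplication"; it is an illustration of the rule, not the code used by Ensembl.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Node:
    name: str
    event: str = "leaf"  # "speciation", "duplication" or "leaf"
    children: list = field(default_factory=list)


def leaves(node):
    """Names of all genes (leaves) below a node."""
    if not node.children:
        return [node.name]
    return [name for child in node.children for name in leaves(child)]


def pairwise_relations(node, relations=None):
    """Map each gene pair to 'orthologue' or 'paralogue'."""
    if relations is None:
        relations = {}
    # Two genes have this node as their most recent common ancestor
    # exactly when they sit in different child subtrees of it.
    subtrees = [leaves(child) for child in node.children]
    kind = "orthologue" if node.event == "speciation" else "paralogue"
    for left, right in combinations(subtrees, 2):
        for a in left:
            for b in right:
                relations[frozenset((a, b))] = kind
    for child in node.children:
        pairwise_relations(child, relations)
    return relations


# Hypothetical tree: a duplication followed by a human/mouse speciation in each copy.
tree = Node("root", "duplication", [
    Node("copy_A", "speciation", [Node("human_A"), Node("mouse_A")]),
    Node("copy_B", "speciation", [Node("human_B"), Node("mouse_B")]),
])
relations = pairwise_relations(tree)
```

In this example human_A and mouse_A are orthologues, while human_A and mouse_B are paralogues even though they come from different species, because their common ancestor node is the duplication.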
7 of 33
Orthologue and Paralogue Types
8 of 33
Orthologue and Paralogue Types
9 of 33
GeneView
10 of 33
GeneView
11 of 33
GeneTreeView
GeneTree
MUSCLE protein alignment
12 of 33
GeneTreeView
Duplication node (red)
Speciation node (blue)
13 of 33
Protein Dataset
More than 1,500,000 proteins clustered:
• All Ensembl protein predictions from all supported species: ~670,000 protein predictions
• All metazoan (animal) proteins in UniProt:
  • ~80,000 UniProt/Swiss-Prot
  • ~830,000 UniProt/TrEMBL
14 of 33
Clustering Strategy
• BLASTP all-versus-all comparison
• Markov clustering
• For each cluster:
  • Calculation of multiple sequence alignments with ClustalW
  • Assignment of a consensus description
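To make the Markov clustering step concrete, here is a toy pure-Python sketch of the basic idea: alternate "expansion" (squaring a column-stochastic matrix) and "inflation" (raising entries to a power and renormalising) until the matrix stabilises, then read clusters off the non-zero rows. The inflation value, iteration count, added self-loops and the function names are assumptions for illustration; the slides do not state the parameters used, and a real analysis would use a dedicated implementation rather than dense lists.

```python
def normalise_columns(matrix):
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(matrix)
    for j in range(n):
        total = sum(matrix[i][j] for i in range(n))
        if total:
            for i in range(n):
                matrix[i][j] /= total
    return matrix


def markov_cluster(similarity, inflation=2.0, iterations=50):
    """Toy Markov clustering on a small, symmetric similarity matrix."""
    n = len(similarity)
    m = [[float(similarity[i][j]) for j in range(n)] for i in range(n)]
    for i in range(n):
        m[i][i] = 1.0  # self-loops, a common convention (assumed here)
    normalise_columns(m)
    for _ in range(iterations):
        # Expansion: one step of random-walk mixing (matrix squared).
        m = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        # Inflation: strengthen strong flows, weaken weak ones.
        m = [[value ** inflation for value in row] for row in m]
        normalise_columns(m)
    # Rows that keep mass act as attractors; the columns they attract form a cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return clusters
```

In the protein-family setting the similarity matrix would be derived from the all-versus-all BLASTP scores, with one row and column per protein.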
15 of 33
Link to FamilyView
GeneView / TransView / ProtView
16 of 33
FamilyView
Ensembl family members within human
UniProt family members
Ensembl family members in other species
Consensus annotation
JalView multiple alignments
17 of 33
JalView
18 of 33
Whole Genome Alignments
• Functional sequences evolve more slowly than non-functional sequences; sequences that remain conserved may therefore perform a biological function.
• Comparing genomic sequences from species at different evolutionary distances allows us to identify:
  • Coding genes
  • Non-coding genes
  • Non-coding regulatory sequences
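The link between conservation and function can be illustrated with a very simple score: for each column of a multiple alignment, the fraction of sequences that share the most common base, followed by a sliding window that flags highly conserved stretches. This is a deliberately naive sketch and not the conservation scoring used in Ensembl; the window size, the threshold and the function names are assumptions.

```python
from collections import Counter


def column_conservation(alignment):
    """Per-column fraction of sequences sharing the most common non-gap base."""
    scores = []
    for column in zip(*alignment):
        bases = [base for base in column if base != "-"]
        top = Counter(bases).most_common(1)[0][1] if bases else 0
        scores.append(top / len(column))
    return scores


def conserved_regions(scores, window=5, threshold=0.9):
    """Half-open (start, end) ranges where the windowed mean score passes the threshold."""
    regions = []
    for start in range(len(scores) - window + 1):
        mean = sum(scores[start:start + window]) / window
        if mean >= threshold:
            end = start + window
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], end)  # merge with the previous region
            else:
                regions.append((start, end))
    return regions


# Hypothetical three-way alignment: the first five columns are identical.
alignment = [
    "ACGTACGTAC",
    "ACGTATCAGC",
    "ACGTAGATTC",
]
regions = conserved_regions(column_conservation(alignment))
```

Here only the first five columns are reported, because any window that reaches into the variable part of the alignment falls below the threshold.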
19 of 33
Selection of Species for DNA Comparisons

| Human vs. | Chimpanzee | Mouse | Opossum | Pufferfish |
|---|---|---|---|---|
| Time since divergence | ~5 MYA | ~65 MYA | ~150 MYA | ~450 MYA |
| Size (Gbp) | 3.0 | 2.5 | 4.2 | 0.4 |
| Sequence conservation (in coding regions) | >99% | ~80% | ~70-75% | ~65% |
| Aids identification of… | Recently changed sequences and genomic rearrangements | Both coding and non-coding sequences | Both coding and non-coding sequences | Primarily coding sequences |
20 of 33
Alignment Algorithm
• Should find all highly similar regions between two sequences
• Should allow for segments without similarity, rearrangements etc.
• Issues:
  • Heavy process
  • Scalability, as more and more genomes are sequenced
  • Time constraint
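The first two requirements describe local alignment: report the best-matching region and ignore dissimilar flanks. A minimal Smith-Waterman sketch shows the principle on short sequences. It is not how BLASTZ or any genome-scale aligner works internally, since the quadratic cost is exactly the "heavy process" issue listed above; the scoring values are arbitrary assumptions.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at zero lets an alignment start anywhere (local, not global).
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


# The shared core ACGTACGT is found despite unrelated flanking sequence.
result = smith_waterman("GGGGACGTACGTCCCC", "TTTTACGTACGTAAAA")
```

Genome-scale tools avoid filling the full matrix, typically by first locating short exact or near-exact seeds and extending only around them.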
21 of 33
BLASTZ-net, tBLAT and PECAN
• BLASTZ-net (comparison at the nucleotide level) is used for species that are evolutionarily close, e.g. human - mouse
• Translated BLAT (comparison at the amino acid level) is used for evolutionarily more distant species, e.g. human - zebrafish
• PECAN is used for multispecies alignments:
  • 7 eutherian mammals
  • 10 amniote vertebrates
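Why a translated comparison helps for distant species can be shown with a small example: synonymous changes lower nucleotide identity but leave the encoded protein untouched. The sketch below uses the standard genetic code and a single reading frame, whereas a translated search such as tBLAT considers all six frames; the sequences and function names are made up for illustration.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate frame 1 of a DNA string; unknown codons become 'X'."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / min(len(a), len(b))


# Two hypothetical coding fragments that differ only by synonymous changes.
seq_a = "ATGGCTCTGAAA"
seq_b = "ATGGCCTTAAAG"
dna_identity = identity(seq_a, seq_b)
protein_identity = identity(translate(seq_a), translate(seq_b))
```

Both fragments translate to the same four residues, so the protein-level identity is complete while a third of the nucleotide positions differ.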
22 of 33
BLASTZ-net, tBLAT and PECAN
The species combinations for which whole-genome alignments are available are listed on the Comparative Genomics page (Help & Documentation > Genomic Data > Comparative Genomics).
23 of 33
ContigView
Constrained elements
Conservation score
Blastz mouse
tBLAT zebrafish
PECAN alignments
24 of 33
MultiContigView
Conserved sequences
human
Conserved sequences
dog
25 of 33
AlignSliceView
Human
Rat
Dog
Mouse
26 of 33
MultiContigView vs. AlignSliceView
27 of 33
AlignView
28 of 33
GeneSeqalignView
29 of 33
GeneSeqalignView
30 of 33
Syntenic Blocks
• Genome alignments are refined into larger syntenic regions
• Alignments are clustered together when the relative distance between them is less than 100 kb and order and orientation are consistent
• Clusters shorter than 100 kb are discarded
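The chaining rule can be sketched as follows. The `Alignment` record, the choice to measure gaps on both genomes, the use of the reference span as the cluster length, and the requirement that consecutive alignments do not overlap are assumptions made to keep the example short; only the two 100 kb values come from the slide.

```python
from dataclasses import dataclass

MAX_GAP = 100_000     # maximum distance between neighbouring alignments
MIN_LENGTH = 100_000  # clusters shorter than this are discarded


@dataclass
class Alignment:
    ref_start: int
    ref_end: int
    other_chr: str
    other_start: int
    other_end: int
    strand: int  # +1 or -1


def can_extend(prev, nxt):
    """True if nxt continues prev: same chromosome and orientation, close, in order."""
    if prev.other_chr != nxt.other_chr or prev.strand != nxt.strand:
        return False
    ref_gap = nxt.ref_start - prev.ref_end
    if prev.strand == 1:
        other_gap = nxt.other_start - prev.other_end
    else:  # on the reverse strand the other genome is traversed backwards
        other_gap = prev.other_start - nxt.other_end
    return 0 <= ref_gap < MAX_GAP and 0 <= other_gap < MAX_GAP


def syntenic_blocks(alignments):
    """Return (ref_start, ref_end, other_chr, strand) for each retained block."""
    blocks, current = [], []
    for aln in sorted(alignments, key=lambda a: a.ref_start):
        if current and not can_extend(current[-1], aln):
            blocks.append(current)
            current = []
        current.append(aln)
    if current:
        blocks.append(current)
    return [
        (block[0].ref_start, block[-1].ref_end, block[0].other_chr, block[0].strand)
        for block in blocks
        if block[-1].ref_end - block[0].ref_start >= MIN_LENGTH
    ]


# Hypothetical alignments along one reference chromosome.
example = [
    Alignment(0, 60_000, "5", 1_000_000, 1_060_000, 1),
    Alignment(90_000, 150_000, "5", 1_090_000, 1_150_000, 1),
    Alignment(400_000, 420_000, "5", 1_400_000, 1_420_000, 1),
    Alignment(450_000, 520_000, "11", 3_200_000, 3_270_000, -1),
    Alignment(560_000, 640_000, "11", 3_080_000, 3_160_000, -1),
]
blocks = syntenic_blocks(example)
```

In the example the first two alignments chain into one block, the isolated 20 kb alignment is dropped, and the two reverse-strand alignments form a second block on another chromosome.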
31 of 33
SyntenyView
Human chromosome
Mouse chromosomes
Mouse chromosomes
Orthologues
32 of 33
CytoView
Syntenic blocks
Orientation Chromosome
33 of 33
Q&A
Questions
Answers