Comparative Genomics

Upload: keefe

Post on 05-Feb-2016

PowerPoint presentation (33 slides)

Page 1: Comparative Genomics

Page 2: Overview

• Orthologues and paralogues

• Protein families

• Genome-wide DNA alignments

• Syntenic blocks

Page 3: Comparative Genomics

• Gives a better understanding of vertebrate evolution

• Shows what is shared and what is unique between species at the genome level

• The function of human genes and other regions may be revealed by studying their counterparts in other species, such as model organisms

• Helps identify coding and non-coding genes as well as regulatory elements

Page 4: Species in Ensembl

[Figure: vertebrate lineages plotted against geological time, from the Cambrian through the Ordovician, Silurian, Devonian, Carboniferous, Permian, Triassic, Jurassic and Cretaceous to the Tertiary, with the time axis in millions of years before present (MY BP). Lineages labelled: fishes, birds/reptiles, mammals (placentals, monotremes, marsupials), other birds, paleognaths, passerines, crocodiles, turtles, lizards, amphibians, teleosts, sharks, rays, Latimeria, bichir/Polypterus, lungfishes, agnathans, non-vertebrates.]

Page 5: Orthologue / Paralogue Prediction Algorithm

(1) Load the longest translation of each gene from all species used in Ensembl.

(2) Run WU-BLASTP followed by Smith-Waterman alignment of every gene against every other gene (within the same species and against other species), genome by genome.

(3) Build a graph of gene relations based on Best Reciprocal Hits (BRH) and BLAST Score Ratio (BSR) values.

(4) Extract the connected components (single-linkage clusters); each cluster represents a gene family.

(5) For each cluster, build a multiple alignment of the protein sequences using MUSCLE.

(6) For each aligned cluster, build a phylogenetic tree using PHYML. The tree is still unrooted at this stage.

(7) Reconcile each gene tree with the species tree using RAP, which calls duplication events on internal nodes and roots the tree.

(8) From each gene tree, infer pairwise orthology and paralogy relations between genes.
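Steps (3) and (4) can be sketched in Python as follows. This is a toy illustration, not Ensembl Compara code. The gene names and scores are made up, and both the BSR formula (hit score divided by the larger of the two self-scores) and the 0.33 cut-off are assumptions.

```python
from collections import defaultdict


def bsr(scores, a, b):
    """BLAST Score Ratio, assumed here to be score(a, b) / max(self(a), self(b))."""
    return scores[(a, b)] / max(scores[(a, a)], scores[(b, b)])


def best_hits(scores, species):
    """Map (gene, other species) -> best-scoring hit of that gene in that species."""
    best = {}
    for (a, b), s in scores.items():
        if species[a] == species[b]:
            continue  # self hits and within-species hits are not candidate BRHs
        key = (a, species[b])
        if key not in best or s > scores[(a, best[key])]:
            best[key] = b
    return best


def build_graph(scores, species, bsr_threshold=0.33):
    """Link genes that are best reciprocal hits or whose BSR reaches the threshold."""
    graph = defaultdict(set)
    best = best_hits(scores, species)
    for (a, _), b in best.items():
        if best.get((b, species[a])) == a:  # the best hit is reciprocal
            graph[a].add(b)
            graph[b].add(a)
    for (a, b) in scores:
        if a != b and bsr(scores, a, b) >= bsr_threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_components(graph, genes):
    """Single-linkage clusters: each connected component is one gene family."""
    seen, clusters = set(), []
    for start in sorted(genes):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        clusters.append(sorted(component))
    return clusters


# Hypothetical genes: two human/mouse pairs, plus a zebrafish gene with no hits.
species = {"hA": "human", "hB": "human", "mA": "mouse", "mB": "mouse", "zX": "zebrafish"}
raw = {("hA", "mA"): 90, ("hB", "mB"): 85, ("hA", "mB"): 20, ("hA", "hB"): 40}
scores = {(g, g): 100 for g in species}  # self-scores used to normalise the BSR
for (a, b), s in raw.items():
    scores[(a, b)] = scores[(b, a)] = s
print(connected_components(build_graph(scores, species), species))
# [['hA', 'hB', 'mA', 'mB'], ['zX']]
```

In this example the within-human pair hA/hB passes the BSR cut-off, so single linkage merges both human/mouse pairs into one family. This is the expected behaviour of connected components: one link is enough to join two groups.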

Page 6: Homologue Relationships

• Orthologues: any pairwise gene relation where the last common ancestor node is a speciation event

• Paralogues: any pairwise gene relation where the last common ancestor node is a duplication event
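These definitions translate directly into step (8) of the algorithm: for any two genes, look up the event on their last common ancestor in the reconciled tree. The sketch below assumes each internal node has already been labelled "speciation" or "duplication" (the job of the reconciliation step). The Node class and the gene names are hypothetical.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Node:
    """A node of a reconciled gene tree (hypothetical structure).

    Leaves carry a gene name; internal nodes carry the reconciliation event."""
    name: str = ""
    event: str = ""  # "speciation" or "duplication" on internal nodes
    children: list = field(default_factory=list)


def leaves(node):
    """Names of all genes below a node."""
    if not node.children:
        return [node.name]
    return [name for child in node.children for name in leaves(child)]


def homology_pairs(node, result=None):
    """Label each gene pair by the event at its last common ancestor."""
    if result is None:
        result = {}
    kind = "orthologue" if node.event == "speciation" else "paralogue"
    # Genes in different child subtrees have this node as their last common ancestor.
    for left, right in combinations([leaves(c) for c in node.children], 2):
        for a in left:
            for b in right:
                result[tuple(sorted((a, b)))] = kind
    for child in node.children:
        homology_pairs(child, result)
    return result


# A duplication followed by speciation in each copy.
tree = Node(event="duplication", children=[
    Node(event="speciation", children=[Node("human_A"), Node("mouse_A")]),
    Node(event="speciation", children=[Node("human_B"), Node("mouse_B")]),
])
for pair, kind in sorted(homology_pairs(tree).items()):
    print(pair, kind)
```

Here human_A/mouse_A and human_B/mouse_B come out as orthologues. Every other pair, including human_A/mouse_B, is a paralogue because their last common ancestor is the duplication node.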

Page 7: Orthologue and Paralogue Types

Page 8: Orthologue and Paralogue Types

Page 9: GeneView

Page 10: GeneView

Page 11: GeneTreeView

GeneTree

MUSCLE protein alignment

Page 12: GeneTreeView

Duplication node (red)

Speciation node (blue)

Page 13: Protein Dataset

More than 1,500,000 proteins clustered:

• All Ensembl protein predictions from all supported species: ~670,000 protein predictions

• All metazoan (animal) proteins in UniProt: ~80,000 from UniProt/Swiss-Prot and ~830,000 from UniProt/TrEMBL

Page 14: Clustering Strategy

• BLASTP all-versus-all comparison

• Markov clustering

• For each cluster:
  - calculation of multiple sequence alignments with ClustalW
  - assignment of a consensus description
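Markov clustering (MCL) alternates two operations. Expansion multiplies the flow matrix by itself; inflation raises entries to a power and renormalises. Together they make flow collect inside densely connected groups, so MCL can split a connected graph at weak links where single linkage would not. Below is a toy pure-Python version for small matrices, written only for illustration. The inflation value, tolerance and cut-off are assumptions, and clustering ~1.5 million proteins would need a sparse, optimised implementation.

```python
def normalize_columns(m):
    """Scale each column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain square matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mcl(adjacency, inflation=2.0, max_iter=100, tol=1e-9, threshold=1e-3):
    """Toy Markov clustering of a symmetric similarity matrix; returns node clusters."""
    n = len(adjacency)
    # Add self-loops, then turn similarities into transition probabilities.
    m = normalize_columns(
        [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    )
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: flow along longer random walks
        # Inflation: strengthen strong flows, weaken weak ones, then renormalise.
        inflated = normalize_columns([[x ** inflation for x in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Nodes still connected by non-negligible flow share a cluster (union-find).
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(n):
            if m[i][j] > threshold:
                parent[find(i)] = find(j)
    groups = {}
    for node in range(n):
        groups.setdefault(find(node), []).append(node)
    return sorted(groups.values())


# Hypothetical similarity graph: two triangles joined by a single edge (2-3).
bridged = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(mcl(bridged))
```

Raising `inflation` generally yields more, smaller clusters. Lowering it towards 1 lets flow spread further and merges clusters.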

Page 15: Link to FamilyView

GeneView / TransView / ProtView

Page 16: FamilyView

Ensembl family members within human

UniProt family members

Ensembl family members in other species

Consensus annotation

JalView multiple alignments

Page 17: JalView

Page 18: Whole Genome Alignments

• Functional sequences evolve more slowly than non-functional sequences, because most mutations in them are harmful and are removed by selection. Sequences that remain conserved across species are therefore likely to perform a biological function.

• Comparing genomic sequences from species at different evolutionary distances allows us to identify:
  - coding genes
  - non-coding genes
  - non-coding regulatory sequences
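To make "conserved sequence" concrete, here is a deliberately simple score over a toy multiple alignment. A column counts as conserved when every species has the same (non-gap) base, and windows with a high share of such columns are reported. This is only an illustration under these assumptions. It is not how Ensembl computes the conservation score or constrained elements, and the window size and threshold are arbitrary.

```python
def column_identity(alignment):
    """1.0 where every sequence has the same non-gap base in a column, else 0.0."""
    return [
        1.0 if column[0] != "-" and len(set(column)) == 1 else 0.0
        for column in zip(*alignment)
    ]


def conserved_windows(alignment, window=5, min_identity=0.8):
    """Start positions of windows whose mean column identity reaches min_identity."""
    scores = column_identity(alignment)
    return [
        start
        for start in range(len(scores) - window + 1)
        if sum(scores[start:start + window]) / window >= min_identity
    ]


# Hypothetical aligned fragments (e.g. human, mouse, fish): conserved start, divergent end.
alignment = ["ACGTAGGCTA", "ACGTACCGTT", "ACGTATACAA"]
print(conserved_windows(alignment))  # [0, 1]
```

Because distant species share only strongly constrained sequence, the same score computed against a distant genome picks out fewer, more reliably functional regions than against a close one.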

Page 19: Selection of Species for DNA Comparisons

Human vs. | Time since divergence | Size (Gbp) | Sequence conservation (in coding regions) | Aids identification of
Chimpanzee | ~5 MYA | 3.0 | >99% | Recently changed sequences and genomic rearrangements
Mouse | ~65 MYA | 2.5 | ~80% | Both coding and non-coding sequences
Opossum | ~150 MYA | 4.2 | ~70-75% | Both coding and non-coding sequences
Pufferfish | ~450 MYA | 0.4 | ~65% | Primarily coding sequences

Page 20: Alignment Algorithm

• Should find all highly similar regions between two sequences

• Should allow for segments without similarity, rearrangements, etc.

• Issues:
  - computationally heavy
  - scalability, as more and more genomes are sequenced
  - time constraints
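The first requirement, finding highly similar regions between two sequences, is what local alignment does. The sketch below is a plain Smith-Waterman dynamic program with a linear gap penalty and made-up scoring values. It is exact, but its time and memory grow with the product of the two sequence lengths. That cost is the scalability issue listed above, and it is why genome-scale aligners such as BLASTZ use seed-and-extend heuristics instead.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b, plus the (i, j) cell where it ends."""
    h = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best, best_cell = 0, (0, 0)
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diagonal = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Scores are floored at 0, so an alignment can start anywhere (local).
            h[i][j] = max(0, diagonal, h[i - 1][j] + gap, h[i][j - 1] + gap)
            if h[i][j] > best:
                best, best_cell = h[i][j], (i, j)
    return best, best_cell


# The shared stretch "ACGTACG" (7 matches x 2) gives the best local score.
print(smith_waterman("TTTTACGTACGT", "GGACGTACGGG"))  # (14, (11, 9))
```

Flooring scores at zero lets the alignment ignore unrelated flanking sequence, which matches the second requirement: segments without similarity should not be forced into the alignment.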

Page 21: BLASTZ-net, tBLAT and PECAN

• BLASTZ-net (comparison at the nucleotide level) is used for evolutionarily close species, e.g. human vs. mouse

• Translated BLAT (comparison at the amino acid level) is used for more distantly related species, e.g. human vs. zebrafish

• PECAN is used for multiple-species alignments of:
  - 7 eutherian mammals
  - 10 amniote vertebrates

Page 22: BLASTZ-net, tBLAT and PECAN

The species combinations for which whole genome alignments have been computed are listed on the Comparative Genomics page (Help & Documentation > Genomic Data > Comparative Genomics).

Page 23: ContigView

Constrained elements

Conservation score

Blastz mouse

tBLAT zebrafish

PECAN alignments

Page 24: MultiContigView

Conserved sequences (human)

Conserved sequences (dog)

Page 25: AlignSliceView

Species shown: Human, Rat, Dog, Mouse

Page 26: MultiContigView vs. AlignSliceView

Page 27: AlignView

Page 28: GeneSeqalignView

Page 29: GeneSeqalignView

Page 30: Syntenic Blocks

• Genome alignments are refined into larger syntenic regions

• Alignments are clustered together when they are less than 100 kb apart and their order and orientation are consistent

• Clusters shorter than 100 kb are discarded
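The clustering rule above can be sketched as a greedy pass along the query genome. The Hit record, chromosome names and coordinates are hypothetical. The 100 kb thresholds follow the slide; measuring block length on the query genome only is an assumption.

```python
from dataclasses import dataclass

MAX_GAP = 100_000    # alignments closer than this (bp) may be clustered together
MIN_BLOCK = 100_000  # blocks shorter than this (bp, measured on the query) are discarded


@dataclass
class Hit:
    """One pairwise genome alignment (hypothetical record layout)."""
    q_chr: str
    q_start: int
    q_end: int
    t_chr: str
    t_start: int
    t_end: int
    strand: int  # +1: same orientation in both genomes, -1: reversed


def consistent(prev, hit):
    """Same chromosome pair and strand, consistent order, and gaps under MAX_GAP."""
    if (prev.q_chr, prev.t_chr, prev.strand) != (hit.q_chr, hit.t_chr, hit.strand):
        return False
    q_gap = hit.q_start - prev.q_end
    # On the reverse strand the next hit must lie before the previous one in the target.
    if hit.strand == 1:
        t_gap = hit.t_start - prev.t_end
    else:
        t_gap = prev.t_start - hit.t_end
    return 0 <= q_gap < MAX_GAP and 0 <= t_gap < MAX_GAP


def syntenic_blocks(hits):
    """Greedily chain hits sorted along the query genome; keep blocks >= MIN_BLOCK."""
    chains = []
    for hit in sorted(hits, key=lambda h: (h.q_chr, h.q_start)):
        if chains and consistent(chains[-1][-1], hit):
            chains[-1].append(hit)
        else:
            chains.append([hit])
    return [
        (c[0].q_chr, c[0].q_start, c[-1].q_end, c[0].t_chr, c[0].strand)
        for c in chains
        if c[-1].q_end - c[0].q_start >= MIN_BLOCK
    ]


hits = [
    Hit("h1", 0, 50_000, "m4", 1_000_000, 1_050_000, 1),
    Hit("h1", 120_000, 180_000, "m4", 1_100_000, 1_160_000, 1),
    Hit("h1", 500_000, 520_000, "m7", 200_000, 220_000, 1),  # isolated 20 kb hit
    Hit("h2", 0, 60_000, "m9", 500_000, 560_000, -1),
    Hit("h2", 80_000, 150_000, "m9", 400_000, 470_000, -1),
]
print(syntenic_blocks(hits))
# [('h1', 0, 180000, 'm4', 1), ('h2', 0, 150000, 'm9', -1)]
```

Because only adjacent hits are compared, a single out-of-place alignment breaks a chain. Real implementations tolerate such noise better, so treat this as a simplification.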

Page 31: SyntenyView

Human chromosome

Mouse chromosomes

Orthologues

Page 32: CytoView

Syntenic blocks

Orientation

Chromosome

Page 33: Questions & Answers