1/30 comparative genomics. 2/30 overview of the talk comparing genomes homologies & families...

Comparative Genomics

Upload: job-hodge

Post on 18-Dec-2015


Comparative Genomics

2/30

Overview of the Talk

• Comparing Genomes

• Homologies & Families

• Sequence Alignments

3/30

Evolution at the DNA Level

…ACTGACATGTACCA…

…AC----CATGCACCA…

Sequence edits: mutation, deletion

Rearrangements: inversion, translocation, duplication
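The sequence edits on this slide can be counted column by column once two sequences are aligned. The following is a minimal sketch, assuming a pairwise alignment given as two equal-length strings with `-` marking gaps. The pair is adapted from the slide, with the gap run shortened to three columns so that both rows have the same length; that adjustment and the function name `classify_columns` are assumptions made for illustration.

```python
from collections import Counter


def classify_columns(row_a: str, row_b: str) -> Counter:
    """Count match, mismatch (substitution) and gap (indel) columns."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    counts = Counter()
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            counts["gap"] += 1
        elif a == b:
            counts["match"] += 1
        else:
            counts["mismatch"] += 1
    return counts


# Illustrative pair adapted from the slide (gap run shortened to 3 columns).
counts = classify_columns("ACTGACATGTACCA", "AC---CATGCACCA")
ungapped = counts["match"] + counts["mismatch"]
identity = counts["match"] / ungapped  # identity over ungapped columns only
```

Whether gap columns count toward the identity denominator is a convention that varies between tools; this sketch excludes them.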

4/30

Why Compare Genomes?

• We can better understand evolution and speciation.

• We can find important functional regions of the sequence (codons, promoters, regulatory regions).

• It can help us locate genes in other species that are missing or not well defined (also through comparison and alignments).

5/30

Comparing Genomes

Mammals have roughly 3 billion base pairs in their genomes.

Over 98% of human genes are shared with primates, with more than 95-98% similarity between genes.

Even the fruit fly shares 60% of its genes with humans! (March 2000)

Differences: gene structure, sequence

Remember… a single nucleotide change can cause diseases such as sickle cell anemia and cancer.
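To make the single-nucleotide point concrete: the sickle cell variant is an A-to-T substitution that turns a GAG (glutamate) codon into GTG (valine) in the beta-globin gene. The sketch below shows that effect on a three-codon fragment. The fragment, the partial codon table, and the helper names are assumptions chosen for illustration, not a quoted reference sequence.

```python
# Partial codon table: only the codons used in this example.
CODON_TABLE = {"CCT": "Pro", "GAG": "Glu", "GTG": "Val"}


def translate(dna: str) -> list:
    """Translate a DNA string codon by codon using the partial table."""
    return [CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3)]


def substitute(dna: str, position: int, base: str) -> str:
    """Return a copy of dna with one base replaced (0-based position)."""
    return dna[:position] + base + dna[position + 1:]


normal = "CCTGAGGAG"                 # illustrative fragment: Pro-Glu-Glu
mutant = substitute(normal, 4, "T")  # GAG -> GTG in the second codon

normal_protein = translate(normal)
mutant_protein = translate(mutant)
```

One changed base out of nine changes one amino acid out of three, which is the kind of difference a genome comparison has to be able to detect.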

6/30

How Does Ensembl Predict Homology?

• Uses all the species

• Uses a representative protein (the longest) for every gene

• Builds a gene tree

• EnsemblCompara GeneTrees: Analysis of complete, duplication aware phylogenetic trees in vertebrates. Vilella AJ, Severin J, Ureta-Vidal A, Durbin R, Heng L, Birney E. Genome Res. 2008 Nov 24.

7/30

Steps in Homology Prediction

1. Load the longest protein for every gene from all species.

2. Run WU-BLASTP + Smith-Waterman on the longest translation of every gene against every other (BLAST Reciprocal Hit / BLAST Score Ratio).

3. Cluster the proteins and build multiple alignments (MCoffee).

4. From each alignment, build a gene tree (TreeBest).

5. Reconcile each gene tree with the species tree to determine internal nodes (TreeBest).

6. Call orthologues, paralogues…
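The "BLAST Reciprocal Hit" idea in the steps above can be sketched in a few lines: two proteins are paired when each is the other's best-scoring hit. This is a simplified illustration, not the Ensembl pipeline itself. The score tables, gene names (`hA`, `mA`, …) and function names are hypothetical, and the BLAST Score Ratio filter is left out.

```python
def best_hits(scores):
    """Map each query to its highest-scoring target.

    scores: dict of (query, target) -> alignment score.
    """
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)


# Hypothetical scores between human (h*) and mouse (m*) proteins.
human_vs_mouse = {("hA", "mA"): 250, ("hA", "mB"): 90,
                  ("hB", "mB"): 180, ("hC", "mB"): 200}
mouse_vs_human = {("mA", "hA"): 245, ("mB", "hC"): 198, ("mB", "hB"): 175}

pairs = reciprocal_best_hits(human_vs_mouse, mouse_vs_human)
```

Here `hB` is dropped because its best hit `mB` prefers `hC`. The later clustering and tree-building steps are what can still bring such genes together as paralogues.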

8/30

Viewing Trees in Ensembl

9/30

Types of Homologues

• Orthologues: any pairwise gene relation where the ancestor node is a speciation event

• Paralogues: any pairwise gene relation where the ancestor node is a duplication event
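These two definitions translate directly into a walk over a labelled gene tree: the relation between two genes is decided by the event at their most recent common ancestor node. A minimal sketch, assuming the tree is already labelled with `speciation` or `duplication` at each internal node. The nested-dict layout and the gene names are hypothetical.

```python
from itertools import combinations, product


def leaves(node):
    """Return the gene names under a node."""
    if "gene" in node:
        return [node["gene"]]
    return [gene for child in node["children"] for gene in leaves(child)]


def pair_relations(node, out=None):
    """Label every gene pair by the event at its last common ancestor."""
    out = {} if out is None else out
    if "gene" in node:
        return out
    kind = "orthologue" if node["event"] == "speciation" else "paralogue"
    # Pairs drawn from different children meet for the first time at this node.
    for left, right in combinations(node["children"], 2):
        for a, b in product(leaves(left), leaves(right)):
            out[(a, b)] = kind
    for child in node["children"]:
        pair_relations(child, out)
    return out


# Hypothetical tree: one ancient duplication, then a speciation in each copy.
tree = {"event": "duplication", "children": [
    {"event": "speciation",
     "children": [{"gene": "human_A"}, {"gene": "mouse_A"}]},
    {"event": "speciation",
     "children": [{"gene": "human_B"}, {"gene": "mouse_B"}]},
]}
relations = pair_relations(tree)
```

Note that `human_A` and `mouse_B` come out as paralogues even though they sit in different species, because their common ancestor node is the duplication.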

10/30

The Gene Tree for INS (insulin precursor)

A red square is a duplication event (Paralogues)

A blue square is a speciation event (Orthologues)

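11/30

Reconciliation

Figure: a species tree over M, R and H and an unrooted gene tree over M, R and H are reconciled into a rooted gene tree whose internal nodes are labelled as duplication nodes or speciation nodes, with gene losses (R’, H’, M’) inferred.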
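One common way to label gene-tree nodes during reconciliation is the last-common-ancestor (LCA) mapping: each gene-tree node is mapped to the lowest species-tree node that covers all species below it, and a node is called a duplication when it maps to the same species-tree node as one of its children. The sketch below assumes that approach, an already rooted binary gene tree, and a species tree in which M and R are sister species. The internal node names `MR` and `MRH` are hypothetical, and this is not necessarily how TreeBest implements it.

```python
# Species tree as child -> parent; assumes ((M, R), H).
SPECIES_PARENT = {"M": "MR", "R": "MR", "MR": "MRH", "H": "MRH", "MRH": None}


def ancestors(node):
    """Return node and all its ancestors, from node up to the root."""
    path = []
    while node is not None:
        path.append(node)
        node = SPECIES_PARENT[node]
    return path


def lca(a, b):
    """Lowest common ancestor of two species-tree nodes."""
    seen = set(ancestors(a))
    for node in ancestors(b):
        if node in seen:
            return node
    raise ValueError("nodes are not in the same tree")


def reconcile(tree, events):
    """Map a gene tree onto the species tree, recording events in post-order.

    tree: a species label (leaf) or a 2-tuple of subtrees.
    """
    if isinstance(tree, str):
        return tree
    left, right = (reconcile(child, events) for child in tree)
    mapped = lca(left, right)
    event = "duplication" if mapped in (left, right) else "speciation"
    events.append((mapped, event))
    return mapped


congruent, incongruent = [], []
reconcile((("M", "R"), "H"), congruent)    # gene tree agrees with species tree
reconcile((("M", "H"), "R"), incongruent)  # gene tree disagrees
```

In the second tree the root maps to the same species node as its (M, H) child, so it is labelled a duplication, and the copies missing from the species tree's point of view are the inferred gene losses.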

12/30

Orthologue Types

What is ‘1 to 1’?

What is ‘1 to many’?
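As a rough guide to these questions: an orthologue pair is "1 to 1" when each gene has exactly one orthologue in the other species, and "1 to many" when one of the two genes has several. A simplified sketch that classifies pairs by counting partners follows. The gene names are hypothetical, and a real pipeline derives the type from the gene tree rather than from pair counts alone.

```python
from collections import defaultdict


def orthologue_types(pairs):
    """Classify (gene_in_a, gene_in_b) orthologue pairs by partner counts."""
    partners_of_a = defaultdict(set)
    partners_of_b = defaultdict(set)
    for a, b in pairs:
        partners_of_a[a].add(b)
        partners_of_b[b].add(a)

    result = {}
    for a, b in pairs:
        a_has_many = len(partners_of_a[a]) > 1
        b_has_many = len(partners_of_b[b]) > 1
        if a_has_many and b_has_many:
            result[(a, b)] = "many-to-many"
        elif a_has_many or b_has_many:
            result[(a, b)] = "1-to-many"
        else:
            result[(a, b)] = "1-to-1"
    return result


# Hypothetical human (h*) and mouse (m*) orthologue pairs.
types = orthologue_types([("h1", "m1"), ("h2", "m2a"), ("h2", "m2b")])
```

Here `h2` has two mouse orthologues, which is what a lineage-specific duplication in mouse after the human-mouse split would produce.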

13/30

Protein Families

• How: Cluster proteins for every isoform in every species + UniProt proteins.

• BLASTP comparison of:

– all Ensembl ENSP…

– all metazoan (animal) proteins in UniProt
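The slide does not say which clustering algorithm groups the BLASTP hits into families. As a stand-in, the sketch below uses the simplest option, single-linkage clustering: any two proteins connected by a chain of significant hits end up in the same family. The protein names and the hit list are hypothetical, and a production pipeline would typically use a more robust graph-clustering method.

```python
def cluster_families(proteins, hits):
    """Group proteins into families: connected components of the hit graph."""
    parent = {protein: protein for protein in proteins}

    def find(protein):
        # Follow parent links to the representative, compressing the path.
        while parent[protein] != protein:
            parent[protein] = parent[parent[protein]]
            protein = parent[protein]
        return protein

    for a, b in hits:
        parent[find(a)] = find(b)

    families = {}
    for protein in proteins:
        families.setdefault(find(protein), set()).add(protein)
    return sorted(sorted(members) for members in families.values())


# Hypothetical proteins and significant BLASTP hits between them.
proteins = ["human_iso1", "mouse_iso1", "uniprot_x", "fly_iso1"]
hits = [("human_iso1", "uniprot_x"), ("uniprot_x", "mouse_iso1")]
families = cluster_families(proteins, hits)
```

Single linkage is easy to reason about but can chain unrelated proteins together through one spurious hit, which is why the choice is flagged as an assumption here.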

14/30

Homologues Exercise

1. Find the human MYL6 gene: go to its gene summary.

2. How many paralogues does it have? Find them in the gene tree.

3. Which paralogue is closest to the human MYL6 gene? In what taxon is the common ancestor?

15/30

Pan-taxonomic compara

Anopheles gambiae, Caenorhabditis elegans, Drosophila melanogaster

Aspergillus nidulans, Neurospora crassa, Saccharomyces cerevisiae, Schizosaccharomyces pombe

B_aphidicola_Tokyo_1998, B_burgdorferi_DSM_4680, B_subtilis, E_coli_K12, M_tuberculosis_H37Rv, N_meningitidis_A, P_horikoshii, S_aureus_N315, S_pneumoniae_TIGR4, S_pyogenes_SF370, W_pipientis_wMel

Anolis carolinensis, Ciona savignyi, Danio rerio, Equus caballus, Gallus gallus, Homo sapiens, Macaca mulatta, Monodelphis domestica, Mus musculus, Ornithorhynchus anatinus, Pan troglodytes, Pongo pygmaeus, Xenopus tropicalis

Dictyostelium discoideum, Plasmodium falciparum, Plasmodium vivax

16/30

www.ensemblgenomes.org

17/30

Families

18/30

Ensembl Proteins in the Family

19/30

Overview of the Talk

• Comparing Genomes

• Homologies and Families

• Sequence Alignments

20/30

Aligning Whole Genomes: Why?

• To identify homologous regions

• To spot problematic gene predictions

• Conserved regions could be functional

• To define syntenic regions (long regions of DNA sequence where order and orientation are highly conserved)
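The definition of a syntenic region can be illustrated with a toy block finder: given matched genes (anchors) in two genomes, consecutive anchors belong to the same block while their order is preserved, either in the same direction or as an inverted run. This is a minimal sketch under strong assumptions: anchors are consecutive genes in genome A, positions are gene indices rather than base coordinates, and per-gene strand is ignored. The function name and data are hypothetical.

```python
def syntenic_blocks(anchors, min_size=2):
    """Group anchors into runs whose order is conserved in both genomes.

    anchors: (gene_index_in_a, gene_index_in_b) pairs sorted by index in a.
    A run continues while the index in b moves by exactly +1 (same
    orientation) or exactly -1 (inverted block) at every step.
    """
    blocks, current, step = [], [], None
    for anchor in anchors:
        if current:
            delta = anchor[1] - current[-1][1]
            if delta in (1, -1) and step in (None, delta):
                current.append(anchor)
                step = delta
                continue
            # The run is broken: keep it only if it is long enough.
            if len(current) >= min_size:
                blocks.append(current)
        current, step = [anchor], None
    if len(current) >= min_size:
        blocks.append(current)
    return blocks


# Hypothetical anchors: a collinear run, an inverted run, and a stray match.
anchors = [(0, 10), (1, 11), (2, 12), (3, 40), (4, 39), (5, 38), (6, 7)]
blocks = syntenic_blocks(anchors)
```

Real synteny detection tolerates small gaps and local rearrangements inside a block; requiring a step of exactly one gene is a deliberate simplification.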

21/30

Aligning large genomic sequences

Difficulties:

• Requires significant computing resources

• Scalability, as more and more genomes are sequenced

• Time constraints

• Because the «true» alignment is not known, it is difficult to measure alignment accuracy and to choose the right method

22/30

Whole Genome Alignments

• BLASTZ-net (nucleotide level): closer species, e.g. human – mouse

• Translated BLAT (amino acid level): more distant species, e.g. human – zebrafish

• EPO/PECAN: multispecies alignments

• ORTHEUS: used to determine ancestral alleles

23/30

Which Multispecies Alignments?

Mercator-Pecan

• 16 amniota vertebrates + constrained elements

Enredo-Pecan-Ortheus (EPO)

• For 6 primates

• For 5 teleost fish + constrained elements

• For 12 eutherian mammals

• For 34 eutherian mammals + constrained elements

24/30

Non-Coding Regions

• “Phylogenetic Footprinting” – conserved noncoding regions can be functional

• Regulatory regions discovered in this way for genes: Hoxb-1, Hoxb4, PAX6, SOX9
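The idea behind phylogenetic footprinting can be sketched as a sliding-window scan over a multiple alignment: windows in which nearly every species carries the same base are candidate functional elements. The following is a toy illustration only. The alignment, window size and threshold are hypothetical, and real constrained-element calls use substitution-rate models rather than raw column agreement.

```python
def column_conservation(alignment):
    """Fraction of rows sharing the most common base, per column."""
    scores = []
    for column in zip(*alignment):
        bases = [base for base in column if base != "-"]
        if not bases:
            scores.append(0.0)
            continue
        most_common = max(set(bases), key=bases.count)
        scores.append(bases.count(most_common) / len(column))
    return scores


def conserved_windows(alignment, window=5, threshold=0.9):
    """Return start columns of windows whose mean conservation >= threshold."""
    scores = column_conservation(alignment)
    starts = []
    for start in range(len(scores) - window + 1):
        mean = sum(scores[start:start + window]) / window
        if mean >= threshold:
            starts.append(start)
    return starts


# Hypothetical three-species alignment: conserved start, diverged tail.
alignment = [
    "ACGTACGTAC",
    "ACGTACCTTC",
    "ACGTAGGATC",
]
starts = conserved_windows(alignment)
```

Overlapping hit windows would normally be merged into a single element before comparing them with annotated regulatory features.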

25/30

More Examples

• Highly conserved transcription factor binding sites discovered

e.g. a 401 bp non-coding sequence involved in transcriptional regulation of interleukins.

• New genes (human-mouse comparison)

e.g. APOA5, identified as a paralogue of APOA4 in human and mouse.

26/30

Going Beyond Mammals

Where human-mouse is too conserved, go to other species:

Chicken (mammals and birds: 300 MYA)

e.g. a cardiac-specific enhancer of Nkx2-5

Human and fish (400-450 MYA)

In 2002, comparison of human to Fugu rubripes led to the identification of 1000 genes.

27/30

Regulatory Features of the PDX1 gene

Region in Detail shows conservation of sequence in regions involved in PDX1 transcriptional regulation (1.6-2.8 kb upstream of the gene).

28/30

Alignments Exercise

1. Have a look at Region in Detail for the ACN9 gene.

2. Turn on the BLASTZ alignment against macaque. What parts of the macaque genome align to this region in human?

3. Turn on the constrained elements for the 33 eutherian mammals. How does this track differ from the BLASTZ alignment?

29/30

Alignments Continued

1. Zoom out one box in the zoom slider.

Are there constrained elements upstream of the ACN9 transcript that overlap a regulatory feature?

2. View the ‘6 primates alignment’ using the Alignments links at the left.

30/30

Compara Team at EBI

• Javier Herrero

• Kathryn Beal

• Stephen Fitzgerald

• Leo Gordon