Using BLAST to study gene evolution – an example. Introduction to bioinformatics, lesson 3b


Upload: dale-burke

Post on 14-Jan-2016


Page 1: Using blast to study gene evolution – an example. Introduction to bioinformatics, lesson 3b

Using BLAST to study gene evolution – an example.

Introduction to bioinformatics, lesson 3b.


NCBI diagram


Orthologs

Homologous sequences are orthologous if they were separated by a speciation event:

If a gene exists in a species, and that species diverges into two species, then the copies of this gene in the resulting species are orthologous.


Orthologs

• Orthologs typically retain the same or a similar function in the course of evolution.

• Identification of orthologs is critical for reliable prediction of gene function in newly sequenced genomes.


Orthologs

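[Figure: a gene in an ancestor species is split by a speciation event into copies in the descendant species. Surviving labels: ancestor, speciation, descendant 2.]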


Paralogs

Homologous sequences are paralogous if they were separated by a gene duplication event:

If a gene in an organism is duplicated, then the two copies are paralogous.


Paralogs

• Orthologs will typically have the same or a similar function.

• This is not always true for paralogs. Because the original selective pressure no longer acts on one copy of the duplicated gene, that copy is free to mutate and acquire new functions.


Paralogs

[Figure: a gene duplication event producing two paralogous copies.]


Orthologs and Paralogs

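[Figure: a gene tree with a duplication followed by a speciation into species a and species b. Copies separated by the duplication are paralogs; copies separated by the speciation are orthologs.]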


NCBI diagram


What is conservation?

Functionally or structurally important sites are conserved:

• Conserved sites = “slow-evolving” sites

• Variable sites = “fast-evolving” sites

Functionally or structurally important sites are subject to stronger evolutionary pressure (purifying selection).


Finding conservation regions from an alignment

S1 KITAYCELARTDMKLGLDFYKGVSLANWVCLAKWESGYN
S2 MPFERCELARTLKRMADADIRGVSLANWVCLAKWFWDGG
S3 MPFERCELARTLKRMMDADIRGVSLANWVCLAKWFWDGG

From the multiple sequence alignment (MSA) and the tree, one can determine how conserved a gene is.
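As a rough illustration of finding conserved regions, the sketch below scores each column of the three sequences above by the fraction of sequences that share the most common residue, then reports runs of fully conserved columns. The scoring rule, the function names, and the minimum run length of 5 are assumptions made for this example; the slides do not specify a method.

```python
from collections import Counter

# The three aligned sequences from the slide (equal length, no gaps).
alignment = {
    "S1": "KITAYCELARTDMKLGLDFYKGVSLANWVCLAKWESGYN",
    "S2": "MPFERCELARTLKRMADADIRGVSLANWVCLAKWFWDGG",
    "S3": "MPFERCELARTLKRMMDADIRGVSLANWVCLAKWFWDGG",
}


def column_conservation(seqs):
    """Per column: fraction of sequences sharing the most common residue."""
    length = len(seqs[0])
    assert all(len(s) == length for s in seqs), "sequences must be aligned"
    scores = []
    for column in zip(*seqs):
        most_common_count = Counter(column).most_common(1)[0][1]
        scores.append(most_common_count / len(column))
    return scores


def conserved_runs(scores, min_len=5):
    """Return (start, end) pairs, 0-based and end-exclusive, of runs of
    fully conserved columns that are at least `min_len` long (assumed value)."""
    runs, start = [], None
    for i, score in enumerate(scores + [0.0]):  # sentinel closes a trailing run
        if score == 1.0 and start is None:
            start = i
        elif score != 1.0 and start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    return runs


seqs = list(alignment.values())
scores = column_conservation(seqs)
for start, end in conserved_runs(scores):
    print(start, end, seqs[0][start:end])
# prints:
# 5 11 CELART
# 21 34 GVSLANWVCLAKW
```

The two reported blocks are the "slow-evolving" stretches of this toy alignment; the flanking columns vary between S1 and the other two sequences.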


Mol. Biol. Evol. (2005) 22:598-606


Protocol


Step 1 - BLAST

Search for human-mouse orthologous protein pairs.


Step 1 - BLAST

• The orthologs are defined as pairs of reciprocal BLAST hits.

• Eliminate genes with more than one potential orthologous sequence.

• Select only genes for which the human protein was functionally annotated.
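The first two bullets can be sketched as follows. This is a minimal illustration, assuming the BLAST results have already been parsed into (query, subject, bit score) tuples; the gene names, the scores, and the rule that a tie for best hit counts as "more than one potential ortholog" are assumptions for the example, not details taken from the study.

```python
def best_hits(hits):
    """Map each query to the set of its top-scoring subjects.

    `hits` is a list of (query, subject, bit_score) tuples, for example
    parsed from tabular BLAST output (the parsing step is omitted here).
    """
    top = {}
    for query, subject, score in hits:
        best_score, subjects = top.get(query, (float("-inf"), set()))
        if score > best_score:
            top[query] = (score, {subject})
        elif score == best_score:
            subjects.add(subject)
            top[query] = (best_score, subjects)
    return {q: subjects for q, (_, subjects) in top.items()}


def reciprocal_best_hits(human_vs_mouse, mouse_vs_human):
    """Return (human, mouse) pairs that are each other's unique best hit."""
    forward = best_hits(human_vs_mouse)
    backward = best_hits(mouse_vs_human)
    pairs = []
    for human, mouse_set in forward.items():
        if len(mouse_set) != 1:
            continue  # more than one potential ortholog: drop the gene
        mouse = next(iter(mouse_set))
        if backward.get(mouse) == {human}:
            pairs.append((human, mouse))
    return sorted(pairs)


# Hypothetical hit tables: hC has two equally good mouse hits and is dropped.
human_vs_mouse = [
    ("hA", "mA", 300.0), ("hA", "mB", 120.0),
    ("hB", "mB", 250.0),
    ("hC", "mC", 200.0), ("hC", "mD", 200.0),
]
mouse_vs_human = [
    ("mA", "hA", 298.0),
    ("mB", "hB", 251.0), ("mB", "hA", 110.0),
    ("mC", "hC", 199.0), ("mD", "hC", 199.0),
]
orthologs = reciprocal_best_hits(human_vs_mouse, mouse_vs_human)
print(orthologs)  # [('hA', 'mA'), ('hB', 'mB')]
```

The third bullet (functional annotation) would be a further filter on the human side of each pair and is not shown.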


Step 2 – Evolutionary Rates

For each orthologous pair:

• Alignment at the amino acid level.

• Measure conservation

The data set contained 6,776 human-mouse gene pairs.
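One simple way to turn an amino acid alignment of an orthologous pair into a rate-like number is the proportion of differing sites (the p-distance), optionally with a Poisson correction for multiple replacements at the same site. This is only a sketch of one common measure; the slides do not say which measure the study used, and the two short sequences below are made up.

```python
import math


def p_distance(aligned_a, aligned_b, gap="-"):
    """Proportion of differing residues over columns where neither has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        differing += a != b
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


def poisson_distance(p):
    """Poisson-corrected distance d = -ln(1 - p); infinite when p reaches 1."""
    return math.inf if p >= 1.0 else -math.log(1.0 - p)


# Hypothetical aligned human and mouse fragments.
human = "MKT-AYLV"
mouse = "MRTGAYIV"
p = p_distance(human, mouse)
print(round(p, 3), round(poisson_distance(p), 3))
```

A low distance corresponds to high conservation; applying the same function to every orthologous pair gives one value per gene.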


Step 3 – Assignment of Temporal Categories

Use BLAST to find homologous genes in 6 different eukaryotic genomes.


Caenorhabditis elegans

Schizosaccharomyces pombe

Takifugu rubripes

Drosophila melanogaster

Arabidopsis thaliana

Saccharomyces cerevisiae


What is Old?

• Presence of any homolog in all the 6 genomes.

What is Presence?

• Using an E-value cutoff of 10^-4 in BLAST.

[Figure: the temporal categories OLD, METAZOANS, DEUTEROSTOMES, and TETRAPODS, shown alongside the genomes Caenorhabditis elegans, Drosophila melanogaster, and Takifugu rubripes.]
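The two definitions on this slide ("presence" and "old") can be written down directly. The sketch below assumes that the best BLAST E-value per genome is already known for a gene, and that "presence" means an E-value at or below the cutoff; the inclusive comparison, the function names, and the example values are assumptions. Only the OLD category is implemented, because the slides do not spell out the rules for the other three.

```python
E_VALUE_CUTOFF = 1e-4  # cutoff stated on the slide

GENOMES = [
    "Saccharomyces cerevisiae",
    "Schizosaccharomyces pombe",
    "Arabidopsis thaliana",
    "Caenorhabditis elegans",
    "Drosophila melanogaster",
    "Takifugu rubripes",
]


def is_present(best_e_value, cutoff=E_VALUE_CUTOFF):
    """A homolog counts as present if its best E-value is at or below the cutoff.

    `None` means BLAST reported no hit at all in that genome.
    """
    return best_e_value is not None and best_e_value <= cutoff


def is_old(best_e_values, cutoff=E_VALUE_CUTOFF):
    """OLD: a homolog is present in all six genomes."""
    return all(is_present(best_e_values.get(g), cutoff) for g in GENOMES)


# Hypothetical gene: strong hits everywhere except a weak one in Takifugu.
gene = {g: 1e-20 for g in GENOMES}
gene["Takifugu rubripes"] = 1e-6

print(is_old(gene))                # True with the 10^-4 cutoff
print(is_old(gene, cutoff=1e-10))  # False with a stricter cutoff
```

The second call shows why the choice of cutoff matters: the same gene changes category when the threshold is tightened.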


• METAZOANS - Animals whose bodies consist of many cells, as distinct from Protozoa, which are unicellular; all animals commonly recognized as animals.

• DEUTEROSTOMES - The second of the two main groups of bilaterally symmetrical animals. The name derives from 'deutero' (second) 'stome' (mouth), referring to the origin of the definitive mouth as an opening independent from the blastopore of the embryo.

• TETRAPODS - Any four-legged animals, including mammals, birds, reptiles and amphibians.


Results


Negative correlation between the “age” of genes and the rate of evolution.

[Figure: plots with axes labelled CONSERVATION.]


Control

• Changing the sensitivity of the BLAST detection to a more conservative E-value cutoff of 10^-10 did not significantly affect the result.


Explanations


Functional constraint remained constant throughout the evolutionary history of each gene, but the newer genes are less constrained than older genes.


Functional constraints are not constant; rather, they are weak at the time of origin of a gene and become progressively more stringent with age.


Eran Elhaik, Niv Sabath, and Dan Graur

Mol. Biol. Evol. 23(1):1–3. 2006


Goal

• To show that these results are an artifact caused by our inability to detect similarity when genetic distances are large.


Simulation

The evolutionary process

[Figure, built up over five slides: a tree with the leaves Rat, Mouse, Cat, Dog, and Fly, next to a matrix of replacement probabilities between amino acids (Ala, Arg, Val, ...). An ancestral residue (V in the example) evolves along the branches and ends as L, L, I, M, and V at the leaves Rat, Mouse, Cat, Dog, and Fly.]


The evolutionary process

Rat    L M T G S H M G N F I I
Mouse  L M T G S G M A N H V I
Cat    I M T G S H I G Y A M F
Dog    M M T G S G I G L T R A
Fly    V M T G S W R G R M Y A

...

Repeat the process for all positions (assumption: each position evolves independently).
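The process described above can be sketched in a few lines. This is a simplified illustration, not the simulation used in the paper: the tree topology, the single replacement probability shared by all branches, and the uniform choice among the other 19 amino acids (instead of a matrix of replacement probabilities) are all assumptions made to keep the example short.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical topology: nested tuples are internal nodes, strings are leaves.
TREE = ((("Rat", "Mouse"), ("Cat", "Dog")), "Fly")


def evolve_site(residue, replace_prob, rng):
    """Replace one residue with probability `replace_prob` (uniform target)."""
    if rng.random() < replace_prob:
        return rng.choice([aa for aa in AMINO_ACIDS if aa != residue])
    return residue


def evolve_along_branch(sequence, replace_prob, rng):
    """Evolve every position independently along a single branch."""
    return "".join(evolve_site(r, replace_prob, rng) for r in sequence)


def simulate(node, sequence, replace_prob, rng, leaves=None):
    """Recursively evolve `sequence` down the tree; return {leaf: sequence}."""
    if leaves is None:
        leaves = {}
    if isinstance(node, str):
        leaves[node] = sequence
        return leaves
    for child in node:
        child_seq = evolve_along_branch(sequence, replace_prob, rng)
        simulate(child, child_seq, replace_prob, rng, leaves)
    return leaves


rng = random.Random(0)  # fixed seed so the run is repeatable
ancestor = "".join(rng.choice(AMINO_ACIDS) for _ in range(12))
leaves = simulate(TREE, ancestor, replace_prob=0.2, rng=rng)
print("Ancestor", ancestor)
for name, seq in leaves.items():
    print(f"{name:<8}", seq)
```

Because Fly hangs directly off the root in this assumed topology, it accumulates changes over one branch only; a more realistic version would give each branch its own length.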


Generate terminal sequences with the following phylogenetic relationships:

[Figure: a tree with the terminal sequences A, B, C, D, and E.]

• A and B: similar to the human and mouse orthologous genes.

• C, D, and E: remote homologous genes from increasingly more distant taxa.

All the genes originated in the common ancestor of A, B, C, D, and E and are, thus, of equal age.


Simulation

• They simulated genes with 101 different rates.

• Higher rate -> higher likelihood of an amino acid replacement on each branch.


Simulation

Use BLAST, in the same way that Alba and Castresana used it, to detect homology between gene A and genes C, D, and E.


Only one difference: the group names

OLD -> SENIORS

METAZOANS -> ADULTS

DEUTEROSTOMES -> TEENAGERS

TETRAPODS -> TODDLERS


Results


Same as Alba and Castresana


But all the simulated genes are of the same “age”.

What is the problem?


We can only count genes that are identified as homologous by the protocol.


Alba and Castresana may have, thus, failed to spot the vast majority of homologs from among the fastest evolving genes.


The vast majority of the fastest evolving genes are undetectable even when the cutoffs are extremely permissive.


Conclusion


The inverse relationship between evolutionary rate and gene age is an artifact caused by our inability to detect similarity when genetic distances are large.


• Since genetic distance increases with both time of divergence and rate of evolution, it is difficult to identify homologs of fast-evolving genes in distantly related taxa.

• Thus, fast-evolving genes may be misclassified as “new”.
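The misclassification can be reproduced with a toy simulation in which every gene has the same age and only the rate differs. This sketch makes several assumptions that are not in the slides: a sequence identity threshold stands in for the BLAST E-value cutoff, a site stays identical only if no replacement occurred (probability exp(-rate x time)), and the divergence times, protein length, and the 101 rate values are invented.

```python
import math
import random

N_SITES = 1000          # hypothetical protein length
IDENTITY_CUTOFF = 0.30  # hypothetical stand-in for a BLAST E-value cutoff
# Hypothetical divergence times between gene A and genes C, D, and E.
DIVERGENCE = [("C", 1.0), ("D", 2.0), ("E", 3.0)]
CATEGORIES = ["TODDLERS", "TEENAGERS", "ADULTS", "SENIORS"]


def observed_identity(rate, time, rng):
    """Fraction of sites with no replacement after `time` at `rate`."""
    p_unchanged = math.exp(-rate * time)
    unchanged = sum(rng.random() < p_unchanged for _ in range(N_SITES))
    return unchanged / N_SITES


def classify(rate, rng):
    """Count how many successively more distant relatives are still detected."""
    depth = 0
    for _name, time in DIVERGENCE:
        if observed_identity(rate, time, rng) < IDENTITY_CUTOFF:
            break  # homolog not detected: the gene looks "younger"
        depth += 1
    return CATEGORIES[depth]


rng = random.Random(1)
rates = [i * 0.02 for i in range(101)]  # 101 rates; the values are assumed
by_category = {name: [] for name in CATEGORIES}
for rate in rates:
    # Every gene has the same true age; only its rate differs.
    by_category[classify(rate, rng)].append(rate)

mean_rate = {name: sum(r) / len(r) for name, r in by_category.items() if r}
for name in reversed(CATEGORIES):
    print(name, len(by_category[name]), round(mean_rate[name], 2))
```

Although all simulated genes are equally old, the apparent "age" category tracks the rate: slow genes end up as SENIORS and fast genes as TODDLERS, which is the artifact the slides describe.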


So, the only conclusion that can be drawn from Alba and Castresana’s study is that slowly evolving genes evolve slowly!