
Upload: lesley-hodge

Post on 18-Jan-2016


Page 1: Families with >5 genes are more common in plants than in animals adapted from Lockton S, Gaut BS. 2005. Trends Genet 21: 60-65

families with >5 genes are more common in plants than in animals

adapted from Lockton S, Gaut BS. 2005. Trends Genet 21: 60-65

[Figure: bar chart; x-axis: number of genes per family (1, 2, 3-5, >5); y-axis: percentage of genes (0 to 100); series: human, yeast, fruit fly, nematode, rice, Arabidopsis]

Page 2

alternative splicing (AS) is more common in animals than in plants

Boue S, et al. 2003. BioEssays 25: 1031-1034; Iida K, et al. 2004. Nucleic Acids Res 32: 5096-5103; Kikuchi S, et al. 2003. Science 301: 376-379

Arabidopsis and rice AS

Page 3

duplications occur on any length scale, from individual genes (where tandem refers to a gene and its duplicate being adjacent), to multi-gene segments of the chromosome, to an entire genome

e.g. wild wheat is diploid (2n); domestication gave a tetraploid (4n, pasta) and a hexaploid (6n, bread)

synteny is when 2 or more genes are found in the same order/orientation on the chromosomes of related species

Page 4

polyploidy (whole genome duplication) events among plants

adapted from Blanc G, Wolfe KH. 2004. Plant Cell 16: 1667-1678; Paterson AH, et al. 2004. Proc Natl Acad Sci USA 101: 9903-9908

[Figure: polyploidy events among plant lineages; labels: monocot, dicot]

Page 5

phylogeny of the favored plants

there is extensive synteny among Gramineae but between Gramineae and Arabidopsis there is essentially no synteny

[Figure: phylogenetic tree of sorghum, maize, barley, wheat, rice, and Arabidopsis; Gramineae divergence 55~70 Mya; monocot-dicot divergence 170~235 Mya]

Page 6

the duplication history of rice

every cDNA-defined gene is assigned a duplication category

using the methods of Yu J, et al. 2005. PLoS Biol 3: e38

1. analysis relies entirely on 19,079 full length cDNAs; had we used predicted genes instead many of the duplications would have been missed

2. a homolog pair refers to a cDNA and its TblastN match (i.e. comparisons done at amino acid level to genome translation in all 6 reading frames) at an expectation value of 1E-7 and requiring that >50% be aligned; note that the TblastN match is not necessarily expressed itself

3. if a gene has any homologs at all, the mean (median) number of homologs is 40 (5)

4. multiple duplications are difficult to analyze; so consider the cDNAs with 1-and-only-1 homolog
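The homolog definition in items 2 and 4 can be sketched as a filter over TblastN hits. This is a minimal illustration, not the pipeline of Yu et al.: the `Hit` record and its field names are hypothetical, the aligned fraction is assumed to be measured against the length of the cDNA's protein translation, and hits of a cDNA to its own locus are assumed to have been removed beforehand. The example hits are made up.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    """One TblastN match (hypothetical record layout)."""
    cdna_id: str      # query cDNA
    locus: str        # matched genomic locus
    evalue: float     # BLAST expectation value
    aligned_len: int  # aligned amino acids
    query_len: int    # length of the cDNA's protein translation


E_CUTOFF = 1e-7      # expectation value quoted in item 2
MIN_FRACTION = 0.5   # ">50% be aligned"


def homologs(hits):
    """Group the hits that pass both thresholds by cDNA."""
    kept = defaultdict(list)
    for h in hits:
        if h.evalue <= E_CUTOFF and h.aligned_len / h.query_len > MIN_FRACTION:
            kept[h.cdna_id].append(h.locus)
    return dict(kept)


def single_homolog_pairs(hits):
    """Keep only cDNAs with one-and-only-one homolog (item 4)."""
    return {c: loci[0] for c, loci in homologs(hits).items() if len(loci) == 1}


hits = [  # made-up example data
    Hit("cDNA_A", "Chr02:1500000", 1e-30, 280, 300),
    Hit("cDNA_B", "Chr04:900000", 1e-12, 200, 250),
    Hit("cDNA_B", "Chr10:4200000", 1e-9, 180, 250),
    Hit("cDNA_C", "Chr07:3300000", 1e-3, 290, 300),  # fails the E-value cutoff
    Hit("cDNA_D", "Chr11:100000", 1e-20, 100, 300),  # fails the coverage cutoff
]
print(single_homolog_pairs(hits))
```

Only cDNA_A survives here: cDNA_B has two homologs and is set aside as a multiple duplication, while cDNA_C and cDNA_D have no qualifying homolog.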

Page 7

ONE whole genome duplication, a recent segmental duplication, and many individual gene duplications

[Figure: gene birth and death over time; labels: whole genome, individual genes, recent segmental]

Page 8

18 pairs of duplicated segments covering 65.7% of rice genome

higher order homologs used to backfill established trend lines

[Figure: Rice-Rice Comparison dot plot of Rice Chr02 (Mb, 0 to 30) against rice Chr01 to Chr12 (ticks 0 to 40); legend: segmental]

Page 9

ancient whole genome duplication (WGD) in rice

Page 10

the plot is uninterpretable if cDNAs with more than one homolog in rice are used

mean (median) number of homologs per duplicated gene is 40 (5)

[Figure: Rice-Rice Comparison dot plot of Rice Chr02 (Mb, 0 to 30) against rice Chr01 to Chr12 (ticks 0 to 40)]

Page 11

unmarked trend along diagonal from tandem gene duplications

there were NO segmental duplications within a chromosome

[Figure: Rice-Rice Comparison dot plot of Rice Chr01 (Mb, 0 to 40) against rice Chr01 to Chr12 (ticks 0 to 40); legend: tandem, background]

Page 12

computing molecular clocks and indicators of evolutionary selection

Ka = non-synonymous changes per available site

Ks = synonymous changes per available site

available site corrects for the fact that 76% of possible single-nucleotide substitutions, or 438 of 576, encode a different amino acid

Ka/Ks < 1 is evidence of purifying selection

Ka/Ks = 1 is evidence of no selection (pseudogene)

Ka/Ks > 1 is evidence of adaptive selection

mean Ka/Ks is 0.20 in primates and 0.14 in rodents
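The 76% figure can be reproduced by enumerating every single-nucleotide substitution in the standard genetic code. A minimal sketch, assuming the count runs over all 64 codons with the stop signal treated as its own class: that gives 64 × 9 = 576 possible substitutions, of which 438 change what the codon encodes. The function name is illustrative.

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons are ordered
# TTT, TTC, TTA, TTG, TCT, ... with '*' marking a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODE = dict(zip(CODONS, AMINO_ACIDS))


def count_substitutions():
    """Return (changed, total) over all single-nucleotide substitutions."""
    changed = total = 0
    for codon, product in CODE.items():
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                total += 1
                if CODE[mutant] != product:
                    changed += 1
    return changed, total


changed, total = count_substitutions()
print(f"{changed} of {total} ({changed / total:.0%})")  # 438 of 576 (76%)
```

Because most random substitutions are non-synonymous, raw counts of the two kinds of change are not comparable; dividing each by its number of available sites is what makes Ka/Ks = 1 the neutral expectation.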

Page 13

from neutral substitution rate to time since divergence of species

neutral substitution rates vary with genes and evolutionary lineages but on average they are 2.2×10⁻⁹ for mammals and 6.5×10⁻⁹ for Gramineae (substitutions per site per year)

Kumar S, Hedges SB. 1998. Nature 392: 917-920

[Figure: a common ancestor diverging into species1 and species2]

time since divergence equals the neutral distance between species1 and species2 (substitutions per silent site) divided by (2 × neutral substitution rate)
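As a worked illustration of this formula: the factor of 2 appears because substitutions accumulate independently on both branches leading from the common ancestor. The sketch below uses the two average rates quoted on this slide; the function name and the Ks value of 0.65 are hypothetical and chosen only to show the arithmetic.

```python
RATE_MAMMALS = 2.2e-9    # substitutions per site per year, from the slide
RATE_GRAMINEAE = 6.5e-9  # substitutions per site per year, from the slide


def divergence_time_years(ks, rate_per_site_per_year):
    """Time since divergence: T = Ks / (2 * rate)."""
    if rate_per_site_per_year <= 0:
        raise ValueError("substitution rate must be positive")
    return ks / (2.0 * rate_per_site_per_year)


ks_example = 0.65  # hypothetical distance, not a measured value
t = divergence_time_years(ks_example, RATE_GRAMINEAE)
print(f"{t / 1e6:.1f} million years")
```

With the Gramineae rate, a Ks of 0.65 corresponds to roughly 50 million years; the same Ks under the slower mammalian rate would imply a much older divergence, which is why the lineage-specific rate matters.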

Page 14

17 of 18 segments are attributable to a whole genome duplication just before the Gramineae divergence

[Figure, panel 1: Rice-Rice segmental duplication; higher order homologs; Ks from K-Estimator; x-axis: subs per silent site, Ks (0 to 1.5); y-axis ticks 0 to 90]

[Figure, panel 2: Rice-Rice tandem duplication; two TblastN hits are allowed; Ks from K-Estimator; x-axis: subs per silent site, Ks (0 to 0.6); y-axis ticks 0 to 400]

timing of WGD relative to Gramineae divergence is based on observed syntenies and not Ks

Page 15

background duplications have Ks signature like tandem duplications except that they are more ancient

[Figure, panel 1: Rice-Rice tandem duplication; two TblastN hits are allowed; Ks from K-Estimator; x-axis: subs per silent site, Ks (0 to 0.6); y-axis ticks 0 to 400]

[Figure, panel 2: Rice-Rice background duplication; one and only one homolog; Ks from K-Estimator; x-axis: subs per silent site, Ks (0 to 3); y-axis ticks 0 to 200]

a peak at zero Ks followed by exponential decay is indicative of an ongoing duplication process
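The link between an exponentially decaying Ks histogram and an ongoing birth-and-death process can be sketched as follows. If duplicates arise at a steady rate and each copy is lost at a constant rate, the number of surviving pairs falls off as exp(-d × Ks), so a straight-line fit to the logarithm of the bin counts recovers d, and the half-life is ln 2 / d. This is an assumed, idealized model; the bin centers and counts below are synthetic and are not read from the histograms on this slide.

```python
import math


def fit_decay(bin_centers, counts):
    """Fit ln(count) = ln(n0) - d * Ks by least squares; return (n0, d)."""
    points = [(x, math.log(c)) for x, c in zip(bin_centers, counts) if c > 0]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return math.exp(mean_y - slope * mean_x), -slope


def half_life(d):
    """Ks interval over which the number of surviving duplicates halves."""
    return math.log(2) / d


# Synthetic, idealized histogram: 400 pairs at Ks = 0, decay constant 6
centers = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55]
counts = [400 * math.exp(-6 * x) for x in centers]

n0, d = fit_decay(centers, counts)
print(f"n0 = {n0:.0f}, d = {d:.2f} per unit Ks, half-life = {half_life(d):.3f} Ks")
```

A half-life expressed in Ks can be converted to years by dividing by twice the neutral substitution rate of the lineage.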

Page 16

duplicated genes undergo periods of relaxed selection and are usually silenced within 4~17 million years

hypothesis introduced by Lynch M, Conery JS. 2000. Science 290: 1151; with details in Lynch M, Conery JS. 2003. J Struct Funct Genomics 3: 35

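[Figure: a progenitor gene is duplicated; one copy is left alone and one copy is free to modify; the modified copy passes through relaxed selection and reduced expression toward either eventual death or a novel function; the post-duplicative 'transient' has a duration of 4~17 million years]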

Page 17

rice analysis succeeded only because duplication is not too old

when the duplication is old: an analysis from yeast comparing related genomes with and without the duplication. Kellis M, et al. 2004. Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. Nature 428: 617-624

when the duplication is extremely new: an analysis from human. Bailey JA, et al. 2002. Recent segmental duplications in the human genome. Science 297: 1003-1007

Page 18

proof of whole genome duplication in Saccharomyces cerevisiae by comparison to sequence of Kluyveromyces waltii

[Figure: stages shown are duplication, mutation, and gene death]

interleaving genes from sister segments in comparison to K. waltii

Page 19

gene and regional correspondences with K. waltii

Page 20

ancient whole genome duplication in S. cerevisiae

Page 21

identifying recent segmental duplications in human assembly

whole genome shotgun (WGS) reads from Celera are aligned to the map-based genome from IHGSC; recent segmental duplications are detected as anomalies in sequence similarity and read depth
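The read-depth half of this idea can be sketched as follows: reads from every copy of a duplicated region pile up on whatever copies the reference contains, so windows with unusually high depth are candidate duplications. This is a simplified illustration, not the procedure of Bailey et al.; the threshold of three standard deviations above the mean of control windows, the function name, and the depth values are all assumptions.

```python
from statistics import mean, stdev


def flag_high_depth(depths, control_depths, n_sd=3.0):
    """Return indices of windows whose depth exceeds mean + n_sd * SD,
    where mean and SD are estimated from control (presumed unique) windows."""
    threshold = mean(control_depths) + n_sd * stdev(control_depths)
    return [i for i, depth in enumerate(depths) if depth > threshold]


# Made-up read depths per fixed-size window
control = [28, 31, 30, 29, 32, 30, 27, 33]  # presumed single-copy sequence
depths = [30, 29, 61, 64, 58, 31, 28]       # windows 2 to 4 look duplicated

print(flag_high_depth(depths, control))
```

Estimating the baseline from control windows rather than from the region under test keeps a large duplication from inflating its own threshold.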

Page 22

patterns of intra-chromosomal and inter-chromosomal duplication

recent segmental duplications of length >10 kb and identity >95%; intra-chromosomal (blue lines) and inter-chromosomal (red bars) duplication; unique regions surrounded by intra-chromosomal duplications (gold bars) are hot spots for genomic disorders
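The selection criteria on this slide can be sketched as a filter that keeps pairwise alignments longer than 10 kb with identity above 95% and then sorts them into intra- and inter-chromosomal classes. The `Alignment` record, its field names, and the example values are hypothetical; only the two thresholds come from the slide.

```python
from typing import NamedTuple


class Alignment(NamedTuple):
    """One pairwise alignment between two genomic regions (hypothetical layout)."""
    chrom_a: str
    chrom_b: str
    length_bp: int
    identity: float  # fraction of identical bases, 0 to 1


MIN_LENGTH_BP = 10_000  # length >10 kb
MIN_IDENTITY = 0.95     # identity >95%


def classify(alignments):
    """Split qualifying alignments into intra- and inter-chromosomal lists."""
    result = {"intra": [], "inter": []}
    for aln in alignments:
        if aln.length_bp > MIN_LENGTH_BP and aln.identity > MIN_IDENTITY:
            kind = "intra" if aln.chrom_a == aln.chrom_b else "inter"
            result[kind].append(aln)
    return result


alignments = [  # made-up example data
    Alignment("chr7", "chr7", 45_000, 0.985),
    Alignment("chr1", "chr16", 22_000, 0.962),
    Alignment("chr2", "chr2", 8_000, 0.990),    # too short
    Alignment("chr5", "chr9", 30_000, 0.910),   # too divergent
]
groups = classify(alignments)
print(len(groups["intra"]), len(groups["inter"]))
```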

Page 23

recent segmental duplications in IHGSC and Celera genomes

the proportion of Celera aligned bases falls rapidly as identity exceeds 97% or length exceeds 15 kb, but the total sequence lost is still only 2%~3%

NB: search of the map-based rice genome revealed no segmental duplications of recent origin (Yu J, et al. 2006. Trends Plant Sci 11: 387-391)

Page 24

“Although it is clear that the detailed clone-ordered approach is superior in the resolution of segmental duplications, it would be unrealistic to propose that the sequencing community should abandon whole-genome-shotgun based approaches. These are the most efficient cost-effective means of capturing the bulk of the euchromatic sequence.”

Evan E. Eichler (21 October 2004)