The gene family play and the chromosomal theater Todd Vision Department of Biology University of North Carolina at Chapel Hill

Post on 19-Dec-2015


Page 1

The gene family play and the chromosomal theater

Todd Vision

Department of Biology

University of North Carolina at Chapel Hill

Page 2

Outline

Large-scale duplication and loss of genes in the angiosperms

Looking into the future of plant phylogenomics

A case study in gene family demography

Duplication and functional divergence

Page 3

Paul Franz, University of Amsterdam

Page 4

Arabidopsis as a hub for plant comparative maps

[Bar chart: genome sizes in angiosperms, in megabases (axis 0-1000). Bar values: 145, 262, 367, 367, 372, 415, 439, 473, 560, 622, 907; species labels not preserved]

data from Arumuganathan & Earle (1991) Plant Mol Biol Rep 9:208-218

Page 5

Tomato-Arabidopsis synteny

Bancroft (2001) TIG 17, 89 after Ku et al (2000) PNAS 97, 9121

Page 6

Duplicated genes in Arabidopsis

Page 7

Modes of gene duplication

Tandem (T)
• unequal crossing-over
• mostly young

Dispersed (D)
• transposition
• all ages

Segmental (S)
• polyploidy
• all old

Page 8

Paleotetraploidy?

The Arabidopsis Genome Initiative. 2000. Nature 408:796

Page 9

Vision et al. (2000) Science 290:2114-7.

Page 10

Microsynteny within blocks

Page 11

distribution of dA

Problems
• proteins diverge at different rates
• high dA is difficult to estimate

Solution
• average dA within blocks

[Histogram: frequency (f, 0.00-0.10) vs. amino acid substitution (0.0-1.0), for gene pairs in blocks and not in blocks]
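The solution named on this slide, averaging dA within blocks, can be sketched in a few lines. This is only an illustration: the function name `mean_da_per_block`, the input layout, the block labels and the dA values are all made up, and a plain arithmetic mean is assumed because the slide does not say how the average was taken.

```python
from collections import defaultdict
from statistics import mean

def mean_da_per_block(pairs):
    """Average dA over the duplicated gene pairs in each block.

    `pairs` is an iterable of (block_id, dA) tuples, one per gene pair.
    Hypothetical helper for illustration, not the authors' code.
    """
    by_block = defaultdict(list)
    for block_id, da in pairs:
        by_block[block_id].append(da)
    return {block: mean(values) for block, values in by_block.items()}

# Made-up values: individual pairs scatter widely, block means are steadier.
example_pairs = [
    ("blockA", 0.25), ("blockA", 0.75),
    ("blockB", 0.5), ("blockB", 1.0), ("blockB", 0.0),
]
block_means = mean_da_per_block(example_pairs)
print(block_means)
```

Averaging over the pairs in a block addresses the first problem on the slide: proteins that evolve unusually fast or slowly have less influence on the block-level estimate than they would on any single pair.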

Page 12

discrete duplication events

[Timeline: duplication age classes labelled A to F on an axis of 0-200 Mya, with the lineages monocots (rice), Asterids (tomato) and Rosids (Arabidopsis) and divergence times labelled 110-160 Mya and 160-240 Mya]

[Histogram: frequency of blocks (0-12) vs. amino acid substitution (0.0-0.9)]

Page 13

the 2-4 complex (one ancestral segment broken up by 4 large inversions)

[Dot plot: chromosome 2 (5.6 Mb) vs. chromosome 4 (4.6 Mb); numeric labels 45, 52, 49, 54, 56]

Page 14

[Histogram: frequency vs. Ka (x-axis 0-1)]

[Histogram: frequency vs. Ks (x-axis 0-5)]

coefficient of variation = 0.67

coefficient of variation = 0.53
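For reference, a coefficient of variation is the standard deviation divided by the mean. A minimal sketch follows; the values are made up, and whether the slide used the sample or the population standard deviation is not stated, so the sample form is an assumption here.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (assumed definition)."""
    return stdev(values) / mean(values)

# Made-up substitution estimates for the gene pairs of one block.
toy_estimates = [0.6, 0.8, 1.0, 1.2, 1.4]
cv = coefficient_of_variation(toy_estimates)
print(round(cv, 3))
```

A smaller coefficient of variation means the estimates are more tightly clustered relative to their mean.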

Page 15

Mayer et al. (2001) Genome Res. 11, 1167

Rice-Arabidopsis microsynteny

Page 16

Blanc, Hokamp, Wolfe (2003) Genome Res. 13, 137-144.

Page 17

[Tree diagrams: tips labelled Arabidopsis and Rice; node labelled duplication]

Page 18

Block 37: after Asterid-Rosid split

Block 57: before monocot-dicot divergence

Raes, Vandepoele, Saeys, Simillion, Van de Peer (2003) J. Struct. Func. Genomics 3, 117-129

Page 19

Divergence among duplicated genes in rice

Goff et al. (2002) Science 296: 92

Page 20

Hidden syntenies

Simillion, Vandepoele, Van Montagu, Zabeau, Van de Peer (2002) PNAS 99, 13627

Page 21

Interspecies comparison can reveal hidden syntenies

Vandepoele, Simillion, Van de Peer (2002) TIG 18, 606-608

Page 22

Comparative mapping in a phylogenetic context

Page 23

Major plant genome datasets

Columns: Family, Genus, genome, EST, map (each X marks one available dataset; the column each X belongs to is not recoverable from the flattened table)

Aizoaceae
  Mesembryanthemum crystallinum   X
Brassicaceae
  Arabidopsis thaliana            X X X
  Brassica spp.                   X
Fabaceae
  Glycine max                     X X
  Medicago truncatula             X X
  Phaseolus spp.                  X
Malvaceae
  Gossypium arboreum              X X
Solanaceae
  Capsicum annuum                 X
  Lycopersicon esculentum         X X
  Solanum tuberosum               X X
Poaceae
  Hordeum vulgare                 X X
  Oryza sativa                    X X X
  Sorghum bicolor/propinquum      X X
  Triticum aestivum               X X
  Zea mays                        X X
Other
  Beta vulgaris                   X
  Chlamydomonas reinhardtii       X X
  Pinus taeda                     X X
  Populus spp.                    X
  Prunus spp.                     X

Page 24

Plant unigene datasets

species          TIGR      PlantGDB
barley           49885     74621
beet             na        13565
chlamydomonas    30296     na
citrus           na        4266
coffee           na        392
cotton           24350     27854
grape            49885     74621
iceplant         8455      8945
lettuce          21960     na
lotus            11025     na
maize            55063     71655
marchantia       na        1059
medicago         36976     43384
oat              na        361
onion            11726     na
pine             26882     24668
poplar           na        20935
potato           24275     24839
rice             60778     52156
rye              5199      5384
sorghum          33273     34363
soybean          67826     73946
sunflower        20520     na
tomato           31012     35725
wheat            109509    95949

+ Arabidopsis 27170

Page 25

Wikström et al (2001) Proc R Soc Lond B 268, 2211

Page 26

Plant phylogenomics: Phytome

The goal is to integrate
• Organismal phylogeny
• Gene family
    sequence
    alignment
    phylogeny
• Genetic and physical maps

Page 27

Some uses for Phytome

Starting with a chromosome segment
• Identify homologous segments
• Predict unobserved gene content (candidate QTL)

Starting with a gene family
• Resolve orthology/paralogy relationships
• Identify coevolving families

Starting with a species
• Explore lineage-specific diversification
• Guide comparative mapping wet-work

Page 28

Current pipeline

[Flowchart boxes: Unigene collections; Protein sequence prediction; Homolog identification; Protein family clustering; Multiple sequence alignment; Phylogenetic inference; Annotations; Phytome]

Page 29
Page 30

Lineage specific diversification

[Diagram: gene counts by species (Arabidopsis, Cotton, Medicago, Tomato, Rice); counts as extracted: 1033, 436173, 334, 696836, 715, 919]

152 genes are “single copy” in all four species

Page 31

A tale of two sisters: the ARF and the Aux/IAA gene families

Modulate whole plant response to auxin

Interact via dimerization
• ARFs are transcription factors
• Aux/IAAs bind and repress ARFs in the absence of auxin

Page 32

The chromosomal context

Page 33

Diversification of ARFs

Page 34

Diversification of the Aux/IAAs

Page 35
Page 36

Why the different patterns of diversification?

12% (ARF) vs 40% (Aux/IAA) segmental duplications

Presumably reflects differential retention

Possible explanations
• Dosage requirements
• Coevolution with other interacting genes
• Regional transcriptional regulation

Page 37

Divergence of duplicated genes

[Plot: divergence in expression profile (y-axis) vs. age of duplication (x-axis)]

Page 38

Duplicate pairs in yeast and human (Gu et al. 2002, Makova and Li 2003)

Approx. 50% of pairs diverge very rapidly

Proportion of divergent pairs increases with Ks and Ka
• Plateaus at Ka ~0.3 in human

In humans,
• Immune response genes are over-represented among young, divergent pairs
• Distantly related pairs with conserved expression tend to be either ubiquitous or very tissue specific

Page 39

Retention of duplicated genes

Nonfunctionalization, or loss of one copy
• The fate of most pairs

Neofunctionalization (NF)
• Positive selection on a new mutation can maintain the pair

Subfunctionalization (SF)
• Mutations that increase the specificity of duplicates can fix due to drift provided that, combined, the two copies provide the functionality of the ancestral gene. Once SF happens, both copies are indispensable and are retained.
• One prediction of the model is that SF is more likely for tandem than dispersed pairs (due to linkage)

Page 40

Digital expression profiling

Massively Parallel Signature Sequencing (MPSS)
• Count occurrence of 17-20 bp mRNA signatures
• Cloning and sequencing is done on microbeads
• Similar to Serial Analysis of Gene Expression (SAGE)

“Bar-code” counting reduces concerns of
• cross-hybridization
• probe affinity
• background hybridization

Advantages
• Accurate counts of low expression genes
• Can distinguish expression profiles of duplicate genes
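The counting idea can be sketched as follows. This is an illustration under stated assumptions, not the Lynx software: the helper names `signature_of` and `counts_to_ppm` are hypothetical, the signature is assumed to start at the 3'-most GATC (Sau3A) site and to be 17 bases long, and the reads are toy data.

```python
from collections import Counter

SITE = "GATC"          # Sau3A recognition site
SIGNATURE_LENGTH = 17  # shortest signature length quoted on the slide

def signature_of(transcript):
    """Return the signature starting at the 3'-most GATC site, or None.

    Assumption for this sketch: a transcript yields a signature only if
    SIGNATURE_LENGTH bases are available from its last GATC site.
    """
    pos = transcript.rfind(SITE)
    if pos == -1 or pos + SIGNATURE_LENGTH > len(transcript):
        return None
    return transcript[pos:pos + SIGNATURE_LENGTH]

def counts_to_ppm(counts):
    """Convert raw bead counts per signature to parts per million."""
    total = sum(counts.values())
    return {sig: 1_000_000 * n / total for sig, n in counts.items()}

# Toy run: four beads carrying two distinct signatures.
bead_reads = ["GATCAATCGGACTTGTC"] * 3 + ["GATCGTGCATCAGCAGT"]
ppm = counts_to_ppm(Counter(bead_reads))
print(ppm)
```

Because abundance is a count of exact sequence tags, two duplicate genes can be told apart whenever their signatures differ by even one base, which is the second advantage listed on the slide.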

Page 41

MPSS library construction

[Diagram of library construction; step labels:]
• extract mRNA from tissue
• Convert to cDNA
• Cut w/ Sau3A (GATC)
• Add linker
• 5’ - Add standard primer (added by cloning)
• 3’ - Add unique 32 bp tag and standard primer
• PCR
• Remove 3’ primer and expose single stranded unique tag (digest, 3’→5’ exonuclease)
• Anneal to beads coated with unique anti-tag (32 bp, complementary to tag on mRNA)

Brenner et al., PNAS 97:1665-70.

Page 42

MPSS library construction

The result of the library construction is a set of microbeads. Each bead contains many DNA molecules, all derived from the 3’ end of a single transcript.

Beads are loaded in a monolayer on a microscope slide for the sequencing of 17 – 20 bp from the 5’ end.

Brenner et al., PNAS 97:1665-70.

Sort by FACS to remove ‘empty’ beads

Page 43

MPSS Sequencing

[Diagram: sequencing by hybridization, four bases at a time; distances labelled 9 bp and 13 bp]
• Add adaptors (labelled CODE X1 to CODE X4, each with RS) to the NNNN overhang, positions 1-4
• Sequence by hybridization with decoders: 16 cycles for 4 bp
• Digest with Type IIS enzyme to uncover next 4 bases
• Repeat cycle
• Steps of four bases; overhang is shifted by four bases in each round

Brenner et al., Nat. Biotech. 18:630-4.

Page 44

MPSS Sequencing

Each bead provides a signature of 17-20 bp

Tag #     Signature Sequence    # of Beads (Frequency)
1         GATCAATCGGACTTGTC     2
2         GATCGTGCATCAGCAGT     53
3         GATCCGATACAGCTTTG     212
4         GATCTATGGGTATAGTC     349
5         GATCCATCGTTTGGTGC     417
6         GATCCCAGCAAGATAAC     561
7         GATCCTCCGTCTTCACA     672
8         GATCACTTCTCTCATTA     702
9         GATCTACCAGAACTCGG     814
..        ..                    ..
30,285    GATCGGACCGATCGACT     2,935

Total # of tags: >1,000,000

Two sets of signatures are generated from each sample in different reading frames staggered by two bases

[Diagram: transcript with ATG and TGA marked]

Page 45

Classifying signatures

[Diagram: gene model with typical signatures and the following annotations]
• Potential alternative splicing or nested gene
• Potential alternative termination
• Potential un-annotated ORF
• Potential anti-sense transcript
• Anti-sense transcript or nested gene?
• Duplicated: expression may be from other site in genome

Triangles refer to colors used on our web page:
Class 1 - in an exon, same strand as ORF.
Class 2 - within 500 bp after stop codon, same strand as ORF.
Class 3 - anti-sense of ORF (like Class 1, but on opposite strand).
Class 4 - in genome but NOT class 1, 2, 3, 5 or 6.
Class 5 - entirely within intron, same strand.
Class 6 - entirely within intron, anti-sense.

Grey = potential signature NOT expressed
Class 0 - signatures found in the expression libraries but not the genome.
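The class definitions above translate fairly directly into a decision procedure. The sketch below is an illustration only: the `Gene` type, the `classify` function and the coordinate conventions are assumptions, as is the simplification that the ORF ends at the end of the last annotated exon. It is not the code behind the MPSS web site.

```python
from dataclasses import dataclass

@dataclass
class Gene:
    strand: str   # "+" or "-"
    exons: list   # sorted (start, end) pairs, inclusive genome coordinates

def classify(sig_start, sig_end, sig_strand, gene, in_genome=True, window=500):
    """Assign a signature to class 0-6 relative to one gene model."""
    if not in_genome:
        return 0  # seen in expression libraries but absent from the genome
    same_strand = sig_strand == gene.strand
    # Classes 1 and 3: the signature overlaps an exon.
    if any(sig_start <= e_end and sig_end >= e_start for e_start, e_end in gene.exons):
        return 1 if same_strand else 3
    # Classes 5 and 6: the signature lies entirely within an intron.
    introns = [(a_end + 1, b_start - 1)
               for (_, a_end), (b_start, _) in zip(gene.exons, gene.exons[1:])]
    if any(i_start <= sig_start and sig_end <= i_end for i_start, i_end in introns):
        return 5 if same_strand else 6
    # Class 2: within `window` bp downstream of the stop, same strand.
    first, last = gene.exons[0][0], gene.exons[-1][1]
    if gene.strand == "+":
        downstream = last < sig_start and sig_end <= last + window
    else:
        downstream = sig_end < first and sig_start >= first - window
    if same_strand and downstream:
        return 2
    return 4  # in the genome but none of the above

toy_gene = Gene("+", [(100, 200), (300, 400)])
print(classify(150, 166, "+", toy_gene))  # overlaps an exon on the same strand
```

Only class 1 and class 2 signatures are used later in the talk to assign expression to gene models, so in practice the first and last branches matter most.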

Page 46

Core Arabidopsis MPSS libraries
sequenced by Lynx for Blake Meyers, U. of Delaware

Library    Signatures sequenced    Distinct signatures
Root       3,645,414               48,102
Shoot      2,885,229               53,396
Flower     1,791,460               37,754
Callus     1,963,474               40,903
Silique    2,018,785               38,503
TOTAL      12,304,362              133,377

Page 47

http://www.dbi.udel.edu/mpss

Query by
• Sequence
• Arabidopsis gene identifier
• chromosomal position
• BAC clone ID
• MPSS signature
• Library comparison

Site includes
• Library and tissue information
• FAQs and help pages

Page 48

Genome-wide MPSS profile in Arabidopsis

Of the 29,084 gene models, 17,849 match unambiguous, expressed class 1 and/or 2 signatures

[Figure: MPSS expression plotted along Chr. I, Chr. II, Chr. III, Chr. IV and Chr. V]

Page 49

Dataset of duplicate pairs

Gene families of size two in Arabidopsis classified as
• Dispersed (280)
• Segmental (149)
• Tandem (63)

For each pair
• Measure similarity/distance in expression profile
• Estimate Ks and Ka

Page 50

Expression distance

[Diagram: axes labelled library 1, library 2, library 3]
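The slide does not give the formula for expression distance, so the following is only one plausible reading: treat each gene's profile as a point with one coordinate per library, scale each profile to proportions so that overall expression level drops out, and take the Euclidean distance. The function name and the scaling choice are assumptions.

```python
import math

def expression_distance(profile_a, profile_b):
    """Euclidean distance between two profiles after scaling each to sum to 1.

    Assumed definition for illustration; the talk's exact metric is not stated.
    """
    def scale(profile):
        total = sum(profile)
        return [x / total for x in profile] if total else [0.0] * len(profile)

    a, b = scale(profile_a), scale(profile_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Same shape at different levels versus expression in different libraries.
same_shape = expression_distance([10, 20, 30], [1, 2, 3])
different_library = expression_distance([1, 0, 0], [0, 1, 0])
print(same_shape, different_library)
```

With this scaling, a pair in which one copy is simply expressed at a lower level everywhere has a distance near zero, while a pair expressed in different libraries has a large distance.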

Page 51

The number of genes with >5 ppm expression in a given number of libraries among the 984 genes in pairs analyzed and among all Arabidopsis genes with MPSS profiles.

Libraries    Genes in pairs    All genes
0            153 (15.5%)       4160 (23.3%)
1            124 (12.6%)       2643 (14.8%)
2            73 (7.4%)         1727 (9.6%)
3            93 (9.5%)         1777 (10.0%)
4            109 (11.1%)       1930 (10.8%)
5            432 (43.9%)       5612 (31.4%)
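The tabulation above amounts to counting, for each gene, the libraries in which expression exceeds 5 ppm, and then tallying genes by that count. A minimal sketch, with hypothetical function names and made-up profiles:

```python
from collections import Counter

THRESHOLD_PPM = 5  # cutoff quoted on the slide

def breadth(profile, threshold=THRESHOLD_PPM):
    """Number of libraries in which expression exceeds the threshold."""
    return sum(1 for ppm in profile if ppm > threshold)

def breadth_table(profiles):
    """Tally genes by the number of libraries in which they are expressed."""
    return Counter(breadth(profile) for profile in profiles.values())

# Made-up five-library profiles (ppm) for three hypothetical genes.
toy_profiles = {
    "gene_a": [71, 59, 11, 140, 94],
    "gene_b": [0, 0, 1, 8, 17],
    "gene_c": [0, 0, 0, 0, 1],
}
table = breadth_table(toy_profiles)
print(sorted(table.items()))
```

The comparison is strict, so a gene at exactly 5 ppm in a library is not counted as expressed there.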

Page 52

Asymmetry in levels of expression among libraries within pairs

Type of pair                Symmetry of divergence
                            A             B              C             D
Young
  Dispersed (Ks ≤ 0.5)      14 (15.7%)    61 (68.5%)     8 (9.0%)      6 (6.7%)
  Tandem (Ks ≤ 0.5)         8 (14.3%)     29 (51.8%)     10 (17.9%)    9 (16.1%)
Old
  Dispersed (Ks > 0.5)      35 (18.3%)    111 (58.1%)    24 (12.6%)    21 (11.0%)
Segmental (All)             31 (20.8%)    104 (69.8%)    7 (4.7%)      7 (4.7%)

A: Each copy has higher expression in at least one library
B: One copy has higher expression in all libraries that differ and at least two libraries differ
C: Copies differ in expression in only one library
D: Copies do not differ in expression in any libraries
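The four categories A to D can be written as a small decision rule. The sketch assumes the per-library comparison has already been made by some test that the slide does not specify, and that its outcome is coded as +1, -1 or 0; the function name and that encoding are hypothetical.

```python
def symmetry_class(differences):
    """Classify a duplicate pair from per-library comparison outcomes.

    `differences` holds one value per library: +1 if copy 1 is higher,
    -1 if copy 2 is higher, 0 if the copies do not differ.
    """
    copy1_higher = sum(1 for d in differences if d > 0)
    copy2_higher = sum(1 for d in differences if d < 0)
    if copy1_higher and copy2_higher:
        return "A"  # each copy is higher in at least one library
    if copy1_higher + copy2_higher >= 2:
        return "B"  # one copy is higher wherever they differ, in 2+ libraries
    if copy1_higher + copy2_higher == 1:
        return "C"  # the copies differ in only one library
    return "D"      # no library shows a difference

print(symmetry_class([+1, -1, 0, 0, 0]))
```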

Page 53

[Scatter plot: normalized distance (0-0.6) vs. nonsynonymous substitution; series D, S, T]

[Scatter plot: normalized distance (0-0.7) vs. synonymous substitution; series D, S, T]

dN = 0.48 + 0.37 KA, p < 0.0001

Page 54

[Scatter plot: breadth of expression (0-5) vs. nonsynonymous substitution; series D, S, T]

[Scatter plot: total expression (0-4000) vs. nonsynonymous substitution; series D, S, T]

Page 55

Pairs with small Ks but dissimilar expression profiles.

Ks      Ka       dup    gene pair    callus    flower    leaf    root    silique
0.03    <0.01    D      AT1G80700    71        59        11      140     94
                        AT1G80980    0         0         1       8       17
0.17    0.05     T      AT2G46280    246       210       160     308     80
                        AT2G46290    28        29        1       29      16
0.20    0.06     T      AT2G15400    4         14        5       5       34
                        AT2G15430    42        128       14      136     18
0.22    0.05     D      AT1G36280    1         3         9       13      10
                        AT4G18440    40        87        69      69      51
0.26    0.05     T      AT1G71270    88        56        44      52      107
                        AT1G71300    0         0         0       0       1
0.27    0.07     T      AT3G13290    20        22        1       1       6
                        AT3G13300    246       245       72      192     77
0.27    0.10     T      AT1G29390    18        238       89      8       165
                        AT1G29395    0         63        5       0       36
0.27    0.06     T      AT3G26070    16        169       346     0       524
                        AT3G26080    349       13        41      4       135
0.28    0.13     D      AT3G56190    216       115       144     239     56
                        AT3G56450    15        0         6       4       1

Page 56

Pairs with large Ks but similar expression profiles.

Ks      Ka      dup    gene pair    callus    flower    leaf    root    silique
0.87    0.28    T      AT3G16220    16        10        57      3       19
                       AT3G16230    21        12        35      13      13
0.89    0.13    D      AT3G03660    14        0         0       0       0
                       AT5G17810    71        0         0       0       0
0.95    0.29    D      AT2G41180    57        14        78      4       29
                       AT3G56710    75        15        39      3       14
0.97    0.28    D      AT1G31814    2         39        4       3       0
                       AT5G16320    0         55        10      19      8
0.98    0.23    D      AT5G07230    0         344       0       0       0
                       AT5G62080    0         288       0       0       0
0.99    0.26    D      AT3G22160    86        6         10      4       4
                       AT4G15120    34        2         0       0       0

Page 57

A closing thought

1965
• The Ecological Theater and the Evolutionary Play, G. E. Hutchinson

2004
• The Chromosomal Theater and the Gene Family Play

Phylogenetics has a great deal to contribute to understanding the evolutionary interplay of genome structure and function

Page 58

Dan Brown

Brandon Gaut

Steven Tanksley

Liqing Zhang

Jason Phillips

Dihui Lu

David Remington

Jason Reed

Tom Guilfoyle

Blake Meyers

NSF