1
LUND UNIVERSITY
Comparative Genomics in Basidiomycetes: Analyzing multigene families
Balaji Rajashekar
Anders Tunlid
Dag Ahrén
Jason Stajich
2
Basidiomycete genome data
Species                      Protein coding genes  Genome size (Mb)
Laccaria bicolor             20,614                64.9
Coprinopsis cinerea          13,544                36.25-37.5
Phanerochaete chrysosporium  10,048                35.1
Cryptococcus neoformans      7,302                 19.5
Ustilago maydis              6,522                 19.7
Total                        58,030
3
Sequence similarity & clustering
• BLASTP
[Figure: diagram of ten example genes, Gene 1 to Gene 10]
4
TribeMCL (Enright et al. NAR 2002)
TribeMCL animation
• BLASTP: All against all for the basidiomycete genomes
• 58,000 versus 58,000 proteins
• Split generated network into families
• Data and settings dependent
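The step of splitting a similarity network into families can be illustrated with a minimal Markov Cluster (MCL) loop. This is only a sketch of the general MCL idea (expansion followed by inflation), not the TribeMCL implementation itself. The function name `mcl`, the self-loop rule, the `inflation` default and the toy 0/1 network are all assumptions made for illustration; in practice the edge weights would come from the BLASTP results.

```python
import numpy as np

def mcl(adjacency, inflation=2.0, max_iter=100, eps=1e-6):
    """Basic Markov Cluster loop; returns clusters as sorted lists of node indices."""
    m = np.asarray(adjacency, dtype=float).copy()
    # Assumption: self-loops are set to the column maximum (1.0 for isolated nodes).
    col_max = m.max(axis=0)
    np.fill_diagonal(m, np.where(col_max > 0, col_max, 1.0))
    m /= m.sum(axis=0, keepdims=True)            # make every column sum to 1
    for _ in range(max_iter):
        nxt = np.linalg.matrix_power(m, 2)       # expansion: two-step random walk
        nxt = nxt ** inflation                   # inflation: strengthen strong edges
        nxt /= nxt.sum(axis=0, keepdims=True)    # re-normalise the columns
        converged = np.allclose(nxt, m, atol=eps)
        m = nxt
        if converged:
            break
    clusters = set()
    for i in range(m.shape[0]):
        if m[i, i] > eps:                        # row i belongs to an attractor
            clusters.add(tuple(int(j) for j in np.flatnonzero(m[i] > eps)))
    return sorted(list(c) for c in clusters)

# Hypothetical network: proteins 0-2 hit each other, proteins 3-4 hit each other.
toy_network = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
])
families = mcl(toy_network)
print(families)
```

The inflation value controls how finely the network is split, which is one reason the resulting families depend on both the data and the settings.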
5
Gene family distribution
                      Laccaria  Coprinopsis  Phanerochaete  Cryptococcus  Ustilago
Families present      5947      5148         4126           3056          2583
Families not present  1405      2204         3226           4296          4769
Total                 7352      7352         7352           7352          7352
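As a small worked check of the table, the "not present" row is the total number of families minus the families present in each species. The sketch below recomputes that row and adds the fraction of families present; the function name `presence_summary` is an illustrative choice, and the numbers are the ones listed on the slide.

```python
# Families present per species, as listed on the slide.
present = {
    "Laccaria": 5947,
    "Coprinopsis": 5148,
    "Phanerochaete": 4126,
    "Cryptococcus": 3056,
    "Ustilago": 2583,
}
TOTAL_FAMILIES = 7352

def presence_summary(present_counts, total):
    """Return {species: (present, not_present, fraction_present)}."""
    return {sp: (n, total - n, n / total) for sp, n in present_counts.items()}

summary = presence_summary(present, TOTAL_FAMILIES)
for species, (n_present, n_absent, fraction) in summary.items():
    print(f"{species:<14}{n_present:>6}{n_absent:>6}{fraction:>8.1%}")
```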
6
Global view of proteins vs genome size
7
Gene family size distribution
8
Statistical analyses of gene families
CAFE (De Bie et al., Bioinformatics 2006)
• Models the evolution of gene family sizes
• Takes phylogeny into account
• Calculates birth and death of genes in all nodes
• Identifies families with accelerated gene gain/loss, including extinction
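The birth-and-death idea behind such models can be sketched with a small simulation. This is not CAFE's code or its likelihood calculation; it is a minimal illustration that assumes each gene copy is duplicated or lost at the same per-copy rate along a single branch. The function name `simulate_family_size` and the values for `rate` and `time` are hypothetical.

```python
import random

def simulate_family_size(n0, rate, time, rng):
    """Simulate one gene family along a branch of length `time`.

    Assumption: every gene copy is gained or lost at the same per-copy rate.
    A family that reaches size 0 is extinct and stays at 0.
    """
    n, t = n0, 0.0
    while n > 0:
        # Waiting time to the next event; the total event rate is 2 * rate * n.
        t += rng.expovariate(2.0 * rate * n)
        if t > time:
            break
        n += 1 if rng.random() < 0.5 else -1   # birth or death, equally likely
    return n

rng = random.Random(42)
# Hypothetical values: 10 starting copies, rate 0.002 per copy per Myr, 100 Myr branch.
sizes = [simulate_family_size(10, 0.002, 100.0, rng) for _ in range(2000)]
print("mean size:", sum(sizes) / len(sizes))
print("extinct families:", sum(s == 0 for s in sizes))
```

Under this equal-rate assumption the expected family size stays at its starting value, so a family whose observed size deviates strongly from that expectation is a candidate for accelerated gain or loss.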
9
Gene family expansions/contractions
Branch  Divergence time (MYA)  Expansions  No change  Contractions  Average expansion
1       246                    109         5248       26            0.036
2       167                    426         4873       84            0.178
3       57                     393         4855       135           0.130
4       84                     1064        3844       475           0.695
5       84                     459         4111       813           0.056
6       140                    371         3291       1721          -0.169
7       308                    307         2272       2804          -0.519
8       554                    96          2043       3244          -0.655
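A per-branch summary of this kind can be derived from family sizes at the parent and child node of a branch. The sketch below assumes that "average expansion" means the mean change in size per family, which the slide does not state explicitly. The function name `summarize_branch` and the toy sizes are hypothetical.

```python
def summarize_branch(parent_sizes, child_sizes):
    """Count expanded, unchanged and contracted families along one branch.

    Assumption: "average expansion" is the mean size change per family.
    """
    changes = [child - parent for parent, child in zip(parent_sizes, child_sizes)]
    return {
        "expansions": sum(d > 0 for d in changes),
        "no_change": sum(d == 0 for d in changes),
        "contractions": sum(d < 0 for d in changes),
        "average_expansion": sum(changes) / len(changes),
    }

# Hypothetical sizes of five families at the parent and child node of a branch.
parent = [3, 5, 2, 8, 1]
child = [4, 5, 2, 6, 1]
branch_summary = summarize_branch(parent, child)
print(branch_summary)
```

In each row of the table the three counts add up to 5383, the number of families analysed by CAFE.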
10
Protein families in Laccaria
5383 Protein families analysed by CAFE
1969 Unique protein families
7352 Protein families in total
11
Examples of families with >25 Laccaria proteins
Protein family  Lac  Copr  Phae  Cryp  Ust  Pfam accession    Pfam description
Significantly expanded
1*              216  97    91    75    74   PF00400           WD domain, G-beta repeat
2*              150  113   109   86    74   PF00069, PF07714  Protein kinase domain, Protein tyrosine kinase
22              102  13    2     1     0
Unique
5               206  0     0     0     0    PF00931, PF05729  NB-ARC domain, NACHT domain
17*             128  0     0     0     0
64              56   0     0     0     0
12
Identification of significant families
13
PCA of expression data
[Figure: two PCA panels, each plotted as Axis 1 versus Axis 2.
Panel "PCA case scores": expression samples labelled MycPiv, MycPgh, MycD_I, MycD_II, FBE, FBL, K, E, K2MS238N_IMS238N_II.
Panel "PCA variable loadings": individual gene models, with labels such as eu2.Lbscf0030g00660 and e_gww1.12.148.1.
Group labels: Mycelia, Mycorrhiza, Fruiting bodies.
Legend: Protein family 211 experiments]
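Case scores and variable loadings of the kind shown in these panels can be computed with a standard PCA. The sketch below uses a singular value decomposition of the column-centred data; it is a generic illustration, not the software or settings behind the slide. The function name `pca` and the random matrix standing in for expression data (11 hypothetical samples by 20 hypothetical genes) are assumptions.

```python
import numpy as np

def pca(data, n_components=2):
    """PCA via SVD: rows are samples (cases), columns are genes (variables)."""
    x = np.asarray(data, dtype=float)
    x = x - x.mean(axis=0)                        # centre each gene
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]    # case scores
    loadings = vt[:n_components].T                      # variable loadings
    explained = (s ** 2 / np.sum(s ** 2))[:n_components]
    return scores, loadings, explained

# Hypothetical data: 11 samples by 20 genes drawn at random.
rng = np.random.default_rng(0)
expression = rng.normal(size=(11, 20))
scores, loadings, explained = pca(expression)
print("scores shape:", scores.shape)
print("loadings shape:", loadings.shape)
print("variance explained by axes 1 and 2:", explained)
```

Samples with similar expression profiles end up close together in the score plot, while the loading plot shows which genes drive each axis.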
15
Identification of significant families