
Page 1

3/4/2013

Bioinformatics and Evolutionary Genomics

Evolution of Genomes, Proteomes, Networks and Complexes

Berend Snel Associate Professor

Theoretical Biology and Bioinformatics Department of Biology

Science Faculty Utrecht University

Page 2

Today

• Lecture on homology and domains
• Introduction to the general aims of the course and to procedural stuff

Page 3

Requests
• Very heterogeneous group with respect to previous knowledge (IBMB, GB, research projects, PhD students)

• PLEASE: interrupt / ask questions when I am going too fast, when I use jargon, when I make jumps/conclusions that seem obvious and 100% logical to me but erratic to you; please point out my implicit assumptions about what everybody knows

• Computer exercises: more experienced people, please help others

• And also apologies for some redundancy

Page 4

The (human) genome: why does it look the way it does?

Page 5

Why? To do stuff! Molecular biology and systems biology

Page 6

But: the design logic is not so obvious; it is the result of an evolutionary process.

And the classic two types of "why"

Page 7

Gene Content

bag of genes

Why does the genome contain the genes that it does?

Page 8

Gene loss, gene duplication and gene invention shaped these phyletic patterns (and thus our genome)

Page 9

[Figure: genome tree of C. pneumoniae, C. trachomatis, M. tuberculosis, M. pneumoniae, M. genitalium, B. subtilis, T. pallidum, T. maritima, B. burgdorferi, P. horikoshii, M. thermoautotrophicum, A. fulgidus, M. jannaschii, S. cerevisiae, C. elegans, A. aeolicus, E. coli, H. influenzae, R. prowazekii, H. pylori 26695, Synechocystis sp., H. pylori J99 and A. pernix; clades labelled Proteobacteria, Eukarya, Euryarchaeota and Low G+C Gram-Positive Bacteria; scale bar 0.1; support values omitted]

Snel, Bork & Huynen, Nature Genet 1999; Huynen, Snel & Bork, Science 1999

Page 10

Fritz-Laylin et al., Cell 2010

Page 11

… however

Fokkens and Snel, PLoS Comp Biol 2009

Page 12

Mediator

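• Essential for transcription
• Associated with the general transcription machinery
• Bridge between gene-specific transcription factors and RNA polymerase II
• 25 subunits
• Four submodules (head, middle, tail, CDK)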

Page 13

[Figure: rooted eukaryotic species tree of S.po, Y.lip, S.cer, K.lac, C.alb, D.han, A.fum, A.nid, G.zea, N.cra, U.may, C.neo, P.chr, L.bic, R.ory, E.cun, D.mel, H.sap, M.mus, C.ele, E.hist, D.dis, C.mer, A.thal, O.tau, C.rei, T.the, Cryp, Thei, P.falc, Phyt, T.pse, P.tri, Giard, N.gru, L.maj and Tryp; supergroups labelled Animals, Amoebozoa, Fungi, Excavata, Chromalveolates* and Archaeplastids (?)]

Page 14

[Table: presence (x) of Mediator subunits across the species of the eukaryotic tree, which is shown alongside. Columns: species grouped into divisions (Fungi: Ascomycota, Basidiomycota; Animals; Amoebozoa; Archaeplastids; Chromalveolates; Excavata). Rows: subunits grouped by submodule: Tail (Med15, Med16, Med14, Med3, Med2), Middle (Med10, Med1, Med4, Med7, Med9, Med5, Med31, Med21), Head (Med11, Med6, Med20, Med18, Med17, Med22, Med8, Med19), CDK (Cdk8, CycC, Med13, Med12) and unknown (Med23, Med24, Med25, Med26, Med27, Med28, Med29, Med30)]

Page 15

many partial losses, few gains
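Statements like "many partial losses, few gains" come from mapping presence/absence patterns onto a species tree. A minimal sketch, assuming Dollo parsimony (each subunit is gained once, at the last common ancestor of all species that have it, and can afterwards only be lost); the nested-tuple tree and the species codes are hypothetical toy data, not the Mediator data set:

```python
from typing import Union

# Hypothetical representation: a rooted tree as nested tuples, leaves are species codes.
Tree = Union[str, tuple]


def leaves(tree: Tree) -> set:
    """All species below (and including) this node."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def gain_node(tree: Tree, present: set) -> Tree:
    """Smallest clade containing every species that has the gene (the single Dollo gain)."""
    if isinstance(tree, str):
        return tree
    for child in tree:
        if present <= leaves(child):
            return gain_node(child, present)
    return tree


def count_losses(tree: Tree, present: set) -> int:
    """Minimum number of losses below a node at which the gene is assumed present."""
    if not (leaves(tree) & present):
        return 1  # whole clade lacks the gene: one loss on the branch leading to it
    if isinstance(tree, str):
        return 0
    return sum(count_losses(child, present) for child in tree)


def dollo_losses(tree: Tree, present: set) -> int:
    if not present:
        raise ValueError("gene absent from all species")
    return count_losses(gain_node(tree, present), present)


toy_tree = (((("Hsap", "Mmus"), "Dmel"), ("Scer", "Spom")), ("Athal", "Crei"))
print(dollo_losses(toy_tree, {"Hsap", "Mmus", "Scer", "Athal"}))  # → 3 (Dmel, Spom, Crei)
print(dollo_losses(toy_tree, {"Hsap", "Mmus", "Dmel"}))           # → 0 (animal-specific gain)
```

Dollo is a deliberate simplification: it forbids regaining a subunit, so any patchy pattern is explained by losses, which fits the "few gains" picture but would mislead for genes spread by LGT.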

Page 16

Presence of genes across genomes: orthology

bags of genes

Why does the genome contain the genes that it does?

Page 17

Bag of genes: what about the homologs within a genome (paralogs, duplicates)?

Page 18

Initiating Anaphase

[Figure: the Anaphase Promoting Complex/Cyclosome (APC/C) attaches ubiquitin (Ub) chains to Cyclin B (bound to Cdk1) and to Securin]

Page 19

The Mitotic Checkpoint

[Figure: the mitotic checkpoint blocks (X) this step, so Cdk1/Cyclin B and Securin are not targeted]

Page 20

The Mitotic Checkpoint Complex (MCC)

[Figure: the mitotic checkpoint at the kinetochore (Mps1, AurB, Bub1) and the MCC it produces (BubR1, Mad2, Cdc20)]

Page 21

[Figure: domain architectures (Kinase, TPR, KEN, BUB3-binding, CDI) of ScBub1 and HsBub1 (fungi & vertebrates), hsBubR1 (vertebrates) and scMad3p (fungi)]

Page 22

9 independent duplications: in 7 cases a Mad3-like and a Bub1-like protein arose from a BubMad-like ancestor. Parallel or convergent evolution?

Page 23

What about the kinase domain in human BubR1?

Degeneration of motifs essential for catalysis

Page 24

• The preserved ‘catalytic’ residues are essential for BUBR1 conformational stability in vitro and in cells. Uncoupling potential enzymatic activity from structural stability shows that catalysis is dispensable for BUBR1 function in mitosis.

• This suggests that BUBR1 is an atypical pseudokinase.

Page 25

What about the kinase domain in human (and fly) BubR1?

Page 26

What about all these BubMad containing species?

What about the ancestor?

Subfunctionalization? (cf. TOR, but with convergent evolution of domain architecture)

Page 27

What about all these BubMad-containing species? What about the ancestor? Experimentally testing subfunctionalization

An hsBubMad protein assembled from BubR1 residues 1-714 and Bub1 residues 734-1085 is able to functionally replace both Bub1 and BubR1 in human cells!

Page 28

This course
• I want to study the evolution of genomes, pathways and networks; that is why I study gene/protein evolution
• At the end, to be more or less able to replicate e.g. the bubmad and Mediator analyses
• Understanding that many bioinformatic challenges are a mix of conceptual and technical problems (e.g. why orthology is such an incredibly persistent problem)
• “What you should ~know” in order to do this kind of research
• Topics are interrelated: e.g. orthology already comes up in the homology lecture but is properly explained a day later; e.g. trees can be used to time a duplication to eukaryogenesis, but eukaryogenesis is properly discussed in its own lecture

Page 29

Homology (& domains)
• Absolute basis of any comparative analysis; affects MSA and trees; detection is still being improved
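Homology is inferred from significant sequence similarity. As an illustration of the kind of scoring that underlies many detection tools, a minimal Smith-Waterman local alignment sketch; the match, mismatch and gap values are arbitrary assumptions (real searches use substitution matrices such as BLOSUM62, affine gap penalties and E-values), so a raw score here says nothing about significance:

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of sequences a and b, with a linear gap penalty."""
    prev = [0] * (len(b) + 1)  # row i-1 of the dynamic-programming matrix H
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)  # row i; curr[0] stays 0
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap          # a[i-1] aligned to a gap
            left = curr[j - 1] + gap    # b[j-1] aligned to a gap
            curr[j] = max(0, diag, up, left)  # 0: start a fresh local alignment here
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman("ACGT", "TTACGTTT"))  # → 8: four matches in the local core "ACGT"
print(smith_waterman("AAAA", "TTTT"))      # → 0: no positive-scoring local alignment
```

Keeping only two rows gives the score in O(len(b)) memory; recovering the alignment itself would require keeping the full matrix for traceback.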

Page 30

Gene Phylogeny & Orthology

• How do we get such trees, and how do we interpret them?

• Trees reveal some of the most important genome evolution processes: LGT, duplication, loss
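One simple way a gene tree reveals duplications is the species-overlap rule: an internal node is labelled a duplication if its two child subtrees share at least one species, and a speciation otherwise. A minimal sketch, assuming a rooted, binary gene tree whose leaves follow a hypothetical "SPECIES_gene" naming convention; this is one heuristic among several (reconciliation against a species tree is the more rigorous alternative) and it ignores LGT and loss:

```python
from typing import Union

# Hypothetical format: leaf = "SPECIES_gene", internal node = (left, right).
GeneTree = Union[str, tuple]


def species_of(leaf: str) -> str:
    return leaf.split("_", 1)[0]  # assumed convention: species code before the first "_"


def label_events(tree: GeneTree, events: list) -> set:
    """Post-order walk: record 'duplication' or 'speciation' per internal node; return species below."""
    if isinstance(tree, str):
        return {species_of(tree)}
    left, right = tree
    sp_left = label_events(left, events)
    sp_right = label_events(right, events)
    # Species-overlap rule: a species present on both sides implies a duplication.
    events.append("duplication" if sp_left & sp_right else "speciation")
    return sp_left | sp_right


events: list = []
label_events((("Hsap_X1", "Hsap_X2"), ("Scer_X1", "Scer_X2")), events)
print(events)  # → ['duplication', 'duplication', 'speciation']
```

Two duplication nodes below a speciation node is the signature of independent, lineage-specific duplications; the alternative topology ((Hsap_X1, Scer_X1), (Hsap_X2, Scer_X2)) would instead place a single duplication before the species split.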

Page 31

(Eukaryotic) tree of life
• New genomes, cool/exotic animals/protists
• Berend / Eelco / John / Like / Leny sitting behind his/her computer and thinking: should I include this genome? How should I interpret an absence? Which source species could I best use for homology-based gene prediction?

• Crucial when interpreting gene trees: knowing it by heart >>> having to look it up

• With regard to the evolutionary cell biology of signaling (kinases, small GTPases, etc.), the diversity in present-day genomes is staggering and dwarfs e.g. the human-fruit fly difference

Page 32

Large scale orthology

• Needed to move beyond anecdotes, but difficult to get
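A common first approximation for large-scale orthology is bidirectional best hits (BBH): gene a in genome A and gene b in genome B are called orthologs if each is the other's highest-scoring hit. A minimal sketch, assuming precomputed similarity scores (e.g. BLAST bit scores) stored in dicts keyed by (query, target); the gene names and scores are hypothetical:

```python
def best_hits(scores: dict) -> dict:
    """Map each query gene to its highest-scoring target (first one seen on ties)."""
    best: dict = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def bidirectional_best_hits(a_vs_b: dict, b_vs_a: dict) -> set:
    """Gene pairs (a, b) where b is a's best hit in B and a is b's best hit in A."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return {(a, b) for a, b in forward.items() if reverse.get(b) == a}


# Hypothetical scores: hs2 is a human in-paralog of hs1.
human_vs_yeast = {("hs1", "sc1"): 250, ("hs1", "sc2"): 90, ("hs2", "sc1"): 240, ("hs2", "sc2"): 60}
yeast_vs_human = {("sc1", "hs1"): 248, ("sc1", "hs2"): 239, ("sc2", "hs1"): 88, ("sc2", "hs2"): 61}
print(bidirectional_best_hits(human_vs_yeast, yeast_vs_human))  # → {('hs1', 'sc1')}
```

hs2 ends up with no ortholog at all: BBH only yields one-to-one pairs, so after duplications it silently drops co-orthologs, which is part of why large-scale orthology is difficult to get.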

Page 33

[Figure: gene tree of GAP families: RapGAP (animals (LSE), fungi, dicty), RalGAPB (oomycetes, dicty, Naegleria, fungi, animals), RalGAPA (dicty, Naegleria, fungi, animals) and RheBGAP (TSC2; oomycetes, diatoms, red algae, animals, fungi, dicty, Tetrahymena), with Phytophthora sojae 142624 (PHYSOJ14061) and Phytophthora infestans PITG 15173 (PHYINF15173) sequences; support values omitted]

Eukaryogenesis / LECA

• A biological topic, eukaryogenesis/LECA, about which these types of analyses are telling us a lot. But it also affects a lot of what we do: we see it in gene trees, and it affects obtaining orthologous groups across eukaryotes.

Page 34

Gene content evolution

• Fundamental level of genome evolution
• Gene invention: inability to detect homologs vs. a real lack of homologs; no detectable homologs does not simply mean a novel gene
• Evolutionary modules?
• Trying to move to large scale, but remember the pitfalls
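The "evolutionary modules" question is often approached with phyletic profiles: genes that are gained and lost together across genomes may work together. A minimal sketch, assuming presence/absence profiles encoded as 0/1 strings over a fixed genome order and using plain Hamming distance (one choice among several; it ignores the phylogeny, so closely related genomes count as independent evidence, which is one of the pitfalls). Gene names and profiles are hypothetical:

```python
from itertools import combinations


def hamming(p: str, q: str) -> int:
    """Number of genomes in which two presence(1)/absence(0) profiles disagree."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same genomes, in the same order")
    return sum(x != y for x, y in zip(p, q))


def candidate_modules(profiles: dict, max_dist: int = 0) -> list:
    """Gene pairs whose phyletic profiles differ in at most max_dist genomes."""
    return [
        (g1, g2)
        for g1, g2 in combinations(sorted(profiles), 2)
        if hamming(profiles[g1], profiles[g2]) <= max_dist
    ]


# Hypothetical profiles over six genomes (same genome order in every string).
profiles = {"geneA": "111000", "geneB": "111000", "geneC": "110001", "geneD": "000111"}
print(candidate_modules(profiles))              # → [('geneA', 'geneB')]
print(candidate_modules(profiles, max_dist=2))  # → [('geneA', 'geneB'), ('geneA', 'geneC'), ('geneB', 'geneC')]
```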

Page 35

Whole Genome Duplication (WGD)

• Like LECA, WGD is important biology that needs bioinformatics to be studied, but it also affects our data

• It is also a welcome source of information for our analyses (Lidija, bubmad): an independent and reliable reconstruction of part of the history of genes

Page 36

Using HTP data to study evolution of networks / complexes

• Is the number of conserved interactions between e.g. yeast and human 10% or 95%?

• On top of all the genome analysis pitfalls also all the HTP data pitfalls …

• Duplicates vs orthologs
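The yeast-human question can be made concrete by transferring interactions via orthologs and asking how many reappear in the other species' data. A minimal sketch, assuming one-to-one orthologs only and interaction lists taken at face value; with real HTP data, incomplete coverage, false positives and the choice of how to treat duplicates can shift the answer dramatically. Protein names are hypothetical:

```python
def conserved_fraction(ints_a: list, ints_b: list, orthologs: dict) -> float:
    """Fraction of mappable species-A interactions whose orthologs also interact in species B.

    An interaction is 'mappable' when both partners have an ortholog in B.
    Interactions are treated as undirected; repeated pairs are counted once.
    """
    edges_b = {tuple(sorted(pair)) for pair in ints_b}
    edges_a = {tuple(sorted(pair)) for pair in ints_a}
    mappable = conserved = 0
    for x, y in edges_a:
        if x in orthologs and y in orthologs:
            mappable += 1
            if tuple(sorted((orthologs[x], orthologs[y]))) in edges_b:
                conserved += 1
    return conserved / mappable if mappable else 0.0


yeast = [("sA", "sB"), ("sB", "sC"), ("sC", "sD")]
human = [("hB", "hA")]
one_to_one = {"sA": "hA", "sB": "hB", "sC": "hC"}  # sD has no ortholog
print(conserved_fraction(yeast, human, one_to_one))  # → 0.5
```

Restricting the denominator to mappable interactions separates "not conserved" from "cannot be tested", one of the choices that moves the number between very different values.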

Page 37

Techniques AND biology

• Detective/forensics vs concepts; large-scale biology vs small-scale biology; bioinformatics biology vs data/technique problems

• A lot like a police investigation … and less like Nobel-prize-winning physics …

• Anything goes in genome evolution; many processes are often entangled (e.g. google "subneofunctionalization")

Page 38

Literature discussion
• You should have read the papers before the day of the discussion
• On the day itself, the group splits into critique and defense (not for WGD)
• Groups prepare their defense and critique
• Discussion (critique starts, because we have all read the paper):
– Critique outlines why the paper is weak
– break
– Defense responds to the critique
– break
– Critique gives final comments, to which an immediate response can be made
– Defense gives final comments, to which an immediate response can be made

Page 39

Computer Exercises

• Computer exercises for some topics; many others are more difficult (e.g. evolution of interaction networks based on HTP analysis)

• Previous years: too much cookbook. This is an attempt at less cookbook and more playing around → you should learn more, but it is slower

• Ask fellow students for help
• Ties strongly into the mini-projects

Page 40

Mini projects
• A protein
• What do Swiss-Prot / SGD already know about your protein, function-wise (GO, pathway)?
• Protein topology (domains, disorder, TM?, motifs?)
• "Most diverged" sequence in the same genome ... that is still a homolog
• Homologs across the tree of life
• Point of invention of the family? If a bacterial invention, mitochondrial or archaeal route?
• Does it have WGD duplicates?
• Tree of relevant sequences in diverse genomes
• Orthologs in relevant genomes
• (Normally the relevant genomes would be a few metazoa, fungi, other opisthokonts, amoebozoa, stramenopiles, alveolates, plantae and excavates; see e.g. bubmad, but it depends on your protein)
