bioinformatics and evolutionary genomics · r. prowazekii h. pylori . 26695. synechocystis sp. h....

Post on 08-Jun-2020

0 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

3/4/2013 1

Bioinformatics and Evolutionary Genomics

Evolution of Genomes, Proteomes, Networks and Complexes

Berend Snel Associate Professor

Theoretical Biology and Bioinformatics Department of Biology

Science Faculty Utrecht University

Today

• Lecture on homology and domains • Introduction on general aims of the course

and on procedural stuff

Requests • very heterogeneous with respect to previous

knowledge (IBMB, GB, research projects, PhD students)

• PLEASE: interrupt / ask questions when I am going to fast, when I use jargon, when I make jumps/conclusions that to me seem obvious 100% logical, but to your are erratic; please point out my implicit assumptions regarding what everybody knows

• Computer exercises: more experienced people help

• And also apologies for some redundancy

The (human) genome: why does it look the way it does?

Why ? To do stuff! molecular biology and systems biology

But. Design logic is not so obvious: it is the result of an evolutionary process.

And classic two types of why

Gene Content

bag of genes

Why does the genome contain the genes that it does

Gene loss, gene duplication and gene invention shaped these phyletic patterns (and thus our

genome)

C. pneumoniae

C. trachomatis

M. tuberculosis M. pneumoniae M. genitalium

B. subtilis

T. pallidum

T. maritima

B. burgdorferi

P. horikoshii

M. thermoautotrophicum

A. fulgidus

M. jannaschii

S. cerevisiae C. elegans

A. aeolicus

E. coli

H. influenzae R. prowazekii

H. pylori 26695

Synechocystis sp.

H. pylori J99

A. pernix 100 100

100 100

100

100 100

100

100 98

93 89

69

88

0.1

97

Proteobacteria

Eukarya

Euryarchaeota

Low G+C Gram-Positive Bacteria

100

Snel Bork Huynen Nature Genet 1999 Huynen Snel Bork Science 1999

Fritz-Laylin et al. cell 2010

… however

Fokkens and Snel PLoS Comp Biol 2009

Mediator

•Essential for transcription •Associated with general transcription machinery •Bridge •25 subunits •Four submodules

S.po

Y.lip

S.cer K.lacC.alb

D.han

A.fum A.nidG.zea N.cra

U.mayC.neo

P.chr

L.bic R.ory

E.cun

D.mel

H.sap M.musC.ele

E.hist D.dis

root

C.merA.thal O.tau

C.rei

T.theCryp

Thei

P.falcPhyt

T.pseP.tri

GiardN.gruL.maj

Tryp

Animals Amoebozoa

Fungi Excavata Chrom- Alveolates*

Archaeplastids ?

divisions Fungi Animals Chromalveloates ExcevataAscomycota Basidomycota

subm

odul

e

Med

iato

r su

buni

t

spec

ies

S.ce

r K.

lac

D.ha

n

C.a

lb

Y.lip

A.ni

g A.

fum

N.c

ra G

.zea

S.po

U.m

ay

C.n

eo

P.ch

r

L.bi

c

R.o

ry

E.cu

n

H.s

ap

M.m

us

D.m

el

C.el

e

D.di

s

E.h

ist

A.th

al

O.ta

u

C.re

i

C.m

er

P.fa

lc

Cryp

Thei

T.th

e

Phyt

T.ps

e

P.tri

N.gr

u

L.m

aj T

ryp

Gia

rd

Med15 x x x x x x x x x x x x x x x x x x x x x xMed16 x x x x x x x x x x xMed14 x x x x x x x x x x x x x x x x x x x x x x xMed3 xMed2 x

Med10 x x x x x x x x x x x x x x x x x x x x x x x x xMed1 x x x x x x x x x x x x x xMed4 x x x x x x x x x x x x x x x x x x x xMed7 x x x x x x x x x x x x x x x x x x x x x x x x x x x xMed9 x x x x x x x x x x xMed5 x x x x x x x x x x xMed31 x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x xMed21 x x x x x x x x x x x x x x x x x x x x x x x x

Med11 x x x x x x x x x x x x x x xMed6 x x x x x x x x x x x x x x x x x x x x x x x x x x x x xMed20 x x x x x x x x x x x x x x x x xMed18 x x x x x x x x x x x x x x x x xMed17 x x x x x x x x x x x x x x x x x x x xMed22 x x x x x x x x x x x x x x x x x x x x x xMed8 x x x x x x x x x x x x x x x x x x x x xMed19 x x x x x x x x x x x x x

Cdk8 x x x x x x x x x x x x x x x x x x x x x xCycC x x x x x x x x x x x x x x x x x x x x xMed13 x x x x x x x x x x x x x x x x x x x x xMed12 x x x x x x x x x x x x x x x x x x x x x

Med23 x x x x x x x xMed24 x x x xMed25 x x x x xMed26 x x xMed27 x x x x x x x x x x x x xMed28 x x x x x xMed29 x x x xMed30 x x x

Amoe- bozoa

Archeaplas-tids

CDK

unkn

own

Tail

Mid

dle

Head

S.po

Y.lip

S.cer K.lacC.alb

D.han

A.fum A.nidG.zea N.cra

U.mayC.neo

P.chr

L.bic R.ory

E.cun

D.mel

H.sap M.musC.ele

E.hist D.dis

root

C.merA.thal O.tau

C.rei

T.theCryp

Thei

P.falcPhyt

T.pseP.tri

GiardN.gruL.maj

Tryp

many partial losses, few gains

Presence of genes across genomes: orthology

bags of genes

Why does the genome contain the genes that it does

Bag of genes, what about the homologs within a genome? (paralogs, duplicates)

Ub

Cdk1

Cdk1

Cyclin B Securin

Ub Ub

Ub

Ub Ub

Ub Ub

Anaphase Promoting Complex/Cyclosome (APC/C)

Initiating Anaphase

Mitotic Checkpoint

Cdk1

Cyclin B Securin

The Mitotic Checkpoint

X

The Mitotic Checkpoint Complex (MCC)

BubR1

Mad2

Cdc20

Mps1

Mitotic Checkpoint

BubR1

Mad2

Cdc20

MCC

Kinetochore

AurB Bub1

Kinase TPR

KEN BUB3-binding

ScBub1 HsBub1

(fungi & vertebrates)

TPR

KEN BUB3-binding

Kinase TPR

BUB3-binding CDI

hsBubR1 (vertebrates)

scMad3p (fungi)

9 independent duplications. 7 cases where a mad3-like and a bub1-like protein arose out of a bubmad-like ancestor. Parallel or convergent evolution?

What about the kinase domain in human bubr1?

degeneration of motifs essential for catalysis

• the preserved ‘catalytic’ residues are essential for BUBR1 conformational stability in vitro and in cells. Uncoupling potential enzymatic activity from structural stability shows that catalysis is dispensable for BUBR1 function in mitosis.

• Suggest that BUBR1 is an atypical pseudokinase.

What about the kinase domain in human (and fly)

What about all these BubMad containing species?

What about the ancestor?

? Subfunctionalization? (cf. TOR but with convergent evolution of domain architecture)

What about all these bubmad containing species? What about the ancestor?: experimentally testing

subfunctionalization

Assembled hsBubMad protein from 1-714 bubr1 & bub1 734-1085 is able to functionally replace both bub1 and bubr1 in human cells !

This course • I want study evolution of genomes pathways and

networks, so that is why I study gene/protein evolution • At the end to be more or less able to replicate e.g.

bubmad, mediator • Understanding that many bioinformatic challenges are

a mix of conceptual and technical problems (e.g. why orthology is such an incredibly persistent problem)

• “what you should ~know” in order to this kind of research

• Topics are interrelated – e.g. orthology already in homology lecture but proper

explanation a day after – e.g. that trees can be used to time a duplication to

eukaryogenesis but proper discussion of eukaryogenis has its own lecture

Homology (& domains) • Absolute basis of any comparative analysis,

affects MSA and trees, detection still being improved,

Gene Phylogeny & Orthology

• How do we get such trees and how do we interpret them

• Trees reveal some of the most important genome evolution processes: LGT, duplication, loss,

(Eukaryotic) tree of life • New genomes, cool/exotic animals / protists • Berend / Eelco / John / Like / Leny sitting behind

his/her computer and thinking should I include this genome? How should I interpret an absence? What source of species could I best use for homology based gene prediction.

• Crucial when interpreting gene trees: – Knowing it by heart >>> having to look it up

• With regards to evolutionary signaling cell biology ( kinases, smallgtpases etc. )the diversity in present day genomes is staggering and dwarfs e.g. human-fruit fly difference

Large scale orthology

• Needed to move beyond anecdotes, but difficult to get

RapGAP (animals(LSE), fungi, dicty)

PHYSOJ14061 Phytophthora sojae 142624 PHYINF15173 Phytophthora infestans PITG 15173

RalGAPB (oomycetes, dicty, naegleria, fungi, animals))

RalGAPA (dicty, naegleria, fungi, animals)

RheBGAP (TSC2, oomycetes, diatoms, red algea, animals, fungi, dicty, tetrahymena

99

13

823

31

100

24

Eukaryogenesis / LECA

• Biological topic, eukaryogensis / LECA for which these types of analyses are telling us a lot. But it also impacts a lot of things we do: we see it back in gene trees and it impacts getting orthologous groups across eukaryotes.

Gene content evolution

• Fundamental level of genome evolution • Gene invention -> inability to detect homologs vs real

lack of homologs does not simply mean novel gene • Evolutionary modules? • Trying to move large scale but remember the pitfalls

Whole Genome Duplication (WGD)

• Like LECA, WGD is important biology for which bioinf needed to research but which also impacts our data

• And which is welcome source of information for our analyses (Lidija, bubmad): independent and reliable reconstruction of the history of part of the history of genes

Using HTP data to study evolution of networks / complexes

• Is the number of conserved interactions between e.g. yeast and human 10% or 95%???

• On top of all the genome analysis pitfalls also all the HTP data pitfalls …

• Duplicates vs orthologs

Techniques AND biology

• Detective/forensics vs concepts; Large scale biology vs small-scale biology; Bioinformatics biology vs data/technique problems;

• A lot like police investigation … and less like Nobel prize winning physics ...

• Anything goes in genome evolution; many processes often entangled (i.e. google subneofunctionalization)

Literature discussion • You should have read the papers before the day of the

discussion • On day itself group split in critique / defense (not WGD) • Groups prepare defense and critique • Discussion: (critique starts because we all have read the

paper) – Critique gives outline why paper is weak – break – Defense responds to critique – break – Critique gives final comments to which immediately response

can be made – Defense gives final comments to which immediately response

can be made

Computer Exercises

• Computer exercises for some topics many others more difficult (i.e. evolution of interaction networks based on HTP analysis).

• Previous years too much cookbook. Attempt at less cookbook, more playing around → you should learn more; but it is slower

• Ask help from fellow students. • Ties strongly into mini-projects

Mini projects • A protein • what does swissprot / sgd already knows about your protein

function wise (GO) (pathway) • protein topology. (domains, disorder, TM?, motifs?) • "Most diverged" sequence in same genome ... that is still an

homolog. • Homologs across tree of life • Point of invention of family? if bacterial invention, mito or archaeal

route? • Does it have WGD duplicates? • Tree of relevant sequences in diverse genomes • Orthologs in relevant genomes • (normally relevant genomes would be a few metazoa, fungi, other

ophistokonts, amoebazoa, strameopiles, alveolates, plantae, excavates, see e.g. bubmad but it depends on your protein)

Requests • very heterogeneous with respect to previous

knowledge (IBMB, GB, research projects, PhD students)

• PLEASE: interrupt / ask questions when I am going to fast, when I use jargon, when I make jumps/conclusions that to me seem obvious 100% logical, but to your are erratic; please point out my implicit assumptions regarding what everybody knows

• Computer exercises: more experienced people help

• And also apologies for some redundancy

top related