
Spring 2002, Christophe Roos - Bioinfo primer

Informatics goes system biology

Christophe Roos - MediCel ltd - christophe.roos@medicel.fi

Gene-networks in signaling

Drosophila as a model

Molecular biology goes in silico


From single cell to organism – a life cycle

The use of a model organism

• Fertilisation followed by cell division

• Pattern formation – instructions for

– Body plan (axes: A-P, D-V)

– Germ layers (ecto-, meso-, endoderm)

• Cell movement - form – gastrulation

• Cell differentiation

• Cell growth, cell death (apoptosis)


Development of the body plan

• We are much more like flies in our development than you might think.

• Drosophila is the best understood of all developmental systems.

• We have evolved by two genome duplications

• Like all animals with bilateral symmetry, the fly embryo is patterned along two distinct and largely independent axes: Anterior-Posterior and Dorsal-Ventral


Pattern formation – germ layers


From lab bench to computer keyboard

• Molecular biology is the mother of Biotechnology, an area with huge potential applications.

• Molecular biology has so far handled single genes and proteins, but new methods now make it possible to operate on large sets simultaneously.

• Information technology is an essential enabling technology (tool) in molecular biology. We know it as bioinformatics or biocomputing.

• Bioinformatics is to a large extent a predictive science, the results of which enter public and private electronic databases.


The experimental data hits the hard disk

• Biological data is accumulating at a high rate

– DNA and protein sequence

– Gene expression profiles

– Protein structure

– Scientific literature

• The genomes of several organisms have been sequenced (many viruses and bacteria, yeast, C. elegans and the fruit fly Drosophila). Most of the DNA sequence of the complete human genome has been determined.


… but what is the data? Genome, gene, protein

• Each cell contains a full genome (with some exceptions), which consists of DNA

• The size varies:

– Small for viruses and prokaryotes (10 kbp - 20 Mbp)

– Medium for lower eukaryotes

• Yeast, unicellular eukaryote 13 Mbp

• Worm (Caenorhabditis elegans) 100 Mbp

• Fly, invertebrate (Drosophila melanogaster) 170 Mbp

– Larger for higher eukaryotes

• Mouse and man 3,000 Mbp

– Very variable for plants (many are polyploid)

• Mouse-ear cress (Arabidopsis thaliana) 120 Mbp

• Lilies 60,000 Mbp


… the data

• The genome is partitioned over one or many chromosomes. Their number is constant within a species but varies between species (range ca. 1-100).

• Each chromosome is a single DNA double-helix molecule

• A gene is the smallest functional unit on a chromosome that codes for a protein or an effector RNA (e.g. tRNA and rRNA). The gene is directional (5’ end → 3’ end).

– Regulatory regions (promoter & enhancer)

– Transcribed regions: exons and introns

• Introns are spliced away during maturation, exons are concatenated

• Exons make up the 5’ UTR, the CDS and the 3’ UTR
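A minimal sketch of the splicing step described above: introns are dropped, exons are concatenated, and the mature mRNA reads 5’ UTR, CDS, 3’ UTR. The toy sequences, the exon coordinates and the function name `splice` are made up for illustration only.

```python
# Hypothetical toy transcript: sequences and boundaries are invented for illustration.
EXON_1 = "GGAUGCC"      # 5' UTR ("GG") followed by the start of the CDS
INTRON = "GUAAGUCCAG"   # removed during maturation
EXON_2 = "GCUUUAAUU"    # end of the CDS followed by the 3' UTR ("UU")

pre_mrna = EXON_1 + INTRON + EXON_2

# Exon coordinates on the primary transcript (0-based, end-exclusive).
exons = [
    (0, len(EXON_1)),
    (len(EXON_1) + len(INTRON), len(pre_mrna)),
]


def splice(transcript, exon_coords):
    """Concatenate the exon segments in order; everything else (introns) is dropped."""
    return "".join(transcript[start:end] for start, end in exon_coords)


mrna = splice(pre_mrna, exons)

# In this toy example the CDS occupies positions 2..14 of the mature mRNA.
utr5, cds, utr3 = mrna[:2], mrna[2:14], mrna[14:]
print(utr5, cds, utr3)
```

Note that the CDS here spans both exons, so it only becomes contiguous after splicing.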

[Figure (genome, gene, protein): chromosome, promoter, mRNA with 5’ UTR, CDS and 3’ UTR]


… the data: genome, gene, protein

• Proteins are built from amino acids; each of the 20 different amino acids is coded for by one or several triplets (4^3 = 64 different triplets).

• As the genetic code is read in triplets, it is essential to keep the frame correct (theoretically 3 forward and 3 reverse frames for any DNA segment).

• The polypeptide chain is linear but folds into a 3D structure.

– The 3D structure is pivotal for the function of most proteins

– The 3D structure consists of folds

– Some discrete structures make up the folds (α-helix, β-sheet, etc.)

– The 3D structure cannot (yet) be predicted, but can be measured by NMR spectroscopy or X-ray crystallography.

– The structure is not static and depends also on partners.
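The triplet count and the six reading frames can be made concrete with a small sketch. The example sequence and the helper names (`reverse_complement`, `reading_frames`) are illustrative assumptions, not part of the slides.

```python
from itertools import product

BASES = "ACGT"

# All possible triplets: 4 bases at each of 3 positions gives 4**3 = 64 codons.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the opposite strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def reading_frames(dna):
    """Split a DNA segment into triplets for the 3 forward and 3 reverse frames."""
    frames = {}
    for label, strand in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{label}{offset + 1}"] = [
                strand[i:i + 3] for i in range(offset, len(strand) - 2, 3)
            ]
    return frames


example = "ATGGCCTAA"  # made-up segment
for name, triplets in reading_frames(example).items():
    print(name, triplets)
```

Only frame +1 of this example reads ATG GCC TAA; shifting by one base gives entirely different triplets, which is why keeping the frame matters.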


Genes control cell behavior by controlling which proteins are made by a cell

• Genomic content constant: all cells have the same instructive set

• Differential gene activity controls development

• Understanding development means, among other things, understanding gene control

• Chromatin structure

• Transcription

• Processing (splicing)

• Nuclear export

• Cytoplasmic location (storage)

• Translation

• Modifications of the polypeptide

– Glycosylation (sugars)

– Proteolytic cleavage

– Complex formation


Development is progressive

• Specification of cell fate: determination

– All cells still ‘look the same’

– Can be tested by transplantation experiments

• Interactions can make cells different from each other: induction


Patterning – interpretation of positional information

• Positional value

– Morphogen – a substance

– Threshold concentration

• Program for development

– Generative rather than descriptive


The bicoid gene provides an A-P morphogen gradient


The A-P axis is divided into broad regions by gap gene expression

• The first zygotic genes

• Respond to maternally-derived instructions

• Short-lived proteins, giving a bell-shaped distribution from the source


Transcription factors in cascade

• Hunchback (hb), a gap gene, responds to the dose of bicoid protein

• A concentration of bicoid above the threshold activates the expression of hb

• The more bicoid transcripts, the further towards the posterior hb expression extends
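The threshold reading of the bicoid gradient can be sketched numerically. This is a toy model under stated assumptions: the gradient is taken to decay exponentially from the anterior pole, and the decay length, threshold and dose values are invented for illustration, not measured.

```python
import math

# Assumption: bicoid protein decays exponentially with distance from the anterior pole.
# All numbers are illustrative, not measured values.
DECAY_LENGTH = 0.2   # in units of embryo length (0 = anterior, 1 = posterior)
HB_THRESHOLD = 0.1   # bicoid level needed to switch hb on


def bicoid_level(x, source_level):
    """Bicoid concentration at position x for a given level at the anterior pole."""
    return source_level * math.exp(-x / DECAY_LENGTH)


def hb_boundary(source_level):
    """Posterior limit of hb expression: the position where bicoid falls to the threshold."""
    if source_level <= HB_THRESHOLD:
        return 0.0  # bicoid never reaches the threshold, so hb stays off
    return min(1.0, DECAY_LENGTH * math.log(source_level / HB_THRESHOLD))


normal = hb_boundary(1.0)
doubled = hb_boundary(2.0)   # twice as much bicoid at the source
print(f"hb boundary, normal dose:  {normal:.2f}")
print(f"hb boundary, doubled dose: {doubled:.2f}")
```

With these made-up parameters the hb boundary moves from about 0.46 to about 0.60 of the embryo length when the dose is doubled, i.e. further towards the posterior.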


Krüppel reads two values

• Krüppel (Kr), a gap gene, responds to the dose of hb protein

• A concentration of hb above the minimum threshold activates the expression of Kr

• A concentration of hb above the maximum threshold represses the expression of Kr
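Reading two thresholds amounts to a band-pass rule: Kr is on only where hb lies between the minimum and the maximum. A minimal sketch, with threshold values and the hb profile chosen arbitrarily for illustration:

```python
# Hypothetical thresholds on the hb level; the values are illustrative only.
KR_MIN_HB = 0.2   # below this, hb is too low to activate Kr
KR_MAX_HB = 0.6   # above this, hb represses Kr


def kr_expressed(hb_level):
    """Kr is on only inside the window between the two hb thresholds."""
    return KR_MIN_HB < hb_level < KR_MAX_HB


# Made-up hb levels sampled from anterior (left) to posterior (right).
hb_profile = [1.0, 0.8, 0.5, 0.3, 0.1]
kr_pattern = [kr_expressed(level) for level in hb_profile]
print(kr_pattern)
```

A monotonically falling hb profile therefore yields a band of Kr expression in the middle of the axis, off at both ends.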


Segmentation: activation of the pair-rule genes

• Parasegments are delimited by expression of pair-rule genes in a periodic pattern

• Each pair-rule gene is expressed in a series of 7 transverse stripes


Universals: the homeotic genes also specify human development

Lewis Wolpert, Rosa Beddington, Jeremy Brockes, Thomas Jessell, Peter Lawrence, Elliot Meyerowitz, Principles of Development, Current Biology ltd, Oxford University Press 1998, ISBN 0-19-850263-X


Genes are controlled by activating and repressing transcription factors that bind the promoter

• The 500 bp promoter region of the even-skipped gene

• Gene expression occurs when the activating factors are present above a threshold

• Repressors may act by preventing binding of activators
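The two rules above can be written as a small decision function. This is a deliberately crude sketch: the factor names, the single shared threshold and the all-or-nothing blocking by repressors are simplifying assumptions, not the actual logic of the even-skipped promoter.

```python
def promoter_active(activators, repressors, threshold=0.5):
    """Decide whether a promoter fires, given factor concentrations.

    activators, repressors: dicts mapping a factor name to its concentration.
    Assumption: any repressor at or above the threshold prevents the
    activators from binding, which switches the gene off entirely.
    """
    if any(level >= threshold for level in repressors.values()):
        return False
    return any(level >= threshold for level in activators.values())


# Hypothetical factors and concentrations, for illustration only.
print(promoter_active({"activator_A": 0.8}, {"repressor_R": 0.1}))
print(promoter_active({"activator_A": 0.8}, {"repressor_R": 0.9}))
print(promoter_active({"activator_A": 0.2}, {}))
```

The first call has an activator above threshold and no effective repressor, so the gene is on; in the second the repressor blocks it; in the third the activator is too weak.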


… understand the promoter

What we might know about the promoter:

...CAGTGCTAATATAAAACTGATATTTAATTGAAATCTTTTCTAATTTAGCGCGCTCAGCTGTTGGGTGACCTTGCTGCCGTTCAAATTCCGGAGGAGGAGCTGCAGCAGTATACTTCCATTAGCCAAGTGCAAACCGTGGGATTAAAGCGTCTACCCACCCTTGACGAGTATCTAGCCAAGAAAAAGGAAAGACAGGCCCAAGTTTTAGCTGAAAAAAGCTCGGCGTCGGGTCTCCGCGTAAATGCTATAAAGGGCTCCAAGCGCAAGCTTCTCGTCGAAGAGGAGGAGGAACTACAGGCCAAGCGAAAGAATCCGAATGTAATTAGCGTGGAGGAAGATGACGAAGATTCTTCATCCTCTGATGAGGACGATGAGGAGGCACCAGCTCAATCCGCTCCTATTGCCATACCCACTCCAGTGTCTATAGCTCCACCGCAAATCGCTGTTAAACCACCCATTAAAAAGTTGAAGCCAGAGCCTAACCCACCTGCCTGTATCCACCAGACTGTCTATGTGCCCGTACATCGGACAACAGAAGTTCAGAATGCCCGTCTTCGACTGCCTATCCTCGCGGAGGAGCAGCAGGTGATGGAGACAATCAACGAAAACCCCATTGTGATCGTGGCTGGTGAGACTGGCTCTGGAAAGACTACCCAGCTACCGCAGTTCCTGTACGAAGCGGGGTATGCCCAGCACAAGATGATTGGAGTGACGGAGCCGCGGCGAGTGGCTGCTATTGCCATGTCCAAGCGGGTGGCCCACGAGATGAACCTGCCGGAGAGCGAGGTGTCATACCTCATTCGCTTCGAGGGAAACGTAACACCAGCGACGCGCATTAAATTCATGACAGATGGTGTGTTGCTTAAGGAGATCGAAACTGACTTTCTGCTTAGTAAGTACTCAGTGATCATCCTGGACGAGGCGCACGAGCGCAGTGTTTACACAGACATCCTAGTGGGTCTCCTGTCAAGGATCGTGCCCTTGCGTCACAAACGCGGGCAGCCGCTGAAGCTGATCATTATGTCTGCCACTTTGCGGGTATCCGATTTTACAGAGAATACTCGCTTGTTTAAGATTCCGCCACCGTTGCTTAAAGTGGAGGCTCGACAATTTCCGGTGACTATTCACTTCCAGAAGCGCACACCTGATGACTATGTGGCGGAGGCTTACCGCAAGACCTTAAAAATCCATAATAAGCTTCCGGAAGGCGGCATACTAATTTTTGTGACGGGACAGCAGGAGGTCAACCAACTGGTGCGCAAGCTGCGACGTACGTTTCCGTATCATCATGCGCCAACCAAGGATGTCGCTAAAAATGGAAAGGTATCGGAGGAAGAAAAAGAGGAAACAATAGATGATGCGGCATCGACTGTGGAGGATCCCAAGGAGCTGGAGTTTGATATGAAACGAGTTATACGTAATATTCGTAAATCTAAGAAAAAGTTCTTGGCGCAAATGGCGTTACCCAAAATCAATTTGGACGACTACAAGCTCCCTGGTGATGATACGGAAGCAGACATGCACGAGCAGCCGGATGAGGATGATGAGCAGGAGGGACTAGAAGAGGATAACGACGATGAACTAGGCTTGGAGGATGAGTCGGGAATGGGATCTGGTCAAAGGCAACCTCTGTGGGTCCTGCCGCTCTACTCGCTCCTCTCCTCGGAGAAGCAAAACCGCATCTTCCTGCCCGTTCCCGATGGCTGCCGGCTATGCGTGGTTAGCACCAATGTGGCAGAGACATCTCTCACCATCCCGCACATCAAGTATGTTGTTGACTGTGGTCGCCAGAAGACGCGTCTTTACGACAAACTGACGGGTGTGAGTGCTTTTGTGGTAACCTACACGTCTAAGGCCTCGGCGGATCAGCGTGCTGGACGAGCGGGTCGCATCAGCGCCGGACATTGCTATCGCCTCTACTCGAGTGCCGTGTACAACGACTGCTTCGAGGACTTTTCCCAGCCGGATATCCAGAAAAAGCCCGTCGAGGACCTTAT
GCTGCAAATGCGCTGCATGGGCATCGATCGCGTGGTGCACTTTCCCTTTCCCTCACCACCGGATCAAGTGCAGCTGCAAGCCGCCGAGCGGCGATTGATCGTGCTAGGTGCCCTGGAGGTCGCCAAGACAGAGAATACAGATTTGCCACCAGCCGTTACTCGTTTGGGTCACGTTATCTCCCGCTTTCCCGTGGCGCCGCGCTTTGGAAAAATGCTGGCTCTGTCCCACCAGCAGAACCTACTGCCCTACACCGTCTGCCTGGTGGCCGCACTTTCAGTCCAGGAGGTGCTAATCGAAACGGGCGTTCAAAGGGATGAGGATGTGGCACCTGGCGCGAATCGGTTCCACCGCAAACGCCAAAGTTGGGCGGCCAGCGGCAACTATCAGTTGCTTGGAGATCCTATGGTCTTATTACGTGCCGTAGGAGCTGCAGAGTACGCCGGATCGCAGGGCCGCTTGCCAGAGTTTTGTGCTGCGAATGGATTGCGCCAGAAAGCGATGAGCGAGGTGCGAAAATTGCGCGTCCAGCTGACTAACGAGATTAACCTGAATGTTAGTGACGTTGAGCTGGGTGTGGACCCCGAACTGAAGCCTCCCACCGATGCCCAGGCGCGTTTCCTTCGCCAAATTCTATTGGCCGGCATGGGCGACCGGGTGGCTAGAAAGGTACCTCTGGCAGACATCGCCGACAAGGAAGAGCGGCGGCGATTAAAGTACGCATACAATTGTGCTGACATGGAGGAACCAGCGTTCCTGCACGTCTCATCCGTGTTGCGTCAAAAAGCACCCGAATGGGTAATCTATCAGGAGGCATACGAGCTGCAAAACGGCGACTCTACCAAGATGTTCATCCGCGGC...

This page shows 3000 characters. Thus, the human genome (3,000 Mbp / 3000 characters per page) has about 10^6 pages … however


… the reality is certainly different

What the protein regulators might know about the promoter:


There is more than just promoters

… and when regulation concerns many genes simultaneously…

Lytic cycle decision, λ-phage: 11 genes (McAdams and Shapiro, Science, 1995, 269, pp. 650-656)

Human genome: ~31 000 – 40 000 genes


Data types will diversify

• While the promoter is a challenge to bioinformatics, it is only one tiny facet of biological data

• Other data types include, among others:

– Gene transcripts or proteins present

• At various time points

• In different tissues

• In diseases

– Interactions between components

– Pathways

• Metabolic

• Regulatory

How can it be organised?
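One possible way to organise interactions and regulatory pathways is as a directed graph with labelled edges. The sketch below uses the bicoid, hunchback and Krüppel cascade from the earlier slides as data; the record layout and the function names are assumptions for illustration, not the schema of any existing database.

```python
from collections import defaultdict
from typing import NamedTuple


class Interaction(NamedTuple):
    source: str
    target: str
    effect: str      # "activates" or "represses"
    condition: str   # free-text condition under which the effect holds


# The regulatory cascade described on the gap-gene slides.
INTERACTIONS = [
    Interaction("bicoid", "hunchback", "activates", "bicoid above threshold"),
    Interaction("hunchback", "Krüppel", "activates", "hunchback above minimum threshold"),
    Interaction("hunchback", "Krüppel", "represses", "hunchback above maximum threshold"),
]


def build_index(interactions):
    """Index the interactions by their source gene for fast lookup."""
    by_source = defaultdict(list)
    for item in interactions:
        by_source[item.source].append(item)
    return by_source


def downstream(gene, index):
    """List every gene reachable from `gene` by following interactions."""
    seen, stack = [], [gene]
    while stack:
        current = stack.pop()
        for item in index.get(current, []):
            if item.target not in seen:
                seen.append(item.target)
                stack.append(item.target)
    return seen


index = build_index(INTERACTIONS)
print(downstream("bicoid", index))
```

Storing the sign and the condition on each edge keeps the two opposite effects of hunchback on Krüppel as separate records instead of collapsing them into one link.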


Pathway database, interaction database, ...


A wealth of databases

• Primary and derived databases

• Each one accessible via separate tools

– sometimes cross-indexed

– with separate syntax

– with different levels of confidence

– with errors

We have a problem


Biology in the computing age

• What does informatics mean to biologists?

– Representing the data to the user

– Organising the data in databases

– Disseminating the data over the Internet

– Manipulating and interlinking the data

– Analysing the data

• What challenges does biology offer computer scientists?

– Cracking the genome code

– Presenting data in an intelligible form

– Biological data is complex and interlinked

– Multiple entities interact to form pathways, networks

– Model, simulate and understand how living things function


System biology needs more

• Mathematics, systematics, semantics

• Reverse engineering

• Modelling

• Data mining

• However, it has to be done starting from biological premises