the challenge of bioinformatics chris glasbey biomathematics & statistics scotland

Post on 03-Jan-2016

The challenge of bioinformatics

Chris Glasbey

Biomathematics & Statistics Scotland

Talk plan

1. DNA

2. mRNA

3. Protein

4. Genetic networks

1. DNA

Frank Wright et al., BioSS

TOPALi

2. mRNA

Prepare cDNA targets

Label with fluorescent dyes

Combine equal amounts

Hybridise for 5-12 hours

Scanning

2. mRNA

• The scanner's PMT setting is one of the sources of contamination.

• The scanner's setting has to be raised to a certain level to make weakly expressed genes visible.

• This may cause the expression values of highly expressed genes to be censored (at 2^16 - 1 = 65535).
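The censoring effect can be illustrated with a minimal sketch. The spot signals and gain values below are hypothetical, and the scanner is assumed to behave as a simple multiply-and-clip device, which is a simplification of a real PMT.

```python
SATURATION = 2**16 - 1  # 65535, the largest value a 16-bit scanner can record


def scan(true_signal, gain):
    """Recorded intensity for a spot: gain * signal, clipped at saturation."""
    return min(round(true_signal * gain), SATURATION)


def is_censored(recorded):
    """A value at the ceiling only says the true value was at least that large."""
    return recorded >= SATURATION


# Hypothetical signals for one weakly and one highly expressed gene.
weak, strong = 40, 9000

low_gain = [scan(s, 1.0) for s in (weak, strong)]    # weak spot barely visible
high_gain = [scan(s, 10.0) for s in (weak, strong)]  # weak spot visible, strong spot censored
```

At the higher setting the weak spot becomes easier to see, but the strong spot is recorded as 65535 and its true intensity is lost.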

2. mRNA

[Figure: a censored spot and its imputed values, on an intensity scale from 0 to 65535]

With GTI (Edinburgh)

2. mRNA

Multiple scans

[Figure: intensity data from scans 1 to 4 plotted against scan-1 intensity data (Scan-1 vs. Scan-1, Scan-2 vs. Scan-1, Scan-3 vs. Scan-1, Scan-4 vs. Scan-1)]

[Figure: Array-2 data. Observed pixel mean / beta plotted against estimated gene expression for Scan-1, Scan-2, Scan-3 and Scan-4]

Mizan Khondoker
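One way to combine multiple scans, sketched under loud assumptions: each scan k is taken to record beta_k times the true expression, clipped at 65535, and the first scan is taken to have the lowest setting. The median-ratio estimate of beta and the simple averaging below are illustrative choices, not necessarily the model behind the plots, and the function names and data are hypothetical.

```python
from statistics import mean, median

SATURATION = 2**16 - 1


def estimate_scale_factors(scans):
    """scans[k][i] is the intensity of spot i in scan k. Returns beta with beta[0] == 1.0.

    Each beta is the median ratio to scan 1 over spots uncensored in both scans.
    """
    reference = scans[0]
    betas = []
    for scan in scans:
        ratios = [
            y / ref
            for y, ref in zip(scan, reference)
            if y < SATURATION and 0 < ref < SATURATION
        ]
        betas.append(median(ratios))
    return betas


def estimate_expression(scans, betas):
    """Average the rescaled readings y / beta over the scans where the spot is uncensored."""
    estimates = []
    for i in range(len(scans[0])):
        rescaled = [scan[i] / b for scan, b in zip(scans, betas) if scan[i] < SATURATION]
        estimates.append(mean(rescaled) if rescaled else None)  # None: censored everywhere
    return estimates


# Hypothetical data: four spots scanned at three increasing settings.
scans = [
    [100, 1000, 20000, 50000],
    [200, 2000, 40000, 65535],
    [400, 4000, 65535, 65535],
]
betas = estimate_scale_factors(scans)
expression = estimate_expression(scans, betas)
```

Weak spots gain precision from the higher settings, while strong spots fall back on the scans in which they are not censored.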

2. mRNA

Jim McNicol

3. Proteins

Electrophoresis gel

Lars Pedersen DTU, Denmark

3. Proteins

Protein separation by

1. pH

2. Molecular weight

3. Proteins

gel 1 gel 2

How to compare gels 1 and 2?

3. Proteins

John Gustafsson, Chalmers University, Sweden

WARP

3. Proteins

Two gels superimposed (in different colours)

3. Proteins

Statistical Design

3 complete replicates of 15 treatment combinations (3 ecotypes by 5 heavy metals)

Maximum of 1400 protein spots per gel

Statistical Analyses

Filter data – remove spots with low intensity values and low quality scores (leaving ~290 spots)

Individual proteins – ANOVA, main effects and interactions

[Figure: per-protein results on a log scale from 1E-16 to 1, plotted against spot number (1 to 276)]
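The per-protein ANOVA can be sketched as a balanced two-way analysis with replication, matching the 3 ecotypes by 5 heavy metals by 3 replicates design. This is a minimal standard-library sketch: the function name and the example intensities are hypothetical, and the real analysis may include further terms (for example a replicate effect).

```python
def two_way_anova(data):
    """data[a][b] is a list of replicate values for ecotype a and metal b (balanced design).

    Returns F statistics for the two main effects and the interaction.
    """
    A, B, R = len(data), len(data[0]), len(data[0][0])
    grand = sum(x for row in data for cell in row for x in cell) / (A * B * R)
    cell_mean = [[sum(cell) / R for cell in row] for row in data]
    a_mean = [sum(cell_mean[a]) / B for a in range(A)]
    b_mean = [sum(cell_mean[a][b] for a in range(A)) / A for b in range(B)]

    # Sums of squares for main effects, interaction and residual error.
    ss_a = B * R * sum((m - grand) ** 2 for m in a_mean)
    ss_b = A * R * sum((m - grand) ** 2 for m in b_mean)
    ss_ab = R * sum(
        (cell_mean[a][b] - a_mean[a] - b_mean[b] + grand) ** 2
        for a in range(A) for b in range(B)
    )
    ss_err = sum(
        (x - cell_mean[a][b]) ** 2
        for a in range(A) for b in range(B) for x in data[a][b]
    )

    df_a, df_b, df_ab, df_err = A - 1, B - 1, (A - 1) * (B - 1), A * B * (R - 1)
    ms_err = ss_err / df_err
    return {
        "ecotype": (ss_a / df_a) / ms_err,
        "metal": (ss_b / df_b) / ms_err,
        "interaction": (ss_ab / df_ab) / ms_err,
        "df": (df_a, df_b, df_ab, df_err),
    }


# Hypothetical spot: intensity depends on ecotype only, with replicate noise of -1, 0, +1.
data = [[[10 * a - 1, 10 * a, 10 * a + 1] for _ in range(5)] for a in range(3)]
result = two_way_anova(data)
```

Turning the F statistics into p-values needs the F distribution, for example `scipy.stats.f.sf(F, df_effect, df_error)` if SciPy is available. With ~290 spots tested, some allowance for multiple testing would also be needed.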

3. Proteins

Principal Components Analysis

Identify groups of proteins that are affected in a consistent manner by treatments

[Figure: loadings (axis from -0.08 to 0.12) plotted against protein identity (1 to 289)]

Jim McNicol
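How loadings such as these arise can be sketched with power iteration on the covariance matrix of the protein intensities. This is a minimal standard-library sketch with hypothetical data and names. In practice a library routine based on the singular value decomposition would be the usual choice.

```python
import math


def first_pc_loadings(rows, iterations=200):
    """rows[g][p] is the intensity of protein p on gel g.

    Returns the unit-length loading vector of the first principal component,
    found by power iteration on X^T X for the column-centred data X.
    """
    n, p = len(rows), len(rows[0])
    col_mean = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - col_mean[j] for j in range(p)] for r in rows]

    # Deterministic, non-uniform start; a start exactly orthogonal to the
    # leading component would fail, which is unlikely with real data.
    v = [float(j + 1) for j in range(p)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]

    for _ in range(iterations):
        scores = [sum(x * w for x, w in zip(row, v)) for row in centred]          # X v
        w = [sum(centred[g][j] * scores[g] for g in range(n)) for j in range(p)]  # X^T (X v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance in the direction reached; keep the current vector
        v = [x / norm for x in w]
    return v


# Hypothetical data: proteins 0 and 1 respond together across four gels, protein 2 does not respond.
gels = [[1, 2, 5], [2, 4, 5], [3, 6, 5], [4, 8, 5]]
loadings = first_pc_loadings(gels)
```

Proteins with large loadings of the same sign vary together across treatments, which is how the plot identifies groups affected in a consistent manner.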

4. Genetic networks

Is it possible to infer the network from gene expression data such as these?

Dirk Husmeier

4. Genetic networks

Bayesian network
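The idea of scoring candidate network structures against expression data can be sketched for the smallest possible case: two genes with binarised (on/off) expression, comparing "no edge" with "A -> B". The BIC score used here is a stand-in chosen for brevity, not necessarily the score or search method used in the work shown. All names and data are hypothetical.

```python
import math
from collections import Counter


def log_lik_marginal(values):
    """Maximised log-likelihood of a discrete variable with no parents."""
    n = len(values)
    return sum(c * math.log(c / n) for c in Counter(values).values())


def log_lik_conditional(child, parent):
    """Maximised log-likelihood of child given a single discrete parent."""
    pair_count = Counter(zip(parent, child))
    parent_count = Counter(parent)
    return sum(c * math.log(c / parent_count[p]) for (p, _), c in pair_count.items())


def bic_scores(a, b):
    """BIC scores (higher is better) for two structures over binary genes A and B."""
    penalty = 0.5 * math.log(len(a))  # cost per free parameter
    independent = log_lik_marginal(a) + log_lik_marginal(b) - 2 * penalty  # 2 parameters
    edge = log_lik_marginal(a) + log_lik_conditional(b, a) - 3 * penalty   # 3 parameters
    return {"independent": independent, "A -> B": edge}


# Hypothetical data: B mostly follows A (4 mismatches out of 40 arrays).
a = [0] * 20 + [1] * 20
b = [0] * 18 + [1] * 2 + [1] * 18 + [0] * 2
linked = bic_scores(a, b)

# Hypothetical data: B unrelated to A.
unrelated = bic_scores([0, 1] * 20, [0, 0, 1, 1] * 10)
```

With observational data alone, "A -> B" and "B -> A" receive the same score here, so the direction of an edge cannot be settled by this comparison. The number of possible structures also grows very quickly with the number of genes.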

4. Genetic networks

[Figure: two network diagrams, labelled "truth" and "inferred"]

“I genuinely believe that we are living through the greatest intellectual moment in human history.” (Matt Ridley, Genome, 1999)

“Grand Unified Systems Biology”
