oliver stegle and karsten borgwardt - eth z

96
Oliver Stegle and Karsten Borgwardt: Computational Approaches for Analysing Complex Biological Systems, Page 1 GP Applications Oliver Stegle and Karsten Borgwardt Machine Learning and Computational Biology Research Group, Max Planck Institute for Biological Cybernetics and Max Planck Institute for Developmental Biology, Tübingen

Upload: others

Post on 13-Apr-2022

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Oliver Stegle and Karsten Borgwardt - ETH Z

Oliver Stegle and Karsten Borgwardt: Computational Approaches for Analysing Complex Biological Systems, Page 1

GP ApplicationsOliver Stegle and Karsten Borgwardt

Machine Learning andComputational Biology Research Group,

Max Planck Institute for Biological Cybernetics andMax Planck Institute for Developmental Biology, Tübingen

Page 2: Oliver Stegle and Karsten Borgwardt - ETH Z

Outline

Outline

O. Stegle & K. Borgwardt GP Applications Tubingen 1

Page 3: Oliver Stegle and Karsten Borgwardt - ETH Z

Application1: modelling physiological time series

Outline

Application1: modelling physiological time seriesOverviewGaussian process prior for heart rateResults

Application 2: differential gene expressionOverviewA Gaussian process two-sample testExperimental Results on ArabidopsisDetecting Temporal Patterns of Differential Expression

Application 3: Modeling transcriptional regulation using Gaussianprocesses

Summary

O. Stegle & K. Borgwardt GP Applications Tubingen 2

Page 4: Oliver Stegle and Karsten Borgwardt - ETH Z

Application1: modelling physiological time series

Motivation

I Human heart rate is an important physiological trait.

I Measurement over long periods only viable with poor sensors.

I Motivation Gaussian process model for heart rate.

O. Stegle & K. Borgwardt GP Applications Tubingen 3

Page 5: Oliver Stegle and Karsten Borgwardt - ETH Z

Application1: modelling physiological time series

Motivation

I Human heart rate is an important physiological trait.

I Measurement over long periods only viable with poor sensors.

I Motivation Gaussian process model for heart rate.

O. Stegle & K. Borgwardt GP Applications Tubingen 3

Page 6: Oliver Stegle and Karsten Borgwardt - ETH Z

Application1: modelling physiological time series

The problemDataset

4 days of heart data

0 1000 2000 3000 4000 5000 6000 70000

50

100

150

200

250

300

t/min

hr/

bm

p

Features

I Different noise sources

I Two time scales, 24hrhythms

I Asymmetric around themean

I Auxiliary variablesindicative of noise

O. Stegle & K. Borgwardt GP Applications Tubingen 4

Page 7: Oliver Stegle and Karsten Borgwardt - ETH Z

Application1: modelling physiological time series

The problemDataset

4 days of heart data

50

100

150

200

HR

/ b

pm

0

1

∆HRmin

0

1

∆HRmax

1000 1500 2000 2500 3000 3500 4000 4500 5000 5500 60000

1

t

Tlost

Features

I Different noise sources

I Two time scales, 24hrhythms

I Asymmetric around themean

I Auxiliary variablesindicative of noise

O. Stegle & K. Borgwardt GP Applications Tubingen 4

Page 8: Oliver Stegle and Karsten Borgwardt - ETH Z

Application1: modelling physiological time series

The problemDataset

zoomed view

2100 2200 2300 2400 2500 2600

20

40

60

80

100

120

t/min

hr/

bm

p

Features

I Different noise sources

I Two time scales, 24hrhythms

I Asymmetric around themean

I Auxiliary variablesindicative of noise

O. Stegle & K. Borgwardt GP Applications Tubingen 5

Page 9: Oliver Stegle and Karsten Borgwardt - ETH Z

Application1: modelling physiological time series Overview

modelThe Inference Model

B

C

A

z1noisyclean very noisy

noiseaux. data

y1

f1

time step 1

z2noisyclean very noisy

noiseaux. data

y2

f2

time step 2

znnoisyclean very noisy

noiseaux. data

yn

fn

time step n

...

...

...

I (a) Gaussian process prior on latent heart rate

I (b) Clustering of auxiliary data to extract noise classes

I (c) Heavy tailed noise model taking classes into account

O. Stegle & K. Borgwardt GP Applications Tubingen 6

Page 10: Oliver Stegle and Karsten Borgwardt - ETH Z

Application1: modelling physiological time series Gaussian process prior for heart rate

A GP prior for heart rate

I Covarianc function for shortrange fluctuations

I Long-range perdiodic signal

I Sum of (a) and (b)

I Log transformation (asymmetry)

0 50 100 150

−200

−100

0

100

200

t / min

HR

/ b

pm

(a)

0 1000 2000 3000 4000

−200

−100

0

100

200

t / min

HR

/ b

pm

(b)

0 1000 2000 3000 4000

−200

−100

0

100

200

t / min

HR

/ b

pm

(c)

0 1000 2000 3000 40000

50

100

150

200

250

t / min

HR

/ b

pm

(d)

O. Stegle & K. Borgwardt GP Applications Tubingen 7

Page 11: Oliver Stegle and Karsten Borgwardt - ETH Z

Application1: modelling physiological time series Gaussian process prior for heart rate

A GP prior for heart rate

I Covarianc function for shortrange fluctuations

I Long-range perdiodic signal

I Sum of (a) and (b)

I Log transformation (asymmetry)

0 50 100 150

−200

−100

0

100

200

t / min

HR

/ b

pm

(a)

0 1000 2000 3000 4000

−200

−100

0

100

200

t / min

HR

/ b

pm

(b)

0 1000 2000 3000 4000

−200

−100

0

100

200

t / min

HR

/ b

pm

(c)

0 1000 2000 3000 40000

50

100

150

200

250

t / min

HR

/ b

pm

(d)

O. Stegle & K. Borgwardt GP Applications Tubingen 7

Page 12: Oliver Stegle and Karsten Borgwardt - ETH Z

Application1: modelling physiological time series Gaussian process prior for heart rate

A GP prior for heart rate

I Covarianc function for shortrange fluctuations

I Long-range perdiodic signal

I Sum of (a) and (b)

I Log transformation (asymmetry)

0 50 100 150

−200

−100

0

100

200

t / min

HR

/ b

pm

(a)

0 1000 2000 3000 4000

−200

−100

0

100

200

t / min

HR

/ b

pm

(b)

0 1000 2000 3000 4000

−200

−100

0

100

200

t / min

HR

/ b

pm

(c)

0 1000 2000 3000 40000

50

100

150

200

250

t / min

HR

/ b

pm

(d)

O. Stegle & K. Borgwardt GP Applications Tubingen 7

Page 13: Oliver Stegle and Karsten Borgwardt - ETH Z

Application1: modelling physiological time series Gaussian process prior for heart rate

A GP prior for heart rate

I Covarianc function for shortrange fluctuations

I Long-range perdiodic signal

I Sum of (a) and (b)

I Log transformation (asymmetry)

0 50 100 150

−200

−100

0

100

200

t / min

HR

/ b

pm

(a)

0 1000 2000 3000 4000

−200

−100

0

100

200

t / min

HR

/ b

pm

(b)

0 1000 2000 3000 4000

−200

−100

0

100

200

t / min

HR

/ b

pm

(c)

0 1000 2000 3000 40000

50

100

150

200

250

t / min

HR

/ b

pm

(d)

O. Stegle & K. Borgwardt GP Applications Tubingen 7

Page 14: Oliver Stegle and Karsten Borgwardt - ETH Z

Application1: modelling physiological time series Gaussian process prior for heart rate

A GP prior for heart rate

I Short-range covariance

CS(x, x′) = C1 Matern3/2(x, x

′, δS)

I Periodic covariance

CL(x, x′) = C0 exp

[− (x− x′)2

2δ2L

]exp

[−2 ·

sin2( 2πpL · (x− x′))

A2L

]I Total covariance

C(x, x′) = CL(x, x′) + CS(x, x

′)

I Non-linear transformation, reflecting strict positivity and asymmetryof heart rate

y∗t = logβ(yt)

O. Stegle & K. Borgwardt GP Applications Tubingen 8

Page 15: Oliver Stegle and Karsten Borgwardt - ETH Z

Application1: modelling physiological time series Gaussian process prior for heart rate

A GP prior for heart rate

I Short-range covariance

CS(x, x′) = C1 Matern3/2(x, x

′, δS)

I Periodic covariance

CL(x, x′) = C0 exp

[− (x− x′)2

2δ2L

]exp

[−2 ·

sin2( 2πpL · (x− x′))

A2L

]I Total covariance

C(x, x′) = CL(x, x′) + CS(x, x

′)

I Non-linear transformation, reflecting strict positivity and asymmetryof heart rate

y∗t = logβ(yt)

O. Stegle & K. Borgwardt GP Applications Tubingen 8

Page 16: Oliver Stegle and Karsten Borgwardt - ETH Z

Application1: modelling physiological time series Gaussian process prior for heart rate

A GP prior for heart rate

I Short-range covariance

CS(x, x′) = C1 Matern3/2(x, x

′, δS)

I Periodic covariance

CL(x, x′) = C0 exp

[− (x− x′)2

2δ2L

]exp

[−2 ·

sin2( 2πpL · (x− x′))

A2L

]I Total covariance

C(x, x′) = CL(x, x′) + CS(x, x

′)

I Non-linear transformation, reflecting strict positivity and asymmetryof heart rate

y∗t = logβ(yt)

O. Stegle & K. Borgwardt GP Applications Tubingen 8

Page 17: Oliver Stegle and Karsten Borgwardt - ETH Z

Application1: modelling physiological time series Gaussian process prior for heart rate

A GP prior for heart rate

I Short-range covariance

CS(x, x′) = C1 Matern3/2(x, x

′, δS)

I Periodic covariance

CL(x, x′) = C0 exp

[− (x− x′)2

2δ2L

]exp

[−2 ·

sin2( 2πpL · (x− x′))

A2L

]I Total covariance

C(x, x′) = CL(x, x′) + CS(x, x

′)

I Non-linear transformation, reflecting strict positivity and asymmetryof heart rate

y∗t = logβ(yt)

O. Stegle & K. Borgwardt GP Applications Tubingen 8

Page 18: Oliver Stegle and Karsten Borgwardt - ETH Z

Application1: modelling physiological time series Gaussian process prior for heart rate

Clustering of auxiliary data

I Every datapoint can be member in one of K clusters.

I Use clustering approach to determine cluster membership

πn = {πn,1, . . . , πn,K} where

K∑k=1

πn,k = 1

O. Stegle & K. Borgwardt GP Applications Tubingen 9

Page 19: Oliver Stegle and Karsten Borgwardt - ETH Z

Application1: modelling physiological time series Gaussian process prior for heart rate

Clustering of auxiliary data

I Every datapoint can be member in one of K clusters.

I Use clustering approach to determine cluster membership

πn = {πn,1, . . . , πn,K} where

K∑k=1

πn,k = 1

O. Stegle & K. Borgwardt GP Applications Tubingen 9

Page 20: Oliver Stegle and Karsten Borgwardt - ETH Z

Application1: modelling physiological time series Gaussian process prior for heart rate

Robust noise model

I Remember: standard robust noise model, accounting for outliers

p(yn | fn) = πokN(yn∣∣ fn, σ2)+ (1− πok)N

(yn∣∣ fn, σ2∞)

I Here: clustering results (auxiliaries S) encode useful information.

z1

y1|f1 S1

z2

y2|f2 S2

zN

yN |fN SN

π

I Generalize likelihood K noise components, one for each cluster anduse data-specific mixture probabilities πn

p(yn | fn) =K∑k=1

πk,nN(yn∣∣ fn, σ2k)

O. Stegle & K. Borgwardt GP Applications Tubingen 10

Page 21: Oliver Stegle and Karsten Borgwardt - ETH Z

Application1: modelling physiological time series Gaussian process prior for heart rate

Robust noise model

I Remember: standard robust noise model, accounting for outliers

p(yn | fn) = πokN(yn∣∣ fn, σ2)+ (1− πok)N

(yn∣∣ fn, σ2∞)

I Here: clustering results (auxiliaries S) encode useful information.

z1

y1|f1 S1

z2

y2|f2 S2

zN

yN |fN SN

π

I Generalize likelihood K noise components, one for each cluster anduse data-specific mixture probabilities πn

p(yn | fn) =K∑k=1

πk,nN(yn∣∣ fn, σ2k)

O. Stegle & K. Borgwardt GP Applications Tubingen 10

Page 22: Oliver Stegle and Karsten Borgwardt - ETH Z

Application1: modelling physiological time series Gaussian process prior for heart rate

Robust noise model

I Remember: standard robust noise model, accounting for outliers

p(yn | fn) = πokN(yn∣∣ fn, σ2)+ (1− πok)N

(yn∣∣ fn, σ2∞)

I Here: clustering results (auxiliaries S) encode useful information.

z1

y1|f1 S1

z2

y2|f2 S2

zN

yN |fN SN

π

I Generalize likelihood K noise components, one for each cluster anduse data-specific mixture probabilities πn

p(yn | fn) =K∑k=1

πk,nN(yn∣∣ fn, σ2k)

O. Stegle & K. Borgwardt GP Applications Tubingen 10

Page 23: Oliver Stegle and Karsten Borgwardt - ETH Z

Application1: modelling physiological time series Results

Regression for Heart Data

Single data set

I Clusteringcolor-coded andHinton diagrams

I Three clusters- g,b,r-valuesfor responsi-bilities

I Noise parameters{σc}3c=1

optimised alongwith other hyperparameters

O. Stegle & K. Borgwardt GP Applications Tubingen 11

Page 24: Oliver Stegle and Karsten Borgwardt - ETH Z

Application1: modelling physiological time series Results

Regression for Heart Data

Double data-set

I Two sensors forone heart

I GPs overlap wellwithin error-bars

I Lower panel:difference plotand error bars

O. Stegle & K. Borgwardt GP Applications Tubingen 12

Page 25: Oliver Stegle and Karsten Borgwardt - ETH Z

Application1: modelling physiological time series Results

Fill-in testMotivation

I Evaluate predictive performance to benchmark alternative modelsI Fill-in test:

I Model is trained on a subset of the data to predict the remainder.I Log probability as criteria rewards models with appropriately sized error

bars.

O. Stegle & K. Borgwardt GP Applications Tubingen 13

Page 26: Oliver Stegle and Karsten Borgwardt - ETH Z

Application1: modelling physiological time series Results

Block size 1 minute

Block size 1 minute

0.1 0.2 0.3 0.4 0.5 0.6−1

0

1

2

3

Fraction of missing data

<ln

(P(d

ata

)>

(i)(ii)(iii)

(iv)

(v)

I 5 different models comparedI (i) baselineI (ii) GP with ksI (iii) GP with ks + klI (iv) as (iii) but with Tlost

threshold noise modelI (v) as (iv) but with full robust

noise model

O. Stegle & K. Borgwardt GP Applications Tubingen 14

Page 27: Oliver Stegle and Karsten Borgwardt - ETH Z

Application1: modelling physiological time series Results

Block size 60 minute

Block size 60 minutes

0.1 0.2 0.3 0.4 0.5 0.6−1

−0.5

0

0.5

Fraction of missing data

<ln

(P(d

ata

)>

(i)

(ii)

(iii)

(iv)

(v)

I 5 different models comparedI (i) baselineI (ii) GP with ksI (iii) GP with ks + klI (iv) as (iii) but with Tlost

threshold noise modelI (v) as (iv) but with full robust

noise model

I Larger blocks removed rewardslong range covariance

O. Stegle & K. Borgwardt GP Applications Tubingen 15

Page 28: Oliver Stegle and Karsten Borgwardt - ETH Z

Outline

Outline

O. Stegle & K. Borgwardt GP Applications Tubingen 16

Page 29: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 2: differential gene expression

Outline

Application1: modelling physiological time seriesOverviewGaussian process prior for heart rateResults

Application 2: differential gene expressionOverviewA Gaussian process two-sample testExperimental Results on ArabidopsisDetecting Temporal Patterns of Differential Expression

Application 3: Modeling transcriptional regulation using Gaussianprocesses

Summary

O. Stegle & K. Borgwardt GP Applications Tubingen 17

Page 30: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 2: differential gene expression Overview

Response to External Stimuli

I Organisms such as plants respond toexternal stimuli:

I Heat/ColdI StarvationI Biotic stresses (fungus)I ...

I Changes in observed gene expressionlevels.

O. Stegle & K. Borgwardt GP Applications Tubingen 18

Page 31: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 2: differential gene expression Overview

Response to External Stimuli

I Organisms such as plants respond toexternal stimuli:

I Heat/ColdI StarvationI Biotic stresses (fungus)I ...

I Changes in observed gene expressionlevels.

O. Stegle & K. Borgwardt GP Applications Tubingen 18

Page 32: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 2: differential gene expression Overview

Response to External Stimuli

I Organisms such as plants respond toexternal stimuli:

I Heat/ColdI StarvationI Biotic stresses (fungus)I ...

I Changes in observed gene expressionlevels.

O. Stegle & K. Borgwardt GP Applications Tubingen 18

Page 33: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 2: differential gene expression Overview

Response to External Stimuli

I Organisms such as plants respond toexternal stimuli:

I Heat/ColdI StarvationI Biotic stresses (fungus)I ...

I Changes in observed gene expressionlevels.

O. Stegle & K. Borgwardt GP Applications Tubingen 18

Page 34: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 2: differential gene expression Overview

Differential Gene Expression

I An important goal is the identificationof differentially expressed genes.

I Identification of involved regulatorycomponents.

I Uncovering parts of the biologicalnetwork.

O. Stegle & K. Borgwardt GP Applications Tubingen 19

Page 35: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 2: differential gene expression Overview

Differential Gene Expression

I An important goal is the identificationof differentially expressed genes.

I Identification of involved regulatorycomponents.

I Uncovering parts of the biologicalnetwork.

O. Stegle & K. Borgwardt GP Applications Tubingen 19

Page 36: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 2: differential gene expression Overview

Differential Gene Expression in Time SeriesTime Series

I The response to external stimuli is a dynamic process.

I Hence the response should be studied as a function of time.

O. Stegle & K. Borgwardt GP Applications Tubingen 20

Page 37: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 2: differential gene expression Overview

Differential Gene Expression in Time SeriesTime Series

I The response to external stimuli is a dynamic process.

I Hence the response should be studied as a function of time.

10 20 30 40Time/h

1.5

2.0

2.5

3.0

3.5

4.0

4.5

5.0

5.5

Log

expr

essi

onle

vel

TreatmentControl

Non-differentially expressed

10 20 30 40Time/h

−1

0

1

2

3

4

5

6

7

Log

expr

essi

onle

vel

TreatmentControl

Differentially expressed

O. Stegle & K. Borgwardt GP Applications Tubingen 20

Page 38: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 2: differential gene expression Overview

Differential Gene Expression in Time SeriesChallenges

I Time series expression profiles vary smoothly over time.

I Noisy observations – outliers.

I Multiple replicates.

I Few observations.

I Temporal patterns (intervals) of differential gene expression.

10 20 30 40Time/h

1.5

2.0

2.5

3.0

3.5

4.0

4.5

5.0

5.5

Log

expr

essi

onle

vel

TreatmentControl

Non-differentially expressed

10 20 30 40Time/h

−1

0

1

2

3

4

5

6

7

Log

expr

essi

onle

vel

TreatmentControl

Differentially expressed

O. Stegle & K. Borgwardt GP Applications Tubingen 21

Page 39: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 2: differential gene expression A Gaussian process two-sample test

Gaussian process ModelModel comparison

I The basic idea – a comparison of two models:I The shared model: Expression levels are explained by a single process.I The independent model: Expression levels are explained by two

separate processes.

O. Stegle & K. Borgwardt GP Applications Tubingen 22

Page 40: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 2: differential gene expression A Gaussian process two-sample test

Gaussian process ModelModel comparison

I The basic idea – a comparison of two models:I The shared model: Expression levels are explained by a single process.I The independent model: Expression levels are explained by two

separate processes.

O. Stegle & K. Borgwardt GP Applications Tubingen 22

Page 41: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 2: differential gene expression A Gaussian process two-sample test

Gaussian process ModelModel comparison

I The basic idea – a comparison of two models:I The shared model: Expression levels are explained by a single process.I The independent model: Expression levels are explained by two

separate processes.

O. Stegle & K. Borgwardt GP Applications Tubingen 22

Page 42: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 2: differential gene expression A Gaussian process two-sample test

Gaussian process modelBayesian Network: Shared Model

I Data in conditions A and Bobserved at N time points withR replicates.

I A Gaussian process priorincorporates beliefs aboutsmoothness.

I Noise is is modeled separatelyper-replicate, σA/Br .

O. Stegle & K. Borgwardt GP Applications Tubingen 23

Page 43: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 2: differential gene expression A Gaussian process two-sample test

Gaussian process modelBayesian Network: Shared Model

I Data in conditions A and Bobserved at N time points withR replicates.

I A Gaussian process priorincorporates beliefs aboutsmoothness.

I Noise is is modeled separatelyper-replicate, σA/Br .

O. Stegle & K. Borgwardt GP Applications Tubingen 23

Page 44: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 2: differential gene expression A Gaussian process two-sample test

Gaussian process modelBayesian Network: Shared Model

I Data in conditions A and Bobserved at N time points withR replicates.

I A Gaussian process priorincorporates beliefs aboutsmoothness.

I Noise is is modeled separatelyper-replicate, σA/Br .

O. Stegle & K. Borgwardt GP Applications Tubingen 23

Page 45: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 2: differential gene expression A Gaussian process two-sample test

Gaussian process modelBayesian Network: Both Models

I The independent model follows in an analogous manner.

Shared model HS Independent model HI

O. Stegle & K. Borgwardt GP Applications Tubingen 24

Page 46: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 2: differential gene expression A Gaussian process two-sample test

Gaussian Process ModelInference

I Models are compared using the Bayes factor

Score = log

Independent model︷ ︸︸ ︷P (DA,DB |HI)

P (DA,DB |HS)︸ ︷︷ ︸Shared model

.

I Writing out the GP models explicitly leads to

Score = logP (YA |HGP,T

A, )P (YB |HGP,TB, )

P (YA ∪YB |HGP,TA ∪TB, ).

(YA/B : expression levels in conditions A and B; TA/B : observation time points)

O. Stegle & K. Borgwardt GP Applications Tubingen 25

Page 47: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 2: differential gene expression A Gaussian process two-sample test

Gaussian Process ModelInference

I Models are compared using the Bayes factor

Score = log

Independent model︷ ︸︸ ︷P (DA,DB |HI)

P (DA,DB |HS)︸ ︷︷ ︸Shared model

.

I Writing out the GP models explicitly leads to

Score = logP (YA |HGP,T

A,θI)P (YB |HGP,T

B,θI)

P (YA ∪YB |HGP,TA ∪TB,θS).

(YA/B : expression levels in conditions A and B; TA/B : observation time points)

O. Stegle & K. Borgwardt GP Applications Tubingen 25

Page 48: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 2: differential gene expression A Gaussian process two-sample test

Gaussian Process ModelInference

I Models are compared using the Bayes factor

Score = log

Independent model︷ ︸︸ ︷P (DA,DB |HI)

P (DA,DB |HS)︸ ︷︷ ︸Shared model

.

I Writing out the GP models explicitly leads to

Score = logP (YA |HGP,T

A, θI )P (YB |HGP,T

B, θI )

P (YA ∪YB |HGP,TA ∪TB, θS ).

(YA/B : expression levels in conditions A and B; TA/B : observation time points)

O. Stegle & K. Borgwardt GP Applications Tubingen 25

Page 49: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 2: differential gene expression A Gaussian process two-sample test

Gaussian Process ModelShared Model

I Given observed data from both conditions D = {YA/B,TA/B} theposterior distribution over latent function values f is

P (f |Y,T,θK,θL) ∝N (f |0,KT(θK))

×∏

c∈{A,B}

R∏r=1

N∏n=1

pL(ycr,tn | ftn ,θL),

I Covariance funciton (kernel)

I Noise model

I Hyperparameters θS = {θK,θL} (length scale, noise levels)

I For Gaussian noise, pL(ycr,t | f cr,t,θL) = N

(ycr,t

∣∣∣ f cr,t, (σcr)2), the

model is tractable in closed form.

O. Stegle & K. Borgwardt GP Applications Tubingen 26

Page 50: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 2: differential gene expression A Gaussian process two-sample test

Gaussian Process ModelShared Model

I Given observed data from both conditions D = {YA/B,TA/B} theposterior distribution over latent function values f is

P (f |Y,T,θK,θL) ∝N(f∣∣∣0, KT(θK)

∏c∈{A,B}

R∏r=1

N∏n=1

pL(ycr,tn | ftn ,θL),

I Covariance funciton (kernel)

I Noise model

I Hyperparameters θS = {θK,θL} (length scale, noise levels)

I For Gaussian noise, pL(ycr,t | f cr,t,θL) = N

(ycr,t

∣∣∣ f cr,t, (σcr)2), the

model is tractable in closed form.

O. Stegle & K. Borgwardt GP Applications Tubingen 26

Page 51: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 2: differential gene expression A Gaussian process two-sample test

Gaussian Process ModelShared Model

I Given observed data from both conditions D = {YA/B,TA/B} theposterior distribution over latent function values f is

P (f |Y,T,θK,θL) ∝N(f∣∣∣0, KT(θK)

∏c∈{A,B}

R∏r=1

N∏n=1

pL(ycr,tn | ftn ,θL) ,

I Covariance funciton (kernel)

I Noise model

I Hyperparameters θS = {θK,θL} (length scale, noise levels)

I For Gaussian noise, pL(ycr,t | f cr,t,θL) = N

(ycr,t

∣∣∣ f cr,t, (σcr)2), the

model is tractable in closed form.

O. Stegle & K. Borgwardt GP Applications Tubingen 26

Page 52: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 2: differential gene expression A Gaussian process two-sample test

Gaussian Process ModelShared Model

I Given observed data from both conditions D = {YA/B,TA/B} theposterior distribution over latent function values f is

P (f |Y,T, θK,θL ) ∝N(f∣∣∣0, KT(θK)

∏c∈{A,B}

R∏r=1

N∏n=1

pL(ycr,tn | ftn ,θL) ,

I Covariance funciton (kernel)

I Noise model

I Hyperparameters θS = {θK,θL} (length scale, noise levels)

I For Gaussian noise, pL(ycr,t | f cr,t,θL) = N

(ycr,t

∣∣∣ f cr,t, (σcr)2), the

model is tractable in closed form.

O. Stegle & K. Borgwardt GP Applications Tubingen 26

Page 53: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 2: differential gene expression A Gaussian process two-sample test

Gaussian Process ModelShared Model

I Given observed data from both conditions D = {YA/B,TA/B} theposterior distribution over latent function values f is

P (f |Y,T, θK,θL ) ∝N(f∣∣∣0, KT(θK)

∏c∈{A,B}

R∏r=1

N∏n=1

pL(ycr,tn | ftn ,θL) ,

I Covariance funciton (kernel)

I Noise model

I Hyperparameters θS = {θK,θL} (length scale, noise levels)

I For Gaussian noise, pL(ycr,t | f cr,t,θL) = N

(ycr,t

∣∣∣ f cr,t, (σcr)2), the

model is tractable in closed form.

O. Stegle & K. Borgwardt GP Applications Tubingen 26

Page 54: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 2: differential gene expression A Gaussian process two-sample test

Robustness With Respect to Outliers

I Outliers in the expression profilecan obscure the regressionresults.

I A mixture noise-model accountsfor outliers.

I Inference in this model is doneusing Expectation Propagation.

O. Stegle & K. Borgwardt GP Applications Tubingen 27

Page 55: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 2: differential gene expression A Gaussian process two-sample test

Robustness With Respect to Outliers

I Outliers in the expression profilecan obscure the regressionresults.

I A mixture noise-model accountsfor outliers.

6 4 2 0 2 4 60.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

I Inference in this model is doneusing Expectation Propagation.

O. Stegle & K. Borgwardt GP Applications Tubingen 27

Page 56: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 2: differential gene expression A Gaussian process two-sample test

Robustness With Respect to Outliers

I Outliers in the expression profilecan obscure the regressionresults.

I A mixture noise-model accountsfor outliers.

6 4 2 0 2 4 60.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

I Inference in this model is doneusing Expectation Propagation.

O. Stegle & K. Borgwardt GP Applications Tubingen 27

Page 57: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 2: differential gene expression Experimental Results on Arabidopsis

Illustration of the Model ComparisonA Differentially Expressed Gene

I Shared model.

O. Stegle & K. Borgwardt GP Applications Tubingen 28

Page 58: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 2: differential gene expression Experimental Results on Arabidopsis

Illustration of the Model ComparisonA Differentially Expressed Gene

I Independent model.

O. Stegle & K. Borgwardt GP Applications Tubingen 28

Page 59: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 2: differential gene expression Experimental Results on Arabidopsis

Illustration of the Model ComparisonA Differentially Expressed Gene

I Model comparison.

O. Stegle & K. Borgwardt GP Applications Tubingen 28

Page 60: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 2: differential gene expression Experimental Results on Arabidopsis

Predictive Performance (RECOMB09)

I Data:I 30,336 Arabidopsis thaliana gene probesI Biotic stress: fungus infectionI 24 time points, 4 biological replicates

I Evaluation of alternative methods on 2000 randomly chosenhuman-labeled probes:

I GP no robustI GP robustI F-Test (FT) (MAANOVA package)I Timecourse (TC) (Tai and Speed)

O. Stegle & K. Borgwardt GP Applications Tubingen 29

Page 61: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 2: differential gene expression Experimental Results on Arabidopsis

Predictive Performance (RECOMB09)

I Data:I 30,336 Arabidopsis thaliana gene probesI Biotic stress: fungus infectionI 24 time points, 4 biological replicates

I Evaluation of alternative methods on 2000 randomly chosenhuman-labeled probes:

I GP no robustI GP robustI F-Test (FT) (MAANOVA package)I Timecourse (TC) (Tai and Speed)

O. Stegle & K. Borgwardt GP Applications Tubingen 29

Page 62: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 2: differential gene expression Experimental Results on Arabidopsis

Predictive Performance (RECOMB09)ROC Curves

0 0.2 0.4 0.6 0.8 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

False positive rate

Tru

e po

sitiv

e ra

te

GP robust (AUC 0.985)GP standard (AUC 0.944)FT (AUC 0.859)TC (AUC 0.869)

O. Stegle & K. Borgwardt GP Applications Tubingen 30

Page 63: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 2: differential gene expression Detecting Temporal Patterns of Differential Expression

Time-local GPTwoSampleMotivation

I Differential expressionis not static over time.

I The responsedevelops over time.

I Different regulatorsmay be active atdistinct times andtrigger each other.

O. Stegle & K. Borgwardt GP Applications Tubingen 31

Page 64: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 2: differential gene expression Detecting Temporal Patterns of Differential Expression

Time-local GPTwoSampleMotivation

I Differential expressionis not static over time.

I The responsedevelops over time.

I Different regulatorsmay be active atdistinct times andtrigger each other.

O. Stegle & K. Borgwardt GP Applications Tubingen 31

Page 65: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 2: differential gene expression Detecting Temporal Patterns of Differential Expression

Time-local GPTwoSampleModel

I The shared andindependent model areinterleaved over time.

I Indicator variables ztnswitch between bothmodels.

I Inference is done usingGibbs sampling (later).

O. Stegle & K. Borgwardt GP Applications Tubingen 32

Page 66: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 2: differential gene expression Detecting Temporal Patterns of Differential Expression

Time-local GPTwoSampleModel

I The shared andindependent model areinterleaved over time.

I Indicator variables ztnswitch between bothmodels.

I Inference is done usingGibbs sampling (later).

O. Stegle & K. Borgwardt GP Applications Tubingen 32

Page 67: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 2: differential gene expression Detecting Temporal Patterns of Differential Expression

Time-local GPTwoSampleModel

I The shared andindependent model areinterleaved over time.

I Indicator variables ztnswitch between bothmodels.

I Inference is done usingGibbs sampling (later).

O. Stegle & K. Borgwardt GP Applications Tubingen 32

Page 68: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 2: differential gene expression Detecting Temporal Patterns of Differential Expression

Time-local GPTwoSampleExample Results

The example from before

O. Stegle & K. Borgwardt GP Applications Tubingen 33

Page 69: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 2: differential gene expression Detecting Temporal Patterns of Differential Expression

Time-local GPTwoSampleExample Results

The example from before Periodic differential expression

O. Stegle & K. Borgwardt GP Applications Tubingen 33

Page 70: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 2: differential gene expression Detecting Temporal Patterns of Differential Expression

Smooth Time-local GPTwoSample

I Transitions betweennon-differential anddifferential expressioncan occur at unobservedtime points and aresmooth.

I Extending the time-localmodel with a GP prior asgating network on theswitch variables.

I Inference using Gibbssampling.

O. Stegle & K. Borgwardt GP Applications Tubingen 34

Page 71: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 2: differential gene expression Detecting Temporal Patterns of Differential Expression

Smooth Time-local GPTwoSample

I Transitions betweennon-differential anddifferential expressioncan occur at unobservedtime points and aresmooth.

I Extending the time-localmodel with a GP prior asgating network on theswitch variables.

I Inference using Gibbssampling.

O. Stegle & K. Borgwardt GP Applications Tubingen 34

Page 72: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 2: differential gene expression Detecting Temporal Patterns of Differential Expression

Smooth Time-local GPTwoSampleGibbs Sampling

I Gibbs sampling exploits tractableconditional distributions.

I Individual indicators zti are resampledin turn

P(zti = s | z\ti ,T,Y,θS,θI,θG

)∼ P

(Y | zti = s, z\ti ,T,θI,θS

)× P

(zti = s | z\ti ,θG

).

I Data likelihood from GP experts

I Predictive distribution from gating network

O. Stegle & K. Borgwardt GP Applications Tubingen 35

Page 73: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 2: differential gene expression Detecting Temporal Patterns of Differential Expression

Smooth Time-local GPTwoSampleGibbs Sampling

I Gibbs sampling exploits tractableconditional distributions.

I Individual indicators zti are resampledin turn

P(zti = s | z\ti ,T,Y,θS,θI,θG

)∼ P

(Y | zti = s, z\ti ,T,θI,θS

)× P

(zti = s | z\ti ,θG

).

I Data likelihood from GP experts

I Predictive distribution from gating network

O. Stegle & K. Borgwardt GP Applications Tubingen 35

Page 74: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 2: differential gene expression Detecting Temporal Patterns of Differential Expression

Smooth Time-local GPTwoSampleGibbs Sampling

I Gibbs sampling exploits tractableconditional distributions.

I Individual indicators zti are resampledin turn

P(zti = s | z\ti ,T,Y,θS,θI,θG

)∼ P

(Y | zti = s, z\ti ,T,θI,θS

)× P

(zti = s | z\ti ,θG

).

I Data likelihood from GP experts

I Predictive distribution from gating network

O. Stegle & K. Borgwardt GP Applications Tubingen 35

Page 75: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 2: differential gene expression Detecting Temporal Patterns of Differential Expression

Smooth Time-local GPTwoSampleGibbs Sampling

I Gibbs sampling exploits tractableconditional distributions.

I Individual indicators zti are resampledin turn

P(zti = s | z\ti ,T,Y,θS,θI,θG

)∼ P

(Y | zti = s, z\ti ,T,θI,θS

)× P

(zti = s | z\ti ,θG

).

I Data likelihood from GP experts

I Predictive distribution from gating network

O. Stegle & K. Borgwardt GP Applications Tubingen 35

Page 76: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 2: differential gene expression Detecting Temporal Patterns of Differential Expression

Smooth Time-local GPTwoSampleExample Results

Early differential expression Transient differential expression

O. Stegle & K. Borgwardt GP Applications Tubingen 36

Page 77: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 2: differential gene expression Detecting Temporal Patterns of Differential Expression

Detecting Transition Points in Arabidopsis Microarray Time SeriesStart/Stop Times

I Studying the temporaldistribution of differentialexpression across genes.

I Considered were the top 6000differentially expressed genes.

I Differential expression appearsto occur in two waves with starttimes at 20h an 25h afterinfection.

I Only very few genes stopdifferential behavior within themeasured time interval.

Distribution of differential start/stop time

O. Stegle & K. Borgwardt GP Applications Tubingen 37

Page 78: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 2: differential gene expression Detecting Temporal Patterns of Differential Expression

Detecting Transition Points in Arabidopsis Microarray Time SeriesStart/Stop Times

I Studying the temporaldistribution of differentialexpression across genes.

I Considered were the top 6000differentially expressed genes.

I Differential expression appearsto occur in two waves with starttimes at 20h an 25h afterinfection.

I Only very few genes stopdifferential behavior within themeasured time interval.

Distribution of differential start/stop time

O. Stegle & K. Borgwardt GP Applications Tubingen 37

Page 79: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 2: differential gene expression Detecting Temporal Patterns of Differential Expression

Detecting Transition Points in Arabidopsis Microarray Time SeriesStart/Stop Times

I Studying the temporaldistribution of differentialexpression across genes.

I Considered were the top 6000differentially expressed genes.

I Differential expression appearsto occur in two waves with starttimes at 20h an 25h afterinfection.

I Only very few genes stopdifferential behavior within themeasured time interval.

Distribution of differential start/stop time

O. Stegle & K. Borgwardt GP Applications Tubingen 37

Page 80: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 2: differential gene expression Detecting Temporal Patterns of Differential Expression

Detecting Transition Points in Arabidopsis Microarray Time SeriesStart Times for Gene Categories

I This distribution can bebroke down into genecategories.

I WRKY Family oftranscription factors isknown to be involved instress response.

Distribution of differential start time

O. Stegle & K. Borgwardt GP Applications Tubingen 38

Page 81: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 2: differential gene expression Detecting Temporal Patterns of Differential Expression

Detecting Transition Points in Arabidopsis Microarray Time SeriesStart Times for Gene Categories

I This distribution can bebroke down into genecategories.

I WRKY Family oftranscription factors isknown to be involved instress response.

Distribution of differential start time

O. Stegle & K. Borgwardt GP Applications Tubingen 38

Page 82: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 3: Modeling transcriptional regulation using Gaussianprocesses

Outline

Application1: modelling physiological time seriesOverviewGaussian process prior for heart rateResults

Application 2: differential gene expressionOverviewA Gaussian process two-sample testExperimental Results on ArabidopsisDetecting Temporal Patterns of Differential Expression

Application 3: Modeling transcriptional regulation using Gaussianprocesses

Summary

O. Stegle & K. Borgwardt GP Applications Tubingen 39

Page 83: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 3: Modeling transcriptional regulation using Gaussianprocesses

Motivation

I This part of the course is inspired and based on results of apublication by Neil D. Lawrence et al.Modelling transcriptional regulation using Gaussian processesftp://ftp.dcs.shef.ac.uk/home/neil/gpsim.pdf.

O. Stegle & K. Borgwardt GP Applications Tubingen 40

Page 84: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 3: Modeling transcriptional regulation using Gaussianprocesses

Motivation

I Microarray technologies allow tomeasure mRNA levels.

I The functional proteins and theirconcentration levels remain unobserved.

I Motivation: Infer the hidden proteinconcentrations?

O. Stegle & K. Borgwardt GP Applications Tubingen 41

Page 85: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 3: Modeling transcriptional regulation using Gaussianprocesses

Motivation

I Microarray technologies allow tomeasure mRNA levels.

I The functional proteins and theirconcentration levels remain unobserved.

I Motivation: Infer the hidden proteinconcentrations?

O. Stegle & K. Borgwardt GP Applications Tubingen 41

Page 86: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 3: Modeling transcriptional regulation using Gaussianprocesses

Motivation

I Microarray technologies allow tomeasure mRNA levels.

I The functional proteins and theirconcentration levels remain unobserved.

I Motivation: Infer the hidden proteinconcentrations?

O. Stegle & K. Borgwardt GP Applications Tubingen 41

Page 87: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 3: Modeling transcriptional regulation using Gaussianprocesses

A single gene model

I The change in gene expression abundance yi for a gene i isapproximately described by a differential equation model of the form

dyi(t)

dt= Bi + Sif(t)−Diyi(t)

I f(t) regulatory transcription factor.I Bi basal transcription rate.I Si sensitivity of the gene to the transcription factor.I Di decay rate of the mRNA.

I Goal: infer the unobserved activation f(t) from mRNA measurementsof multiple target genes.

O. Stegle & K. Borgwardt GP Applications Tubingen 42

Page 88: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 3: Modeling transcriptional regulation using Gaussianprocesses

A single gene model

I The change in gene expression abundance yi for a gene i isapproximately described by a differential equation model of the form

dyi(t)

dt= Bi + Sif(t)−Diyi(t)

I f(t) regulatory transcription factor.I Bi basal transcription rate.I Si sensitivity of the gene to the transcription factor.I Di decay rate of the mRNA.

I Goal: infer the unobserved activation f(t) from mRNA measurementsof multiple target genes.

O. Stegle & K. Borgwardt GP Applications Tubingen 42

Page 89: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 3: Modeling transcriptional regulation using Gaussianprocesses

Derivative observations

I The key to solving this problem are derivative observations.

I Given knowledge about the derivative of a function f we would like toinfer its function values:

dy =∂f(t)

∂t

I We wish to find the joint probability of function values and functionderivatives

cov(dyi, yj) =∂

∂ticov(yi, yj)

cov(dyi, dyj) =∂2

∂ti∂tjcov(yi, yj)

I Using these covariance functions we can combine functionobservations and derivatives as training data in GP regression.

O. Stegle & K. Borgwardt GP Applications Tubingen 43

Page 90: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 3: Modeling transcriptional regulation using Gaussianprocesses

Derivative observations

I The key to solving this problem are derivative observations.

I Given knowledge about the derivative of a function f we would like toinfer its function values:

dy =∂f(t)

∂t

I We wish to find the joint probability of function values and functionderivatives

cov(dyi, yj) =∂

∂ticov(yi, yj)

cov(dyi, dyj) =∂2

∂ti∂tjcov(yi, yj)

I Using these covariance functions we can combine functionobservations and derivatives as training data in GP regression.

O. Stegle & K. Borgwardt GP Applications Tubingen 43

Page 91: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 3: Modeling transcriptional regulation using Gaussianprocesses

Derivative observationsSquared exponential kernel

I For the squared exponential kernel we obtain:

cov(yi, yj) = k(ti, tj) = A2e−0.5(ti−tj)

2

L2

cov(dyi, yj) = −cov(yi, yj)(ti − tj)L2

cov(dyi, dyj) = cov(yi, yj)1

L2

[δi,j −

1

L2(ti − tj)2

]

O. Stegle & K. Borgwardt GP Applications Tubingen 44

Page 92: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 3: Modeling transcriptional regulation using Gaussianprocesses

Derivative observationsExample

(From E. Solak et al.

Derivative observations in Gaussian Process models of dynamic systems)

O. Stegle & K. Borgwardt GP Applications Tubingen 45

Page 93: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 3: Modeling transcriptional regulation using Gaussianprocesses

Back to the ODE model for gene regulation

dyi(t)

dt= Bi + Sif(t)−Diyi(t)

I An explicit solution of the ODE system can be derived (standard ODEtechniques)

yi(t) =BiDi

+ kie−Dit + Sie

−Dit

∫ t

0

f(u) exp(Diu)du

yi(t) =BiDi

+ Li[f ](t)

I Realizing that Li is a linear operator (like taking the derivative), wecan again evaluate the covariance between f(t) and Li[f ](t).

O. Stegle & K. Borgwardt GP Applications Tubingen 46

Page 94: Oliver Stegle and Karsten Borgwardt - ETH Z

Application 3: Modeling transcriptional regulation using Gaussianprocesses

ODE model for gene regulationInference results

(From N. D. Lawrence et al.

Modelling transcriptional regulation using Gaussian processes)

O. Stegle & K. Borgwardt GP Applications Tubingen 47

Page 95: Oliver Stegle and Karsten Borgwardt - ETH Z

Summary

Outline

Application1: modelling physiological time seriesOverviewGaussian process prior for heart rateResults

Application 2: differential gene expressionOverviewA Gaussian process two-sample testExperimental Results on ArabidopsisDetecting Temporal Patterns of Differential Expression

Application 3: Modeling transcriptional regulation using Gaussianprocesses

Summary

O. Stegle & K. Borgwardt GP Applications Tubingen 48

Page 96: Oliver Stegle and Karsten Borgwardt - ETH Z

Summary

Summary

I The design and choice of covariance functions allows for flexiblemodeling tasks.

I Prior on heart rateI Derivative observations, ODE systems

I Model comparison using Gaussian processes.I Testing for differential gene expression.

O. Stegle & K. Borgwardt GP Applications Tubingen 49