Processing of high-dimensional data for mathematical modeling

Clemens Kreutz, Jens Timmer

July 14th, 2009


[email protected]@fdm.uni-freiburg.de

Preprocessing: Many competing strategies

There are many competing reasonable approaches.

A change in the data processing has noticeable effects on the outcomes.

[Figure: preprocessing steps; recoverable labels: block, spot, quality filtering]

Affymetrix: Dependency on the preprocessing

(A) All genes

[Figure: panels "Data (ranks)", "Mean squares (ranks)", and "Test statistic (ranks)"; both axes: data processing approach (g, r, m, v)]

(B) Present called genes

[Figure: panels "Data (ranks)", "Mean squares (ranks)", and "Test statistic (ranks)"; both axes: data processing approach (g, r, m, v, p)]

Cooperation with AG Walz/T. Kurz

Affymetrix: Dependency on the preprocessing

"Present" called genes

[Figure: concordance between data processing approaches (gcrma, rma, mas5, vsn, plier); labels: Mean squares, P-values, No. of probe sets]

Cooperation with AG Walz/T. Kurz
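One way to quantify concordance between two data processing approaches is sketched below. It assumes that each approach yields one p-value per probe set and reports, for several list sizes n, the fraction of probe sets shared by the two top-n lists. The function name `concordance_curve`, the simulated p-values, and the list sizes are illustrative assumptions and not the analysis behind the figure.

```python
import numpy as np

def concordance_curve(p_a, p_b, sizes):
    """Fraction of probe sets shared by the top-n lists of two approaches.

    p_a, p_b: p-values per probe set (same probe-set order) from two
    hypothetical processing approaches; sizes: list sizes n to evaluate.
    """
    p_a = np.asarray(p_a, dtype=float)
    p_b = np.asarray(p_b, dtype=float)
    if p_a.shape != p_b.shape:
        raise ValueError("both approaches must cover the same probe sets")
    order_a = np.argsort(p_a, kind="stable")  # most significant first
    order_b = np.argsort(p_b, kind="stable")
    curve = {}
    for n in sizes:
        shared = np.intersect1d(order_a[:n], order_b[:n]).size
        curve[n] = shared / n
    return curve

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Simulated p-values: approach B is a noisy version of approach A.
    p_a = rng.uniform(size=1000)
    p_b = np.clip(p_a * np.exp(rng.normal(scale=0.5, size=1000)), 0.0, 1.0)
    for n, c in concordance_curve(p_a, p_b, [10, 50, 100, 500]).items():
        print(f"top {n:4d}: concordance {c:.2f}")
```

A concordance of 1 means both approaches select the same probe sets at that list size; values well below 1 indicate that the choice of preprocessing changes which probe sets appear regulated.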

cDNA chips: Dependency on the preprocessing

[Figure: panels "Data (ranks)", "Mean squares (ranks)", and "Test statistic (ranks)"; both axes: data processing approach]

Cooperation with AG Walz/T. Kurz

cDNA chips: Dependency on the preprocessing

[Figure: concordance between data processing approaches; labels: Mean squares, P-values, No. of probe sets]

Cooperation with AG Walz/T. Kurz

Microarrays for FFPE tissues

Ø Formalin fixation and paraffin embedding (FFPE) is a standard sample preparation in histopathology.

Ø So far, microarrays were not applicable to FFPE samples.

We could show for FFPE colorectal tumor tissues that

1. Adapted sample preparation

2. Latest chip technology

3. Adapted normalization

yield informative data.

Cooperation with AG Werner/S. Lassmann

FFPE tissues yield informative data

[Figure: panel labels FF, FFPE]

Correction for the FFPE bias

Cooperation with AG Werner/S. Lassmann

FFPE tissues yield informative data

[Figure: histograms of correlation (x-axis) against frequency (y-axis) for fresh-frozen samples and FFPE samples]

Probe sets from the same gene are correlated.

[Figure: overlap diagram with counts 56, 44, 56; labels 44% and 0.18%]

44% overlap within the 100 most strongly regulated genes between the samples.

Cooperation with AG Werner/S. Lassmann
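To put an overlap between two top-100 lists into perspective, it can be compared with what two unrelated lists would share by chance. The sketch below is a generic calculation and not taken from the slides: the function names, the toy gene names, and the total of 20000 genes are hypothetical, and the chance model assumes both lists are independent random draws from the same set of genes.

```python
from math import comb

def top_list_overlap(list_a, list_b):
    """Number of genes shared by two top lists."""
    return len(set(list_a) & set(list_b))

def expected_chance_overlap(k, n_genes):
    """Expected overlap fraction of two random top-k lists from n_genes genes."""
    return k / n_genes

def chance_overlap_pvalue(overlap, k, n_genes):
    """Probability that two independent random top-k lists share at least
    `overlap` genes (upper tail of the hypergeometric distribution)."""
    total = comb(n_genes, k)
    tail = sum(comb(k, x) * comb(n_genes - k, k - x)
               for x in range(overlap, k + 1))
    return tail / total

if __name__ == "__main__":
    # Toy lists with hypothetical gene names.
    fresh_frozen_top = ["geneA", "geneB", "geneC", "geneD"]
    ffpe_top = ["geneB", "geneC", "geneE", "geneF"]
    print("shared genes:", top_list_overlap(fresh_frozen_top, ffpe_top))
    # n_genes = 20000 is an assumed total, not a number from the slides.
    print("expected chance overlap:", expected_chance_overlap(100, 20000))
    print("P(overlap >= 44 by chance):", chance_overlap_pvalue(44, 100, 20000))
```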

Microarray-based classification of CLL patients

Chronic lymphocytic leukemia, N=51, Affymetrix

Ø Prediction of the mutational status of the immunoglobulin heavy-chain gene:

IgV(H) mutated vs. IgV(H) not mutated (34 vs. 17)

Ø The mutational status is known to be related to the prognosis

Ø Proof of concept study

Ø Classification performance evaluated for many classifiers, dimension reduction methods, and cross-validation approaches

Cooperation with AG Veelken/D. Pfeifer

Prediction of the mutational status

~75-85% correct out-of-sample predictions

Cooperation with AG Veelken/D. Pfeifer

Effect size based dimension reduction and classification

LOOCV

Cooperation with AG Veelken/D. Pfeifer
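A minimal sketch of how effect size based gene selection can be combined with leave-one-out cross validation (LOOCV) follows. The slides do not state which effect size or classifier was used, so the standardized mean difference, the nearest-centroid rule, the function names, and the simulated data are assumptions for illustration only. The point of the sketch is that the gene selection is repeated inside every fold, so the left-out sample never influences the dimension reduction.

```python
import numpy as np

def effect_sizes(X, y):
    """Absolute standardized mean difference per gene between classes 0 and 1."""
    a, b = X[y == 0], X[y == 1]
    pooled_var = (((len(a) - 1) * a.var(axis=0, ddof=1)
                   + (len(b) - 1) * b.var(axis=0, ddof=1))
                  / (len(a) + len(b) - 2))
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / np.sqrt(pooled_var + 1e-12)

def loocv_accuracy(X, y, n_genes=20):
    """Leave-one-out accuracy; gene selection is repeated in every fold."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    correct = 0
    for i in range(len(y)):
        train = np.arange(len(y)) != i
        X_tr, y_tr = X[train], y[train]
        # Rank genes on the training samples only and keep the top n_genes.
        top = np.argsort(effect_sizes(X_tr, y_tr))[::-1][:n_genes]
        # Nearest-centroid classification in the reduced gene space.
        centroids = np.stack([X_tr[y_tr == c][:, top].mean(axis=0)
                              for c in (0, 1)])
        dist = np.linalg.norm(centroids - X[i, top], axis=1)
        correct += int(np.argmin(dist) == y[i])
    return correct / len(y)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    y = np.array([0] * 34 + [1] * 17)   # class sizes as in the CLL study
    X = rng.normal(size=(51, 500))      # simulated expression matrix
    X[y == 1, :10] += 1.5               # 10 informative genes (assumption)
    print(f"LOOCV accuracy: {loocv_accuracy(X, y):.2f}")
```

Selecting genes once on all samples and cross-validating only the classifier would give an optimistically biased accuracy, which is why the selection step sits inside the loop.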

New approach for gene set analyses

Ø Investigation of the behavior of functionally related genes

Ø We estimate the fraction of regulated genes in a gene set.

[Figure: Regulated Gene Ontology Categories; legend: not sign., marginal, sign.]

[Figure: two histograms of p-values (x-axis: p-value from 0 to 1, y-axis: frequency), one for no regulation and one for many regulated genes]

Cooperation with AG Veelken/D. Pfeifer
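The idea of reading the fraction of regulated genes off a p-value histogram can be sketched as follows. The slides do not describe the estimator itself, so this example substitutes a well-known stand-in, a Storey-type estimate of the unregulated fraction: without regulation, p-values are uniform on [0, 1], so the share of p-values above a threshold `lam` indicates how many genes are unregulated. The function name and the default `lam=0.5` are assumptions.

```python
import numpy as np

def fraction_regulated(p_values, lam=0.5):
    """Estimate the fraction of regulated genes in a gene set.

    Under no regulation p-values are uniform on [0, 1], so the share of
    p-values above `lam`, rescaled by 1 / (1 - lam), estimates the
    unregulated fraction pi0. The regulated fraction is 1 - pi0.
    """
    p = np.asarray(p_values, dtype=float)
    if p.size == 0:
        raise ValueError("gene set is empty")
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must lie strictly between 0 and 1")
    pi0 = np.mean(p > lam) / (1.0 - lam)
    return float(1.0 - min(pi0, 1.0))

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    unregulated = rng.uniform(size=700)             # flat histogram
    regulated = rng.beta(0.3, 8.0, size=300)        # p-values piled up near 0
    gene_set = np.concatenate([unregulated, regulated])
    print(f"estimated regulated fraction: {fraction_regulated(gene_set):.2f}")
```

A flat histogram gives an estimate near 0, while a histogram with a pronounced peak near 0 gives a larger estimate, matching the two cases shown in the figure.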

Tiling array data and differentially expressed regions

[Figure: x-axis: position on the DNA]

AG Timmer vs. AG Schumacher

Gene expression as predictor, clinical response as response

Exp. condition as predictor, gene expression as response

Clinical applications of high-throughput data

Identification of the processes at the molecular level

Heterogeneous clinical samples (e.g. blood, clinical covariates)

Homogeneous biological samples (e.g. cell lines)

Different data processing, platform specific calibration

Established data processing

Mechanistic explanation and modeling of the dynamics (e.g. regulatory networks)

Phenomenological prediction (e.g. for diagnosis, prognosis)

Core Facility for Data Analysis (ZBSA)

Ø Adequate statistical analyses

Ø Experimental design

Ø Development of new methodological approaches

Ø Platform specific data processing

Ø Cooperation with the experimental core facilities (e.g. standard processing workflows)

Ø Cooperation with the IMBI

Ø Dynamic modeling of molecular interaction networks

Thanks for your attention.

[email protected]@fdm.uni-freiburg.de