Page 1: Software for the analysis of data from cell-based …marray.economia.unimi.it/2007/material/day5/Lecture10a.pdfSoftware for the analysis of data from cell-based assays Functional Profiling

Functional Genomics and Bioconductor

Software for the analysis of data from cell-based assays

Page 2

Functional Profiling

• each protein has one or several specific function(s) in the cell

• for a large part of the proteins the function is still unclear

• some functional information may be found in the protein structure or through homology

• the context/cellular environment is important for the function: study function within that context

Page 3

Functional Profiling: Identification of Disease Genes

[Figure: selection of candidate genes. Labels: 21,000+ human cDNAs (~genes), genome-wide; microarray study (cancer vs. normal, in vitro); disease-associated genes; cellular assay (in vivo); "hot" candidates]

Page 4

How to infer a protein's function

[Figure: perturbation (+/-) and the resulting phenotype]

but: phenotype ≠ function!

Page 5

Design of cell-based assays

• means to monitor effect of perturbation

expression or activation state of key regulatory proteins (fluorescence reader, FACS, automated microscope)

• means to monitor perturbation (beneficial but not mandatory)

expression of fluorescence protein tag

• system to willfully manipulate expression level of certain genes in cells

up regulation (transfection of expression vectors)

down regulation (RNA interference)

Page 6

RNAi as a loss of function perturbator

gene-sequence specific reagents (e.g. siRNAs)

easy to make for any gene (there are caveats...)

[Figure: in living cells, transcription of the gene into mRNA and translation into protein; additional labelled step: degradation]

Page 7

RNAi as a loss of function perturbator

[Figure: next step of the same diagram, with the labels gene, transcription, mRNA, degradation and protein; the translation label is no longer shown]

Page 8

RNAi as a loss of function perturbator

[Figure: last step of the same diagram, with the labels gene, transcription and mRNA; the protein and degradation labels are no longer shown]

Page 9

What is a phenotype: it all depends on the assay

Any cellular process can be probed.
- (de-)activation of a signaling pathway
- cell differentiation
- changes in the cell cycle dynamics
- morphological changes
- activation of apoptosis
Similarly, for organisms (e.g. fly embryos, worms)

Phenotypes can be registered at various levels of detail
- yes/no alternative
- single quantitative variable
- tuple of quantitative variables
- image
- time course

Page 10

Monitoring Tools

Plate reader: 96 or 384 well, 1…4 measurements per well

FACS: 4…8 measurements per cell, thousands of cells per well

Automated Microscopy: unlimited

Page 11

Bioconductor packages for cell-based assays

cellHTS (Ligia Bras, M. Boutros)
genome-wide screens with scalar (or low-dimensional) read-out
data management, normalization, quality assessment, visualization, hit scoring, reproducibility, publication
raw data -> annotated hit list

prada (Florian Hahne); flowCore et al. (B. Ellis, P. Haaland, N. Lemeur, F. Hahne)
flow cytometry
data management

EBImage (O. Sklyar)
image processing and analysis
construction of feature extraction workflows for large sets of similar images

imageHTS (O. Sklyar, F. Fuchs, M. Boutros) (scheduled for release 2.1)
web-based presentation of high-content screening data and results

Page 12

Visualization: plate plots

visualization of results

plate plots as graphical representation of experimental entities

• false color coding for concise display of numeric outcomes from statistical analyses

• HTML image map allows for hyperlinking to include further information for each well

quantitative

[Figure: plate plot of a quantitative read-out (cell number)]

Page 13

Visualization: plate plots

qualitative

Page 14

Visualization: plate plots

additional information

Page 15

Visualization: plate plots

replicates

Page 16

The cellHTS package

Bioconductor package for the analysis of cell-based high-throughput screening (HTS) assays

genome-wide screens with scalar (or low-dimensional) read-out

Manage all data and metadata relevant for interpreting a cell-based screen

Data cleaning, preprocessing, primary statistical analysis

Raw data -> annotated hit list

Boutros, Bras, Huber. Analysis of cell-based RNAi screens. Genome Biology (2006)

Page 17

The cellHTS package: workflow

Page 18

The cellHTS package

Quality Report rendered in HTML

per plate quality assessment
• Dynamic range
• Distribution of the intensity values for each replicate
• Scatterplot between replicates and correlation coefficient
• Plate plots for individual replicates and for standard deviation between replicates

per experiment quality assessment
• Boxplots grouped by plate
• Distribution of the signal in the control wells, Z'-factor

whole screen visualization

Page 19

Z'-factor

$$Z' = 1 - \frac{3\,(\sigma_p + \sigma_n)}{|\mu_p - \mu_n|}$$

(p: positive controls, n: negative controls)

Zhang JH, Chung TD, Oldenburg KR, "A Simple Statistical Parameter for Use in Evaluation and Validation of High Throughput Screening Assays." J Biomol Screen. 1999;4(2):67-73.
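The Z'-factor can be computed directly from the control wells of a plate. The following is a minimal sketch in Python (standard library only); the function name and the control intensities are made up for illustration and are not taken from cellHTS.

```python
from statistics import mean, stdev

def z_prime_factor(positive, negative):
    """Z'-factor: 1 - 3 * (sd_p + sd_n) / |mean_p - mean_n|."""
    separation = abs(mean(positive) - mean(negative))
    if separation == 0:
        raise ValueError("positive and negative controls have the same mean")
    return 1.0 - 3.0 * (stdev(positive) + stdev(negative)) / separation

# Hypothetical control-well intensities from one plate.
pos_controls = [10.0, 12.0, 11.0, 9.0, 13.0]
neg_controls = [100.0, 104.0, 98.0, 102.0, 96.0]
print(round(z_prime_factor(pos_controls, neg_controls), 3))
```

Values close to 1 indicate well separated controls; the value drops as the control distributions widen or move closer together.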

Page 20

Plate to plate variability

[Figure: luminescence (y-axis) by (384-well) plate ID (x-axis)]

Page 21

Normalization: Plate effects

($x_{ki}$: k-th well, i-th plate)

Percent of control

$$x'_{ki} = \frac{x_{ki}}{\mu_i^{\mathrm{pos}}} \times 100$$

Normalized percent inhibition

$$x'_{ki} = \frac{\mu_i^{\mathrm{pos}} - x_{ki}}{\mu_i^{\mathrm{pos}} - \mu_i^{\mathrm{neg}}} \times 100$$

z-score

$$x'_{ki} = \frac{x_{ki} - \mu_i}{\sigma_i}$$
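The three per-plate normalizations above can be sketched as follows. This is an illustrative Python version (standard library only), assuming that the plate mean and standard deviation for the z-score are taken over the wells passed in; function names and numbers are hypothetical.

```python
from statistics import mean, stdev

def percent_of_control(x, pos):
    """x'_k = x_k / mean(pos) * 100 for the wells x of one plate."""
    mu_pos = mean(pos)
    return [100.0 * v / mu_pos for v in x]

def normalized_percent_inhibition(x, pos, neg):
    """x'_k = (mean(pos) - x_k) / (mean(pos) - mean(neg)) * 100."""
    mu_pos, mu_neg = mean(pos), mean(neg)
    return [100.0 * (mu_pos - v) / (mu_pos - mu_neg) for v in x]

def z_score(x):
    """x'_k = (x_k - mean(x)) / sd(x), using the wells of the same plate."""
    mu, sigma = mean(x), stdev(x)
    return [(v - mu) / sigma for v in x]

# Hypothetical intensities for one plate.
samples = [50.0, 100.0, 150.0, 200.0]
pos, neg = [200.0, 200.0], [0.0, 0.0]
print(percent_of_control(samples, pos))
print(normalized_percent_inhibition(samples, pos, neg))
print(z_score(samples))
```

Percent of control and normalized percent inhibition depend on the controls of the plate, while the z-score only uses the sample wells themselves.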

Page 22

Spatial normalization

B-score: two-way median polish

$$x'_{rci} = \frac{x_{rci} - (\hat{\mu}_i + \hat{R}_{ri} + \hat{C}_{ci})}{\mathrm{MAD}_i}$$

(r-th row, c-th column, i-th plate)

[Figure: plate before and after normalization, with the fitted row and column effects]

Malo et al., Nat. Biotech. 2006
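To make the B-score concrete, here is a small Python sketch (standard library only) of a two-way median polish followed by scaling of the residuals with the plate MAD. It is an illustration, not the cellHTS code; the fixed number of sweeps and the MAD scaling constant 1.4826 are assumptions, and the 3 x 3 "plate" is a toy example.

```python
from statistics import median

def median_polish(plate, n_iter=10):
    """Two-way median polish: plate[r][c] = overall + row[r] + col[c] + residual[r][c]."""
    resid = [list(row) for row in plate]
    n_rows, n_cols = len(resid), len(resid[0])
    overall, row_eff, col_eff = 0.0, [0.0] * n_rows, [0.0] * n_cols
    for _ in range(n_iter):
        # Sweep the row medians out of the residuals.
        for r in range(n_rows):
            m = median(resid[r])
            row_eff[r] += m
            resid[r] = [v - m for v in resid[r]]
        m = median(col_eff)
        overall += m
        col_eff = [e - m for e in col_eff]
        # Sweep the column medians out of the residuals.
        for c in range(n_cols):
            m = median(resid[r][c] for r in range(n_rows))
            col_eff[c] += m
            for r in range(n_rows):
                resid[r][c] -= m
        m = median(row_eff)
        overall += m
        row_eff = [e - m for e in row_eff]
    return overall, row_eff, col_eff, resid

def b_scores(plate, constant=1.4826):
    """Residuals of the median polish divided by the (scaled) MAD of the plate."""
    _, _, _, resid = median_polish(plate)
    flat = [v for row in resid for v in row]
    center = median(flat)
    mad = constant * median(abs(v - center) for v in flat)
    if mad == 0:
        raise ValueError("MAD of the residuals is zero; B-score is undefined")
    return [[v / mad for v in row] for row in resid]

# Toy 3 x 3 plate (hypothetical values).
toy_plate = [[1, 5, 3], [4, 2, 9], [7, 8, 0]]
for row in b_scores(toy_plate):
    print([round(v, 2) for v in row])
```

Because medians are used throughout, a few strong hits on a plate have little influence on the fitted row and column effects.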

Page 23

Normalization and library design

Hek293 cells, viability screen, Boutros Lab DKFZ

Page 24

Normalization problem… Too many hits

Plate 26

proteasome subunits or components;

ATP/GTP-binding site motifs

ribosomal proteins

like-Sm nucleoproteins and ribosomal proteins

Page 25

How to estimate the normalization parameters?

From which data points:
• Based on the intensities of the controls, if they work uniformly well across all plates
• Based on the intensities of the samples: invoke assumptions such as "most genes have no effect", or "same distribution of effect sizes"

Which estimator:
mean vs median vs shorth
standard deviation vs MAD vs IQR

In the best case, it doesn't matter. No universally optimal answer, it depends on the data.
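The difference between these estimators shows up as soon as a plate contains a few strong hits. The sketch below (Python, standard library only) compares them on made-up numbers. The shorth is implemented here as the mean of the shortest interval covering half of the values, which is an assumption about the exact definition; the MAD constant 1.4826 and the quartile method are likewise assumptions.

```python
from math import ceil
from statistics import mean, median, quantiles, stdev

def shorth(x):
    """Mean of the values in the shortest interval that covers half of the data."""
    xs = sorted(x)
    h = ceil(len(xs) / 2)
    start = min(range(len(xs) - h + 1), key=lambda i: xs[i + h - 1] - xs[i])
    return mean(xs[start:start + h])

def mad(x, constant=1.4826):
    """Median absolute deviation, scaled to match the sd for normal data."""
    m = median(x)
    return constant * median(abs(v - m) for v in x)

def iqr(x):
    """Interquartile range based on the default quartiles of the statistics module."""
    q1, _, q3 = quantiles(x, n=4)
    return q3 - q1

# Hypothetical well intensities: a bulk of inactive samples and two strong outliers.
values = [1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 20.0, 30.0]
print("location:", mean(values), median(values), shorth(values))
print("scale:   ", stdev(values), mad(values), iqr(values))
```

On data like this the mean and standard deviation are pulled towards the outliers, while median, shorth and MAD stay close to the bulk of the distribution.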

Page 26

Estimators of location

[Figure: histogram of x (x-axis from -2 to 10) with the location estimates mean, median, shorth and half.range.mode marked]

Page 27

4 different siRNAs per gene

[Figure: panels labelled by well position, among them G03, F11, H13, I17, A04, H17, A01, B01, J12, I01, B04, A10, A07, B06, C05, K01, A02, C12 and F09]

Page 28

FACS: fluorescence activated cell sorting (= flow cytometry)

[Figure: flow cytometer schematic with laser, light scatter detector and fluorescence detectors]

• measures fluorescence intensities as well as morphological parameters on the basis of light emission

• offers single cell resolution

• robust, reliable, flexible

Page 29

flowCore package: overview

package flowCore contains data structures and functionality for flow cytometry data

• data import
• data management
• data preprocessing
  - transformation
  - filtering
• flow-specific procedures
  - gating

associated packages:
• flowViz: visualization
• flowQ: quality assessment
• flowUtils: utilities

compatibility with other software by following the standardized description of flow cytometry data

Page 30

Data import and data structures

• FCS 3.0 files
  - standardized storage format for FACS data
  - contains fluorescence values in data segment, wealth of meta data in text segment
  - can be imported into R

• flowFrame
  R internal representation of data from one FCS file
  - raw data matrix
  - list of meta data

• flowSet
  R internal representation of data from several FCS files (e.g. one 96 well plate)

Page 31

Software implementation

flowFrame: (components: description, parameters, data)

read.FCS(file): construct flowFrame from FCS file
exprs(cytoFrame): get/set data matrix
description(cytoFrame): get meta data
plot(cytoFrame): smoothed scatter plot, histogram
cytoFrame[1,]: subsetting

flowSet: (components: phenoData, frames)

phenoData(flowSet): get experiment meta data
fsApply(flowSet, foo): apply function to each frame
flowSet[1:3]: subsetting to flowSet
flowSet[[1]]: subsetting to flowFrame

Page 32

Gating/Filtering

Gate: region in multidimensional space defining the filtering operation of a subset of the cell population

• rectangle gates
• polygon gates
• ellipsoid gates
• data-driven gates

interactive drawing

gate arithmetic: G1 ∪ G2, G1 ∩ G2, G1 \ G2, G1 ∆ G2
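Gate arithmetic follows naturally once a gate is treated as a membership test on single events. The sketch below is plain Python and only illustrates the idea; the `Gate` class, `rectangle_gate` and the event layout (a dict per cell) are hypothetical and do not mirror the flowCore classes.

```python
class Gate:
    """A gate is represented by a membership test on a single event (cell)."""

    def __init__(self, contains):
        self.contains = contains

    def __or__(self, other):   # union, G1 ∪ G2
        return Gate(lambda e: self.contains(e) or other.contains(e))

    def __and__(self, other):  # intersection, G1 ∩ G2
        return Gate(lambda e: self.contains(e) and other.contains(e))

    def __sub__(self, other):  # difference, G1 \ G2
        return Gate(lambda e: self.contains(e) and not other.contains(e))

    def __xor__(self, other):  # symmetric difference, G1 ∆ G2
        return Gate(lambda e: self.contains(e) != other.contains(e))

    def filter(self, events):
        return [e for e in events if self.contains(e)]


def rectangle_gate(**ranges):
    """Rectangle gate: every named parameter must lie within its (low, high) range."""
    return Gate(lambda e: all(lo <= e[name] <= hi for name, (lo, hi) in ranges.items()))


# Hypothetical events with two measured parameters.
events = [{"FSC": 1, "SSC": 1}, {"FSC": 3, "SSC": 3}, {"FSC": 5, "SSC": 5}, {"FSC": 9, "SSC": 9}]
g1 = rectangle_gate(FSC=(0, 4), SSC=(0, 4))
g2 = rectangle_gate(FSC=(2, 6), SSC=(2, 6))
print(len((g1 | g2).filter(events)), len((g1 & g2).filter(events)))
```

Representing gates as predicates means that combined gates are again gates, so they can be nested without materializing intermediate cell subsets.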

Page 33

Data-driven filtering: kmeansFilter

[Figure: cells assigned to the clusters k1, k2 and k3]
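As an illustration of a data-driven filter, the following Python sketch (standard library only) clusters the values of one measurement parameter into k groups with Lloyd's k-means algorithm. It is not the flowCore implementation of kmeansFilter; the function name, the deterministic choice of starting centers and the example values are assumptions.

```python
def kmeans_1d(values, k, n_iter=100):
    """One-dimensional k-means; returns a cluster label per value and the centers."""
    xs = sorted(values)
    # Starting centers: evenly spaced order statistics (arbitrary but deterministic).
    centers = [xs[(2 * j + 1) * len(xs) // (2 * k)] for j in range(k)]
    labels = [0] * len(values)
    for _ in range(n_iter):
        # Assignment step: each value goes to its nearest center.
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

# Hypothetical fluorescence values forming three populations.
fluorescence = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.2, 8.8]
labels, centers = kmeans_1d(fluorescence, k=3)
print(labels, [round(c, 2) for c in centers])
```

Each cluster label then plays the role of one sub-population (k1, k2, k3), so the populations are determined from the data rather than from fixed thresholds.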

Page 34

Data-driven filtering: FSC/SSC preprocessing

distinction on basis of morphological properties

variation between experiments

dynamic determination

[Figure: scatter plot of cell size (x-axis) against granularity (y-axis)]

Page 35

Data-driven filtering: FSC/SSC preprocessing

assumption: bivariate normal distribution

robust fitting

discarding cells that do not lie within some given boundary of this distribution

[Figure legend: density of distribution; discarded cells; X = midpoint of distribution]

Page 36

Data-driven filtering: FSC/SSC preprocessing

shape and location of main distribution can be used for quality control
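The FSC/SSC filtering step can be sketched as fitting a bivariate normal distribution and discarding cells whose Mahalanobis distance from its midpoint exceeds a chosen boundary. The Python code below (standard library only) is a simplified illustration: it uses the plain sample mean and covariance instead of the robust fit named on the slide, and the function names, the boundary `scale` and the data are assumptions.

```python
from statistics import mean

def fit_bivariate_normal(points):
    """Sample mean and covariance (sxx, sxy, syy) of 2-D points; not a robust fit."""
    xs, ys = zip(*points)
    mx, my = mean(xs), mean(ys)
    n = len(points)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def mahalanobis_sq(point, center, cov):
    """Squared Mahalanobis distance of a point from the fitted distribution."""
    (mx, my), (sxx, sxy, syy) = center, cov
    dx, dy = point[0] - mx, point[1] - my
    det = sxx * syy - sxy ** 2
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det

def norm_gate(points, scale=3.0):
    """Split cells into those inside and outside the ellipse at `scale` standard units."""
    center, cov = fit_bivariate_normal(points)
    kept = [p for p in points if mahalanobis_sq(p, center, cov) <= scale ** 2]
    discarded = [p for p in points if mahalanobis_sq(p, center, cov) > scale ** 2]
    return kept, discarded

# Hypothetical (FSC, SSC) values: one main population and one stray event.
cells = [(x, y) for x in (98, 99, 101, 102) for y in (48, 49, 51, 52)] + [(150, 90)]
kept, discarded = norm_gate(cells)
print(len(kept), discarded)
```

With a non-robust fit, outliers inflate the estimated covariance, which is one reason why a robust fit is preferable on real data. The fitted center and covariance are also the quantities one would track for quality control across wells.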

Page 37

A typical phenotype

scatter plot of two measurement parameters: phenotype against level of perturbation

[Figure: scatter plot; x-axis: parameter 1 (perturbation), y-axis: parameter 2 (phenotype)]

Page 38

A typical phenotype

[Figure: same scatter plot axes; example labelled "activation"]

Page 39

A typical phenotype

[Figure: same scatter plot axes; example labelled "inhibition"]

Page 40

parameter correlation

cell size correlates with fluorescent intensities

[Figure: scatter plots for the fluorescence channels (FL1) and (FL4)]

induces spurious correlations in the data

$$x_{\mathrm{total}} = \alpha + \beta s + x_{\mathrm{specific}}$$

s: cell size (FSC)
$x_{\mathrm{total}}$: measured fluorescence
$x_{\mathrm{specific}}$: actual fluorescence emitted by dye
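One simple way to use this model is to estimate α and β by regressing the measured fluorescence on cell size and to keep the residuals as an estimate of the specific signal. The Python sketch below (standard library only) assumes ordinary least squares and that the specific fluorescence is uncorrelated with cell size; the slides do not state which fitting procedure was used, and the names and numbers are illustrative.

```python
from statistics import mean

def fit_line(s, x):
    """Least-squares estimates (alpha, beta) for x = alpha + beta * s."""
    ms, mx = mean(s), mean(x)
    beta = sum((si - ms) * (xi - mx) for si, xi in zip(s, x)) / sum((si - ms) ** 2 for si in s)
    alpha = mx - beta * ms
    return alpha, beta

def remove_size_effect(fsc, fluorescence):
    """Residual fluorescence after subtracting the fitted size-dependent part."""
    alpha, beta = fit_line(fsc, fluorescence)
    return [x - (alpha + beta * s) for s, x in zip(fsc, fluorescence)]

# Hypothetical cells: total = 5 + 0.1 * size + specific, with specific = [1, -1, -1, 1].
fsc = [100.0, 200.0, 300.0, 400.0]
total = [16.0, 24.0, 34.0, 46.0]
print([round(v, 6) for v in remove_size_effect(fsc, total)])
```

The residuals have mean zero by construction, so the specific signal is recovered only up to an additive constant.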

Page 41

Modeling of phenotype

• robust fitting of smoothed local regression function:

$$y = y_0 + m(x - x_0) + \varepsilon$$

y: response (phenotype)
x: perturbation signal
m: smooth function
$\hat{m}'(x_t)$: robust estimator of the slope of m at point $x_t$

• z-score as dimension-less measure of effect: ratio of estimated slope $\hat{\delta}$ at point $x_t$ and assay-wide scale parameter $\delta_0$

$$\hat{\delta} = \hat{m}'(x_t), \qquad z = \frac{\hat{\delta}}{\delta_0}$$

[Figure: three example panels with z = 18.1, z = 0.4 and z = -40.2; the point t* is marked in each]
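The slope-based z-score can be illustrated with a local linear fit around the point of interest. The Python sketch below (standard library only) uses tricube kernel weights and ordinary weighted least squares, so it omits the robustness of the procedure described above; the bandwidth, the value of the scale parameter `delta0` and the data are hypothetical.

```python
def local_slope(x, y, x_t, bandwidth):
    """Slope of a weighted linear fit around x_t, using tricube kernel weights."""
    weights = []
    for xi in x:
        u = abs(xi - x_t) / bandwidth
        weights.append((1 - u ** 3) ** 3 if u < 1 else 0.0)
    sw = sum(weights)
    if sw == 0:
        raise ValueError("no data points within the bandwidth around x_t")
    mx = sum(w * xi for w, xi in zip(weights, x)) / sw
    my = sum(w * yi for w, yi in zip(weights, y)) / sw
    sxx = sum(w * (xi - mx) ** 2 for w, xi in zip(weights, x))
    if sxx == 0:
        raise ValueError("all weighted points share the same x value")
    sxy = sum(w * (xi - mx) * (yi - my) for w, xi, yi in zip(weights, x, y))
    return sxy / sxx

def effect_z_score(x, y, x_t, bandwidth, delta0):
    """z = estimated slope at x_t divided by the assay-wide scale parameter delta0."""
    return local_slope(x, y, x_t, bandwidth) / delta0

# Hypothetical cells: phenotype rises linearly with the perturbation signal.
perturbation = [float(i) for i in range(11)]
phenotype = [2.0 * p + 1.0 for p in perturbation]
print(effect_z_score(perturbation, phenotype, x_t=5.0, bandwidth=3.0, delta0=0.5))
```

A positive z corresponds to an activating effect and a negative z to an inhibiting one, with values near zero indicating no dependence of the phenotype on the perturbation.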

Page 42

Acknowledgements

Wolfgang Huber, Annemarie Poustka, Stefan Wiemann

Dorit Arlt, Meher Majety, Mamatha Sauermann, Christian Schmidt

Andreas Buneß, Markus Ruschhaupt, Heiko Rosenfelder

Alex Mehrle, Dirk Ledwinka, Tim Beissbarth, Achim Tresch

Michael Boutros, Ligia Bras, Florian Fuchs, Thomas Horn, Dierk Ingelfinger, Sandra Steinbrink

Robert Gentleman, Nolwenn Le Meur, Byron Ellis, Perry Haaland