Mark-Anthony Bray, Ph.D
Imaging Platform, Broad Institute
Cambridge, Massachusetts, [email protected]
Using CellProfiler for Biological Image Analysis
Quantitative Analysis of Large-Scale Biological Image Data
Summary
• Background on image-based screening
• Introduction to CellProfiler and considerations in image analysis
• Construction and use of a pipeline for analyzing typical image data
• Measurement export and preparation for additional analysis
Images Contain A Wealth Of Information
http://www.microscopyu.com Image: Javier Irazoqui
Visual Appearance Indicates Biological State
• Images contain a wealth of biological information
• That information can be quantified: mRNA or protein levels, localization, morphology … + hundreds of other features
• Automatic image analysis is
– Objective
– Quantitative, with statistics
– Able to measure multiple properties at once for every cell
– Able to distinguish subtle changes, even those undetectable by eye
– Faster, less tedious
High-Content Screening
Cells or organisms in multiwell plates, each well treated with a gene or chemical perturbant
→ Automated microscopy (any manufacturer)
→ Cell measurements (size, shape, intensity, texture, etc.)
→ Data exploration & machine learning
Anne Carpenter, Ray Jones
Software Overview
• Available from www.cellprofiler.org
• Free, open source (Python)
• Software available for Windows, Mac and Linux
(Figure: Image Analysis & Quantification; Image-centric Data Analysis)
CellProfiler: Overview
• Processes large sets of images
• Identifies and measures objects
• Exports data for further analysis
• Goal: Provide powerful image analysis methods with a user-friendly interface
• Philosophy: Measure everything, ask questions later...
• Supports data analysis based on individual cells
Typical CellProfiler Pipeline Workflow
• For image-based assays, the basic objective is always to
– Identify cells/organisms
– Measure feature(s) of interest
• The uniqueness of each assay lies in
– Deciding what compartments to identify and how to identify them
– Determining which measure(s) are most useful to identify interesting samples
The CellProfiler Interface
• Pipeline panel: Displays modules in pipeline
– Modules executed in order from top to bottom
– Controls: Change module position; Add or remove modules; Module help
The CellProfiler Interface
• File panel: Displays files in default image folder
– Load pipeline by double-clicking on it
– View images by double-clicking on the filename
The CellProfiler Interface
• The figure window has additional menu options
• Toolbar menu: Pan, zoom in/out
• CellProfiler Image Tools
– Image Tool (also displayed by clicking on image)
– Interactive zoom
– Show pixel data (location, intensity)
The CellProfiler Interface
• Folder panel: Change default input and output directories
– Usually these should be separate folders
– Input folder: Contains images to be analyzed
– Output folder: Contains the output file plus exported data and images
The CellProfiler Interface
• Settings panel: View and change settings for each module
– Clicking on a different module updates the settings view
Module Categories
• File processing: Image input, file output
• Image processing: Often used for pre-processing prior to object identification
• Object processing: Identification, modification of objects of interest
• Measurement: Collection of measurements from objects of interest
• Data Tools: Measurement exploration, measurement output
The First Module: LoadImages
• Loads an image set: a group of related images to be processed
• Related how? Depending on the imaging device, one file may represent
– One channel at one imaging location
– Multiple channels at one imaging location
– Multiple channels at multiple locations
– Etc…
(Figure: example image set with DNA and GFP images)
The First Module: LoadImages
• Use text matching to distinguish the images within a set
– All images stained for GFP have the text Channel1- in the name
– Same for DNA images (Channel2-)
– Assign each image a meaningful name for downstream reference
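As an illustration of text matching, the sketch below groups filenames into image sets. It assumes a hypothetical naming scheme in which the text after the channel tag (Channel1- or Channel2-) identifies the imaging location; group_image_sets is an illustrative helper, not part of CellProfiler.

```python
from collections import defaultdict

# Text tags that distinguish the channels, mapped to the names used downstream.
CHANNEL_TAGS = {"Channel1-": "GFP", "Channel2-": "DNA"}


def group_image_sets(filenames):
    """Group filenames into image sets keyed by the text after the channel tag."""
    image_sets = defaultdict(dict)
    for name in filenames:
        for tag, image_name in CHANNEL_TAGS.items():
            if tag in name:
                # Assumption: the rest of the filename identifies the imaging site.
                site = name.split(tag, 1)[1]
                image_sets[site][image_name] = name
                break
    return dict(image_sets)


files = ["Channel1-A01.tif", "Channel2-A01.tif", "Channel1-A02.tif", "Channel2-A02.tif"]
print(group_image_sets(files))
```

Each resulting set then holds one GFP and one DNA image that downstream modules can refer to by name.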
Object Identification
• Once the images are loaded, how do you find objects of interest?
• Step 1: Distinguish the foreground from the background by picking a good threshold
• Step 2: Identify objects as regions brighter than the threshold
• Step 3: Cut and join objects to “improve” their shape
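Steps 1 and 2 can be sketched in plain Python on a small 2-D intensity array: pixels above a given threshold form the foreground, and each 4-connected foreground region becomes one labeled object. This is a minimal illustration under those assumptions (hypothetical identify_objects helper, threshold supplied by hand, no step-3 de-clumping), not CellProfiler's implementation.

```python
from collections import deque


def identify_objects(image, threshold):
    """Label 4-connected regions of pixels brighter than threshold.

    Returns (labels, count): labels uses 0 for background and 1..count for objects.
    """
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            # Step 1: foreground pixels are those above the threshold.
            if image[r][c] > threshold and labels[r][c] == 0:
                # Step 2: flood-fill the connected foreground region as one object.
                count += 1
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] > threshold and labels[ny][nx] == 0):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


image = [
    [0, 9, 9, 0, 0],
    [0, 9, 9, 0, 8],
    [0, 0, 0, 0, 8],
]
labels, count = identify_objects(image, threshold=5)
print(count)  # 2
```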
Primary Object Identification
• Many options for thresholding, cut and join methods, etc.
Thresholding
• Definition: Division of the image into background and foreground
• What is the best threshold value for dividing the intensity into foreground and background pixels?
• Method: Pick the method that provides the best results
– Otsu: Default; good for readily identifiable foreground / background
– Background, RobustBackground: Good for images in which most of the image consists of background
(Figure: histogram of frequency vs. pixel values)
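For intuition about Otsu's method, the sketch below applies the textbook version to an integer histogram: it tries every candidate threshold and keeps the one that maximizes the between-class variance of the two resulting pixel groups. It assumes integer pixel values in [0, levels); CellProfiler's own Otsu implementation works on intensities scaled to 0 to 1 and is not reproduced here.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t that maximizes between-class variance (Otsu).

    Pixels <= t form the background class; pixels > t form the foreground.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * n for i, n in enumerate(hist))
    weight_bg = sum_bg = 0
    best_t, best_var = 0, -1.0
    for t in range(levels):
        weight_bg += hist[t]          # number of pixels at or below t
        sum_bg += t * hist[t]
        weight_fg = total - weight_bg
        if weight_bg == 0:
            continue                  # no background class yet
        if weight_fg == 0:
            break                     # everything is background from here on
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Between-class variance, up to the constant factor 1 / total**2
        var_between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


print(otsu_threshold([10, 12, 11, 10, 200, 210, 205]))  # 12
```

The dim pixels (10 to 12) fall at or below the threshold and the bright ones (200 to 210) above it.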
Thresholding
• Correction factor
– Multiplication factor applied to threshold
– Adjusts threshold stringency/leniency
– Setting this factor is empirical
• Upper/lower bounds
– Set safety limits on the automatic threshold to guard against false positives
– Helpful for unexpected images: Empty wells, images with dramatic artifacts, etc.
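The interplay of correction factor and bounds can be sketched as follows, assuming the factor is applied first and the result is then clamped to the bounds, with intensities scaled to 0 to 1. final_threshold is a hypothetical helper for illustration.

```python
def final_threshold(auto_threshold, correction_factor=1.0, lower_bound=0.0, upper_bound=1.0):
    """Scale an automatic threshold, then clamp it to [lower_bound, upper_bound]."""
    # Correction factor > 1 makes the threshold more stringent, < 1 more lenient.
    adjusted = auto_threshold * correction_factor
    # Bounds act as safety limits on whatever the automatic method produced.
    return min(max(adjusted, lower_bound), upper_bound)


print(final_threshold(0.01, lower_bound=0.05))  # 0.05
print(final_threshold(0.25, correction_factor=2.0))  # 0.5
```

With a lower bound of 0.05, an empty well whose automatic threshold falls to 0.01 is held at 0.05, so background noise is less likely to be segmented as objects.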
Object Separation
• Once the foreground objects have been identified, what next?
• We need to distinguish multiple objects contained in the same “clump”
Images from Carolina Wahlby
Object Separation
• Two-step process in “de-clumping”
1. Identification of the objects in a clump
2. Drawing boundaries between the clumped objects
• Adjust settings to “de-clump” objects
Object Separation
• Clump identification: Two options
– Intensity: Works best if objects are brighter at center, dimmer at edges
– Shape: Works best if objects have indentations where clumps touch (esp. if objects are round)
(Figure: intensity peaks vs. shape indentations)
Object Separation
• Drawing boundaries: Two options
– Distance: Draws boundary lines midway between object centers
– Intensity: Draws boundary lines at the dimmest line between objects
• Test Mode allows users to view results of all setting combinations
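The Distance option's result, a boundary midway between object centers, follows from assigning every clump pixel to its nearest center. The sketch below does exactly that; it assumes the centers are already known (for example, from clump identification), breaks ties toward the lower-numbered object, and is a simplified stand-in rather than CellProfiler's code (split_by_distance is hypothetical).

```python
import math


def split_by_distance(mask, centers):
    """Assign each clump pixel to its nearest object center.

    mask: 2-D list of booleans (True = clump pixel)
    centers: list of (row, col) object centers
    Returns a label image: 0 = background, k = object k (1-based).
    """
    labels = []
    for r, row in enumerate(mask):
        out_row = []
        for c, inside in enumerate(row):
            if not inside:
                out_row.append(0)
                continue
            dists = [math.hypot(r - cr, c - cc) for cr, cc in centers]
            # index() returns the first minimum, so ties go to the lower label
            out_row.append(dists.index(min(dists)) + 1)
        labels.append(out_row)
    return labels


print(split_by_distance([[True] * 6], [(0, 0), (0, 5)]))  # [[1, 1, 1, 2, 2, 2]]
```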
Object Separation
• Additional separation settings: Adjust these settings if objects are being incorrectly split into pieces or merged together
• Smoothing: Increase to reduce the intensity irregularities that produce over-segmentation of objects
(Figure: original image; smoothing filter size = 4; smoothing filter size = 8)
Object Separation
• Suppress Local Maxima
– Minimum allowed distance between intensity peaks; peaks closer than this are considered one object rather than a clump
– Decrease to reduce improper merging of objects in clumps
(Figure: original image; maxima distance = 4; maxima distance = 8)
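The Suppress Local Maxima setting can be illustrated with a greedy sketch: find pixels at least as bright as all 8 neighbors, then visit them from brightest to dimmest and keep a peak only if it is at least min_distance pixels from every peak already kept. Each kept peak would seed one object. This is an assumption-laden illustration (background assumed to be 0, Euclidean distance, hypothetical find_seeds helper), not CellProfiler's implementation.

```python
import math


def find_seeds(image, min_distance):
    """Return (row, col) peaks, suppressing peaks too close to a brighter kept peak."""
    rows, cols = len(image), len(image[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            v = image[r][c]
            if v == 0:
                continue  # assumption: 0 is background
            neighbors = [image[y][x]
                         for y in range(max(0, r - 1), min(rows, r + 2))
                         for x in range(max(0, c - 1), min(cols, c + 2))
                         if (y, x) != (r, c)]
            if all(v >= n for n in neighbors):
                peaks.append((v, r, c))
    kept = []
    # Brightest first: a dimmer peak near a kept one is treated as the same object.
    for v, r, c in sorted(peaks, reverse=True):
        if all(math.hypot(r - kr, c - kc) >= min_distance for kr, kc in kept):
            kept.append((r, c))
    return kept


img = [[0, 5, 1, 6, 0, 0, 0, 7, 0]]
print(find_seeds(img, 4))  # [(0, 7), (0, 3)]
print(find_seeds(img, 2))  # [(0, 7), (0, 3), (0, 1)]
```

Here the peaks at columns 1 and 3 are 2 pixels apart: with min_distance=4 they collapse into one seed (the brighter one), while with min_distance=2 both survive, which is why decreasing the distance reduces improper merging.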
Object Separation
• The proper settings are usually a matter of trial and error
– The automatic settings are a good starting point, though
• However, adjusting these settings can introduce more segmentation errors than it fixes
(Figure: original image; smoothing filter size = 4; smoothing filter size = 8)
Filtering Invalid Objects
• Discard objects that fail the size criterion or touch the image border
• See the FilterObjects module for more advanced filtering options
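The two validity checks can be sketched on a label image (0 = background, 1..N = objects). filter_objects is a hypothetical helper; when an object fails both checks it is reported as touching the border, an arbitrary choice for this illustration.

```python
from collections import Counter


def filter_objects(labels, min_area, max_area):
    """Return (kept_labels, {rejected_label: reason}) for a label image."""
    rows, cols = len(labels), len(labels[0])
    # Area of each object = number of pixels carrying its label
    areas = Counter(v for row in labels for v in row if v != 0)
    # Objects with at least one pixel on the outermost rows or columns
    border = {labels[r][c] for r in range(rows) for c in range(cols)
              if labels[r][c] != 0 and (r in (0, rows - 1) or c in (0, cols - 1))}
    rejected = {}
    for obj, area in areas.items():
        if obj in border:
            rejected[obj] = "touching border"
        elif not (min_area <= area <= max_area):
            rejected[obj] = "size criterion"
    kept = set(areas) - set(rejected)
    return kept, rejected


labels = [
    [1, 0, 0, 0, 0],
    [0, 0, 2, 2, 0],
    [0, 0, 2, 2, 0],
    [0, 3, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
print(filter_objects(labels, min_area=2, max_area=10))
```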
Primary Object Identification
• Segmented objects are colored
– Shows whether each object has been identified and separated properly
• Outlines:
– Green: Valid
– Yellow: Invalid, touching border
– Red: Invalid, fails size criterion
• Also outputs object count
Secondary Object Identification
• Goal: Identify cell boundaries by “growing” primary objects
– Nuclei are typically more uniform in shape and more easily separated than cells
• Approach: Segment nuclei, then use them as seeds for cell segmentation in a cell stain channel
Secondary Object Identification
• Methods
– Distance-N: Ignores image information; useful in cases where no cell stain is present
– Watershed, Propagation, Distance-B: Use image information to find dividing lines between objects and background / neighbors
• Test mode allows user to view results of all methods
(Figure: Propagation vs. Distance-N)
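Distance-N can be pictured as growing each nucleus outward by up to N pixels without looking at the cell stain. The sketch below assumes 4-connected (city-block) growth steps and lets the object that reaches a pixel first claim it, with ties broken by neighbor order; CellProfiler's own distance computation may differ, and expand_labels is a hypothetical name.

```python
def expand_labels(labels, n):
    """Grow each labeled object outward by up to n 4-connected steps.

    Image intensities are ignored entirely, as in Distance-N.
    """
    rows, cols = len(labels), len(labels[0])
    current = [row[:] for row in labels]
    for _ in range(n):
        grown = [row[:] for row in current]
        for r in range(rows):
            for c in range(cols):
                if current[r][c] != 0:
                    continue  # already claimed by an object
                # Claim the pixel for the first labeled neighbor found (up, down, left, right)
                for y, x in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if 0 <= y < rows and 0 <= x < cols and current[y][x] != 0:
                        grown[r][c] = current[y][x]
                        break
        current = grown
    return current


print(expand_labels([[0, 0, 1, 0, 0, 0, 2]], 1))  # [[0, 1, 1, 1, 0, 2, 2]]
```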
Tertiary Object Identification
• Goal: Identify tertiary objects by removing the primary objects from secondary objects
– “Subtract” the nuclei objects from cell objects to obtain cytoplasm
Cells − Nuclei = Cytoplasm
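On label images, the subtraction amounts to keeping each cell's label wherever no nucleus is present. A minimal sketch, assuming cells and nuclei are same-sized label images with 0 as background (subtract_objects is a hypothetical helper):

```python
def subtract_objects(cells, nuclei):
    """Cytoplasm = cell pixels not covered by any nucleus; keeps the cell's label."""
    return [[cell if nucleus == 0 else 0
             for cell, nucleus in zip(cell_row, nuc_row)]
            for cell_row, nuc_row in zip(cells, nuclei)]


print(subtract_objects([[1, 1, 2, 2]], [[0, 1, 0, 2]]))  # [[1, 0, 2, 0]]
```

Keeping the cell's label means each cytoplasm object stays linked to its parent cell for later per-cell measurements.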
Pixel-Based Image Classification
• For images where a threshold cannot be found…
• CellProfiler is packaged with ilastik, a pixel-based classification tool
– User manually labels regions of the image
– ilastik uses features to distinguish regions and create a classifier
– Classifier used as input into the ClassifyPixels module
– Currently, Windows only
(Figure: DIC image; ilastik; foreground/background mask)
Measurement Modules: Object Morphology
Select the objects to measure
Module: MeasureObjectAreaShape
• Goal: Measure morphological features such as
– Area
– Perimeter
– Eccentricity
– MajorAxisLength
– MinorAxisLength
– Orientation
– FormFactor: Compactness measure, 4π × Area / Perimeter²; circle = 1, line = 0
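FormFactor is computed from area and perimeter as 4π × Area / Perimeter². The sketch below checks the two reference values: a circle gives exactly 1, and a 2 × 2 square gives π/4 ≈ 0.785. Measuring the perimeter on a pixel grid is the hard part in practice and is not shown here; form_factor is a hypothetical helper.

```python
import math


def form_factor(area, perimeter):
    """FormFactor = 4*pi*Area / Perimeter**2 (1 for a circle, toward 0 for a line)."""
    return 4 * math.pi * area / perimeter ** 2


r = 10.0
print(form_factor(math.pi * r ** 2, 2 * math.pi * r))  # 1.0 for a circle (up to rounding)
print(form_factor(4.0, 8.0))  # 2 x 2 square: pi/4, about 0.785
```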
Measurement Modules: Object Intensity
Select the image to measure from
Select the objects to measure
Module: MeasureObjectIntensity
• Goal: Measure object intensity features such as
– Integrated intensity: Sum of the pixel intensities within an object
– Mean, median, standard deviation intensities
– Maximal and minimal pixel intensities
– Lower/Upper quartile
• The object intensity may be obtained from any image, not just the image used to identify the object
– Example: Ph3 intensity may be measured using the nuclei objects
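The per-object intensity features can be sketched as grouping the pixels of an intensity image by object label. Because the label image and the intensity image are separate inputs, the same objects can be measured in any channel, as in the Ph3 example. This is an illustration, not CellProfiler's code: measure_object_intensity is hypothetical, the feature names only loosely follow CellProfiler's, the standard deviation is the population form, and quartiles are omitted for brevity.

```python
import statistics


def measure_object_intensity(labels, image):
    """Per-object intensity features measured in `image` for the objects in `labels`."""
    pixels = {}
    for label_row, image_row in zip(labels, image):
        for obj, value in zip(label_row, image_row):
            if obj != 0:  # 0 is background
                pixels.setdefault(obj, []).append(value)
    results = {}
    for obj, values in pixels.items():
        values.sort()
        results[obj] = {
            "IntegratedIntensity": sum(values),  # sum of pixel intensities
            "MeanIntensity": statistics.mean(values),
            "MedianIntensity": statistics.median(values),
            "StdIntensity": statistics.pstdev(values),
            "MinIntensity": values[0],
            "MaxIntensity": values[-1],
        }
    return results


labels = [[1, 1, 0], [2, 2, 2]]  # e.g. nuclei identified in the DNA image
image = [[2, 4, 9], [1, 1, 4]]   # any other channel, e.g. Ph3
results = measure_object_intensity(labels, image)
print(results[1])
```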
Measurement Modules: Object Texture
Select the image to measure from
Select the objects to measure
Select the spatial scale
MeasureObjectTexture
• Goal: Determine whether the staining pattern is smooth on a particular scale
• Selection of the appropriate texture scale is essentially empirical
– A higher number measures larger patterns of texture
– Smaller numbers measure more localized (finer) patterns of texture
• Can also add several texture modules to the pipeline, each measuring a different texture scale
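One way to see what the texture scale does is a single Haralick-style feature, contrast, computed from gray-level co-occurrences of pixel pairs that are `scale` pixels apart. The sketch assumes the scale is that pixel offset and looks only at horizontal pairs; a full implementation would combine several directions and many features, and glcm_contrast is a hypothetical name.

```python
from collections import Counter


def glcm_contrast(image, scale):
    """Contrast of horizontal pixel pairs `scale` pixels apart.

    Counts co-occurring gray levels (image[r][c], image[r][c + scale]) and
    returns the sum of p(i, j) * (i - j)**2 over all pairs.
    """
    pairs = Counter()
    for row in image:
        for c in range(len(row) - scale):
            pairs[(row[c], row[c + scale])] += 1
    total = sum(pairs.values())
    if total == 0:
        raise ValueError("scale must be smaller than the image width")
    return sum(n * (i - j) ** 2 for (i, j), n in pairs.items()) / total


stripes = [[0, 1, 0, 1, 0, 1]]
print(glcm_contrast(stripes, 1))  # 1.0
print(glcm_contrast(stripes, 2))  # 0.0
```

For a stripe pattern one pixel wide, contrast is 1.0 at scale 1 but 0.0 at scale 2, so the scale determines which size of pattern the measurement responds to.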
Other Measurement Modules
• CalculateMath: Arithmetic operations for measurements
• CalculateStatistics: Assay quality (V and Z' factors) and dose response data (EC50) for all measurements
• Image-based measures
– MeasureImageAreaOccupied
– MeasureImageGranularity
– MeasureImageIntensity
• Object-based measures
– MeasureCorrelation
– MeasureObjectNeighbors
– MeasureRadialDistribution
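The Z'-factor reported for assay quality compares positive and negative controls: Z' = 1 − 3(σ₊ + σ₋) / |μ₊ − μ₋|, where μ and σ are each control's mean and standard deviation. A minimal sketch, assuming sample standard deviations (z_prime is a hypothetical helper, not CellProfiler's code):

```python
import statistics


def z_prime(positive, negative):
    """Z'-factor from positive- and negative-control measurements."""
    mu_p, mu_n = statistics.mean(positive), statistics.mean(negative)
    sd_p, sd_n = statistics.stdev(positive), statistics.stdev(negative)
    return 1 - 3 * (sd_p + sd_n) / abs(mu_p - mu_n)


print(z_prime([10, 12, 11], [1, 2, 3]))  # about 0.333
```

Values closer to 1 indicate better separation between the controls relative to their spread.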
Data Export Modules
• User may output images or image measurements
Select the objects to export
Measurement Display
• The average measurements for all objects in the image are displayed in the figure window
• However, the individual measurements for each object are stored in the output file
Data Export Modules
• Goal: Retain images of intermediate image processing steps for quality control or save measurements for later analysis and exploration
• SaveImages: Writes an image to a file
– Intermediate images in the pipeline are not saved unless requested
– Choice of many image formats to write → module can be used as an image format converter
• ExportToSpreadsheet: Export measurements as a comma-separated file readable by spreadsheet programs
• ExportToDatabase: Export measurements as per-image and per-object tables, plus a configuration file, for a MySQL or SQLite database
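Per-object export can be sketched with Python's standard csv module: one row per object, keyed by image and object number, with one column per measurement. The column names are modeled on CellProfiler's per-object output but should be treated as illustrative, and export_objects_to_csv is hypothetical.

```python
import csv
import io


def export_objects_to_csv(rows, fileobj):
    """Write one row per object: ImageNumber, ObjectNumber, then sorted feature columns."""
    features = sorted({key for row in rows for key in row} - {"ImageNumber", "ObjectNumber"})
    writer = csv.DictWriter(fileobj, fieldnames=["ImageNumber", "ObjectNumber"] + features)
    writer.writeheader()
    writer.writerows(rows)


rows = [
    {"ImageNumber": 1, "ObjectNumber": 1, "AreaShape_Area": 120},
    {"ImageNumber": 1, "ObjectNumber": 2, "AreaShape_Area": 95},
]
buf = io.StringIO()  # for a real file, use open(path, "w", newline="")
export_objects_to_csv(rows, buf)
print(buf.getvalue())
```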
Cluster Computing
• For really big screens, it is necessary to process images in batches on a computing cluster
• If processing time is too great on a single computer, run the pipeline on a cluster
– Install CellProfiler on a computing cluster
– Add the ExportToDatabase module
– Add/configure the CreateBatchFiles module at the end of the pipeline
– Run the pipeline to create a batch file
– Submit the batches to your cluster for processing
– Check the progress of processing
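Dividing a screen into batches amounts to splitting the image sets into contiguous ranges, one per cluster job. A minimal sketch, assuming image sets are numbered from 1; make_batches is hypothetical, and how each job is told its range depends on the CellProfiler version and the cluster setup.

```python
def make_batches(num_image_sets, batch_size):
    """Split image sets 1..num_image_sets into inclusive (first, last) ranges."""
    return [(start, min(start + batch_size - 1, num_image_sets))
            for start in range(1, num_image_sets + 1, batch_size)]


print(make_batches(10, 4))  # [(1, 4), (5, 8), (9, 10)]
```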
Megakaryocyte Polyploidization: Leukemia
Martha Vokes, Mark Bray
John Crispino, Northwestern University; Jeremy Wen, postdoc
Project in progress
(Figure: DNA stain, with outlines identifying the nuclei)
(Figure: proportion of cells vs. per-cell DNA content (log2); SU6656 (positive control) vs. DMSO (negative control))
Status: Identified 206 polyploidization regulators from 10k compound screen
Measuring Morphology
Images from BioImage SBS image analysis comparison. Thanks to Ilya Ravkin
Carpenter, et al., Genome Biology, 2006
Upcoming: CellProfiler 2.1
• Major changes
– Streamlined loading of images and associated data
– Takes advantage of multiple CPU cores, so very large sets of images can now be processed on a regular desktop computer
• Release scheduled for early 2014
Final Notes
• Where to get help
– Access help from the CellProfiler main window
– Ask for help on the CellProfiler.org forum
Carpenter Lab / Broad Institute Imaging Platform
Director: Anne Carpenter
• Image assay development: Apply image analysis methods to biological questions
• Algorithm development & software engineering: Develop & test new image analysis and data mining methods and create open-source software tools
• IT/Administration
Team: Mark Bray, David Logan, Matthew Veneskey, Vebjørn Ljoså, Carolina Wählby, Lee Kamentsky, Shantanu Singh, Holger Hennig
Acknowledgments
S.D.G.
Free, at www.cellprofiler.org:
Recent funding for this work provided by:
NIH NIGMS (Carpenter: R01 GM089652 and Wahlby: R01 GM095672)
The Broad Institute of Harvard and MIT
Many thanks to our many biology collaborators who provide images
Contact: [email protected]