Correcting Experimental Results in the High-Throughput Screening Process

Vladimir Makarenkov
Université du Québec à Montréal (UQAM), Montréal, Canada

Talk outline

- What is HTS?
- Hit selection
- Random and systematic error
- Statistical methods to correct HTS measurements: removal of the evaluated background; well correction
- Comparison of methods for HTS data correction and hit selection
- Conclusions

What is HTS?
High-throughput screening (HTS) is a large-scale process used to screen thousands of chemical compounds in order to identify potential drug candidates (i.e. hits) (Malo et al., Nature Biotechnol., 24(2), 2006). An HTS procedure consists of running chemical compounds, arranged in two-dimensional plates, through an automated screening system that makes experimental measurements.

- Samples are located in wells
- The plates are processed in sequence
- Screened samples can be divided into active ones (i.e. hits) and inactive ones
- Most of the samples are inactive
- The measured values of the active samples differ substantially from those of the inactive ones

HTS technology: automated plate handling system and plates from SSI Robotics.

Classical hit selection
- Compute the mean value μ and standard deviation σ of the observed plate (or of the whole assay)
- Identify samples whose values differ from the mean by at least cσ (where c is a preliminarily chosen constant)
- For example, in the case of an inhibition assay, choosing c = 3 selects the samples with values lower than μ - 3σ

Random and systematic error
Random error: variability that randomly deviates all measured values from their true levels. It results from random influences in each particular well of a particular plate (~ random noise). Usually, it can be neither detected nor compensated for.

Systematic error: a systematic variability of the measured values across all plates of the assay. It can be detected, and its effect can be removed from the data.

Sources of systematic error
Various sources of systematic error can affect experimental HTS data, and thus introduce a bias into the hit selection process (Heuer et al. 2003), including:

- Systematic errors caused by ageing, reagent evaporation, or cell decay; they can be recognized as smooth trends in the plate means/medians
- Errors in liquid handling and malfunctioning pipettes, which can generate localized deviations from expected values
- Variation in incubation time, time drift in measuring different wells or different plates, and reader effects
Random and systematic error

Examples of systematic error (computed and plotted by HTS Corrector). HTS assay from the Chemistry Department, Princeton University: inhibition of the glycosyltransferase MurG function of E. coli; 164 plates, 22 rows and 16 columns. Hit distribution by rows and columns of Assay 1 computed for 164 plates; hits were selected with the threshold μ - σ. Hit distribution surface of Assay 1: (a) raw data, (b) approximated data.

Raw and evaluated background: an example
Computed and plotted by HTS Corrector.

Systematic error detection tests: comparison

Simulation 1. Plate size: 96 wells. Cohen's Kappa vs error size (from Dragiev et al., 2011). Systematic error size: 10% (at most 2 columns and 2 rows affected). First column, (a)-(c): α = 0.01; second column, (d)-(f): α = 0.1. Systematic error detection tests compared: t-test and Kolmogorov-Smirnov (K-S) test.
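The row/column t-test evaluated in these simulations can be illustrated with a small sketch. This is a minimal pure-Python stand-in, not the published implementation: it compares each row of a plate against the rest of the plate with Welch's t statistic and flags rows exceeding a loosely chosen cut-off (the actual test converts t into a p-value and compares it with α; the toy plate, the bias placed in row 2, and the cut-off of 4.0 are illustrative assumptions).

```python
import statistics

def t_statistic(sample_a, sample_b):
    """Welch's two-sample t statistic."""
    ma, mb = statistics.mean(sample_a), statistics.mean(sample_b)
    va, vb = statistics.variance(sample_a), statistics.variance(sample_b)
    se = (va / len(sample_a) + vb / len(sample_b)) ** 0.5
    return (ma - mb) / se

def flag_biased_rows(plate, t_crit=4.0):
    """Flag rows whose mean differs markedly from the rest of the plate
    (simplified: a fixed |t| cut-off instead of a p-value vs alpha)."""
    flagged = []
    for i, row in enumerate(plate):
        rest = [v for k, r in enumerate(plate) if k != i for v in r]
        if abs(t_statistic(row, rest)) > t_crit:
            flagged.append(i)
    return flagged

# Toy 4x6 plate where row 2 carries a constant positive bias.
plate = [[0.1, -0.2, 0.0, 0.3, -0.1, 0.2],
         [-0.3, 0.1, 0.2, -0.1, 0.0, 0.1],
         [2.1, 1.8, 2.3, 1.9, 2.2, 2.0],
         [0.0, -0.1, 0.2, 0.1, -0.2, 0.3]]
print(flag_biased_rows(plate))  # -> [2]
```

The same loop applied to columns covers the column-bias case.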
Simulation 2. Plate size: 96 wells. Cohen's Kappa vs hit percentage (from Dragiev et al., 2011). Systematic error size: 10% (at most 2 columns and 2 rows affected). First column, (a)-(b): α = 0.01; second column, (c)-(d): α = 0.1. Systematic error detection tests compared: t-test, K-S test, and goodness-of-fit test.

Data normalization (z-score normalization)
Applying the following formula, we can normalize the elements of the input data:

x'_i = (x_i - μ) / σ

where x_i is an input element, x'_i the normalized output element, μ the mean value, σ the standard deviation, and n the total number of elements in the plate. The output data satisfy μ' = 0 and σ' = 1.

Evaluated background

An assay background can be defined as the mean of the normalized plate measurements, i.e.

z_i = (1/N) Σ_j x'_{i,j}, for j = 1, ..., N,

where x'_{i,j} is the normalized value in well i of plate j, z_i the background value in well i, and N the total number of plates.

Removing the evaluated background: main steps
1. Normalize the HTS data by plate
2. Eliminate the outliers
3. Compute the evaluated background
4. Eliminate systematic error by subtracting the evaluated background from the normalized data
5. Select hits in the corrected data

Removing the evaluated background: hit distribution surface before and after the correction
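The plate normalization, background evaluation, and background subtraction steps can be sketched in a few lines of pure Python. This is a minimal illustration that omits the outlier-elimination step; the function names are ours:

```python
import statistics

def normalize_plate(plate):
    """z-score normalization: x'_i = (x_i - mu) / sigma over the plate."""
    values = [v for row in plate for v in row]
    mu, sigma = statistics.mean(values), statistics.pstdev(values)
    return [[(v - mu) / sigma for v in row] for row in plate]

def evaluate_background(plates):
    """Background z_i = mean over plates of the normalized value in well i."""
    normalized = [normalize_plate(p) for p in plates]
    n_rows, n_cols = len(plates[0]), len(plates[0][0])
    background = [[statistics.mean(normalized[j][r][c]
                                   for j in range(len(plates)))
                   for c in range(n_cols)] for r in range(n_rows)]
    return normalized, background

def correct(plates):
    """Subtract the evaluated background from each normalized plate."""
    normalized, background = evaluate_background(plates)
    return [[[p[r][c] - background[r][c] for c in range(len(p[0]))]
             for r in range(len(p))] for p in normalized]
```

After correction, each well's mean across plates is zero by construction, which is exactly the removal of the well-specific background.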
Hit distribution after the correction

HTS assay from the Chemistry Department, Princeton University: inhibition of the glycosyltransferase MurG function of E. coli; 164 plates, 22 rows and 16 columns. Hit distribution by rows in Assay 1 (164 plates): (a) hits selected with the threshold μ - σ; (b) hits selected with the threshold μ - 2σ.

Well correction method: main ideas
- Once the data are plate-normalized, it is possible to analyze their values in each particular well along the entire assay
- The distribution of inactive measurements along a fixed well should be zero-mean centered (if there is no systematic error), and the compounds are randomly distributed along this well
- Values along each well may have ascending or descending trends that can be detected via a polynomial approximation

Example of systematic error in the McMaster dataset
Hit distribution surface for a McMaster dataset (1250 plates; a screen of compounds that inhibit the Escherichia coli dihydrofolate reductase). Values deviating from the plate means by more than 1 standard deviation (SD) were taken into account during the computation. Well, row, and column positional effects are shown.

Well correction method: an example

McMaster University HTS laboratory screen of compounds inhibiting the Escherichia coli dihydrofolate reductase: 1250 plates with 8 rows and 10 columns.

Well correction method: main steps
1. Normalize the data within each plate (plate normalization)
2. Compute the trend along each well using polynomial approximation
3. Subtract the obtained trend from the plate-normalized values
4. Normalize the data along each well (well normalization)
5. Reexamine the hit distribution surface
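The well correction steps can be sketched for a single well as follows. This is a simplified pure-Python illustration, assuming the data are already plate-normalized and using a degree-1 (straight-line) trend in place of the general polynomial approximation:

```python
import statistics

def fit_line(ys):
    """Least-squares straight-line trend y = a*t + b along the plate
    index t (a degree-1 stand-in for the polynomial approximation)."""
    n = len(ys)
    t_mean, y_mean = (n - 1) / 2, statistics.mean(ys)
    a = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys)) / \
        sum((t - t_mean) ** 2 for t in range(n))
    return a, y_mean - a * t_mean

def well_correct(series):
    """Well correction for one well: remove the fitted trend, then
    z-score normalize the residuals across plates (well normalization)."""
    a, b = fit_line(series)
    residuals = [y - (a * t + b) for t, y in enumerate(series)]
    mu = statistics.mean(residuals)
    sigma = statistics.pstdev(residuals)
    return [(r - mu) / sigma for r in residuals]
```

Applying `well_correct` to every well of the assay yields well-wise zero-mean, unit-variance measurements, from which hits can then be selected.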
Hit distributions for the raw (a, c, and e) and well-corrected (b, d, and f) McMaster datasets (a screen of compounds that inhibit the Escherichia coli dihydrofolate reductase; 1250 plates of size 8x10), obtained for the thresholds: (a and b) μ - σ; (c and d) μ - 1.5σ; (e and f) μ - 2σ.

Simulations with random data, constant noise: comparison of the hit selection methods
[Plots: hit detection rate and false positives; legend: classical hit selection, background subtraction method, well correction method]

Simulations with random data, variable noise: comparison of the hit selection methods

[Plots: hit detection rate and false positives; legend: classical hit selection, background subtraction method, well correction method]

Comparison of methods for data correction in HTS
Method 1. Classical hit selection using the value μ - cσ as a hit selection threshold, where the mean value μ and standard deviation σ are computed separately for each plate, and c is a preliminarily chosen constant (usually c equals 3).

Method 2. Classical hit selection using the value μ - cσ as a hit selection threshold, where the mean value μ and standard deviation σ are computed over all the assay values, and c is a preliminarily chosen constant. This method can be chosen when we are certain that all plates of the given assay were processed under the same testing conditions.
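Methods 1 and 2 differ only in the scope over which μ and σ are computed. A minimal pure-Python sketch for an inhibition assay, where hits lie below μ - cσ (the function name and plate layout are ours):

```python
import statistics

def select_hits(plates, c=3.0, per_plate=True):
    """Hit selection with the threshold mu - c*sigma (inhibition assay:
    hits are values BELOW the threshold). per_plate=True is Method 1;
    per_plate=False pools mu, sigma over the whole assay (Method 2)."""
    if not per_plate:
        pooled = [v for plate in plates for row in plate for v in row]
        mu, sigma = statistics.mean(pooled), statistics.pstdev(pooled)
    hits = []
    for p, plate in enumerate(plates):
        if per_plate:
            values = [v for row in plate for v in row]
            mu, sigma = statistics.mean(values), statistics.pstdev(values)
        for r, row in enumerate(plate):
            for col, v in enumerate(row):
                if v < mu - c * sigma:
                    hits.append((p, r, col))
    return hits
```

For an activation assay the comparison would flip to v > μ + cσ.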
Method 3. The median polish procedure (Tukey 1977) can be used to remove the impact of systematic error. Median polish works by alternately removing the row and column medians. In our study, Method 2 was applied to the matrix of the obtained residuals in order to select hits.

Method 4 (designed at Merck Frosst). The B-score normalization procedure (Brideau et al., 2003) is designed to remove row/column biases in HTS (Malo et al., 2006). The residual r_ijp of the measurement for row i and column j on the p-th plate is obtained by median polish. The B score is calculated as follows:

B score = r_ijp / MAD_p, where MAD_p = median{|r_ijp - median(r_ijp)|}.

To select hits, this computation was followed by Method 2, applied to the B-score matrix.
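Median polish (Method 3) and the B score (Method 4) can be sketched as follows; a minimal pure-Python illustration, using a fixed number of polishing sweeps instead of a convergence test:

```python
import statistics

def median_polish(plate, n_iter=10):
    """Tukey's median polish: alternately sweep out row and column
    medians; the returned matrix holds the residuals r_ij."""
    r = [row[:] for row in plate]
    for _ in range(n_iter):
        for row in r:                       # remove row medians
            m = statistics.median(row)
            for j in range(len(row)):
                row[j] -= m
        for j in range(len(r[0])):          # remove column medians
            m = statistics.median(row[j] for row in r)
            for row in r:
                row[j] -= m
    return r

def b_score(plate):
    """B score = r_ijp / MAD_p, with MAD_p = median(|r - median(r)|)."""
    r = median_polish(plate)
    flat = [v for row in r for v in row]
    med = statistics.median(flat)
    mad = statistics.median(abs(v - med) for v in flat)
    return [[v / mad for v in row] for row in r]
```

For purely additive row/column effects the residuals vanish, which is exactly the bias the procedure is meant to remove; the MAD scaling then makes the residuals comparable across plates.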
Method 5. The well correction procedure followed by Method 2 applied to the well-corrected data. The well correction method consists of two main steps: (1) least-squares approximation of the data, carried out separately for each well of the assay; (2) z-score normalization of the data within each well across all plates of the assay.

Simulation design

Random standard normal datasets N(0, 1) were generated, and different percentages of hits (1% to 5%) were added to them. Systematic error following a normal distribution N(0, c), where c equals 0, 0.6σ, 1.2σ, 1.8σ, 2.4σ, or 3σ, was then added. Two types of systematic error were considered:

A. Systematic error stemming from constant row x column interactions: different constant values were applied to each row and each column of the first plate, and the same constants were added to the corresponding rows and columns of all other plates.

B. Systematic error stemming from changing row x column interactions: as in (A), but with the values of the row and column constants varying across plates.
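The simulation design can be sketched as follows. A minimal pure-Python generator; the hit magnitude of -5, the per-well random hit placement, and the default sizes are illustrative assumptions, not the published protocol:

```python
import random

def make_assay(n_plates=10, n_rows=8, n_cols=10, hit_rate=0.01,
               error_sd=1.2, constant=True, seed=0):
    """Generate N(0,1) plates, add hits, then add row/column systematic
    error of standard deviation error_sd: offsets reused on every plate
    (type A, constant=True) or redrawn per plate (type B)."""
    rng = random.Random(seed)
    row_off = [rng.gauss(0, error_sd) for _ in range(n_rows)]
    col_off = [rng.gauss(0, error_sd) for _ in range(n_cols)]
    plates = []
    for _ in range(n_plates):
        if not constant:  # type B: redraw the offsets for each plate
            row_off = [rng.gauss(0, error_sd) for _ in range(n_rows)]
            col_off = [rng.gauss(0, error_sd) for _ in range(n_cols)]
        plate = []
        for i in range(n_rows):
            row = []
            for j in range(n_cols):
                v = rng.gauss(0, 1)
                if rng.random() < hit_rate:  # hit: strong inhibition
                    v -= 5.0
                row.append(v + row_off[i] + col_off[j])
            plate.append(row)
        plates.append(plate)
    return plates
```

Feeding such assays to the correction methods above is how their hit detection rates and false-positive rates can be compared under controlled error levels.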
Results for systematic error stemming from row x column interactions that are constant across plates (SD = σ). The results were obtained with the methods using plate parameters (i.e., Method 1), assay parameters (i.e., Method 2), median polish, B score, and well correction. The abscissa axis indicates the noise factor (a and c, with a fixed hit percentage of 1%) and the percentage of added hits (b and d, with a fixed error rate of 1.2 SD).

Systematic error stemming from varying row x column interactions
Results for systematic error stemming from row x column interactions that vary across plates (SD = σ). The results were obtained with the methods using plate parameters (i.e., Method 1), assay parameters (i.e., Method 2), median polish, B score, and well correction. The abscissa axis indicates the noise factor (a and c, with a fixed hit percentage of 1%) and the percentage of added hits (b and d, with a fixed error rate of 1.2 SD).

HTS Corrector Software
Main features:

- Visualization of HTS assays
- Data partitioning using k-means
- Evaluation of the background surface
- Correction of experimental datasets
- Hit selection using different methods
- Chi-square analysis of the hit distribution

Contact:
Download from:
Conclusions

- The proposed well correction method (Makarenkov et al., 2007) rectifies the distribution of assay measurements by normalizing the data within each considered well across all assay plates.
- Simulations suggest that the well correction procedure is a robust method that should be used prior to the hit selection process.
- When neither hits nor systematic errors were present in the data, the well correction method showed performances similar to those of the traditional hit selection methods.
- Well correction generally outperformed the median polish and B-score methods, as well as the classical hit selection procedure.
- The simulation study confirmed that Method 2, based on the assay parameters, was more accurate than Method 1, based on the plate parameters. Therefore, in the case of identical testing conditions for all plates of a given assay, all assay measurements should be treated as a single batch.

Future research

- Application of clustering methods for hit selection in HTS
- Incorporation into the method of information about the chemical properties of the tested compounds
- Possible combinations with virtual HTS screening (vHTS) information and combinatorial chemistry models

References
- Brideau, C., Gunter, B., Pikounis, W., Pajni, N. and Liaw, A. (2003) Improved statistical methods for hit selection in HTS. J. Biomol. Screen., 8.
- Dragiev, P., Nadon, R. and Makarenkov, V. (2011) Systematic error detection in experimental high-throughput screening. BMC Bioinformatics, 12:25.
- Heyse, S. (2002) Comprehensive analysis of high-throughput screening data. In Proc. of SPIE 2002, 4626.
- Kevorkov, D. and Makarenkov, V. (2005) Statistical analysis of systematic errors in HTS. J. Biomol. Screen., 10.
- Makarenkov, V., Kevorkov, D., Zentilli, P., Gagarin, A., Malo, N. and Nadon, R. (2006) HTS-Corrector: new application for statistical analysis and correction of experimental data. Bioinformatics, 22.
- Makarenkov, V., Zentilli, P., Kevorkov, D., Gagarin, A., Malo, N. and Nadon, R. (2007) An efficient method for the detection and elimination of systematic error in HTS. Bioinformatics, 23.
- Malo, N., Hanley, J.A., Cerquozzi, S., Pelletier, J. and Nadon, R. (2006) Statistical practice in HTS data analysis. Nature Biotechnol., 24.
- Tukey, J.W. (1977) Exploratory Data Analysis. Reading, MA: Addison-Wesley.