Estimating Signal with Next Generation Affymetrix Software

Earl Hubbell, Ph.D.

Principal Statistician, Applied Research



Quick Review of AvgDiff

• Operates on PM-MM

• Removes largest & smallest values

• Removes >3 standard deviation values

[Figure: PM-MM intensity (y-axis) by probe pair 1-16 (x-axis), with lower and upper limits marked]

Areas for improvement

• AvgDiff is only minimally robust against minority probes

• Negative values are impossible for concentration or intensity

• Negative values indicate that bias is larger than the true effect

• AvgDiff is incompatible with standard log-transformation, since negative values have no logarithm

Desirable Properties

• Robust against minority probes

• Doesn’t yield unphysical results for signal

• Reasonable predictor of concentration

A simple model for intensity

• PM intensity = Real signal + Stray signal

• Real, Stray, and PM are all non-negative

• log(Real) = log(Affinity) + log(Concentration) + e, where e is a random error term (multiplicative error model)
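A minimal sketch of this model in Python, for readers who want to experiment with it. The function name `simulate_pm`, the Gaussian log-scale error `e` with standard deviation `sigma`, and the example parameter values are illustrative assumptions, not values from the talk.

```python
import math
import random


def simulate_pm(affinity, concentration, stray, sigma=0.2, rng=None):
    """Draw one PM intensity under the simple model (illustrative sketch).

    PM = Real + Stray, with log(Real) = log(affinity) + log(concentration) + e,
    where e ~ Normal(0, sigma) is an assumed multiplicative (log-scale) error.
    """
    if affinity < 0 or concentration < 0 or stray < 0:
        raise ValueError("model quantities must be non-negative")
    rng = rng or random.Random()
    e = rng.gauss(0.0, sigma)
    # affinity * concentration * exp(e) equals exp(log a + log c + e) for c > 0,
    # and is exactly 0 at zero concentration, where only stray signal remains.
    real = affinity * concentration * math.exp(e)
    return real + stray


rng = random.Random(42)
for conc in (0.0, 1.0, 2.0):  # hypothetical concentrations, e.g. 0, 1, 2 pM
    print(conc, round(simulate_pm(150.0, conc, stray=80.0, rng=rng), 1))
```

At zero concentration the sketch returns only the stray component, consistent with the later observation that PM has non-zero intensity at zero concentration.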

AvgDiff (MAS 4.0)

• PM

• Stray estimate = MM

• Super-Olympic scoring on PM-MM (mean-like statistic)

Making an estimate of signal

- observe PM

- adjust PM for stray signal

- value = statistic(adjusted PM)
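The three-step recipe above can be sketched for AvgDiff as follows. This is a hedged reconstruction from the bullet points in this talk, not the MAS 4.0 code: the function name `avg_diff` is hypothetical, and applying the 3-SD cut to the set left after dropping the largest and smallest differences is an assumption.

```python
import statistics


def avg_diff(pm, mm, n_sd=3.0):
    """Sketch of a MAS 4.0-style AvgDiff ("super-Olympic" mean of PM - MM).

    Assumed procedure: drop the single largest and smallest differences,
    then drop any remaining difference more than n_sd standard deviations
    from the mean of that trimmed set, and average what is left.
    """
    if len(pm) != len(mm) or len(pm) < 4:
        raise ValueError("need at least 4 matched probe pairs")
    # Steps 1-2: observe PM and adjust it using MM as the stray estimate.
    diffs = sorted(p - m for p, m in zip(pm, mm))
    trimmed = diffs[1:-1]  # Olympic trim: drop largest and smallest
    mu = statistics.mean(trimmed)
    sd = statistics.stdev(trimmed)
    kept = [d for d in trimmed if abs(d - mu) <= n_sd * sd]
    # Step 3: mean-like statistic over the surviving differences.
    return statistics.mean(kept)


# Hypothetical probe set: pair 1 has MM > PM, pair 6 is an outlier.
pm = [100, 300, 310, 320, 290, 700]
mm = [150, 200, 200, 200, 200, 200]
print(avg_diff(pm, mm))

# When MM exceeds PM on every pair, AvgDiff is negative.
print(avg_diff([100] * 5, [150] * 5))
```

The second call illustrates the problem noted under "Areas for improvement": a negative value cannot be an intensity or a concentration, and it cannot be log-transformed.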

Signal (MAS 5.0)

• PM

• Stray estimate = CT [best of two estimates]

• Tukey biweight on log(PM-CT) (median-like statistic)

Handling stray signal

• PM intensities have a stray signal component (intensity not due to real signal)

• Many MM probes have stray signal similar to that of their PM partners

• But some MM probes are not useful for estimating stray signal

• Anomalous MM values can be handled with imputation

At zero concentration PM has non-zero intensity

As concentration increases, intensity increases

[Figure: PM intensity (y-axis) for probes 1-16 (x-axis) at 0 pM, 1 pM, and 2 pM]

Some mismatches don’t tell us about stray signal

[Figure: PM and MM intensity (y-axis) for probes 1-16 (x-axis)]

Model-violating MM values censor real signal information

- Impute typical stray signal for such PM probes

[Figure: log(intensity) (y-axis) for probe pairs 1-16 (x-axis), comparing log(PM-MM) with log(PM-Proportion)]

Removal of stray signal estimate leaves positive values

[Figure: probe intensity (y-axis) for probes 1-16 (x-axis) at 0, 1, and 2 picomolar]

Signal calculation (equations)

• Signal = Tukey biweight(log(Adjusted PM)), where Adjusted PM = PM - Stray

• Stray = MM (if physically possible, i.e. MM < PM), or

• log(Stray) = log(PM) - log(Stray proportion) (if impossible)

• log(Stray proportion) = max(SB, small positive value)

• SB = Tukey biweight(log(PM) - log(MM)) (“typical” log-ratio)
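The equations above can be sketched end to end as follows. This is a literal reading of the slide under stated assumptions, not the MAS 5.0 source: logs are base 2, the "small positive value" floor is a hypothetical `min_log_ratio=0.03`, the biweight constants are assumed, the helper names are hypothetical, and the result is reported on the linear scale by taking the antilog of the biweight. Intensities are assumed to be strictly positive.

```python
import math
import statistics


def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight (median-centred, MAD-scaled); assumed constants."""
    m = statistics.median(values)
    scale = c * statistics.median([abs(x - m) for x in values]) + eps
    w = [(1 - ((x - m) / scale) ** 2) ** 2 if abs(x - m) < scale else 0.0
         for x in values]
    return sum(wi * x for wi, x in zip(w, values)) / sum(w)


def stray_estimates(pm, mm, min_log_ratio=0.03):
    """Stray = MM where MM < PM; otherwise impute PM / 2**log(stray proportion)."""
    # SB: "typical" log2(PM/MM) across the probe set.
    sb = tukey_biweight([math.log2(p / m) for p, m in zip(pm, mm)])
    # log(stray proportion) = max(SB, small positive value), so imputed Stray < PM.
    log_prop = max(sb, min_log_ratio)
    return [m if m < p else p / 2 ** log_prop for p, m in zip(pm, mm)]


def signal(pm, mm):
    """Antilog of Tukey biweight(log2(PM - Stray))."""
    adjusted = [p - s for p, s in zip(pm, stray_estimates(pm, mm))]
    # Every adjusted value is positive, so the log is always defined.
    return 2 ** tukey_biweight([math.log2(a) for a in adjusted])


# Hypothetical probe set: the MM of pair 4 exceeds its PM.
pm = [100, 200, 300, 400]
mm = [50, 100, 150, 500]
print(stray_estimates(pm, mm))  # → [50, 100, 150, 200.0]
print(signal(pm, mm))
```

Because the imputed stray signal is always a fixed fraction below PM, every adjusted PM stays positive, which is the property shown on the "Removal of stray signal estimate leaves positive values" slide.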

Is signal a reasonable predictor of concentration?

• Near-linear behavior

• Stabilized variance
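One way to put numbers on "near linear" and "stabilized variance" is sketched below: fit the least-squares slope of log(signal) against log(concentration), and compare the spread of log(signal) across replicates at each concentration. Both helper names are hypothetical, and this is not necessarily how the talk's evaluation was computed.

```python
import math
import statistics


def log_log_slope(concentrations, signals):
    """Least-squares slope of log2(signal) on log2(concentration).

    A slope near 1 means signal scales proportionally with concentration.
    """
    xs = [math.log2(c) for c in concentrations]
    ys = [math.log2(s) for s in signals]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def log_replicate_sd(replicates):
    """Standard deviation of log2(signal) across replicates at one concentration.

    Similar values across concentrations indicate stabilized variance.
    """
    return statistics.stdev([math.log2(s) for s in replicates])


# Hypothetical, perfectly proportional data: slope is 1 up to rounding.
print(log_log_slope([1, 2, 4, 8], [10, 20, 40, 80]))
print(log_replicate_sd([100, 110, 95]))
```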

[Figure: Average Signal for 12 human spiked transcripts (3x replicate); log(signal) (y-axis) vs. log(conc) (x-axis)]

Signal is near-linear and has stabilized variance in the middle range of concentrations

[Figure: Signal (y-axis) vs. log(concentration) (x-axis)]

Resistance to outliers

• Introduce 10% artificial outliers to check robustness

• Use a nonparametric (Kendall) correlation to handle both log-scale and linear-scale data

• Verify data against known spike concentrations
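The evaluation design above can be sketched with two small helpers: a Kendall tau correlation between known spike concentrations and estimated values, and a function that corrupts a fraction of the values. The helper names are hypothetical, and the outlier model (multiplying a random 10% of values by a hypothetical `factor`) is an assumption; the talk does not say how its artificial outliers were generated.

```python
import math
import random


def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant pairs) / all pairs.

    Rank-based, so any strictly increasing transform of y (such as log)
    leaves it unchanged; tied pairs count as neither concordant nor discordant.
    """
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            score += (prod > 0) - (prod < 0)
    return score / (n * (n - 1) / 2)


def add_outliers(values, fraction=0.10, factor=10.0, rng=None):
    """Return a copy with a random `fraction` of entries multiplied by `factor`."""
    rng = rng or random.Random(0)
    out = list(values)
    k = max(1, round(fraction * len(out)))
    for i in rng.sample(range(len(out)), k):
        out[i] *= factor
    return out


conc = [0.25, 0.5, 1, 2, 4, 8, 16, 32, 64, 128]  # hypothetical spike series (pM)
est = [50 * c for c in conc]                    # a perfectly monotone estimate
print(kendall_tau(conc, est), kendall_tau(conc, [math.log(v) for v in est]))  # → 1.0 1.0
print(kendall_tau(conc, add_outliers(est)))
```

Because tau depends only on the ordering of values, the same score is obtained whether the estimates are on a linear scale (like AvgDiff) or a log scale, which is why a nonparametric correlation suits this comparison.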

Superior performance against outliers

[Figure: Kendall correlation (y-axis, 0 to 1) for Signal and AvgDiff, with 0% and 10% outliers]

MAS 5.0 more robust against outliers in biological samples

[Figure: AvgDiff and Signal values (y-axis) across experiments 1-9 (x-axis) for probe set 1535_at from Hu95A, in adrenal, kidney, and pancreas samples]

Summary

• MAS 5.0 Signal is a reasonable predictor of concentration

• Tukey biweight resists outliers

• AvgDiff is insufficiently robust in biological samples

• Log-scale transformation is now possible

• Continued algorithm development is underway

Acknowledgements

• Wei-Min Liu

• Fred Christians

• Tom Ryder

• Suzanne Dee

• Steve Smeekens

• Paul Kaplan

• Rui Mei

• Teresa Webster

• Xiaojun Di

• Ming-hsiu Ho

• Jyoti Baid

• Chris Harrington

• Tarif Awad