estimating autoantibody signatures to detect autoimmune ... · 2 z. wu and others i. smoothing. for...

13
Biostatistics (2017), 0, 0, pp. 113 doi:10.1093/biostatistics/gel-supplementary-materials˙rev1 Supplementary Materials to “Estimating AutoAntibody Signatures to Detect Autoimmune Disease Patient Subsets” ZHENKE WU *,1 , LIVIA CASCIOLA-ROSEN 2 , AMI A. SHAH 2 , ANTONY ROSEN 2 , SCOTT L. ZEGER 3 1 Department of Biostatistics and Michigan Institute of Data Science, University of Michigan, Ann Arbor, Michigan 48109 2 Division of Rheumatology, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, 21224 3 Department of Biostatistics, Johns Hopkins University, Baltimore, MD 21205 * [email protected] Appendix S1. Raw Data and Standardization Let (t raw , M raw )= (t raw gs ,M raw gis ) represent the raw, high-frequency GEA data, for pixel s = 1,...,S g , lane i =1,...,N g , gel g =1,...,G. Here t raw is a grid that evenly splits the unit interval [0, 1] with t raw gs = s/S g [0, 1], where a large S g represents a high imaging resolution. Note that in the raw data, S g varies by gel from 1, 437-1, 522 in our application. M raw gis is the radioactive intensity scanned at t raw gs for lane i =1,...,N g , gel g =1,...,G. Let N = g N g be the total number of samples tested. For the rest of this section, we process the high-frequency data (t raw , M raw ) into high-frequency data (t 0 , M 0 ) that have been standardized across multiple gels. The latter will be used as input for peak detection (Section 2.2). * To whom correspondence should be addressed. c The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: [email protected]

Upload: others

Post on 15-Aug-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Estimating AutoAntibody Signatures to Detect Autoimmune ... · 2 Z. Wu and others i. Smoothing. For each sample lane, smooth the raw intensity data by LOESS, with a span h= 0.022

Biostatistics (2017), 0, 0, pp. 1–13doi:10.1093/biostatistics/gel-supplementary-materials˙rev1

Supplementary Materials to

“Estimating AutoAntibody Signatures to Detect

Autoimmune Disease Patient Subsets”

ZHENKE WU∗,1, LIVIA CASCIOLA-ROSEN2, AMI A. SHAH2,

ANTONY ROSEN2, SCOTT L. ZEGER3

1 Department of Biostatistics and Michigan Institute of Data Science, University of Michigan,Ann Arbor, Michigan 48109

2 Division of Rheumatology, Department of Medicine, Johns Hopkins University School ofMedicine, Baltimore, Maryland, 21224

3 Department of Biostatistics, Johns Hopkins University, Baltimore, MD 21205

[email protected]

Appendix S1. Raw Data and Standardization

Let (traw,Mraw) ={

(trawgs ,Mrawgis )

}represent the raw, high-frequency GEA data, for pixel s =

1, . . . , Sg, lane i = 1, . . . , Ng, gel g = 1, . . . , G. Here traw is a grid that evenly splits the unit

interval [0, 1] with trawgs = s/Sg ∈ [0, 1], where a large Sg represents a high imaging resolution.

Note that in the raw data, Sg varies by gel from 1, 437-1, 522 in our application. M rawgis is the

radioactive intensity scanned at trawgs for lane i = 1, . . . , Ng, gel g = 1, . . . , G. Let N =∑g Ng be

the total number of samples tested.

For the rest of this section, we process the high-frequency data (traw,Mraw) into high-frequency

data (t0,M0) that have been standardized across multiple gels. The latter will be used as input

for peak detection (Section 2.2).

∗To whom correspondence should be addressed.

c© The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: [email protected]

Page 2: Estimating AutoAntibody Signatures to Detect Autoimmune ... · 2 Z. Wu and others i. Smoothing. For each sample lane, smooth the raw intensity data by LOESS, with a span h= 0.022

2 Z. Wu and others

i. Smoothing. For each sample lane, smooth the raw intensity data by LOESS, with a span

h = 0.022. Let M = {Mgis} denote the smoothed mean function evaluated at raw imaging

location trawgs .

ii. Standardization Across Gels. Imaging resolution may vary by gel, we hence standardize

the smoothed intensity values M into B0 = 700 bins using a set of evenly-spaced break

points {0 = κ0 < κ1 < . . . < κB0 = 1} shared by all gels.

We clip the images at the right end {b : tb > 0.956} because they represent small molecular

weight molecules migrating at the dye front (that is, not separable by gel type used). Their

exclusion from autoantibody comparisons is standard practice. We denote the standardized

data by (t0,M0) ={(t0b ,M

0gib

)}.

Appendix S2. Peak Calling Algorithm

Given a half-width h, collect the peak-candidate bins defined by B0gi(h) = {b | scoregi(b) = 3}.

Because h controls the locality of a peak, we perform peak-candidacy search for a few values of

h. Denote the union of the identified peak-candidate bins by B0gi = ∪hB0gi(h). B0gi is comprised of

multiple blocks, each comprised of contiguous peak-candidate bins. Among the blocks, we merge

two nearby ones, for example, if B0gij and B0gi,j+1 satisfy minB0gi,j+1−maxB0gij 6 γ(= 5). We also

remove short blocks of length less than three to obtain the final peak-candidate bins {Bgij}Jgij=1.

Finally, we pick the bin bgij that maximizes its within-block intensities. We refer to them as peaks

j = 1, . . . , Jgi for lane i = 1, . . . , Ng on gel g = 1 . . . , G.

The true and false peak detection rates are determined by several factors including the half-

peak width h, the minimum peak elevation C0, the true intensities at each bin and the measure-

ment errors inherent in autoradiography. We calibrate the first two parameters so that 1) the

reference lanes have exactly 7 detected peaks (perfect observed sensitivity and specificity), and

2) the actin peaks stand out clearly. For example, the middle panel of Figure 1(b) shows the

Page 3: Estimating AutoAntibody Signatures to Detect Autoimmune ... · 2 Z. Wu and others i. Smoothing. For each sample lane, smooth the raw intensity data by LOESS, with a span h= 0.022

Supplementary Materials to Wu et al. 2017 3

result of peak detection by blue asterisks for one set of the gels (Section 4.3), where the peaks

that are slightly higher than the neighboring intensities are effectively captured, most notably for

lanes 5,10,15 where the small actin peaks are identified. Note that, we have reduced the impact of

measurement noise on peak detection by computing the local difference scores from the smoothed

data rather than the raw data. In our analyses, we have chosen the minimum peak amplitude

parameter C0 = 0.01 that is of higher order than the noise level obtained from LOESS smoothing.

Alternative approaches to peak detection include random process modeling (e.g., Carlson and

others, 2015), multiplicity adjustment after local maxima hunting (e.g., Schwartzman and others,

2011) and filtering methods (e.g., Du and others, 2006). From our experience, they are designed

and hence more suitable for data with appreciably higher noise levels; our data show much lower

noise level in the measured autoradiographic intensities. For example, in random process models

that are motivated by the analysis of pulsatile, or episodic time series data, the unknown locations

of peaks and the observed intensity values are modeled by double stochastic processes, such as

Cox processes, to fit the continuous intensity data (e.g., Carlson and others, 2015). However,

because a minor peak has a narrow span comprised of few bins, its presence or absence is more

uncertain than major peaks under random process models. In addition, to detect peaks via the

random process models for hundreds of samples and hundreds of dimensions per sample, the

iterative MCMC model fitting algorithm could be computationally expensive.

Appendix S3. Reference Alignment via Piecewise Linear Dewarping

Let (t = (t1, . . . , tB),Mg = {Mgib}) represent the high-frequency intensity data for a query gel

g and (t,Mg0 = {Mg0ib}) the data for a template gel g0. We describe a procedure to align gel g

towards g0 using piecewise linear dewarping (Uchida and Sakoe, 2001). Let κg represent the set of

knots comprised of two endpoints κg1 = ν0, κgR = νL+1 and the reference peaks {κg2, . . . , κg,R−1}

on gel g. The knot κgj corresponds to a bin defined by {bgj : tbgj = κgj}, j = 1, . . . , R. For the

Page 4: Estimating AutoAntibody Signatures to Detect Autoimmune ... · 2 Z. Wu and others i. Smoothing. For each sample lane, smooth the raw intensity data by LOESS, with a span h= 0.022

4 Z. Wu and others

template gel g0, we similarly denote the knots and the bins by κg0 and bg0j , j = 1, . . . , R. Note

that κg and κg0 need to have identical cardinality R. In our application, R = 9: seven reference

peaks plus two endpoints.

In two steps, we define a piecewise linear function Wg(·; g0) for reference alignment. The

function first matches the query knots κg to the template knots κg0 and then linearly stretches

or compresses gel g between the matched knots. It finds a bin number Wg(b; g0) in the query gel

g and match it to bin b in the template gel g0. For example, we will have Wg(bg0j ; g0) = bgj i.e.,

the j-th knots in both gels are exactly matched.

For the bin b = 1, . . . , B on the template gel g0:

1. Find the nearest knot to its left: {j = j(b; g0) ∈ {1, . . . , R− 1} : κg0j 6 tb < κg0,j+1}.

2. Find the bin in the query gel g to be matched by bin b by

W(j)g (·; g0) : b 7→ bw · bg,j+1 + (1− w)bgjc, (A1)

where w = w(b; g0) = (b− bg0j)/

(bg0,j+1 − bg0j) , and bac is the largest integer smaller

than or equal to a.

We thus obtain the piecewise linear dewarping function

Wg(·; g0) : b 7→R−1∑j=1

W(j)g (b; g0)1{κg0j6tb<κg0,j+1}, b = 1, . . . , B,

that aligns the query gel g to the template gel g0 (g 6= g0).

Figure S1 illustrates the results before and after the piecewise linear dewarping by Wg(·; g0).

The piecewise linear dewarping automatically matches the reference peaks from multiple gels to

facilitate cross-gel band comparisons.

Page 5: Estimating AutoAntibody Signatures to Detect Autoimmune ... · 2 Z. Wu and others i. Smoothing. For each sample lane, smooth the raw intensity data by LOESS, with a span h= 0.022

Supplementary Materials to Wu et al. 2017 5

Appendix S4. Details on Shrinkage Prior for Hyperparameters {σ−2gs }

For the hyperpriors on the smoothing parameters {τ2gs = σ−2gs }, s = 2, . . . , Tν − 1, g = 1, . . . , G,

we specify a two-component mixture distribution with one component favoring small and the

other favoring large values (Morrissey and others, 2011):

τ2gs ∼ ξgsGamma(· | aτ , bτ ) + (1− ξgs)InvPareto(· | a′τ , b′τ ), (A2)

InvPareto(τ ; a, b) =a

b

(τb

)a−1, a > 0, 0 < τ < b, (A3)

where the Gamma-distributed component (aτ = 3, bτ = 2) concentrates near smaller values while

the inverse-Pareto component prefers larger values (a′τ = 1.5, b′τ = 400).

We designed the mixture priors for τ2gs so that given s, βgst, t = 1, . . . , Tu are similar or

discrepant according as τ2gs being large or small. The random smoothness indicator ξs represents

different (1) or the same amount of (0) warping across lanes. Let ξgs ∼ Bernoulli(ρg) with suc-

cess probability ρg. We specify a hyperprior ρg ∼ Beta(aρ, bρ) to let data inform the degree of

smoothness. In our application, we use aρ = bρ = 1 so that the prior has a mean of 1/2 that

assigns equal prior probabilities to all the submodels defined by (ξg2, . . . , ξg,Tν−1) ∈ {0, 1}Tν−1.

We specify the hyperprior for the smoothing parameter τ2g1 = σ−2g1d∼ InvPareto(· | a′τ , b′τ ). We

also fix the measurement error variance σε = ∆/3 where ∆ is the minimum distance among grid

points {ν`} in the standardized scale. These parameters are chosen to constrain the shape of Sg

and are shown to have good dewarping performances, e.g., aligning all the actin peaks to a single

landmark (maximum a posteriori).

Appendix S5. Details on Posterior Sampling Algorithm

We sample from the joint posterior by the following algorithm:

1. Update peak-to-landmark indicators Zgij for peak j = 1, . . . , Jgi, lane i = 1, . . . , Ng and

Page 6: Estimating AutoAntibody Signatures to Detect Autoimmune ... · 2 Z. Wu and others i. Smoothing. For each sample lane, smooth the raw intensity data by LOESS, with a span h= 0.022

6 Z. Wu and others

gel g = 1, . . . , G by categorical distribution

P(Zgij = ` | others) ∝ N(Tgij ;Sg(ui, ν`;βg), σ2

ε

){1− exp(−λ∗` )} , (A4)

for ` = 1, . . . , L that satisfy the support constraint |ν` − Tgij | < A0 and order restriction

Zgij < Zgi,j+1, j = 1, . . . , Jgi − 1.

2. Update the B-spline basis coefficients βg = {βgst} for gel g = 1, . . . , G.

To establish notation, let ∆1 be the first order difference operator of dimension (Tν − 2)×

Tν − 1 with entries ∆1kk′ = δ(k + 1, k′) − δ(k, k′) and δ(k, k′) equals 1 if k = k′ and 0

otherwise; Similarly we define ∆2 with Tν replaced by Tu+1. The random walk priors (2.6)

and (2.7) can then be written as

βg[−Tν ]1d∼ NTν−1(·;βid

[−Tν ], σ−2g1 ∆′1∆1)1{ν0 = βg11 < . . . < βg,Tν−1,1 < νL+1},

and

βgs·d∼ NTu(·; 0, σ−2gs ∆′2∆2)1{ν0 = βg1t < βg,s−1,t < βgst < νL+1,∀t > 2}, s = 2, . . . , Tν − 1.

Although both ∆1 and ∆2 are rank deficient, we will show that the conditional posterior

for βg is proper under a scattering condition.

We now write the likelihood (2.2) of the Pg =∑Ngi=1 Jgi observed peaks Tg. Let vec[βg] be

a vector of length TνTu that stacks the columns of βg. Let Bg be a Pg by TνTu matrix;

the i-th row is defined by the Kronecker product of two sets of B-spline basis functions

evaluated at (νZgij , ugi): Bg2(ugi)′ ⊗Bg1(νZgij )

′.

Update the B-spline basis coefficients βg = {βgst} for gel g = 1, . . . , G by the multivariate

Gaussian distribution (with constraints (2.4) and (2.5))

[vec[βg] | others] ∝ exp

(− 1

σ2ε

∣∣∣∣∣∣∣∣Tg −Bgvec[βg]

∣∣∣∣∣∣∣∣22

)

Page 7: Estimating AutoAntibody Signatures to Detect Autoimmune ... · 2 Z. Wu and others i. Smoothing. For each sample lane, smooth the raw intensity data by LOESS, with a span h= 0.022

Supplementary Materials to Wu et al. 2017 7

× exp

(− 1

σ2g1

∣∣∣∣∣∣∣∣∆aug1 βid,aug −∆aug

1 vec[βg]

∣∣∣∣∣∣∣∣22

)

× exp

(−Tν−1∑s=2

1

σ2gs

∣∣∣∣∣∣∣∣∆aug2s vec[βg]

∣∣∣∣∣∣∣∣22

),

where βid,aug is a column vector formed by stacking βid for Tu times hence is of length

TνTu. Here ∆aug1 is an augmented first-order difference operator applied to a vector of

length TνTu. It augments ∆1 to [∆1 | 0(Tν−2)×(TuTν−Tν+1)]. Similarly, we define a first-

order difference matrix ∆aug2s that augments ∆2 to a (Tu− 1)×TuTν matrix; we replace the

(s, s+ Tν , . . . , s+ (Tu − 1)Tν)-th columns 0(Tu−1)×TνTu by the columns of ∆2.

The conditional distribution above then simplifies to a multivariate Gaussian distribution

with a mean vector

Λ−1g

{σ−2ε B′gTg + σ−2g1 ∆aug′

1 ∆aug1 βid,aug

}where the precision matrix Λg = σ−2ε B′gBg + σ−2g1 ∆aug′

1 ∆aug1 +

∑Tν−1s=2 σ−2gs ∆aug′

2s ∆aug2s .

The matrix B′gBg is of full rank when the observed peaks are well scattered across lanes

and along the gels. Let (cs0 , cs0+1) be the support of the s0-th B-spline basis B1s0(·) in

the ν-direction. Bg can be rank deficient, for example, when no peak appears in between

(cs0 , cs0+1). That is, we have νZgij /∈ (cs0 , cs0+1) and Bg1s0(νZgij ) = 0, for i = 1, . . . , Ng,

j = 1, . . . , Jgi. By the definition of Bg, it will have constant zeros in the columns s0,

s0 + Tν ,. . . , and s0 + (Tν − 1)Tu. As a result, the marginal posterior of {βgs0t}Tut=1 can only

be learned indirectly through its neighboring coefficients via random walk priors (2.6) and

(2.7). Rank deficiency also occurs if multiple neighboring lanes have no observed peaks.

Given sparse peaks, Bg can be made full rank by reducing the number of basis functions.

However, the warping function Sg will be less flexible. We therefore recommend a minimal

number of basis functions necessary to ensure parameter identifiability provided the family

of warping functions Sg is flexible enough to characterize the image deformations (e.g.,

curvature of the actin peaks). We refer to the condition that Bg being full rank as the

Page 8: Estimating AutoAntibody Signatures to Detect Autoimmune ... · 2 Z. Wu and others i. Smoothing. For each sample lane, smooth the raw intensity data by LOESS, with a span h= 0.022

8 Z. Wu and others

scattering condition.

In our application, failure of the scattering condition is rare. Λg is then a sum of positive

definite matrix and semi-definite matrices and hence is invertable. Also recall that B′gBg is

sparse as constructed from sparse B-spline bases. Because ∆aug′

1 ∆aug1 and

∑Tν−1s=2 σ−2gs ∆aug′

2s ∆aug2s

are both sparse square matrices with at most O(TνTu) nonzeros, Λg preserves the sparsity

of B′gBg. So we use sparse Cholesky factorization of Λg to produce its Cholesky factors.

We first block update {βgst}Tut=1 for s = 2 from [βgst | βgj1j2 , j1 6= s, others] with the

constraint βgst > βg,s−1,t, t = 1, . . . , Tu and continue for s = 3, . . . , Tν − 1. This step

requires calculation of inverse of submatrices of Λ−1g for Tν − 2 times. In our application,

computing one such inverse when Tν = 10 and Tu = 6 requires < 0.001 seconds.

3. Update the smoothing parameters τ2gs = σ−2gs and smoothness selection indicator ξgs (Ap-

pendix S4). First randomly switch ξgs to ξ∗gs either from 0 to 1, or 1 to 0 for s = 2, . . . , Tν−1,

g = 1, . . . , G. For the parameter τ2gs, we propose its candidate τ∗2gs from the log-normal dis-

tribution with log-mean parameter τ2gs. We accept (τ∗2gs , ξ∗gs) with probability

α(1)g = min

{1,p(Tg;Zg,βg, τ

∗2gs )π(τ∗2gs | ξ∗gs)q(τ2gs | τ∗2gs )q(ξgs | ξ∗gs)

p(Tg;Zg,βg, τ2gs)π(τ2gs | ξgs)q(τ∗2gs | τ2gs)q(ξ∗gs | ξgs)

}, (A5)

where p(Tg;Zg,βg, τ∗2gs ) denotes the Gaussian likelihood (2.2) and π(τ2gs | ξgs) is the prior

distribution (A2).

We update τ2gs again because it is continuous and therefore has a much bigger parameter

space than that of discrete parameter. Using random walk Metroplis-within-Gibbs algo-

rithm, we propose τ∗2gs from the log-normal distribution with log-mean parameter τ2gs and

accept with probability

α(2)g = min

{1,p(Tg;Zg,βg, τ

∗2gs )π(τ∗2gs | ξ∗gs)q(τ2gs | τ∗2gs )

p(Tg;Zg,βg, τ2gs)π(τ2gs | ξgs)q(τ∗2gs | τ2gs)

}. (A6)

4. Update the smoothing parameter in the ν-direction τ2g1 = σ−2g1 , g = 1, . . . , G by proposing

Page 9: Estimating AutoAntibody Signatures to Detect Autoimmune ... · 2 Z. Wu and others i. Smoothing. For each sample lane, smooth the raw intensity data by LOESS, with a span h= 0.022

Supplementary Materials to Wu et al. 2017 9

its candidate τ2g1 from the log-normal distribution with log-mean parameter τ2g1. We accept

the proposal with probability min

{1,

NTν−1({βgs1}Tν−1s=1 ;0,τ∗2g1 I)q(τ

2g1|τ

∗2g1 )

NTν−1({βgs1}Tν−1s=1 ;0,τ2

g1I)q(τ∗2g1 |τ2g1)

}.

5. Update the smoothness selection hyperparameter ρg (Appendix S4), for g = 1, . . . , G from

[ρg | others] ∼ Beta(aρ +∑s

1{ξgs = 1}, bρ +∑s

1{ξgs = 0}). (A7)

We calibrate the scale of the proposals in Step 3-4 at the burn-in period of the MCMC to achieve

an acceptance rate between 30% and 70%. All the posterior analyses were based on three chains of

10,000 iterations with a burn-in period of 5,000 iterations. We monitor the MCMC convergence via

chain histories (visual inspection of good mixing), kernel density plots (unimodal for continuous

parameters) and Brooks-Gelman-Rubin statistic R (Brooks and Gelman, 1998) (three chains with

random starting values; we used the criterion R < 1.01 to declare posterior convergence for every

model parameter). Convergence is fast within thousands of burn-in iterations.

Page 10: Estimating AutoAntibody Signatures to Detect Autoimmune ... · 2 Z. Wu and others i. Smoothing. For each sample lane, smooth the raw intensity data by LOESS, with a span h= 0.022

10 Z. Wu and others

Appendix Figures

Molecular Weight (kDa)

Inte

nsity

21.531456697116200

1.2

1

0.8

0.6

0.4

0.2

0

0.2

0.4

0.6

0.8

1

1.2Refence Lane for Gel Set 1

Refence Lane for Gel Set 4

Gel 1 aligned towards Gel Set 4

Figure S1. Before (bottom) and after (top) piecewise linear dewarping towards Gel 4 (Section 4.3).Top: 21 intensity curves, 20 solid curves from one GEA experiement (g = 1) after reference alignment;one dashed curve for the reference lane in gel g0 = 4. The two curves not in purple denote the Lane 1intensities from the two gels and are aligned. Bottom: The same 21 intensity curves without referencealignment. The reference lanes (non-purple ones) are mismatched.

Page 11: Estimating AutoAntibody Signatures to Detect Autoimmune ... · 2 Z. Wu and others i. Smoothing. For each sample lane, smooth the raw intensity data by LOESS, with a span h= 0.022

Supplementary Materials to Wu et al. 2017 11

● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ●

● ● ● ●

● ● ● ● ● ● ●

● ● ● ●

● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ●

● ● ● ● ●

● ● ● ●

● ● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ●

● ● ● ● ●

● ● ● ● ●

● ● ● ● ● ●

● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ●

Ser

um S

ampl

e La

ne

Landmark #

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

23456789

1011121314151617181920

23456789

1011121314151617181920

200

116

97

66

45

31

21.5

200

116

97

66

45

31

21.5

0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70 72 74 76 78 80 82 84 86 88 90 92 94 96 98 1001 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 55 57 59 61 63 65 67 69 71 73 75 77 79 81 83 85 87 89 91 93 95 97 99 101

0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70 72 74 76 78 80 82 84 86 88 90 92 94 96 98 1001 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 55 57 59 61 63 65 67 69 71 73 75 77 79 81 83 85 87 89 91 93 95 97 99 101

Long

Exp

osur

eSh

ort E

xpos

ure

0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70 72 74 76 78 80 82 84 86 88 90 92 94 96 98 1001 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 55 57 59 61 63 65 67 69 71 73 75 77 79 81 83 85 87 89 91 93 95 97 99 101

Figure S2. Bayesian spatial dewarping results for the experiment with replicates (short/long exposure).Top: For each gel set, 19 serum lanes over a grid of L = 100 interior landmarks (reference lanes excluded).

Each detected peak Tgij (solid blue dots “•”) is connected to its maximum a posteriori landmark Zgij

(red triangle “∆”). The image deformations are shown by the bundle of black vertical curves {u 7→Sg(ν`, u) : u = 2, . . . , 20}, ` = 1, . . . , L, g = 1, 2, each of which connects the estimated locations ofidentical molecular weight. Bottom: Marginal posterior probability of each landmark protein present ina sample.

Page 12: Estimating AutoAntibody Signatures to Detect Autoimmune ... · 2 Z. Wu and others i. Smoothing. For each sample lane, smooth the raw intensity data by LOESS, with a span h= 0.022

12 REFERENCES

Pai

r.1

Pai

r.1

Pai

r.8

Pai

r.8

Pai

r.11

Pai

r.11

Pai

r.4

Pai

r.6 Pai

r.19

Pai

r.20

Pai

r.2

Pai

r.19

Pai

r.2

Pai

r.20

Pai

r.16

Pai

r.16

Pai

r.12

Pai

r.12

Pai

r.5

Pai

r.7

Pai

r.14

Pai

r.14

Pai

r.9

Pai

r.9

Pai

r.13

Pai

r.13

Pai

r.15

Pai

r.5

Pai

r.7

Pai

r.15

Pai

r.17

Pai

r.17

Pai

r.10

Pai

r.10

Pai

r.4

Pai

r.6

Pai

r.3

Pai

r.3

Pai

r.18

Pai

r.18

0.0

0.5

1.0

1.5

2.0 With Processing

Cluster method: completeDistance: cosine

Hei

ght

100100 10093 100100 92 5991 8895 10010090 6294 94 989386 97 9595947794 93 79100 63 65947459

74

7575

100

au

100100 10092 100100 87 4696 8884 9910088 3072 62 9293100 76 7774768978 68 12100 24 24668971

20

2020

100

bp

12 34 56 7 89 1011 121314 1516 17 181920 21 2223242526 27 2829 30 31323334

35

3637

38

edge #

Pai

r.1

Pai

r.1

Pai

r.19

Pai

r.2

Pai

r.2

Pai

r.20

Pai

r.20

Pai

r.8

Pai

r.8

Pai

r.11

Pai

r.11

Pai

r.4

Pai

r.3

Pai

r.6

Pai

r.9

Pai

r.14

Pai

r.9

Pai

r.14

Pai

r.16

Pai

r.5

Pai

r.7

Pai

r.12

Pai

r.16

Pai

r.15

Pai

r.5

Pai

r.7

Pai

r.12

Pai

r.17

Pai

r.17

Pai

r.10

Pai

r.10

Pai

r.3

Pai

r.4

Pai

r.6

Pai

r.13

Pai

r.13

Pai

r.15

Pai

r.19

Pai

r.18

Pai

r.18

0.0

0.5

1.0

1.5

2.0

No Proprocessing

Hei

ght

100 9667 92 7369 97 978361 100 99 7477 7162 8666 8679 81838281 8569 8493 8592 85 9575

93

95100 95

100

au

100 9462 87 3466 89 867368 100 100 6154 5650 6175 6292 421004876 4044 2547 1454 42 1824

10

28100 28

100

bp

1 23 4 56 7 8910 11 12 1314 1516 1718 1920 21222324 2526 2728 2930 31 3233

34

3536 37

38

edge #

Figure S3. Estimated dendrograms with (top) and without (bottom) pre-processing for the replicationexperiment. The red boxes show the subtree with > 95% confidence levels. The actual confidence levelsare shown in red on top of the subtrees. The numbers below the edges denote the edge numbers.

References

Brooks, Stephen P and Gelman, Andrew. (1998). General methods for monitoring conver-

gence of iterative simulations. Journal of computational and graphical statistics 7(4), 434–455.

Carlson, Nichole E, Grunwald, Gary K and Johnson, Timothy D. (2015). Using

cox cluster processes to model latent pulse location patterns in hormone concentration data.

Biostatistics, kxv046.

Du, Pan, Kibbe, Warren A and Lin, Simon M. (2006). Improved peak detection in mass

Page 13: Estimating AutoAntibody Signatures to Detect Autoimmune ... · 2 Z. Wu and others i. Smoothing. For each sample lane, smooth the raw intensity data by LOESS, with a span h= 0.022

REFERENCES 13

spectrum by incorporating continuous wavelet transform-based pattern matching. Bioinfor-

matics 22(17), 2059–2065.

Morrissey, Edward R, Juarez, Miguel A, Denby, Katherine J and Burroughs,

Nigel J. (2011). Inferring the time-invariant topology of a nonlinear sparse gene regulatory

network using fully bayesian spline autoregression. Biostatistics 12(4), 682–694.

Schwartzman, Armin, Gavrilov, Yulia and Adler, Robert J. (2011). Multiple testing

of local maxima for detection of peaks in 1d. Annals of statistics 39(6), 3290.

Uchida, Seiichi and Sakoe, Hiroaki. (2001). Piecewise linear two-dimensional warping.

Systems and Computers in Japan 32(12), 1–9.