High Predictive Accuracy of an Unbiased Proteomic Profile for Sustained Virologic Response in Chronic Hepatitis C Patients

Keyur Patel,1 Joseph E. Lucas,2* J. Will Thompson,2* Laura G. Dubois,2 Hans L. Tillmann,1 Alexander J. Thompson,1 Diane Uzarski,1,3 Robert M. Califf,3 Martin A. Moseley,2 Geoffrey S. Ginsburg,2 John G. McHutchison,1 and Jeanette J. McCarthy,1 for the MURDOCK Horizon 1 Study Team

Chronic hepatitis C (CHC) infection is a leading cause of end-stage liver disease. Current standard-of-care (SOC) interferon-based therapy results in sustained virological response (SVR) in only one-half of patients, and is associated with significant side effects. Accurate host predictors of virologic response are needed to individualize treatment regimens. We applied a label-free liquid chromatography mass spectrometry (LC-MS)-based proteomics discovery platform to pretreatment sera from a well-characterized and matched training cohort of 55 CHC patients, and an independent validation set of 41 CHC genotype 1 patients with characterized IL28B genotype. Accurate mass and retention time methods aligned samples to generate quantitative peptide data, with predictive modeling using Bayesian sparse latent factor regression. We identified 105 proteins of interest with two or more peptides, and a total of 3,768 peptides. Regression modeling selected three identified metaproteins, vitamin D binding protein, alpha 2 HS glycoprotein, and Complement C5, with a high predictive area under the receiver operator characteristic curve (AUROC) of 0.90 for SVR in the training cohort. A model averaging approach for identified peptides resulted in an AUROC of 0.86 in the validation cohort, and correctly identified virologic response in 71% of patients without the favorable IL28B "responder" genotype. Conclusion: Our preliminary data indicate that a serum-based protein signature can accurately predict treatment response to current SOC in most CHC patients. (HEPATOLOGY 2011;53:1809-1818)

Abbreviations: AUROC, area under the receiver operator characteristic curve; CHC, chronic hepatitis C; HCC, hepatocellular carcinoma; HCV, hepatitis C virus; LC-MS, liquid chromatography mass spectrometry; PEG-IFN, pegylated interferon; SOC, standard-of-care; SVR, sustained virological response.

From the 1Duke Clinical Research Institute, 2Duke Institute for Genome Sciences & Policy, 3Duke Translational Medicine Institute, Duke University, Durham, NC.

Received November 17, 2010; accepted February 21, 2011.

This study was funded in part by a generous grant from the David H Murdock Institute for Business and Culture; this project was also supported in part by CTSA Grant No.1 UL1 RR024128-01 from NCCR and NIH Roadmap for Medical Research.

*These authors contributed equally to this work.

Address reprint requests to: K. Patel, M.D., Duke Clinical Research Institute, PO Box 17969, Durham, NC 27715. E-mail: [email protected]; fax: (919) 668-7164.

Copyright © 2011 by the American Association for the Study of Liver Diseases. View this article online at wileyonlinelibrary.com. DOI 10.1002/hep.24284

Disclosures: A.T. and J.G.M. are coinventors of a patent relating to the IL-28B discovery.

Additional Supporting Information may be found in the online version of this article.

Chronic hepatitis C infection (CHC) is a global health problem, with an estimated 120 to 130 million chronic hepatitis C virus (HCV) carriers worldwide. Current standard-of-care (SOC) treatment with pegylated interferon (PEG-IFN) in combination with ribavirin is costly, associated with significant side effects, and results in sustained virological response (SVR) in only about one-half of treated subjects in controlled clinical trial settings with carefully selected patients. In standard clinical practice, comorbid conditions and potential adverse events result in frequent dosage adjustments, reduced compliance, and early withdrawal from treatment, further reducing SVR.1

Identifying host-viral factors that predict the likelihood of SVR prior to initiating therapy would be a very useful clinical tool that could help reduce costs and avoid unnecessary exposure to therapy with significant side effects. Data from clinical registration trials have identified genotype and pretreatment HCV RNA




levels as the most significant viral predictors of SVR.2 Baseline host characteristics that influence virologic response include race, gender, age, disease stage, insulin resistance, and body weight, but are unable to accurately predict SVR. Pretreatment serum biochemical markers that are often obtained routinely, such as liver transaminases, total cholesterol and γ-glutamyltransferase, are also unreliable predictors of SVR.3 As a result, predictive models for SVR that are based on pretreatment host-viral characteristics have not been developed for routine clinical use. Recent data indicate that a single nucleotide polymorphism in the host IL28B region on chromosome 19q13 identified in one-third of HCV genotype 1-infected patients is also strongly associated with SVR.4 The allele frequency for this SNP varies with genetic ancestry and, for example, the response variant is present in 15% of African-Americans, who have the lowest SVR rates of any ethnic group to current SOC therapy. Thus, alternative and more reliable baseline predictors of virologic response are still required for most CHC patients.

Most proteomic studies in HCV to date have used unbiased proteomic profiling to identify novel markers of HCV-related fibrogenesis or identify pathways of hepatocellular carcinoma (HCC) development. A prior study evaluating treatment response in CHC patients using surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-ToF/MS) ProteinChip technology noted that two "peaks" of interest in combination with fibrosis stage and viral genotype could predict treatment response in 81% of a verification cohort with an area under the receiver operator characteristic curve (AUROC) of 0.75.5 However, these protein "peaks" have not yet been further characterized or validated in other CHC cohorts, and as such are unlikely to have further clinical application. In this study we used an unbiased proteomic discovery platform with high resolution, accurate mass liquid chromatography mass spectrometry (LC-MS) to analyze pretreatment sera from a well-characterized training cohort of 55 CHC patients, with validation in a further 41 CHC genotype 1 patients with characterized IL28B genotype. We identified novel signature pathways of derived metaproteins that can predict SVR with an AUROC of 0.86-0.90 before commencing SOC antiviral therapy.

Patients and Methods

Patient Population. Patients were selected from the Duke Hepatology Clinical Research (DHCR) database, an ongoing biorepository of greater than 3,000 HCV-infected patients initiated in 2002. Only sera from selected CHC patients with available treatment history and demographic data were included in this study. SVR was defined as undetectable HCV RNA at 6 months following end-of-treatment with current SOC therapy.

Pretreatment serum samples from 55 CHC patients in three different phenotypic groups were selected as the initial "test" group for this study: 19 nonresponder patients with HCV genotype 1; 17 SVR patients with HCV genotype 1; and 20 SVR patients with HCV genotype 2/3. An additional 41 genotype 1 CHC patients were selected for the validation study (n = 26 responders and n = 15 nonresponders). Responders and nonresponders were carefully matched as much as possible with respect to clinical and demographic variables, such as age, race, gender, and viral load, known to affect treatment response (Supporting Table 1). All patients provided written informed consent and all study procedures were approved by the Duke University Institutional Review Board.

Sample Preparation and LC-MS/MS Analysis. A detailed sample preparation protocol and LC-MS methods section are available in the Supporting Methods section. Serum samples were statistically randomized and immunodepleted using MARS14 columns in an LC format (Agilent Technologies, Santa Clara, CA) according to the manufacturer's recommendations. After buffer exchange, protein concentration was normalized across all samples and ~25 μg of protein from each sample was subjected to in-solution digestion with trypsin. All samples were spiked with 50 fmol MassPREP ADH digestion standard per μg of total protein (Waters, Milford, MA). LC-MS/MS analysis was carried out on a nanoAcquity liquid chromatograph coupled to a QToF Premier mass spectrometer (Waters). The Rosetta Elucidator v. 3.3 software package (Rosetta Biosoftware, Rosetta Inpharmatics, Seattle, WA) was used to import and align all LC-MS raw data files and perform feature quantitation. Mascot v. 2.2 (Matrix Sciences, Boston, MA) and Proteinlynx Global Server v. 2.4 (Waters) database search engines were used to make peptide identifications. MS/MS identifications were curated using a forward/reverse database search, at a false discovery rate of 1%. Using this data collection and informatics pipeline, the mass spectrometer is used to measure expression for tryptic peptides and these peptides can be used to infer expression of the parent proteins.

IL28B Polymorphism. Genotyping was performed with the Illumina Human610-quad BeadChip (Illumina, San Diego, CA) as described.7 Genotype at the polymorphic site rs12979860 on chromosome 19 was suitable for analysis in 41 CHC patients. Genotypes were pooled for analysis (CC versus CT or TT) and we refer to an IL28B polymorphism, noting that the association SNP actually lies 3 kb upstream of the IL28B gene.

Predictive Model Based on Meta-Proteins. To build predictive models based on the proteomic data, we used a "metaprotein" classification approach which aims to minimize the effects of several types of error on a quantitative proteomic dataset, including incorrect peptide identification and incorrect protein inference due to tryptic peptide sequence identity. The central difference from standard approaches is that we use the expression profile of a peptide or group of peptides to assist its grouping with similarly quantified peptides, in addition to the traditional grouping by common "parent" protein sequence. We generated 105 metaproteins from which to build a model predictive of response to therapy; however, we expect most of these to be nonpredictive either individually or within a linear model. We used shotgun stochastic search variable selection, as described,6 to obtain a set of parsimonious models. This procedure is similar to penalized regression schemes such as AIC or BIC, but allows for the use of model averaging in prediction. A weighted average of the set of fitted models is then computed (with models demonstrating good fit weighted more heavily). This model averaging scheme has been shown to outperform the single best model in predictive accuracy because it more accurately estimates the uncertainty associated with model choice.7 As subjects were selected to balance the most relevant clinical predictors (such as race, HCV RNA levels, and gender) we did not use these in our predictive model. This type of filtered sample selection would likely distort the relationships between relevant clinical variables and our outcome, thereby likely decreasing the predictive accuracy of our model. We observed a difference in our predictor between responders and nonresponders of 0.4 on our training data, and the within-group standard deviation was observed to be 0.23. With this magnitude of effect size, and assuming 15 nonresponders and 26 responders in our validation group, we expected a power of >97% to detect a statistically significant difference (P < 0.01) in our predictor in the validation dataset.

See Supporting Methods, Tables, and Figures for additional information. MS/MS peptide identifications have been uploaded to the PRIDE database (http://www.ebi.ac.uk/pride/), accession numbers 10679 and 10680.
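The power estimate quoted in the predictive-model paragraph above can be checked with a short calculation. The following is a minimal sketch, not the authors' procedure: it assumes a two-sided two-sample test with a normal approximation (the source does not state which test or software was used), and the function name `two_sample_power` is hypothetical. Only the inputs (difference 0.4, within-group standard deviation 0.23, group sizes 15 and 26, significance level 0.01) come from the text.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_power(delta, sd, n1, n2, alpha=0.01):
    """Approximate power of a two-sided two-sample z-test.

    Assumption: equal within-group SD and a normal approximation;
    the paper does not specify the exact test used.
    """
    std_normal = NormalDist()
    # Standard error of the difference between the two group means.
    se = sd * sqrt(1.0 / n1 + 1.0 / n2)
    # Critical value for a two-sided test at level alpha.
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    # Expected test statistic under the alternative hypothesis.
    shift = abs(delta) / se
    # Probability of rejecting in either tail.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Inputs taken from the text: effect 0.4, SD 0.23, 15 nonresponders, 26 responders.
power = two_sample_power(delta=0.4, sd=0.23, n1=15, n2=26, alpha=0.01)
print(f"approximate power: {power:.3f}")
```

Under these assumptions the approximation lands above the 97% threshold stated in the text; a t-based calculation would give a slightly lower value.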

Results

Baseline Demographics

Responders and nonresponders from both the training (n = 55) and validation (n = 41) cohorts of CHC patients selected for this study were well matched in terms of host-viral characteristics and were typical of a U.S. tertiary referral center CHC cohort (Supporting Table 1).

Training Cohort

Metaprotein-Based Response Prediction. The data collection approach used in this study, specifically data-independent MS/MS (or MSE), enables peptide identification as well as high-quality label-free quantification in the same analysis. Integrating database searches for single-dimension LC-MSE analyses of all 55 subjects in the training cohort with traditional LC-MS/MS analyses of a pooled plasma sample, we identified a total of 105 proteins with two or more peptides, with a total of 3,768 peptides (an average of 36 peptides per protein). Latent factor modeling was used to cluster identified peptides into 105 metaprotein factors, which were then used as independent variables in a regression model with SVR as the outcome.8

Three metaproteins were significantly associated with treatment response even after correction for multiple hypotheses (Benjamini-Hochberg, alpha level 0.05). These three included vitamin D binding protein (VTDB; P = 3.2 × 10⁻⁴), Alpha 2 HS glycoprotein (FETUA; P = 4.4 × 10⁻⁵), and Complement C5 (CO5; P = 4.9 × 10⁻⁵), and when combined in a probit regression model, these provided a combined AUROC for SVR of 0.90. Inclusion of demographic variables (female sex, African-American [AA] race, and HCV RNA) in our model increased the AUROC for SVR to 0.94, but this marginal incremental difference was not significant (P = 0.98) (Fig. 1). This lack of improvement after the inclusion of demographic variables is likely due to subjects in the responder and nonresponder groups being matched for these variables. The three metaprotein regression model (including demographic variables) for predicting SVR is given as:

$$\mathrm{Probit}(P) = 1.22 - 0.90\,(\mathrm{VTDB}) - 0.54\,(\mathrm{FETUA}) + 0.53\,(\mathrm{CO5}) - 0.02\,(\mathrm{female}) - 1.5\,(\mathrm{AA}) - 0.95\,(\log \mathrm{HCV\ RNA})$$

where P = probability of SVR. The Probit function is the inverse of the cumulative distribution function of a standard normal distribution. The model without demographic variables is given as:

$$\mathrm{Logit}(P) = 0.90 - 0.87\,(\mathrm{VTDB}) - 0.58\,(\mathrm{FETUA}) + 0.50\,(\mathrm{CO5})$$
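To make the first equation above concrete, the sketch below converts the probit linear predictor into a probability by applying the standard normal cumulative distribution function, which is the inverse of the probit link. This is illustrative only. The coefficients are those printed in the model with demographic variables; the function name `svr_probability`, the 0/1 coding of `female` and `aa`, and the scale of the metaprotein scores and of log HCV RNA are assumptions, because the source does not state how the inputs are scaled.

```python
from statistics import NormalDist

# Coefficients as printed in the probit model with demographic variables.
COEFFS = {
    "intercept": 1.22,
    "VTDB": -0.90,
    "FETUA": -0.54,
    "CO5": 0.53,
    "female": -0.02,
    "AA": -1.5,
    "log_hcv_rna": -0.95,
}


def svr_probability(vtdb, fetua, co5, female, aa, log_hcv_rna):
    """Return P(SVR) = Phi(linear predictor) for the printed probit model.

    Assumption: inputs are on whatever scale the model was fitted on
    (not stated in the source); female and aa are coded 0 or 1.
    """
    eta = (
        COEFFS["intercept"]
        + COEFFS["VTDB"] * vtdb
        + COEFFS["FETUA"] * fetua
        + COEFFS["CO5"] * co5
        + COEFFS["female"] * female
        + COEFFS["AA"] * aa
        + COEFFS["log_hcv_rna"] * log_hcv_rna
    )
    # The probit link is the inverse normal CDF, so P = Phi(eta).
    return NormalDist().cdf(eta)


# Hypothetical inputs, chosen only to show the direction of each effect.
baseline = svr_probability(0.0, 0.0, 0.0, 0, 0, 0.0)
higher_vtdb = svr_probability(1.0, 0.0, 0.0, 0, 0, 0.0)
higher_co5 = svr_probability(0.0, 0.0, 1.0, 0, 0, 0.0)
print(baseline, higher_vtdb, higher_co5)
```

The signs reproduce the direction reported in the text: raising VTDB or FETUA lowers the predicted probability of SVR, and raising CO5 increases it.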

We also carried out a separate analysis of these data, using clustering on a subset of the isotope groups selected by false discovery rate. This analysis supports an association between these three proteins and sustained viral response (see Supporting peptide level analysis). In summary, decreased levels of VTDB and FETUA and increased levels of CO5 were associated with SVR, representing the single best model. However, for the purposes of prediction in our validation group, we use a model averaging approach, the results of which are presented below.
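The idea of model averaging mentioned above (a weighted average over fitted models, with better-fitting models weighted more heavily) can be illustrated with a small sketch. It does not reproduce the shotgun stochastic search procedure used in the study. As a stand-in, it assumes weights proportional to exp(-BIC/2), a common approximation to posterior model probability; the BIC values, the per-model predictions, and the function names are all hypothetical.

```python
from math import exp


def model_weights(bic_scores):
    """Normalized weights proportional to exp(-BIC/2).

    Assumption: BIC-based weights stand in for the posterior model
    probabilities produced by the study's actual search procedure.
    """
    best = min(bic_scores)
    # Subtract the best score first for numerical stability.
    raw = [exp(-0.5 * (b - best)) for b in bic_scores]
    total = sum(raw)
    return [r / total for r in raw]


def averaged_prediction(predictions, weights):
    """Weighted average of per-model predicted probabilities."""
    return sum(p * w for p, w in zip(predictions, weights))


# Hypothetical values: three candidate models scoring one subject.
bic = [100.0, 102.0, 110.0]
preds = [0.80, 0.60, 0.30]

w = model_weights(bic)
p_avg = averaged_prediction(preds, w)
print(w, p_avg)
```

The averaged prediction always lies between the smallest and largest single-model predictions, and it is dominated by the best-fitting models without discarding the others.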

Fig. 1. (A,D,G) Heatmaps of the peptides in each of the most predictive factors. They are split by responders (labeled 1 on the right side of the vertical white bar) and nonresponders (labeled 0 on the left side of the vertical white bar). Each of these sets of peptides makes up a single factor, and the expression of those factors is shown in scatterplots (B,E,H). These scatterplots show the average expression of the peptides for the three corresponding factors in the heatmap. (C,F,I,L) Histograms showing the makeup of the factor shown in that row. The majority of peptides labeled "other" were unidentified by the proteomics analysis. (J) The overall performance of the predictor based on the VTDB, FETUA, and CO5 factors. (K) The receiver operating characteristic curve for the predictor in (J) (in red), along with a predictor based on only clinical variables (black) and a predictor based on both factors and clinical variables (blue).


Validation Cohort

Metaprotein-Based Response Prediction. An independent sample preparation and LC-MS/MS data collection were performed on 41 additional subjects, and these data were used as an initial verification of the predictive power of the metaprotein model in a blinded fashion. In the independent analysis of the validation CHC cohort (n = 41), 112 proteins were identified with two or more peptides, with a total of 3,211 peptides. The sample complexity results in less than 100% reproducibility in the identification of peptides from sample to sample, making alignment of two or more datasets challenging. For the purposes of prediction, we must rely on the subset of peptides that are identified with the same modifications and in the same charge state in both datasets. Figure 2 outlines the overall strategy by which the predictive model, built on the training cohort, is deployed for blinded prediction of a new cohort. The initial training dataset was generated from 55 individuals, yielding a training metaprotein model describing the dataset (Fig. 2A). The quantitation of 3,966 peptides from this dataset is portrayed on the left side of the Venn diagram (Fig. 2B). A second proteomic expression dataset was collected on n = 41 blinded individuals (Fig. 2C), which independently contains expression data for 3,358 peptides (Fig. 2D). The peptide identifications in common between the two datasets, in this case 2,051 peptides (Fig. 2E), are then used to weight the metaprotein coefficients created in (A) based on which peptides have expression data available (Fig. 2F). The model is then used for independent treatment response prediction for the test set (Fig. 2G). Importantly, peptides belonging to the three significantly associated metaproteins in the discovery cohort were identified in all samples. The MATLAB codes for these analyses are provided in the Supporting Material. It is important to note that expression data from the validation dataset are not used to build the predictive model. Evaluation of our validation cohort in this fashion, without setting a threshold level, resulted in a significant (t test P value 3.6 × 10⁻⁵) differential between SVR and NR groups in the test set. Setting the threshold is challenging due to batch effects; however, at the optimal threshold the test resulted in an AUROC of 0.86 and sensitivity and specificity of 0.92 and 0.8, respectively (Fig. 3).

IL28B Polymorphism. We evaluated the IL28B rs12979860 genotype (CC versus non-CC) as a predictor of SVR in this validation cohort and found it to have very high specificity, suggesting that carriers of the CC genotype have a high predicted probability of SVR. The CC genotype was present in 20/41 (49%) of this validation cohort, with sensitivity and specificity for SVR of 0.73 and 0.93, respectively. In comparison, our proteomic signature had slightly better sensitivity, but lower specificity than the IL28B genotype. Among the IL28B non-CC patients (n = 21/41, 51%), identified metaproteins were able to identify SVR in 4/7 (57%) and NR in 11/14 (78%) of patients with an overall accuracy of 71% (Table 1).

Pathway Analysis. Ingenuity pathway analysis indicates that the differentially regulated metaproteins are involved in multiple host regulatory pathways including innate and adaptive immune responses, pro- and anti-inflammatory signaling, coagulation cascade, fibrogenesis, and hepatocyte regeneration. Several of these metaproteins share a common pathway related to IFN response, and corroborate the observed association between IL28B signaling pathway and our response signature (Fig. 4). As the lower limit of quantitation in the mass spectrometer is somewhat better than the lower limit at which a peptide can be identified, there are some as-yet-unidentified peptides which make up a portion of the differentiating signal. These unidentified peptides are in the minority based on signal intensity for the sample and in the predictive metaprotein signature. Therefore, although in-depth profiling of the samples to obtain further peptide identification is not strictly necessary to improve upon the predictive power of our current predictive model for SVR, these additional peptides or proteins may provide improved insight into disease pathology.

Fig. 2. Initial training dataset was generated from 55 individuals, yielding a training metaprotein model describing the dataset (A). The peptide identifications available from these data are portrayed on the left side of the Venn diagram (B). A second proteomic expression dataset was collected on n = 41 blinded individuals (C), of which a second subset of peptides and proteins is independently identified (D). The overlap in peptide identifications (E) is then used to weight the metaprotein model created in (A) based on which peptides have expression data available for use (F). This weighted model is then used to independently predict treatment response for the test dataset in a blinded fashion (G). For the study described herein, the number of peptide identifications for B, D, E were 1,915, 1,307, and 2,051, respectively.
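The deployment strategy summarized in Fig. 2 (restricting a trained metaprotein to the peptides that are also identified in the new dataset, then reweighting) can be sketched as follows. This is a simplified illustration, not the authors' MATLAB implementation, which is only available in the Supporting Material. The peptide keys, loadings, expression values, and function names are hypothetical, and renormalizing the retained loadings to unit total is an assumed reweighting rule. The Venn counts are read here as peptides unique to each dataset plus the overlap, an interpretation that matches the totals quoted in the text.

```python
# Venn counts from the Fig. 2 caption, read as unique-to-training,
# unique-to-validation, and shared identifications (an interpretation).
TRAINING_ONLY, VALIDATION_ONLY, SHARED = 1915, 1307, 2051
training_total = TRAINING_ONLY + SHARED      # 3966, as quoted in the text
validation_total = VALIDATION_ONLY + SHARED  # 3358, as quoted in the text

# Hypothetical peptide keys: (sequence, modification, charge state).
training_loadings = {
    ("PEPTIDEA", "none", 2): 0.5,
    ("PEPTIDEB", "oxidation", 2): 0.3,
    ("PEPTIDEC", "none", 3): 0.2,
}
validation_expression = {
    ("PEPTIDEA", "none", 2): 1.2,
    ("PEPTIDEC", "none", 3): 0.8,
    ("PEPTIDED", "none", 2): 2.0,
}


def restrict_and_reweight(loadings, available):
    """Keep loadings for peptides present in `available`, then renormalize."""
    kept = {key: w for key, w in loadings.items() if key in available}
    total = sum(abs(w) for w in kept.values())
    if total == 0:
        raise ValueError("no shared peptides with nonzero loading")
    return {key: w / total for key, w in kept.items()}


def metaprotein_score(weights, expression):
    """Weighted sum of peptide expression values for one sample."""
    return sum(w * expression[key] for key, w in weights.items())


weights = restrict_and_reweight(training_loadings, validation_expression.keys())
score = metaprotein_score(weights, validation_expression)
print(weights, score)
```

Matching on sequence, modification, and charge state mirrors the requirement in the text that peptides be identified with the same modifications and in the same charge state in both datasets. Validation expression values enter only at scoring time, never when the loadings are estimated.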

Discussion

Using an unbiased proteomics discovery platform in a well-characterized cohort of CHC patients, we have described a three-metaprotein signature that is able to accurately predict sustained virologic response or nonresponse prior to current SOC therapy in over 90% of patients in the training cohort, 88% of our validation group, and in 71% of patients with the poor response variants for the IL28B polymorphism. This represents a significant advance over response predictors based on host and viral characteristics derived from clinical registration trials. Only one-half of CHC patients are expected to achieve sustained viral clearance from serum with prolonged IFN-based therapy, which is associated with significant side effects, cost, and detrimental effects on quality of life measures. Thus, approaches such as our three metaprotein algorithm, which provide an accurate prediction of antiviral efficacy, should be a useful adjunctive clinical tool in the treatment decision-making process. This study is reported according to recent guidelines for scientific reporting of proteomic biomarker data,9 and represents the first application of the state-of-the-art "bottom-up" approach to unbiased platform differential proteomic expression to predict therapeutic response in chronic HCV infection.
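The performance figures for the validation cohort can be related to two-by-two counts. As a minimal sketch, the code below recomputes sensitivity, specificity, and accuracy from the SVR and NR counts printed in Table 1, treating SVR as the positive class. The helper name `diagnostic_metrics` is hypothetical; the counts are taken from the table.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity, and accuracy from two-by-two counts.

    tp: SVR patients predicted to respond; fn: SVR patients predicted not to;
    fp: NR patients predicted to respond;  tn: NR patients predicted not to.
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }


# Counts from Table 1 (41 HCV genotype 1 patients: 26 SVR, 15 NR).
il28b = diagnostic_metrics(tp=19, fn=7, fp=1, tn=14)
signature = diagnostic_metrics(tp=20, fn=6, fp=3, tn=12)
# Proteomic signature restricted to the 21 rs12979860 non-CC patients.
signature_non_cc = diagnostic_metrics(tp=4, fn=3, fp=3, tn=11)

for name, metrics in [("IL28B", il28b), ("PS", signature), ("PS, non-CC", signature_non_cc)]:
    print(name, {key: round(value, 2) for key, value in metrics.items()})
```

The non-CC accuracy of 15/21 corresponds to the 71% quoted in the text. The non-CC specificity of 11/14 rounds to 0.79, whereas the table prints 0.78, which is consistent with truncation rather than rounding.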

To date, most of the efforts to apply genomic technologies to outcomes of HCV infection have focused on genetic approaches or studies of targeted or genome-wide gene expression.10 Proteomics has been used less frequently, but has several advantages over other "omics" platforms. Genetics does not address the dynamics of disease process, and the level of messenger RNA (mRNA) expression does not account for potential silencing of genes, for example, by methylation, and only partially correlates with protein expression. Gene expression profiling has been applied to CHC patient samples such as liver or tumor tissue, mostly to identify novel markers of HCV-related fibrogenesis, or identify pathways of HCC development. Recent studies have also evaluated hepatic gene expression in relation to virologic responses to therapy11 and IL-28B.12

Unbiased protein profiling of liver tissue has beenapplied to identify pathways associated with CHC fi-brosis,13 and could improve the likelihood of identify-ing relevant, low-abundance proteins in relation tovirologic response. However, obtaining adequate livertissue samples is often difficult. Thus, recent proteomicprofiling studies in CHC infection have mostly used

Fig. 3. The plot on the leftshows the predicted scores of thevalidation samples and the ROCcurve shows overall performancefor selecting SVR.

Table 1. Performance of IL28B Genotype and ProteomicSignature in 41 HCV Genotype 1 Patients

Sensitivity Specificity SVR NR

IL28B genotype 0.73 0.93

rs12979860 CC þ 19 1

rs12979860 non-CC 7 14

Proteomic signature (PS) 0.77 0.80

PS þ 20 3

PS � 6 12

Proteomic signature among

rs12979860 non-CC

0.57 0.78

PS þ 4 3

PS � 3 11

rs12979860 CCþ represents the response predictive genotype.

1814 PATEL, LUCAS, THOMPSON, ET AL. HEPATOLOGY, June 2011

Page 7: High Predictive Accuracy of an Unbiased Proteomic Profile ......High Predictive Accuracy of an Unbiased Proteomic Profile for Sustained Virologic Response in Chronic Hepatitis C

serum or plasma samples to identify proteomic signatures of disease pathogenesis [14,15]. Several mass spectrometry-based proteomic discovery methods have been used to assess clinical outcomes in various disease states, including MALDI-ToF MS, SELDI-ToF, 2D gel electrophoresis with qualitative LC-MS/MS, and qualitative and quantitative gel-free LC-MS. A disadvantage of SELDI and MALDI-based approaches is the difficulty in identifying and quantitating the differential peaks of interest [16]. A prior study evaluated protein "peaks" using SELDI-ToF in relation to virologic response in a CHC training cohort that included HCV genotypes 1-5. Two of six peaks were significant in the training set and, together with fibrosis stage and HCV genotype, were able to predict viral response with an AUROC of 0.75 [5]. However, these protein "peaks" have not been further identified or validated. In contrast, our study used identified peptides and protein profiles from the training cohort to develop a predictive model that could be applied to an independent validation set. Furthermore, our approach allows for development of a targeted MS platform for verification in larger patient cohorts. Our "bottom-up" approach uses enzymatic digestion of the proteins to create peptide "surrogates" for the proteins, followed by LC-MS analysis of the peptide mixtures. Analysis of peptide "surrogates" has several advantages over analyzing parent proteins. These peptides are small enough to be separated very efficiently by liquid chromatography and sequenced using tandem mass spectrometry [17]. This yields datasets in which several hundred thousand distinct features can be quantitated in a few hours. In addition, with this "bottom-up" approach, changes to individual protein epitopes, such as posttranslational modifications, proteolytic cleavage events, or splice variants, can be probed. This approach may be applicable to the identification of peptide surrogates of disease progression or therapeutic response in other nonviral chronic liver diseases that rely on biopsy to assess clinical outcome measures.

We identified three metaproteins of interest that were able to predict virologic response to current SOC therapy in 55 patients with an AUROC of 0.90, and 0.94 when combined with the demographic variables of gender, race, and HCV RNA levels. In clinical trials, HCV genotype is the most important predictor of virologic response, with expected SVR rates of 42%-46% in patients with HCV genotype 1 infection, and around 80% in patients with HCV genotype 2 and 3 infection. However, actual SVR rates in practice are likely to be lower than those observed in clinical registration trials with highly selected patient cohorts. Other host and viral factors, such as HCV RNA levels, race, gender, body weight, early stage disease, younger age, and absence of insulin resistance, are also important variables in determining outcomes to therapy, but have a poor predictive value for SVR. Our cohort was controlled for these demographic variables, for which we noted a predictive AUROC of only 0.69. Once treatment has commenced, adherence to therapy and early virologic responses in the first 4 to 12 weeks of treatment are currently the most important factors that determine the likelihood of achieving SVR in HCV genotype 1-infected patients. Predictive

Fig. 4. Ingenuity pathway analysis indicating direct (solid line) and indirect (dashed line) relationships between derived metaproteins in a training cohort and IFN-response signaling pathways. GC, vitamin D binding protein; AHSG, alpha-2-HS-glycoprotein/fetuin A; C5, complement C5; AKT1, protein kinase akt-1; ST6GAL1, beta-galactoside alpha-2,6-sialyltransferase; STAT3, signal transducer and activator of transcription 3; IFNG, interferon gamma; TNF, tumor necrosis factor; pkc(s), protein kinase C; IL28RA, interleukin-28 receptor subunit alpha.

HEPATOLOGY, Vol. 53, No. 6, 2011 PATEL, LUCAS, THOMPSON, ET AL. 1815


algorithms have been developed based on combining baseline factors and on-treatment responses, but have not yet been validated or adopted into routine clinical practice. However, on-treatment prediction still entails a several-week period of therapy for the patient, with associated cost and risk of adverse events.

Recent genome-wide association data from a large clinical study of an HCV genotype 1 population adherent to therapy indicated that single nucleotide polymorphisms in the host IL28B region on chromosome 19q13, which encodes the type III interferon-λ3, are strongly associated with SVR [4]. The frequency of the favorable genotype varies by population; it is present in less than 40% of patients of European ancestry and less than 20% of African-American patients. The presence of the good-response IL28B variant alone appears to have modest baseline predictive performance for SVR in HCV genotype 1 patients, with predictive values below 0.7 in larger cohorts, and its future clinical utility may depend on combination with other baseline or on-treatment predictors of virologic response. We evaluated the IL28B genotype in our validation cohort and, despite the small number of patients, noted that our metaprotein model was able to determine virologic responses in 71% of patients with the poor-response (non-CC) IL28B genotype. In clinical practice, there may be potential for adjunctive use of metaproteins, along with IL28B genotype determination, to increase sensitivity and provide accurate determination of virologic response in the majority of HCV genotype 1 patients. Of clinical relevance for non-CC genotype CHC patients is that a secondary predictive measure of virologic outcome allows the option of further individualizing therapeutic regimens.

Our pretreatment predictive metaprotein model is

based on a tertiary center referral cohort that completed the assigned duration of therapy and was controlled for baseline host and viral variables known to affect response. There was one patient (a responder) in the training set and five patients (three responders and two nonresponders) in the validation cohort with METAVIR stage 4. However, the overall rate of response among cirrhotic patients was similar to that among those without cirrhosis. Additionally, when the five subjects with cirrhosis were removed from the validation dataset, the overall AUROC for SVR did not change. Although we did not include any HCV genotype non-1 nonresponders in our training cohort, latent factor expression patterns were similar between HCV genotype 1 and non-1-infected patients who achieved SVR, indicating a specific virologic response expression phenotype, and not a pattern reflecting differences in viral genotype.
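As an illustration of the kind of sensitivity analysis described above, the following minimal Python sketch computes the AUROC as a rank statistic and recomputes it after excluding a subgroup. The scores, SVR labels, and cirrhosis flags are hypothetical toy values, not study data, and `auroc` and `auroc_for` are illustrative helpers rather than the software used in the study.

```python
def auroc(scores, labels):
    """AUROC as the probability that a randomly chosen responder (label 1)
    has a higher score than a randomly chosen nonresponder (label 0).
    Ties count one half. This is the normalized Mann-Whitney U statistic."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one responder and one nonresponder")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, NOT study data:
# (model score, SVR label, cirrhosis flag)
patients = [
    (0.91, 1, False), (0.85, 1, True), (0.78, 1, False), (0.66, 0, False),
    (0.60, 1, False), (0.52, 0, True), (0.40, 0, False), (0.33, 1, False),
    (0.21, 0, False), (0.10, 0, False),
]


def auroc_for(rows):
    """Compute the AUROC for a list of (score, label, flag) rows."""
    return auroc([r[0] for r in rows], [r[1] for r in rows])


full = auroc_for(patients)
no_cirrhosis = auroc_for([r for r in patients if not r[2]])

print(f"AUROC, all patients:       {full:.2f}")
print(f"AUROC, cirrhosis excluded: {no_cirrhosis:.2f}")
```

With cohorts of this size, small differences between the two values may reflect sampling variation alone, so a comparison of this kind is best read alongside a confidence interval.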

Restriction of the model fit on the training data to just those patients in the study with HCV genotype 1 disease results in an AUROC of 0.89, indicating no significant association between the metaprotein predictors and viral genotype.

Pathway analysis provides biological plausibility in

that the identified metaproteins are associated with various immunoregulatory functions related to the inflammatory process in CHC infection. Vitamin D binding protein (VTDB) is a polymorphic multifunctional 52.9 kDa protein encoded by the albumin gene family, synthesized in the liver and found in plasma, urine, cerebrospinal fluid, and on the surface of many cells. It carries the vitamin D sterols, associates with membrane-bound immunoglobulin on the surface of B-lymphocytes, and is involved in several biological functions, such as fatty acid transport and macrophage activation [18]. VTDB levels may provide prognostic information in acute liver failure, and VTDB augments complement C5a-mediated chemotaxis during inflammation. Alpha-2-HS glycoprotein (also termed Fetuin-A, FETUA) is a 45-kDa plasma protein synthesized by hepatocytes and regulated as a negative acute phase reactant [19]. FETUA promotes opsonization, inhibits insulin receptor tyrosine kinase, acts as an antagonist of TGF-β, and regulates cytokine-dependent bone mineralization. Fetuin-A has been shown to decline in CHC patients with increasing fibrosis stage [15]. However, SVR and nonresponder patients were matched for fibrosis severity in our training and validation cohorts. Complement C5 is a 180-kDa protein that is cleaved into active peptides: C5a, which mediates local inflammatory responses, and C5b, which initiates the formation of the complement membrane attack complex and is a key component of the innate immune response.

Further validation of this proteomic signature will

be required in external CHC cohorts; determination of IL28B genotype variants will also be important in this regard and will likely provide adjunctive data to improve baseline predictive indices of response and the potential for individualized targeted therapy. However, any association between these proteins and immunomodulatory pathways linked to IFN-based therapies remains hypothetical at this stage. Several proteins implicated by metaprotein analysis have immunoassays (e.g., enzyme-linked immunosorbent assay [ELISA]) available, which provides a potential avenue for proteomic signature validation and future clinical assay development. However, because many of the differentiating signals included in the metaprotein model may be peptide-specific, a better option for validation and initial clinical implementation may be quantitative

1816 PATEL, LUCAS, THOMPSON, ET AL. HEPATOLOGY, June 2011


(multiple reaction monitoring, MRM) mass spectrometry. The importance of this approach in biomarker validation was recently highlighted by the Clinical Proteomics Technology Assessment for Cancer, and we have planned further targeted MRM studies in this regard [20].

The power of modern mass spectrometry approaches lies in the ability to investigate samples in an unbiased manner (i.e., without a priori knowledge of what might be changing), while gaining high-confidence identifications and very accurate quantitative data on the species measured in the study. Splice variants and posttranslational modifications are just some of the protein isoforms that are present at frequencies spanning several orders of magnitude, and with proper data treatment some of these changes to high-abundance proteins can be monitored within the dataset as well. Still, a significant challenge to proteome biomarker discovery is that the blood proteome remains a complex biological system to evaluate, with significantly more features than the genome. The quantitative dynamic range of identified plasma proteins alone is 10^10 (from IL-6 at 0-5 pg/mL to albumin at 35-50 mg/mL). Although our methodology employed current high-affinity depletion techniques, low-abundance markers of antiviral response were almost certainly missed in our cohort because of the limited dynamic range of the technique (approximately 3 orders of magnitude). Additional immunodepletion or subproteome enrichment techniques, in conjunction with high-resolution LC-MS, are powerful tools for lower-level biomarker discovery that have distinct promise in future HCV research [21,22].
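The dynamic range quoted above can be checked with simple arithmetic. The sketch below assumes the upper bounds of the two concentration ranges (IL-6 at 5 pg/mL, albumin at 50 mg/mL) and treats the dynamic range of the technique as exactly 3 orders of magnitude; the constant and function names are illustrative.

```python
import math

# Upper bounds of the quoted concentration ranges, converted to g/mL.
# Assumption: albumin is taken at 50 mg/mL and IL-6 at 5 pg/mL.
IL6_G_PER_ML = 5e-12
ALBUMIN_G_PER_ML = 50e-3

# Approximate dynamic range of the LC-MS technique, in orders of magnitude.
METHOD_ORDERS = 3


def orders_of_magnitude(high, low):
    """Return log10 of the ratio of two concentrations in the same units."""
    return math.log10(high / low)


plasma_orders = orders_of_magnitude(ALBUMIN_G_PER_ML, IL6_G_PER_ML)
gap_orders = plasma_orders - METHOD_ORDERS

print(f"plasma dynamic range: about 10^{plasma_orders:.0f}")
print(f"orders of magnitude beyond the technique: about {gap_orders:.0f}")
```

The second figure is only a rough indication: it ignores the effect of depleting high-abundance proteins, which shifts the window of detectable species toward lower concentrations.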

Disease variability may not be as important in a chronic infection such as HCV with a prolonged natural history, but accounting for inherent patient variability in the plasma proteome will continue to provide a significant challenge to discovery efforts [23]. Likewise, adapting unbiased discovery methods to the rapidly changing therapeutic landscape in CHC infection will provide ongoing challenges. In summary, this preliminary study provides the first description of the potential clinical utility of unbiased proteomics to identify metaproteins of interest that can accurately predict virologic responses to current SOC therapy in the majority of hepatitis C patients.

Acknowledgment: K.P., J.L., J.W.T., H.T., A.T., R.M.C., M.A.M., G.S.G., J.G.M., and J.J.M. conceptualized and implemented the study design; K.P., J.L., J.W.T., J.G.M., and J.J.M. were responsible for data analysis and interpretation and drafted the initial article; J.W.T., M.A.M., and L.G.D. performed proteome analysis; D.U. implemented material transfer and article preparation. All authors had access to data and contributed to the final version of the article. We thank Martha Stapels and Scott Geromanos for assistance in LC-MS method development and data collection, Cindy Chepanoske and Andrey Bondarenko for assistance in raw data processing, and Crystal Cates and Melissa Spain for assistance with biorepository serum sample processing.

References

1. Falck-Ytter Y, Kale H, Mullen KD, Sarbah SA, Sorescu L, McCullough AJ. Surprisingly small effect of antiviral treatment in patients with hepatitis C. Ann Intern Med 2002;136:288-292.

2. Manns MP, McHutchison JG, Gordon SC, Rustgi VK, Shiffman M, Reindollar R, et al. Peginterferon alfa-2b plus ribavirin compared with interferon alfa-2b plus ribavirin for initial treatment of chronic hepatitis C: a randomised trial. Lancet 2001;358:958-965.

3. Kau A, Vermehren J, Sarrazin C. Treatment predictors of a sustained virologic response in hepatitis B and C. J Hepatol 2008;49:634-651.

4. Ge D, Fellay J, Thompson AJ, Simon JS, Shianna KV, Urban TJ, et al. Genetic variation in IL28B predicts hepatitis C treatment-induced viral clearance. Nature 2009;461:399-401.

5. Paradis V, Asselah T, Dargere D, Ripault MP, Martinot M, Boyer N, et al. Serum proteome to predict virologic response in patients with hepatitis C treated by pegylated interferon plus ribavirin. Gastroenterology 2006;130:2189-2197.

6. Hans C, Dobra A, West M. Shotgun stochastic search for "Large p" regression. J Am Stat Assoc 2007;102:507-516.

7. Raftery AE, Madigan D, Hoeting JA. Bayesian model averaging for linear regression models. J Am Stat Assoc 1997;92:179-191.

8. Lucas J, Carvalho C, West M. A Bayesian analysis strategy for cross-study translation of gene expression biomarkers. Stat Appl Genet Mol Biol 2009;8:Article 11.

9. Mischak H, Allmaier G, Apweiler R, Attwood T, Baumann M, Benigni A, et al. Recommendations for biomarker identification and qualification in clinical proteomics. Sci Transl Med 2010;2:46ps42.

10. Asselah T, Bieche I, Sabbagh A, Bedossa P, Moreau R, Valla D, et al. Gene expression and hepatitis C virus infection. Gut 2009;58:846-858.

11. Asselah T, Bieche I, Narguet S, Sabbagh A, Laurendeau I, Ripault MP, et al. Liver gene expression signature to predict response to pegylated interferon plus ribavirin combination therapy in patients with chronic hepatitis C. Gut 2008;57:516-524.

12. Honda M, Sakai A, Yamashita T, Nakamoto Y, Mizukoshi E, Sakai Y, et al. Hepatic ISG expression is associated with genetic variation in interleukin 28B and the outcome of IFN therapy for chronic hepatitis C. Gastroenterology 2010;139:499-509.

13. Diamond DL, Jacobs JM, Paeper B, Proll SC, Gritsenko MA, Carithers RL Jr, et al. Proteomic profiling of human liver biopsies: hepatitis C virus-induced fibrosis and mitochondrial dysfunction. HEPATOLOGY 2007;46:649-657.

14. White IR, Patel K, Symonds WT, Dev A, Griffin P, Tsokanas N, et al. Serum proteomic analysis focused on fibrosis in patients with hepatitis C virus infection. J Transl Med 2007;5:33.

15. Cheung KJ, Tilleman K, Deforce D, Colle I, Van Vlierberghe H. The HCV serum proteome: a search for fibrosis protein markers. J Viral Hepat 2009;16:418-429.

16. Diamond DL, Proll SC, Jacobs JM, Chan EY, Camp DG 2nd, Smith RD, et al. HepatoProteomics: applying proteomic technologies to the study of liver function and disease. HEPATOLOGY 2006;44:299-308.

17. Kito K, Ito T. Mass spectrometry-based approaches toward absolute quantitative proteomics. Curr Genomics 2008;9:263-274.



18. Speeckaert M, Huang G, Delanghe JR, Taes YE. Biological and clinical aspects of the vitamin D binding protein (Gc-globulin) and its polymorphism. Clin Chim Acta 2006;372:33-42.

19. Ketteler M, Bongartz P, Westenfeld R, Wildberger JE, Mahnken AH, Bohm R, et al. Association of low fetuin-A (AHSG) concentrations in serum with cardiovascular mortality in patients on dialysis: a cross-sectional study. Lancet 2003;361:827-833.

20. Addona TA, Abbatiello SE, Schilling B, Skates SJ, Mani DR, Bunk DM, et al. Multi-site assessment of the precision and reproducibility of multiple reaction monitoring-based measurements of proteins in plasma. Nat Biotechnol 2009;27:633-641.

21. Schiess R, Wollscheid B, Aebersold R. Targeted proteomic strategy for clinical biomarker discovery. Mol Oncol 2009;3:33-44.

22. Bandow JE. Comparison of protein enrichment strategies for proteome analysis of plasma. Proteomics 2010;10:1416-1425.

23. Rifai N, Gillette MA, Carr SA. Protein biomarker discovery and validation: the long and uncertain path to clinical utility. Nat Biotechnol 2006;24:971-983.
