downloaded from on april 21, 2020 by guest · 156 signal in both readouts and the time elapsed...

1

INTER‐LABORATORY REPRODUCIBILITY AND PROFICIENCY TESTING WITHIN THE HUMAN 1

PAPILLOMAVIRUSES CERVICAL CANCER SCREENING PROGRAM IN CATALONIA, SPAIN 2

3

R. Ibáñez1, M. Felez‐Sánchez1,2, J.M. Godínez2, C. Guardiá3, E. Caballero4, R. Juve5, N. Combalia6, 4

B. Bellosillo7, D. Cuevas8, J. Moreno‐Crespi9, Ll. Pons10, J. Autonell11, C. Gutierrez12, J. Ordi13, S. 5

de Sanjosé1,2,14, I.G. Bravo1,2. 6

7

1 Infections and Cancer Unit, Cancer Epidemiology Research Program, Catalan Institute of 8 Oncology (ICO). L'Hospitalet de Llobregat, Barcelona, Spain 9

2 Infections and Cancer, Bellvitge Institute of Biomedical Research (IDIBELL) L'Hospitalet de 10 Llobregat, Barcelona, Spain 11

3 Primary Care Clinical Laboratory Barcelonés Nord y Vallés Oriental. Catalan Institute of 12 Health. 08911 Badalona (Spain) 13

4 Microbiology Department. University Hospital Vall d’Hebron. Catalan Institute of Health. 14 08035 Barcelona (Spain) 15

5 Primary Care Clinical Laboratory Bon Pastor. Catalan Institute of Health. 08030 Barcelona 16 (Spain) 17

6 Pathology Department. UDIAT‐CD, University Hospital Parc Tauli. Universitat Autònoma de 18 Barcelona. 08208 Barcelona (Spain) 19

7 Pathology Department. Hospital del Mar. 08003 Barcelona (Spain) 20

8 Pathology and Molecular Genetics Department. University Hospital Arnau de Vilanova. 21 Catalan Institute of Health. 25198 Lleida (Spain) 22

9 Pathology Department. University Hospital of Girona Dr. Josep Trueta. Infections and Cancer 23 Unit. Catalan Institute of Oncology. 17007 Girona (Spain) 24

10 Pathology Deparment, IISPV, URV, Hospital de Tortosa Verge de la Cinta (HTVC) . Catalan 25 Institute of Health. 43500 Tortosa (Spain) 26

11 Pathology Department. Hospital Consortium of Vic. 08500 Vic. Barcelona (Spain) 27

12 Clinical Laboratory ICS Tarragona, Molecular Biology Section. University Hospital of 28 Tarragona Joan XXIII. Catalan Institute of Health. 43005 Tarragona (Spain) 29

13 Pathology Department, CRESIB – Hospital Clínic, University Barcelona. 08036 Barcelona 30 (Spain) 31

14 CIBER Epidemiology and Public Health, Spain 32

JCM Accepts, published online ahead of print on 26 February 2014J. Clin. Microbiol. doi:10.1128/JCM.00100-14Copyright © 2014, American Society for Microbiology. All Rights Reserved.

on June 20, 2020 by guesthttp://jcm

.asm.org/

Dow

nloaded from

http://jcm.asm.org/

2

33

34

Corresponding author: 35

Ignacio G. Bravo 36

Infections and Cancer Laboratory, Cancer Epidemiology Research Programme, Catalan Institute 37

of Oncology 38

Avda. Gran Vía, 199‐203 ‐ 08908 L’Hospitalet de Llobregat, Barcelona; Spain. 39

Phone: +34 93 260 7123 /7812 40

E‐mail address: [email protected] 41

42

Keywords: 43

Human papillomaviruses, HPV, Hybrid capture, HC2, Reproducibility, Reliability, Cervical cancer 44

screening, Quality control. 45

Running title: 46

Proficiency testing within a cervical cancer screening program 47

48

49


.asm.org/

Dow

nloaded from

http://jcm.asm.org/

3

ABSTRACT 50

In Catalonia, a screening protocol for cervical cancer including Human Papillomaviruses (HPVs) 51

DNA testing using the Digene Hybrid Capture assay (HC2) was implemented in 2006. In order 52

to monitor inter‐laboratory reproducibility, a proficiency testing (PT) survey of the HPVs 53

samples was launched in 2008. The aim of this study was to explore the repeatability of HC2 54

performance. Participant laboratories provided annually twenty samples, five randomly 55

chosen samples from each of the following relative light units (RLU) intervals: <0.5; 0.5‐0.99; 1‐56

9.99; and ≥10. Kappa statistics was used to determine the agreement level between the 57

original and the PT readings. The nature and origin of the discrepant results were calculated by 58

bootstrapping. A total of 946 specimens were retested. Kappa values were 0.91 for 59

positive/negative categorical classification and 0.79 for the four RLU intervals studied. Sample 60

retesting yielded systematically lower RLU values than the original test (p<0.005), 61

independently of the time elapsed between both determinations (median 53 days), possibly 62

due to freeze‐thaw cycles. The probability for a sample to show clinically discrepant results 63

upon retesting was a function of the RLU value: samples with RLU in the 0.5‐5 interval showed 64

10.80% probability to yield discrepant results (95% CI:7.86‐14.33) compared to 0.85% 65

probability for samples outside this interval (95% CI:0.17‐1.69,). Globally, HC2 assay shows a 66

high inter‐laboratory concordance. We have identified differential confidence thresholds and 67

suggested the guidelines for inter‐laboratory PT in the future, as analytical quality assessment 68

of the HPVs DNA detection remains a central component of the screening program for cervical 69

cancer prevention. 70

71


.asm.org/

Dow

nloaded from

http://jcm.asm.org/

4

INTRODUCTION 72

Persistent infection with oncogenic Human Papillomaviruses (HPVs) is a necessary cause in the 73

development of invasive cervical cancer (CC) (8, 16, 24). CC screening based on cervical 74

cytology has been instrumental in decreasing incidence and mortality associated to CC in those 75

countries with high screening coverage rates (13). However, the infectious aetyology of this 76

disease lead to the suggestion that the detection of the DNA of HPVs responsible for cellular 77

transformation could provide a strong predictive marker for early detection of women at risk 78

(20). Worldwide studies have established that testing for the presence of HPVs DNA shows a 79

higher sensitivity than cytology for the detection of high grade cervical intraepithelial lesions 80

(2). Despite this demonstrated higher sensitivity, its application to CC screening programs is 81

still not widespread (1, 2, 17, 19, 21). 82

Currently one of the most widely used tests for the detection of HPVs DNA is the Digene 83

Hybrid Capture assay (HC2; Qiagen, Gaithersburg, MD), approved by the U.S. Food and Drug 84

Administration (FDA) in 2003 for its use in clinical settings. Large prospective cohort studies 85

and randomized controlled trials have proved this assay as a highly clinical sensitivity test (90‐86

95%) for the detection of high‐grade CIN (2). The HC2 procedure does not include a PCR 87

amplification step, and uses cocktail of probes designed to detect 13 HPVs classified as 88

carcinogenic (groups 1 and 2A) by the International Agency for Research on Cancer: HPV16, 18, 89

31, 33, 35, 39, 45, 51, 52, 56, 58, 59, and 68 (12). 90

In the last years several tests for detection of DNA or transcripts of HPVs have been developed 91

(7, 18). A number of these tests have been proposed to be applied for CC screening. Beyond 92

sensitivity and specificity, an additional critical factor before their introduction as a screening 93

technique is the need for high reproducibility under the diverse conditions that the different 94

clinical laboratories may face (15). The current guidelines for HPVs test propose an intra‐95

laboratory reproducibility and inter‐laboratory agreement with a lower confidence bound not 96


.asm.org/

Dow

nloaded from

http://jcm.asm.org/

5

less than 87% for both of them (15). Good reproducibility of blinded clinical samples in the 97

laboratories might further warrant results reliability (4). 98

In Catalonia, North‐East of Spain, a screening protocol including HPVs testing for selected 99

indications was implemented in 2006 (9). Women became eligible for the carcinogenic HPVs 100

testing if they belonged to any of the three following categories: i) inadequate screening 101

history, i.e. women aged over 40 years and with no history of cytology in the previous five 102

years, ii) incident diagnosis of atypical squamous cell of undetermined significance (ASC‐US); or 103

iii) first follow‐up visit after a surgical conization. During the 2006 to 2012 period, 116,970 tests 104

for carcinogenic HPVs have been performed within CC screening activities in the Public Health 105

Sector in Catalonia. In order to monitor inter‐laboratory reproducibility, a proficiency testing 106

(PT) survey of the HPVs samples was launched in 2008, covering the twelve laboratories 107

involved in the screening activities. 108

The aim of the present study was to evaluate inter‐laboratory reproducibility and error rate in 109

the HC2 performance in cervical specimens among the twelve reference laboratories 110

participating in the HPVs screening in Catalonia during the period 2008‐2011. Additionally, we 111

aimed to investigate the nature and origin of the discrepant results in order to provide 112

guidelines for the assessment of inter‐laboratory concordance thresholds in the future. 113

114


.asm.org/

Dow

nloaded from

http://jcm.asm.org/

6

MATERIALS AND METHODS 115

Design of external proficiency testing program 116

All tests (original and PT assays) were performed using HC2 with the “high‐risk” probe pool 117

only, following the manufacturer's instructions. According to the FDA‐approved guidelines, the 118

threshold for positivity is the HC2 response against a control sample containing 1.0 pg mL‐1 119

HPV DNA, roughly equivalent to 5,000 genomic copies. A sample was considered positive if it 120

rendered a relative light unit ratio (RLU) of ≥1 with respect to the positive control. 121

The twelve laboratories for HPVs screening in Catalonia (Hospital Hospital Universitari Dr. 122

Josep Trueta, Consoci hospitalari de Vic, Hospital Universitari Joan XXIII, Hospital del Mar, 123

Hospital Universitari de Bellvitge –Institut Català d’Oncologia (ICO), Hospital Clínic, Hospital 124

Universitari Vall d’Hebron, Hospital Universitari Verge de la Cinta, Hospital Universitari Arnau 125

de Vilanova, Consorci sanitari Parc Tauli, and the Laboratoris d’Atenció Primària Doctor Robert 126

and Bon Pastor) participated in the PT program. The laboratory at the ICO was designated by 127

the Health Department of the Government of Catalonia as the reference laboratory for PT 128

purposes. The overall project was approved by the ethical committee of the ICO/IDIBELL. All 129

information regarding the identification of patients was anonymized before analysis. 130

The PT was run annually and blindly. Between 2008 and 2011 each participating laboratory 131

delivered 20 samples to the reference laboratory. The PT of the reference laboratory itself was 132

performed by one of the twelve laboratories. For PT, the laboratories provided the residual 133

aliquot of the original samples. Eleven of the participant laboratories collected the samples in 134

STM medium. One laboratory served liquid‐cytology based screening and in this case, buffer 135

conversion previous to retesting was performed according to the manufacturer’s guidelines. 136

Samples were kept frozen at ‐20 ºC before delivery. As far as possible, laboratories were 137

requested to deliver samples for PT within the three months after the initial testing. 138

Nevertheless, the possible effect of time elapsed between the original and the PT test was also 139


.asm.org/

Dow

nloaded from

http://jcm.asm.org/

7

assessed. In order to ensure a uniform coverage distribution of positive and negative HC2 140

values, the laboratories were asked to provide annually five randomly chosen samples of each 141

of the following RLU intervals: <0.5; 0.5‐0.99; 1‐9.99; and ≥10. 142

Concordance analysis 143

HPV samples were collated and analysed to allow for inter‐laboratory comparisons. Paired HC2 144

test results were categorized as Original‐negative/PT‐negative, Original‐positive/PT‐negative, 145

Original‐negative/PT‐positive, and Original‐positive/PT‐positive based on the 1.0 RLU/PC 146

positive cutoff. 147

Cohen’s Kappa statistics was used to determine the level of agreement between the original 148

and the PT categories of results (positive/negative and RLU intervals). The Kappa statistic is a 149

measure of inter‐rater agreement, which tests as null hypothesis the inter‐rater agreement by 150

chance (14). Generally a Kappa score between 0.8 and 1 is considered as an excellent 151

agreement, values between 0.61 and 0.8 substantial agreement; between 0.41 and 0.6 152

moderate agreement, between 0.21 and 0.4 fair agreement; and between 0 and 0.2 slight 153

agreement (3). 154

Correlation between original and PT HC2 readouts as well as correlation between changes in 155

signal in both readouts and the time elapsed between both tests were assessed by linear 156

regression using SPSS Statistics17.0 (SPSS Statistics/IBM, Chicago IL, USA). Paired Wilcoxon & 157

Mann‐Whitney test implemented in R (http://www.r‐project.org/) was used to test the null 158

hypothesis of the median difference between RLU values in origin and at PT to be non‐159

different from zero. 160

For all statistical analyses, a p‐value below 0.05 was considered as a significant for rejection of 161

the null hypothesis. 162

Bootstrapping analysis 163


.asm.org/

Dow

nloaded from

http://jcm.asm.org/

8

In order to estimate the limits for the expected number of discrepancies between the results 164

at origin and at the PT, a bootstrapping analysis was done. One thousand “virtual labs” were 165

generated by bootstrapping among the pooled samples, allowing for replacement. All these 166

“virtual labs” were constructed with the same data structure that was requested for the PT, 167

i.e. 20 samples in total, five samples of each of the intervals above defined. We performed also 168

the same analyses in another hypothetical scenario, with 40 samples per “virtual laboratory”, 169

ten of each interval. The total numbers of discrepancies and concordances, as well as the 170

Cohen’s Kappa statistics were computed for each of the “virtual labs”. Distribution of 171

discrepancies was fitted to a Poisson model and the 95% confidence interval was calculated 172

based on the fitted model. 173

We explored further the differential repeatability of the technique for different values of the 174

RLU variable. Each of the intervals 0.5‐0.99 and 1.0‐10 was divided in 0.1 unit sub‐intervals. For 175

each sub‐interval, one thousand “virtual laboratories” were created by drawing random 176

samples with replacement from the original pooled samples. The size of each “virtual lab” was 177

equal to the original number of samples within the sub‐interval. The mean percentage of 178

discrepancies was calculated for each of the defined sub‐intervals. The 2.5% and 97.5% 179

quantiles of the 1000 bootstrap samples were used to define the lower bound and the upper 180

bound of 95% CI. In all cases bootstrapping analyses were performed, using in‐house Perl 181

scripts. 182

183


.asm.org/

Dow

nloaded from

http://jcm.asm.org/

9

RESULTS 184

During the period 2008‐2011, a total of 946 specimens were retested for PT within the 185

framework of the CC screening program in Catalonia, Spain. The results of these paired tests 186

were used to study inter‐laboratory reproducibility. 187

Comparison between origin‐PT paired tests showed high correlation 188

The distribution of the samples selected for PT was chosen to show a flat distribution of RLU 189

values around the clinical cutoff value. The precise distribution of the samples tested in the PT 190

and its comparison with the distribution of the general population is given in Table 1. In the 191

general population in which HC2 was used to assess the presence of DNA of oncogenic HPVs 192

(n=46,949), the distribution of positive and negative results for the period 2006‐2009 was 22% 193

and 78%, respectively. In contrast, in the PT samples this distribution was 52.2% and 47.8%, 194

respectively. Regarding the classification by categories of RLU values, in the HPV‐tested 195

general population the central intervals spanning 0.5‐9.99 comprised only the 9.8% of all 196

samples, while 47% of the PT samples were located in these intervals. These differences arise 197

from the focus itself of the PT program, aiming to evaluate the inter‐laboratory reliability of 198

the technique and not necessarily matching with the distribution of the RLU values in the 199

general population. 200

A comparison of RLU values of the HC2 test by the twelve tested laboratories and of the 201

repeated test is shown in Figure 1. The overall correlation coefficient for the original and the 202

PT values was 0.95 (p<0.05), ranging between 0.88 and 0.97 for each of the individual 203

laboratories (data not shown). Correlation restricted to samples interpreted as positive 204

(RLU≥1) in both origin and in the PT retest increased up to 0.97 (p<0.05) (data not shown). The 205

PT retests rendered systematically lower RLU values than the reported by origin laboratories, 206

with a pairwise median decrease of 9% in signal. This median difference was significantly 207

different from zero, as determined by a paired Wilcoxon & Mann‐Whitney test (p<0.005). 208


.asm.org/

Dow

nloaded from

http://jcm.asm.org/

10

Time elapsed between original and PT test did not influence reproducibility 209

The mean time elapsed between the first and the PT read was 90.5 days, with a median of 53.0 210

days, and a range from 8 to 885 days. To determine whether time elapsed between 211

consecutive assays on the same sample could influence the differences between the RLU 212

values reported by HC2, the correlation between the original and the PT readout values was 213

assessed (Figure 2). No significant correlation was found (p=0.83), indicating that the time 214

interval between consecutive tests did not result in a decrease of the RLU readout value. 215

All laboratories exhibited high agreement with the PT results 216

In order to assess the agreement between the two tests, Cohen’s Kappa was calculated for the 217

positive/negative categorical classification and RLU intervals (Table 2). Paired comparison 218

between original and PT analyses for positive/negative results rendered an overall excellent 219

agreement (Kappa=0.91). The Kappa values for the individual laboratories ranged between 220

0.84 and 0.97, being above 0.90 for eight (66.7%) of them (Table S1). The laboratory using 221

liquid cytology‐based samples did not behave different from the rest of laboratories tested, 222

which used STM as collecting medium. The total number of discordant results was 44 (4.6% of 223

the total samples). A slightly asymmetric distribution of discordant tests was observed, with 224

more discordant samples being positive in origin and negative in the PT than vice versa (25 vs 225

19 respectively), although this difference was not significant (Z‐Score=1.28, p‐value=0.20). The 226

vast majority of discrepant results (97.7%) were reported in samples with RLU values between 227

0.5 and 10.0 in original tests. Remarkably, 13 of them showed a large difference between the 228

paired tests: ten positive samples with an original RLU between 2.0 and 9.0 produced negative 229

results between 0.08 and 0.77 in the PT tests. In contrast, only two samples with original 230

negative result (RLU of 0.61 and 0.72) produced positive results (RLU of 2.97 and 2.98 231

respectively) in the PT retest. Only one sample was an outlier following Tukey’s criterion (23), 232

showing an original result of 38.57 and a 0.17 result in the PT test. 233


.asm.org/

Dow

nloaded from

http://jcm.asm.org/

11

The reproducibility of the test in each of the RLU intervals here defined showed an overall 234

Kappa value of 0.79 with values ranging from 0.70 to 0.84 for each of the individual 235

laboratories (Table S2). The observed concordance in the extreme intervals, i.e. RLU below 0.5 236

and RLU above 10, was higher than in the intervals around the cutoff, i.e. RLU between 0.5 and 237

10, with overall agreement values of 96.4% and 70.1%, respectively. 238

Exploring new alternatives in the design of the PT test 239

Finally, we explored different alternatives for future improvement of the PT design. First, we 240

aimed to provide a threshold for identifying significant deviations from the maximum expected 241

number of discrepancies, under the current sampling scheme for PT. Limits for a discrepancy 242

report were determined under two scenarios: 20 samples analysed for PT per laboratory with 243

five samples per each interval (i.e. the one currently implemented); and 40 samples per 244

laboratory with ten samples for each interval (which could be implemented depending on 245

budget). We generated, by bootstrapping with replacement from the original data, 1,000 data 246

sets that corresponded to “virtual laboratories”, using the same stratified structure of samples. 247

The results were fitted to a Poisson distribution and the accumulated probability for the 248

detection of the corresponding number of discrepant results was calculated. We aimed to find 249

the number of discrepant results that could be expected to appear by chance with a 250

probability below 5% (Figure 4). Using the stratified structure of samples evaluated in the 251

present study (n=20, five samples of each interval), the probability of having two or more 252

discrepancies was 7.5%, whereas the probability of three or more discrepancies was 1.7%. If 40 253

samples in total were collected, the probability of finding three or more discrepancies was 254

13.1%, while retrieving four or more discrepancies presented a probability of 4.7%. Thus, 255

under the present PT structure and retesting 20 samples per laboratory, the analyses should 256

be labelled as significantly discordant when PT yields three or more discrepant samples. If 40 257


.asm.org/

Dow

nloaded from

http://jcm.asm.org/

12

were to be retested using the same structure, the flag for significant discordance should be 258

raised for sets with four or more discrepant samples. 259

Finally we explored whether the results could be used to improve the PT design, by modifying 260

the boundaries for sample stratification. We focused on the readout values with lower 261

reproducibility, identified above by means of Kappa values as the area around the positivity 262

threshold. This gray area comprised two of our RLU categories: 0.5‐0.99 and 1‐9‐99. By using 263

bootstrapping analyses as described above, we explored the expected percentage of 264

discrepancies along both RLU categories (Figure 4). The results indicated that in the 0.5≤RLU<1 265

interval the expected number of samples showing discrepant results did not experiment 266

significant changes, remaining flat around 13.7% (95%CI: 5.5‐28.6%). In the 1≤RLU<10 interval 267

we observed a significant monotonic decrease (p‐value for trend<0.001) in the percentage of 268

samples showing discrepant results, starting from 25.9% (95%CI: 14.8‐38.9%) for values in the 269

1≤RLU<2 interval and stabilising after RLU≥5 interval to reach 9.7% (95%CI: 5.7‐13.4%). 270

Moreover, we have explored the differential probability for a sample to show discrepancy 271

upon retesting, as a function of the RLU values. The RLU interval 0.5‐5 concentrated the 272

highest probability for discrepancies: the probability for a sample in this interval to show 273

discrepancy was 10.80% (95% CI: 7.86‐14.33) compared to 0.85% (95% CI: 0.17‐1.69,) for 274

samples outside this interval. The accumulated probabilities for discrepancy for the different 275

intervals of the RLU variable are given in Table 3. These improved intervals will be 276

implemented recommended in the future for the PT activities when using HC2. We will thus 277

ask the participating laboratories to would be asked to provide annually with randomly chosen 278

samples of each of the following RLU intervals with the following structure: RLU<0.5, five 279

samples; 0.5≤RLU<1, ten samples; 1≤RLU<5, ten samples; and RLU≥5, five samples. In this case, 280

the probability of having three or more discrepancies by chance was will be 13.5%, whereas 281

the probability of three four or more discrepancies by chance was will be 4.8%. 282


.asm.org/

Dow

nloaded from

http://jcm.asm.org/

13

283

DISCUSSION 284

An inter‐laboratory PT was successfully implemented for the detection of high‐risk HPV types, 285

within the CC screening activities in the Public Health System in Catalonia. It was considered 286

instrumental to ensure the accuracy of the results obtained by HPV testing. A total of 946 287

samples were analysed for PT during the 2008‐2011 period and a total of 44 (4.6%) 288

discrepancies were found. The present PT study was in agreement with the guidelines 289

proposed by Meijer and co‐workers for the validation of high‐risk HPV tests for primary CC 290

screening (15). These guidelines proposed that inter‐laboratory agreement should be 291

determined by evaluation of at least 500 samples, with a 30% of the samples having tested 292

positive in a reference laboratory using a clinically validated assay, and reaching an agreement 293

with a lower confidence bound not less than 87%. 294

It is very important to note that the distribution of the RLU values used for the PT does not 295

follow the one obtained from the general population participating in the screening algorithms 296

in which HC2 was used to assess the presence of DNA of oncogenic HPVs (Table 1). Differences 297

between distributions reflect the analytical nature of our PT program, as we were interested in 298

assessing the overall performance of the laboratories running the HC2 technique. We chose 299

therefore to design a balanced PT sample distribution exploring equally the full range of the 300

readout variable, as otherwise the central values, close to cutoff and more prone to 301

discrepancy, would have been undersampled. 302

Paired test demonstrated an almost excellent inter‐laboratory agreement in all twelve 303

participating laboratories, both for positive/negative agreement (Kappa= 0.91) as well as for 304

the four RLU categories (Kappa= 0.79). In our study, RLU values were slightly but significantly 305

lower in the PT retesting compared to those obtained in the original laboratory, with a median 306

decrease of 9.0% from the original RLU value. One possible explanation for this trend could 307


.asm.org/

Dow

nloaded from

http://jcm.asm.org/

14

have been specimen degradation between the two tests, as it has been reported for HC2 to a 308

certain extent (5, 6). We tested the hypothesis of the influence of time elapsed between the 309

two consecutive tests to account for this difference, but we did not find a significant 310

association between the decrease in the RLU readout variable and the time between the two 311

HC2 tests. An alternative explanation could be the impact of consecutive freeze‐thaw cycles, as 312

samples were denatured, analysed in origin, frozen, sent, and thawed for PT analysis, 313

especially because the HC2 test does not include an amplification step. Finally, although in 314

absolute terms more samples were found positive in origin and tested negative in PT than the 315

inverse case, the difference between both numbers was not statistically significant. Overall, 316

the small decrease in signal during retesting entails little or none clinical significance because 317

HPV retesting is not part of routine clinical practice (6). 318

Several reports have addressed the use of HC2 in CC screening and its performance compared 319

to other methods for detecting HPVs genetic material (10, 11, 18, 22). However, few data on 320

the reproducibility of HC2 are available. In the ALTS study, inter‐laboratory reproducibility 321

values across four laboratories were found to be similar to those herein communicated 322

(Kappa=0.84) (6). The inter‐laboratory reproducibility for HC2 in seven laboratories 323

participating in an Italian clinical trial was also high (4). In both cases RLU values around the 324

cutoff (RLU=1) were more prone to show discrepant results, as observed in our study. Thus, 325

the random variation of RLU values was more influential on the classification as positive or 326

negative for samples rendering values close to the cutoff (4). One fundamental contribution of 327

the present study was the detailed description of the differential reproducibility of the HC2 328

technique for different intervals of the RLU. We conclude therefore that, in our settings, the 329

0.5≤RLU<5 is the interval in which PT paired values of the readout RLU were more likely to 330

show discrepancy. 331


.asm.org/

Dow

nloaded from

http://jcm.asm.org/

15

A central aim of the initial PT assessment after the first four years of the CC screening activities 332

was to provide guidelines for a better PT control itself in the future. As such, we have 333

estimated the number of samples that should test discrepant after retesting, to be considered 334

as a “significant discordant”. In our case and with our data structure, this critical value is three 335

or more individual discrepant samples in the sets of 20 samples to be retested. Two factors in 336

our PT design are susceptible of changes for improvement: the number of retested samples 337

and its stratified structure. Increasing the number of retested samples would allow us to be 338

more accurate at finding a number of discrepancies significantly higher than expected by 339

chance. This has been exemplified in the simulation described above including 40 instead of 20 340

samples per laboratory per year. Obviously, this change also doubles the cost of the PT 341

assessment and cost‐benefit equilibrium must be found. Regarding sampling structure, the aim 342

of the PT program is analytical, and not clinical, and we have thus chosen to equally monitor 343

throughout the dynamic range of the readout variable, instead of randomly sampling from the 344

entire population. This strategy allows placing a special emphasis on the region in which the 345

technique is less reproducible. In order to focus in the sensitive area in which samples were 346

more likely to be discordant (0.5‐5) we propose to apply the following sample structure for 347

future PT assessment: five samples with RLU<0.5; ten samples with 0.5≤RLU<1; ten samples 348

with 1≤RLU<5; and five samples with RLU≥5. 349

Conclusions 350

The results of the PT assessment for the CC screening program in Catalonia have shown that 351

the HC2 assay has a high inter‐laboratory concordance. The use of a common, standardized 352

protocol with sharp anti‐contamination measures for samples and processing fluids in the 353

twelve laboratories involved has been instrumental to achieve these excellent results. We have 354

also detected a significant decrease in HC2 signal after retesting that cannot be linked to the 355

time elapsed between consecutive tests, and that may be attributable to the additional freeze‐356


.asm.org/

Dow

nloaded from

http://jcm.asm.org/

16

thaw cycle. We have additionally explored in depth the differential repeatability along the 357

dynamic response of the readout variable and have demonstrated that most discrepant results 358

accumulate in the 0.5‐5 RLU interval, with samples in this interval being 12.7 times more likely 359

to show discrepant results than samples outside this interval. With this information we have 360

further defined future recommendations and confidence thresholds for inter‐laboratory PT in 361

the future when using HC2 as screening test, as analytical quality assessment of the HPVs DNA 362

detection remains a central component of the screening program for CC prevention. 363

364


.asm.org/

Dow

nloaded from

http://jcm.asm.org/

17

Authors’ contributions 365

Conceived the analyses: RI, IGB. Coordinated the screening activities: RI, SdS. Analysed the 366

data: RI, MFS, IGB. Performed experiments: JMG, CG, EC, RJ, NC, BB, CD, JMC, LlP, JA, CG, JO. 367

Drafted the manuscript: RI, MFS, IGB. All authors read and approved the final manuscript. 368

369

Acknowledgments 370

This study has been partially supported by the Pla Director d’Oncologia of the Health 371

Department in Catalonia and grants from the Instituto de Salud Carlos III (Spanish 372

Government, grants RCESP C03/09, RTICESP C03/10, RTIC RD06/0020/0095, RD12/0036/0056, 373

and CIBERESP) and from the Agència de Gestió d’Ajuts Universitaris i de Recerca (Catalan 374

Government, grants AGAUR 2005SGR 00695 and 2009SGR126). 375

SdS has received occasional travel grants from Merck and Qiagen and holds a restricted grant 376

from Qiagen and from Merck not related to the study presented. None of the funding bodies 377

had any role in study design; in collection, analysis, and data interpretation; in the writing of 378

the manuscript; and in the decision to submit the manuscript for publication. 379

The authors want to thank Prof. Attila Lorincz for his comments and remarks to an initial 380

version of the manuscript. We would like to thank the following persons for their collaboration 381

in this study: Marylène Lejeune and Carlos López, Cristina Callau from Molecular Biology and 382

Research Section, IISPV, URV, Hospital de Tortosa Verge de la Cinta (HTVC); Anna Ferran 383

Gibert, Mercé Rey Ruhi, Ruth Orellana Fernandez and Irmgard Costa Tranchel from UDIAT‐CD, 384

Hospital Universitari Parc Taulí; Roser Esteve from Barcelona Hospital Clínic; Montserrat Sardà 385

Roca and Àngels Verdaguer Autonell from Consorci Hospitalari de Vic; Jo Ellen Klaustermeier, 386

Vanesa Camón, Ana Esteban, Yolanda Florencia, Isabel Català and Núria Baixeras from Hospital 387

universitari de Bellvitge – Institut Català d’Oncologia; José Maria Navarro Olivella, Carme 388


.asm.org/

Dow

nloaded from

http://jcm.asm.org/

18

Perich Alsina, and Belen Viñado from laboratori Clínic Bon Pastor; Cristina Antúnes Vitales, 389

Glòria Oliveras Serrat, Lluís Bernadó Turmo and Pilar Castro Marqueta from Institut Català 390

d'Oncologia (Girona); Ana Velasco Sánchez, Anna Serrate López, Marta Romero Fernández, 391

Azahar Romero Jiménez and Nuria Llecha from Hospital Universitari Arnau de Vilanova; 392

Vanessa Guerrero Hormiga and Ignacia Ramos Mora from Hospital Universitari Joan XXIII; Rosa 393

Tenllado Gimenez and Adoración Velilla from Laboratori Barcelonès Nord i Vallès Oriental; and 394

Belén Lloveras, Francesc Alameda and Merced Muset from Hospital del Mar. 395

396


.asm.org/

Dow

nloaded from

http://jcm.asm.org/

19

REFERENCES 397

398

1. Anttila, A., L. Kotaniemi‐Talonen, M. Leinonen, M. Hakama, P. Laurila, J. Tarkkanen, 399 N. Malila, and P. Nieminen. 2010. Rate of cervical cancer, severe intraepithelial 400 neoplasia, and adenocarcinoma in situ in primary HPV DNA screening with cytology 401 triage: randomised study within organised screening programme. BMJ 340:c1804. 402

2. Arbyn, M., G. Ronco, A. Anttila, C. J. Meijer, M. Poljak, G. Ogilvie, G. Koliopoulos, P. 403 Naucler, R. Sankaranarayanan, and J. Peto. 2012. Evidence regarding human 404 papillomavirus testing in secondary prevention of cervical cancer. Vaccine 30 Suppl 405 5:F88‐99. 406

3. Brennan, P., and A. Silman. 1992. Statistical methods for assessing observer variability 407 in clinical measures. BMJ 304:1491‐4. 408

4. Carozzi, F. M., A. Del Mistro, M. Confortini, C. Sani, D. Puliti, R. Trevisan, L. De Marco, 409 A. G. Tos, S. Girlando, P. D. Palma, A. Pellegrini, M. L. Schiboni, P. Crucitti, P. Pierotti, 410 A. Vignato, and G. Ronco. 2005. Reproducibility of HPV DNA Testing by Hybrid Capture 411 2 in a Screening Setting. Am J Clin Pathol 124:716‐21. 412

5. Castle, P. E., A. Hildesheim, M. Schiffman, C. A. Gaydos, A. Cullen, R. Herrero, M. C. 413 Bratti, and E. Freer. 2003. Stability of archived liquid‐based cytologic specimens. 414 Cancer 99:320‐2. 415

6. Castle, P. E., C. M. Wheeler, D. Solomon, M. Schiffman, and C. L. Peyton. 2004. 416 Interlaboratory reliability of Hybrid Capture 2. Am J Clin Pathol 122:238‐45. 417

7. Cuzick, J., C. Bergeron, M. von Knebel Doeberitz, P. Gravitt, J. Jeronimo, A. T. Lorincz, 418 J. L. M. M. C, R. Sankaranarayanan, J. F. S. P, and A. Szarewski. 2012. New 419 technologies and procedures for cervical cancer screening. Vaccine 30 Suppl 5:F107‐420 16. 421

8. de Sanjose, S., W. G. Quint, L. Alemany, D. T. Geraets, J. E. Klaustermeier, B. Lloveras, 422 S. Tous, A. Felix, L. E. Bravo, H. R. Shin, C. S. Vallejos, P. A. de Ruiz, M. A. Lima, N. 423 Guimera, O. Clavero, M. Alejo, A. Llombart‐Bosch, C. Cheng‐Yang, S. A. Tatti, E. 424 Kasamatsu, E. Iljazovic, M. Odida, R. Prado, M. Seoud, M. Grce, A. Usubutun, A. Jain, 425 G. A. Suarez, L. E. Lombardi, A. Banjo, C. Menendez, E. J. Domingo, J. Velasco, A. 426 Nessa, S. C. Chichareon, Y. L. Qiao, E. Lerma, S. M. Garland, T. Sasagawa, A. Ferrera, 427 D. Hammouda, L. Mariani, A. Pelayo, I. Steiner, E. Oliva, C. J. Meijer, W. F. Al‐Jassar, 428 E. Cruz, T. C. Wright, A. Puras, C. L. Llave, M. Tzardi, T. Agorastos, V. Garcia‐Barriola, 429 C. Clavel, J. Ordi, M. Andujar, X. Castellsague, G. I. Sanchez, A. M. Nowakowski, J. 430 Bornstein, N. Munoz, and F. X. Bosch. 2010. Human papillomavirus genotype 431 attribution in invasive cervical cancer: a retrospective cross‐sectional worldwide study. 432 Lancet Oncol 11:1048‐1056. 433

9. Departament de Salut. Direcció General de Planificació i Avaluació. Generalitat de 434 Catalunya. 2007. Protocol de les Activitats per al Cribratge del Càncer de Coll Uterí a 435 l’Atenció Primària., Barcelona. 436

10. Gravitt, P. E., M. Schiffman, D. Solomon, C. M. Wheeler, and P. E. Castle. 2008. A 437 comparison of linear array and hybrid capture 2 for detection of carcinogenic human 438 papillomavirus and cervical precancer in ASCUS‐LSIL triage study. Cancer Epidemiol 439 Biomarkers Prev 17:1248‐54. 440

11. Hesselink, A. T., N. W. Bulkmans, J. Berkhof, A. T. Lorincz, C. J. Meijer, and P. J. 441 Snijders. 2006. Cross‐sectional comparison of an automated hybrid capture 2 assay 442 and the consensus GP5+/6+ PCR method in a population‐based cervical screening 443 program. J Clin Microbiol 44:3680‐5. 444


.asm.org/

Dow

nloaded from

http://jcm.asm.org/

20

12. IARC. 2012. IARC Monographs on the Evaluation of Carcinogenic Risks to Humans. 445 Volume 100B: A review on human carcinogens. International Agency for Research on 446 Cancer., Lyon, France. 447

13. IARC. 2005. IARC Working Group on the Evaluation of Cancer‐Preventive Strategies. In 448 I. Press (ed.), Cervix Cancer Screening, vol. 10, Lyon, France. 449

14. Landis, J. R., and G. G. Koch. 1977. The measurement of observer agreement for 450 categorical data. Biometrics 33:159‐74. 451

15. Meijer, C. J., H. Berkhof, D. A. Heideman, A. T. Hesselink, and P. J. Snijders. 2009. 452 Validation of high‐risk HPV tests for primary cervical screening. J Clin Virol 46 Suppl 453 3:S1‐4. 454

16. Muñoz, N., F. X. Bosch, S. de Sanjosé, R. Herrero, X. Castellsague, K. V. Shah, P. J. F. 455 Snijders, and C. J. L. M. Meijer. 2003. Epidemiologic classification of human 456 papillomavirus types associated with cervical cancer. N Engl J Med 348:518‐527. 457

17. Naucler, P., W. Ryd, S. Tornberg, A. Strand, G. Wadell, K. Elfgren, T. Radberg, B. 458 Strander, B. Johansson, O. Forslund, B. G. Hansson, E. Rylander, and J. Dillner. 2007. 459 Human papillomavirus and Papanicolaou tests to screen for cervical cancer. N Engl J 460 Med 357:1589‐97. 461

18. Poljak, M., J. Cuzick, B. J. Kocjan, T. Iftner, J. Dillner, and M. Arbyn. 2012. Nucleic acid 462 tests for the detection of alpha human papillomaviruses. Vaccine 30 Suppl 5:F100‐6. 463

19. Rijkaart, D. C., J. Berkhof, L. Rozendaal, F. J. van Kemenade, N. W. Bulkmans, D. A. 464 Heideman, G. G. Kenter, J. Cuzick, P. J. Snijders, and C. J. Meijer. 2012. Human 465 papillomavirus testing for the detection of high‐grade cervical intraepithelial neoplasia 466 and cancer: final results of the POBASCAM randomised controlled trial. Lancet Oncol 467 13:78‐88. 468

20. Ronco, G., J. Dillner, K. M. Elfstrom, S. Tunesi, P. J. Snijders, M. Arbyn, H. Kitchener, 469 N. Segnan, C. Gilham, P. Giorgi‐Rossi, J. Berkhof, J. Peto, and C. J. Meijer. 2013. 470 Efficacy of HPV‐based screening for prevention of invasive cervical cancer: follow‐up of 471 four European randomised controlled trials. Lancet. 472

21. Ronco, G., P. Giorgi‐Rossi, F. Carozzi, M. Confortini, P. Dalla Palma, A. Del Mistro, B. 473 Ghiringhello, S. Girlando, A. Gillio‐Tos, L. De Marco, C. Naldoni, P. Pierotti, R. Rizzolo, 474 P. Schincaglia, M. Zorzi, M. Zappa, N. Segnan, and J. Cuzick. 2010. Efficacy of human 475 papillomavirus testing for the detection of invasive cervical cancers and cervical 476 intraepithelial neoplasia: a randomised controlled trial. Lancet Oncol 11:249‐57. 477

22. Sankaranarayanan, R., L. Gaffikin, M. Jacob, J. Sellors, and S. Robles. 2005. A critical 478 assessment of screening methods for cervical neoplasia. Int J Gynaecol Obstet 89 479 Suppl 2:S4‐S12. 480

23. Tukey, J. W. 1977. Exploratory data analysis. Addison‐Wesley, Reading. 481 24. Walboomers, J. M. M., M. V. Jacobs, M. M. Manos, F. X. Bosch, J. A. Kummer, K. V. 482

Shah, P. J. F. Snijders, C. J. L. M. Meifer, and N. Munoz. 1999. Human papillomavirus is 483 a necessary cause of invasive cervical cancer worldwide. J Pathol 189:12‐19. 484

485 486

487


.asm.org/

Dow

nloaded from

http://jcm.asm.org/

21

Figure legends 488

489

Figure 1. Scatterplot of the RLU values for the results of the HC2 test at the origin laboratory 490

and for the paired proficiency test. Results are expressed as logarithmic values of relative 491

luminiscence units (RLU) reads. The correlation coefficient for linear regression was 0.95 (p‐492

value<0.05). 493

Figure 2. Scatterplot of relative luminiscence units (RLU) ratios between original and 494

proficiency test as a function of time elapsed between the two tests. Results are expressed as 495

logarithmic values of the ratio of RLU and time in days. The correlation coefficient obtained for 496

linear regression was 0.003 (p‐value=0.219), and therefore no loss in signal can be attributed 497

to the time elapsed between consecutive tests. 498

Figure 3. Distribution of number of clinically discrepant results between the original 499

laboratory and the Proficiency test (circles) and fit to a Poisson distribution (grey line). 500

Accumulated probability for the detection of the corresponding number of discrepancies are 501

shown above each point. By bootstrapping with replacement of the all samples, 1,000 “virtual 502

labs” were generated and the number of discrepancies was calculated. With the same 503

stratifies structure of samples used in this study (RLU<0.5; 0.5≤RLU<1; 1≤RLU<10; and 504

RLU≥10), two different scenarios were tested: (A) 20 samples, five from each interval; and (B) 505

40 samples, ten from each interval. 506

Figure 4. Expected percentage of discrepancies along the RLU intervals 0.5‐0.99 (A) and 1‐507

9.99 (B). Mean percentage of discrepancies were obtained by bootstrapping. Grey area 508

encompasses 95% confidence interval. RLU: Relative luminiscence units. 509

510

511


.asm.org/

Dow

nloaded from

http://jcm.asm.org/


.asm.org/

Dow

nloaded from

http://jcm.asm.org/

Table 1. Data from the general population compared to data from the reference HPV laboratories used for the HPV proficiency testing

CATEGORIES OF RESULTS

SCREENING GENERAL POPULATION *

N (%)

PROFICIENCY TESTING** N (%)

HPV negative 36,600 (78) 452 (47.8)

HPV positive 10,349 (22) 494 (52.2)

Total 46,949 (100) 946 (100)

RLU<0.5 34,580 (73.7) 253 (26.7)

0.5≤RLU<1 2,020 (4.3) 199 (21)

1 ≤RLU<10 2,563 (5.5) 246 (26)

RLU≥10 7,786 (16.6) 248 (26.2)

* Data obtained from the HPV test performed to the screening general population in the period 2007-2009.

** Data obtained from the reference HPV laboratories for the HPV proficiency testing during the period 2008-2011.


.asm.org/

Dow

nloaded from

http://jcm.asm.org/

Table 2: Comparison of tested laboratories and HPV Proficiency testing Laboratory paired Hybrid Capture 2 (HC2) Test Results

Proficiency testing HC2

results

HC2 results of origin Total Analysis of agreement

NEGATIVE POSITIVE NEGATIVE 433 25 458 Kappa 0.91 POSITIVE 19 469 488 p-value 0.00 Total 452 494 946

RLU<0.5 0.5 ≤RLU<1 1 ≤RLU<10 RLU≥10 Total Analysis of agreement

RLU<0.5 243 81 11 1 336

0.5≤RLU<1 10 99 13 0 122 Kappa 0.79

1 ≤RLU<10 0 19 213 7 239 p-value 0.00 RLU≥10 0 0 9 240 249 Total 253 199 246 248 946


.asm.org/

Dow

nloaded from

http://jcm.asm.org/

Table 3. Comparison between the percentage of discrepancies between results at origin and results after proficiency testing, as a function of the relative luminescence units (RLU). The categories used in the current proficiency testing scheme and those proposed for the improved proficiency testing are shown.

Categories of samples Mean probability of discrepancies, % (95% CI) Proficiency testing

RLU/CO <0.5 0 (0) 0.5 ≤RLU/CO <1 9.59 (6.02-13.58) 1 ≤RLU/CO <10 9.87 (6.50-13.82) RLU/CO ≥10 0.41 (0-1.21)

Proposed proficiency testing RLU/CO <0.5 0 (0) 0.5 ≤RLU/CO <1 9.59 (6.02-13.58) 1 ≤RLU/CO <5 12.56 (7.64-18.47) RLU/CO ≥5 1.18 (0.30-2.87)


.asm.org/

Dow

nloaded from

http://jcm.asm.org/

downloaded from on april 21, 2020 by guest · 156 signal in both readouts and the time elapsed...

Documents