downloaded from on april 21, 2020 by guest · 156 signal in both readouts and the time elapsed...
TRANSCRIPT
1
INTER‐LABORATORY REPRODUCIBILITY AND PROFICIENCY TESTING WITHIN THE HUMAN 1
PAPILLOMAVIRUSES CERVICAL CANCER SCREENING PROGRAM IN CATALONIA, SPAIN 2
3
R. Ibáñez1, M. Felez‐Sánchez1,2, J.M. Godínez2, C. Guardiá3, E. Caballero4, R. Juve5, N. Combalia6, 4
B. Bellosillo7, D. Cuevas8, J. Moreno‐Crespi9, Ll. Pons10, J. Autonell11, C. Gutierrez12, J. Ordi13, S. 5
de Sanjosé1,2,14, I.G. Bravo1,2. 6
7
1 Infections and Cancer Unit, Cancer Epidemiology Research Program, Catalan Institute of 8 Oncology (ICO). L'Hospitalet de Llobregat, Barcelona, Spain 9
2 Infections and Cancer, Bellvitge Institute of Biomedical Research (IDIBELL) L'Hospitalet de 10 Llobregat, Barcelona, Spain 11
3 Primary Care Clinical Laboratory Barcelonés Nord y Vallés Oriental. Catalan Institute of 12 Health. 08911 Badalona (Spain) 13
4 Microbiology Department. University Hospital Vall d’Hebron. Catalan Institute of Health. 14 08035 Barcelona (Spain) 15
5 Primary Care Clinical Laboratory Bon Pastor. Catalan Institute of Health. 08030 Barcelona 16 (Spain) 17
6 Pathology Department. UDIAT‐CD, University Hospital Parc Tauli. Universitat Autònoma de 18 Barcelona. 08208 Barcelona (Spain) 19
7 Pathology Department. Hospital del Mar. 08003 Barcelona (Spain) 20
8 Pathology and Molecular Genetics Department. University Hospital Arnau de Vilanova. 21 Catalan Institute of Health. 25198 Lleida (Spain) 22
9 Pathology Department. University Hospital of Girona Dr. Josep Trueta. Infections and Cancer 23 Unit. Catalan Institute of Oncology. 17007 Girona (Spain) 24
10 Pathology Deparment, IISPV, URV, Hospital de Tortosa Verge de la Cinta (HTVC) . Catalan 25 Institute of Health. 43500 Tortosa (Spain) 26
11 Pathology Department. Hospital Consortium of Vic. 08500 Vic. Barcelona (Spain) 27
12 Clinical Laboratory ICS Tarragona, Molecular Biology Section. University Hospital of 28 Tarragona Joan XXIII. Catalan Institute of Health. 43005 Tarragona (Spain) 29
13 Pathology Department, CRESIB – Hospital Clínic, University Barcelona. 08036 Barcelona 30 (Spain) 31
14 CIBER Epidemiology and Public Health, Spain 32
JCM Accepts, published online ahead of print on 26 February 2014J. Clin. Microbiol. doi:10.1128/JCM.00100-14Copyright © 2014, American Society for Microbiology. All Rights Reserved.
on June 20, 2020 by guesthttp://jcm
.asm.org/
Dow
nloaded from
2
33
34
Corresponding author: 35
Ignacio G. Bravo 36
Infections and Cancer Laboratory, Cancer Epidemiology Research Programme, Catalan Institute 37
of Oncology 38
Avda. Gran Vía, 199‐203 ‐ 08908 L’Hospitalet de Llobregat, Barcelona; Spain. 39
Phone: +34 93 260 7123 /7812 40
E‐mail address: [email protected] 41
42
Keywords: 43
Human papillomaviruses, HPV, Hybrid capture, HC2, Reproducibility, Reliability, Cervical cancer 44
screening, Quality control. 45
Running title: 46
Proficiency testing within a cervical cancer screening program 47
48
49
on June 20, 2020 by guesthttp://jcm
.asm.org/
Dow
nloaded from
3
ABSTRACT 50
In Catalonia, a screening protocol for cervical cancer including Human Papillomaviruses (HPVs) 51
DNA testing using the Digene Hybrid Capture assay (HC2) was implemented in 2006. In order 52
to monitor inter‐laboratory reproducibility, a proficiency testing (PT) survey of the HPVs 53
samples was launched in 2008. The aim of this study was to explore the repeatability of HC2 54
performance. Participant laboratories provided annually twenty samples, five randomly 55
chosen samples from each of the following relative light units (RLU) intervals: <0.5; 0.5‐0.99; 1‐56
9.99; and ≥10. Kappa statistics was used to determine the agreement level between the 57
original and the PT readings. The nature and origin of the discrepant results were calculated by 58
bootstrapping. A total of 946 specimens were retested. Kappa values were 0.91 for 59
positive/negative categorical classification and 0.79 for the four RLU intervals studied. Sample 60
retesting yielded systematically lower RLU values than the original test (p<0.005), 61
independently of the time elapsed between both determinations (median 53 days), possibly 62
due to freeze‐thaw cycles. The probability for a sample to show clinically discrepant results 63
upon retesting was a function of the RLU value: samples with RLU in the 0.5‐5 interval showed 64
10.80% probability to yield discrepant results (95% CI:7.86‐14.33) compared to 0.85% 65
probability for samples outside this interval (95% CI:0.17‐1.69,). Globally, HC2 assay shows a 66
high inter‐laboratory concordance. We have identified differential confidence thresholds and 67
suggested the guidelines for inter‐laboratory PT in the future, as analytical quality assessment 68
of the HPVs DNA detection remains a central component of the screening program for cervical 69
cancer prevention. 70
71
on June 20, 2020 by guesthttp://jcm
.asm.org/
Dow
nloaded from
4
INTRODUCTION 72
Persistent infection with oncogenic Human Papillomaviruses (HPVs) is a necessary cause in the 73
development of invasive cervical cancer (CC) (8, 16, 24). CC screening based on cervical 74
cytology has been instrumental in decreasing incidence and mortality associated to CC in those 75
countries with high screening coverage rates (13). However, the infectious aetyology of this 76
disease lead to the suggestion that the detection of the DNA of HPVs responsible for cellular 77
transformation could provide a strong predictive marker for early detection of women at risk 78
(20). Worldwide studies have established that testing for the presence of HPVs DNA shows a 79
higher sensitivity than cytology for the detection of high grade cervical intraepithelial lesions 80
(2). Despite this demonstrated higher sensitivity, its application to CC screening programs is 81
still not widespread (1, 2, 17, 19, 21). 82
Currently one of the most widely used tests for the detection of HPVs DNA is the Digene 83
Hybrid Capture assay (HC2; Qiagen, Gaithersburg, MD), approved by the U.S. Food and Drug 84
Administration (FDA) in 2003 for its use in clinical settings. Large prospective cohort studies 85
and randomized controlled trials have proved this assay as a highly clinical sensitivity test (90‐86
95%) for the detection of high‐grade CIN (2). The HC2 procedure does not include a PCR 87
amplification step, and uses cocktail of probes designed to detect 13 HPVs classified as 88
carcinogenic (groups 1 and 2A) by the International Agency for Research on Cancer: HPV16, 18, 89
31, 33, 35, 39, 45, 51, 52, 56, 58, 59, and 68 (12). 90
In the last years several tests for detection of DNA or transcripts of HPVs have been developed 91
(7, 18). A number of these tests have been proposed to be applied for CC screening. Beyond 92
sensitivity and specificity, an additional critical factor before their introduction as a screening 93
technique is the need for high reproducibility under the diverse conditions that the different 94
clinical laboratories may face (15). The current guidelines for HPVs test propose an intra‐95
laboratory reproducibility and inter‐laboratory agreement with a lower confidence bound not 96
on June 20, 2020 by guesthttp://jcm
.asm.org/
Dow
nloaded from
5
less than 87% for both of them (15). Good reproducibility of blinded clinical samples in the 97
laboratories might further warrant results reliability (4). 98
In Catalonia, North‐East of Spain, a screening protocol including HPVs testing for selected 99
indications was implemented in 2006 (9). Women became eligible for the carcinogenic HPVs 100
testing if they belonged to any of the three following categories: i) inadequate screening 101
history, i.e. women aged over 40 years and with no history of cytology in the previous five 102
years, ii) incident diagnosis of atypical squamous cell of undetermined significance (ASC‐US); or 103
iii) first follow‐up visit after a surgical conization. During the 2006 to 2012 period, 116,970 tests 104
for carcinogenic HPVs have been performed within CC screening activities in the Public Health 105
Sector in Catalonia. In order to monitor inter‐laboratory reproducibility, a proficiency testing 106
(PT) survey of the HPVs samples was launched in 2008, covering the twelve laboratories 107
involved in the screening activities. 108
The aim of the present study was to evaluate inter‐laboratory reproducibility and error rate in 109
the HC2 performance in cervical specimens among the twelve reference laboratories 110
participating in the HPVs screening in Catalonia during the period 2008‐2011. Additionally, we 111
aimed to investigate the nature and origin of the discrepant results in order to provide 112
guidelines for the assessment of inter‐laboratory concordance thresholds in the future. 113
114
on June 20, 2020 by guesthttp://jcm
.asm.org/
Dow
nloaded from
6
MATERIALS AND METHODS 115
Design of external proficiency testing program 116
All tests (original and PT assays) were performed using HC2 with the “high‐risk” probe pool 117
only, following the manufacturer's instructions. According to the FDA‐approved guidelines, the 118
threshold for positivity is the HC2 response against a control sample containing 1.0 pg mL‐1 119
HPV DNA, roughly equivalent to 5,000 genomic copies. A sample was considered positive if it 120
rendered a relative light unit ratio (RLU) of ≥1 with respect to the positive control. 121
The twelve laboratories for HPVs screening in Catalonia (Hospital Hospital Universitari Dr. 122
Josep Trueta, Consoci hospitalari de Vic, Hospital Universitari Joan XXIII, Hospital del Mar, 123
Hospital Universitari de Bellvitge –Institut Català d’Oncologia (ICO), Hospital Clínic, Hospital 124
Universitari Vall d’Hebron, Hospital Universitari Verge de la Cinta, Hospital Universitari Arnau 125
de Vilanova, Consorci sanitari Parc Tauli, and the Laboratoris d’Atenció Primària Doctor Robert 126
and Bon Pastor) participated in the PT program. The laboratory at the ICO was designated by 127
the Health Department of the Government of Catalonia as the reference laboratory for PT 128
purposes. The overall project was approved by the ethical committee of the ICO/IDIBELL. All 129
information regarding the identification of patients was anonymized before analysis. 130
The PT was run annually and blindly. Between 2008 and 2011 each participating laboratory 131
delivered 20 samples to the reference laboratory. The PT of the reference laboratory itself was 132
performed by one of the twelve laboratories. For PT, the laboratories provided the residual 133
aliquot of the original samples. Eleven of the participant laboratories collected the samples in 134
STM medium. One laboratory served liquid‐cytology based screening and in this case, buffer 135
conversion previous to retesting was performed according to the manufacturer’s guidelines. 136
Samples were kept frozen at ‐20 ºC before delivery. As far as possible, laboratories were 137
requested to deliver samples for PT within the three months after the initial testing. 138
Nevertheless, the possible effect of time elapsed between the original and the PT test was also 139
on June 20, 2020 by guesthttp://jcm
.asm.org/
Dow
nloaded from
7
assessed. In order to ensure a uniform coverage distribution of positive and negative HC2 140
values, the laboratories were asked to provide annually five randomly chosen samples of each 141
of the following RLU intervals: <0.5; 0.5‐0.99; 1‐9.99; and ≥10. 142
Concordance analysis 143
HPV samples were collated and analysed to allow for inter‐laboratory comparisons. Paired HC2 144
test results were categorized as Original‐negative/PT‐negative, Original‐positive/PT‐negative, 145
Original‐negative/PT‐positive, and Original‐positive/PT‐positive based on the 1.0 RLU/PC 146
positive cutoff. 147
Cohen’s Kappa statistics was used to determine the level of agreement between the original 148
and the PT categories of results (positive/negative and RLU intervals). The Kappa statistic is a 149
measure of inter‐rater agreement, which tests as null hypothesis the inter‐rater agreement by 150
chance (14). Generally a Kappa score between 0.8 and 1 is considered as an excellent 151
agreement, values between 0.61 and 0.8 substantial agreement; between 0.41 and 0.6 152
moderate agreement, between 0.21 and 0.4 fair agreement; and between 0 and 0.2 slight 153
agreement (3). 154
Correlation between original and PT HC2 readouts as well as correlation between changes in 155
signal in both readouts and the time elapsed between both tests were assessed by linear 156
regression using SPSS Statistics17.0 (SPSS Statistics/IBM, Chicago IL, USA). Paired Wilcoxon & 157
Mann‐Whitney test implemented in R (http://www.r‐project.org/) was used to test the null 158
hypothesis of the median difference between RLU values in origin and at PT to be non‐159
different from zero. 160
For all statistical analyses, a p‐value below 0.05 was considered as a significant for rejection of 161
the null hypothesis. 162
Bootstrapping analysis 163
on June 20, 2020 by guesthttp://jcm
.asm.org/
Dow
nloaded from
8
In order to estimate the limits for the expected number of discrepancies between the results 164
at origin and at the PT, a bootstrapping analysis was done. One thousand “virtual labs” were 165
generated by bootstrapping among the pooled samples, allowing for replacement. All these 166
“virtual labs” were constructed with the same data structure that was requested for the PT, 167
i.e. 20 samples in total, five samples of each of the intervals above defined. We performed also 168
the same analyses in another hypothetical scenario, with 40 samples per “virtual laboratory”, 169
ten of each interval. The total numbers of discrepancies and concordances, as well as the 170
Cohen’s Kappa statistics were computed for each of the “virtual labs”. Distribution of 171
discrepancies was fitted to a Poisson model and the 95% confidence interval was calculated 172
based on the fitted model. 173
We explored further the differential repeatability of the technique for different values of the 174
RLU variable. Each of the intervals 0.5‐0.99 and 1.0‐10 was divided in 0.1 unit sub‐intervals. For 175
each sub‐interval, one thousand “virtual laboratories” were created by drawing random 176
samples with replacement from the original pooled samples. The size of each “virtual lab” was 177
equal to the original number of samples within the sub‐interval. The mean percentage of 178
discrepancies was calculated for each of the defined sub‐intervals. The 2.5% and 97.5% 179
quantiles of the 1000 bootstrap samples were used to define the lower bound and the upper 180
bound of 95% CI. In all cases bootstrapping analyses were performed, using in‐house Perl 181
scripts. 182
183
on June 20, 2020 by guesthttp://jcm
.asm.org/
Dow
nloaded from
9
RESULTS 184
During the period 2008‐2011, a total of 946 specimens were retested for PT within the 185
framework of the CC screening program in Catalonia, Spain. The results of these paired tests 186
were used to study inter‐laboratory reproducibility. 187
Comparison between origin‐PT paired tests showed high correlation 188
The distribution of the samples selected for PT was chosen to show a flat distribution of RLU 189
values around the clinical cutoff value. The precise distribution of the samples tested in the PT 190
and its comparison with the distribution of the general population is given in Table 1. In the 191
general population in which HC2 was used to assess the presence of DNA of oncogenic HPVs 192
(n=46,949), the distribution of positive and negative results for the period 2006‐2009 was 22% 193
and 78%, respectively. In contrast, in the PT samples this distribution was 52.2% and 47.8%, 194
respectively. Regarding the classification by categories of RLU values, in the HPV‐tested 195
general population the central intervals spanning 0.5‐9.99 comprised only the 9.8% of all 196
samples, while 47% of the PT samples were located in these intervals. These differences arise 197
from the focus itself of the PT program, aiming to evaluate the inter‐laboratory reliability of 198
the technique and not necessarily matching with the distribution of the RLU values in the 199
general population. 200
A comparison of RLU values of the HC2 test by the twelve tested laboratories and of the 201
repeated test is shown in Figure 1. The overall correlation coefficient for the original and the 202
PT values was 0.95 (p<0.05), ranging between 0.88 and 0.97 for each of the individual 203
laboratories (data not shown). Correlation restricted to samples interpreted as positive 204
(RLU≥1) in both origin and in the PT retest increased up to 0.97 (p<0.05) (data not shown). The 205
PT retests rendered systematically lower RLU values than the reported by origin laboratories, 206
with a pairwise median decrease of 9% in signal. This median difference was significantly 207
different from zero, as determined by a paired Wilcoxon & Mann‐Whitney test (p<0.005). 208
on June 20, 2020 by guesthttp://jcm
.asm.org/
Dow
nloaded from
10
Time elapsed between original and PT test did not influence reproducibility 209
The mean time elapsed between the first and the PT read was 90.5 days, with a median of 53.0 210
days, and a range from 8 to 885 days. To determine whether time elapsed between 211
consecutive assays on the same sample could influence the differences between the RLU 212
values reported by HC2, the correlation between the original and the PT readout values was 213
assessed (Figure 2). No significant correlation was found (p=0.83), indicating that the time 214
interval between consecutive tests did not result in a decrease of the RLU readout value. 215
All laboratories exhibited high agreement with the PT results 216
In order to assess the agreement between the two tests, Cohen’s Kappa was calculated for the 217
positive/negative categorical classification and RLU intervals (Table 2). Paired comparison 218
between original and PT analyses for positive/negative results rendered an overall excellent 219
agreement (Kappa=0.91). The Kappa values for the individual laboratories ranged between 220
0.84 and 0.97, being above 0.90 for eight (66.7%) of them (Table S1). The laboratory using 221
liquid cytology‐based samples did not behave different from the rest of laboratories tested, 222
which used STM as collecting medium. The total number of discordant results was 44 (4.6% of 223
the total samples). A slightly asymmetric distribution of discordant tests was observed, with 224
more discordant samples being positive in origin and negative in the PT than vice versa (25 vs 225
19 respectively), although this difference was not significant (Z‐Score=1.28, p‐value=0.20). The 226
vast majority of discrepant results (97.7%) were reported in samples with RLU values between 227
0.5 and 10.0 in original tests. Remarkably, 13 of them showed a large difference between the 228
paired tests: ten positive samples with an original RLU between 2.0 and 9.0 produced negative 229
results between 0.08 and 0.77 in the PT tests. In contrast, only two samples with original 230
negative result (RLU of 0.61 and 0.72) produced positive results (RLU of 2.97 and 2.98 231
respectively) in the PT retest. Only one sample was an outlier following Tukey’s criterion (23), 232
showing an original result of 38.57 and a 0.17 result in the PT test. 233
on June 20, 2020 by guesthttp://jcm
.asm.org/
Dow
nloaded from
11
The reproducibility of the test in each of the RLU intervals here defined showed an overall 234
Kappa value of 0.79 with values ranging from 0.70 to 0.84 for each of the individual 235
laboratories (Table S2). The observed concordance in the extreme intervals, i.e. RLU below 0.5 236
and RLU above 10, was higher than in the intervals around the cutoff, i.e. RLU between 0.5 and 237
10, with overall agreement values of 96.4% and 70.1%, respectively. 238
Exploring new alternatives in the design of the PT test 239
Finally, we explored different alternatives for future improvement of the PT design. First, we 240
aimed to provide a threshold for identifying significant deviations from the maximum expected 241
number of discrepancies, under the current sampling scheme for PT. Limits for a discrepancy 242
report were determined under two scenarios: 20 samples analysed for PT per laboratory with 243
five samples per each interval (i.e. the one currently implemented); and 40 samples per 244
laboratory with ten samples for each interval (which could be implemented depending on 245
budget). We generated, by bootstrapping with replacement from the original data, 1,000 data 246
sets that corresponded to “virtual laboratories”, using the same stratified structure of samples. 247
The results were fitted to a Poisson distribution and the accumulated probability for the 248
detection of the corresponding number of discrepant results was calculated. We aimed to find 249
the number of discrepant results that could be expected to appear by chance with a 250
probability below 5% (Figure 4). Using the stratified structure of samples evaluated in the 251
present study (n=20, five samples of each interval), the probability of having two or more 252
discrepancies was 7.5%, whereas the probability of three or more discrepancies was 1.7%. If 40 253
samples in total were collected, the probability of finding three or more discrepancies was 254
13.1%, while retrieving four or more discrepancies presented a probability of 4.7%. Thus, 255
under the present PT structure and retesting 20 samples per laboratory, the analyses should 256
be labelled as significantly discordant when PT yields three or more discrepant samples. If 40 257
on June 20, 2020 by guesthttp://jcm
.asm.org/
Dow
nloaded from
12
were to be retested using the same structure, the flag for significant discordance should be 258
raised for sets with four or more discrepant samples. 259
Finally we explored whether the results could be used to improve the PT design, by modifying 260
the boundaries for sample stratification. We focused on the readout values with lower 261
reproducibility, identified above by means of Kappa values as the area around the positivity 262
threshold. This gray area comprised two of our RLU categories: 0.5‐0.99 and 1‐9‐99. By using 263
bootstrapping analyses as described above, we explored the expected percentage of 264
discrepancies along both RLU categories (Figure 4). The results indicated that in the 0.5≤RLU<1 265
interval the expected number of samples showing discrepant results did not experiment 266
significant changes, remaining flat around 13.7% (95%CI: 5.5‐28.6%). In the 1≤RLU<10 interval 267
we observed a significant monotonic decrease (p‐value for trend<0.001) in the percentage of 268
samples showing discrepant results, starting from 25.9% (95%CI: 14.8‐38.9%) for values in the 269
1≤RLU<2 interval and stabilising after RLU≥5 interval to reach 9.7% (95%CI: 5.7‐13.4%). 270
Moreover, we have explored the differential probability for a sample to show discrepancy 271
upon retesting, as a function of the RLU values. The RLU interval 0.5‐5 concentrated the 272
highest probability for discrepancies: the probability for a sample in this interval to show 273
discrepancy was 10.80% (95% CI: 7.86‐14.33) compared to 0.85% (95% CI: 0.17‐1.69,) for 274
samples outside this interval. The accumulated probabilities for discrepancy for the different 275
intervals of the RLU variable are given in Table 3. These improved intervals will be 276
implemented recommended in the future for the PT activities when using HC2. We will thus 277
ask the participating laboratories to would be asked to provide annually with randomly chosen 278
samples of each of the following RLU intervals with the following structure: RLU<0.5, five 279
samples; 0.5≤RLU<1, ten samples; 1≤RLU<5, ten samples; and RLU≥5, five samples. In this case, 280
the probability of having three or more discrepancies by chance was will be 13.5%, whereas 281
the probability of three four or more discrepancies by chance was will be 4.8%. 282
on June 20, 2020 by guesthttp://jcm
.asm.org/
Dow
nloaded from
13
283
DISCUSSION 284
An inter‐laboratory PT was successfully implemented for the detection of high‐risk HPV types, 285
within the CC screening activities in the Public Health System in Catalonia. It was considered 286
instrumental to ensure the accuracy of the results obtained by HPV testing. A total of 946 287
samples were analysed for PT during the 2008‐2011 period and a total of 44 (4.6%) 288
discrepancies were found. The present PT study was in agreement with the guidelines 289
proposed by Meijer and co‐workers for the validation of high‐risk HPV tests for primary CC 290
screening (15). These guidelines proposed that inter‐laboratory agreement should be 291
determined by evaluation of at least 500 samples, with a 30% of the samples having tested 292
positive in a reference laboratory using a clinically validated assay, and reaching an agreement 293
with a lower confidence bound not less than 87%. 294
It is very important to note that the distribution of the RLU values used for the PT does not 295
follow the one obtained from the general population participating in the screening algorithms 296
in which HC2 was used to assess the presence of DNA of oncogenic HPVs (Table 1). Differences 297
between distributions reflect the analytical nature of our PT program, as we were interested in 298
assessing the overall performance of the laboratories running the HC2 technique. We chose 299
therefore to design a balanced PT sample distribution exploring equally the full range of the 300
readout variable, as otherwise the central values, close to cutoff and more prone to 301
discrepancy, would have been undersampled. 302
Paired test demonstrated an almost excellent inter‐laboratory agreement in all twelve 303
participating laboratories, both for positive/negative agreement (Kappa= 0.91) as well as for 304
the four RLU categories (Kappa= 0.79). In our study, RLU values were slightly but significantly 305
lower in the PT retesting compared to those obtained in the original laboratory, with a median 306
decrease of 9.0% from the original RLU value. One possible explanation for this trend could 307
on June 20, 2020 by guesthttp://jcm
.asm.org/
Dow
nloaded from
14
have been specimen degradation between the two tests, as it has been reported for HC2 to a 308
certain extent (5, 6). We tested the hypothesis of the influence of time elapsed between the 309
two consecutive tests to account for this difference, but we did not find a significant 310
association between the decrease in the RLU readout variable and the time between the two 311
HC2 tests. An alternative explanation could be the impact of consecutive freeze‐thaw cycles, as 312
samples were denatured, analysed in origin, frozen, sent, and thawed for PT analysis, 313
especially because the HC2 test does not include an amplification step. Finally, although in 314
absolute terms more samples were found positive in origin and tested negative in PT than the 315
inverse case, the difference between both numbers was not statistically significant. Overall, 316
the small decrease in signal during retesting entails little or none clinical significance because 317
HPV retesting is not part of routine clinical practice (6). 318
Several reports have addressed the use of HC2 in CC screening and its performance compared 319
to other methods for detecting HPVs genetic material (10, 11, 18, 22). However, few data on 320
the reproducibility of HC2 are available. In the ALTS study, inter‐laboratory reproducibility 321
values across four laboratories were found to be similar to those herein communicated 322
(Kappa=0.84) (6). The inter‐laboratory reproducibility for HC2 in seven laboratories 323
participating in an Italian clinical trial was also high (4). In both cases RLU values around the 324
cutoff (RLU=1) were more prone to show discrepant results, as observed in our study. Thus, 325
the random variation of RLU values was more influential on the classification as positive or 326
negative for samples rendering values close to the cutoff (4). One fundamental contribution of 327
the present study was the detailed description of the differential reproducibility of the HC2 328
technique for different intervals of the RLU. We conclude therefore that, in our settings, the 329
0.5≤RLU<5 is the interval in which PT paired values of the readout RLU were more likely to 330
show discrepancy. 331
on June 20, 2020 by guesthttp://jcm
.asm.org/
Dow
nloaded from
15
A central aim of the initial PT assessment after the first four years of the CC screening activities 332
was to provide guidelines for a better PT control itself in the future. As such, we have 333
estimated the number of samples that should test discrepant after retesting, to be considered 334
as a “significant discordant”. In our case and with our data structure, this critical value is three 335
or more individual discrepant samples in the sets of 20 samples to be retested. Two factors in 336
our PT design are susceptible of changes for improvement: the number of retested samples 337
and its stratified structure. Increasing the number of retested samples would allow us to be 338
more accurate at finding a number of discrepancies significantly higher than expected by 339
chance. This has been exemplified in the simulation described above including 40 instead of 20 340
samples per laboratory per year. Obviously, this change also doubles the cost of the PT 341
assessment and cost‐benefit equilibrium must be found. Regarding sampling structure, the aim 342
of the PT program is analytical, and not clinical, and we have thus chosen to equally monitor 343
throughout the dynamic range of the readout variable, instead of randomly sampling from the 344
entire population. This strategy allows placing a special emphasis on the region in which the 345
technique is less reproducible. In order to focus in the sensitive area in which samples were 346
more likely to be discordant (0.5‐5) we propose to apply the following sample structure for 347
future PT assessment: five samples with RLU<0.5; ten samples with 0.5≤RLU<1; ten samples 348
with 1≤RLU<5; and five samples with RLU≥5. 349
Conclusions 350
The results of the PT assessment for the CC screening program in Catalonia have shown that 351
the HC2 assay has a high inter‐laboratory concordance. The use of a common, standardized 352
protocol with sharp anti‐contamination measures for samples and processing fluids in the 353
twelve laboratories involved has been instrumental to achieve these excellent results. We have 354
also detected a significant decrease in HC2 signal after retesting that cannot be linked to the 355
time elapsed between consecutive tests, and that may be attributable to the additional freeze‐356
on June 20, 2020 by guesthttp://jcm
.asm.org/
Dow
nloaded from
16
thaw cycle. We have additionally explored in depth the differential repeatability along the 357
dynamic response of the readout variable and have demonstrated that most discrepant results 358
accumulate in the 0.5‐5 RLU interval, with samples in this interval being 12.7 times more likely 359
to show discrepant results than samples outside this interval. With this information we have 360
further defined future recommendations and confidence thresholds for inter‐laboratory PT in 361
the future when using HC2 as screening test, as analytical quality assessment of the HPVs DNA 362
detection remains a central component of the screening program for CC prevention. 363
364
on June 20, 2020 by guesthttp://jcm
.asm.org/
Dow
nloaded from
17
Authors’ contributions 365
Conceived the analyses: RI, IGB. Coordinated the screening activities: RI, SdS. Analysed the 366
data: RI, MFS, IGB. Performed experiments: JMG, CG, EC, RJ, NC, BB, CD, JMC, LlP, JA, CG, JO. 367
Drafted the manuscript: RI, MFS, IGB. All authors read and approved the final manuscript. 368
369
Acknowledgments 370
This study has been partially supported by the Pla Director d’Oncologia of the Health 371
Department in Catalonia and grants from the Instituto de Salud Carlos III (Spanish 372
Government, grants RCESP C03/09, RTICESP C03/10, RTIC RD06/0020/0095, RD12/0036/0056, 373
and CIBERESP) and from the Agència de Gestió d’Ajuts Universitaris i de Recerca (Catalan 374
Government, grants AGAUR 2005SGR 00695 and 2009SGR126). 375
SdS has received occasional travel grants from Merck and Qiagen and holds a restricted grant 376
from Qiagen and from Merck not related to the study presented. None of the funding bodies 377
had any role in study design; in collection, analysis, and data interpretation; in the writing of 378
the manuscript; and in the decision to submit the manuscript for publication. 379
The authors want to thank Prof. Attila Lorincz for his comments and remarks to an initial 380
version of the manuscript. We would like to thank the following persons for their collaboration 381
in this study: Marylène Lejeune and Carlos López, Cristina Callau from Molecular Biology and 382
Research Section, IISPV, URV, Hospital de Tortosa Verge de la Cinta (HTVC); Anna Ferran 383
Gibert, Mercé Rey Ruhi, Ruth Orellana Fernandez and Irmgard Costa Tranchel from UDIAT‐CD, 384
Hospital Universitari Parc Taulí; Roser Esteve from Barcelona Hospital Clínic; Montserrat Sardà 385
Roca and Àngels Verdaguer Autonell from Consorci Hospitalari de Vic; Jo Ellen Klaustermeier, 386
Vanesa Camón, Ana Esteban, Yolanda Florencia, Isabel Català and Núria Baixeras from Hospital 387
universitari de Bellvitge – Institut Català d’Oncologia; José Maria Navarro Olivella, Carme 388
on June 20, 2020 by guesthttp://jcm
.asm.org/
Dow
nloaded from
18
Perich Alsina, and Belen Viñado from laboratori Clínic Bon Pastor; Cristina Antúnes Vitales, 389
Glòria Oliveras Serrat, Lluís Bernadó Turmo and Pilar Castro Marqueta from Institut Català 390
d'Oncologia (Girona); Ana Velasco Sánchez, Anna Serrate López, Marta Romero Fernández, 391
Azahar Romero Jiménez and Nuria Llecha from Hospital Universitari Arnau de Vilanova; 392
Vanessa Guerrero Hormiga and Ignacia Ramos Mora from Hospital Universitari Joan XXIII; Rosa 393
Tenllado Gimenez and Adoración Velilla from Laboratori Barcelonès Nord i Vallès Oriental; and 394
Belén Lloveras, Francesc Alameda and Merced Muset from Hospital del Mar. 395
396
on June 20, 2020 by guesthttp://jcm
.asm.org/
Dow
nloaded from
19
REFERENCES 397
398
1. Anttila, A., L. Kotaniemi‐Talonen, M. Leinonen, M. Hakama, P. Laurila, J. Tarkkanen, 399 N. Malila, and P. Nieminen. 2010. Rate of cervical cancer, severe intraepithelial 400 neoplasia, and adenocarcinoma in situ in primary HPV DNA screening with cytology 401 triage: randomised study within organised screening programme. BMJ 340:c1804. 402
2. Arbyn, M., G. Ronco, A. Anttila, C. J. Meijer, M. Poljak, G. Ogilvie, G. Koliopoulos, P. 403 Naucler, R. Sankaranarayanan, and J. Peto. 2012. Evidence regarding human 404 papillomavirus testing in secondary prevention of cervical cancer. Vaccine 30 Suppl 405 5:F88‐99. 406
3. Brennan, P., and A. Silman. 1992. Statistical methods for assessing observer variability 407 in clinical measures. BMJ 304:1491‐4. 408
4. Carozzi, F. M., A. Del Mistro, M. Confortini, C. Sani, D. Puliti, R. Trevisan, L. De Marco, 409 A. G. Tos, S. Girlando, P. D. Palma, A. Pellegrini, M. L. Schiboni, P. Crucitti, P. Pierotti, 410 A. Vignato, and G. Ronco. 2005. Reproducibility of HPV DNA Testing by Hybrid Capture 411 2 in a Screening Setting. Am J Clin Pathol 124:716‐21. 412
5. Castle, P. E., A. Hildesheim, M. Schiffman, C. A. Gaydos, A. Cullen, R. Herrero, M. C. 413 Bratti, and E. Freer. 2003. Stability of archived liquid‐based cytologic specimens. 414 Cancer 99:320‐2. 415
6. Castle, P. E., C. M. Wheeler, D. Solomon, M. Schiffman, and C. L. Peyton. 2004. 416 Interlaboratory reliability of Hybrid Capture 2. Am J Clin Pathol 122:238‐45. 417
7. Cuzick, J., C. Bergeron, M. von Knebel Doeberitz, P. Gravitt, J. Jeronimo, A. T. Lorincz, 418 J. L. M. M. C, R. Sankaranarayanan, J. F. S. P, and A. Szarewski. 2012. New 419 technologies and procedures for cervical cancer screening. Vaccine 30 Suppl 5:F107‐420 16. 421
8. de Sanjose, S., W. G. Quint, L. Alemany, D. T. Geraets, J. E. Klaustermeier, B. Lloveras, 422 S. Tous, A. Felix, L. E. Bravo, H. R. Shin, C. S. Vallejos, P. A. de Ruiz, M. A. Lima, N. 423 Guimera, O. Clavero, M. Alejo, A. Llombart‐Bosch, C. Cheng‐Yang, S. A. Tatti, E. 424 Kasamatsu, E. Iljazovic, M. Odida, R. Prado, M. Seoud, M. Grce, A. Usubutun, A. Jain, 425 G. A. Suarez, L. E. Lombardi, A. Banjo, C. Menendez, E. J. Domingo, J. Velasco, A. 426 Nessa, S. C. Chichareon, Y. L. Qiao, E. Lerma, S. M. Garland, T. Sasagawa, A. Ferrera, 427 D. Hammouda, L. Mariani, A. Pelayo, I. Steiner, E. Oliva, C. J. Meijer, W. F. Al‐Jassar, 428 E. Cruz, T. C. Wright, A. Puras, C. L. Llave, M. Tzardi, T. Agorastos, V. Garcia‐Barriola, 429 C. Clavel, J. Ordi, M. Andujar, X. Castellsague, G. I. Sanchez, A. M. Nowakowski, J. 430 Bornstein, N. Munoz, and F. X. Bosch. 2010. Human papillomavirus genotype 431 attribution in invasive cervical cancer: a retrospective cross‐sectional worldwide study. 432 Lancet Oncol 11:1048‐1056. 433
9. Departament de Salut. Direcció General de Planificació i Avaluació. Generalitat de 434 Catalunya. 2007. Protocol de les Activitats per al Cribratge del Càncer de Coll Uterí a 435 l’Atenció Primària., Barcelona. 436
10. Gravitt, P. E., M. Schiffman, D. Solomon, C. M. Wheeler, and P. E. Castle. 2008. A 437 comparison of linear array and hybrid capture 2 for detection of carcinogenic human 438 papillomavirus and cervical precancer in ASCUS‐LSIL triage study. Cancer Epidemiol 439 Biomarkers Prev 17:1248‐54. 440
11. Hesselink, A. T., N. W. Bulkmans, J. Berkhof, A. T. Lorincz, C. J. Meijer, and P. J. 441 Snijders. 2006. Cross‐sectional comparison of an automated hybrid capture 2 assay 442 and the consensus GP5+/6+ PCR method in a population‐based cervical screening 443 program. J Clin Microbiol 44:3680‐5. 444
on June 20, 2020 by guesthttp://jcm
.asm.org/
Dow
nloaded from
20
12. IARC. 2012. IARC Monographs on the Evaluation of Carcinogenic Risks to Humans. 445 Volume 100B: A review on human carcinogens. International Agency for Research on 446 Cancer., Lyon, France. 447
13. IARC. 2005. IARC Working Group on the Evaluation of Cancer‐Preventive Strategies. In 448 I. Press (ed.), Cervix Cancer Screening, vol. 10, Lyon, France. 449
14. Landis, J. R., and G. G. Koch. 1977. The measurement of observer agreement for 450 categorical data. Biometrics 33:159‐74. 451
15. Meijer, C. J., H. Berkhof, D. A. Heideman, A. T. Hesselink, and P. J. Snijders. 2009. 452 Validation of high‐risk HPV tests for primary cervical screening. J Clin Virol 46 Suppl 453 3:S1‐4. 454
16. Muñoz, N., F. X. Bosch, S. de Sanjosé, R. Herrero, X. Castellsague, K. V. Shah, P. J. F. 455 Snijders, and C. J. L. M. Meijer. 2003. Epidemiologic classification of human 456 papillomavirus types associated with cervical cancer. N Engl J Med 348:518‐527. 457
17. Naucler, P., W. Ryd, S. Tornberg, A. Strand, G. Wadell, K. Elfgren, T. Radberg, B. 458 Strander, B. Johansson, O. Forslund, B. G. Hansson, E. Rylander, and J. Dillner. 2007. 459 Human papillomavirus and Papanicolaou tests to screen for cervical cancer. N Engl J 460 Med 357:1589‐97. 461
18. Poljak, M., J. Cuzick, B. J. Kocjan, T. Iftner, J. Dillner, and M. Arbyn. 2012. Nucleic acid 462 tests for the detection of alpha human papillomaviruses. Vaccine 30 Suppl 5:F100‐6. 463
19. Rijkaart, D. C., J. Berkhof, L. Rozendaal, F. J. van Kemenade, N. W. Bulkmans, D. A. 464 Heideman, G. G. Kenter, J. Cuzick, P. J. Snijders, and C. J. Meijer. 2012. Human 465 papillomavirus testing for the detection of high‐grade cervical intraepithelial neoplasia 466 and cancer: final results of the POBASCAM randomised controlled trial. Lancet Oncol 467 13:78‐88. 468
20. Ronco, G., J. Dillner, K. M. Elfstrom, S. Tunesi, P. J. Snijders, M. Arbyn, H. Kitchener, 469 N. Segnan, C. Gilham, P. Giorgi‐Rossi, J. Berkhof, J. Peto, and C. J. Meijer. 2013. 470 Efficacy of HPV‐based screening for prevention of invasive cervical cancer: follow‐up of 471 four European randomised controlled trials. Lancet. 472
21. Ronco, G., P. Giorgi‐Rossi, F. Carozzi, M. Confortini, P. Dalla Palma, A. Del Mistro, B. 473 Ghiringhello, S. Girlando, A. Gillio‐Tos, L. De Marco, C. Naldoni, P. Pierotti, R. Rizzolo, 474 P. Schincaglia, M. Zorzi, M. Zappa, N. Segnan, and J. Cuzick. 2010. Efficacy of human 475 papillomavirus testing for the detection of invasive cervical cancers and cervical 476 intraepithelial neoplasia: a randomised controlled trial. Lancet Oncol 11:249‐57. 477
22. Sankaranarayanan, R., L. Gaffikin, M. Jacob, J. Sellors, and S. Robles. 2005. A critical 478 assessment of screening methods for cervical neoplasia. Int J Gynaecol Obstet 89 479 Suppl 2:S4‐S12. 480
23. Tukey, J. W. 1977. Exploratory data analysis. Addison‐Wesley, Reading. 481 24. Walboomers, J. M. M., M. V. Jacobs, M. M. Manos, F. X. Bosch, J. A. Kummer, K. V. 482
Shah, P. J. F. Snijders, C. J. L. M. Meifer, and N. Munoz. 1999. Human papillomavirus is 483 a necessary cause of invasive cervical cancer worldwide. J Pathol 189:12‐19. 484
485 486
487
on June 20, 2020 by guesthttp://jcm
.asm.org/
Dow
nloaded from
21
Figure legends 488
489
Figure 1. Scatterplot of the RLU values for the results of the HC2 test at the origin laboratory 490
and for the paired proficiency test. Results are expressed as logarithmic values of relative 491
luminiscence units (RLU) reads. The correlation coefficient for linear regression was 0.95 (p‐492
value<0.05). 493
Figure 2. Scatterplot of relative luminiscence units (RLU) ratios between original and 494
proficiency test as a function of time elapsed between the two tests. Results are expressed as 495
logarithmic values of the ratio of RLU and time in days. The correlation coefficient obtained for 496
linear regression was 0.003 (p‐value=0.219), and therefore no loss in signal can be attributed 497
to the time elapsed between consecutive tests. 498
Figure 3. Distribution of number of clinically discrepant results between the original 499
laboratory and the Proficiency test (circles) and fit to a Poisson distribution (grey line). 500
Accumulated probability for the detection of the corresponding number of discrepancies are 501
shown above each point. By bootstrapping with replacement of the all samples, 1,000 “virtual 502
labs” were generated and the number of discrepancies was calculated. With the same 503
stratifies structure of samples used in this study (RLU<0.5; 0.5≤RLU<1; 1≤RLU<10; and 504
RLU≥10), two different scenarios were tested: (A) 20 samples, five from each interval; and (B) 505
40 samples, ten from each interval. 506
Figure 4. Expected percentage of discrepancies along the RLU intervals 0.5‐0.99 (A) and 1‐507
9.99 (B). Mean percentage of discrepancies were obtained by bootstrapping. Grey area 508
encompasses 95% confidence interval. RLU: Relative luminiscence units. 509
510
511
on June 20, 2020 by guesthttp://jcm
.asm.org/
Dow
nloaded from
Table 1. Data from the general population compared to data from the reference HPV laboratories used for the HPV proficiency testing
CATEGORIES OF RESULTS
SCREENING GENERAL POPULATION *
N (%)
PROFICIENCY TESTING** N (%)
HPV negative 36,600 (78) 452 (47.8)
HPV positive 10,349 (22) 494 (52.2)
Total 46,949 (100) 946 (100)
RLU<0.5 34,580 (73.7) 253 (26.7)
0.5≤RLU<1 2,020 (4.3) 199 (21)
1 ≤RLU<10 2,563 (5.5) 246 (26)
RLU≥10 7,786 (16.6) 248 (26.2)
* Data obtained from the HPV test performed to the screening general population in the period 2007-2009.
** Data obtained from the reference HPV laboratories for the HPV proficiency testing during the period 2008-2011.
on June 20, 2020 by guesthttp://jcm
.asm.org/
Dow
nloaded from
Table 2: Comparison of tested laboratories and HPV Proficiency testing Laboratory paired Hybrid Capture 2 (HC2) Test Results
Proficiency testing HC2
results
HC2 results of origin Total Analysis of agreement
NEGATIVE POSITIVE NEGATIVE 433 25 458 Kappa 0.91 POSITIVE 19 469 488 p-value 0.00 Total 452 494 946
RLU<0.5 0.5 ≤RLU<1 1 ≤RLU<10 RLU≥10 Total Analysis of agreement
RLU<0.5 243 81 11 1 336
0.5≤RLU<1 10 99 13 0 122 Kappa 0.79
1 ≤RLU<10 0 19 213 7 239 p-value 0.00 RLU≥10 0 0 9 240 249 Total 253 199 246 248 946
on June 20, 2020 by guesthttp://jcm
.asm.org/
Dow
nloaded from
Table 3. Comparison between the percentage of discrepancies between results at origin and results after proficiency testing, as a function of the relative luminescence units (RLU). The categories used in the current proficiency testing scheme and those proposed for the improved proficiency testing are shown.
Categories of samples Mean probability of discrepancies, % (95% CI) Proficiency testing
RLU/CO <0.5 0 (0) 0.5 ≤RLU/CO <1 9.59 (6.02-13.58) 1 ≤RLU/CO <10 9.87 (6.50-13.82) RLU/CO ≥10 0.41 (0-1.21)
Proposed proficiency testing RLU/CO <0.5 0 (0) 0.5 ≤RLU/CO <1 9.59 (6.02-13.58) 1 ≤RLU/CO <5 12.56 (7.64-18.47) RLU/CO ≥5 1.18 (0.30-2.87)
on June 20, 2020 by guesthttp://jcm
.asm.org/
Dow
nloaded from