
  • CROPS AND SOILS RESEARCH PAPER

Statistical procedures for testing hypotheses of equivalence in the safety evaluation of a genetically modified crop

    Q. KANG1 AND C. I. VAHL2*

1 Independent Statistical Consultant, Manhattan, Kansas 66503, USA; 2 Department of Statistics, Kansas State University, Manhattan, Kansas 66506, USA

(Received 6 July 2015; revised 9 November 2015; accepted 15 December 2015; first published online 22 January 2016)

    SUMMARY

Safety evaluation of a genetically modified crop entails assessing its equivalence to conventional crops under multi-site randomized block field designs. Despite mounting petitions for regulatory approval, there is as yet no scientifically sound and powerful statistical method for establishing equivalence. The current paper develops and validates two procedures for testing a recently identified class of equivalence uniquely suited to crop safety. One procedure employs the modified large sample (MLS) method; the other is based on generalized pivotal quantities (GPQs). Because both methods were originally created under balanced designs, common issues associated with incomplete and unbalanced field designs were addressed by first identifying unfulfilled theoretical assumptions and then replacing them with user-friendly approximations. Simulation indicated that the MLS procedure could be very conservative on many occasions irrespective of the balance of the design; the GPQ procedure was mildly liberal, with its type I error rate near the nominal level when the design is balanced. Additional pros and cons of these two procedures are also discussed. Their utility is demonstrated in a case study using summary statistics derived from a real-world dataset.

    INTRODUCTION

Safety evaluation of a genetically modified (GM) crop entails establishing its substantial equivalence to conventional non-GM food crops with respect to key agronomic, phenotypic and compositional endpoints (OECD 1993; Codex Alimentarius Commission 2009). The European Food Safety Authority (EFSA) mandates a comparative analysis containing two types of statistical tests (EFSA 2010). For each endpoint, a difference test is used to detect the potential change in the genetically modified organism (GMO) from its non-GM counterpart (control), whereas an equivalence test is used to assess the GMO's similarity to commercial non-GM reference varieties with a history of safe use. Studies on GMO safety include multiple sites; within each site, a GMO is planted together with its control and several non-GM reference varieties according to a randomized

complete block design. Features of the design are determined by the arrangement of references to sites. Table 1 reports designs implemented in recent petitions by ag-biotech producers to the US Department of Agriculture (USDA): Design I is complete and balanced; Design II is incomplete but balanced; Designs III–VI are incomplete and, following the descriptions in Section 1.2 of Searle (1987), are 'planned unbalanced'.

Compositional data usually undergoes a natural logarithm transformation before the mixed model analysis. Let $n_R$ be the total number of references in the study. Let $n_S$ and $n_{B(S)}$ be the number of sites and the number of blocks per site. Denote $Y_{ijl}$ as the log-transformed response with respect to a given endpoint from genotype $i$ in block $l$ at site $j$, $i = 1, \ldots, n_R + 2$, $j = 1, \ldots, n_S$, $l = 1, \ldots, n_{B(S)}$. Note that $Y_{ijl}$ is observed only when genotype $i$ is present at site $j$; otherwise, it is unavailable. The EFSA (2010) and Van der Voet et al. (2011) conducted the comparative analysis under the following linear mixed model which accommodates

* To whom all correspondence should be addressed. Email: vahl@ksu.edu

Journal of Agricultural Science (2016), 154, 1392–1412. © Cambridge University Press 2016. doi:10.1017/S0021859615001367

    http:/www.cambridge.org/core/terms. http://dx.doi.org/10.1017/S0021859615001367Downloaded from http:/www.cambridge.org/core. Kansas State University Libraries, on 28 Oct 2016 at 20:04:55, subject to the Cambridge Core terms of use, available at


the effect of the $n_R + 2$ genotypes in a hierarchical structure with both fixed and random levels.

$$Y_{ijl} = \begin{cases} \mu_R + S_j + B(S)_{l(j)} + R_i + E_{ijl}, & i = 1, \ldots, n_R \\ \mu_T + S_j + B(S)_{l(j)} + E_{ijl}, & i = n_R + 1 \\ \mu_C + S_j + B(S)_{l(j)} + E_{ijl}, & i = n_R + 2 \end{cases} \qquad (1)$$

Fixed effects of the three genotype groups, namely reference, GMO (test product) and control, are represented, respectively, by $\mu_R$, $\mu_T$ and $\mu_C$. Because individual producers select their own assortment of reference varieties for each study, the deviation of the mean of reference genotype $i$ from its group mean $\mu_R$ is modelled by the random effect $R_i$, which is normally distributed with mean zero and variance $\sigma_R^2$, i.e. $R_i \sim N(0, \sigma_R^2)$. Random effects of site and block nested within site are represented, respectively, by $S_j \sim N(0, \sigma_S^2)$ and $B(S)_{l(j)} \sim N(0, \sigma_{B(S)}^2)$. The error term $E_{ijl} \sim N(0, \sigma_E^2)$ combines random effects due to plot, measurement uncertainty and other sources of unexplained variation. All random terms are jointly independent. The EFSA (2010) described the comparative analysis as 'mainly concerned with studying the average difference and the average equivalence over sites' and chose to 'set a lower and upper equivalence limit based on the variability observed among the commercial varieties'. Its current guidance tests the average equivalence with data-driven equivalence limits (ELs), i.e.

$$H_0: \mu_T - \mu_R \le -\hat{P}_L \ \text{or}\ \mu_T - \mu_R \ge \hat{P}_L \quad \text{v.} \quad H_1: -\hat{P}_L < \mu_T - \mu_R < \hat{P}_L \qquad (2)$$

where $(-\hat{P}_L, \hat{P}_L)$ corresponds to a version of the 95% prediction interval of $R_i$ estimated concurrently under Model (1). Upon treating these two prediction limits as known 2·5th and 97·5th percentiles of $N(0, \sigma_R^2)$, the traditional two one-sided tests (Schuirmann 1987) were employed. Unfortunately, judging $\mu_T - \mu_R$ against estimated ELs is fraught with problems, foremost of which is that statistical hypotheses should always be defined in terms of model parameters rather than their estimators; otherwise, the type I error rate is intractable. In addition, $\hat{P}_L$ decreases as sample sizes increase. This unfairly imposes smaller and, hence, more stringent ELs on studies with larger sample sizes. A thorough discussion on the pitfalls of Hypotheses (2) can be found in Appendix 3 of Vahl & Kang (2015).
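To make Model (1) concrete, the following sketch simulates one complete balanced dataset under it. All group means and variance components below are hypothetical values chosen for illustration, not estimates from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical balanced setting: 6 references, 8 sites, 4 blocks per site
# (the geometry of Design I); all means and SDs are illustrative assumptions.
n_R, n_S, n_B = 6, 8, 4
mu_R, mu_T, mu_C = 10.0, 10.2, 10.1          # group means on the log scale
s_R, s_S, s_B, s_E = 0.15, 0.30, 0.10, 0.20  # SDs of R_i, S_j, B(S)_l(j), E_ijl

R = rng.normal(0.0, s_R, n_R)         # reference genotype effects
S = rng.normal(0.0, s_S, n_S)         # site effects
B = rng.normal(0.0, s_B, (n_S, n_B))  # block effects nested within sites

rows = []  # (genotype i, site j, block l, response Y_ijl)
for j in range(n_S):
    for l in range(n_B):
        base = S[j] + B[j, l]
        for i in range(n_R):  # references: mean mu_R + R_i
            rows.append((i, j, l, mu_R + R[i] + base + rng.normal(0.0, s_E)))
        rows.append((n_R, j, l, mu_T + base + rng.normal(0.0, s_E)))      # GMO
        rows.append((n_R + 1, j, l, mu_C + base + rng.normal(0.0, s_E)))  # control
print(len(rows))  # (n_R + 2) * n_S * n_B = 256 plots, as in Design I of Table 1
```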

The average equivalence specified by EFSA (2010) was expressed explicitly in terms of model parameters by Kang & Vahl (2014) as

$$H_0: \mu_T - \mu_R \le -z_{0.975}\sigma_R \ \text{or}\ \mu_T - \mu_R \ge z_{0.975}\sigma_R \quad \text{v.} \quad H_1: -z_{0.975}\sigma_R < \mu_T - \mu_R < z_{0.975}\sigma_R \qquad (3)$$

where $z_{0.975}$ is the 97·5th percentile of a standard normal distribution (note: $z_{0.025} = -z_{0.975}$). Hypotheses (3) assess the mean difference of the test and reference products based on $\sigma_R^2$, the variability of the super-distribution of reference genotype means. Vahl & Kang (2015) referred to it as addressing 'super-equivalence'. Tuning the EL according to $\sigma_R$ is reasonable when $\sigma_R^2$ dominates the reference natural variation. For endpoints with a small observed value of $\sigma_R^2$, EFSA (2010) recognized that their $\hat{P}_L$s lack biological relevance and excluded these endpoints from equivalence testing. With similar concern, Vahl & Kang (2015) identified 'conditional equivalence' as a reasonable alternative to super-equivalence. This new class of equivalence compares measurements from field plots planted with the GMO to those with references according to their conditional distributions at the site and block level. Depending on one's preference, conditional equivalence can be defined in two forms: the conditional scaled average equivalence (SAE-C) and the conditional distribution-wise equivalence (DWE-C). A unified expression for hypotheses of conditional equivalence is

$$H_0: \mu_T - \mu_R \le -\sqrt{z_{0.975}^2\sigma_R^2 + c\sigma_E^2} \ \text{or}\ \mu_T - \mu_R \ge \sqrt{z_{0.975}^2\sigma_R^2 + c\sigma_E^2} \quad \text{v.} \quad H_1: -\sqrt{z_{0.975}^2\sigma_R^2 + c\sigma_E^2} < \mu_T - \mu_R < \sqrt{z_{0.975}^2\sigma_R^2 + c\sigma_E^2} \qquad (4)$$

Setting $c = z_{0.975}^2$ gives rise to SAE-C, which requires the GMO mean to fall within the 2·5th and 97·5th percentiles of the reference distribution at the site and block level. Testing SAE-C stems from the routinely implemented strategy for statistical quality control where the test product is required to provide values falling within a certain percentage of the reference distribution. It also harmonizes with the current assessment of highly variable drugs via its rescaled average bioequivalence at the individual subject level (Davit et al. 2012). Setting $c = z_{0.975}^2 - 1$ originates from the well-established statistical principle of controlling the


information divergence between conditional distributions (for its connection to individual bioequivalence see Dragalin et al. 2003). The corresponding notion of DWE-C is more stringent than SAE-C but maintains its meaning for any monotonically transformed data. This is particularly appealing for assessing GMO agronomic-phenotypic characteristics. However, DWE-C has been deemed too mathematical, at least for drug studies (see Vahl & Kang 2015 for more discussion on this issue). In spite of the conceptual connection between GMO conditional equivalence and drug bioequivalence, it must be noted that drug bioavailability studies differ remarkably from GMO safety studies with respect to purpose, experimental design and model. In vivo drug bioavailability studies intend to assess the difference between a test drug and a single reference product via a crossover design. Their statistical analyses employ typical mixed models where both levels of the treatment effect are fixed. Safety studies for GMOs include an assortment of non-GM commercial reference varieties to assess the GMO in the context of natural variation. Data collected in multi-site randomized block field trials are usually unbalanced and incomplete, by design or execution. Model (1) is atypical because genotype contains both fixed and random levels. In contrast to the US Food and Drug Administration's recent recommendation of SAE-C as a usable bioequivalence criterion (FDA 2014), there has been no regulatory clarification on the explicit definition of GMO equivalence. It is therefore prudent to create statistical testing procedures that establish both SAE-C and DWE-C.
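The practical difference between the two choices of $c$ can be seen numerically. The sketch below computes the half-width of the equivalence region in Hypotheses (4) for hypothetical variance components (`var_R` and `var_E` are illustrative assumptions, not values from the paper).

```python
from scipy.stats import norm

z = norm.ppf(0.975)  # z_{0.975}, about 1.96

# Hypothetical variance components on the log scale (illustrative only)
var_R, var_E = 0.04, 0.02

def equivalence_limit(c):
    # Half-width of the equivalence region in Hypotheses (4)
    return (z ** 2 * var_R + c * var_E) ** 0.5

el_sae = equivalence_limit(z ** 2)        # SAE-C: c = z^2
el_dwe = equivalence_limit(z ** 2 - 1.0)  # DWE-C: c = z^2 - 1
print(round(el_sae, 3), round(el_dwe, 3))  # 0.48 0.459
```

As expected, the DWE-C limit is the tighter of the two, consistent with DWE-C being the more stringent criterion.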

Statistical methodologies for inference on functions of means and variances have been investigated mostly within the context of testing bioequivalence in drug studies and estimating tolerance intervals for monitoring purposes. Two methods have been commonly employed there: the modified-large-sample (MLS) approximation (Howe 1974; Graybill & Wang 1980; Hyslop et al. 2000; Krishnamoorthy & Lian 2012) and a method via generalized pivotal quantities (GPQs) (Tsui & Weerahandi 1989; Weerahandi 1993; McNally et al. 2003; Krishnamoorthy & Mathew 2004; Liao et al. 2005; Hannig et al. 2006; Chiu et al. 2010). Both the MLS and GPQ methods call for exact distributions of independent summary statistics, which are established results of linear model theory under balanced designs. As stated on p. 141 of Krishnamoorthy & Mathew (2009), 'for general mixed effects or random effect models with unbalanced data, a unified methodology is currently not available'. Krishnamoorthy & Mathew (2004) as well as Krishnamoorthy & Lian (2012) conquered the problem of unbalanced data in a one-way random effect model by building moments around unweighted means; however, their approach is unfeasible in Model (1), a three-way mixed model where the site effect crosses with the reference effect, due to missing combinations of site and reference (Section 4.6 of Searle 1987). In addition, the model under clinical crossover designs contains correlated heterogeneous variance components, which are issues specific to drug bioequivalence studies. Clearly, the unique characteristics of GMO testing require that these statistical methods be derived from the ground up. Because the distribution of a GPQ generally depends on nuisance parameters, it is crucial to validate the GPQ method when venturing into a new application. This also pertains to the MLS approximation. To date, only Hypotheses (3) have received careful attention, and this was accomplished exclusively via the GPQ method (Kang & Vahl 2014).

The purpose of the present work is to develop and validate statistical procedures for testing Hypotheses (4) under Model (1). Special effort is made to tackle the incompleteness and unbalanced nature of field designs. The approach here is novel in that it builds moments with tractable distributions under Model (1) based on the method of Khuri et al. (1998). The practical appeal is manifest in the convenience of obtaining the required summary statistics as output by standard statistical software rather than from custom programs carrying out tedious algebraic calculations. The Materials and Methods section introduces summary statistics especially useful for estimating parameters in Model (1). Their statistical properties, including distributions and dependencies, are established under a general setting. Accordingly, the present paper creates the statistical procedures for testing Hypotheses (4) via both the MLS and GPQ methods. The Results section investigates these newly proposed procedures via two sets of simulation studies. First, the small-sample behaviours of the MLS and GPQ procedures are inspected over an array of idealistic situations where the design is complete and balanced. Second, their performance is examined under the practical designs listed in Table 1 with and without data randomly missing. The utility of these two procedures is also illustrated in a case study using summary statistics derived from a real-world data set. The last section of this paper makes additional comments on GMO equivalence testing.


  • MATERIALS AND METHODS

    Properties of summary statistics

Because studies on GMO safety include a limited number of reference varieties, variance components tend to be overestimated by the restricted maximum likelihood (REML) method (Endrenyi et al. 2000; Park & Burdick 2003). To avoid estimation bias, the method of moments was used in the current paper to estimate variances. Table 2 provides the analysis of variance (ANOVA) shell for complete and balanced designs under Model (1). Constructing the ANOVA tables and deriving weighted means under other designs involved tedious algebraic calculation. Nonetheless, these computed results were easily accessible via routine procedures of standard statistical software such as PROC MIXED in SAS (SAS Institute Inc. 2011).

Khuri et al. (1998, Section 4.3) extended the method of moments to general two-way random effect models. Their approach offers a viable solution for inferring variance components from unbalanced GMO field designs where not all combinations of site and references are present. Let $\sigma^2$ be the vector of variance components in Model (1). The mean square for each random effect corresponds to a quadratic form whose expectation is a linear combination of elements in $\sigma^2$. Distributions of these mean squares were obtained directly from the application of classical mixed model theory. In particular, the mean square error $MS_E$ has an expected value of $\sigma_E^2$, i.e. $E(MS_E) = \sigma_E^2$, and follows a scaled $\chi^2$ distribution with degrees of freedom (D.F.) $df_E$, i.e.

$$\frac{df_E}{E(MS_E)} MS_E \equiv U_E \sim \chi^2(df_E) \qquad (5)$$

The value of $df_E$ is the rank of the quadratic matrix for $MS_E$. The mean square for the random effect of reference genotypes, denoted by $MS_R$, has an expected value of $E(MS_R) = a\sigma_R^2 + \sigma_E^2$. The coefficient $a > 0$ is determined entirely by the design matrix. It attains the simple formulation $a = n_S n_{B(S)}$ for complete and balanced designs (see Table 2). By construction, $MS_R$ is independent of $MS_E$. The quadratic matrix of $MS_R$ is of rank $(n_R - 1)$. The distribution of $MS_R$ is, in general, represented by a weighted sum of $(n_R - 1)$ independent $\chi^2(1)$ random variables. When reference varieties are allocated to sites in a balanced fashion, as in Designs I and II, the weights all equal one so

Table 1. Field designs implemented for GMO safety evaluation (information available online from http://www.aphis.usda.gov/biotechnology/petitions_table_pending.shtml (verified 3 December 2015))

| Design | I | II | III | IV | V | VI |
|---|---|---|---|---|---|---|
| USDA petition number | 12-215-01p | 11-234-01p | 13-262-01p | 12-185-01p | 11-202-01p | 13-290-01p |
| Number of references | 6 | 6 | 6 | 9 | 16 | 20 |
| Number of sites | 8 | 10 | 8 | 8 | 8 | 8 |
| Number of blocks per site | 4 | 4 | 4 | 4 | 4 | 4 |
| Number of references per site | 6 | 3 | 2, 3 | 3, 4 | 3 | 3, 4 |
| Site 1 allocation | R1∼R6 | R1 R3 R5 | R1 R4 R5 | R1 R2 R3 R4 | R1 R2 R3 | R1 R2 R3 R4 |
| Site 2 allocation | R1∼R6 | R2 R3 R4 | R1 R2 R6 | R1 R5 R6 R7 | R4 R5 R6 | R1 R2 R5 R6 |
| Site 3 allocation | R1∼R6 | R1 R4 R5 | R1 R3 R6 | R1 R2 R5 R7 | R4 R5 R6 | R2 R7 R8 R9 |
| Site 4 allocation | R1∼R6 | R3 R4 R6 | R2 R4 R6 | R1 R2 R6 R9 | R7 R8 R9 | R4 R10 R11 R12 |
| Site 5 allocation | R1∼R6 | R1 R2 R6 | R2 R3 R4 | R3 R4 R7 R8 | R7 R8 R9 | R5 R6 R7 R15 |
| Site 6 allocation | R1∼R6 | R2 R4 R5 | R3 R5 R6 | R3 R6 R8 R9 | R10 R11 R12 | R4 R13 R14 |
| Site 7 allocation | R1∼R6 | R3 R5 R6 | R2 R3 R5 | R4 R5 R7 R8 | R1 R5 R13 | R8 R16 R17 |
| Site 8 allocation | R1∼R6 | R1 R2 R3 | R5 R6 | R6 R7 R8 | R14 R15 R16 | R18 R19 R20 |
| Site 9 allocation | – | R1 R4 R6 | – | – | – | – |
| Site 10 allocation | – | R2 R5 R6 | – | – | – | – |
| Total sample size (with no missing data) | 256 | 200 | 156 | 188 | 160 | 180 |

GMO, genetically modified organism; USDA, US Department of Agriculture.


that the distribution of $MS_R$ simplifies to a scaled $\chi^2(n_R - 1)$ distribution. For unbalanced Designs III–VI, the distribution of $MS_R$ was approximated by

$$\frac{df_R}{E(MS_R)} MS_R \equiv U_R \mathrel{\dot{\sim}} \chi^2(df_R) \qquad (6)$$

with $df_R = n_R - 1$. Subsequent simulation studies indicated that equivalence tests based on this approximation perform reasonably well.
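For a complete balanced design, the expected-mean-square identities above invert into closed-form method-of-moments estimators: $\hat\sigma_E^2 = ms_E$ and $\hat\sigma_R^2 = (ms_R - ms_E)/a$ with $a = n_S n_{B(S)}$. A minimal sketch, with hypothetical observed mean squares:

```python
# Method-of-moments variance estimates under a complete balanced design,
# using E(MS_E) = sigma_E^2 and E(MS_R) = a * sigma_R^2 + sigma_E^2
# with a = n_S * n_B(S). The mean squares ms_R, ms_E are hypothetical values.
n_S, n_B = 8, 4
a = n_S * n_B
ms_R, ms_E = 1.5, 0.2

sigma2_E_hat = ms_E
sigma2_R_hat = max((ms_R - ms_E) / a, 0.0)  # truncated at zero if negative
print(sigma2_E_hat, round(sigma2_R_hat, 6))  # 0.2 0.040625
```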

Denote the least square mean for $\mu_T - \mu_R$ by $\bar{D}$. By construction, $\bar{D}$ is independent of $MS_R$ and $MS_E$. It is well known that

$$\frac{\bar{D} - (\mu_T - \mu_R)}{\sqrt{\mathrm{Var}(\bar{D})}} \equiv Z \sim N(0, 1) \qquad (7)$$

The variance of $\bar{D}$ depends on the design matrix and $\sigma^2$. Plug the point estimator of $\sigma^2$ into $\mathrm{Var}(\bar{D})$. The resulting estimator, $\widehat{\mathrm{Var}}(\bar{D})$, is independent of $\bar{D}$ and approximately follows a scaled $\chi^2$ distribution. That is,

$$\frac{df_D(\sigma^2)}{\mathrm{Var}(\bar{D})} \widehat{\mathrm{Var}}(\bar{D}) \mathrel{\dot{\sim}} \chi^2(df_D(\sigma^2)) \qquad (8)$$

where $df_D(\sigma^2)$, as a function of $\sigma^2$, is the Satterthwaite approximation to the D.F. Also note that $\mathrm{Var}(\bar{D})$ could be expressed as a linear combination of $E(MS_R)$ and $E(MS_E)$ with linear coefficient functions $h_1(\sigma^2)$ and $h_2(\sigma^2)$, i.e. $\mathrm{Var}(\bar{D}) = h_1(\sigma^2)E(MS_R) + h_2(\sigma^2)E(MS_E)$. In balanced designs containing $v$ references per site, $h_1(\sigma^2)$ and $h_2(\sigma^2)$ are constants with values $h_1(\sigma^2) = (a n_R)^{-1}$ and $h_2(\sigma^2) = (n_S n_{B(S)})^{-1} + (v n_S n_{B(S)})^{-1} - (a n_R)^{-1}$. The distribution of $\bar{D}$ under these designs was then parameterized by $E(MS_R)$ and $E(MS_E)$ instead of $\sigma^2$.

    Modified large sample method

The MLS method was originally developed to approximate linear combinations of variances (Howe 1974; Graybill & Wang 1980). Hyslop et al. (2000) extended the MLS method to assess drug bioequivalence by restating its hypotheses in terms of a linear combination of squared mean differences and variances. The present work adopted their strategy and rejected $H_0$ of Hypotheses (4) at the $\alpha$ significance level if the $100(1-\alpha)\%$ MLS upper confidence limit (CL) for $(\mu_T - \mu_R)^2 - (z_{0.975}^2\sigma_R^2 + c\sigma_E^2)$ was less than zero. In order to utilize the sampling distributions of the summary statistics, the following reparameterization was introduced:

$$(\mu_T - \mu_R)^2 - (z_{0.975}^2\sigma_R^2 + c\sigma_E^2) = (\mu_T - \mu_R)^2 - \frac{1}{a} z_{0.975}^2 E(MS_R) - \left(c - \frac{1}{a} z_{0.975}^2\right) E(MS_E)$$

Suppose that $\bar{d}$, $ms_R$ and $ms_E$ are the respective observed values of $\bar{D}$, $MS_R$ and $MS_E$. Let $PE_k$ and $UL_k$, $k = D, R, E$, be the point estimate and $100(1-\alpha)\%$ upper CL for $(\mu_T - \mu_R)^2$, $-(1/a) z_{0.975}^2 E(MS_R)$ and $-(c - z_{0.975}^2/a) E(MS_E)$, respectively. Properties (5)–(8) suggest that

$$\begin{aligned} PE_D &= \bar{d}^{\,2}, & UL_D &= \left\{|\bar{d}| + t_{1-\alpha}(df_D(\hat\sigma^2))\sqrt{\widehat{\mathrm{Var}}(\bar{D})}\right\}^2 \\ PE_R &= -\frac{1}{a} z_{0.975}^2\, ms_R, & UL_R &= -\frac{1}{a} z_{0.975}^2\, \frac{df_R\, ms_R}{\chi^2_{1-\alpha}(df_R)} \\ PE_E &= -\left(c - \frac{1}{a} z_{0.975}^2\right) ms_E, & UL_E &= -\left(c - \frac{1}{a} z_{0.975}^2\right) \frac{df_E\, ms_E}{\chi^2_{1-\alpha}(df_E)} \end{aligned}$$

where $t_{1-\alpha}(n)$ and $\chi^2_{1-\alpha}(n)$ are the respective $100(1-\alpha)$th percentiles of the $t$ and $\chi^2$ distributions with $n$ D.F., and $\hat\sigma^2$ is the estimated variance vector. The MLS method

Table 2. ANOVA table for random effects in Model (1) under complete balanced designs

| Source | Degrees of freedom | Mean square | Expected mean square | Observed mean square |
|---|---|---|---|---|
| Site | $n_S - 1$ | $MS_S$ | $\sigma_E^2 + (n_R + 2)\sigma_{B(S)}^2 + (n_R + 2)n_{B(S)}\sigma_S^2$ | $ms_S$ |
| Block (site) | $(n_{B(S)} - 1)n_S$ | $MS_{B(S)}$ | $\sigma_E^2 + (n_R + 2)\sigma_{B(S)}^2$ | $ms_{B(S)}$ |
| Reference | $n_R - 1$ | $MS_R$ | $\sigma_E^2 + n_S n_{B(S)}\sigma_R^2$ | $ms_R$ |
| Error | $(n_S n_{B(S)} - 1)(n_R + 1)$ | $MS_E$ | $\sigma_E^2$ | $ms_E$ |

ANOVA, analysis of variance.


approximated the CL under balanced designs by restricting it to be exact when only one parameter is unknown. Plugging in the $100(1-\alpha)\%$ CL for each of its independent elements, the $100(1-\alpha)\%$ MLS upper CL for $(\mu_T - \mu_R)^2 - (z_{0.975}^2\sigma_R^2 + c\sigma_E^2)$ is

$$\sum_{k=D,R,E} PE_k + \sqrt{\sum_{k=D,R,E} (UL_k - PE_k)^2} \qquad (9)$$

Numerical integration and simulation under several mixed models indicated that the overall confidence level of two-sided and upper MLS intervals was greater than $100(1-\alpha)\%$ (Graybill & Wang 1980; Hyslop et al. 2000). This conservatism has led to its wide application in the regulatory assessment of generic drugs (Quiroz et al. 2002; Lee et al. 2004; Chiu et al. 2010). Ting et al. (1990) added cross-products of mean squares so that the MLS CLs are exact for additional restrictions. Simulations under balanced designs indicated that adding cross-product terms had a trivial effect on adjusting the conservatism in GMO testing (results not shown). Therefore, Formula (9) was used to represent the MLS method in subsequent evaluation.
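The MLS procedure can be sketched end to end from summary statistics: the function below assembles the point estimates $PE_k$ and upper limits $UL_k$ given above into the upper CL of Formula (9). All numeric inputs in the example call (`d_bar`, `ms_R`, `var_d_hat`, `df_D`, and so on) are hypothetical values loosely mimicking the geometry of Design I, not data from the paper.

```python
import numpy as np
from scipy.stats import norm, t, chi2

z2 = norm.ppf(0.975) ** 2  # z_{0.975}^2

def mls_upper_cl(d_bar, ms_R, ms_E, df_R, df_E, a, var_d_hat, df_D, c, alpha=0.05):
    """100(1-alpha)% MLS upper CL for (mu_T - mu_R)^2 - (z^2 sigma_R^2 + c sigma_E^2),
    assembled from the point estimates PE_k and upper limits UL_k via Formula (9)."""
    pe_D = d_bar ** 2
    ul_D = (abs(d_bar) + t.ppf(1 - alpha, df_D) * np.sqrt(var_d_hat)) ** 2
    pe_R = -(z2 / a) * ms_R
    ul_R = -(z2 / a) * df_R * ms_R / chi2.ppf(1 - alpha, df_R)
    pe_E = -(c - z2 / a) * ms_E
    ul_E = -(c - z2 / a) * df_E * ms_E / chi2.ppf(1 - alpha, df_E)
    pe_total = pe_D + pe_R + pe_E
    return pe_total + np.sqrt((ul_D - pe_D) ** 2 + (ul_R - pe_R) ** 2
                              + (ul_E - pe_E) ** 2)

# Hypothetical summary statistics; df_R, df_E and a follow Design I's geometry
# (n_R = 6, n_S = 8, n_B(S) = 4, so a = 32, df_R = 5, df_E = 31 * 7 = 217).
u = mls_upper_cl(d_bar=0.05, ms_R=1.5, ms_E=0.2, df_R=5, df_E=217,
                 a=32, var_d_hat=0.008, df_D=5, c=z2)
print(u < 0)  # True -> reject H0 of Hypotheses (4), conclude equivalence
```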

    Generalized pivotal quantity method

The GPQ method tackles complex parametric problems by extending the traditional definition of a pivotal quantity (Weerahandi 1993). A GPQ function for the univariate parameter of interest $\theta$ contains three arguments: $y$ is the observed data; $Y$ is a random copy of $y$; $(\theta, \varsigma)$ is the vector containing $\theta$ and the nuisance parameter $\varsigma$. Hannig et al. (2006) imposed two requirements for the function $Q(y, Y, (\theta, \varsigma))$ to be the GPQ of $\theta$: (1) the distribution of $Q(y, Y, (\theta, \varsigma))$ for any given $y$ is free of $(\theta, \varsigma)$; (2) $Q(y, y, (\theta, \varsigma)) = \theta$. Under the fiducial inference framework, these requirements enlist $\theta - Q(y, Y, (\theta, \varsigma))$ as the generalized test variable for testing $H_0: \theta \le \theta_0$ v. $H_1: \theta > \theta_0$, with $\Pr(Q(y, Y, (\theta, \varsigma)) \le \theta \mid \theta = \theta_0)$ being the generalized $P$ value (GPV). The percentiles of $Q(y, Y, (\theta, \varsigma))$ provide the generalized CLs of $\theta$. Without loss of clarity, the arguments inside the GPQ function are dropped henceforward.

Suppose that the design is balanced. The following functions are the exact GPQs for $\mu_T - \mu_R \pm \sqrt{z_{0.975}^2\sigma_R^2 + c\sigma_E^2}$, which relate directly to Hypotheses (4).

$$\begin{aligned} Q_\pm &= \bar{d} + \frac{\mu_T - \mu_R - \bar{D}}{\sqrt{h_1(\sigma^2)(\sigma_E^2 + a\sigma_R^2) + h_2(\sigma^2)\sigma_E^2}} \sqrt{h_1(\sigma^2)\frac{ms_R}{MS_R}(\sigma_E^2 + a\sigma_R^2) + h_2(\sigma^2)\frac{ms_E}{MS_E}\sigma_E^2} \\ &\quad \pm \sqrt{z_{0.975}^2 \frac{1}{a}\left\{\frac{ms_R}{MS_R}(\sigma_E^2 + a\sigma_R^2) - \frac{ms_E}{MS_E}\sigma_E^2\right\} + c\,\frac{ms_E}{MS_E}\sigma_E^2} \\ &= \bar{d} + Z\sqrt{h_1(\sigma^2)\frac{df_R\, ms_R}{U_R} + h_2(\sigma^2)\frac{df_E\, ms_E}{U_E}} \pm \sqrt{z_{0.975}^2 \frac{1}{a} \frac{df_R\, ms_R}{U_R} + \left(c - \frac{z_{0.975}^2}{a}\right)\frac{df_E\, ms_E}{U_E}} \end{aligned} \qquad (10)$$

Note that the first two terms in Formula (10) are the GPQ of $\mu_T - \mu_R$ and the last term is the GPQ of $\pm\sqrt{z_{0.975}^2\sigma_R^2 + c\sigma_E^2}$. Similar to the MLS method, the parameter of interest for Hypotheses (4) could be $(\mu_T - \mu_R)^2 - (z_{0.975}^2\sigma_R^2 + c\sigma_E^2)$, whose exact GPQ is

$$Q = \left\{\bar{d} + Z\sqrt{h_1(\sigma^2)\frac{df_R\, ms_R}{U_R} + h_2(\sigma^2)\frac{df_E\, ms_E}{U_E}}\right\}^2 - z_{0.975}^2 \frac{1}{a} \frac{df_R\, ms_R}{U_R} - \left(c - \frac{z_{0.975}^2}{a}\right)\frac{df_E\, ms_E}{U_E} \qquad (11)$$

Properties of $\bar{D}$, $MS_R$ and $MS_E$ described earlier indicated that $Z$, $U_R$ and $U_E$ are independent random variables with exact distributions free of model parameters; $h_1(\sigma^2)$ and $h_2(\sigma^2)$ are known constants for balanced designs. Upon observing $\bar{d}$, $ms_R$ and $ms_E$, the distributions of the GPQs listed in Formulas (10) and (11) were obtained by the following resampling algorithm.

Step 1. Independently sample $z$, $u_R$ and $u_E$ from $N(0, 1)$, $\chi^2(df_R)$ and $\chi^2(df_E)$.

    Statistical testing procedures for genetically modified organism safety evaluation 1397

    http:/www.cambridge.org/core/terms. http://dx.doi.org/10.1017/S0021859615001367Downloaded from http:/www.cambridge.org/core. Kansas State University Libraries, on 28 Oct 2016 at 20:04:55, subject to the Cambridge Core terms of use, available at


Step 2. Replace Z, UR and UE in Formulas (10) and (11) with z, uR and uE. This yields q+, q− and q.

Step 3. Accumulate values of q+, q− and q by repeating Steps 1–2 a large number of times, perhaps 5000. This generates the resampling distributions of Q+, Q− and Q.

The inequalities of q+ ⩽ 0 or q− ⩾ 0 coincide with the inequality of q ⩾ 0. As a result, the two GPQ functions for testing Hypotheses (4) yield identical GPVs given below.

$$
\Pr(Q_+ \le 0 \ \text{or}\ Q_- \ge 0) = \Pr(Q \ge 0)
\approx \frac{\text{number of resamples with } q_+ \le 0 \ \text{or}\ q_- \ge 0}{\text{total number of resamples}}
= \frac{\text{number of resamples with } q \ge 0}{\text{total number of resamples}}
$$
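The resampling algorithm and the GPV above can be sketched in code as follows. The function name, arguments and defaults are illustrative assumptions; the radicand follows Formulas (10) and (11), with c = z²0·975 for SAE-C or z²0·975 − 1 for DWE-C.

```python
import numpy as np

def gpq_gpv(d_bar, ms_r, ms_e, df_r, df_e, a, h1, h2, c,
            n_resamples=5000, seed=12345):
    """Monte Carlo approximation of the generalized P value (GPV) for
    Hypotheses (4), following Steps 1-3 (a sketch; names illustrative).
    h1, h2 are the design constants, a is the coefficient of sigma_R^2
    in E(MSR), and c sets the equivalence criterion."""
    z975_sq = 1.959963985 ** 2                    # z_{0.975}^2
    rng = np.random.default_rng(seed)
    # Step 1: independent draws z ~ N(0,1), u_R ~ chi2(df_R), u_E ~ chi2(df_E)
    z = rng.standard_normal(n_resamples)
    u_r = rng.chisquare(df_r, n_resamples)
    u_e = rng.chisquare(df_e, n_resamples)
    # Step 2: substitute the draws into Formulas (10) and (11)
    g_r = df_r * ms_r / u_r        # GPQ for E(MSR) = sigma_E^2 + a*sigma_R^2
    g_e = df_e * ms_e / u_e        # GPQ for E(MSE) = sigma_E^2
    mean_part = d_bar + z * np.sqrt(h1 * g_r + h2 * g_e)
    limit_sq = z975_sq / a * g_r + (c - z975_sq / a) * g_e
    q = mean_part ** 2 - limit_sq                 # Formula (11)
    # Step 3 / GPV: proportion of resamples with q >= 0; equivalence is
    # concluded when the GPV falls below the significance level alpha
    return float(np.mean(q >= 0.0))
```

For instance, with d̄ = 0 and Var(D̄) far below the equivalence limit the GPV is essentially zero (equivalence concluded), whereas a very large d̄ drives it towards one.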

Equivalence was concluded if the GPV was less than the pre-specified significance level α. Testing Hypotheses (4) could also be carried out by assessing percentiles of the resampling distributions of Q+, Q− and Q. These percentiles are referred to as generalized CLs. Adopting the approach of two one-sided tests, H0 was rejected when the 100αth percentile for Q+ was greater than zero and the 100(1 − α)th percentile for Q− was less than zero. Alternatively, H0 could be rejected when the 100(1 − α)th percentile for Q was smaller than zero. The resampling algorithm for Q, however, does not produce a straightforward representation of the rejection region for d̄ or d̄² at a given set of msR and msE values. The formulation of Q+ and Q− is advantageous in this regard and may be more informative to practitioners. To see this, let w+α be the 100αth percentile for Q+ − d̄. Note that −w+α is the 100(1 − α)th percentile for Q− − d̄. Equivalence was concluded when −w+α + d̄ < 0 and w+α + d̄ > 0. Because w+α depends on msR and msE only, the rejection region for d̄ at given msR and msE could be effortlessly expressed as −w+α < d̄ < w+α.

Nonetheless, the rejection rates under these designs characterize the idealistic scenario for GMO equivalence testing. Figures 1 and 2 summarize the simulation results for

    1398 Q. Kang and C. I. Vahl


testing SAE-C and DWE-C, i.e. Hypotheses (4) at c = z²0·975 and c = z²0·975 − 1. Clearly the pattern for testing SAE-C was similar to that of DWE-C. Consistent with previous experience reported in the literature, the MLS procedure concluded equivalence less often than desired. Its type I error rate deviated from the 0·05 nominal level to less than 0·02 at nR = 6 and ρ = 0·9. This conservatism attenuated as ρ decreased or as nR increased. The GPQ procedure, on the other hand, rejected H0 of Hypotheses (4) with either nearly nominal or slightly liberal type I error rates. When nR = 6 and ρ = 0·1, its type I error rate was as high as 0·06. This anti-conservatism dissipated as either nR or ρ increased. The power of both procedures dropped as ρ ascended. It increased as nR became larger. This is associated with the fact that inferring σ²R from MSR possesses a greater extent of uncertainty than inferring σ²E from MSE since dfE ≫ dfR. Consistent with their disparity in type I error rates, the GPQ procedure exceeded the MLS procedure in power.

Next, equivalence testing was investigated under the practical designs listed in Table 1. Rejection rates were initially simulated without the occurrence of missing data. Theoretical assumptions for the MLS and GPQ methods were satisfied under Designs I and II. For Designs III–VI, the distribution of MSR in Formulas (9) and (10) was approximated by χ²(dfR); values of h1(σ²) and h2(σ²) were estimated via the algorithm described previously. It was observed that MSR and MSE regressed closely to V̂ar(D̄), with over 99% of adjusted coefficients of determination exceeding 0·98. This good fit supports the strategy of simplifying the dependence of Var(D̄) on E(MSR) and E(MSE) via a linear regression. Performance of the two procedures for testing SAE-C and DWE-C is presented in Figs 3 and 4, respectively. Again, similar patterns were observed when testing SAE-C and DWE-C. For each test, the MLS procedure was overly conservative unless nR was large or ρ was close to zero. The GPQ procedure was moderately liberal. It gave high type I
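The linear-regression simplification can be sketched as a no-intercept least-squares fit of the simulated V̂ar(D̄*) values on the simulated mean squares; the function name and interface below are illustrative, not the paper's exact estimation algorithm.

```python
import numpy as np

def estimate_h_constants(var_dbar_sim, ms_r_sim, ms_e_sim):
    """Estimate h1(sigma^2) and h2(sigma^2) by regressing simulated
    Var-hat(D-bar) values on the simulated mean squares ms_R* and ms_E*,
    with no intercept (a sketch of the approximation strategy)."""
    X = np.column_stack([ms_r_sim, ms_e_sim])     # regressors: ms_R*, ms_E*
    coef, *_ = np.linalg.lstsq(X, np.asarray(var_dbar_sim, float), rcond=None)
    h1, h2 = coef
    return float(h1), float(h2)
```

When Var(D̄) is exactly linear in the two expected mean squares, the fit recovers the coefficients without error, mirroring the near-unity coefficients of determination reported above.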

Fig. 1. Empirical rejection rates for testing conditional scaled average equivalence under complete balanced designs. Type I error rate was estimated at (μT − μR)² = z²0·975(σ²R + σ²E). Power was estimated at (μT − μR)² = 0·75 z²0·975(σ²R + σ²E). Dashed horizontal lines represent the 0·05 nominal level and the levels of 0·05 ± 0·01. ○, nR = 6; ●, nR = 9; □, nR = 16; ■, nR = 20; Δ, nR = 36. GPQ, generalized pivotal quantities; MLS, modified large sample method.


error rates when ρ was small. Designs III–VI in Figs 3 and 4 relate to designs in Figs 1 and 2 by having the same numbers of references, specifically nR = 6, 9, 16, 20. At the same nR, the GPQ procedure tended to be more liberal under unbalanced designs than balanced designs. Corresponding to their disparity in type I error rates, power of the MLS procedure was considerably lower than that of the GPQ procedure.

To further mimic a real-world scenario, a random missing rate of 3% was imposed in the simulated data sets. Depending on the design, there was an average of five to eight samples missing per study. Greater rates of missing data would not be tolerated in practice and were therefore omitted. The occurrence of missing data caused σ²S and σ²B(S) to appear in Var(D̄), which in theory affected the behaviour of the MLS and GPQ procedures. At a 3% missing rate, though, the simulation results were unlikely to be sensitive to the values of σ²S and σ²B(S). Experience with crop composition data indicates that either σ²S, σ²R or, occasionally, σ²E could be the dominating source of natural variation (Vahl & Kang 2015). Several realistic scenarios were considered here by setting σ²S = 4, σ²B(S) = 0·25, σ²E = 1, and taking σ²R ≈ 0·11, 0·43, 1·0, 2·3, 9·0 in accordance with the various levels of ρ considered. Simulation results in Figs 3 and 4 indicate that a small proportion of missing data had trivial impact on the performance of the two procedures.
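The σ²R scenario values quoted above follow directly from ρ = σ²R/(σ²R + σ²E) with σ²E = 1; a one-line check (the helper name is hypothetical):

```python
def sigma_r2_for_rho(rho, sigma_e2=1.0):
    """sigma_R^2 implied by rho = sigma_R^2 / (sigma_R^2 + sigma_E^2)."""
    return rho / (1.0 - rho) * sigma_e2

# rho = 0.1, 0.3, 0.5, 0.7, 0.9 give roughly 0.11, 0.43, 1.0, 2.3, 9.0
```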

It is noteworthy that nR, rather than the total sample size, played a crucial role in controlling the power of equivalence tests. Designs I and II collected the greatest numbers of samples but their power never ranked at the top. The total numbers of samples for Designs III and V are similar. Regarding the GPQ procedure, Design V (nR = 16) always carried better power than Design III (nR = 6) while their type I error rates were comparable. Without missing data, Design IV has eight more samples than Design VI. Rejection rates

Fig. 2. Empirical rejection rates for testing conditional distribution-wise equivalence under complete balanced designs. Type I error rate was estimated at (μT − μR)² = z²0·975σ²R + (z²0·975 − 1)σ²E. Power was estimated at (μT − μR)² = 0·75{z²0·975σ²R + (z²0·975 − 1)σ²E}. Dashed horizontal lines represent the 0·05 nominal level and the levels of 0·05 ± 0·01. ○, nR = 6; ●, nR = 9; □, nR = 16; ■, nR = 20; Δ, nR = 36. GPQ, generalized pivotal quantities; MLS, modified large sample method.


Fig. 3. Empirical rejection rates for testing conditional scaled average equivalence under various field designs (a) without missing data and (b) with 3% missing data. Type I error rate was estimated at (μT − μR)² = z²0·975(σ²R + σ²E). Power was estimated at (μT − μR)² = 0·75 z²0·975(σ²R + σ²E). Dashed horizontal lines represent the 0·05 nominal level and the levels of 0·05 ± 0·01. ○, Design I (nR = 6); ◊, Design II (nR = 6); ♦, Design III (nR = 6); ●, Design IV (nR = 9); □, Design V (nR = 16); ■, Design VI (nR = 20). GPQ, generalized pivotal quantities; MLS, modified large sample method.


Fig. 4. Empirical rejection rates for testing conditional distribution-wise equivalence under various field designs (a) without missing data and (b) with 3% missing data. Type I error rate was estimated at (μT − μR)² = z²0·975σ²R + (z²0·975 − 1)σ²E. Power was estimated at (μT − μR)² = 0·75{z²0·975σ²R + (z²0·975 − 1)σ²E}. Dashed horizontal lines represent the 0·05 nominal level and the levels of 0·05 ± 0·01. ○, Design I (nR = 6); ◊, Design II (nR = 6); ♦, Design III (nR = 6); ●, Design IV (nR = 9); □, Design V (nR = 16); ■, Design VI (nR = 20). GPQ, generalized pivotal quantities; MLS, modified large sample method.


for testing SAE-C under these two designs were plotted against (μT − μR)²/{z²0·975σ²R + z²0·975σ²E} at ρ = 0·3 and 0·7 in Fig. 5. Design VI (nR = 20) outperformed Design IV (nR = 9) with respect to both the MLS and GPQ procedures. This superiority was more prominent at ρ = 0·7 than at ρ = 0·3. Moreover, the two power curves for Design VI differed less than those for Design IV, suggesting the reconciliation of the MLS and GPQ procedures for large nR.

    Case study

Real-world data sets used in GMO safety evaluation are publicly unavailable for proprietary reasons. As far as is known, the illustrative example by EFSA (2010) and Van der Voet et al. (2011) is the only open document detailing the summary statistics under Model (1) for each compositional endpoint. Their analysis data set came from a study on a GM maize with nR = 13, nS = 4, nB(S) = 3 and four references per site. Protocol deviation resulted in a total of 67 samples, with 14, 17, 18 and 18 samples collected from the four sites. This data set, in spite of its small sample size, still holds great merit in illustrating GMO equivalence tests. Regarding the testing procedures for Hypotheses (3) and (4), summary statistics including exp(d̄), dfD(σ̂²) and √V̂ar(D̄) are directly available in Tables 4 and 5 of EFSA (2010) as well as Table 2 of Van der Voet et al. (2011). The study design and missing data pattern suggest dfR = 12, dfE = 41 and a ≈ 2·97. The observed mean squares for reference genotype and error could then be deduced according to msR = aσ̂²R + σ̂²E and msE = σ̂²E, where σ̂²R and σ̂²E are estimated variances of reference genotype and error given by Table 2 of Van der Voet et al. (2011). It is noted that σ̂²R and σ̂²E were produced by the REML method. Although the method of moments is preferred, REML estimates are still useful for comparing testing procedures given the absence of the original data. In this example, ρ̂ = σ̂²R/(σ̂²R + σ̂²E) was greater than 0·5 in 43 out of 53 analytes. For several analytes whose ρ̂ was less than 0·3, regressing V̂ar(D̄*) on ms*R and ms*E from ten simulated data sets yielded unstable estimates of h1(σ²) and h2(σ²), with the adjusted coefficient of determination less than 0·9. The size of the original data set, which is less than half of EFSA's requirement, seems to be at fault here. Boosting the number of simulated data sets from ten to 50 appeared to tame the instability. The Appendix provides the SAS code for testing Hypotheses (3) and (4) from summary statistics.
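Deducing the observed mean squares from the published REML estimates amounts to inverting the expected-mean-square relations msR = aσ̂²R + σ̂²E and msE = σ̂²E; a minimal sketch (helper name and the numeric values in the comment are hypothetical):

```python
def mean_squares_from_reml(sigma_r2_hat, sigma_e2_hat, a):
    """Back out the observed mean squares from REML variance estimates:
    ms_R = a * sigma_R^2-hat + sigma_E^2-hat and ms_E = sigma_E^2-hat."""
    ms_r = a * sigma_r2_hat + sigma_e2_hat
    ms_e = sigma_e2_hat
    return ms_r, ms_e

# e.g. with a ~ 2.97 as in the case study (variance values hypothetical):
# mean_squares_from_reml(0.5, 1.0, 2.97) -> approximately (2.485, 1.0)
```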

The testing procedure mandated by EFSA (2010) concluded equivalence, as defined in Hypotheses (2), for 44 of the 53 analytes. In comparison, testing Hypotheses (3) concluded equivalence for 33 analytes; testing Hypotheses (4) concluded SAE-C for 47 analytes and DWE-C for 45 analytes. Table 3 lists 14 analytes where different conclusions on equivalence occurred. Recall that the mixed model was fitted on log-transformed data. The magnitude of the difference between GMO and references is routinely assessed in terms of the ratio of their geometric means, i.e. exp(d̄). Table 3 reports the rejection regions of exp(d̄) for these tests at given σ̂²R and σ̂²E. Note that the MLS and GPQ procedures for testing Hypotheses (4) led to identical conclusions in this example data set; but the P values and rejection regions pertaining to Hypotheses (4) are available only from the GPQ procedure.
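The GPQ rejection regions of this kind derive from the percentile w+α of the resampled Q+ − d̄ distribution described earlier; a sketch of the construction, with illustrative names and defaults (exponentiating the endpoints gives the region on the geometric-mean-ratio scale):

```python
import numpy as np

def gpq_rejection_region(ms_r, ms_e, df_r, df_e, a, h1, h2, c,
                         alpha=0.05, n_resamples=5000, seed=12345):
    """Rejection region (-w, w) for the observed d-bar at given ms_R and
    ms_E, where w is the 100*alpha-th percentile of Q+ minus d-bar
    (a sketch following Formula (10); names are illustrative).
    Equivalence is concluded when -w < d-bar < w."""
    z975_sq = 1.959963985 ** 2
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_resamples)
    g_r = df_r * ms_r / rng.chisquare(df_r, n_resamples)
    g_e = df_e * ms_e / rng.chisquare(df_e, n_resamples)
    # Q+ - d-bar depends on ms_R and ms_E only, not on d-bar
    q_plus_shifted = (z * np.sqrt(h1 * g_r + h2 * g_e)
                      + np.sqrt(z975_sq / a * g_r + (c - z975_sq / a) * g_e))
    w = float(np.quantile(q_plus_shifted, alpha))
    return -w, w
```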

Fig. 5. Power curves for testing conditional scaled average equivalence at (a) ρ = 0·3 and (b) ρ = 0·7. The dashed horizontal line represents the 0·05 nominal level. ○, GPQ method and Design IV; ●, MLS method and Design IV; □, GPQ method and Design VI; ■, MLS method and Design VI. GPQ, generalized pivotal quantities; MLS, modified large sample method.


Table 3. Analytes where conclusions differ among testing procedures. Columns: analyte; estimate exp(d̄); σ̂²R; σ̂²E; ρ̂; rejection region for Hypotheses (2); rejection region and P value for Hypotheses (3); and, under Hypotheses (4), rejection region and P value for SAE-C and for DWE-C. [Only the first row is recoverable here: acid dietary fibre, exp(d̄) = 1·1026, σ̂²R = 0·0071, σ̂²E = 0·027063, ρ̂ = 0·21, Hypotheses (2) rejection region (0·838, 1·193), Hypotheses (3) rejection region ∅ with P value 0·251, SAE-C rejection region (0·769, 1·301); the remaining entries are not reproduced.]

Analytes listed in Table 3 all had non-trivial σ̂²E, and their values of exp(d̄) ranged from 0·80 to 1·31, which were fairly close to one. In fact, tests of SAE-C were significant for all 14 and tests of DWE-C were significant for 12 analytes. Vahl & Kang (2015) pointed out that the H1 parameter space of super-equivalence is nested within the H1 parameter space of DWE-C, which in turn is nested within that of SAE-C. Coherent with this ordering, all 33 analytes significant for Hypotheses (3) were significant for SAE-C and DWE-C; all six (eight) analytes non-significant for SAE-C (DWE-C) remained non-significant for Hypotheses (3). The super-equivalence defined by Hypotheses (3) was so stringent that five analytes with the lowest ρ̂ values had the empty set as their rejection regions. Tests of Hypotheses (2) asserted equivalence more often than tests of Hypotheses (3), which is consistent with the findings of Kang & Vahl (2014). Testing Hypotheses (2) and (4) disagreed on only three analytes for SAE-C and two analytes for DWE-C. This may give the false impression that the data-driven ELs are indeed acceptable. However, the apparent success of Hypotheses (2) is an artefact resulting from the insufficient number of sites. Note that the data example contains half of EFSA's minimum requirement of eight sites. Formula (12) and Fig. 3 of Kang & Vahl (2014) depict the unfortunate effect that the EL of Hypotheses (2) decreases as the sample size increases. Producers adhering to EFSA's guidance will then be punished by the shrinking EL. In contrast, they will be rewarded with more significance by using well-defined ELs. Another advantage of testing Hypotheses (4) is revealed by examining the two analytes with σ̂²R ≈ 0, i.e. ash and phytic acid. The statistical analysis mandated by EFSA excluded them from equivalence testing and declared 'no conclusion on equivalence'. However, σ̂²R ≈ 0 implies σ²R is small relative to σ²E rather than being absolutely zero. The inability to test equivalence with σ̂²R ≈ 0 is certainly an undesirable feature. The tests of SAE-C and DWE-C were both significant for ash, while only the test of SAE-C was significant for phytic acid. The shortcoming of EFSA's procedure can be further highlighted by considering two extreme cases at σ̂²R ≈ 0. In case one, the GMO geometric mean is identical to the reference geometric mean. In case two, it is many fold larger, say a thousand. EFSA's procedure for testing equivalence would declare 'no conclusion on equivalence' for both cases. The proposed procedures would declare equivalence for case one and retain the null of Hypotheses (4) for case two, which satisfies common sense.

    DISCUSSION

The present paper focused on statistical procedures for evaluating the conditional equivalence of a GMO to references. Both the MLS and GPQ methods were considered. In many aspects, the strength of one method attends to the weakness of the other. The MLS CLs were obtained effortlessly via a simple formula. Practitioners accustomed to traditional frequentist types of inference might comprehend it more easily. The GPQ method operates under the framework of fiducial inference. Its execution requires more technical competence, since a resampling algorithm is needed to generate the distribution of its GPQ. Nonetheless, the convenience of the MLS method is achieved by re-expressing the original hypotheses in terms of a linear combination of expected mean squares. The resulting CLs lack meaningful interpretation. In contrast, the GPQ testing procedure is flexible with the parameterization of complex hypotheses and produces rejection regions that help practitioners visualize abstract statistical results. An additional benefit of the GPQ method comes from its ability to generate P values, which are unavailable from the MLS method. Given that a P value quantifies the strength of evidence on a unit scale and substantial equivalence is established by a weight of evidence, the GPQ method is especially useful for assessing a GMO across a vast number of endpoints. So far, the conservative nature of the MLS method has been viewed as a plus for regulating drugs. The simulations of McNally et al. (2003) and Chiu et al. (2010) show that the MLS method did not suffer a severe power loss over the range of parameter values commonly encountered in bioequivalence studies. The disparity between the MLS and GPQ methods in their statistical properties was more distinctive for testing GMO equivalence. Simulation, tailored toward the unique setting of GMO safety evaluation, showed that the MLS method could be extremely conservative while the GPQ method was mildly liberal. The majority of compositional endpoints had small σ²E relative to σ²R. In these cases, the GPQ method is preferred because of its near-nominal type I error rate and superior power. For those few endpoints where σ²R was observed to be


much smaller than σ²E, the MLS method had a tight control over the type I error rate and is therefore recommended in such cases. Simulation results also indicate that the two methods had good power under practical designs; power of the GPQ method always exceeded that of the MLS method; their difference may not be practically noticeable for large samples or when σ²R is relatively large compared with σ²E. The case study of real-world data further demonstrated the utility of the two proposed procedures as compared with the procedure currently mandated by EFSA.
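For reference, the MLS family of limits descends from the Graybill & Wang (1980) construction for nonnegative linear combinations of expected mean squares. The generic sketch below illustrates the style of the "simple formula" behind MLS CLs; it is not the paper's exact Formula (9), and all names are illustrative.

```python
import numpy as np
from scipy.stats import chi2

def mls_upper_limit(coefs, mean_squares, dfs, alpha=0.05):
    """Approximate 100(1 - alpha)% upper confidence limit for
    sum_i c_i * E(MS_i) with all c_i >= 0, in the Graybill-Wang style
    that underlies MLS intervals (a generic sketch)."""
    coefs = np.asarray(coefs, dtype=float)
    ms = np.asarray(mean_squares, dtype=float)
    dfs = np.asarray(dfs, dtype=float)
    point = float(np.sum(coefs * ms))
    # G_i = df_i / chi2_{alpha; df_i} - 1 widens each term towards the limit
    g = dfs / chi2.ppf(alpha, dfs) - 1.0
    return point + float(np.sqrt(np.sum((g * coefs * ms) ** 2)))
```

With a single mean square, the formula collapses to the exact chi-squared upper limit c·ms·df/χ²(α; df), which is a useful sanity check on any implementation.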

The theory supporting the MLS and GPQ methods requires independent summary statistics with identifiable distributions, which are attainable only in idealistic situations. Real-world studies bear incomplete and unbalanced arrangements (as in multi-site field designs), intricate correlations (as in clinical crossover designs), and/or incidental missing data. These practical issues make it difficult to construct summary statistics that satisfy the exact requirements. Simulation studies under one-way random effect models demonstrate that the MLS and GPQ methods were quite lenient towards moderate violations of theoretical assumptions with respect to parameter estimation (Park & Burdick 2003; Krishnamoorthy & Mathew 2004; Krishnamoorthy & Lian 2012). Bioequivalence has been successfully tested via MLS and GPQ methods that ignored some of the theoretical requirements (Hyslop et al. 2000; McNally et al. 2003). The mild impact of an unbalanced design matrix was also seen in the empirical type I error rates of these two methods under Model (1). Nonetheless, producers are encouraged to pursue the balanced design described by John & Mitchell (1977) and to include adequate replicates of random effects so as to enhance power. Missing data should be minimized, as they are often indicative of poor study conduct. Performance of equivalence tests should also be routinely monitored through simulation.

In the past decade, there have been intense debates over the proper steps to ascertain GMO substantial equivalence to conventional food crops (Hothorn & Oberdoerfer 2006; Herman et al. 2010; Van der Voet et al. 2011; Kang & Vahl 2014; Vahl & Kang 2015). A viable solution is best obtained via integrating biological reasoning with statistical justification. Conditional equivalence compares GMO and references based on their conditional distributions. In contrast, super-equivalence judges the mean of the GMO against the super-distribution of reference genotype means. Super-equivalence may at first appear to be the natural choice for assessing a GMO. However, its biological relevance is lost when σ²R does not dominate crop natural variation. Conditional equivalence circumvents this issue by also incorporating the variability due to plot, analytical error, etc. This approach resonates with the concept of site-specific equivalence advocated by EFSA (2010). Equivalence testing facilitates hazard identification with statistical rigour. It adheres to the strategy of König et al. (2004) and serves as the first step in GMO risk assessment. The resulting non-significant endpoints become candidates for hazard characterization, which involves a thorough exposure assessment and possible lab animal testing. A recent explanatory statement of EFSA (2014) elaborates that the assessment of GMO compositional and agronomic-phenotypic characteristics serves as the basis for determining the design, conduct and interpretation of their mandated 90-day rodent study. In the real-world data example, 20 endpoints failed the test for super-equivalence, among which 12 or 14 were significant in testing conditional equivalence. Note that this discrepancy occurred mostly at endpoints with non-trivial σ²E, and the corresponding GMO-reference ratios of geometric means were all fairly close to one. Testing for conditional equivalence appears to avoid excessive investigation of endpoints where the deviation of the GMO from references is unlikely to be of biological importance. For these reasons, conditional equivalence is more suitable for GMO safety evaluation than super-equivalence. The two forms of conditional equivalence both have statistical and biological merits: SAE-C may be favoured by some because of its lenient requirement and simple, non-technical interpretation, whereas DWE-C may be preferred by others because its ELs assign a lighter weight to the error term variance shared by the GMO and reference distributions. The ultimate call should be made through an open dialogue among stakeholders with clear interests (the ag-biotech industry and regulatory authorities) as well as researchers with independent, scientific opinions.

A controversial topic in GMO safety evaluation revolves around the usage of Model (1). Genotype × site interaction, also known as G × E with E standing for environment, is an indispensable source of variation in analysing field efficacy of experimental crops. Ward et al. (2012) criticized Model (1) for its omission of G × E terms. In the latest public comments


to EFSA's draft guidance on GMOs, one organization described G × E as a known important effect on crop composition (EFSA 2015a). This comment cited the article by Whent et al. (2009), who reported that G × E constituted over half of the variation in some compositional endpoints. EFSA (2015b) responded by mandating an analysis of G × E via a separate fixed effect model in the case of 'significant differences [from the control] and/or lack of equivalence [to the references]'. A closer look at the analysis conducted by Whent et al. (2009) reveals that percentages of the total mean square for genotype, site and G × E were used to measure their respective contributions to variation under a fixed effect model. Recall that the expectation of each mean square equals the error term variance plus a quadratic form quantifying fixed effects. Small percentages of genotype and site in Whent et al. (2009) always coincided with large error term variances. The observed large contribution from G × E, although significantly non-zero, became less relevant after taking the experimental design and the error term variance into account. Therefore, these results point to the importance of the error term, rather than G × E. A proper way to study sources of variation is to conduct a variance component analysis under the random and/or mixed effect model. It is interesting to note that in several variance component analyses of GM crops, the effect of G × E was negligible in comparison with the magnitude of variation due to site, reference genotype and the error term (Harrigan et al. 2013; Harrison et al. 2013; Venkatesh et al. 2014). These findings provide empirical justification that Model (1) is indeed adequate for assessing GMO safety with respect to compositional endpoints. The influence of G × E on agronomic-phenotypic endpoints can only be clarified through proper statistical analyses on an abundance of real-world data. The proposed MLS and GPQ methods, in addition to their usage in equivalence testing, serve as valuable tools for characterizing crop natural variation. A compelling GMO safety evaluation requires solid biological exploration accompanied by sound statistical justification. It is hoped that the present work further advances the progress of a science-based decision-making process.
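As a minimal illustration of the kind of variance component analysis recommended above, consider method-of-moments estimation in the balanced one-way random-effects model (a deliberately simpler setting than the paper's Model (1); names are illustrative):

```python
def one_way_variance_components(ms_between, ms_within, n_per_group):
    """Method-of-moments variance components for a balanced one-way
    random-effects model, using E(MSB) = sigma_w^2 + n * sigma_b^2 and
    E(MSW) = sigma_w^2; the between-component is truncated at zero."""
    sigma_w2 = ms_within
    sigma_b2 = max((ms_between - ms_within) / n_per_group, 0.0)
    return sigma_b2, sigma_w2
```

Comparing the resulting components (e.g. for site, reference genotype, G × E and error) is what reveals which source dominates natural variation.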

    REFERENCES

CHIU, S. T., TSAI, P. Y. & LIU, J. P. (2010). Statistical evaluation of non-profile analyses for the in vitro bioequivalence. Journal of Chemometrics 24, 617–625.

Codex Alimentarius Commission (2009). Foods Derived from Modern Biotechnology, 2nd edn. Rome: Joint FAO/WHO Food Standards Programme.

DAVIT, B. M., CHEN, M. L., CONNER, D. P., HAIDAR, S. H., KIM, S., LEE, C. H., LIONBERGER, R. A., MAKLOUF, F. T., NWAKAMA, P. E., PATEL, D. T., SCHUIRMANN, D. J. & YU, L. X. (2012). Implementation of a reference-scaled average bioequivalence approach for highly variable generic drug products by the US Food and Drug Administration. The AAPS Journal 14, 915–924.

DRAGALIN, V., FEDOROV, V., PATTERSON, S. & JONES, B. (2003). Kullback-Leibler divergence for evaluating bioequivalence. Statistics in Medicine 22, 913–930.

EFSA (2010). Scientific opinion on statistical considerations for the safety evaluation of GMOs. EFSA Panel on GMOs. EFSA Journal 8(1), 1250. doi: 10.2903/j.efsa.2010.1250.

EFSA (2014). Explanatory statement for the applicability of the Guidance of the EFSA Scientific Committee on conducting repeated-dose 90-day oral toxicity study in rodents on whole food/feed for GMO risk assessment. EFSA Journal 12(10), 3871. doi: 10.2903/j.efsa.2014.3871.

EFSA (2015a). Outcome of the Public Consultation on the Draft Guidance on the Agronomic and Phenotypic Characterization of Genetically Modified Plants. EFSA supporting publication 2015: EN-829. Parma, Italy: EFSA.

EFSA (2015b). Guidance on the agronomic and phenotypic characterization of genetically modified plants. EFSA Journal 13(6), 4128. doi: 10.2903/j.efsa.2015.4128.

ENDRENYI, L., TABACK, N. & TOTHFALUSI, L. (2000). Properties of the estimated variance component for subject-by-formulation interaction in studies of individual bioequivalence. Statistics in Medicine 19, 2867–2878.

FDA (2014). Guidance for Industry: Bioavailability and Bioequivalence Studies Submitted in NDAs or INDs – General Considerations. Draft Guidance. Rockville, MD: Food and Drug Administration, Center for Drug Evaluation and Research (CDER). Available from: http://www.fda.gov/downloads/drugs/guidancecomplianceregulatoryinformation/guidances/ucm389370.pdf (verified 3 December 2015).

GRAYBILL, F. A. & WANG, C. M. (1980). Confidence intervals on nonnegative linear combinations of variances. Journal of the American Statistical Association 75, 869–873.

HANNIG, J., IYER, H. & PATTERSON, P. (2006). Fiducial generalized confidence intervals. Journal of the American Statistical Association 101, 254–269.

HARRIGAN, G. G., CULLER, A. H., CULLER, M., BREEZE, M. L., BERMAN, K. H., HALLS, S. C. & HARRISON, J. M. (2013). Investigation of biochemical diversity in a soybean lineage representing 35 years of breeding. Journal of Agricultural and Food Chemistry 61, 10807–10815.

HARRISON, J. M., HOWARD, D., MALVEN, M., HALLS, S. C., CULLER, A. H., HARRIGAN, G. G. & WOLFINGER, R. D. (2013). Principal variance component analysis of crop composition data: a case study on herbicide-tolerant cotton. Journal of Agricultural and Food Chemistry 61, 6412–6422.

HERMAN, R. A., SCHERER, P. N., PHILLIPS, A. M., STORER, N. P. & KRIEGER, M. (2010). Safe composition levels of transgenic


  • crops assessed via a clinical medicine model.Biotechnology Journal 5, 172–182.

HOTHORN, L. A. & OBERDOERFER, R. (2006). Statistical analysis used in the nutritional assessment of novel food using the proof of safety. Regulatory Toxicology and Pharmacology 44, 125–135.

HOWE, W. G. (1974). Approximate confidence limits on the mean of X + Y where X and Y are two tabled independent random variables. Journal of the American Statistical Association 69, 789–794.

HYSLOP, T., HSUAN, F. & HOLDER, D. J. (2000). A small sample confidence interval approach to assess individual bioequivalence. Statistics in Medicine 19, 2885–2897.

JOHN, J. A. & MITCHELL, T. J. (1977). Optimal incomplete block designs. Journal of the Royal Statistical Society, Series B (Methodological) 39, 39–43.

KANG, Q. & VAHL, C. I. (2014). Statistical analysis in the safety evaluation of genetically modified crops: equivalence tests. Crop Science 54, 2183–2200.

KHURI, A. I., MATHEW, T. & SINHA, B. K. (1998). Statistical Tests for Mixed Linear Models. New York: Wiley-Interscience.

KÖNIG, A., COCKBURN, A., CREVEL, R. W. R., DEBRUYNE, E., GRAFSTROEM, R., HAMMERLING, U., KIMBER, I., KNUDSEN, I., KUIPER, H. A., PEIJNENBURG, A. A. C. M., PENNINKS, A. H., POULSEN, M., SCHAUZU, M. & WAL, J. M. (2004). Assessment of the safety of foods derived from genetically modified (GM) crops. Food and Chemical Toxicology 42, 1047–1088.

KRISHNAMOORTHY, K. & LIAN, X. D. (2012). Closed-form approximate tolerance intervals for some general linear models and comparison studies. Journal of Statistical Computation and Simulation 82, 547–563.

KRISHNAMOORTHY, K. & MATHEW, T. (2004). One-sided tolerance limits in balanced and unbalanced one-way random models based on generalized confidence intervals. Technometrics 46, 44–52.

KRISHNAMOORTHY, K. & MATHEW, T. (2009). Statistical Tolerance Regions: Theory, Applications, and Computation. Hoboken, NJ: Wiley.

LEE, Y. H., SHAO, J. & CHOW, S. C. (2004). Modified large-sample confidence intervals for linear combinations of variance components: extension, theory, and application. Journal of the American Statistical Association 99, 467–478.

LIAO, C. T., LIN, T. Y. & IYER, H. K. (2005). One- and two-sided tolerance intervals for general balanced mixed models and unbalanced one-way random models. Technometrics 47, 323–335.

MCNALLY, R. J., IYER, H. & MATHEW, T. (2003). Tests for individual and population bioequivalence based on generalized P values. Statistics in Medicine 22, 31–53.

OECD (1993). Safety Evaluation of Foods Derived by Modern Biotechnology: Concepts and Principles. Paris, France: Organization for Economic Cooperation and Development.

PARK, D. J. & BURDICK, R. K. (2003). Performance of confidence intervals in regression models with unbalanced one-fold nested error structures. Communications in Statistics – Simulation and Computation 32, 717–732.

QUIROZ, J., TING, N., WEI, G. C. G. & BURDICK, R. K. (2002). Alternative confidence intervals for the assessment of bioequivalence in four-period cross-over designs. Statistics in Medicine 21, 1825–1847.

SAS Institute Inc. (2011). SAS/STAT® 9·3 User's Guide. Cary, NC: SAS Institute Inc.

SCHUIRMANN, D. J. (1987). A comparison of the two one-sided tests procedure and the power approach for assessing the equivalence of average bioavailability. Journal of Pharmacokinetics and Biopharmaceutics 15, 657–680.

SEARLE, S. R. (1987). Linear Models for Unbalanced Data. New York: John Wiley and Sons.

TING, N., BURDICK, R. K., GRAYBILL, F. A., JEYARATNAM, S. & LU, T. F. C. (1990). Confidence interval on linear combinations of variance components that are unrestricted in sign. Journal of Statistical Computation and Simulation 35, 135–143.

TSUI, K. W. & WEERAHANDI, S. (1989). Generalized P values in significance testing of hypotheses in the presence of nuisance parameters. Journal of the American Statistical Association 84, 602–607.

VAHL, C. I. & KANG, Q. (2015). Equivalence criteria for the safety evaluation of a genetically modified crop – a statistical perspective. The Journal of Agricultural Science, Cambridge. doi: 10.1017/S0021859615000271.

VAN DER VOET, H., PERRY, J. N., AMZAL, B. & PAOLETTI, C. (2011). A statistical assessment of differences and equivalences between genetically modified and reference plant varieties. BMC Biotechnology 11, 15. doi: 10.1186/1472-6750-11-15.

VENKATESH, T. V., BREEZE, M. L., LIU, K., HARRIGAN, G. G. & CULLER, A. H. (2014). Compositional analysis of grain and forage from MON 87427, an inducible male sterile and tissue selective glyphosate-tolerant maize product for hybrid seed production. Journal of Agricultural and Food Chemistry 62, 1964–1973.

WARD, K. J., NEMETH, M. A., BROWNIE, C., HONG, B., HERMAN, R. A. & OBERDOERFER, R. (2012). Comments on the paper "A statistical assessment of differences and equivalences between genetically modified and reference plant varieties" by van der Voet et al. 2011. BMC Biotechnology 12, 13. doi: 10.1186/1472-6750-12-13.

WEERAHANDI, S. (1993). Generalized confidence intervals. Journal of the American Statistical Association 88, 899–905.

WHENT, M., HAO, J. J., SLAVIN, M., ZHOU, M., SONG, J. Z., KENWORTHY, W. & YU, L. L. (2009). Effect of genotype, environment, and their interaction on chemical composition and antioxidant properties of low-linolenic soybeans grown in Maryland. Journal of Agricultural and Food Chemistry 57, 10163–10174.


APPENDIX: SAS CODE FOR THE CASE STUDY
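The appendix itself is written in SAS. For readers working outside SAS, the core calculation underlying the MLS procedure, namely the Graybill & Wang (1980) limits on a nonnegative linear combination of variance components, can be sketched in Python. This is an illustrative sketch only: the function name `mls_bounds` and the numeric inputs below are placeholders, not the case-study values or the authors' SAS program.

```python
import numpy as np
from scipy.stats import chi2

def mls_bounds(c, s2, df, alpha=0.05):
    """Graybill-Wang (MLS) limits for theta = sum_q c_q * sigma_q^2, c_q >= 0.

    c  : nonnegative coefficients of the variance components
    s2 : independent mean squares, with df_q * s2_q / sigma_q^2 ~ chi-square(df_q)
    df : degrees of freedom of each mean square
    Returns one-sided 100(1 - alpha)% lower and upper limits.
    """
    c, s2, df = map(np.asarray, (c, s2, df))
    theta_hat = np.sum(c * s2)
    # G_q pulls the lower limit down; H_q pushes the upper limit up
    G = 1.0 - df / chi2.ppf(1.0 - alpha, df)
    H = df / chi2.ppf(alpha, df) - 1.0
    lower = theta_hat - np.sqrt(np.sum(G**2 * c**2 * s2**2))
    upper = theta_hat + np.sqrt(np.sum(H**2 * c**2 * s2**2))
    return lower, upper
```

A convenient sanity check on the formulas: with a single variance component the limits collapse to the exact chi-square interval for one variance.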





Statistical procedures for testing hypotheses of equivalence in the safety evaluation of a genetically modified crop
INTRODUCTION
MATERIALS AND METHODS
  Properties of summary statistics
  Modified large sample method
  Generalized pivotal quantity method
RESULTS
  Simulation study
  Case study
DISCUSSION
REFERENCES