Advisory Opinion
on the Design of Sampling Plans
for a Seed Threshold Value of 0.1% for Genetically Modified Organisms (GMOs)

given by:
Dr. Michael Kruse
Professor of Seed Science and Technology at the University of Hohenheim
Schubertweg 2, D-71111 Waldenbuch, Germany

for:
The Land of Schleswig-Holstein, represented by the Minister-President, represented in turn by the Ministry of Agriculture, Environment and Rural Areas (MLUR)
Mercatorstraße 3, D-24106 Kiel, Germany

Contents:
1. Summary conclusions
2. Preliminary remarks
3. Documentation of fundamentals
3.1 Definitions
3.2 Fundamentals of acceptance statistics
3.3 Boundaries of underlying conditions
3.3.1 Reference quantity of the GMO threshold value
3.3.2 Producer and consumer risk
3.3.3 Quality indicators of the test methods
3.3.4 Sampling from the lot of seed
3.4 Calculation of testing plans
3.5 Testing plans according to the seed concept of the UAM
4. Testing plans for the scenario "minimal risk to the seed purchaser"
4.1 Testing plans for qualitative detection methods
4.2 Testing plans for quantitative detection methods
4.3 Conclusions
5. Testing plans for the scenario "minimal risk to the seed seller"
5.1 Testing plans for qualitative detection methods
5.2 Testing plans for quantitative detection methods
5.3 Conclusions
6. Testing plans for the scenario "equal allocation of the risks"
6.1 Testing plans for qualitative detection methods
6.2 Testing plans for quantitative detection methods
6.3 Conclusions
7. Effects of the 0% threshold value for non-permitted GMOs
8. Examination of reserve samples
9. Bibliography

1. Summary conclusions

The introduction by the Federal Länder of an "official threshold value" of 0.1% for the contamination of conventional seed with permitted GMOs is intended above all to permit an unambiguous decision to be reached when quantitative detection methods are applied. It is thus meant to give certainty to the supervisory bodies and the seed trade, and to define a uniform threshold for testing on the basis of which measures can be taken in respect of the affected lot of seed. This advisory opinion makes it clear that setting a threshold value of 0.1% necessarily entails quantifying the risks that incorrect decisions present to producers and consumers; otherwise the threshold value is essentially meaningless for practical testing. If the previous distribution of the risk is retained, in particular the allocation of risk to the consumers as set out in the "seed concept" of the UAM (Chapter 2.3), the official threshold value of 0.1% cannot be achieved with quantitative detection methods. To achieve it, for example, quantitative detection methods with limits of quantification in the order of 0.03% and sample sizes of 8000 seeds would be required (see Testing plan 7 below). However, such detection methods are not currently available on the required scale. In the case of maize, a sample size of 8000 seeds would require around 3 kg to be milled for a single test, which the test laboratories cannot manage for reasons of capacity. The stated aim of permitting an unambiguous decision with quantitative detection methods is therefore clearly missed if the risk distribution remains unchanged.
If, however, a higher level of risk to the consumers than previously can be accepted, quantitative detection methods permit testing plans based on a total sample size of 3000 seeds (see Testing plan 16). This test can then easily be combined with testing for the absence of non-permitted GMOs. It is absolutely essential, however, that a political decision on the distribution of risk be taken first. This decision should preferably be taken quickly and unambiguously, so that clear guidance can be given to all concerned (growers, seed producers, seed dealers, farmers, test laboratories and the relevant authorities).

2. Preliminary remarks

The definitions used here do not claim general applicability, nor are they intended as exact formulations corresponding to statistical usage; they are meant to help the reader understand the relevant contexts. The testing plans illustrated below are intended as examples for guidance. The stipulated conditions can in each case be met by a number of testing plans, but it is not considered helpful to list them all here. Rather, the illustrated testing plans give an idea of what form a test might take and what level of expenditure it might involve. The sample sizes specified may therefore appear arbitrary; they were, however, chosen so as to minimize the cost of analysis in each case. In any case, each laboratory is required to draw up its own testing plans according to its individual methodological indices (limit of detection, limit of quantification, error rates), so the plans presented here cannot and should not be adopted indiscriminately.

3. Documentation of fundamentals

In the political discussion of the significance of seed contamination with GMOs, only threshold values for unavoidable occurrences of GMOs in the seed of conventional varieties were discussed as a regulatory instrument. For the practical buying and selling of seed, however, unambiguous definitions are absolutely essential if the verification of compliance with these threshold values is to be reliable. It is, in fact, possible to obtain an analysis result by investigating only a single seed (0% or 100% GMO), and such a result would in principle permit testing for compliance with a threshold value (below or above 0.1%). Such testing would undoubtedly be unsatisfactory, but the example illustrates the need to control the reliability of the decision. It is therefore appropriate to devote some consideration to so-called acceptance statistics, which make up the main part of this opinion.

3.1 Definitions

Threshold value (here GMO seed threshold value): Threshold values are essentially abstract values, since they relate to the true, and generally unknown, characteristic of an object. A threshold value of 0.1% for the GMO contamination of a lot of seed thus relates to the actual "true value of the lot" existing in the complete lot, and not directly to the test result, which is subject to error. To put it another way, the lot of seed must meet the threshold value, not the test result. If the true value of the lot exceeds the threshold value, appropriate steps should be taken, for example labelling the lot as GMO-contaminated.
The aim of a testing plan is to formulate a decision rule such that, although the true value is not known, this aim is achieved with high reliability on the basis of test results. Thus, despite its essentially abstract basis, the threshold value is highly relevant for practical implementation.

Acceptance limit (AL): For quantitative methods of analysis, the acceptance limit is the measured value above which steps must be taken. It must be defined so that, if it is exceeded, there is a sufficient probability that the true value for the lot of seed exceeds the threshold value. In the simplest case, the acceptance limit equals the threshold value. It can also deviate from the threshold value, however, and be lower or higher.

Consumer risk: The consumer risk is the probability that a lot of seed whose true value lies above the threshold value is not recognized as such in the test, because the measurement result lies below the acceptance limit. Consequently, no steps are taken in respect of this lot. This does not refer to any health risk from GMOs, but to the statistical probability of an incorrect decision on compliance with the threshold value to the disadvantage of the seed purchaser. This risk can be defined quantitatively by two parameters: LQL and β.

LQL (Lower Quality Level): LQL designates the poor quality (high actual GMO contamination, as a percentage) of a tested lot of seed which the seed purchaser, or the supervisory authority, would still be prepared to tolerate with a probability of occurrence of β%.

Beta (β): Beta designates the probability, as a percentage, that a lot with a (high) actual contamination of LQL% is not rejected in the test and is thus still included among the accepted lots. A value of 5% is frequently assumed for β.

Producer risk: The producer risk is the probability that a lot of seed whose actual value lies below the threshold value is not recognized as such in the test and is (in this case unjustifiably) rejected. It therefore refers to the statistical probability of an incorrect decision to the disadvantage of the seed producer or seller. This risk in turn can be defined quantitatively by two parameters: AQL and α.

AQL (Acceptance Quality Level): AQL designates the good quality (low actual GMO contamination, as a percentage) of a lot of seed which, with a probability of α%, leads in the test to a result that (in this case unjustifiably) involves rejection.

Alpha (α): Alpha designates the probability that a lot of seed with a GMO contamination of AQL% (below the threshold value) leads in the test to a result that (in this case unjustifiably) involves rejection. A value of 5% is frequently assumed for α.

Testing plan: A testing plan is an instruction on how to conduct a test. It specifies the number and size of the sub-samples to be processed and tested, as well as a decision rule for summarizing and evaluating the individual results.

Submitted sample: The submitted sample is the sample submitted to the test laboratory; it must meet the requirements on size, labelling and packaging of the relevant regulations in each case.

Test sample: The test sample is the complete submitted sample or a part of it, and is used for the test.
Sub-samples: For the purposes of the test, the test sample can be subdivided into sub-samples, which are then included in the test as replicates.

Reference sample: The reference sample is a sample taken from a lot of seed during the actual sampling process in the same way as a submitted sample. It must meet the same requirements as the submitted sample regarding size, labelling and packaging, and is therefore capable of permitting a further, complete and equally representative test.

Limit of detection (LoD): The limit of detection of a method is the lowest GMO contamination in a seed sample that the method can still detect reliably in qualitative terms (contamination present / not present).

Limit of quantification (LoQ): The limit of quantification of a method is the lowest GMO contamination in a seed sample that the method can still report sufficiently reliably as a percentage. The limit of quantification is customarily higher than the limit of detection, because the requirements on the result are higher (not simply "yes" or "no", but, if "yes", how much), and the correspondingly higher reliability is only achieved at heavier contaminations.

Qualitative methods: These methods test whether a sample contains contamination by a GMO. The result is a yes/no result. In the case of a "yes", it is not possible to indicate how heavy the contamination of the sample is. It is customary to designate a detected GMO contamination as a "positive" result and the absence of any such detection as a "negative" result.

Sub-sampling: When qualitative methods are applied, a frequently used form of testing plan is so-called sub-sampling. A number of sub-samples are tested qualitatively one by one, and the result is the number of positive and negative sub-samples. The "acceptance limit" is then the maximum permissible number of positive sub-samples among those tested. The numbers of positive and negative sub-samples can even be used to provide an estimate, albeit a rather inaccurate one, of the actual proportion of GMOs in the lot of seed. In these testing plans, the maximum size of a sub-sample follows from the limit of detection of the method: the sub-sample must be small enough for a single GM seed still to produce a positive result for that sub-sample. By investigating several sub-samples that satisfy this requirement, each of which is analysed qualitatively and must be negative, the limit of detection of the analysis as a whole can be reduced below the limit of detection of the method (e.g. with a limit of detection of 0.1% = 1 in 1000, analysing 3 sub-samples of 1000 seeds each allows 1 GM seed in 3000 seeds to be found, i.e. about 0.03%).

Quantitative methods: These methods measure the percentage share of any GMO contamination that is present.
This takes the form of the quantitative measurement of a particular characteristic of a sample, which can be converted into a GMO percentage by means of a calibration function.

3.2 Fundamentals of acceptance statistics

The definitions of acceptance statistics given above are shown in graphical form in Figure 1, with a threshold value of 0.1% taken as the basis. The ideal situation would be reached if all lots with an actual GMO contamination of 0.1% or below were accepted (probability of acceptance 100%), and all lots with a GMO contamination above 0.1% up to 100% were rejected (probability of acceptance 0%), as indicated by the thick, vertical broken line in Figure 1. This could only be achieved, however, by testing the entire lot, and is therefore unrealistic. In practice, the samples tested are very small (in the order of 1000 to 10,000 seeds) compared with the lot of seed (in the case of cereals, a 30 t lot contains around 600 million seeds). This gives rise to sampling errors, which occur when sampling from the lot but above all when the sample is divided in the test laboratory, and which lead to wrong decisions on compliance with the threshold value when the results are evaluated. As a consequence, and as shown in Figure 1, the probability of acceptance follows a curve that falls more or less steeply. This curve is referred to as the operating characteristic curve. Its position and width depend only on the testing plan in each case.
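How the operating characteristic curve follows from the testing plan can be made concrete with a short calculation. The sketch below is a simplified binomial model of a qualitative sub-sampling plan; it is not SeedCalc and not a method prescribed in this opinion. It assumes a large, homogeneous lot in which seeds are GM independently with probability p, and that a sub-sample tests positive if it contains at least one GM seed, subject to the false-negative rate (fnr) and false-positive rate (fpr) discussed in section 3.3.3. The function names and the example plan are hypothetical.

```python
"""Operating characteristic (OC) curve of a qualitative sub-sampling plan.

Simplified binomial model (an assumption, not taken from the opinion):
large homogeneous lot, independent GM seeds with probability p, and a
sub-sample is positive if it holds at least one GM seed, subject to the
false-negative rate fnr and false-positive rate fpr.
"""
from math import comb


def prob_subsample_positive(p, seeds_per_subsample, fnr=0.0, fpr=0.0):
    """Probability that one sub-sample gives a positive result."""
    p_clean = (1.0 - p) ** seeds_per_subsample  # no GM seed in the sub-sample
    return (1.0 - p_clean) * (1.0 - fnr) + p_clean * fpr


def prob_accept(p, n_subsamples, seeds_per_subsample, max_positive,
                fnr=0.0, fpr=0.0):
    """Probability that a lot with true GM fraction p is accepted, i.e. that
    at most `max_positive` of `n_subsamples` sub-samples are positive."""
    q = prob_subsample_positive(p, seeds_per_subsample, fnr, fpr)
    return sum(comb(n_subsamples, k) * q**k * (1.0 - q) ** (n_subsamples - k)
               for k in range(max_positive + 1))


if __name__ == "__main__":
    # Hypothetical plan: 3 sub-samples of 1000 seeds, all must be negative.
    for percent in (0.0, 0.03, 0.05, 0.1, 0.2):
        pa = prob_accept(percent / 100, 3, 1000, 0)
        print(f"true value {percent:.2f}% -> P(accept) = {pa:.3f}")
```

With these functions, the producer risk at a chosen AQL is 1 - prob_accept(AQL, ...) and the consumer risk at a chosen LQL is prob_accept(LQL, ...). Increasing the total number of seeds tested makes the curve steeper, which is the trade-off between test effort and risk described above.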

[Figure 1: probability of acceptance (%) versus actual value in the lot (%), marking AQL and α (risk to producers) and LQL and β (risk to consumers)]

Figure 1: Diagrammatic view of the probabilities of acceptance for seed lots at a threshold value of 0.1%, in the (unrealistic) ideal case of a full test (thick broken line) and for an approximately symmetrical distribution of the risks to producers and consumers with a definition of LQL and β.

Complex testing plans involving a large number of large samples produce steep, narrow curves, which are associated with only low risks to consumers and producers. Testing plans with a small number of small samples produce flat curves with major risks to both sides. These simple descriptions already make clear that, although the testing plan takes account of the threshold value, it is determined above all by the required steepness of the operating characteristic curve. Before individual testing plans can be discussed, it is therefore first necessary to determine the required form of the operating characteristic curve, ideally by defining LQL and β on the one hand and AQL and α on the other. This establishes a framework for all further discussion, to which the testing plans used must adhere. Defining these four values, and thus the required operating characteristic curve, is not a statistically determined routine for a given threshold value, however. Above all, it reflects the specific expectations of those involved regarding the accuracy and reliability of the decisions, which naturally also depend on what measures are adopted if the test results are too high.
Defining the operating characteristic curve is therefore a risk management question, and there is no doubt that the decision on the operating characteristic curve for GMOs must be taken at a political level. Any such political discussion is not the concern of this opinion, however. Nevertheless, certain definitions had to be selected in order to compare the testing plans set out below; they are not intended as suggestions or even default values for the necessary further political discussion. The choice of the individual values is indicated in the corresponding examples; these values have no particular background and are thus to be regarded as arbitrary for practical purposes.

3.3 Boundaries of underlying conditions

3.3.1 Reference quantity of the GMO threshold value

As soon as a threshold value other than 0% is defined, it is necessary to specify the basis to which the determined percentage refers. Various reference values are possible, depending on the method:
1. The number of seeds (0.1% would then correspond, for example, to 1 GM seed per 1000 seeds).
2. The mass of the seeds (0.1% would then correspond, for example, to 1 g of GM seed per 1000 g of seeds).
3. The number of haploid genomes (0.1% would then correspond, for example, to 1 haploid GM genome per 1000 haploid genomes).

Taking a simple diploid seed as an example, the difference between definitions 1 and 3 is already obvious. Such a GM seed is likely to have arisen from the pollination of a non-GM plant by GM pollen during seed production. This one seed among 999 non-GM seeds represents precisely 0.1% of the total number of seeds. The 1000 seeds together contain 2000 haploid genomes, however, of which only one is GM. This gives a proportion of 1 GM haploid genome in 2000 haploid genomes = 0.05%. Because seeds often have a more complex structure, with thick, purely maternal pericarps and seed coats, a purely maternal perisperm or a triploid endosperm, and because DNA is extracted from these tissues at different rates, the differences between definitions 1 and 3 can shift considerably and can amount to a factor of up to 5 between the individual results (for example 0.1% versus 0.02%). Only a threshold value of 0% is independent of a reference basis, since any contamination, regardless of the basis on which it is quantified, leads to the threshold value being exceeded. Testing plans in which qualitative methods alone are used to determine whether GMO contamination is present in a test sample are equally independent of this definition. A reference basis only needs to be defined when a quantitative result has to be determined and reported. This must always be borne in mind in the following discussion. Below, the reference quantity "number of seeds" is used to explain the qualitative methods, and the reference basis "haploid genome" for the quantitative methods, in full awareness that, because of this change, the results and the conclusions derived from them differ between the two methods. This is addressed again in the summary presentation.
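The arithmetic of the diploid example can be written as a one-line conversion. This is an illustrative sketch that assumes each contaminating seed arose from GM pollen on a non-GM mother plant, so that the tissue analysed carries `gm_copies` GM haploid genomes out of `ploidy` copies. The differing DNA extraction rates between tissues, which the text names as a further source of deviation, are not modelled, and the function name is hypothetical.

```python
"""Convert a seed-based GM share into a haploid-genome-based share.

Illustrative sketch: every GM seed is assumed to stem from GM pollen on a
non-GM mother plant; DNA extraction differences between tissues are ignored.
"""


def genome_percent(seed_percent, ploidy=2, gm_copies=1):
    """Share of GM haploid genomes (%) for a given share of GM seeds (%)."""
    return seed_percent * gm_copies / ploidy


# Diploid tissue (embryo): 1 GM copy of 2 -> half the seed-based value.
print(genome_percent(0.1))
# Triploid endosperm: 2 maternal + 1 paternal copy, so 1 GM copy of 3.
print(genome_percent(0.1, ploidy=3))
# Purely maternal tissue (pericarp, seed coat) of a non-GM mother: no GM copy.
print(genome_percent(0.1, gm_copies=0))
```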
3.3.2 Producer and consumer risk

As explained above, determining an operating characteristic curve is a political process, which has not yet been completed as far as GMO testing is concerned. In order nevertheless to develop testing plans without anticipating that decision, three scenarios with three different operating characteristic curves are presented below; for a threshold value of 0.1%, they presumably represent two extreme variants and one intermediate variant. The advantages, disadvantages and effects of these three definitions for the testing plans are compared, in order to prepare the ground for an informed decision at the political level.

1. Minimal risk for the seed purchaser (LQL = threshold value = 0.1%, β = 5%; AQL and α are not defined, because this scenario relates solely to consumer protection). An operating characteristic curve for this scenario is illustrated in Figure 2. In this distribution of the statistical risks, the greatest weight is attached to ensuring that lots of seed with GMO contamination above the threshold value have only a very small probability of not being rejected. The testing plan must therefore be designed very strictly. On the other hand, this means that there is a rather high probability that lots of seed with actual values below the threshold value of 0.1% will also be rejected. The desired low risk to the seed purchaser cannot be achieved without accepting this, because the curve cannot be made arbitrarily steep: very large test samples would then be necessary.
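For scenario 1, the size of the simplest qualitative plan, in which a lot is accepted only if no GM seed at all is detected, follows from the condition (1 - LQL)^n <= β. The sketch below assumes binomial sampling from a large lot, fnr = fpr = 0, and sub-samples no larger than the limit of detection allows, so that every GM seed present is detected; the function names are hypothetical. It also reproduces the split into 3 x 1000 or 15 x 200 seeds described in the text.

```python
"""Smallest all-negative qualitative plan for scenario 1 (LQL = 0.1 %, beta = 5 %).

Assumptions (illustrative only): binomial sampling from a large lot,
fnr = fpr = 0, and the lot is accepted only if no GM seed is detected.
"""
import math


def min_seeds_zero_acceptance(lql, beta):
    """Smallest n such that a lot at the LQL passes (no GM seed found)
    with probability <= beta, i.e. (1 - lql) ** n <= beta."""
    return math.ceil(math.log(beta) / math.log1p(-lql))


def prob_reject_zero_acceptance(p, n_seeds):
    """Probability that a lot with true GM fraction p yields >= 1 GM seed."""
    return 1.0 - (1.0 - p) ** n_seeds


def split_by_lod(total_seeds, lod_one_in):
    """Number of sub-samples when the method still detects 1 GM seed in at
    most `lod_one_in` seeds (LoD 0.1 % -> 1000, LoD 0.5 % -> 200)."""
    n_sub = math.ceil(total_seeds / lod_one_in)
    return n_sub, lod_one_in


n = min_seeds_zero_acceptance(lql=0.001, beta=0.05)
print(n)                                       # minimum seeds for scenario 1
print(split_by_lod(3000, 1000))                # LoD 0.1 %
print(split_by_lod(3000, 200))                 # LoD 0.5 %
print(prob_reject_zero_acceptance(0.0003, n))  # rejection risk for a lot at 0.03 %
```

Under these assumptions the minimum is close to 3000 seeds, and a lot with an actual contamination of only 0.03% would still be rejected in more than half of all cases, which illustrates the high producer risk of this scenario.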

[Figure 2: probability of acceptance (%) versus actual value in the lot (%), marking LQL and β (risk to consumers) and the risk to producers]

Figure 2: Acceptance probabilities in the scenario "Minimal risk for the seed purchaser". The probability here that a lot with an actual GMO contamination of 0.1% (LQL) will not be rejected is 5% (β).

It should be noted that a definition of LQL and β of this kind must lead to a very steep curve if GMO-free seed lots are also expected to be recognized as practically free from defects. In this case, the operating characteristic curve must fall, within the narrow range from 0 to 0.1%, from a probability of acceptance of almost 100% to 5%.

2. Minimal risk for the seed producer (AQL = threshold value = 0.1%, α = 5%; LQL and β are not defined, because this scenario relates solely to producer protection). An operating characteristic curve for this scenario is illustrated in Figure 3. In this case, the greatest weight is attached to ensuring that lots of seed with GMO contamination below the threshold value have only a very small probability of being rejected. The testing plan must be designed so that rejection occurs only if it is really certain that the actual value lies above the threshold value.

On the other hand, this means that there is a rather high probability that lots of seed with actual values above the threshold value of 0.1% will not be rejected either, since the results are not certain enough to conclude beyond doubt that the actual value really lies above the threshold value.
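For scenario 2, a qualitative plan protects the producer by tolerating some positive sub-samples. The sketch below searches for the smallest acceptance number c (the maximum permitted number of positive sub-samples) for which a lot at the AQL of 0.1% is accepted with a probability of at least 95%. The plan of 30 sub-samples of 100 seeds, the binomial model and fnr = fpr = 0 are assumptions chosen for illustration only, and the function names are hypothetical.

```python
"""Acceptance number for scenario 2 (AQL = 0.1 %, alpha = 5 %).

Illustrative sketch: hypothetical plan of 30 sub-samples of 100 seeds,
binomial model, fnr = fpr = 0 (assumptions, not from the opinion).
"""
from math import comb


def prob_accept(p, n_sub, seeds_per_sub, c):
    """P(at most c positive sub-samples) for a lot with true GM fraction p."""
    q = 1.0 - (1.0 - p) ** seeds_per_sub          # P(a sub-sample is positive)
    return sum(comb(n_sub, k) * q**k * (1.0 - q) ** (n_sub - k)
               for k in range(c + 1))


def min_acceptance_number(aql, alpha, n_sub, seeds_per_sub):
    """Smallest c for which a lot at the AQL is accepted with P >= 1 - alpha."""
    for c in range(n_sub + 1):
        if prob_accept(aql, n_sub, seeds_per_sub, c) >= 1.0 - alpha:
            return c
    raise ValueError("no acceptance number satisfies the producer risk")


# Hypothetical plan: 30 sub-samples of 100 seeds each.
c = min_acceptance_number(aql=0.001, alpha=0.05, n_sub=30, seeds_per_sub=100)
print("acceptance number c =", c)
# Probability of accepting a lot well above the threshold value (0.2 %).
print("P(accept) at 0.2 %:", round(prob_accept(0.002, 30, 100, c), 3))
```

Evaluating prob_accept at a contamination above the threshold value with the resulting c shows the correspondingly high consumer risk described above.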

[Figure 3: probability of acceptance (%) versus actual value in the lot (%), marking the risk to producers and the risk to consumers]

Figure 3: Acceptance probabilities in the scenario "Minimal risk for the seed producer". The probability here that a lot with an actual GMO contamination of 0.1% (AQL) will not be rejected is 95% (α = 5%).

It should be noted that in this case the probability of acceptance only needs to fall from practically 100% to 95% in the range from 0 to 0.1%, so that significantly flatter curves (and thus smaller test samples) than in scenario 1 are possible. Such a flat curve has been chosen in Figure 3, for example. Steeper curves with larger test samples would nevertheless also be possible here.

3. Equal allocation of the risks (LQL = 0.2%, β = 5%, AQL = 0.03%, α = 5%). One possible operating characteristic curve for this scenario has already been illustrated in Figure 1. In this case, the risks for both sides are more or less in equilibrium. The consequence is that lots with an actual GMO contamination of around 0.1% (the threshold value) in particular have a probability of roughly 50% of being rejected. This is where the balance of the risks is most clearly visible.

3.3.3 Quality indicators of the test methods

Threshold values close to or equal to 0% immediately raise the problem of the limit of detection and the limit of quantification. From time to time, a threshold value is even justified directly by the limit of detection. The following should be noted in this respect: the limit of detection is at present the critical quality parameter for qualitative and quantitative methods. It indicates the number of seeds among which a single GM seed will still lead to a clear and reproducible positive qualitative result. An example would be 1 GM seed in 3000, i.e. a limit of detection of 0.03%. The limit of detection thus relates to the sample tested. This can be the entire test sample, but it can also be sub-samples of it. A detection test can therefore be carried out for a threshold value of 0.1% even if only methods with a limit of detection of 0.5% are available. In this case, several sub-samples, each of at most 200 seeds (0.5% = 1 in 200), would have to be tested in order to ensure that the presence of a single GM seed in a sub-sample is always reliably identified. The absence of GMOs from 3000 seeds, which could be the decision threshold in a testing plan, for example, could thus be verified reliably by analysing 15 sub-samples of 200 seeds each with a method whose limit of detection is only 0.5%. It should be noted that this "addition" of sub-samples is only possible for the limit of detection of qualitative methods, not for quantification and the corresponding limit of quantification (see below). Limits of detection of 0.03% and 0.1% are used for the test methods in the following considerations. The limit of quantification is an additional important parameter for quantitative methods. It must be borne in mind that the limit of quantification cannot be lowered by analysing more sub-samples; only the uncertainty of the results can be reduced. The limit of quantification can only be lowered through methodological changes. A limit of quantification of 0.1% is used below as the starting point. For qualitative methods of analysis, the false-positive rate (fpr) and the false-negative rate (fnr) are further important parameters.
The fpr relates to the case in which no GMO contamination was present in the tested sub-sample but the method nevertheless gives a positive result (erroneous result). The fnr relates to the case in which GMO contamination was present in the tested sub-sample but the method nevertheless gives a negative result (erroneous result). Every laboratory aims to bring both rates down to 0% and keep them there, although this is not always achieved, or not achieved uniformly for all methods. It is important to distinguish between sampling errors and these error rates. Sampling errors cause the sub-samples submitted for analysis to deviate from the actual value in the lot, whereas the error rates cause the test result to deviate from the true state of the sub-sample. The total error is therefore made up of these two additive components. Below, both rates are mostly assumed to be 0%, but occasionally 3%, in order to illustrate their effects.

In the case of quantitative detection methods, two indices are of considerable significance for the precision of the method; in the SeedCalc nomenclature (see below), they are referred to as the "coefficient of variation" and the "flour standard deviation". The coefficient of variation is a percentage measure of the spread of the results of tests performed on one and the same flour sample. Values between 25 and 50% are commonly assumed. To put it another way, a coefficient of variation of 30% means that around 67% of the repeat values lie within +/- 30% of the mean value. For a mean value of 0.1%, this gives a range of 0.07% to 0.13%. The remaining 33% lie outside this range, about half below and half above. If the mean value is higher, the range is broader (for a mean value of 0.5%: 0.35% to 0.65%).
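The range quoted for a coefficient of variation (CV) is the mean plus or minus one standard deviation, where the standard deviation equals CV x mean. The sketch below reproduces the figures in the text; the share of values inside that band (about 68%, close to the "around 67%" stated) holds only under the additional assumption, not made explicit in the text, that repeat measurements on one flour sample are normally distributed. The function names are hypothetical.

```python
"""Spread implied by a coefficient of variation (CV).

Sketch assuming normally distributed repeat measurements on one flour
sample; under that assumption the +/- 1 CV band covers about 68 % of values.
"""
import math


def cv_range(mean_percent, cv_percent):
    """Return (low, high) = mean -/+ one standard deviation, sd = CV * mean."""
    sd = mean_percent * cv_percent / 100.0
    return mean_percent - sd, mean_percent + sd


def share_within_one_sd():
    """Fraction of a normal distribution within +/- one standard deviation."""
    return math.erf(1.0 / math.sqrt(2.0))


print(cv_range(0.1, 30))   # approx. (0.07, 0.13)
print(cv_range(0.5, 30))   # approx. (0.35, 0.65)
print(round(share_within_one_sd(), 3))
```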
It should be noted, however, that this range always applies only to tests performed on the same flour sample, that is, on flour samples of reference materials with known GMO contents. The flour standard deviation measures the spread between the flour samples that are taken from the material milled from a sub-sample and from which the DNA is then extracted. The fineness of the milled material, the mixing of the milled material, the way the flour sample is taken and its size all contribute to it, so this value relates very closely to the method, the material, the laboratory and the personnel concerned. A value of 0.020% is used for many illustrative calculations, including here.

3.3.4 Sampling from the lot of seed

For quality testing in the context of the seed certification procedure, or in the context of seed marketing inspection, sampling is carried out according to the ISTA regulations. These clearly define the size of a lot of seed and the number of initial samples to be taken from it. In quality testing, for example, the determination of other seeds yields results that permit testing for compliance with standards stipulating, for example, that a sample of around 12,500 seeds may contain at most 10 seeds of other species. This is 1 in 1250, or 0.08%. Even stricter standards, down to 0 seeds in 12,500 seeds, are also tested routinely at present. Because the ISTA sampling regulations are designed for these standards close to, or even equal to, 0%, and have been used routinely for decades in traditional seed testing, they can also be applied without restriction to GMO testing. The minimum number of initial samples to be taken from the lot of seed is sufficient to ensure that the submitted sample adequately represents the average quality of the lot. Only the size of the submitted sample may need to be increased, depending on the testing plan. Sample sizes must be set so that, even for large-seeded material (maize, soya beans), a sufficient quantity of seed is available for testing.
If 5000 seeds were required, for example, a sample size of 2 kg would be entirely appropriate for maize. A compilation of customary thousand-grain masses for the various species can be drawn up at the seed testing stations, and the necessary sample sizes in grams are then easily calculated from these. Depending on how sampling and sample handling are organized, it can also be decided whether a submitted sample should allow only a single complete test, or whether a second complete test should be possible from the same submitted sample. In the latter case, the size of the submitted sample should be doubled. This decision cannot be taken from outside, however; it also depends on the quality management of the laboratory concerned. The consideration here is that it must always be possible to conduct a second test without having to request a reference sample from outside. It must also be borne in mind that a second test of this kind is a test of a "reference sample", the result of which need not necessarily confirm the first result in every case (see Chapter 8). This also applies if the sample was divided in the seed or gene technology laboratory.

Further submitted samples and reference samples can also be taken from the seed lot in a new sampling process under practical conditions. In this case, the ISTA regulations tolerate a greater deviation between the first and the second test for the parameter "technical purity" than if both tests were based on the same sampling process. This is not the case, however, for the parameters germination capacity, other seeds and purity (electrophoresis), for which the tolerated deviations are independent of the sampling process. The reason is that any inhomogeneity in a lot arises above all from the different flow characteristics of the particles in a seed lot, and parameters that are not associated (or not closely associated) with different flow characteristics show only slight inhomogeneity. These include varietal purity and thus also GMO contamination: GM seeds and non-GM seeds differ less from one another in their grain characteristics than, for example, the small and the large seeds within the non-GM seed. Accordingly, the inhomogeneity of seed lots is generally of secondary importance, as is the question of whether the samples come from the same or from different sampling processes. The situation could be different if non-GM lots were mixed with GM lots, for example at the final stage during bagging, when local contamination can lead to strong inhomogeneity and, as a result, to greater differences between two submitted samples from different sampling operations. This cannot be assumed, however, given the currently very limited distribution of GM lots in seed production and in seed-producing companies. It can therefore be assumed that submitted samples and reference samples taken and reduced according to the ISTA regulations are equivalent, even if they originate from different sampling processes. An important consideration is that both the taking of the initial samples from the seed lot and the reduction of the sample must be performed with suitable apparatus and methods as described in the ISTA regulations.
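Converting a required number of seeds into a submitted-sample mass is simple arithmetic on the thousand-grain mass (TGW). The Python sketch below illustrates it; the function name is hypothetical, and the TGW of 400 g is an assumed round figure for maize, chosen only so that the result matches the 2 kg mentioned above. Real TGWs vary by variety and lot and should come from the seed testing station.

```python
def submitted_sample_mass_g(seeds_required: int, tgw_g: float,
                            second_test: bool = False) -> float:
    """Mass in grams of a submitted sample that supplies `seeds_required` seeds.

    tgw_g is the thousand-grain mass in grams for the species or lot (an input
    from the seed testing station, not a fixed constant). If second_test is
    True, the mass is doubled so that a second complete test can be run from
    the same submitted sample.
    """
    mass_g = seeds_required * tgw_g / 1000.0
    return 2.0 * mass_g if second_test else mass_g


# Hypothetical maize TGW of 400 g, used only for illustration.
print(submitted_sample_mass_g(5000, 400.0))                    # 2000.0
print(submitted_sample_mass_g(5000, 400.0, second_test=True))  # 4000.0
```

In practice one would round the result up and add a margin for losses during cleaning and division; the sketch leaves that to the laboratory's own rules.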
The sample must be divided down to the sub-sample that is milled using suitable sample dividers. The same applies to taking small flour samples from the milled material of large samples. If no suitable sample divider is available, a small portion should simply be taken from each of at least 10 uniformly distributed points; these portions are then combined to form the required sample, which can itself be reduced further as necessary. The ISTA regulations specify sampling intensities and sampling devices suitable both for bagged seed and for seed in loose bulk. It is recommended to adhere closely to these regulations. The submitted samples can then be regarded as equivalent.

3.4 Calculation of testing plans

All the testing plans set out in Chapters 4 and 5 were calculated using the SeedCalc 7.1 EXCEL spreadsheet. The statistical background of SeedCalc is described in detail in the publication by K. Remund et al. For qualitative methods of analysis, the testing plans are based above all on the binomial distribution. For quantitative methods, these models are combined with the normal distribution or the truncated normal distribution, which involve a number of additional assumptions and simplifications when the range under consideration lies close to 0.0%. The models used in each case are extremely complex and cannot be reproduced here. Essentially, the user specifies a testing plan in SeedCalc (e.g. investigate 15 sub-samples of 120 seeds each, of which a maximum of 5 may be positive; fpr = fnr = 0%), and SeedCalc then calculates the probabilities that lots with actual GMO proportions within a specified range (e.g. 0 to 0.5%) will pass this test (probabilities of acceptance). The probabilities of acceptance (corresponding to 1-α and β) are stated numerically for the actual proportions corresponding to AQL and LQL, and the probabilities of acceptance for other actual values can be read from a graph of the operating characteristic curve. Alternatively, the values for AQL, LQL, α and β and, for example, the size of a sub-sample can be specified, and SeedCalc then calculates the number of sub-samples to be analysed and the maximum number of positive sub-samples needed to meet these values. Other inputs must be specified for quantitative methods; from these, SeedCalc calculates the probabilities of acceptance for seed lots with actual GMO proportions in a specific range and produces an operating characteristic curve. All SeedCalc results assume random sampling, which means that no systematic errors may occur when dividing the sample, and the samples must always be thoroughly mixed before sub-samples are taken. The ISTA methods of sample reduction are strongly recommended for this purpose. SeedCalc is currently available in version 7.1, which has already been through a number of trials and has proven its value in many applications and tests. An enhanced version 8.0 of SeedCalc is in preparation. Figures 1 to 3 in this opinion, which illustrate the principles, are not concrete examples calculated with SeedCalc. Figures 4 to 11 are graphical representations taken directly from SeedCalc, and Figure 12 is an illustration of SeedCalc results.

3.5 Testing plans according to the seed concept of the UAM

In Chapter 3.1, the seed concept of the UAM sets out two testing plans for the analysis of the presence of GMOs. Both testing plans are based on a test sample size of 3000 seeds, which, according to section 2.3, was derived from the binomial distribution.
They differ in that, in the first case, the test sample is divided into three sub-samples, which are analysed qualitatively with a limit of detection of 0.1%, whereas in the second case the entire test sample is analysed qualitatively with a limit of detection of 0.03%. These two variants are presented below as Testing plans 2 and 1. Both plans take account only of the consumer risk, not the producer risk. Both testing plans are correct and appropriate for the assumptions made; only their description and interpretation in the seed concept are in places not statistically exact. Sections 3.2.2 and 3.3 of the seed concept state, however, that further analyses and quantifications may be possible or even necessary if positive findings occur within the 3000 seeds. It should be pointed out here that the sample size of 3000 seeds is part of a testing plan (0 GMOs in 3000 seeds) and must not be used as a fixed quantity for other, additional testing plans. If the presence of GM seeds among the 3000 seeds is not to result in rejection of the lot, then the entire testing plan, including the 3000 seeds, must be called into question. Either the 3000 seeds are analysed as specified and the decision rule (0 GMOs in 3000 seeds) is adhered to, or a more target-oriented testing plan must be applied. A certain lack of precision can be identified here, although its actual cause lies in the absence of a statutory default value for the seed threshold and for the certainty with which it must be met.
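What the UAM rule "0 GMOs in 3000 seeds" means for both sides can be seen from its operating characteristic. The Python sketch below assumes simple binomial sampling, detection of every GM seed and no false results; under those assumptions the acceptance probability is just the chance that no GM seed is drawn. The function name is hypothetical.

```python
def prob_accept_zero_tolerance(p: float, total_seeds: int) -> float:
    """P(no GM seed found) for a lot with true GM-seed fraction p.

    Assumed model: binomial sampling of seeds, every GM seed detected, no
    false-positive or false-negative results. Splitting the seeds into
    sub-samples (e.g. 3 x 1000) gives the same result, provided each
    sub-sample is small enough for a single GM seed to be detected.
    """
    return (1.0 - p) ** total_seeds


for p in (0.0, 0.00025, 0.0005, 0.001, 0.002):
    print(f"true GM share {p:.3%}: P(accept) = "
          f"{prob_accept_zero_tolerance(p, 3000):.3f}")
```

At the LQL of 0.1%, the acceptance probability is about 5%, the consumer risk the plan was designed for. A lot at 0.05%, well below the threshold, is accepted only about 22% of the time; this is the producer risk that the UAM plans do not address.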

4. Testing plans for the scenario "minimal risk to the seed purchaser"

Testing plans based on qualitative and quantitative detection methods are set out below for the scenario "minimal risk to the seed purchaser" at a threshold value of 0.1%. In particular, LQL = 0.1% and β = 5% must be met; these are the key figures for this scenario. The scenario is illustrated in principle in Figure 2. The testing plans were calculated with SeedCalc 7.1. In the tables for the testing plans, the default values entered in SeedCalc 7.1 are marked in yellow in each case, and the results SeedCalc produced from them are not colour-marked.

4.1 Testing plans for qualitative detection methods

These testing plans only determine, for each analysed sub-sample, whether a GM seed is present in it. Testing plan 1 involves the qualitative analysis of a sample of around 3000 seeds.

Testing plan 1
LQL: 0.1%; β: 5%; AQL: n.d.*; α: n.d.*
Limit of detection: 0.03%; false-positive rate: 0%; false-negative rate: 0%
Number of sub-samples to be tested: 1
Number of seeds per sub-sample: 3000 (exactly: 2995)
Total number of seeds to be tested: 3000 (2995)
Maximum permissible number of positive sub-samples: 0
* n.d. = not determined

A positive result must not be obtained. The presence of any GM seed in this sample leads to a "positive" result, and the lot must then be rejected. It must be established, however, that a positive result is caused not by dust contamination, for example, but by contamination of a kind that would be transmitted genetically to the next generation (within the meaning of the Gentechnikrecht, the Genetic Engineering Act). No quantification is required under this testing plan. On the basis of testing 3000 seeds, the mere fact that a positive result is obtained already provides sufficient certainty that the probability of the 0.1% threshold being exceeded is more than 5%. Quantification can be appropriate if it is carried out to exclude contamination by dust, for example. However, the result to be reported is that the analysis of 3000 seeds produced a positive result, not the quantitatively determined percentage of contamination. The latter can be used for internal purposes, although I strongly advise against reporting it, since it would then become the subject of retesting, and the reproducibility of a quantification based on 3000 seeds is insufficient. It must also be taken into account that GM seeds can be heterozygous, and that, at the measurement level of haploid genomes, the proportion contributed by one GM seed in 3000 seeds can be considerably lower than 0.03% because the seeds contain sizeable proportions of purely maternal tissue. Consequently, measured values smaller than the calculated proportion of a single seed in the test sample do not automatically indicate "dust contamination".
If the limit of detection of the method is not sufficient to detect a single GM seed in a sample of 3000 seeds, the test sample of 3000 seeds can be divided, for example, into 3 sub-samples, all three of which must then give a negative result for the lot to escape rejection.
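The sample sizes of the zero-tolerance plans in this section, including the error-rate variants in Testing plans 3 to 6 below, can be approximated with a short calculation. For a single sample of n seeds, let q = (1 - LQL)^n be the chance that it contains no GM seed; the lot is then accepted with probability q(1 - fpr) + (1 - q)fnr, and n is the smallest value that pushes this down to β. The Python sketch below assumes binomial sampling and a single sub-sample. Both function names are hypothetical; `sub_samples_needed` simply caps each sub-sample at 1/LOD seeds so that one GM seed stays detectable.

```python
import math


def min_seeds_zero_acceptance(lql: float, beta: float,
                              fnr: float = 0.0, fpr: float = 0.0) -> int:
    """Smallest n so that a lot at the LQL passes a single-sample,
    zero-positive qualitative test with probability <= beta.

    P(accept) = q*(1 - fpr) + (1 - q)*fnr with q = (1 - lql)**n.
    """
    if beta <= fnr:
        raise ValueError("beta must exceed the false-negative rate")
    # Rearranged: q * (1 - fpr - fnr) <= beta - fnr.
    q_max = (beta - fnr) / (1.0 - fpr - fnr)
    return math.ceil(math.log(q_max) / math.log(1.0 - lql))


def sub_samples_needed(total_seeds: int, lod: float) -> int:
    """Sub-samples required so that each holds at most 1/lod seeds."""
    # Small tolerance guards against floating-point error for e.g. 0.001.
    max_per_sub = math.floor(1.0 / lod + 1e-9)
    return math.ceil(total_seeds / max_per_sub)


print(min_seeds_zero_acceptance(0.001, 0.05))            # Testing plan 1
print(min_seeds_zero_acceptance(0.001, 0.05, fnr=0.03))  # Testing plan 3
print(sub_samples_needed(3000, 0.001))                   # LOD 0.1%, as in plan 2
```

This reproduces 2995 seeds for Testing plan 1 and 3880 for Testing plan 3. For the plans with a false-positive rate, the simple formula can come out one seed lower than the tabulated SeedCalc figures, so it should be treated as a cross-check rather than a replacement for SeedCalc.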

Testing plan 2
LQL: 0.1%; β: 5%; AQL: n.d.*; α: n.d.*
Limit of detection: 0.1%; false-positive rate: 0%; false-negative rate: 0%
Number of sub-samples to be tested: 3
Number of seeds per sub-sample: 1000 (999)
Total number of seeds to be tested: 3000 (2996)
Maximum permissible number of positive sub-samples: 0
* n.d. = not determined

The testing plans can also allow for the case where the method used in the test laboratory gives incorrect determinations with a certain (low) frequency, that is to say where the result deviates from the true status of the sub-sample. Presented below are additional testing plans, based on Testing plan 1, in which a false-positive rate and/or a false-negative rate of 3% is taken into account in each case.

Testing plan 3
LQL: 0.1%; β: 5%; AQL: n.d.*; α: n.d.*
Limit of detection: 0.025%; false-positive rate: 0%; false-negative rate: 3%
Number of sub-samples to be tested: 1
Number of seeds per sub-sample: 3880
Total number of seeds to be tested: 3880
Maximum permissible number of positive sub-samples: 0
* n.d. = not determined

Testing plan 4
LQL: 0.1%; β: 5%; AQL: n.d.*; α: n.d.*
Limit of detection: 0.03%; false-positive rate: 3%; false-negative rate: 0%
Number of sub-samples to be tested: 1
Number of seeds per sub-sample: 2965
Total number of seeds to be tested: 2965
Maximum permissible number of positive sub-samples: 0
* n.d. = not determined

Testing plan 5
LQL: 0.1%; β: 5%; AQL: n.d.*; α: n.d.*
Limit of detection: 0.025%; false-positive rate: 3%; false-negative rate: 3%
Number of sub-samples to be tested: 1
Number of seeds per sub-sample: 3850
Total number of seeds to be tested: 3850
Maximum permissible number of positive sub-samples: 0
* n.d. = not determined

Testing plan 6
LQL: 0.1%; β: 5%; AQL: n.d.*; α: n.d.*
Limit of detection: 0.1%; false-positive rate: 3%; false-negative rate: 0%
Number of sub-samples to be tested: 4
Number of seeds per sub-sample: 784
Total number of seeds to be tested: 3136
Maximum permissible number of positive sub-samples: 0
* n.d. = not determined

A false-negative rate of 3% clearly requires a considerable increase in the size of the test sample to eliminate the risk to the consumer. In this case, analysing a single test sample is only possible if the limit of detection is sufficiently low (here: 1 in 3880 = 0.025%). If this is not achievable, sub-samples must be formed. If, for example, a limit of detection below 0.1% is not achievable, Testing plan 4 would have to be replaced by Testing plan 6, in which more than 3000 seeds still have to be tested. It should be noted that all these testing plans, as illustrated in Figure 2, entail rather high risks to the producer. This means that lots with an actual contamination of less than 0.1%, which according to the definition of the threshold value would in fact still be tolerable, have only rather small probabilities of not being rejected. If false-positive results also occur (Testing plan 5), these probabilities are reduced further. Even lots with 0% GMO would have only a 97% probability of not being rejected. For this reason alone, false-positive results must be avoided in any event, and test laboratories must take appropriate measures and work, in a documented manner, towards bringing this rate down to 0%. International ring trials have shown that many laboratories have succeeded in this respect. In summary, the scenario "minimal risk to the seed purchaser" (that is to say, a small risk to the consumer) can be handled very well by means of qualitative testing plans. A disadvantage of these testing plans is the relatively large risk to the producer.

4.2 Testing plans for quantitative detection methods

Quantitative methods of analysis ultimately aim to determine the proportion of genetically modified haploid genomes in the milled material of a test sample. The aim is thus not to determine whether GM seeds are present in a sample but, if they are present, how high the proportion of genetically modified haploid genomes is. Reference is made to the quality indicators of quantitative methods, such as the coefficient of variation and the flour standard deviation (see section 3.3.3). A qualitative statement can of course also be obtained with quantitative detection methods. In this application, they do not differ from qualitative detection methods, provided that the limit of detection of the quantitative method is properly taken into account when determining the size of a sub-sample. In this case, the testing plans shown in section 4.1 can also be used with quantitative detection methods whose results are evaluated only qualitatively. It should be noted that the result to be reported is then not the quantified percentage but the statement of whether the tested sample contains GMO contamination.
If, however, the sub-samples are so large that the quantitative detection method can no longer clearly demonstrate the presence of a single GM seed, the qualitative evaluation no longer applies, and a purely quantitative analysis and evaluation takes place. The limit of quantification then becomes the critical parameter: only above it is an adequately reliable quantification possible at all. Since, according to a large number of laboratories, it is of the order of 0.1%, it is of considerable significance for the testing plans for a threshold value of 0.1% discussed here. In addition, an acceptance limit must be defined as part of the testing plan, above which the threshold value is considered to have been exceeded. From the definition of the limit of quantification alone (generally 0.1% for quantitative methods), it follows that any quantitative result that can be determined with sufficient reliability will lie above 0.1% and thus also above the threshold value. Strictly speaking, therefore, quantification is not even necessary, since from a purely qualitative point of view all "quantifiable results" lie above the threshold value. An opinion now frequently heard, however, is that combining the threshold value and the limit of quantification (both at 0.1%) represents an appropriate solution: all results below the threshold value are acceptable and therefore do not need to be quantified exactly, while all results above the threshold value can actually be quantified, so that a concrete "%" result for exceeding the threshold would also be available and could be notified to those concerned. Furthermore, a sample size of around 3000 seeds, derived from the use of qualitative methods (Testing plan 1), is regarded as adequate. A testing plan of this kind, with the associated risks to the seed purchaser, is presented below as Testing plan 7.

Testing plan 7
LQL: 0.1%; β: 50%; AQL: not specified; α: not specified
Limit of detection: 0.1%; limit of quantification: 0.1%
Coefficient of variation: 30%; flour sub-samples SD: 0.020%
Number of sub-samples to be tested: 3
Number of seeds per sub-sample: 1000
Number of flour samples per sub-sample: 2
Number of quantifications per flour sample: 2
Total number of quantifications: 12
Acceptance limit: 0.1%

As Testing plan 7 shows, this approach is not sufficient for the scenario "minimal risk to the seed purchaser" (β = 50% instead of the stipulated 5%), and in spite of its apparent inner logic it is ultimately misleading. Figure 4 shows the operating characteristic curve of this testing plan. It clearly shows that lots with an actual contamination of 0.1% (LQL) have a probability of acceptance of 50%. This is far above the β of 5% stipulated in scenario 1 at an LQL of 0.1%. The reason is evident: owing to random sampling alone, for a seed lot with an actual proportion of GM seeds of 0.1%, approximately half of the samples will have a sample value below 0.1% and the other half a sample value above 0.1%. The sample values are thus split approximately 50:50, not 95:5 as stipulated. Moreover, for seed lots with an actual contamination of 0.2%, for example, the probability of acceptance, at 12.8%, is still quite high. This testing plan therefore does not achieve the high consumer protection of Testing plans 1 to 6, which use qualitative methods. It is thus not appropriate to take the sample size from testing plans for qualitative detection methods and apply it to quantitative acceptance limits. The sample size of 3000 seeds is suitable for the stricter decision rule "GMO-positive/negative", but not for the more liberal rule "if GMO-positive, above or below 0.1%".

Figure 4: Operating characteristic curve for Testing plan 7, which does not meet the requirements of the scenario "minimal risk to the consumer". It involves the quantitative testing of 3000 seeds, and any result above 0.1% leads to rejection of the seed lot. (Original SeedCalc print-out, with axis labels added as in Figure 1. Key: β (ist) = β (actual); β (soll) = β (target).)

Testing plan 8, presented below, formally satisfies the requirements of the scenario "minimal risk to the seed purchaser" using a quantitative method of analysis. The default values in this case are: LQL = 0.1%, β = 5%, coefficient of variation = 30%, flour sub-samples SD = 0.020%. The plan involves milling 2 sub-samples of 4000 seeds each (8000 seeds in total) separately, taking 2 flour samples from each, and performing 2 measurements on each of the resulting 4 flour samples (or on the 4 DNA extracts obtained from them), that is to say 8 measurements in total. The results are then averaged, and if the average exceeds the acceptance limit of 0.037%, the seed lot must be rejected.

Testing plan 8
LQL: 0.1%; β: 5%; AQL: not specified; α: not specified
Limit of detection: 0.037%; limit of quantification: 0.037%
Coefficient of variation: 30%; flour sub-samples SD: 0.020%
Number of sub-samples to be tested: 2
Number of seeds per sub-sample: 4000
Number of flour samples per sub-sample: 2
Number of quantifications per flour sample: 2
Total number of quantifications: 8
Acceptance limit: 0.037%

Under this testing plan, the probability of a lot with an actual value of 0.1% (LQL) not being rejected is exactly 5% (β), as stipulated. The plan thus meets the requirements of the scenario "minimal risk to the consumer". The operating characteristic curve is shown in Figure 5. The problem here is the acceptance limit of 0.037%, which requires a limit of quantification of 0.037% or lower. This is scarcely ever achieved in practice. If the quantitative detection method is more precise (i.e. has a smaller coefficient of variation and flour sub-samples SD) than assumed in Testing plan 8, a higher acceptance limit becomes possible. Nevertheless, to achieve β = 5%, the acceptance limit must always lie below 0.1%, simply because of the sampling errors that still occur with sample sizes of several thousand seeds. A systematic problem of this strategy is thus the excessively high limit of quantification, above which measurements are reliable, together with the fact that the limit of quantification equals the threshold value.
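SeedCalc's quantitative models are not reproduced in this opinion. To make the behaviour of Testing plans 7 and 8 tangible, the following Python sketch uses a deliberately simplified normal approximation around the true GMO proportion, with an assumed three-level variance: binomial seed sampling, the flour sub-samples SD of 0.020% per flour sample, and a 30% coefficient of variation per quantification. This model is an illustration chosen for this example, not SeedCalc's algorithm, and the function names are hypothetical.

```python
from statistics import NormalDist


def sd_of_mean(p, n_sub, seeds_per_sub, flour_per_sub, reps_per_flour,
               cv=0.30, flour_sd=0.00020):
    """Standard deviation of the averaged measured GMO proportion.

    All proportions are fractions (0.001 = 0.1%). ASSUMED three-level model,
    not SeedCalc's own:
      seed sampling:  binomial variance p(1 - p) over all seeds milled
      flour sampling: flour_sd**2 per flour sample
      measurement:    (cv * p)**2 per quantification
    """
    seeds = n_sub * seeds_per_sub
    flour_samples = n_sub * flour_per_sub
    measurements = flour_samples * reps_per_flour
    var = (p * (1 - p) / seeds
           + flour_sd ** 2 / flour_samples
           + (cv * p) ** 2 / measurements)
    return var ** 0.5


def prob_accept_quant(p, limit, **plan):
    """P(average measured value <= acceptance limit) for true proportion p."""
    return NormalDist(mu=p, sigma=sd_of_mean(p, **plan)).cdf(limit)


# Testing plans 7 and 8 (CV 30% and flour SD 0.020% taken as defaults).
PLAN_7 = dict(n_sub=3, seeds_per_sub=1000, flour_per_sub=2, reps_per_flour=2)
PLAN_8 = dict(n_sub=2, seeds_per_sub=4000, flour_per_sub=2, reps_per_flour=2)

for p in (0.0005, 0.001, 0.002):
    print(f"{p:.2%}: plan 7 {prob_accept_quant(p, 0.001, **PLAN_7):.3f}, "
          f"plan 8 {prob_accept_quant(p, 0.00037, **PLAN_8):.3f}")
```

Because the assumed normal distribution is centred on the true proportion, plan 7 accepts a lot at exactly 0.1% with probability 0.5, which is the 50:50 split described above, and plan 8 comes out close to 5% at the LQL. At 0.2%, the sketch gives a slightly lower acceptance probability for plan 7 than SeedCalc's 12.8%, so it should be read only as an approximation.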

Figure 5: Operating characteristic curve for Testing plan 8, which meets the requirements of the scenario "minimal risk to the seed purchaser". (Original SeedCalc print-out, with axis labels added as in Figure 1. Key: β (ist und soll) = β (actual and target).)

4.3 Conclusions

The two examples of quantitative detection methods lead to the conclusion that it is at present methodologically impossible, using quantitative PCR-based methods with a limit of quantification of 0.1%, to obtain quantitative measured values with which a testing plan can meet the requirements of the scenario "minimal risk to the seed purchaser" (LQL = 0.1%, β = 5%). This scenario can therefore only be met by using qualitative methods and testing a sufficiently large sample. Quantitative methods can still be used to carry out the qualitative testing plans, although their results must then be evaluated in purely qualitative terms. The sample size required for this is the same as in Testing plans 1 to 6. Consequently, what matters in this case is not the limit of quantification but the limit of detection. That is to say, milling 3000 seeds to produce a single flour sample is only possible if the limit of detection of the method is 0.03% or below. The limit of quantification is irrelevant here. The result to be reported is simply whether the sample is negative or positive, not the quantitative result. Introducing an official threshold value with the intention of using quantitative test results, while retaining the risk allocation from which the sample size of 3000 seeds was derived, is therefore not appropriate and is even counterproductive to a considerable degree.
Only by shifting the distribution of risk is it possible to achieve a broad applicability of quantitative methods of detection (see section 6.2).

5. Testing plans for the scenario "minimal risk to the seed seller"

In this case, an AQL of 0.1% and an α of 5% were stipulated. The testing plans must therefore ensure a very high probability that seed lots with actual values below the threshold value of 0.1% will produce a negative result. The scenario was illustrated in principle in Figure 2. For qualitative detection methods, the sample size falls drastically here, because if even a small sample produces a positive result, the probability that the actual value lies below the threshold value is already very low. Once again, the default values in the tables for the testing plans are marked in yellow, while the resulting values are not colour-marked.

5.1 Testing plans for qualitative detection methods

A testing plan that satisfies these default values for maximum producer protection (Testing plan 9) is given here as an example. Under this plan, a test sample of 51 seeds is examined, and the lot is rejected if the result is positive. This means that, for an actual value of 0.1% or less in the lot, the probability of the lot being (wrongly) rejected is 5% or less. For the producer, this is thus a relatively reliable test, as intended. On the other hand, it also means that lots with an actual value of 0.1% (that is to say, exactly the threshold value) will still escape rejection with a probability of 95%. The risk to the consumer is therefore relatively high. Lots with an actual proportion of 0.2% would still have a 90% probability of not being rejected. The testing plan can thus be summarized as follows: if a sample as small as 51 seeds already contains one or more GM seeds, it is very probable that the actual value of the lot lies above 0.1%, and the lot must consequently be rejected, even though high protection for the producer is assured.

Testing plan 9
LQL: n.d.*; β: n.d.*; AQL: 0.1%; α: 5%
Limit of detection: 0.1%; false-positive rate: 0%; false-negative rate: 0%
Number of sub-samples to be tested: 1
Number of seeds per sub-sample: 51
Total number of seeds to be tested: 51
Maximum permissible number of positive sub-samples: 0
* n.d. = not determined

False-positive and false-negative rates could of course be introduced again at this point, but this is not discussed in detail here. The limit of detection should not play any role with a test sample size of 51 seeds.

5.2 Testing plans for quantitative detection methods

Testing plans for quantitative detection methods in the scenario "minimal risk to the seed seller" also manage with quite small sample sizes, and the acceptance limit lies above the threshold value of 0.1% and thus also above the limit of quantification. On this basis alone, it is already clear that these testing plans satisfy the conditions and are also realistic. In this case, however, an increased risk is unavoidable that the seed purchaser obtains tested seed lots that produced an unremarkable test result but whose actual value nevertheless lies above the threshold value of 0.1%. For this reason, an LQL of 1.0%, at which a β of 5% can be achieved, has been introduced into Testing plan 10, for example. This value is arbitrary here; ultimately it should be the result of a political discussion, as illustrated in section 3.2. The values used here (in Testing plan 10 and also in Testing plan 11) are intended to show how this choice affects the testing plans.

Testing plan 10
LQL: 1.0%; β: 5%; AQL: 0.1%; α: 5%
Limit of detection: 0.1%; limit of quantification: 0.1%
Coefficient of variation: 30%; flour sub-samples SD: 0.020%
Number of sub-samples to be tested: 1
Number of seeds per sub-sample: 965
Number of flour samples per sub-sample: 1
Number of quantifications per flour sample: 1
Total number of quantifications: 1
Acceptance limit: 0.278%

This testing plan therefore involves milling a sample of 965 seeds, performing one determination on one flour sample, and rejecting the lot if the result lies above 0.278%. The acceptance limit of 0.278% lies well above the limit of quantification of 0.1%, so a relatively reliable determination should be possible.
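The acceptance limits of the quantitative plans can be read as "threshold plus a safety margin" (producer-oriented plans) or "threshold minus a safety margin" (consumer-oriented plans), the margin being a normal quantile times the standard deviation of the averaged result. The Python sketch below reuses the same assumed, simplified three-level variance model as a stand-in for SeedCalc's more complex model (binomial seed sampling, flour sub-samples SD 0.020%, CV 30%); all names are hypothetical. It derives the limit for Testing plan 10 from AQL and α, the limit for Testing plan 11 (2 sub-samples of 961 seeds, 2 flour samples per sub-sample, 2 quantifications per flour sample) in the same way, and the limit for Testing plan 8 from LQL and β.

```python
from statistics import NormalDist


def sd_of_mean(p, n_sub, seeds_per_sub, flour_per_sub, reps_per_flour,
               cv=0.30, flour_sd=0.00020):
    # ASSUMED model: binomial seed sampling + flour-sampling SD
    # + CV-based measurement error, averaged over all measurements.
    var = (p * (1 - p) / (n_sub * seeds_per_sub)
           + flour_sd ** 2 / (n_sub * flour_per_sub)
           + (cv * p) ** 2 / (n_sub * flour_per_sub * reps_per_flour))
    return var ** 0.5


def limit_from_aql(aql, alpha, **plan):
    """Limit at which a lot exactly at the AQL is rejected with probability alpha."""
    return aql + NormalDist().inv_cdf(1 - alpha) * sd_of_mean(aql, **plan)


def limit_from_lql(lql, beta, **plan):
    """Limit at which a lot exactly at the LQL is accepted with probability beta."""
    return lql - NormalDist().inv_cdf(1 - beta) * sd_of_mean(lql, **plan)


plan_10 = dict(n_sub=1, seeds_per_sub=965, flour_per_sub=1, reps_per_flour=1)
plan_11 = dict(n_sub=2, seeds_per_sub=961, flour_per_sub=2, reps_per_flour=2)
plan_8 = dict(n_sub=2, seeds_per_sub=4000, flour_per_sub=2, reps_per_flour=2)

limit_10 = limit_from_aql(0.001, 0.05, **plan_10)
limit_11 = limit_from_aql(0.001, 0.05, **plan_11)
limit_8 = limit_from_lql(0.001, 0.05, **plan_8)

# Consumer side of plan 10: chance that a lot at the LQL of 1.0% is accepted.
beta_10 = NormalDist(0.01, sd_of_mean(0.01, **plan_10)).cdf(limit_10)
print(f"plan 10: {limit_10:.3%}, plan 11: {limit_11:.3%}, "
      f"plan 8: {limit_8:.3%}, beta at 1.0% for plan 10: {beta_10:.3f}")
```

Under these assumptions, the derived limits come out at about 0.278%, 0.221% and 0.037%, and plan 10 accepts a lot at 1.0% with a probability of about 5%. This agreement is a useful plausibility check, but the simplified model should not be taken as SeedCalc's actual method.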

Figure 6: Operating characteristic curve for Testing plan 10, which meets the requirements of the scenario "minimal risk to the producer". (Original SeedCalc print-out; the axis labels correspond to those used in Figure 1. Key: β (ist und soll) = β (actual and target).)

In this testing plan, too, the probability of lots with an actual GMO contamination below 0.1% being wrongly rejected is at most 5%. On the other hand, it must also be taken into account that seed lots with an actual GMO contamination of 1.0% still escape rejection with a probability of 5%. If this risk is to be reduced, for example by lowering the LQL to 0.5%, more elaborate testing plans must be used, since the operating characteristic curve must then be steeper. Testing plan 11 is an example of this.

Testing plan 11
LQL: 0.5%; β: 5%; AQL: 0.1%; α: 5%
Limit of detection: 0.1%; limit of quantification: 0.1%
Coefficient of variation: 30%; flour sub-samples SD: 0.020%
Number of sub-samples to be tested: 2
Number of seeds per sub-sample: 961
Number of flour samples per sub-sample: 2
Number of quantifications per flour sample: 2
Total number of quantifications: 8
Acceptance limit: 0.221%

Two sub-samples of 961 seeds each must be tested here, and the seed lot is rejected if the result lies above 0.221%.

5.3 Conclusions

The testing plans presented here show that, at an official threshold value of 0.1%, a low risk to the producer can be achieved relatively easily if a higher consumer risk can be accepted. For this scenario, the introduction of an official threshold value of 0.1% (= AQL) is a very useful and appropriate default value, which gives producers a reliable procedure that is also practicable in terms of detection methods and testing plans.

6. Testing plans for the scenario "equal allocation of the risks"

Both of the preceding scenarios are one-sided: the first aims at a low consumer risk, the second at a low producer risk. The two risks of an incorrect decision about a seed lot are now to be distributed more or less equally between the two sides. LQL and β on the one hand, and AQL and α on the other, are defined in advance. It can already be anticipated that the testing effort increases with the intended steepness of the curve in Figure 1: the closer the curve is required to lie to the ideal line, the closer the necessary test sample size approaches the size of the lot. A problem here is that a threshold value of 0.1% is already very low, and only the interval between 0 and 0.1% is available for defining an AQL. Once again, the default values in the tables for the testing plans are marked in yellow, while the resulting values are not colour-marked.
6.1 Testing plans for qualitative detection methods
When formulating testing plans for qualitative test methods, an important aim is to keep the number of tested sub-samples as low as possible, because each additional sub-sample incurs extra preparation costs (cleaning the mills and separate extraction). When suitable testing plans were selected, the optimization criteria below were therefore, first, to meet the defined risks and, second, to minimize the number of sub-samples. For Testing plan 12, AQL was set at 0.03% and LQL at 0.2%. The resulting testing plans require at least 11 sub-samples and the processing of around 4500 seeds in total. Of the 11 to 15 tested sub-samples, a maximum of 3 may give a positive result without the lot being rejected.

Testing plan 12
LQL: 0.2%; β: 5%; AQL: 0.03%; α: 5%; limit of detection: 0.1%; false-positive rate: 0%; false-negative rate: 0%

Plan | Sub-samples tested | Seeds per sub-sample | Total seeds tested | Max. permissible positive sub-samples
a) | 11 | 450 | 4950 | 3
b) | 12 | 380 | 4560 | 3
c) | 13 | 350 | 4550 | 3
d) | 15 | 300 | 4500 | 3
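The operating characteristic of a qualitative plan such as Testing plan 12 can be approximated with a simple binomial model. The sketch below assumes, as the parameter line states, no false-positive or false-negative results; it treats each sub-sample as positive whenever it contains at least one GM seed and treats the lot as very large relative to the sample. SeedCalc's own calculation may differ in detail, and the function names are illustrative only.

```python
from math import comb


def subsample_positive_prob(p: float, seeds: int) -> float:
    """P(a sub-sample of `seeds` seeds contains at least one GM seed).

    Assumes perfect detection and independent seeds drawn from a very
    large lot with GM proportion p.
    """
    return 1.0 - (1.0 - p) ** seeds


def acceptance_prob(p: float, n_sub: int, seeds: int, max_pos: int) -> float:
    """P(at most `max_pos` of `n_sub` sub-samples test positive)."""
    q = subsample_positive_prob(p, seeds)
    return sum(comb(n_sub, k) * q**k * (1.0 - q) ** (n_sub - k)
               for k in range(max_pos + 1))


AQL, LQL = 0.0003, 0.002  # 0.03% and 0.2%, as in Testing plan 12

# Rows of Testing plan 12: (sub-samples, seeds per sub-sample, max. positives)
plans = {"a": (11, 450, 3), "b": (12, 380, 3), "c": (13, 350, 3), "d": (15, 300, 3)}

for label, (n_sub, seeds, max_pos) in plans.items():
    producer_risk = 1.0 - acceptance_prob(AQL, n_sub, seeds, max_pos)  # alpha
    consumer_risk = acceptance_prob(LQL, n_sub, seeds, max_pos)        # beta
    print(f"{label}) alpha = {producer_risk:.3f}, beta = {consumer_risk:.3f}")
```

Evaluating `acceptance_prob` over a grid of p values traces an operating characteristic curve of the kind shown in Figure 7.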

It will be appreciated that testing only 3000 seeds is no longer adequate for this equal distribution of the risks: around 4500 seeds are required, and they must be tested in 11 or more sub-samples. The risk to the producer is reduced, however, by tolerating positive sub-samples, because lots containing GM seeds in very small proportions below 0.1% then have a greater chance of not being rejected. The operating characteristic curves for Testing plan 12 are shown in Figure 7.

For Testing plan 13, by contrast, AQL was set at 0.06% and LQL at 0.15%, so that the requirement corresponds roughly to the illustration in Figure 1. These testing plans show a further significant increase in the cost of testing, and it is clear that the steeper curve comes at a higher cost. In practice, at least 25 sub-samples from each lot would have to be tested, and a total of around 16,000 to 20,000 seeds (around 4.5 to 6.5 kg in the case of maize) would have to be milled. This is clearly not realistic, so these requirements for lower risks to both producers and consumers cannot be met by means of qualitative testing plans.

Testing plan 13
LQL: 0.15%; β: 5%; AQL: 0.06%; α: 5%; limit of detection: 0.1%; false-positive rate: 0%; false-negative rate: 0%

Plan | Sub-samples tested | Seeds per sub-sample | Total seeds tested | Max. permissible positive sub-samples
a) | 25 | 1000 | 25,000 | 15
b) | 29 | 700 | 20,300 | 14
c) | 35 | 500 | 17,500 | 13
d) | 42 | 400 | 16,800 | 13
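The jump in testing effort from Testing plan 12 to Testing plan 13 can be illustrated with a brute-force search under the same binomial assumptions (perfect detection, very large lot): for a fixed number of seeds per sub-sample, find the smallest number of sub-samples and a maximum number of positives that keep both α and β at or below 5%. This is a sketch of the idea, not SeedCalc's algorithm; the function names are hypothetical.

```python
from math import comb
from typing import Optional, Tuple


def acceptance_prob(p: float, n_sub: int, seeds: int, max_pos: int) -> float:
    """P(at most `max_pos` positive sub-samples), perfect detection assumed."""
    q = 1.0 - (1.0 - p) ** seeds  # P(sub-sample contains >= 1 GM seed)
    return sum(comb(n_sub, k) * q**k * (1.0 - q) ** (n_sub - k)
               for k in range(max_pos + 1))


def smallest_plan(aql: float, lql: float, seeds: int,
                  alpha: float = 0.05, beta: float = 0.05,
                  max_n: int = 200) -> Optional[Tuple[int, int]]:
    """Return (sub-samples, max. positives) with the fewest sub-samples
    that meet both risks, or None if no plan up to `max_n` qualifies."""
    for n_sub in range(1, max_n + 1):
        for max_pos in range(n_sub + 1):
            if (acceptance_prob(aql, n_sub, seeds, max_pos) >= 1.0 - alpha
                    and acceptance_prob(lql, n_sub, seeds, max_pos) <= beta):
                return n_sub, max_pos
    return None


print(smallest_plan(0.0003, 0.002, 450))    # requirements of Testing plan 12
print(smallest_plan(0.0006, 0.0015, 1000))  # requirements of Testing plan 13
```

Comparing the two printed plans makes the cost of the steeper operating characteristic concrete: narrowing the gap between AQL and LQL drives up the number of sub-samples much faster than the seed count per sub-sample.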

Figure 7: Operating characteristic curve for Testing plan 12, which corresponds to the scenario "equal distribution of the risks". (Original print-out from SeedCalc; the axis labelling corresponds to that used in Figure 1.)

6.2 Testing plans for quantitative detection methods
Once again, AQL was set at 0.03% and LQL at 0.2%. Testing plan 14 a) shows that around 4000 seeds in total would be required here, but also that the acceptance limit lies at 0.081%, i.e. below the limit of quantification. This testing plan is therefore not realistic. If an acceptance limit of at least 0.1% is adopted as a default value, Testing plan 14 b) is appropriate: 6 sub-samples of 950 seeds each (5700 seeds in total) would have to be tested, and the lot of seed must be rejected if the average of the 24 quantifications lies above 0.1%.

Testing plan 14
LQL: 0.2%; β: 5%; AQL: 0.03%; α: 5%; limit of detection: 0.1%; limit of quantification: 0.1%; coefficient of variation: 30%; flour sub-sample SD: 0.020%

Plan | Sub-samples tested | Seeds per sub-sample | Flour samples per sub-sample | Quantifications per flour sample | Total quantifications | Acceptance limit
a) | 4 | 1000 | 2 | 2 | 16 | 0.081%
b) | 6 | 950 | 2 | 2 | 24 | 0.1%

If, however, a more precise method of detection with a coefficient of variation of only 15% and a flour sub-sample SD of only 0.010% were to be available, only minor changes would result here compared with Testing plan 14 b). An example of this is given in Testing plan 15, for which the operating characteristic curve is shown in Figure 8. The explanation for this is that more than 95% of the uncertainties that occur in Testing plan 14 b) are caused by the sample size (5700 seeds) and fewer than 5% by the method. If the method is now more precise, the effect on the reliability of the process as a whole will be only marginal.
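The claim that sampling, rather than the method, accounts for more than 95% of the uncertainty can be checked with a rough variance decomposition. The sketch below rests on our own assumptions, not necessarily SeedCalc's model: the reported result is the mean of all quantifications, and three independent components add up, namely binomial seed sampling p(1 - p)/N over all N seeds, the flour sub-sample SD averaged over all flour samples, and a measurement error with the stated coefficient of variation averaged over all quantifications. Parameter names are illustrative.

```python
from math import sqrt


def variance_components(p, n_sub, seeds, n_flour, n_quant, cv, flour_sd):
    """Assumed variance components (proportion units) of the mean GMO estimate."""
    n_flour_total = n_sub * n_flour
    n_quant_total = n_flour_total * n_quant
    return {
        "seed sampling": p * (1.0 - p) / (n_sub * seeds),
        "flour sub-samples": flour_sd ** 2 / n_flour_total,
        "measurement": (cv * p) ** 2 / n_quant_total,
    }


p = 0.002  # evaluate at the LQL of 0.2%
# Testing plan 14 b): 6 x 950 seeds, CV 30%, flour SD 0.020%
plan_14b = variance_components(p, 6, 950, 2, 2, cv=0.30, flour_sd=0.0002)
# Same 5700 seeds, but with the more precise method of Testing plan 15
precise = variance_components(p, 6, 950, 2, 2, cv=0.15, flour_sd=0.0001)

total_14b = sum(plan_14b.values())
share_sampling = plan_14b["seed sampling"] / total_14b
sd_ratio = sqrt(sum(precise.values()) / total_14b)

print(f"share of variance from seed sampling: {share_sampling:.1%}")
print(f"SD with precise method / SD with original method: {sd_ratio:.3f}")
```

Under these assumptions, halving both the coefficient of variation and the flour SD changes the total spread only marginally, because the seed-sampling term is untouched.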

Figure 8: Operating characteristic curve for Testing plan 15, which corresponds to the scenario "equal distribution of the risks". (Original print-out from SeedCalc; the axis labelling corresponds to that used in Figure 1.)

Testing plan 15
LQL: 0.2%; β: 5%; AQL: 0.03%; α: 5%; limit of detection: 0.1%; limit of quantification: 0.1%; coefficient of variation: 15%; flour sub-sample SD: 0.010%

Sub-samples tested | Seeds per sub-sample | Flour samples per sub-sample | Quantifications per flour sample | Total quantifications | Acceptance limit
6 | 910 | 2 | 2 | 24 | 0.1%

As Testing plan 15 shows, the significantly more precise method reduces the overall sample size only from 5700 seeds (Testing plan 14 b)) to 5460 seeds, which is no real simplification. In the case of Testing plan 14 a), the more precise method would allow the acceptance limit to be raised from 0.081% to 0.083% (testing plan not illustrated), but this is no solution either, since the acceptance limit would still lie below the limit of quantification. This shows that what is critical is not the precision of the method in quantifying the GMO contamination in the sample, but the representativity of the sample for the lot of seed. Representativity, however, can only be improved by increasing the sample size, which in turn increases the cost of testing. Testing plan 16 simply applies Testing plan 7 (sample size of 3000 seeds, acceptance limit of 0.1%) and asks how low an LQL and how high an AQL can be achieved under these conditions. The result is an LQL of 0.3% and an AQL of 0.03%, with α = β ≤ 5% as usual. The associated operating characteristic curve is shown in Figure 9.

Testing plan 16
LQL: 0.3%; β: 5%; AQL: 0.03%; α: 5%; limit of detection: 0.1%; limit of quantification: 0.1%; coefficient of variation: 30%; flour sub-sample SD: 0.020%

Sub-samples tested | Seeds per sub-sample | Flour samples per sub-sample | Quantifications per flour sample | Total quantifications | Acceptance limit
1 | 3000 | 2 | 1 | 2 | 0.1%
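Under a normal approximation, the acceptance probability of a quantitative plan is the probability that the estimated mean stays at or below the acceptance limit. The sketch below assumes (our assumption, not SeedCalc's documented model) that the result is the mean of all quantifications and that seed sampling, flour sub-sampling and measurement contribute independent variances. It evaluates Testing plan 16; the function and parameter names are illustrative.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def acceptance_prob(p, n_sub, seeds, n_flour, n_quant, cv, flour_sd, limit):
    """P(mean of all quantifications <= limit), normal approximation."""
    n_flour_total = n_sub * n_flour
    var = (p * (1.0 - p) / (n_sub * seeds)                # seed sampling
           + flour_sd ** 2 / n_flour_total                # flour sub-samples
           + (cv * p) ** 2 / (n_flour_total * n_quant))   # measurement
    return normal_cdf((limit - p) / sqrt(var))


# Testing plan 16: 1 x 3000 seeds, 2 flour samples, 1 quantification each
plan_16 = dict(n_sub=1, seeds=3000, n_flour=2, n_quant=1,
               cv=0.30, flour_sd=0.0002, limit=0.001)

for pct in (0.03, 0.05, 0.1, 0.3):
    pa = acceptance_prob(pct / 100, **plan_16)
    print(f"{pct:.2f}% GMO: P(accept) = {pa:.3f}, P(reject) = {1 - pa:.3f}")
```

With these assumptions the sketch lands close to the figures quoted for Testing plan 16 (50% acceptance at 0.1%, about 5% acceptance at 0.3%, about 13% rejection at 0.05%).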

Figure 9: Operating characteristic curve for Testing plan 16, which corresponds to the scenario "equal distribution of the risks". (Original print-out from SeedCalc; the axis labelling corresponds to that used in Figure 1.)

This testing plan is realistic for the assumed limits of detection and quantification, and it is based on a sample size of 3000 seeds, which is also regarded as realistic in practice. However, it must be borne in mind that, under this testing plan, a lot with 0.1% GMO contamination will be accepted with a probability of 50%, a lot with 0.3% will still be accepted with a probability of 5% and, conversely, a lot with an actual contamination of, for example, 0.05% (that is to say, below the threshold value) will be rejected with a probability of 13%. In summary, although this testing plan is highly practicable, it entails the greatest risks to both producer and consumer. If these definitions of producer risk and consumer risk were nevertheless acceptable, Testing plan 17 gives the corresponding testing plans for qualitative methods of detection.

Testing plan 17
LQL: 0.3%; β: 5%; AQL: 0.03%; α: 5%; limit of detection: 0.1%; false-positive rate: 0%; false-negative rate: 0%

Sub-samples tested | Seeds per sub-sample | Total seeds tested | Max. permissible positive sub-samples
6 | 435 | 2610 | 2
7 | 360 | 2520 | 2

Around 2600 seeds in total are required in this testing plan, which is no real saving in sample material or preparation effort compared with the 3000 seeds of the quantitative testing plan. On the contrary, the separate preparation of the 6 sub-samples required in Testing plan 17 costs more than the preparation of the single sample in Testing plan 16. Here, the quantitative method of detection would offer advantages over the qualitative one.

6.3 Conclusions
If the risks to producers and consumers at a threshold value of 0.1% are divided equally, practicable testing plans are possible with both qualitative and quantitative methods of detection. The risks to both parties are then no greater than the risks that producers or consumers would have to face in scenarios 1 and 2. If this allocation of risk were accepted for the official threshold value of 0.1%, a testing plan based on a quantitative method of detection would even be preferable. As already indicated, however, the size and distribution of the risks is a default value for GM testing that has to be derived from a political assessment, and this assessment does not yet exist.

7. Effects of the 0% threshold value for non-permitted GMOs
So far, the testing plans have been discussed as if testing for compliance with the official threshold value of 0.1% were the only test to be performed. Under existing statutory regulations, however, a threshold value of 0% applies to non-approved GMOs, and seed contaminated with such GMOs must not be placed on the market. The question of how the two tests (0% for non-approved GMOs and 0.1% for approved GMOs) can best be combined is addressed below. Three stages must be considered: 1. screening, 2. identification, 3. quantification. How these three stages are combined into a procedure depends on the expected nature and level of the GMO contamination and, above all, on the expected frequency of GMO-contaminated lots. It is therefore not possible to develop a generally applicable solution that is efficient in all situations; this is the responsibility of the GMO test laboratory concerned.
The following rules must nevertheless be observed:

1. For testing the threshold value of 0%, the testing plan designated here as Testing plan 1 (or Testing plan 2 for a limit of detection of 0.1%) has been generally introduced. The procedure must therefore establish that a test sample of 3000 seeds is free from non-approved GMOs, so every procedure requires a detection step with a sample size of at least 3000 seeds.

2. For testing the official threshold value of 0.1%, the question arises of which scenario to select.

a) If the scenario "minimal risk to the consumer" is selected, further testing is relatively easy, because a sample size of 3000 seeds is required in this case, too, with qualitative testing plans. For example, a test method could begin with a screening of a sample of 3000 seeds. In the event of a positive result (= GMO found), an identification would then be necessary, or at least the allocation of the GMO to the category "approved GMO" or "non-approved GMO", and this allocation would then determine which measures must be taken. This method would be very uncomplicated because of the uniform sample size and decision rule, and it would be quite simple to perform in the laboratory. Quantitative detection methods would be neither necessary nor appropriate in this test procedure (except, where necessary, to rule out dust contamination).

b) If the scenario "minimal risk to the producer" is selected, the testing plans for the two categories "approved GMO" and "non-approved GMO" would naturally differ. A practical solution could be the qualitative testing of 3 sub-samples of 1000 seeds each (Testing plan 2). If one or more sub-samples are positive (i.e. contain GMOs), an identification follows. If the GMO is a non-approved GMO, the test is concluded, and the seed lot must not be placed on the market. If, on the other hand, the GMO is an approved GMO, a quantification must be performed on the positive sub-samples (or one of them) according to Testing plan 11, and the lot must be rejected if the acceptance limit is exceeded. Since the sub-sample size in Testing plan 11, a theoretical 961 seeds, is close to the 1000-seed sub-samples of Testing plan 2, combining these two testing plans, and with them qualitative and quantitative measures, is entirely suitable and appropriate. A total sample size of 3000 seeds is thus sufficient in this scenario.

c) If the scenario "equal distribution of the risks" is selected, the testing plans for the two categories "approved GMO" and "non-approved GMO" would also differ. Again, a practical solution could be the qualitative testing of 3 sub-samples of 1000 seeds each (Testing plan 2). If one or more sub-samples are positive (i.e. contain GMOs), an identification follows. If the GMO is a non-approved GMO, the test is concluded, and the seed lot must not be placed on the market. If, on the other hand, the GMO is an approved GMO, further tests are required, whose scope and sample sizes depend on how the risks to producers and consumers are defined. If the risks underlying Testing plan 16 were accepted (LQL = 0.3%, AQL = 0.03%, β = α = 5%), further testing would be relatively simple: the material milled from the 3 sub-samples of 1000 seeds each would be mixed, and the flour samples required for quantification taken from it. If lower risks than those of Testing plan 16 are required, further sub-samples must be milled in addition to the 3 already milled once an approved GMO has been identified, so that Testing plan 12 or 14 can be applied. If approved GMOs occur more frequently, it may also be appropriate to perform the screening with the sub-sample sizes that will subsequently be needed for quantification.

It will thus be appreciated that the tests for "approved GMO" and "non-approved GMO" can, and indeed must, be combined in an entirely practical manner. In particular, the samples in which GMO contamination was detected during screening must be carried forward into the subsequent identification and quantification. If one of these steps were performed in isolation on a new sample, there would be no guarantee that the new sample contains any GMO contamination at all. A sample that gives a positive result must therefore be included in the rest of the procedure.
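The decision sequence described for scenario b) can be summarised as a small function. This is an illustrative sketch only: the function, its parameters and the simplification to a single GMO per lot are hypothetical, and the identification and quantification results are assumed to be supplied by the laboratory.

```python
from enum import Enum
from typing import Optional


class Decision(Enum):
    ACCEPT = "accept"
    REJECT = "reject"


def scenario_b_decision(positive_subsamples: int,
                        approved: Optional[bool] = None,
                        mean_gmo_percent: Optional[float] = None,
                        acceptance_limit_percent: float = 0.221) -> Decision:
    """Hypothetical sketch of the combined procedure for scenario b)."""
    # Stage 1: screening of 3 sub-samples of 1000 seeds each (Testing plan 2)
    if positive_subsamples == 0:
        return Decision.ACCEPT
    # Stage 2: identification of the GMO found in the positive sub-sample(s)
    if approved is None:
        raise ValueError("a positive screening result requires identification")
    if not approved:
        return Decision.REJECT  # 0% threshold for non-approved GMOs
    # Stage 3: quantification on the positive material (Testing plan 11)
    if mean_gmo_percent is None:
        raise ValueError("an approved GMO requires quantification")
    if mean_gmo_percent > acceptance_limit_percent:
        return Decision.REJECT
    return Decision.ACCEPT


print(scenario_b_decision(0))
print(scenario_b_decision(1, approved=False))
print(scenario_b_decision(2, approved=True, mean_gmo_percent=0.15))
```

Raising an error when a required result is missing reflects the point made above: a positive sample must be carried through identification and quantification rather than replaced by a fresh sample.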

In summary, the sample size for combined testing for "approved GMO" and "non-approved GMO" is at least 3000 seeds, and even larger test samples may be required for the 0.1% threshold, depending on the scenario. The test sample can be subdivided into sub-samples in such a way that they serve both tests, which can lead to highly practical solutions.

8. Examination of reserve samples
All the testing plans presented here, and the associated risks of incorrect decisions about lots of seed, assume that the precisely defined sample sizes are tested in a single operation: a decision is reached solely on the basis of a single test of a submitted sample. No provision is made for "confirming" positive results, which lead to rejection, by a second test. Nevertheless, such a procedure may be desirable. In that case, however, the risks to those involved shift, and new operating characteristic curves must be worked out. A few examples illustrate these shifts.

1st case: A first test on 3000 seeds has produced a positive result. The contamination is very weak, evidently due to a single GM seed. Under Testing plan 2, which is assumed to have been used here, the decision threshold has been exceeded, and the lot of seed would have to be rejected. A second test is now performed, however, on an available reference sample, and this test gives a negative result. As a consequence, the lot of seed is not rejected. The question is what risk this implies for the consumer. It can be extended directly to cases in which not just one but 2 or even 3 samples were positive before the final sample gave a negative result. Figure 10 shows the results.
The consumer risk clearly grows as the number of positive samples increases, so the requirement LQL = 0.1% with β = 5% can no longer be met. Figure 10 also shows the operating characteristic curves for the cases in which 3 or 4 samples were analysed and the final sample was always negative; the acceptance probability at LQL = 0.1% continues to rise.
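The shift shown in Figure 10 follows from a short calculation. The sketch below assumes perfect detection, samples of 3000 seeds drawn independently from a very large lot, and a rule under which up to k samples are tested and the lot is accepted as soon as one of them is negative; the function names are illustrative.

```python
def p_negative(p: float, seeds: int = 3000) -> float:
    """P(a test sample contains no GM seed), assuming perfect detection."""
    return (1.0 - p) ** seeds


def acceptance_with_retests(p: float, max_tests: int, seeds: int = 3000) -> float:
    """P(accept) when up to `max_tests` samples are tested and the lot is
    accepted as soon as one sample is negative (only the last test counts)."""
    return 1.0 - (1.0 - p_negative(p, seeds)) ** max_tests


LQL = 0.001  # 0.1%
for k in range(1, 5):
    print(f"{k} test(s): P(accept at LQL) = {acceptance_with_retests(LQL, k):.3f}")
```

With a single test the consumer risk at 0.1% stays just under 5%; every permitted retest adds roughly another 5 percentage points at this contamination level.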

Figure 10: Operating characteristic curves for the supplementary testing of reference samples of 3000 seeds each, with the decision rule requiring the last of these to be negative. (Original print-out from SeedCalc; the axis labelling corresponds to that used in Figure 1.)

2nd case: Suppose that false-positive results occur in a GMO laboratory at a frequency of around 5%. The laboratory and the enforcement agencies would then both have reason to doubt positive results from this laboratory. One solution could be to carry out a second test on a reference sample, on the understanding that, if this second result is negative, the lot is not rejected. The operating characteristic curves shown in Figure 11 result for this combination. First, Figure 11 shows that the decision rule derived from both tests, "at most one of 6 samples of 1000 seeds each may be positive", gives a greater probability of acceptance for lots with an actual GMO proportion of 0% (the broken line starts at a higher point on the y-axis than the solid line). The aim of not being misled by a false-positive result when the lot is in fact free from GMOs is therefore achieved. In addition, when both tests are taken into account, the acceptance probabilities for lots with actual proportions between 0.05 and 0.1% are lower than when only the first sample is considered, which reduces the consumer risk. On the whole, therefore, the strategy is appropriate. It should be noted, however, that these conclusions apply only to a false-positive rate of 5%, a value deliberately set quite high in order to illustrate the effects clearly.

Figure 11: Operating characteristic curves for qualitative tests with a false-positive rate of 5%. The solid line relates to the case in which no positive sample is tolerated in a first test (3 × 1000 seeds); the broken line relates to the case in which one of the three samples in the first test was positive and all 3 sub-samples of the reference sample were negative. (Original print-out from SeedCalc; the axis labelling corresponds to that used in Figure 1.)

3rd case: A seed lot was analysed according to Testing plan 1, and the resulting decision was notified to the owner of the lot, who now requests a retest on a reference sample according to the same Testing plan 1. This gives a second result, on the basis of which a decision about the lot can again be reached. Combinatorially, four outcomes are possible (stated briefly): 1. both accept the lot; 2. both reject the lot; 3. first accept, then reject; 4. first reject, then accept. The question is how likely the two decisions are to agree and how likely they are to disagree. These probabilities are illustrated for Testing plan 1 (3000 seeds tested, 0 positive seeds) on the left in Figure 12.

Figure 12 shows that deviating decisions (blue) must be expected, and that they are most frequent at actual GMO contaminations in the lot of around 0.025%. The case of most interest in practice, a positive first result followed by a negative second result (dark blue), occurs here at a rate of about 25%. In a lot with an actual GMO contamination of 0.025%, around 53% of the test results are positive; when these 53% are retested, the first result is confirmed in only 28% of all cases, and for the remaining 25% (53 - 28) the second result is negative. This is the extreme case, but it shows that, at correspondingly low contamination levels, the statistical probability of confirming a first positive result in a second test is only around 50%. It should be noted that this is due neither to incorrect determinations by the laboratory nor to poor sampling, but solely to the probability that GM seeds are present in the test samples under random sampling. At higher actual levels of GMO contamination in the lot of seed, concurrent rather than deviating results of the two tests predominate.
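The percentages for the 3rd case can be reproduced under simple assumptions: perfect detection, a lot large enough that two samples of 3000 seeds are independent, and Testing plan 1 rejecting the lot whenever a sample contains at least one GM seed. The function name is illustrative.

```python
def two_test_outcomes(p: float, seeds: int = 3000) -> dict:
    """Probabilities of the four decision pairs for two independent tests
    under Testing plan 1 (reject if at least one GM seed is found)."""
    r = 1.0 - (1.0 - p) ** seeds  # P(reject) in a single test
    return {
        "reject + reject": r * r,
        "reject + accept": r * (1.0 - r),
        "accept + reject": (1.0 - r) * r,
        "accept + accept": (1.0 - r) ** 2,
    }


out = two_test_outcomes(0.00025)  # 0.025% actual GMO contamination
for outcome, prob in out.items():
    print(f"{outcome}: {prob:.3f}")

# Deviating results (either order) peak where a single test rejects with
# probability 0.5, i.e. where (1 - p) ** 3000 = 0.5
p_peak = 1.0 - 0.5 ** (1.0 / 3000)
peak = two_test_outcomes(p_peak)
deviating_at_peak = peak["reject + accept"] + peak["accept + reject"]
print(f"peak at {p_peak:.5%}: deviating = {deviating_at_peak:.3f}")
```

Under these assumptions the deviating decisions reach a maximum of 50% at a contamination of roughly 0.023%, which matches the "order of 0.025%" read from Figure 12.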

Figure 12: Probabilities of concurrent (green) and deviating (blue) decisions (rejection or acceptance of a lot of seed) for two tests (3000 seeds tested, 0 positive seeds) performed on different samples from the same lot, as a function of the actual GMO contamination in the lot. [Axes: probability (%) against actual GMO contamination (%). Curves: rejection + rejection; rejection + acceptance; acceptance + rejection; acceptance + acceptance; concurrent and deviating results (added); deviating results only (added).]

These three examples show that, when a test result fails to comply, testing a reference sample and acting on its result shifts the risks. The frequently discussed conclusion that a positive result not confirmed by a second test on a reference sample must have been a false positive is not acceptable. The mere possibility that the result of the second test is a false negative shows that this argument cannot be sustained. Depending on the actual GMO proportion in the lot, results from reference samples taken from the submitted sample can even be expected to differ with a probability of 50%. False-positive or false-negative results therefore cannot be identified in this way. They can only be determined by testing specially prepared samples of known composition, and every laboratory is responsible for practically excluding the risk of false positives and false negatives in routine work by means of appropriate positive and negative controls. For this reason, participation in national or international ring trials should be a duty for every GMO test laboratory.

9. Bibliography

Kruse, M. (2001). Probenahmepläne zur Nachprüfung der Verunreinigungen von konventionellen Saatgut mit GVO Saatgut. (Sampling plans for checking the contamination of conventional seed with GMO seed). VDLUFA-Schriftenreihe 57, Teil 2 (Part 2): 562-567.

ISTA (2007). SeedCalc 7.01. www.seedtest.org

Remund, K., Dixon, D., Wright, D. and Holden, L. (2001). Statistical considerations in seed purity testing for transgenic traits. Seed Science Research 11: 101-119.

Anonymous (2006). Konzept zur Untersuchung von Saatgut auf Anteile gentechnisch veränderter Pflanzen. (Concept for the testing of seed for proportions of genetically modified plants). Subcommittee on Methodological Development at the Federal/State Genetic Engineering Association. www.lag-gentechnik.de/dokumente/Saatgutkonzept_2006.pdf

Anonymous (2006). Empfehlungen für ein einheitliches Vorgehen der Überwachungsbehörden bei GVO-Anteilen mit zugelassenen GVO. Beratungsvorlage A zu TOP 3.3 der 32. Sitzung der LAG. (Recommendations for a standard procedure for the monitoring authorities with regard to GMO proportions containing permitted GMOs. Consultation Paper A to TOP 3.3 of the 32nd session of the LAG).

Anonymous (2007). UAM-Stellungnahme zum LAG-Auftrag "Konsequenzen zu dem Vollzugsschwellenwert 0.1% bei Saatgut", Stand 23.3.07. (UAM opinion on the LAG commission "Consequences for the official threshold value of 0.1% in seeds", status 23.03.2007).

Anonymous (2005). Harmonisierte experimentelle Saatgutüberwachung auf GVO-Anteile. (Harmonized experimental seed monitoring for GMO levels). Federal/State Genetic Engineering Association (LAG) Manual, status Nov. 2005.

Anonymous (2004). Commission Recommendation on technical guidance for sampling and detection of genetically modified organisms and material produced from genetically modified organisms as or in products in the context of Regulation (EC) No. 1830/2003. Official Journal of the European Union, L348/18 of 24.11.2004.