Shelf life determination using sensory evaluation scores: A general Weibull modeling approach ☆

Marta A. Freitas a,*, Josenete C. Costa b

a Departamento de Engenharia de Produção, E.E., Universidade Federal de Minas Gerais, R. Espírito Santo 35, CEP 30160-030, Belo Horizonte, MG, Brazil

b Departamento de Estatística, ICEX, Universidade Federal de Minas Gerais, CEP 31270-901, Belo Horizonte, MG, Brazil

Received 28 September 2005; received in revised form 15 May 2006; accepted 18 May 2006

Available online 1 September 2006

Abstract

Sensory evaluations to determine the shelf life of food products are routinely conducted in food experimentation as a part of each product development program. In such experiments, trained panelists are asked to judge food attributes by reference to a scale. The "failure time" associated with a product unit under test is usually defined as the time required to reach a cut-off point previously defined by the food company. Due to the destructive nature of these evaluations, one never knows the exact "failure time" of a given unit. Consequently, data arising from these studies are either left or right censored. This article deals with the problem of modeling this kind of data for shelf life determination and develops a general Weibull model (GWM). Simulations indicate a good performance of the parameter estimates obtained through the GWM. The modeling approach is applied to a real data set.

© 2006 Elsevier Ltd. All rights reserved.

Keywords: Maximum likelihood; Sensory evaluations; Shelf life; Weibull distribution

1. Introduction

Some food products require extended aging under controlled conditions of storage to develop characteristic flavors. The unique, piquant taste of blue cheese is attributable to mold growth and enzymatic activity promoting fermentation of protein and carbohydrate, hydrolysis of fat, and secondary chemical reactions. The taste and odor of freshly distilled spirits, particularly whiskey, is rather raw and unpleasant; desired flavor components develop during years of aging in wood. Unfortunately, not all food and beverage products increase in sensory quality with storage or holding time.

Problems related to storage stability are common to the food industry, which is why storage studies are an essential part of every product development, improvement, or maintenance program. Some studies center on the rate of degradation, and others on estimating the shelf life: the length of time required for the product to become unfit for human consumption. By unfit for human consumption it is meant that the product exhibits physical, chemical, microbiological, or sensory characteristics that are unacceptable for regular consumption.

0360-8352/$ - see front matter © 2006 Elsevier Ltd. All rights reserved.
doi:10.1016/j.cie.2006.04.005

☆ This manuscript was processed by Area Editor Gray L. Hogg.
* Corresponding author.
E-mail addresses: [email protected] (M.A. Freitas), [email protected] (J.C. Costa).

Computers & Industrial Engineering 51 (2006) 652–670
www.elsevier.com/locate/dsw

The manufacturer attempts to develop a product with the longest practical shelf life, consistent with costs and the pattern of handling and use by distributors, retailers, and consumers. Inadequate shelf life determination will lead to consumer dissatisfaction or complaints. At best, such dissatisfaction will eventually affect the acceptance and sales of brand name products. At worst, it can lead to malnutrition and even illness. For these reasons, food processors pay great attention to adequate storage stability or shelf life.

When one talks about determining the shelf life, chemical, physical, microbiological and nutritional analyses are fundamental, but equally important are the sensory characteristics of the product. It is a fact that many consumers purchase a product on the basis of the sensory experience it delivers, for instance, sweetness, softness, chocolateness, aspect, odor, flavor, aftertaste, etc. For this reason, the role of sensory evaluations for shelf life determination is becoming more important as the value of these tests becomes recognized and as the consumer industry increases its use of these methods.

In such experiments, a sample of product units is stored under certain conditions and periodically, at pre-specified evaluation times, a sample of units is collected from those stored in a given condition and subjected to sensory evaluations by trained panelists. A frequent test procedure is to ask each panelist to judge each attribute separately by reference to a rating scale, for instance, a seven-point rating scale varying from 0 ("no difference") to 6 ("total difference"). Each "test unit" is compared to a "control unit" and the scores are the result of this comparison. Because of the destructive nature of the evaluations, units that have already been evaluated at a given time cannot be restored to be evaluated later on.

The usual approach to the modeling and analysis of this kind of data is to fit a regression line (obtained by the method of least squares) relating the scores (z) and the pre-specified evaluation times (x). An estimate of the shelf life is then obtained by solving the fitted regression equation for x and replacing the score (z) by the cut-off point c (which indicates "product failure") previously chosen by the food company. Gacula and Singh (1984) presented examples in which regression models were implemented to estimate the shelf life. However, this approach offers difficulties since, in general, the assumptions of Normality and homoscedasticity (constant variance), which are basic requirements of Regression Analysis (Draper & Smith, 1998), are not valid for sensory scores. To overcome this difficulty, Gacula and Singh (1984) suggested some transformations of the experimental data to new scales where the Normality assumption may be approximately satisfied. Alternatives to overcome the violation of the homoscedasticity assumption include the use of variance-stabilizing transformations and weighted least squares. However, these procedures, extensively used in other practical problems, seem to be of no use for this kind of data (see Gacula & Singh, 1984).
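The regression approach described above can be sketched as follows. This is only an illustration of the "fit a line, then solve for the cut-off" idea: the function names and the scores are hypothetical (they are not data from the study), and the sketch deliberately ignores the Normality and homoscedasticity problems just discussed.

```python
def fit_line(x, z):
    """Ordinary least squares fit of z = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_z = sum(z) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxz = sum((xi - mean_x) * (zi - mean_z) for xi, zi in zip(x, z))
    b = sxz / sxx
    a = mean_z - b * mean_x
    return a, b


def shelf_life_from_line(a, b, cutoff):
    """Solve a + b * x = cutoff for x (the estimated shelf life)."""
    return (cutoff - a) / b


# Hypothetical mean panel scores by week (assumed scale: 6 = no difference).
weeks = [0, 4, 8, 12, 16]
scores = [6.0, 5.5, 5.0, 4.5, 4.0]

a, b = fit_line(weeks, scores)
estimate = shelf_life_from_line(a, b, cutoff=3)
```

With these made-up scores the fitted line is z = 6 - 0.125x, so the cut-off c = 3 is reached at x = 24 weeks, well beyond the last evaluation, which shows how the estimate can rest entirely on extrapolation.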

An alternative approach is to fit a parametric lifetime model, such as the Weibull or lognormal (just to name a few), to the "time to failure" data, in other words, to the time at which a specified deterioration level has been reached. But it should be pointed out that in these experiments one does not observe the actual "time to failure" of a given unit. For a unit evaluated at a pre-established time, one of two situations might happen: the score is either less than or equal to the cut-off point c (indicating that the "failure" has already occurred) or greater than c. In the former case the "failure time" is somewhere between the start of the experiment and the present time of evaluation. In the latter case, the product is still appropriate for consumption and its "failure" will take place sometime after evaluation, but it will not be observed due to the destructive nature of the experimental trials. Thus, data coming from these sensory evaluations are either left or right censored, respectively (Meeker & Escobar, 1998). Gacula and Kubala (1975) presented examples of data analysis for shelf life determination. The authors used the evaluation times as approximations to the real failure time (eliminating the left censoring). Shelf life was then estimated using the Weibull distribution fitted to the approximated failure times and the right censored observations. Discussing specific designs for shelf life determination, Gacula (1975) suggested the use of a "staggered design" to implement weekly sensory evaluations. In this design the intervals between evaluations are larger at the beginning of the experiment and smaller towards the end. The number of units sampled increases towards the end of the experiment. The idea is to get closer to the "real" failure time. In the examples discussed in that paper, the author also uses the evaluation times as "failure times" when the score is less than the cut-off point.

Two problems associated with the above-mentioned approach should be pointed out: (1) the failure time model (in those specific cases, Weibull) is not used appropriately since the evaluation times are fixed and (2) as a consequence of the approximation (i.e., the use of the evaluation times as if they were the actual failure times) there is a great risk of overestimating the shelf life.

In an attempt to incorporate the information from both the left and right censored data, Freitas, Borges, and Ho (2003) proposed an alternative statistical model. The basic idea was a dichotomization of the original score data, by defining at each evaluation time $\tau_i$ a Bernoulli random variable $Y_{ij}$ with parameter $p_{ij}$ (probability of a score $\le c$). Assuming a Weibull (scale; shape) = $(\alpha_j; \delta)$ as the underlying distribution for the shelf life time, the probability $p_{ij}$ was then defined as $p_{ij} = P(0 < T_{ij} \le \tau_i)$, where $T_{ij}$ is the "time to failure" of the $j$th unit evaluated at time $\tau_i$. The authors allowed the scale parameter of the Weibull distribution to be linearly dependent on explanatory variables (covariates or factors of an experimental design). The parameters were estimated by the maximum likelihood method. The model was applied to data from a real situation where product units were stored in three different conditions, and percentiles of the shelf life distribution and fractions of "defectives" were estimated for each one of them. A small simulation study was implemented considering only the basic sample plan used in the real experiment. Later, Freitas, Borges, and Ho (2004) expanded the simulation study, focusing on the effect of several aspects, such as the total sample size and the proportion of allocation to each experimental condition, among others, on the bias and precision of the estimates obtained with the model.

In this paper, we generalize the modeling approach proposed by Freitas et al. (2003). The idea is to continue working with the basic dichotomized data as described above, but to use as the underlying distribution a Weibull where both parameters, scale ($\alpha_j$) and shape ($\delta_j$), are dependent on explanatory variables.

The article is organized as follows. In Section 2, we describe the real motivating situation. In Section 3, we briefly review the modeling approach as in Freitas et al. (2003) and introduce the general Weibull model (GWM). In Section 4, we present our simulation results on the bias and precision of the estimators. In Section 5, we apply the GWM to the same real data used in the work by Freitas et al. (2003) and compare the results to the ones reported by them. We provide concluding remarks in Section 6 and some derivations and technical details of expressions used in Section 3 in Appendix A.

2. Motivating situation revisited

We briefly present here the situation described by Freitas et al. (2003).

Sensory evaluations were conducted by a food company at the laboratory level in order to determine the shelf life of a manufactured dehydrated product, stored at different environmental conditions. Three attributes were evaluated by trained panelists: odor, flavor, appearance. The main characteristics of this study are presented next.

2.1. Experimental design

A lot of product units was sampled from the production line and units were randomly assigned to one of the following storage conditions:

• Refrigeration: Units were kept under refrigeration at approximately 4 °C. Temperature and humidity levels were not controlled, but they were recorded daily and average weekly values were computed and used in this study. These units were used as reference (control) during the trials;

• Room temperature and humidity: Temperature and humidity levels were monitored and recorded continuously by monitoring equipment, and average weekly values were calculated and used in this study;

• Environmental Chamber 1: Temperature and humidity levels controlled at 30 °C and 80%, respectively;

• Environmental Chamber 2: Temperature controlled at 37 °C. Humidity levels were not controlled, but they were collected daily and average weekly values were computed and used.

The last two conditions were used in order to simulate an aggressive storage environment. Researchers expected to register a shorter shelf life under those conditions when compared with storage under room temperature.

2.2. Laboratory panel

Forty-five (45) subjects were trained for the sensory characteristics of the product before the main trial started. Sensory evaluations were made initially and then every week thereafter. Each week, eight trained subjects were randomly selected to form the test panel.

2.3. Test procedure

Evaluations were performed weekly. In a given week, units were sampled from each storage condition group. Each panelist was offered, in random order, three sets of units to be evaluated, namely [RE, BRE, R]; [RE, BRE, CH1] and [RE, BRE, CH2], where RE, BRE and R stand respectively for "reference", "blind reference" and "room temperature and humidity"; CHi stands for "Chamber i", i = 1, 2. Within a given group, the reference unit (RE) was always evaluated first. For the other two (BRE and one of R, CH1 or CH2), the order was randomized. All units were discarded after evaluation. The unit labeled as "reference" (RE) and the "blind reference" (BRE) were both sampled from the "refrigerated condition" group. Those experimental units were used only to check the consistency of the panelists' judgements. By consistency we mean that a panelist is expected to give similar scores to the blind reference (BRE) and the reference (RE) experimental units. If a panelist's scores for these two reference units disagree greatly, then that panelist's scores should not be included in the study. Fortunately, no inconsistencies were found in this data set.

2.4. Measurement scale

Panelists were asked to compare each test unit (including the "blind" reference) with the reference and assign a score on a seven-point scale (0–6) individually to each attribute: 6 = "no difference"; 5 = "very slight difference"; 4 = "slight difference"; 3 = "different"; 2 = "large difference"; 1 = "very large difference"; 0 = "total difference".

2.5. Criterion of failure

The manufacturer adopted the following failure criterion: for each attribute, product units scored 0, 1, 2 or 3 were considered unfit for human consumption.

2.6. Follow up time

Units stored at room temperature and in Chambers 1 and 2 were followed for 51, 36 and 18 weeks, respectively.

Attributes were scored separately; therefore it is possible to have a product unit classified as unfit regarding one particular attribute and fit regarding another one. According to the failure criterion adopted by the manufacturer, at a given evaluation week one of the following situations (for each attribute) might happen: if the attribute's score is less than or equal to 3 (three), then one knows that the particular unit became unfit for human consumption at some moment between the beginning of the trial and the evaluation week. On the other hand, if the attribute's score is greater than 3, then that unit is still fit for consumption (regarding that attribute). Unfortunately, because of the trial's destructive characteristic, the follow-up of that unit is interrupted.
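The failure criterion and the resulting censoring pattern can be written down in a few lines. The sketch below is illustrative only: the function names and the (week, score) records are hypothetical, and the only value taken from the study is the cut-off of 3 on the 0–6 scale.

```python
CUTOFF = 3  # failure criterion from Section 2.5: scores 0, 1, 2 or 3 mean "unfit"


def dichotomize(score, cutoff=CUTOFF):
    """Return 1 if the unit has already failed (score <= cutoff), else 0."""
    return 1 if score <= cutoff else 0


def censoring_type(score, cutoff=CUTOFF):
    """A failed unit is left censored at the evaluation week; a fit unit is
    right censored, because the destructive test ends its follow-up."""
    return "left" if score <= cutoff else "right"


# Hypothetical (evaluation week, score) records for one attribute.
records = [(4, 6), (12, 4), (18, 3), (18, 1)]

y = [dichotomize(score) for _, score in records]
kinds = [censoring_type(score) for _, score in records]
```

Each unit therefore contributes only a week and a binary indicator to the analysis; the exact failure time is never available.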

3. The general Weibull model

We first give a brief presentation of the model by Freitas et al. (2003), since it is the basis of the formulation suggested in this article.

Suppose a sample of $N = \sum_{i=1}^{k} n_i$ food product units is taken from the production line and stored under a given environmental condition. These units will be evaluated by trained panelists at pre-established evaluation times in order to determine their shelf life.




… time of the $j$th unit evaluated at (fixed) time $\tau_i$ follows a Weibull distribution whose probability density function at time $t$ is given by the expression:

$$f_j(t) = \alpha_j^{\delta_j}\,\delta_j\,t^{\delta_j - 1}\exp\{-(\alpha_j t)^{\delta_j}\}, \quad t > 0, \tag{5}$$

with parameters $\alpha_j$ (scale) and $\delta_j$ (form) given by:

• $\alpha_j = \exp\{X_j\beta\} = \exp\{\beta_0 + X_{j1}\beta_1 + \cdots + X_{jq}\beta_q\}$;
$X_j = (1, X_{j1}, \ldots, X_{jq})$ is a $1 \times (q+1)$ vector of explanatory variables (covariates or experimental factors, measured with no error) related to the $j$th unit evaluated at $\tau_i$ ($j = 1, 2, \ldots, n_i$; $i = 1, 2, \ldots, k$); $\beta = (\beta_0\ \beta_1 \ldots \beta_q)^t$ is a $(q+1) \times 1$ vector of parameters associated with the explanatory variables;

• $\delta_j = \exp\{W_j\theta\} = \exp\{\theta_0 + W_{j1}\theta_1 + \cdots + W_{jr}\theta_r\}$;
$W_j = (1, W_{j1}, \ldots, W_{jr})$ is a $1 \times (r+1)$ vector of explanatory variables (measured with no error) also related to the $j$th unit evaluated at $\tau_i$ ($j = 1, 2, \ldots, n_i$; $i = 1, 2, \ldots, k$), and $\theta = (\theta_0\ \theta_1 \ldots \theta_r)^t$ is an $(r+1) \times 1$ vector of parameters associated with the variables $W_j$.

Moreover, the reliability function evaluated at time $t$ is now given by:

$$R_j(t) = \exp\{-(\alpha_j t)^{\delta_j}\}. \tag{6}$$

Therefore, the likelihood function, Eq. (1), now takes the general form:

$$L(\lambda) = \prod_{i=1}^{k}\prod_{j=1}^{n_i}\left\{\left[\exp\left(-\left(\tau_i e^{X_j\beta}\right)^{e^{W_j\theta}}\right)\right]^{1-y_{ij}}\left[1-\exp\left(-\left(\tau_i e^{X_j\beta}\right)^{e^{W_j\theta}}\right)\right]^{y_{ij}}\right\}, \tag{7}$$

where $\lambda^t = (\beta^t; \theta^t)$.

The maximum likelihood estimate of $\lambda$, denoted $\hat{\lambda}$, is obtained by direct maximization of the logarithm of expression (7). The calculations require the implementation of numerical optimization methods such as the Newton–Raphson algorithm.
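As a minimal sketch of the quantity being maximized, the logarithm of expression (7) can be coded directly. The function names and the data layout (one tuple per evaluated unit) are assumptions made for this illustration and are not taken from the paper; evaluation times are assumed to be strictly positive.

```python
import math


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def log_likelihood(data, beta, theta):
    """Logarithm of expression (7).

    data  : iterable of (tau, x, w, y), where tau > 0 is the evaluation time,
            x and w are covariate vectors with a leading 1, and y = 1 if the
            unit had already failed at tau, else 0
    beta  : coefficients of the scale parameter, alpha_j = exp(x . beta)
    theta : coefficients of the form parameter, delta_j = exp(w . theta)
    """
    total = 0.0
    for tau, x, w, y in data:
        alpha = math.exp(dot(x, beta))
        delta = math.exp(dot(w, theta))
        h = (alpha * tau) ** delta  # so that R_j(tau) = exp(-h), Eq. (6)
        if y == 1:
            total += math.log(-math.expm1(-h))  # log(1 - R_j(tau))
        else:
            total += -h  # log R_j(tau)
    return total
```

Working with `-h` and `expm1` rather than forming `exp(-h)` first is a numerical convenience only; the value is the same as the logarithm of Eq. (7).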

In this work we used a minor adjustment to the Newton–Raphson procedure that is sometimes used in statistical problems, called Fisher's scoring (Seber & Wild, 1989). The adjustment replaces the negative second-derivative matrix ($-H$) by its expected value $I(\lambda) = E\{-H(\lambda)\}$, called the (expected) Fisher information matrix. This modification is implemented since the negative second-derivative matrix may not be positive definite at every practical solution $\lambda^{(j)}$, and this can cause the Newton method to fail. The Fisher scoring method therefore uses an approximation of $-H(\lambda)$ which is always positive definite, so that the step taken at the $(j-1)$th iteration leads off in an uphill direction (Seber & Wild, 1989). The expressions of the quantities needed for this calculation are shown in Appendix A.
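To illustrate the Fisher scoring update in the simplest possible setting, the sketch below treats a one-parameter special case. It is not the authors' implementation: it assumes a single storage condition with the form parameter fixed at $\delta = 1$, so that $p_i = 1 - \exp(-e^{\beta}\tau_i)$. The function name, the starting value and the toy data are all assumptions; the full GWM needs the vector score and information matrix of Appendix A.

```python
import math


def fisher_scoring_exponential(taus, ys, beta=None, tol=1e-12, max_iter=100):
    """Fisher scoring for y_i ~ Bernoulli(p_i), p_i = 1 - exp(-exp(beta) * tau_i).

    Assumes a mix of 0s and 1s in ys, so that a finite maximum exists.
    """
    if beta is None:
        # Assumed starting value: alpha * (mean evaluation time) = 1.
        beta = -math.log(sum(taus) / len(taus))
    for _ in range(max_iter):
        alpha = math.exp(beta)
        score = 0.0  # first derivative of the log-likelihood
        info = 0.0   # expected (Fisher) information
        for tau, y in zip(taus, ys):
            p = 1.0 - math.exp(-alpha * tau)  # P(T <= tau)
            dp = alpha * tau * (1.0 - p)      # dp / dbeta
            score += (y - p) * dp / (p * (1.0 - p))
            info += dp * dp / (p * (1.0 - p))
        step = score / info  # info > 0, so the step points uphill
        beta += step
        if abs(step) < tol:
            break
    return beta


# Toy data: four units, all evaluated at week 10, two of them already failed.
taus = [10.0, 10.0, 10.0, 10.0]
ys = [1, 0, 0, 1]
beta_hat = fisher_scoring_exponential(taus, ys)
```

When every unit is evaluated at the same week $\tau$, the maximum has the closed form $\hat{\alpha} = -\ln(1 - \bar{y})/\tau$, which gives a convenient check: here $\bar{y} = 0.5$, so $e^{\hat{\beta}}$ should equal $\ln 2 / 10$.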

Now, if $\hat{\lambda} = (\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_q; \hat{\theta}_0, \hat{\theta}_1, \ldots, \hat{\theta}_r)^t$ is the maximum likelihood estimator of $\lambda = (\beta_0, \beta_1, \ldots, \beta_q; \theta_0, \theta_1, \ldots, \theta_r)^t$, then for a given set of explanatory variables $X_j = (1, X_{j1}, \ldots, X_{jq})$ and $W_j = (1, W_{j1}, \ldots, W_{jr})$, we get:

• the maximum likelihood estimator of the $(100 \times p)\%$ percentile (or, equivalently, the $p$th percentile) of the failure time (or shelf life time) distribution, $t_{p(j)}$, is given by

$$\hat{t}_{p(j)} = \frac{1}{\hat{\alpha}_j}\left[-\ln(1-p)\right]^{1/\hat{\delta}_j}, \tag{8}$$

where $\hat{\alpha}_j = \exp\{X_j\hat{\beta}\}$ and $\hat{\delta}_j = \exp\{W_j\hat{\theta}\}$;

• the maximum likelihood estimator of $F_j(t_0)$, the fraction of "defectives" (fraction of units considered "inadequate for consumption") at $t_0$, is given by:

$$\hat{F}_j(t_0) = 1 - \hat{R}_j(t_0) = 1 - \exp\left\{-\left(t_0\hat{\alpha}_j\right)^{\hat{\delta}_j}\right\} = 1 - \exp\left\{-\left(t_0 e^{X_j\hat{\beta}}\right)^{e^{W_j\hat{\theta}}}\right\}. \tag{9}$$
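Expressions (8) and (9) are inverses of one another, which gives a simple sanity check once parameter estimates are available. In the sketch below the function names are assumptions, and the coefficient values are used only as stand-ins for estimates (they are the Case 1 values listed in Table 1 of the simulation study, with both covariates set to 1).

```python
import math


def weibull_params(x, w, beta, theta):
    """alpha_j = exp(x . beta) and delta_j = exp(w . theta)."""
    alpha = math.exp(sum(a * b for a, b in zip(x, beta)))
    delta = math.exp(sum(a * b for a, b in zip(w, theta)))
    return alpha, delta


def percentile(p, alpha, delta):
    """Expression (8): time by which a fraction p of the units has failed."""
    return (1.0 / alpha) * (-math.log(1.0 - p)) ** (1.0 / delta)


def fraction_defective(t0, alpha, delta):
    """Expression (9): fraction of units unfit for consumption at time t0."""
    return 1.0 - math.exp(-((alpha * t0) ** delta))


# Stand-in values, not fitted estimates.
alpha, delta = weibull_params(
    (1.0, 1.0), (1.0, 1.0), beta=(-3.35, 0.62), theta=(0.0, 0.7)
)
t10 = percentile(0.10, alpha, delta)
check = fraction_defective(t10, alpha, delta)  # should recover 0.10
```

Plugging the 10th percentile back into expression (9) returns 0.10 up to rounding error, as the algebra requires.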


Making use of the asymptotic properties of the maximum likelihood estimator, it is possible to get the expressions of the (asymptotic) standard deviations and construct confidence intervals for the quantities mentioned above. The expressions and details of the calculations are in Appendix A.

4. Simulation study

Our purpose was to study the "quality" of the estimates of percentiles and fraction of "defectives" obtained with the expanded Weibull model presented in Section 3, applied to situations which imitated the behavior of the real data available. The basic idea was to generate observations under known conditions and use the proposed model to estimate the quantities of interest. These estimates were then compared to the true values. The details of the study are given next, including the definition of the metrics used to evaluate the "quality" of the estimates obtained.

4.1. Description of the simulation study

4.1.1. The choice of the (vectors of) explanatory variables $X_j$ and $W_j$

To implement the simulations we envisioned a scenario where two storage conditions were under investigation. This choice was motivated by a number of real situations associated with shelf life determination of food products, in particular the one described in Section 2. In that case two environmental chambers were used in an attempt to investigate two (different) aggressive storage conditions.

Using the proposed model, we assumed that the failure time (shelf life time) $T_{ij}$ of the $j$th unit evaluated at time $\tau_i$ ($i = 1, 2, \ldots, k$; $j = 1, 2, \ldots, n_i$) was distributed as a Weibull $(\alpha_j; \delta_j)$ (notation: $T_{ij} \sim \text{Weibull}(\alpha_j; \delta_j)$), with parameters:

• $\alpha_j = \exp(X_j\beta) = \exp(\beta_0 + \beta_1 X_{j1})$, that is, $X_j = (1, X_{j1})$;
• $\delta_j = \exp(W_j\theta) = \exp(\theta_0 + \theta_1 W_{j1})$, that is, $W_j = (1, W_{j1})$,

where

$$X_{j1} = W_{j1} = \begin{cases} 0 & \text{if the } j\text{th unit evaluated at } \tau_i \text{ was stored under condition 1},\\ 1 & \text{if the } j\text{th unit evaluated at } \tau_i \text{ was stored under condition 2}. \end{cases}$$

Consequently,

$$T_{ij} \sim \text{Weibull}(\alpha_j; \delta_j) = \begin{cases} \text{Weibull}\left(e^{\beta_0}; e^{\theta_0}\right) & \text{for units stored under condition 1},\\ \text{Weibull}\left(e^{\beta_0+\beta_1}; e^{\theta_0+\theta_1}\right) & \text{for units stored under condition 2}. \end{cases}$$

4.1.2. Performance or ‘‘Quality’’ measures

The expressions listed below were used as measures of "quality" or "performance". In all of them, $M$ denotes the number of samples generated; $\varphi$ is the true value of the quantity being estimated (for instance, a given percentile or fraction of "defectives") and $\hat{\varphi}_i$ denotes the estimated value of $\varphi$, calculated for the $i$th sample.

• Estimated value of $\varphi$ (based on the $M$ samples):

$$\hat{\varphi} = \frac{\sum_{i=1}^{M}\hat{\varphi}_i}{M};$$

• Standard Deviation (SD):

$$\text{SD} = \sqrt{\frac{\sum_{i=1}^{M}(\hat{\varphi}_i - \hat{\varphi})^2}{M - 1}};$$

• Bias (B):

$$B = \hat{\varphi} - \varphi;$$

• Relative Bias (RB):

$$\text{RB} = \frac{|\hat{\varphi} - \varphi|}{\varphi} \times 100\% = \frac{|B|}{\varphi} \times 100\%;$$

• Mean Square Error (MSE):

$$\text{MSE} = (\text{SD})^2 + B^2.$$
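The five measures above translate directly into code. In this sketch the function name and the toy estimates are hypothetical; the formulas follow the definitions just given, including the $M - 1$ divisor in the standard deviation.

```python
import math


def quality_measures(estimates, true_value):
    """Mean estimate, SD, bias, relative bias (%) and MSE over M samples."""
    m = len(estimates)
    mean_est = sum(estimates) / m
    sd = math.sqrt(sum((e - mean_est) ** 2 for e in estimates) / (m - 1))
    bias = mean_est - true_value
    rel_bias = abs(bias) / true_value * 100.0
    mse = sd ** 2 + bias ** 2
    return {"mean": mean_est, "sd": sd, "bias": bias, "rb": rel_bias, "mse": mse}


# Toy example: four estimates of a percentile whose true value is 10 weeks.
measures = quality_measures([9.0, 10.0, 11.0, 12.0], true_value=10.0)
```

For the toy values the mean is 10.5, the bias is 0.5 (a relative bias of 5%), and the MSE is 5/3 + 0.25.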

4.1.3. The choice of the parameter values $\beta_0$, $\beta_1$, $\theta_0$ and $\theta_1$ (and consequently, $\alpha_j$ and $\delta_j$) used in the simulation

Table 1 presents the five cases considered. The values of $\beta_0$, $\beta_1$, $\theta_0$ and $\theta_1$ were chosen such that the respective $\alpha_j$ and $\delta_j$ values used by Freitas et al. (2003) could be reproduced here. In that case, only the scale parameter $\alpha_j$ was modeled as a function of explanatory variables; the form parameter remained constant for different storage conditions. The range of values used by them in the simulations was based on the estimated values obtained for the real data set.

It is important to emphasize that for each case listed in Table 1, the parameter values were chosen in an attempt to make the degradation mechanism represented by condition 2 ($X_{j1} = W_{j1} = 1$) more "aggressive" than the one for condition 1. That is why (except for case 5) $\delta_{\text{cond. 2}} > \delta_{\text{cond. 1}}$.
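The link between the regression coefficients and the Weibull parameters in Table 1 can be checked numerically. The coefficient values below are copied from Table 1; the dictionary and function names are assumptions made for this sketch.

```python
import math

# (beta0, beta1, theta0, theta1) for the five cases of Table 1.
CASES = {
    1: (-3.35, 0.62, 0.0, 0.7),
    2: (-3.35, 0.62, 0.18, 0.7),
    3: (-3.00, 0.62, 0.0, 0.7),
    4: (-3.00, 0.62, 0.18, 0.7),
    5: (-3.35, 0.62, 0.0, 0.0),
}


def condition_parameters(beta0, beta1, theta0, theta1, indicator):
    """Return (alpha_j, delta_j) for indicator = X_j1 = W_j1 (0 or 1)."""
    alpha = math.exp(beta0 + beta1 * indicator)
    delta = math.exp(theta0 + theta1 * indicator)
    return alpha, delta


# One row per case: (alpha, delta) under condition 1, then under condition 2.
table = {
    case: condition_parameters(*coefs, indicator=0)
    + condition_parameters(*coefs, indicator=1)
    for case, coefs in CASES.items()
}
```

After rounding, each row agrees with the $\alpha_j$ and $\delta_j$ columns of Table 1; for example, Case 1 gives approximately (0.035, 1.0) under condition 1 and (0.065, 2.0) under condition 2.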

4.1.4. Sample plans implemented in the simulation study

By "sample plan" we mean: the number of weeks of follow-up ($n_w$); the number of panelists allocated to each week ($n_p$) and the total number of product units under test ($N = n_w \times n_p$). Table 2 summarizes the sample plans used in this simulation study. Note that the sample plans listed in Table 2 include the plans implemented in the real situation described in Section 2 (Plans III and V).

4.1.5. Steps followed in the simulation study

In the proposed model, the underlying distribution of the failure time (or shelf life time) is a Weibull, but in a real situation one does not observe the actual failure time. In fact, the information available is the score assigned by a panelist to a given product unit. In the model proposed in Section 3, the results are dichotomized according to the cut-off point established by the company. In other words, the result is either zero or one depending on whether the failure time ($T_{ij}$) is located after or before the evaluation time ($\tau_i$).

Table 1
Parameter values used in the simulation study

| Case | $\beta_0$ | $\beta_1$ | $\theta_0$ | $\theta_1$ | Condition 1: $\alpha_j$ | Condition 1: $\delta_j$ | Condition 2: $\alpha_j$ | Condition 2: $\delta_j$ |
|---|---|---|---|---|---|---|---|---|
| (1) | −3.35 | 0.62 | 0.0 | 0.7 | 0.035 | 1.0 | 0.065 | 2.0 |
| (2) | −3.35 | 0.62 | 0.18 | 0.7 | 0.035 | 1.2 | 0.065 | 2.4 |
| (3) | −3.00 | 0.62 | 0.0 | 0.7 | 0.050 | 1.0 | 0.093 | 2.0 |
| (4) | −3.00 | 0.62 | 0.18 | 0.7 | 0.050 | 1.2 | 0.093 | 2.4 |
| (5) | −3.35 | 0.62 | 0.0 | 0.0 | 0.035 | 1.0 | 0.065 | 1.0 |

Condition 1: $X_{j1} = W_{j1} = 0$; Condition 2: $X_{j1} = W_{j1} = 1$.

Table 2
Sample plans considered for each condition

| Plan | Number of weeks ($n_w$) | Number of panelists ($n_p$) | $N = n_w \times n_p$ |
|---|---|---|---|
| I | 12 | 7 | 84 |
| II | 12 | 14 | 168 |
| III | 18 | 7 | 126 |
| IV | 18 | 14 | 252 |
| V | 36 | 7 | 252 |
| VI | 36 | 14 | 504 |

In the simulation we assumed that the evaluations were implemented weekly. In addition, the total follow-up time was $n_w$ weeks and $n_p$ panelists were requested weekly to compose the laboratory panel. The main steps followed are given below:

• Step 1: Choose a set of parameter values $\beta_0$, $\beta_1$ ($\alpha_j$) and $\theta_0$, $\theta_1$ ($\delta_j$) listed in Table 1 and a sample plan from Table 2.

• Step 2: Generate a vector of "evaluation times" (weeks) taking into account the sample plan chosen in step 1.

• Step 3: Generate a random sample of size $N = n_w \times n_p$ of failure times from a Weibull $(\alpha_j; \delta_j)$, where $X_{j1} = W_{j1} = 0$, that is, $\alpha_j = e^{\beta_0}$ and $\delta_j = e^{\theta_0}$. Use the parameter values $\beta_0$ and $\theta_0$ chosen in step 1.

• Step 4: Generate a random sample of size $N = n_w \times n_p$ of failure times from a Weibull $(\alpha_j; \delta_j)$, where $X_{j1} = W_{j1} = 1$, in other words, $\alpha_j = e^{\beta_0+\beta_1}$ and $\delta_j = e^{\theta_0+\theta_1}$. Use the parameter values $\beta_0$, $\beta_1$, $\theta_0$ and $\theta_1$ chosen in step 1.

• Step 5: Dichotomize the $2N$ results obtained in steps 3 and 4 and store them in a vector $Y_{2N \times 1}$. The dichotomization is done by comparing each of the $2N$ failure times $t$ with the evaluation times $\tau$ generated in step 2. If $t > \tau$ then $Y = 0$, otherwise $Y = 1$ (meaning that failure has already occurred).

• Step 6: Calculate the maximum likelihood estimates of $\beta_0$, $\beta_1$, $\theta_0$, $\theta_1$ (and consequently $\alpha_j$ and $\delta_j$), replacing "$y_{ij}$" in expression (10) by the dichotomized data.

• Step 7: Find the estimates of percentiles and fraction of defectives, using the parameter estimates calculated in step 6.

• Step 8: Store the values calculated in step 6.

• Step 9: Generate another pair of samples as in steps 3 and 4 and repeat steps 5–8.

• Step 10: Steps 3–8 should be repeated until $M = 1000$ samples of each condition ($X_{j1} = W_{j1} = 0$ and $X_{j1} = W_{j1} = 1$) have been generated. Then, based on the $M = 1000$ samples, calculate for each condition and for each quantity of interest $\varphi$ (fraction of defectives and percentiles):

– the estimated value of $\varphi$, that is, $\hat{\varphi}$, based on the $M = 1000$ estimated values $\hat{\varphi}_i$ ($i = 1, 2, \ldots, M$);
– the standard deviation (SD) based on the 1000 values $\hat{\varphi}_i$;
– the bias (B), the relative bias (RB) and the mean square error (MSE).

These steps were implemented for each one of the sample plans listed in Table 2.
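The data-generating part of this procedure (steps 2 to 5, for one storage condition) can be sketched as follows. The function names, the random seed and the choice of weeks 1 to $n_w$ as evaluation times are assumptions of this illustration; failure times are drawn by inverting the reliability function $R(t) = \exp\{-(\alpha t)^{\delta}\}$.

```python
import math
import random


def weibull_failure_time(alpha, delta, rng):
    """Inverse-transform draw from R(t) = exp(-(alpha * t) ** delta)."""
    u = 1.0 - rng.random()  # uniform on (0, 1], which avoids log(0)
    return (1.0 / alpha) * (-math.log(u)) ** (1.0 / delta)


def simulate_condition(alpha, delta, n_weeks, n_panelists, rng):
    """Steps 2-5 for one condition: evaluation weeks and dichotomized results."""
    taus, ys = [], []
    for week in range(1, n_weeks + 1):  # assumed evaluation times: 1..n_weeks
        for _ in range(n_panelists):    # one fresh unit per panelist per week
            t = weibull_failure_time(alpha, delta, rng)
            taus.append(week)
            ys.append(0 if t > week else 1)  # 1 = failure already occurred
    return taus, ys


# Plan I (12 weeks, 7 panelists) with the Case 1, condition 1 parameters.
rng = random.Random(2006)  # arbitrary seed, for reproducibility only
taus, ys = simulate_condition(alpha=0.035, delta=1.0, n_weeks=12,
                              n_panelists=7, rng=rng)
```

Each call yields the $N = n_w \times n_p$ pairs of evaluation week and binary outcome that would then be passed to the maximum likelihood step; repeating it $M$ times per condition gives the replicates summarized by the quality measures.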

4.2. Simulation results

Although several "quality" measures were calculated, we decided to present the results only for the mean square error, since this measure summarizes both the bias and the precision of each estimate. We discuss only the results related to the percentile estimates, since these are the quantities we are most interested in.

4.2.1. Results related to the percentile estimates ($\hat{t}_p$)

Figs. 1 and 2 show the plot of the mean square error (MSE) of $\hat{t}_p$ vs. ln(p), for six values of $p$ ($10^{-6}$; $10^{-5}$; $10^{-4}$; $10^{-3}$; $10^{-2}$; $10^{-1}$). These plots were constructed for each case (see Table 1) and sample plan (see Table 2) considered in the simulations.

Some important aspects can be pointed out:

1. The MSE values increase along with the probability values p. This general pattern is found not only in all the sample plans considered but also in all the cases within a given sample plan. In other words, the overall quality of the estimates gets worse as the percentile value increases. This is an attractive characteristic, especially regarding shelf life estimation, since in this kind of problem one is always interested in small percentiles;




[Figure: six panels (a–f), each plotting MSE (vertical axis, 0.0 to 3.0) against ln p (horizontal axis, –6 to –1) for Cases 1–5. Panel titles: Plan I, Conditions 1 and 2 (nw = 12; np = 7; N = 84); Plan II, Conditions 1 and 2 (nw = 12; np = 14; N = 168); Plan III, Conditions 1 and 2 (nw = 18; np = 7; N = 126).]

Fig. 1. MSE values vs. ln(p) – Plans I–III (all cases).




[Figure: six panels (a–f), each plotting MSE (vertical axis, 0.0 to 3.0) against ln p. Panel titles: Conditions 1 and 2 with nw = 18; np = 14; N = 252 (Plan IV), with nw = 36; np = 7; N = 252 (Plan V), and with nw = 36; np = 14; N = 504 (Plan VI).]

Fig. 2. MSE values vs. ln(p) – Plans IV–VI (all cases).




2. For both conditions, the MSE values decrease when one goes from Plan I to Plan VI. This pattern was expected, since the sample size (N) increases in this direction;

3. In the real situation described in Section 2, it was mentioned that for environmental chamber 2 the follow-up time was 18 weeks and that about seven panelists were allocated to each evaluation date. Fig. 1f shows the simulation results obtained for condition 2 (assumed to be more aggressive than condition 1 and "similar" to environmental chamber 2), under the same sample plan used in the real situation. It was already mentioned that an increase in sample size leads to a reduction in the MSE. Comparing Figs. 1d and f, we note that with an increase of about 33% in the sample size N (from N = 126 to N = 168) it is possible to get much better estimates for the percentiles. In fact, Fig. 1d shows that these better results can be obtained by increasing the number of panelists allocated to each week and reducing the follow-up time (in this case, from 18 to 12 weeks). On the other hand, if one decides to double the sample size, from N = 126 (Fig. 1f) to N = 252, Figs. 2b and d show that it does not make any difference whether this increase in sample size is obtained by increasing the number of panelists ($n_p$) per week while keeping the 18-week follow-up (Fig. 2b) or by increasing the follow-up time and keeping the seven panelists per week (Fig. 2d);

4. Finally, if we look at Figs. 2a and c, we see that for condition 1, if one is interested in percentiles associated with $p \le 10^{-3}$, then Plan V (N = 252; $n_w$ = 36; used in the real experiment) and Plan IV (N = 252 but $n_w$ = 18) provide estimates with similar MSE.

5. Application

The analysis of the example introduced in Section 2 consisted of two parts:

1. We have concentrated on the data of the two storage conditions "environmental chamber 1" (30 °C; 80% relative humidity) and "environmental chamber 2" (37 °C). The experimental data were analyzed by fitting the model proposed in Section 3 to each attribute separately. Percentiles and fractions of defectives were then estimated, and the shelf life for each attribute and storage condition was characterized by chosen percentiles;

2. The results obtained were then compared with the ones obtained by Freitas et al. (2003). The comparison was done in terms of the percentile estimates obtained by each of the two models.

The first step of the analysis was the dichotomization of the data (scores). As already mentioned, the food company established the score "3" as the cut-off point.

It was assumed that $T_{ij}$, the failure time (shelf life time) of the $j$th unit evaluated at $\tau_i$ (fixed), followed a Weibull($\alpha_j$; $\delta_j$) distribution ($\delta_j \geq 1$) such that:

$$
\alpha_j = \exp(\beta_0 + \beta_1 X_{1j}) \quad \text{and} \quad \delta_j = \exp(\theta_0 + \theta_1 X_{1j}),
$$

for all $j = 1, 2, \ldots, n_i$ and $i = 1, 2, \ldots, k$, with $k = (18 + 36)$ weeks (Chambers 1 and 2), and

$$
X_{1j} =
\begin{cases}
0 & \text{if the $j$th unit evaluated at $\tau_i$ was stored in Chamber 1,} \\
1 & \text{if the $j$th unit evaluated at $\tau_i$ was stored in Chamber 2.}
\end{cases}
$$

Here, $\lambda^{t} = (\beta_0; \beta_1; \theta_0; \theta_1)$.
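As a quick illustration of the log-linear structure above, the following sketch maps a coefficient vector to the Weibull scale and form parameters of each chamber. The function name `weibull_params` is ours, not the paper's, and the inputs are the rounded odor estimates reported in Table 3, so the last digit may differ slightly from the values printed in Table 4.

```python
import math

def weibull_params(beta0, beta1, theta0, theta1, x):
    """Return (alpha, delta) for chamber indicator x (0 = Chamber 1, 1 = Chamber 2)."""
    alpha = math.exp(beta0 + beta1 * x)    # scale parameter
    delta = math.exp(theta0 + theta1 * x)  # form (shape) parameter
    return alpha, delta

# Rounded odor estimates taken from Table 3 (illustrative input only).
odor = dict(beta0=-3.5794, beta1=0.8802, theta0=0.2704, theta1=0.7147)

alpha1, delta1 = weibull_params(x=0, **odor)
alpha2, delta2 = weibull_params(x=1, **odor)
print(f"Chamber 1: alpha = {alpha1:.4f}, delta = {delta1:.4f}")
print(f"Chamber 2: alpha = {alpha2:.4f}, delta = {delta2:.4f}")
```

Setting `beta1 = 0` or `theta1 = 0` in this function gives the two restricted models that the hypothesis tests in this section discriminate between.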

Table 3 presents the results of the model fitting. The values in parentheses below the parameter estimates are the (asymptotic) standard deviations. The same table presents the Wald statistic (Cox & Hinkley, 1974), one for each attribute, along with the respective p-values for the hypothesis test:

• $H_0$: the failure time (shelf life time) of the units stored under the two conditions can be modeled by the same Weibull distribution (same form and scale parameters).

• $H_1$: the failure times can be modeled by different Weibull distributions.

In other words:

• $H_0$: $\beta_1 = 0$ and $\theta_1 = 0$ vs.

• $H_1$: $\beta_1 \neq 0$ or $\theta_1 \neq 0$ or ($\beta_1 \neq 0$ and $\theta_1 \neq 0$)

There is strong evidence against the null hypothesis. In other words, it seems that the degradation mechanisms induced by the two storage conditions (Chambers 1 and 2) can be modeled by different Weibull distributions. The next step is then to investigate the nature of this difference, since:

• if $\beta_1 = 0$ then, for Chamber 1 ($X_{1j} = 0$), $\alpha = \exp(\beta_0)$ and $\delta = \exp(\theta_0)$, while for Chamber 2 ($X_{1j} = 1$), $\alpha = \exp(\beta_0)$ and $\delta = \exp(\theta_0 + \theta_1)$;

• if $\theta_1 = 0$ then, for Chamber 1 ($X_{1j} = 0$), $\alpha = \exp(\beta_0)$ and $\delta = \exp(\theta_0)$, while for Chamber 2 ($X_{1j} = 1$), $\alpha = \exp(\beta_0 + \beta_1)$ and $\delta = \exp(\theta_0)$.

It is important to recall that Freitas et al. (2003) analyzed the same data set. The model implemented there was more restrictive, though: the form parameter $\delta$ was not only considered fixed (i.e., not dependent on explanatory variables) but also assumed to be the same for both storage conditions. This is exactly the second scenario listed above. Therefore, the model proposed here includes the situation analyzed by those authors as a special case.

Table 4 presents the p-values for the individual hypothesis tests, $H_0$: $\beta_1 = 0$ vs. $H_1$: $\beta_1 \neq 0$ and $H_0$: $\theta_1 = 0$ vs. $H_1$: $\theta_1 \neq 0$. Those tests were constructed using the asymptotic normality of the maximum likelihood estimators. The right-hand side of Table 4 shows the Weibull parameter estimates for each storage condition. Those values were calculated taking into account the results of the hypothesis tests. For example, the null hypothesis $H_0$: $\theta_1 = 0$ was not rejected for the attribute "flavor" (p = 0.4065). Therefore, we can

Table 3
Parameter estimates and p-values of the hypothesis test $H_0$: $\beta_1 = 0$ and $\theta_1 = 0$ vs. $H_1$: $\beta_1 \neq 0$ or $\theta_1 \neq 0$ or {$\beta_1 \neq 0$ and $\theta_1 \neq 0$}

| Attribute | $\beta_0$ | $\beta_1$ | $\theta_0$ | $\theta_1$ | Wald statistic |
|---|---|---|---|---|---|
| Odor | −3.5794 (0.0079)ᵃ | 0.8802 (0.0086) | 0.2704 (0.0109) | 0.7147 (0.0156) | 72090.64 (p = 0.0) |
| Flavor | −3.3386 (0.0193) | 0.6424 (0.0304) | 0.2773 (0.0340) | 0.0466 (0.0564) | 918.82 (p = 0.0) |
| Appearance | −3.8312 (0.0203) | 1.0619 (0.0227) | 0.5620 (0.0247) | 0.1402 (0.0346) | 14115.45 (p = 0.0) |

ᵃ Standard deviation.

Table 4
Hypothesis testing results and parameter estimates ($\alpha$ and $\delta$) for Chamber 1 ($X_{1j} = 0$) and Chamber 2 ($X_{1j} = 1$)

| Attribute | $\beta_0$ | $\beta_1$ | $\theta_0$ | $\theta_1$ | Chamber 1 $\alpha_j$ | Chamber 1 $\delta_j$ | Chamber 2 $\alpha_j$ | Chamber 2 $\delta_j$ |
|---|---|---|---|---|---|---|---|---|
| Odor | −3.5794 | 0.8802 (p = 0)ᵃ | 0.2704 | 0.7147 (p = 0) | 0.0279 | 1.3105 | 0.0672 | 2.6781 |
| Flavor | −3.3386 | 0.6424 (p = 0) | 0.2773 | 0.0466 (p = 0.4065) | 0.0355 | 1.3196 | 0.0675 | 1.3196 |
| Appearance | −3.8312 | 1.0619 (p = 0) | 0.5620 | 0.1402 (p = 0.00005) | 0.0217 | 1.7542 | 0.0627 | 2.0182 |

ᵃ Probability of significance.




conclude that, for this attribute, the failure time of units stored in the two conditions can be modeled by Weibull distributions with the same form parameter ($\alpha_{\text{chamber 1}} = 0.0355$; $\alpha_{\text{chamber 2}} = 0.0675$; $\delta_{\text{chamber 1}} = \delta_{\text{chamber 2}} = 1.3196 \approx 1.32$). The other attributes can be modeled by Weibull distributions with distinct form and scale parameters for each condition.
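The individual tests described above can be approximated from the estimates and asymptotic standard deviations of Table 3. The sketch below is a minimal illustration, assuming a two-sided z-test; the function name `wald_z_pvalue` is ours. Because the inputs are rounded, the resulting p-values should be close to, but not necessarily identical with, those printed in Table 4.

```python
import math

def wald_z_pvalue(estimate, std_dev):
    """Two-sided p-value for H0: parameter = 0 under asymptotic normality."""
    z = estimate / std_dev
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

# theta1 estimates and standard deviations from Table 3.
z_flavor, p_flavor = wald_z_pvalue(0.0466, 0.0564)
z_appear, p_appear = wald_z_pvalue(0.1402, 0.0346)

print(f"flavor:     z = {z_flavor:.3f}, p = {p_flavor:.4f}")
print(f"appearance: z = {z_appear:.3f}, p = {p_appear:.6f}")
```

For flavor the p-value is large, which agrees with the decision to keep a common form parameter; for appearance it is of the order of 0.00005, in line with Table 4.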

Fig. 3 presents the estimated hazard functions for each condition/attribute.

Table 5 presents the percentile estimates for each condition along with the asymptotic 95% confidence intervals. The results seem to indicate that, except for "odor", the attributes of the units stored in Chamber 2 (37 °C) deteriorated faster than those of products stored in Chamber 1. So, if the food company decides to use the 1% percentile, $t_{0.01}$, as the shelf life, then (see Table 5):

[Figure 3, two panels (Chamber 1 and Chamber 2): estimated hazard h(t) (vertical axis, 0.0 to 0.8) against week (horizontal axis, 0 to 35), with one curve each for Odor, Flavor and Appearance.]

Fig. 3. Estimated hazard function for each storage condition and attribute.

Table 5
Percentile estimates ($t_p$, in weeks) for storage conditions Chambers 1 and 2

| Condition | p | Odor | Flavor | Appearance |
|---|---|---|---|---|
| Chamber 1 | 10⁻³ | 0.18433 (0.12478; 1.71415)ᵃ | 0.15023 (0.03512; 0.74136) | 0.89920 (0.37381; 3.67951) |
| Chamber 1 | 10⁻² | 1.07184 (0.57190; 6.17721) | 0.86304 (0.21146; 3.42215) | 3.34985 (1.82611; 8.71465) |
| Chamber 1 | 0.05 | 3.71760 (2.43153; 8.73865) | 2.96792 (1.34931; 5.97352) | 8.48302 (5.76131; 13.37145) |
| Chamber 1 | 0.50 | 27.10620 (20.35681; 34.12653) | 21.34663 (17.60826; 25.01653) | 37.42309 (26.83250; 52.58634) |
| Chamber 2 | 10⁻³ | 1.12788 (0.97314; 2.37921) | 0.07901 (0.01342; 0.42137) | 0.52050 (0.12153; 1.97134) |
| Chamber 2 | 10⁻² | 2.66898 (1.21324; 3.37324) | 0.45394 (0.13627; 1.83675) | 1.63247 (0.54178; 4.71579) |
| Chamber 2 | 0.05 | 4.90491 (2.17637; 5.23547) | 1.56109 (0.77641; 3.14671) | 3.66069 (1.97572; 6.61734) |
| Chamber 2 | 0.50 | 12.96582 (10.63524; 15.01381) | 11.22809 (8.36551; 15.25307) | 13.29828 (10.58224; 18.34671) |

ᵃ 95% (asymptotic) confidence interval.




• for units stored in Chamber 1 (30 °C and 80% relative humidity):
  – Odor: 1.07 weeks (7 days) ([0.57 weeks; 6.18 weeks]);
  – Flavor: 0.86 weeks (6 days) ([0.21 weeks; 3.42 weeks]);
  – Appearance: 3.35 weeks (24 days) ([1.83 weeks; 8.71 weeks]);

• for units stored in Chamber 2 (37 °C):
  – Odor: 2.67 weeks (18.6 days) ([1.21 weeks; 3.37 weeks]);
  – Flavor: 0.45 weeks (3.1 days) ([0.13 weeks; 1.84 weeks]);
  – Appearance: 1.63 weeks (11 days) ([0.54 weeks; 4.71 weeks]).
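The point estimates listed above can be approximately recovered from the Weibull parameters of Table 4. The sketch below assumes the parameterization $R(t) = \exp\{-(\alpha t)^{\delta}\}$ implied by the expressions in Appendix A, which gives $t_p = \alpha^{-1}[-\ln(1-p)]^{1/\delta}$; the function name `weibull_percentile` is ours. Since Table 4 reports rounded parameters, the results agree with Table 5 only to about two decimal places.

```python
import math

def weibull_percentile(p, alpha, delta):
    """100p% percentile of a Weibull with R(t) = exp(-(alpha * t) ** delta)."""
    return (-math.log(1.0 - p)) ** (1.0 / delta) / alpha

# Flavor, Chamber 1 (rounded values from Table 4): alpha = 0.0355, delta = 1.3196.
t01 = weibull_percentile(0.01, alpha=0.0355, delta=1.3196)
t50 = weibull_percentile(0.50, alpha=0.0355, delta=1.3196)

print(f"t_0.01 = {t01:.2f} weeks ({7 * t01:.1f} days)")
print(f"t_0.50 = {t50:.2f} weeks")
```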

We still have to deal with the problem that the food company needs to fix a unique shelf life for the product (associated with the sensory attributes). Once a specific parameter of the "failure time" (shelf life time) distribution has been chosen to be reported as the shelf life, a usual practical approach is to adopt the smallest point estimate. In our example, the 1% percentile was chosen to represent the shelf life for each attribute. Then, using the proposed approach, the value to be used for products stored under the conditions simulated by Chamber 1 is 0.86 weeks (6 days). The construction of a confidence interval for this quantity is not an easy task, since one first needs to obtain the probability density function of

$$
W = \min\left\{ t_{0.01}^{\text{odor}},\; t_{0.01}^{\text{flavor}},\; t_{0.01}^{\text{appearance}} \right\}.
$$

An alternative to the analytical calculations is to construct this distribution empirically by Monte Carlo simulation. Here a conservative approach is suggested and the interval is given by [0.21 weeks; 8.71 weeks], where 0.21 = min{0.57; 0.21; 1.83} (the minimum of the lower bounds) and 8.71 = max{6.18; 3.42; 8.71} (the maximum of the upper bounds).

Similarly, the shelf life value to be reported for units stored under the conditions simulated by Chamber 2 is 0.45 weeks, with a conservative confidence interval of [0.13 weeks; 4.71 weeks].
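The conservative rule just described (smallest point estimate, smallest lower bound, largest upper bound) can be written down in a few lines. This is only a sketch of that rule; the function name `conservative_shelf_life` and the dictionary layout are ours, and the numbers are the 1% percentile rows of Table 5.

```python
def conservative_shelf_life(estimates):
    """estimates maps attribute -> (point, lower, upper), all in weeks."""
    point = min(est[0] for est in estimates.values())
    lower = min(est[1] for est in estimates.values())  # minimum of the lower bounds
    upper = max(est[2] for est in estimates.values())  # maximum of the upper bounds
    return point, (lower, upper)

# 1% percentile estimates and 95% intervals from Table 5.
chamber1 = {
    "odor": (1.07184, 0.57190, 6.17721),
    "flavor": (0.86304, 0.21146, 3.42215),
    "appearance": (3.34985, 1.82611, 8.71465),
}
chamber2 = {
    "odor": (2.66898, 1.21324, 3.37324),
    "flavor": (0.45394, 0.13627, 1.83675),
    "appearance": (1.63247, 0.54178, 4.71579),
}

for name, data in (("Chamber 1", chamber1), ("Chamber 2", chamber2)):
    point, (lower, upper) = conservative_shelf_life(data)
    print(f"{name}: {point:.2f} weeks, interval [{lower:.2f}; {upper:.2f}]")
```

Note that the resulting interval is wider than any of the individual intervals by construction, which is why it is only a conservative stand-in for the interval of the minimum W.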

It is important to emphasize that any other quantity could be used to represent the shelf life, for example the median failure time ($t_{0.50}$) or the mean failure time. Finally, it is interesting to note that, for both storage conditions, the attribute "flavor" was the one with the smallest estimated value, indicating that it deteriorated faster than the other two attributes.

5.1. Results comparison

Some of the results reported by Freitas et al. (2003) are reproduced in Tables 6 and 7. Table 6 reports the point estimates of the Weibull parameters for each storage condition and attribute.

Comparing those values with the ones obtained in this work (Table 4), we note that, although our model allowed both Weibull parameters to vary freely, the point estimates of the scale parameter ($\alpha$) obtained through both models are very similar.

In addition, according to the results of our analysis, the failure time of units stored in both conditions (Chambers 1 and 2) could be modeled by Weibull distributions with the same form parameter ($\delta$) only when "flavor" was the attribute under consideration. Therefore, for that particular case, the parameter estimates obtained with the model developed by Freitas et al. (2003) should be very close to the ones obtained with the model developed in this work, since the former is a particular case of the latter. In fact, a close look at the parameter values shown in Table 6 confirms this expectation.

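Table 6
Parameter estimates obtained by Freitas et al. (2003)

| Condition | Odor $\alpha$ | Odor $\delta$ | Flavor $\alpha$ | Flavor $\delta$ | Appearance $\alpha$ | Appearance $\delta$ |
|---|---|---|---|---|---|---|
| Chamber 1 | 0.0302 | 1.6 | 0.0358 | 1.4 | 0.0233 | 2 |
| Chamber 2 | 0.0596 | 1.6 | 0.0659 | 1.4 | 0.0602 | 2 |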




molds can grow as low as 0.80 (Jay, 1992). Some relationships have been shown to exist between $a_w$ values, temperature and nutrition, for example. The water activity can be included as a covariate in the model.

We still have to deal with the problem of reporting a unique shelf life value and constructing a confidence interval for this quantity. The approach adopted in this paper was a conservative one and did not account for possible correlations between the failure times of the different attributes, which may well occur in practice. This situation deserves further study. One possibility would be to model more than one attribute together, in other words, to work with the joint distribution of their times to failure. This problem is now the subject of further research by us.

Acknowledgements

The authors are grateful to the referees and the editor for helpful comments that led to substantial improvements in the paper, and also to CNPQ/Brazil for the financial support of this research.

Appendix A

In this section we present the expressions of the derivatives of the log-likelihood function needed in the implementation of the Fisher scoring algorithm, together with the large-sample confidence intervals for the fraction of defectives and the percentiles.

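Maximum likelihood estimates are obtained by direct maximization of $l(\lambda) = \ln L(\lambda)$. The expressions of the first derivatives are:

$$
\left( \frac{\partial l(\lambda)}{\partial \lambda} \right)_{(q+r+2)\times 1}
=
\begin{bmatrix}
\dfrac{\partial l(\lambda)}{\partial \beta} \\[2ex]
\dfrac{\partial l(\lambda)}{\partial \theta}
\end{bmatrix},
$$

where

$$
\left( \frac{\partial l(\lambda)}{\partial \beta} \right)_{(q+1)\times 1}
= \sum_{i=1}^{k} \sum_{j=1}^{n_i} X_j^{t}
\left[
- e^{W_j\theta} \left(\tau_i e^{X_j\beta}\right)^{e^{W_j\theta}}
+ \frac{y_{ij}\, e^{W_j\theta} \left(\tau_i e^{X_j\beta}\right)^{e^{W_j\theta}}}
       {1 - e^{-\left(\tau_i e^{X_j\beta}\right)^{e^{W_j\theta}}}}
\right],
$$

$$
\left( \frac{\partial l(\lambda)}{\partial \theta} \right)_{(r+1)\times 1}
= \sum_{i=1}^{k} \sum_{j=1}^{n_i} W_j^{t}
\left[
- \left(\tau_i e^{X_j\beta}\right)^{e^{W_j\theta}} \ln\!\left[\left(\tau_i e^{X_j\beta}\right)^{e^{W_j\theta}}\right]
+ \frac{y_{ij} \left(\tau_i e^{X_j\beta}\right)^{e^{W_j\theta}} \ln\!\left[\left(\tau_i e^{X_j\beta}\right)^{e^{W_j\theta}}\right]}
       {1 - e^{-\left(\tau_i e^{X_j\beta}\right)^{e^{W_j\theta}}}}
\right].
$$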
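The first-derivative expressions can be checked numerically. The sketch below is not the authors' implementation: it assumes the log-likelihood $l(\lambda) = \sum_i \sum_j [\,y_{ij} \ln(1 - e^{-u_{ij}}) - (1 - y_{ij})\,u_{ij}\,]$ with $u_{ij} = (\tau_i e^{X_j\beta})^{e^{W_j\theta}}$ (our shorthand, consistent with $E(y_{ij})$ as used in this appendix), takes $X_j = W_j = (1, x_j)$ with a single chamber indicator, and uses made-up toy data. It compares the analytic score with a central finite-difference gradient.

```python
import math

def loglik(lam, data):
    """Log-likelihood for dichotomized scores.

    lam  = (beta0, beta1, theta0, theta1)
    data = list of (tau, x, y): evaluation time, chamber indicator,
           and y = 1 if the unit had already failed at tau, else 0.
    """
    b0, b1, th0, th1 = lam
    total = 0.0
    for tau, x, y in data:
        delta = math.exp(th0 + th1 * x)
        u = (tau * math.exp(b0 + b1 * x)) ** delta
        total += y * math.log(1.0 - math.exp(-u)) - (1 - y) * u
    return total

def score(lam, data):
    """Analytic gradient, term by term as in the first-derivative expressions."""
    b0, b1, th0, th1 = lam
    grad = [0.0, 0.0, 0.0, 0.0]
    for tau, x, y in data:
        delta = math.exp(th0 + th1 * x)
        u = (tau * math.exp(b0 + b1 * x)) ** delta
        ratio = y / (1.0 - math.exp(-u))
        g_beta = delta * u * (ratio - 1.0)          # bracket multiplying X_j^t
        g_theta = u * math.log(u) * (ratio - 1.0)   # bracket multiplying W_j^t
        grad[0] += g_beta        # X_j = (1, x)
        grad[1] += g_beta * x
        grad[2] += g_theta       # W_j = (1, x)
        grad[3] += g_theta * x
    return grad

def numeric_gradient(lam, data, h=1e-6):
    """Central finite differences, used only as a check on the algebra."""
    grad = []
    for k in range(len(lam)):
        up, dn = list(lam), list(lam)
        up[k] += h
        dn[k] -= h
        grad.append((loglik(up, data) - loglik(dn, data)) / (2.0 * h))
    return grad

# Made-up toy data (weeks, chamber, failed?), not taken from the paper.
toy = [(2.0, 0, 0), (6.0, 0, 0), (12.0, 0, 1),
       (2.0, 1, 0), (6.0, 1, 1), (12.0, 1, 1)]
lam0 = (-3.0, 0.5, 0.3, 0.1)

analytic = score(lam0, toy)
numeric = numeric_gradient(lam0, toy)
max_gap = max(abs(a - n) for a, n in zip(analytic, numeric))
print(f"largest analytic-vs-numeric difference: {max_gap:.2e}")
```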

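The elements of the Fisher information matrix $I(\lambda)$ are given below:

$$
I(\lambda) = E\left\{ - \frac{\partial^2 l(\lambda)}{\partial \lambda\, \partial \lambda^{t}} \right\}_{(q+r+2)\times(q+r+2)}
=
\begin{bmatrix}
I_{11} & I_{12} \\
I_{12}^{t} & I_{22}
\end{bmatrix}.
$$

Since $E(y_{ij}) = 1 - e^{-\left(\tau_i e^{X_j\beta}\right)^{e^{W_j\theta}}}$ and $E(1 - y_{ij}) = e^{-\left(\tau_i e^{X_j\beta}\right)^{e^{W_j\theta}}}$, then:

$$
I_{11} = E\left\{ - \frac{\partial^2 l(\lambda)}{\partial \beta\, \partial \beta^{t}} \right\}_{(q+1)\times(q+1)}
= \sum_{i=1}^{k} \sum_{j=1}^{n_i} X_j^{t} X_j
\left[
\frac{e^{2W_j\theta} \left(\tau_i e^{X_j\beta}\right)^{2e^{W_j\theta}} e^{-\left(\tau_i e^{X_j\beta}\right)^{e^{W_j\theta}}}}
     {1 - e^{-\left(\tau_i e^{X_j\beta}\right)^{e^{W_j\theta}}}}
\right],
$$

$$
I_{22} = E\left\{ - \frac{\partial^2 l(\lambda)}{\partial \theta\, \partial \theta^{t}} \right\}_{(r+1)\times(r+1)}
= \sum_{i=1}^{k} \sum_{j=1}^{n_i} W_j^{t} W_j
\left[
\frac{\left(\tau_i e^{X_j\beta}\right)^{2e^{W_j\theta}} \left(\ln\left(\tau_i e^{X_j\beta}\right)\right)^{2} e^{2W_j\theta}\, e^{-\left(\tau_i e^{X_j\beta}\right)^{e^{W_j\theta}}}}
     {1 - e^{-\left(\tau_i e^{X_j\beta}\right)^{e^{W_j\theta}}}}
\right],
$$

$$
I_{12} = E\left\{ - \frac{\partial^2 l(\lambda)}{\partial \beta\, \partial \theta^{t}} \right\}_{(q+1)\times(r+1)}
= \sum_{i=1}^{k} \sum_{j=1}^{n_i} X_j^{t} W_j
\left[
\frac{\left(\tau_i e^{X_j\beta}\right)^{2e^{W_j\theta}} \ln\left(\tau_i e^{X_j\beta}\right) e^{2W_j\theta}\, e^{-\left(\tau_i e^{X_j\beta}\right)^{e^{W_j\theta}}}}
     {1 - e^{-\left(\tau_i e^{X_j\beta}\right)^{e^{W_j\theta}}}}
\right].
$$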



Now, if $\hat{\lambda} = (\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_q; \hat{\theta}_0, \hat{\theta}_1, \ldots, \hat{\theta}_r)^{t}$ is the maximum likelihood estimator of $\lambda = (\beta_0, \beta_1, \ldots, \beta_q; \theta_0, \theta_1, \ldots, \theta_r)^{t}$ then:

• using maximum likelihood large-sample theory (asymptotic normality) and the delta method (see Cox & Hinkley, 1974), it is possible to find an expression for the asymptotic variance estimator

$$
\widehat{\mathrm{Var}}\left(\hat{t}_{p(j)}\right) = \left. Z^{t} I^{-1}(\lambda) Z \right|_{\lambda = \hat{\lambda}},
$$

where $Z$ is a vector of dimension $(q + r + 2) \times 1$, given by

$$
Z =
\begin{bmatrix}
- e^{-X_j\beta} \left[-\ln(1-p)\right]^{1/e^{W_j\theta}} X_j^{t} \\[1.5ex]
- e^{-X_j\beta} \left[-\ln(1-p)\right]^{1/e^{W_j\theta}} \left[ e^{-W_j\theta} \ln\left(-\ln(1-p)\right) \right] W_j^{t}
\end{bmatrix}.
\tag{10}
$$

Therefore, the upper (UB) and lower (LB) bounds of a 95% (asymptotic) confidence interval for $t_{p(j)}$ are given, respectively, by:

$$
UB = \hat{t}_{p(j)} + 1.96 \sqrt{\widehat{\mathrm{Var}}\left(\hat{t}_{p(j)}\right)},
\qquad
LB = \hat{t}_{p(j)} - 1.96 \sqrt{\widehat{\mathrm{Var}}\left(\hat{t}_{p(j)}\right)};
$$

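• similarly, making use of the delta method and the asymptotic normality of the maximum likelihood estimators (Cox & Hinkley, 1974), we get the expression of a 95% confidence interval for the fraction of defectives at $t_0$, $F_j(t_0) = 1 - R_j(t_0)$:

$$
UB = 1 - \left\{ \hat{R}_j(t_0) \right\}^{\exp\left\{ 1.96 \sqrt{\mathrm{Var}(\hat{\phi})} \right\}},
\qquad
LB = 1 - \left\{ \hat{R}_j(t_0) \right\}^{\exp\left\{ -1.96 \sqrt{\mathrm{Var}(\hat{\phi})} \right\}},
$$

where $\hat{\phi} = \ln\left[-\ln \hat{R}_j(t_0)\right] = \ln\left[ -\ln\left( \exp\left\{ -\left(t_0 e^{X_j\hat{\beta}}\right)^{e^{W_j\hat{\theta}}} \right\} \right) \right]$, $\mathrm{Var}(\hat{\phi}) = \left. Z^{t} I^{-1}(\lambda) Z \right|_{\lambda = \hat{\lambda}}$, and $Z$ is a $(q + r + 2) \times 1$ vector given by:

$$
Z =
\begin{bmatrix}
\dfrac{\partial \phi}{\partial \beta} \\[2ex]
\dfrac{\partial \phi}{\partial \theta}
\end{bmatrix}
=
\begin{bmatrix}
e^{W_j\theta} X_j^{t} \\[1ex]
e^{W_j\theta} \ln\left(t_0 e^{X_j\beta}\right) W_j^{t}
\end{bmatrix}.
\tag{11}
$$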

We point out that the bounds (UB and LB) were calculated by applying the asymptotic normal distribution to the transformation $\phi = \ln(-\ln R_j(t_0))$, whose range is unrestricted. The confidence interval for the fraction of defectives is then found by applying the inverse transformation. This procedure, suggested by Kalbfleisch and Prentice (1992, p. 18), prevents the occurrence of limits outside the range [0, 1].
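The back-transformation can be illustrated with a short sketch. The function name `fraction_defective_ci` is ours, and both inputs below (an estimated reliability of 0.99 and a variance of 0.25 for $\hat{\phi}$) are hypothetical values chosen only to show the mechanics; in practice the variance would come from $Z^{t} I^{-1}(\lambda) Z$ evaluated at the estimates.

```python
import math

def fraction_defective_ci(r_hat, var_phi, z=1.96):
    """Interval for F(t0) = 1 - R(t0), built on phi = ln(-ln R(t0)).

    r_hat   : estimated reliability at t0, strictly between 0 and 1
    var_phi : asymptotic variance of phi_hat (supplied by the caller)
    """
    half_width = z * math.sqrt(var_phi)
    lower = 1.0 - r_hat ** math.exp(-half_width)
    upper = 1.0 - r_hat ** math.exp(half_width)
    return lower, upper

# Hypothetical inputs, not taken from the paper.
lb, ub = fraction_defective_ci(r_hat=0.99, var_phi=0.25)
print(f"F_hat = {1 - 0.99:.4f}, 95% interval = ({lb:.4f}, {ub:.4f})")

# Even with a very large variance the limits stay inside [0, 1].
wide_lb, wide_ub = fraction_defective_ci(r_hat=0.5, var_phi=100.0)
print(f"wide interval = ({wide_lb:.2e}, {wide_ub:.4f})")
```

A symmetric interval of the form $\hat{F} \pm 1.96\,\mathrm{se}$ could fall below 0 for small fractions of defectives, which is the situation the transformation is meant to avoid.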

References

Cox, D. R., & Hinkley, D. V. (1974). Theoretical statistics. London: Chapman and Hall.

Draper, N., & Smith, H. (1998). Applied regression analysis (3rd ed.). New York: Wiley.

Freitas, M. A., Borges, W. S., & Ho, L. L. (2003). A statistical model for shelf life estimation using sensory evaluation scores. Communications in Statistics – Theory and Methods, 32(8), 1559–1589.

Freitas, M. A., Borges, W., & Ho, L. L. (2004). Sample plans comparisons for shelf life estimation using sensory evaluation scores. International Journal of Quality & Reliability Management, 21(4), 439–466.

Gacula, M. C., Jr. (1975). The design of experiments for shelf life study. Journal of Food Science, 40, 399–403.

Gacula, M. C., Jr., & Kubala, J. J. (1975). Statistical models for shelf life failures. Journal of Food Science, 40, 404–409.

Gacula, M. C., Jr., & Singh, J. (1984). Statistical methods in food and consumer research. New York: Academic Press.

