chemometric analysis of food - ocean...



Post on 15-Mar-2018



Chemometrics is a powerful tool for the analysis of optical spectroscopy of complex chemical systems like foods. This primer introduces the basics of chemometric analysis, showing how it can be applied to reflectance measurements of apples for quality control. From quantitative measurement of sugars to identification of the variety, chemometrics is a solution that is ripe for the picking.

Keywords

• Food quality

• Apple variety and sweetness

• Statistical modelling

Techniques

• Reflectance spectroscopy

• Chemometric analysis

Applications

• Brix (sweetness) measurement

• Apple classification

• Quality control

Chemometric Analysis of Food

Written by Dieter Bingemann and Cicely Rathmell, Ocean Optics

Application Note

Reflectance Spectroscopy Reveals the Variety and Sweetness of Apples

Limitations of the Beer-Lambert Law

Optical spectroscopy has a long history as a powerful workhorse in analytical chemistry. It is capable of measuring even minute concentrations of the compound of interest, typically in a solution. Most chemists have applied the Beer-Lambert law at one point or another in their careers to determine an unknown concentration from a solution’s absorbance at a specific wavelength, usually based on an independently recorded calibration curve. However, few are aware of the limitations of this tried-and-true law. It is no longer valid for even moderately concentrated solutions, for samples that scatter or emit light, if stray light is generated in the setup, in the presence of concentration-dependent chemical equilibria, for inhomogeneous media, or if the spectral resolution is too low. It is also very difficult to apply to measurements made in reflectance mode, as the depth of penetration and impact of scattering are difficult to assess.

Even when a simple Beer-Lambert relationship at a single wavelength cannot be used to determine the concentration of interest, the information needed to extract that component is still present within the recorded spectrum. More sophisticated mathematical approaches can be used to determine an unknown concentration in these complex cases, a field known as chemometrics. Chemometric tools have gained popularity in recent years thanks to their powerful capabilities and application in a wide range of fields.
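For reference, the classic single-wavelength calibration that the Beer-Lambert law supports can be sketched in a few lines of Python. This is a minimal illustration only: the standards, absorbance values and the function name concentration_from_absorbance are hypothetical and are not data from this note. It assumes the linear regime, in which absorbance is proportional to concentration.

```python
import numpy as np

# Hypothetical calibration standards (illustrative numbers, not measured data).
conc_std = np.array([0.0, 1.0, 2.0, 3.0, 4.0])       # concentration, arbitrary units
abs_std = np.array([0.00, 0.21, 0.40, 0.61, 0.80])   # absorbance at one wavelength

# Beer-Lambert: A = (epsilon * l) * c, so the calibration curve is a straight line.
slope, intercept = np.polyfit(conc_std, abs_std, deg=1)


def concentration_from_absorbance(absorbance):
    """Invert the fitted calibration line to estimate an unknown concentration."""
    return (absorbance - intercept) / slope


unknown = concentration_from_absorbance(0.50)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, c = {unknown:.2f}")
```

The approach relies on one wavelength and one straight line, which is exactly what breaks down for scattering samples and reflectance measurements.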

www.oceanoptics.com | [email protected] | US +1 727-733-2447 EUROPE +31 26-3190500 ASIA +86 21-6295-6600

The Need for Chemometrics

In today’s global food network, fruit is often picked well before maturity and allowed to ripen during transport, while other fruit is stored from season to season to provide a year-round supply. In the effort to provide convenience, food quality often takes a back seat to appearance, leaving both consumers and resellers guessing as to the ripeness of the fruit. Chemometric tools, however, can predict the sweetness of an apple, as well as its nutritional value, by analyzing near-infrared diffuse reflectance spectra obtained without opening the fruit.

Chemometrics shines where traditional spectroscopic analysis fails, as in the example of NIR diffuse reflectance spectroscopy of an apple. Why would the trusted Beer-Lambert law not apply in this application? For one, diffuse reflectance spectra depend on a number of factors that are hard to control, such as the measurement geometry, the size of the scattering particles inside the fruit, or the surface properties of the apple’s skin. Second, near-infrared spectroscopy is sensitive to all -OH, -NH, and -CH bond vibrational overtones. Most organic compounds absorb in this spectral range, which leads to broad, seemingly featureless NIR spectra due to the countless overlapping absorption bands.

NIR absorbance spectra, unlike their infrared cousins, cannot readily be assigned to individual chemical bonds, and thus this region of the electromagnetic spectrum has been ignored for decades. The development of chemometrics, however, has opened up a treasure trove of spectroscopic information that is easily accessible with sensitive optical spectrometers and affordable light sources.

What, Exactly, is Chemometrics?

Chemometrics is the interdisciplinary application of multivariate statistics through powerful software tools to extract qualitative and quantitative answers from scientific measurements. Here we will focus on the interpretation of optical spectroscopic measurements, which is an important (but by no means the only) application of the computational technique.

Three types of questions are commonly answered with chemometrics (see Figure 1):

1. Quantification: How much of a certain substance is in a sample?

• Predicting the octane number of gasoline fuels

• Measuring organic matter content in soils

• Detecting moisture content of paper

2. Classification: What is the identity of the sample?

• Identification of raw materials

• Authentication of high-quality wines or whiskeys

3. Discrimination: Is the sample similar to a quality standard?

• Identification of out-of-specification samples

• Monitoring progress in batch processing

• Early detection of unusual events in a continuous process

Figure 1: Schematic overview of the different uses of chemometrics

The Chemometrics Three-Step

Just as the Beer-Lambert law requires creation of a calibration curve, chemometrics requires its own calibration, created using a large number of spectra from samples for which the analyte concentration (in the case of quantification) or group membership (in the case of classification or discrimination) is known. Chemometric software tools utilize machine learning algorithms to develop a model to describe the concentration or group membership, and thus the calibration process is usually referred to as “training.” Training of the chemometric model often requires some tweaking of parameters to optimize the analysis. This is done using a process called cross-validation, whereby the model is repeatedly tested on spectra held back from the original data set.

It is also important to validate the model through prediction from spectra of additional known samples, with this “test set” being distinct from the original “training set.” If the error of the test set predictions is within the desired precision and accuracy, the model is ready to be used with the spectrum collected from a new sample to predict the answer of interest (concentration or group membership). A simple NIR reflectance spectrum of an apple, for example, when processed using a well-developed chemometric model, can now be used to predict a complex quantity, such as the apple’s sweetness.

While the effort to train the model might seem like a large up-front investment in time, the payoff is huge in terms of allowing fast, nondestructive, cheap predictions in the field, thus eliminating the cost of sending samples of goods for time-consuming and expensive lab analysis. NIR diffuse reflection spectroscopy in particular allows for very fast measurements, as little to no sample preparation is required.

The Math Under the Hood

Chemometrics borrows heavily from multivariate statistics, the science of using many observables to predict an unknown parameter when their relationship is not known. In spectroscopy, the observables are the absorbance at a large number of wavelengths (the spectrum), but could possibly also include additional measurements, such as the temperature. Mathematically, the process is similar to the common (univariate) linear regression algorithm used to predict the unknown y from the measured variable x. In the case of multivariate statistics, however, instead of a single independent variable, x, now there is a large number of variables. Not surprisingly, linear algebra takes center stage in these calculations.

How, exactly, is this done? Instead of a single accepted approach, as in the “normal” linear regression, we now have to choose from an entire alphabet soup of methods: PLS, SVM, PCA and more. Fortunately, the details of these individual methods need not be understood in order to get started. In fact, even though their approaches can be very different, the results are often similar.

One popular method for quantification is Partial Least Squares Regression (PLS), which determines sets of spectra (the “components”) that can most effectively explain the variations in the concentration of the analyte. Another popular method used for classification is called Support Vector Machine (SVM), while discrimination, the comparison against a standard, is often done with Principal Component Analysis (PCA).

Software Tools

Just as numerous as the mathematical methods is the number of software tools available for the development of the trained model. Besides chemometric packages for programs like the statistics software “R,” or toolboxes for MATLAB, there are also dedicated commercial chemometrics programs, such as Analyze IQ, GRAMS (Thermo Fisher), Unscrambler (Camo) or Pirouette (Infometrix), just to name a few.
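In a general-purpose language, the train, cross-validate and test workflow described above can be sketched as follows. This is an illustration under stated assumptions, not the workflow of any package named in this note: the data are synthetic, the helper names (ridge_fit, cv_rmse, rmse) are hypothetical, and ridge regression is used as a simple stand-in for a multivariate linear model with one tunable parameter.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a spectral data set: 90 "spectra" at 40 wavelengths,
# built from two overlapping bands plus noise. y is the analyte concentration.
wavelengths = np.linspace(0.0, 1.0, 40)
band_analyte = np.exp(-((wavelengths - 0.4) / 0.15) ** 2)
band_other = np.exp(-((wavelengths - 0.6) / 0.15) ** 2)
y = rng.uniform(0.0, 10.0, size=90)
other = rng.uniform(0.0, 10.0, size=90)
X = np.outer(y, band_analyte) + np.outer(other, band_other)
X += rng.normal(scale=0.05, size=X.shape)

# Set aside one-third as the test set; it is not touched during training.
idx = rng.permutation(len(y))
test_idx, train_idx = idx[:30], idx[30:]


def ridge_fit(X, y, alpha):
    """Multivariate linear regression with an L2 penalty (closed form)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return coef, y_mean - x_mean @ coef


def rmse(X, y, model):
    """Root-mean-square prediction error of a fitted (coef, intercept) model."""
    coef, intercept = model
    return float(np.sqrt(np.mean((X @ coef + intercept - y) ** 2)))


def cv_rmse(X, y, alpha, n_folds=5):
    """Average error over folds, each fold held out once while the rest trains."""
    errors = []
    for fold in np.array_split(np.arange(len(y)), n_folds):
        mask = np.ones(len(y), dtype=bool)
        mask[fold] = False
        errors.append(rmse(X[fold], y[fold], ridge_fit(X[mask], y[mask], alpha)))
    return float(np.mean(errors))


# Steps 1 and 2: train and tune the penalty by cross-validation on the training set.
alphas = [1e-3, 1e-2, 1e-1, 1.0, 10.0]
cv_errors = [cv_rmse(X[train_idx], y[train_idx], a) for a in alphas]
best_alpha = alphas[int(np.argmin(cv_errors))]

# Step 3: validate the final model on the untouched test set.
model = ridge_fit(X[train_idx], y[train_idx], best_alpha)
test_error = rmse(X[test_idx], y[test_idx], model)
print(f"best alpha = {best_alpha}, test RMSE = {test_error:.3f}")
```

The key design point is that the test set plays no role in choosing the parameter; only the cross-validation folds drawn from the training set do.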


Case Study: Predicting Apple Sweetness

The sugar content of fruit (primarily fructose, glucose and sucrose) is commonly measured in sum as the soluble solids content (SSC) in the expressed fruit juice with a refractometer and reported as degrees Brix (°Bx), or grams of sucrose equivalent per 100 g of solution. Typical values range from 10°Bx to 16°Bx, depending on the apple variety, with unripe and ripe apples of the same variety differing by up to 4°Bx.

A Brix measurement is time consuming and requires sacrificial sampling from each batch to perform laboratory analysis of the fruit. Chemometrics using near-infrared reflectance spectra offers a rapid and nondestructive alternative. In this first case study we will review the typical steps required to develop and test a chemometric model for a complex analyte, such as °Bx.

Figure 2a shows the diffuse reflection measurement setup and the recorded NIR reflectance spectra for 76 Ginger Gold apples, collected using a Flame-NIR spectrometer (950-1650 nm) and tungsten halogen light source. Both ripe and unripe apples were used, and their Brix values determined in a separate lab analysis. Spectra from 5 locations across the “equator” are averaged for each apple (Figure 2b). Due to slight differences in the measurement geometry and the shape of the apple the spectra appear shifted and scaled relative to each other, which is corrected for in a “pre-processing” step using the SNV (standard normal variate) method (Figure 2c). As only the differences between the spectra contain the information about the varying sweetness, we subtract the average of all spectra (a process called mean-centering, shown in Figure 2d).

We randomly split the data set into one-third for the test set and two-thirds for the training set, and use the training set to optimize the model for the Brix value in repeated cross-validation. Cross-validation helps to walk the fine line between poor prediction in general (underfitting) and poor prediction on unknowns despite good performance on the training set (overfitting). In this case study the best model performance is achieved by including 5 components (basis vectors or dimensions) in the model. The quality of the prediction is evident in Figure 3, which compares the apple’s sweetness predicted by the model from the NIR reflectance spectrum with the actual Brix value measured in the lab for both training and test sets. The deviation between predicted and actual values is summarized in the “standard error of prediction” (SEP), a measure of the quality of the model, which is better than 0.3°Bx in this investigation.

Figure 2: (a) Experimental setup, (b) Near-infrared diffuse reflectance spectra as recorded, (c) Spectra after scaling with SNV to correct for geometric variations, (d) Differences between the scaled spectra and the overall average spectrum.
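The two pre-processing steps behind panels (c) and (d) are simple enough to sketch directly. The functions below are a minimal illustration; the names snv and mean_center and the five-point example spectra are hypothetical, not the authors' code or data.

```python
import numpy as np


def snv(spectra):
    """Standard normal variate: center and scale each spectrum (row) individually."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def mean_center(spectra):
    """Subtract the average spectrum of the data set from every spectrum."""
    return spectra - spectra.mean(axis=0, keepdims=True)


# Hypothetical example: the same underlying spectrum recorded with three
# different offsets and scale factors, as changes in geometry might produce.
base = np.array([0.2, 0.5, 0.9, 0.4, 0.1])
raw = np.vstack([1.0 * base + 0.00, 1.5 * base + 0.10, 0.7 * base - 0.05])

scaled = snv(raw)               # the three rows now coincide
centered = mean_center(scaled)  # only differences between spectra remain
print(np.allclose(scaled[0], scaled[1]), np.allclose(centered, 0.0))
```

Because the three example spectra differ only by offset and scale, SNV makes them identical and mean-centering leaves nothing behind; with real apples, what remains after both steps is the chemically meaningful variation.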

Figure 3: Comparison of the actual sweetness of apples (as measured in a lab as the Brix value) and the prediction of the Brix value based on the near-infrared diffuse reflectance spectrum.
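A comparison like the one in Figure 3 can be reproduced in outline with a small PLS implementation. The sketch below is an assumption-laden illustration: it uses synthetic spectra rather than the apple data, a textbook PLS1 (NIPALS) routine rather than whichever software the authors used, and hypothetical helper names. The prediction error is reported as the root-mean-square error on the test set, one common way to express a standard error of prediction.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit PLS1 (single response) with the NIPALS algorithm; returns a predictor."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean           # residuals, deflated per component
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w = w / np.linalg.norm(w)             # weight vector
        t = Xr @ w                            # scores
        tt = t @ t
        p = Xr.T @ t / tt                     # X loadings
        c = (yr @ t) / tt                     # y loading
        Xr = Xr - np.outer(t, p)
        yr = yr - c * t
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)    # regression vector in spectrum space
    return lambda X_new: (X_new - x_mean) @ coef + y_mean


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def cv_error(X, y, n_components, n_folds=5):
    """Average held-out error over folds for a given number of components."""
    errors = []
    for fold in np.array_split(np.arange(len(y)), n_folds):
        mask = np.ones(len(y), dtype=bool)
        mask[fold] = False
        predict = pls1_fit(X[mask], y[mask], n_components)
        errors.append(rmse(y[fold], predict(X[fold])))
    return float(np.mean(errors))


# Synthetic stand-in: 76 spectra from three overlapping bands; the "Brix" value
# depends on the first band only and carries some reference-method noise.
rng = np.random.default_rng(1)
wl = np.linspace(950, 1650, 60)
bands = np.array([np.exp(-((wl - c) / 120.0) ** 2) for c in (1100, 1300, 1500)])
amounts = rng.uniform(0.5, 1.5, size=(76, 3))
X = amounts @ bands + rng.normal(scale=0.002, size=(76, 60))
brix = 10.0 + 6.0 * (amounts[:, 0] - 0.5) + rng.normal(scale=0.1, size=76)

# One-third test set, two-thirds training set.
idx = rng.permutation(76)
test_idx, train_idx = idx[:25], idx[25:]

# Choose the number of components by cross-validation on the training set only.
candidates = range(1, 9)
cv_errors = [cv_error(X[train_idx], brix[train_idx], n) for n in candidates]
best_n = candidates[int(np.argmin(cv_errors))]

predict = pls1_fit(X[train_idx], brix[train_idx], best_n)
sep = rmse(brix[test_idx], predict(X[test_idx]))
print(f"components = {best_n}, test-set error = {sep:.2f} degrees Brix")
```

The number of components selected here reflects the synthetic data and should not be expected to match the 5 components reported for the apple spectra.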

Dedicated chemometrics programs offer a large set of tools to build, optimize and test chemometric models and to combine them into decision trees, thus allowing the development of very sophisticated analyses.

Case Study: Identifying Apple Variety


As an example of a classification task we demonstrate how to identify an apple’s variety using chemometric tools based on the visible diffuse reflectance spectra recorded in a setup similar to that depicted in Figure 2. The only difference was that our visible Flame spectrometer was used to collect the reflectance spectrum from 480-920 nm in order to gather information about pigments like carotenoids (420-500 nm), anthocyanins (540-550 nm) and chlorophyll (600-700 nm).

The recorded spectra were pre-processed in the same manner as described above: scaled with SNV and mean-centered to remove any commonalities and isolate the differences between the spectra (and hence the apples). To visualize the differences we perform a principal component analysis, a mathematical procedure that identifies those parts of the spectrum (the principal components) that can explain the largest variations between the different apple spectra most successfully.

Mathematically speaking, the principal components are directions in the multidimensional space defined by all wavelengths. In the case of this apple study they represent combinations of the individual spectra for the main pigments in the apple skin, namely carotenoids (orange), anthocyanins (red) and chlorophyll a and b (600-700 nm). The principal components allow us to represent the highly multidimensional data set in terms of just a few important dimensions, separating the interesting information from meaningless noise.

For the current data set we found that 97% of the variation between the spectra can be explained with just two principal components. Different apples “score” different amounts in the direction of these two principal components, i.e., their spectrum can be described to varying degrees as a combination of those two components, as shown in Figure 4. It becomes immediately obvious that the apple varieties fall into three groups: green apples, yellow-green apples and red apples, as indicated. With this plot (the training) in hand, it is possible to measure the spectrum of an unknown apple in the same way, obtain scores for the existing principal components, and then infer the apple’s variety based on the location of its principal component scores in the plot.
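The scoring procedure described above can be sketched with a singular value decomposition. Everything specific in this example is an assumption made for illustration: the spectra are synthetic, the two pigment bands and group amounts are hypothetical, and the nearest-centroid rule is just one simple way to "infer the variety from the location in the plot", not necessarily the one used for this note.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic stand-in for pre-processed visible spectra: three groups that
# differ in the strength of two hypothetical pigment bands.
wl = np.linspace(480, 920, 80)
band_a = np.exp(-((wl - 550) / 40.0) ** 2)
band_b = np.exp(-((wl - 680) / 30.0) ** 2)
group_amounts = {"green": (0.2, 1.0), "yellow-green": (0.5, 0.5), "red": (1.0, 0.2)}
labels, spectra = [], []
for name, (a, b) in group_amounts.items():
    for _ in range(20):
        spectra.append(a * band_a + b * band_b + rng.normal(scale=0.01, size=wl.size))
        labels.append(name)
X = np.array(spectra)
labels = np.array(labels)

# PCA via singular value decomposition of the mean-centered data.
mean_spectrum = X.mean(axis=0)
U, s, Vt = np.linalg.svd(X - mean_spectrum, full_matrices=False)
explained = s**2 / np.sum(s**2)
components = Vt[:2]                          # first two principal components
scores = (X - mean_spectrum) @ components.T  # coordinates in the score plot

# "Training": locate each group in the score plot by its centroid.
centroids = {g: scores[labels == g].mean(axis=0) for g in group_amounts}


def classify(spectrum):
    """Project a new spectrum onto the existing PCs and pick the nearest centroid."""
    score = (spectrum - mean_spectrum) @ components.T
    return min(centroids, key=lambda g: np.linalg.norm(score - centroids[g]))


unknown = 1.0 * band_a + 0.2 * band_b + rng.normal(scale=0.01, size=wl.size)
print(f"variance explained by 2 PCs: {explained[:2].sum():.1%}")
print("unknown sample is closest to:", classify(unknown))
```

Note that the unknown spectrum is centered with the mean of the training spectra and projected onto the existing components; the principal components are not recomputed for new samples.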

Figure 4: Principal component analysis of the visible diffuse reflectance spectra for apples of different varieties. Labels: gg = Ginger Gold, gd = Golden Delicious, gs = Granny Smith, ff = Fuji, hc = Honeycrisp, mc = McIntosh.

Identifying apples as yellow, green or red based on their visible reflection seems trivial and is meant to serve as an illustration of the general approach in a classification task. “Support Vector Machine” (SVM) is a popular algorithm used to classify samples successfully even in much less clear-cut applications. To demonstrate, we randomly split our data set and trained an SVM classification model on 80% of the visible apple spectra (a total of 576 spectra) to recognize the apple variety. We determined the optimum values for the two model parameters in a cross-validation grid search and tested the performance of the final model on the remaining 20% of the data (our test set). How did the model do? In cross-validation the classification error was 0.3%; the model made no mistakes in the separate test set of 144 spectra. How do you like them apples?
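A cross-validated grid search of this kind might look as follows in Python with scikit-learn. This is a sketch under assumptions: the note does not name its software or the two tuned parameters, so an RBF-kernel SVM with parameters C and gamma is assumed here, and the spectra and variety names are synthetic placeholders.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(3)

# Synthetic stand-in for pre-processed spectra of three varieties, each with
# a hypothetical absorption band at a different position.
wl = np.linspace(480, 920, 50)
centers = {"variety_a": 550.0, "variety_b": 650.0, "variety_c": 750.0}
X, y = [], []
for name, center in centers.items():
    band = np.exp(-((wl - center) / 40.0) ** 2)
    for _ in range(40):
        X.append(band + rng.normal(scale=0.02, size=wl.size))
        y.append(name)
X, y = np.array(X), np.array(y)

# 80% training and 20% test, as in the note; stratify keeps varieties balanced.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Cross-validated grid search over two parameters (assumed here: C and gamma).
param_grid = {"C": [0.1, 1.0, 10.0, 100.0], "gamma": [0.001, 0.01, 0.1, 1.0]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X_train, y_train)

cv_error = 1.0 - search.best_score_
test_error = 1.0 - search.score(X_test, y_test)
print("best parameters:", search.best_params_)
print(f"cross-validation error: {cv_error:.1%}, test error: {test_error:.1%}")
```

As in the case study, the test set is used only once, after the grid search has settled on the final parameters.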

Conclusion

The combination of fast, modular spectrometers with powerful chemometrics tools creates new opportunities for rapid, on-site testing of foods and other samples. From quick and inexpensive measurements in complex situations, to sample identification rivaling expert abilities or online quality control, the possibilities are endless. Modern chemometrics software packages make this tool set available for practitioners in all disciplines. Chemometrics might just be the tool you were looking for.


Contact us today for more information on setting up your spectroscopy system from Ocean Optics.

Perform Your Own Reflectance Measurements of Apples

• HL-2000-HP: Tungsten halogen light source with continuous output from 360-2400 nm

• QR200-12-MIXED: Expanded wavelength coverage reflection probe for simultaneous UV-Vis and Vis-NIR measurements

• CSH: Probe holder for measuring reflection of curved surfaces

• Flame-S-VIS-NIR: Vis-NIR spectrometer with Sony ILX511b detector for measurements from 350-1000 nm

• Flame-NIR: Small footprint NIR spectrometer with uncooled InGaAs detector for measurements from 950-1700 nm