

Statistical analysis of a database of absorption spectra of phytoplankton and pigment concentrations using self-organizing maps

Aymeric Chazottes, Annick Bricaud, Michel Crépon, and Sylvie Thiria

We present a statistical analysis of a large set of absorption spectra of phytoplankton, measured in natural samples collected from ocean water, in conjunction with detailed pigment concentrations. We processed the absorption spectra with a sophisticated neural network method suitable for classifying complex phenomena, the so-called self-organizing maps (SOM) proposed by Kohonen [Kohonen, Self Organizing Maps (Springer-Verlag, 1984)]. The aim was to compress the information embedded in the data set into a reduced number of classes characterizing the data set, which facilitates the analysis. By processing the absorption spectra, we were able to retrieve well-known relationships among pigment concentrations and to display them on maps to facilitate their interpretation. We then showed that the SOM enabled us to extract pertinent information about pigment concentrations normalized to chlorophyll a. We were able to propose new relationships between the fucoxanthin/Tchl-a ratio and the derivative of the absorption spectrum at 510 nm and between the Tchl-b/Tchl-a ratio and the derivative at 640 nm. Finally, we demonstrate the possibility of inverting the absorption spectrum to retrieve the pigment concentrations with better accuracy than a regression analysis using the Tchl-a concentration derived from the absorption at 440 nm. We also discuss the data coding used to build the self-organizing map. This methodology is very general and can be used to analyze a large class of complex data. © 2006 Optical Society of America

OCIS codes: 010.7340, 010.4450, 010.0010.

A. Chazottes ([email protected]), M. Crépon, and S. Thiria are with the Laboratoire d’Océanographie et du Climat: Expérimentations et Approches Numériques (LOCEAN/IPSL), 4 place Jussieu, 75252 Paris, France. A. Bricaud is with the Laboratoire d’Océanographie de Villefranche, Centre National de la Recherche Scientifique and Université Pierre et Marie Curie, Villefranche-sur-Mer, France.

Received 9 March 2006; revised 21 June 2006; accepted 23 June 2006; posted 27 June 2006 (Doc. ID 68672).

0003-6935/06/318102-14$15.00/0 © 2006 Optical Society of America

8102 APPLIED OPTICS / Vol. 45, No. 31 / 1 November 2006

1. Introduction

In oceanic case 1 waters, the spectral reflectance of the ocean is largely determined by the absorption properties of phytoplankton cells present in the upper layer. Therefore under certain assumptions, the absorption coefficients of phytoplankton can be derived from ocean color measurements.1–3 In turn, the absorption spectra of algal populations are strongly dependent on their pigment composition, so that much effort has been focused on extracting pigment information from algal absorption spectra,4,5 or even directly from ocean color spectra.6 Such methods could provide the possibility of getting taxonomic information on phytoplankton from ocean color measured from satellite-borne sensors, which would have important biogeochemical applications.

The retrieval of pigment concentrations from absorption spectra remains a difficult inverse problem, because the variability of the absorption spectrum, at a given chlorophyll-a concentration, is dependent not only on the concentration of accessory pigments but also on the package effect. This effect accounts for the fact that, because of the packaging of pigments within phytoplankton cells (and inside cells within chloroplasts), the absorption spectrum is flattened with respect to the absorption spectrum of the same pigments in solution.7 The package effect varies nonlinearly with cell size and with intracellular pigment concentrations. Therefore the relationship between the spectral absorption coefficients of natural algal populations and the corresponding pigment concentrations is complex and not easily modeled. The Laboratoire d’Océanographie de Villefranche (LOV) has gathered a large set of ocean water samples for which the absorption spectra of phytoplankton, associated with detailed pigment concentrations measured by high-pressure liquid chromatography (HPLC), have been determined. These water samples were collected during several cruises covering different parts of the ocean in different seasons and therefore present a wide variety of situations. Their properties have been analyzed by Bricaud et al.,8 who showed that the field variations in algal absorption coefficients (at a given chlorophyll-a content) are determined not only by the pigment composition but also by variations in the predominant cell size in the phytoplankton populations. This biological noise is likely to blur the relationships between algal absorption spectra and pigment concentrations. Similar effects can also result from photoacclimation of populations to various light conditions (various incident irradiances, or a decrease of irradiance with depth).

The objective of this paper was to extract pertinent objective information embedded in the absorption spectra of this data set from their statistical properties only, without any other a priori knowledge. Owing to the complexity of this information we propose a new and efficient way of visualizing it and consequently of facilitating its interpretation. We used a method based on neural networks, the so-called self-organizing map (SOM).9 This method can be considered an automatic analysis of the statistical properties of a data set. It allows an objective synthesis of the information in this data set. We propose to identify some pertinent clusters extracted from the full absorption spectra data set by providing a reduced set of characteristic spectra (the so-called prototype spectra) associated with the clusters. Each prototype spectrum (or class) provides a partition (or classification) of the data set that can be further associated with biogeophysical parameters of the water samples. Using this classification we were able to invert the absorption spectrum of phytoplankton to retrieve information concerning their pigment composition.

In Section 2 we describe the data set and the coding. In Section 3 we briefly present the SOM method and the maps resulting from the learning of the data set. Moreover, we show that the complexity of the data set is well represented by the maps. In Section 4 we investigate the possibility of retrieving some pigment concentrations by inverting the absorption spectral values with the help of the self-organizing map. In Section 5 we present a discussion and our conclusions.

2. Observations and Data Description

A. Observations

Samples were collected during ten cruises, in various seasons and various areas of the world’s oceans, between 1990 and 2002. The location of each cruise and the number of samples are displayed in Table 1. All the data considered in this study corresponded only to oceanic case 1 waters, i.e., waters for which the optical properties are dependent on matter of biological origin only and not on terrigenous (particulate or dissolved) substances.

The methods employed for particulate and algal absorption measurements are described in Refs. 8 and 10. In summary, for each sample particulate matter was collected on a 25 mm Whatman glass-fiber filter, and particulate absorption coefficients were measured stepwise (in 2 nm steps) by spectrophotometry, from 400 to 700 nm. Spectra were measured directly on the wet filter (by reference to a blank filter), by using the quantitative glass fiber filter technique (QFT), except for the FLUPAC cruise, for which the glass slide technique (Allali et al.10) was used. With this latter technique, absorption measurements are not affected by the path-length amplification effect (β effect). When the QFT was used, all the spectra were corrected for the β effect using the algorithms given by Allali et al.11 for samples collected in oligotrophic waters, and by Bricaud and Stramski12 for other samples. Finally, the respective contributions of phytoplankton and nonalgal particulate matter to total particulate absorption were determined either experimentally (pigment extraction by methanol; Kishino et al.13) or by numerical deconvolution (Bricaud and Stramski12).

Table 1. Cruises Where Absorption and HPLC Data Were Simultaneously Collected

| Cruise Number | Cruises | Location | Usual Trophic State | Date | Total Number of Samples |
|---|---|---|---|---|---|
| 1 | TOMOFRONT | Northwestern Mediterranean | Mesotrophic | April 1990 | 28 |
| 2 | EUMELI3 | Tropical North Atlantic | Oligotrophic, Mesotrophic | October 1991 | 49 |
| 3 | FLUPAC | Equatorial and Subequatorial Pacific | Oligotrophic | September–October 1994 | 80 |
| 4 | OLIPAC | Equatorial and Subequatorial Pacific | Oligotrophic | November 1994 | 183 |
| 5 | MINOS | Eastern and Western Mediterranean | Oligotrophic | May 1996 | 115 |
| 6 | ALMOFRONT2 | Alboran Sea | Mesotrophic | December 1997 and January 1998 | 477 |
| 7 | PROSOPE | Eastern and Western Mediterranean and Morocco Upwelling Area | Oligotrophic, Eutrophic | September–October 1999 | 606 |
| 8 | POMME | North Atlantic | Oligotrophic, Mesotrophic | February–May and August–October 2001 | 525 + 1571 |
| 9 | BENCAL | Benguela Upwelling Area | Mesotrophic, Eutrophic | October 2002 | 100 |

We applied a triangular moving window of size 3 to filter the noisy part of the spectra. We then sampled each filtered spectrum every 10 nm. Each spectrum was therefore a 31-dimension vector. The phytoplankton absorption spectral coefficients are represented by the symbol a(λ), where λ stands for the wavelength in nanometers.
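The smoothing and resampling step can be sketched as follows. This is a minimal illustration, not the authors' code: the triangular weights (0.25, 0.5, 0.25), the edge padding, and the function name `smooth_and_resample` are assumptions, since the text only specifies a triangular window of size 3 and a 10 nm sampling of the 2 nm spectra.

```python
import numpy as np

def smooth_and_resample(a_2nm, step_in=2, step_out=10):
    """Smooth a spectrum with a 3-point triangular window, then keep one value every 10 nm."""
    a_2nm = np.asarray(a_2nm, dtype=float)
    weights = np.array([0.25, 0.5, 0.25])      # assumed triangular weights (sum to 1)
    # Repeat the edge values so the smoothed spectrum keeps the input length.
    padded = np.pad(a_2nm, 1, mode="edge")
    smoothed = np.convolve(padded, weights, mode="valid")
    stride = step_out // step_in               # 5 input points per output point
    return smoothed[::stride]

# Synthetic spectrum measured from 400 to 700 nm in 2 nm steps (151 values).
wavelengths = np.arange(400, 701, 2)
spectrum = 0.05 * np.exp(-((wavelengths - 440) / 60.0) ** 2)
a_10nm = smooth_and_resample(spectrum)
print(a_10nm.shape)  # (31,)
```

The 31 retained values correspond to 400, 410, ..., 700 nm, which matches the 31-dimension vector mentioned in the text.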

Pigment concentrations were measured by high-pressure liquid chromatography (HPLC) using the procedure described by Vidussi et al.14 Up to 20 pigments were identified for each sample. For the SOM analysis, all the pigments were grouped into five main categories according to their similarity of spectral absorption characteristics: (1) chlorophyll-a, divinyl–chlorophyll-a, chlorophyllid-a, and pheopigments (pheophytin and pheophorbide) (the sum is noted as Tchl-a); (2) chlorophyll-b and divinyl–chlorophyll-b (noted as Tchl-b); (3) chlorophyll-c1, chlorophyll-c2, and chlorophyll-c3 (noted as Tchl-c); (4) photosynthetic carotenoids (noted as TPSC), i.e., fucoxanthin, peridinin, 19′-HF, and 19′-BF; (5) nonphotosynthetic carotenoids (noted as TPPC), i.e., zeaxanthin, diadinoxanthin, alloxanthin, and β-carotene. Two photosynthetic carotenoids, lutein and α-carotene, were also included in the latter category, because their absorption spectra are similar to those of zeaxanthin and β-carotene, respectively.

For each station, the first optical depth was computed as z_eu/4.6, where z_eu, the depth of the euphotic zone, is the depth at which the photosynthetically available radiation is reduced to 1% of its value just below the sea surface. The depth of the euphotic zone was either determined in the field or computed from the chlorophyll profile according to Morel and Maritorena.15
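The factor 4.6 can be recovered with a short derivation. The sketch below assumes, for illustration only, an exponential decay of the photosynthetically available radiation E with a depth-independent diffuse attenuation coefficient K_d; neither symbol is used in the text.

```latex
% Assumption: E(z) = E(0^-) e^{-K_d z}, with K_d constant over the euphotic layer.
\begin{align}
  E(z_{eu}) &= 0.01\, E(0^-)
  \;\Longrightarrow\; K_d\, z_{eu} = \ln 100 \approx 4.6, \\
  z_{1} &= \frac{1}{K_d} = \frac{z_{eu}}{\ln 100} \approx \frac{z_{eu}}{4.6}.
\end{align}
```

Under this assumption the first optical depth z_1 is the depth at which the radiation has dropped to 1/e of its subsurface value.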

B. Database

The objective of the research discussed in this paper was to analyze the absorption spectral values a(λ) of algal particulate matter and to try to associate them with concomitant biogeochemical and physical parameters, such as pigment concentrations or environmental factors. In this study we split the absorption spectra data set into two subsets: one for learning and the other for validation of the method. One part of the data set, corresponding to the three POMME cruises, was much larger than the remaining part containing the spectra of all the other cruises (see Table 1). In addition, waters sampled during these cruises were found on many occasions to be bio-optically different from other waters with similar chlorophyll-a concentrations.8 To avoid a possible bias, only 525 randomly selected samples from the POMME cruises were retained in the learning data set. The remaining data from these three cruises (1571 samples) were kept apart and used for the validation of the method.

The learning data set, which was used to estimatethe parameters of the statistical model, comprised2163 samples. Hereafter we refer to D when dealingwith the whole data set. L and V will stand for thelearning set and the validation set, respectively.
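The construction of L and V can be sketched as below. The per-cruise counts come from Table 1; the function name `split_learning_validation`, the random seed, and the use of a plain array of cruise labels are illustrative assumptions, not the authors' procedure.

```python
import numpy as np

def split_learning_validation(cruise, n_pomme_learning=525, seed=0):
    """Return index arrays (learning, validation) given one cruise label per sample."""
    cruise = np.asarray(cruise)
    rng = np.random.default_rng(seed)              # hypothetical seed, for reproducibility
    pomme_idx = np.flatnonzero(cruise == "POMME")
    other_idx = np.flatnonzero(cruise != "POMME")
    # Keep a random subset of POMME samples for learning; the rest is held out.
    chosen = rng.choice(pomme_idx, size=n_pomme_learning, replace=False)
    learning = np.sort(np.concatenate([other_idx, chosen]))
    validation = np.setdiff1d(pomme_idx, chosen)
    return learning, validation

# Sample counts per cruise, taken from Table 1 (POMME: 525 + 1571 = 2096).
counts = {"TOMOFRONT": 28, "EUMELI3": 49, "FLUPAC": 80, "OLIPAC": 183, "MINOS": 115,
          "ALMOFRONT2": 477, "PROSOPE": 606, "POMME": 2096, "BENCAL": 100}
cruise_labels = np.repeat(list(counts), list(counts.values()))
L_idx, V_idx = split_learning_validation(cruise_labels)
print(len(L_idx), len(V_idx))  # 2163 1571
```

The resulting sizes agree with the 2163 learning samples and 1571 validation samples quoted in the text.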

C. Coding

Coding the data is the first step and plays an important role in data analysis. As topological maps process multidimensional data, it becomes possible to describe each observation by a large set of characteristics, each one introducing a particular item of knowledge about the phenomenon under study.

As shown by many studies dealing with optical properties of ocean waters, absorption is primarily dependent on chlorophyll-a concentration. Once this first-order information has been extracted, there is still more complex and more subtle information to be extracted from the data set. To investigate this second-order information it is useful to remove the first-order information. This is usually done by dividing each absorption datum by the corresponding chlorophyll-a concentration. This method is widely used by the concerned marine scientific community and has recently been used by Bricaud et al.8 to analyze the present data set. It allowed these authors to extract pertinent information from absorption spectra and pigment concentrations. Owing to the theoretical characteristics of the topological map and its highly discriminative power,16 we decided not to apply the chlorophyll-a normalization to the observations. In fact, more satisfactory results were obtained using nonnormalized data than normalized data. Normalization by Tchl-a was introduced after the learning phase for analyzing the concentration of the different pigments.

As shown in Fig. 1, which displays the weight-specific absorption spectra of the different pigments with respect to wavelength, we note that the absorption spectra of the pigments have their maxima at different wavelengths and have different shapes. Consequently, the shape of the absorption spectrum of a water sample is related to the different pigment concentrations. To help the analysis and the retrieval of the pigment concentrations from the spectral information, we decided to use the first derivative of the spectra with respect to the wavelength, which is a proxy of the spectrum shape in statistical processing. This coding was inspired by previous work by Bidigare et al.17 and Faust and Norris.18,19 Owing to the large variability (several decades) of the absorption spectral values a(λ), we decided subsequently to use log10[a(λ)] values rather than a(λ) values, and for the same reasons to use the log-transformed values of the pigment concentration as well.

Each water sample spectrum is thus characterized by a 31-component vector, whose first 30 components are the spectrum derivatives computed as the difference between the log10[a(λ)] values for two consecutive wavelengths {i.e., log10[a(400 + 10i)] − log10[a(400 + 10(i − 1))], where i = 1, ..., 30}, and the last component is the maximum value of the absorption. This latter value is a pertinent piece of information about chlorophyll-a content and more generally on the overall pigment content. As absorption coefficients were log-transformed and a few spectra presented null or slightly negative values in the green part of the spectrum, we added a small offset of 10⁻⁴ m⁻¹ to these spectra to have only positive absorption coefficients. Finally, since all the pigment concentrations were also log-transformed, we set the zero pigment concentrations arbitrarily to a value of 10⁻⁴ mg m⁻³.
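The coding described above can be sketched as follows. Two points are assumptions made for this illustration: the maximum absorption is stored in log10 form (the text does not say whether the last component is log-transformed), and the offset is applied to any spectrum containing a nonpositive value. The names `encode_spectrum` and `encode_pigments` are hypothetical.

```python
import numpy as np

OFFSET = 1e-4  # m^-1 for absorption, mg m^-3 for pigments (values quoted in the text)

def encode_spectrum(a_10nm):
    """Code a 31-value spectrum (400..700 nm, 10 nm steps) as a 31-component vector."""
    a = np.asarray(a_10nm, dtype=float)
    if a.min() <= 0:
        a = a + OFFSET                     # offset only spectra with null/negative values
    if a.min() <= 0:
        raise ValueError("spectrum still nonpositive after applying the offset")
    log_a = np.log10(a)
    # Differences between consecutive wavelengths:
    # log10 a(400 + 10 i) - log10 a(400 + 10 (i - 1)), i = 1..30
    derivatives = np.diff(log_a)
    # Assumption: the amplitude component is the log10 of the maximum absorption.
    return np.concatenate([derivatives, [log_a.max()]])

def encode_pigments(conc):
    """Log-transform pigment concentrations, replacing zeros by the arbitrary floor."""
    c = np.asarray(conc, dtype=float)
    return np.log10(np.where(c == 0, OFFSET, c))

x = encode_spectrum(np.full(31, 0.01))
print(x.shape)  # (31,)
```

For a flat spectrum the 30 derivative components are zero and only the amplitude component carries information, which illustrates how the coding separates shape from amplitude.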

3. Classification Method: Self-Organizing Map Algorithm

A. Introduction

We used an unsupervised classification method to extract pertinent information from the statistical structure of the data set without any a priori information. This method has been extensively described by Niang et al.20 We now give an outline of it.

The method consists of a statistical model, the so-called SOM, which was described by Kohonen9 for visualizing and clustering high-dimensional data sets. The classification aims at summarizing the information contained in the learning data set L by producing a set of reference vectors rv (synthetic spectra) that are representative of the data. The set of rv’s represents the data set by compressing the information contained in it. Each rv matches a certain number of data of L according to a quadratic distance d² and defines a class, the matched data being the elements of the class. Each neuron of the SOM is associated with a particular reference vector rv. The different neurons of the topological map C are connected and determine a topological (neighborhood) relationship among the different neurons (Fig. 2): close neurons on the map represent similar subsets of data (classes presenting similarities).

In this study we produced a 2D (n × p) topological map with a large number of neurons (10 × 10 in this study), providing a highly discriminating representation of the observations. The 2163 spectra of L are thus clustered into 100 classes of spectra, each one represented by a typical spectrum characterizing the associated reference vector. We now present a first analysis of the SOM using this method and show its capacity to easily synthesize well-known results.
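A batch version of the SOM learning loop (assign every observation to its closest neuron, then recompute each reference vector from its own data and its neighbors' data) can be sketched as below. This is a generic illustration under stated assumptions, not the implementation of Ref. 20: the Gaussian neighborhood function, the geometric decrease of its width, the initialization from random observations, and the names `assign` and `train_batch_som` are all choices made for this example.

```python
import numpy as np

def assign(X, rv):
    """Index of the closest reference vector (quadratic distance) for each row of X."""
    d2 = ((X[:, None, :] - rv[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

def train_batch_som(X, n=10, p=10, n_iter=30, sigma_start=3.0, sigma_end=0.5, seed=0):
    """Batch SOM on an n x p grid; returns the (n*p, dim) array of reference vectors."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Grid coordinates of the neurons and squared distances between them on the map.
    rows, cols = np.divmod(np.arange(n * p), p)
    grid = np.column_stack([rows, cols]).astype(float)
    grid_d2 = ((grid[:, None, :] - grid[None, :, :]) ** 2).sum(axis=-1)
    # Assumed initialization: reference vectors drawn from the observations.
    rv = X[rng.choice(len(X), size=n * p, replace=len(X) < n * p)].copy()
    for t in range(n_iter):
        # Assumed schedule: neighborhood width shrinks geometrically.
        sigma = sigma_start * (sigma_end / sigma_start) ** (t / max(n_iter - 1, 1))
        bmu = assign(X, rv)                           # closest neuron per observation
        h = np.exp(-grid_d2 / (2.0 * sigma ** 2))     # assumed Gaussian neighborhood
        weights = h[:, bmu]                           # (neurons, observations)
        # Each rv becomes a neighborhood-weighted mean of the observations.
        rv = weights @ X / weights.sum(axis=1, keepdims=True)
    return rv

# Synthetic stand-in for coded spectra: 500 observations of dimension 31.
X_demo = np.random.default_rng(1).normal(size=(500, 31))
rv_demo = train_batch_som(X_demo, n_iter=10)
print(rv_demo.shape)  # (100, 31)
```

Because the neighborhood weights are strictly positive, every reference vector is updated at each step even when its neuron captures no observation, which is what produces the topological ordering of the map.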

B. Validation of the Self-Organizing Map

We now analyze the structure of the SOM at the end of the learning phase. Each neuron of the map is associated with a class. It is represented by a prototype spectrum (the reference vector rv), which is a statistical mean of all the spectra captured by the neuron. Thus, in the following, we also associate the prototype spectrum with a pigment concentration that is defined as the mean of the pigment concentrations corresponding to all the spectra selected by the neuron.
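Associating each neuron with a mean pigment concentration amounts to a grouped average over the samples it captured. A minimal sketch follows; the function name `neuron_means` is hypothetical, and whether the average is taken on raw or log-transformed concentrations is left to the caller, since it is not stated here.

```python
import numpy as np

def neuron_means(bmu, values, n_neurons):
    """Mean of `values` (samples x variables) over the samples captured by each neuron."""
    bmu = np.asarray(bmu)
    values = np.asarray(values, dtype=float)
    sums = np.zeros((n_neurons, values.shape[1]))
    np.add.at(sums, bmu, values)                  # accumulate rows neuron by neuron
    counts = np.bincount(bmu, minlength=n_neurons)
    means = np.full_like(sums, np.nan)            # neurons with no data stay NaN
    filled = counts > 0
    means[filled] = sums[filled] / counts[filled, None]
    return means, counts

# Three samples, one variable, three neurons; neuron 1 captures nothing.
means, counts = neuron_means([0, 0, 2], [[1.0], [3.0], [5.0]], n_neurons=3)
print(counts)  # [2 0 1]
```

The `counts` array is the number of spectra captured by each neuron, i.e., the quantity displayed on each panel of Fig. 3.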

Fig. 1. (Color online) Assumed in vivo weight-specific absorption spectra of the main pigments, a*_sol,i(λ) (in m² mg⁻¹), as derived from absorption spectra of individual pigments in solvent. [This figure is taken from Bricaud et al. (Ref. 8).]

Fig. 2. Structure of the SOM. The network comprises two layers: an input layer used to present observations and an adaptation layer (n × p neurons) for which a neighborhood system (square surrounding neuron i) is defined (distance δ between neurons and neighborhood function). Each neuron i is associated with a reference vector rv_i and is fully connected to the input layer. At each step of the learning all the input vectors (x) are attributed to the closest neuron. Then each neuron rv_i is computed according to its attributed data and its neighbor’s attributed data. We proceed until convergence or a fixed number of iterations is reached.


1. Relationship between the Amplitude of the Absorption Spectrum and the Pigment Concentrations

Figure 3 displays the mean log-transformed absorption spectral data associated with each rv of the SOM and allows us to visualize the topological order provided by the Kohonen map. We note that the SOM is well organized, the adjacent neurons having very similar spectra thanks to the neighborhood procedure mentioned previously. The learning phase was successful, as nearly every neuron captured a significant number of observed spectra, as shown in Fig. 3, in which the number of spectra captured by each neuron of the SOM is displayed. Each reference vector represents 22 spectra on average (60 maximum and 2 minimum), showing that the observations of L are well distributed over all the reference vectors.

The set of 10 × 10 neurons provided by the SOM is a reduced data set that represents a rational decomposition of L; it is a way to compress the information of this data set by extracting the most pertinent spectra. It is expected to be easier to extract significant objective characteristics from this reduced data set than from the whole data set. It also enables us to easily visualize the main trends in the data set learned by the map.

First we note that the mean amplitude of the spectra decreases from the top-left corner (where it is at its maximum) down to the bottom-right corner (where it is at its minimum). This variation can be schematized by the first diagonal of the SOM from the top-left corner to the bottom-right corner. Therefore the classification can order the spectra according to their amplitude. In Subsection 3.B.2 we shall see that more complex and more subtle information is embedded in this map, but various shapes of spectra are already apparent.

Figure 4 shows the spectra associated with some reference vectors. Each spectrum is displayed with its associated error bars (±2 standard deviations). We also added the different spectra of L captured by each neuron. All the spectra captured by a neuron may differ in amplitude but have generally similar shapes. The shape and the amplitude vary drastically among the reference vectors. The SOM is thus ordered with respect to the amplitude and the shape of the spectra.

Figure 5 shows the log-transformed concentrations of the five different groups of pigments associated with the neurons of the SOM. The concentrations of Tchl-a, TPSC, Tchl-c, and TPPC decrease from the top-left corner to the bottom-right corner. These variations in pigment concentrations are in agreement with the variations in the amplitude of the spectra [Fig. 5(f)]. The concentrations of Tchl-a and of TPSC appear to be closely correlated with the spectrum mean amplitude, which suggests a strong link between the spectrum amplitude and the concentrations of these two pigments. This link also exists for Tchl-c and TPPC, but it is less pronounced. The concentration gradient can be schematized by the first diagonal of the SOM (from the top left to the bottom right) showing the decrease in pigment richness of the water, which is first-order information embedded in the map and consistent with previous studies,21 which showed that the chlorophyll content is the first determinant of the amplitude of absorption spectra. Moreover, the Tchl-a concentration on the SOM also varies slightly on a line perpendicular to the Tchl-a concentration gradient, which, we believe, is the second-order information embedded in the map, the first-order information being along the chlorophyll-a gradient. The behavior of the Tchl-b concentration is different. It is roughly maximum along a line parallel to the second diagonal (bottom left to top right) and minimum on each side of it, which is coherent with the second-order information perpendicular to the first diagonal representing the first-order information.

Fig. 3. Representation of the reference vectors rv corresponding to log-transformed phytoplankton absorption spectra of the SOM. The neuron number goes from top to bottom and left to right. The arrow shows the spectrum amplitude gradient. The number displayed on each spectrum is the amount of data captured by the neuron. (All the plots have the same axis scale.)

Fig. 4. Mean absorption spectrum (m⁻¹) (rv spectrum) versus wavelength for some neurons of the SOM. The thick full curves represent the mean spectrum (rv spectrum) and the thick dotted curves represent the mean plus or minus two standard deviations; the thin curves in between represent the spectra captured by the neuron.

Fig. 5. (Color online) (a)–(e) Mean of the pigment concentrations (on a logarithmic scale) of the neurons of the SOM for the five pigment classes. The color scale is different for each pigment. (f) Mean maximum amplitude for each neuron. Panels (a)–(e) show the strong link between accessory pigments (TPSC, TPPC) and Tchl-a concentrations.

From this analysis we note that the map divides the five groups of pigments into three categories according to the patterns displayed by the SOM. The first group concerns Tchl-a and the TPSC; the second group, Tchl-c and the TPPC; and the third group, Tchl-b. The first two groups have similar properties: their concentrations are well correlated with the amplitude of the spectrum. The map, which was learned only with the spectrum, captured not only the first-order information embedded in the spectrum, but also more subtle information because of the pigment composition (see Subsection 3.B.2).

The region above the first diagonal (i.e., the diagonal joining the top-left corner to the bottom-right corner) corresponds mainly to water samples taken below the first optical depth. The zone below this diagonal contains most of the first-optical-depth samples mixed with some deeper ones, as we have noted previously, whose spectra are similar in shape and amplitude. The division above and below the first diagonal divides deep and shallow samples.

2. Variations in the Accessory-Pigment�Tchl-aRatios with Respect to Spectral AbsorptionAt the first-order information level, the concentra-tions of accessory pigments are related to that ofTchl-a (Fig. 5). To remove the strong influence of theTchl-a concentration in the organization of the SOM,we studied the variations in the concentrations ofaccessory pigments normalized to Tchl-a. Figure 6shows the different pigments-to-Tchl-a ratios for theneurons of the SOM. These ratios display well-identified patterns on the SOM, which are differentfrom those in Fig. 5. This shows the ability of theSOM method to capture second-order informationfrom the data set. In particular, we noted the varia-tions in a direction orthogonal to the Tchl-a gradient.Indeed, the maps are no longer symmetrical withrespect to the first diagonal. Strong differences areevident between the surface neurons and the deeperones [compare Fig. 6(a) with Fig. 6(b), which displaysonly neurons above the first optical depth]. Further-more there are clearly identifiable patterns specific toeach pigment ratio in Fig. 6. Figures 6(a) and 6(e), forinstance, show that neurons corresponding to highTchl-b�Tchl-a and TPPC�Tchl-a ratios are groupedabove and below the gray line (separating the surface

Fig. 6. (Color online) (a)–(e) Ratios of the concentrations of the four pigment classes to the Tchl-a concentration for all the neurons of theSOM. The color scale is different for each pigment class. The ellipses enclose clear patterns for each of the pigment ratios. The neuronsabove the gray line roughly correspond to neurons associated with deep samples. The neurons below the gray line roughly correspond toneurons associated with samples in the “first optical depth.” (b) Tchl-b�Tchl-a ratios for neurons of the first optical depth.

8108 APPLIED OPTICS / Vol. 45, No. 31 / 1 November 2006


and deep neurons), respectively. This demonstrates, consistently with previous field studies,11,22 that high Tchl-b/Tchl-a ratios are found preferentially in deep water, whereas high TPPC/Tchl-a ratios are found near the sea surface. Figures 6(c) and 6(d) have some symmetrical patterns with respect to the first diagonal, showing that high Tchl-c/Tchl-a and TPSC/Tchl-a ratios may be found both near the surface and in deeper water.

The above analysis showed the ability of the SOM method to retrieve well-known features. The weight of the map learned with phytoplankton absorption spectra clearly contains first- and second-order information consistent with the pigment composition variations shown in previous studies.8 It must be emphasized here that this information is derived from the shape and amplitude of the absorption spectra only, without any ancillary data. Nevertheless, it seems difficult to extract more information on the structure of the data set from the SOMs as they are presented in Figs. 5 and 6. We suggest refining the analysis by looking at potential links between new variables. In the following section we investigate links between the derivatives of the absorption spectrum at specific wavelengths and some individual pigment concentrations.

4. Exploitation of the Self-Organizing Map

The above results have shown that the SOM method applied to the absorption spectrum derivatives provided well-organized topological maps with respect to the concentrations of the various pigments and to the pigment ratios. They allowed us to show actual relationships between pigment concentrations and the amplitude and shape of absorption spectra, some of which are already known from previous field studies. We applied the SOM method to reveal new relationships between some pigment ratios in a water sample and the derivative of the absorption spectrum of the phytoplankton in this sample.

We examined the derivatives of the rv spectra with respect to the wavelength and drew 30 maps (Fig. 7) corresponding to the 30 derivative values of the rv spectra with respect to the wavelength, each map representing the derivative values at a specific wavelength of the 100 reference spectra of the SOM. The derivatives were computed for the log10 phytoplankton absorption spectra as

$$\log_{10}(a_j) - \log_{10}(a_{j-10}) = \log_{10}\!\left(\frac{a_j}{a_{j-10}}\right), \quad \text{where } j = (400 + 10i) \text{ and } i = 1, 2, 3, \ldots, 30. \tag{1}$$
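The derivative coding of Eq. (1) can be sketched in a few lines of Python. This is a minimal illustration, assuming the spectrum is available as 31 positive absorption values sampled every 10 nm from 400 to 700 nm; the function names `log10_derivatives` and `derivative_at` are hypothetical and do not come from the paper.

```python
import math

START_NM = 400  # assumed first wavelength of the 10 nm grid
STEP_NM = 10    # wavelength step implied by Eq. (1)


def log10_derivatives(absorption):
    """Return the 30 values log10(a_j / a_{j-10}) for j = 410, ..., 700 nm.

    `absorption` is assumed to hold a(400), a(410), ..., a(700).
    """
    if len(absorption) != 31:
        raise ValueError("expected 31 absorption values (400 to 700 nm)")
    if any(a <= 0 for a in absorption):
        raise ValueError("absorption values must be positive")
    return [math.log10(absorption[k] / absorption[k - 1]) for k in range(1, 31)]


def derivative_at(derivatives, wavelength_nm):
    """Pick the derivative labeled by wavelength j = 400 + 10 i (i = 1..30)."""
    offset = wavelength_nm - START_NM
    i = offset // STEP_NM
    if offset % STEP_NM != 0 or not 1 <= i <= 30:
        raise ValueError("wavelength must be one of 410, 420, ..., 700 nm")
    return derivatives[i - 1]
```

With this convention, `derivative_at(d, 510)` returns log10(a510/a500), which is the quantity used in Subsection 4.A.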

Fig. 7. (Color online) Maps of 30 of the 31 components of the rvs, namely the 30 derivatives. Only the maximum-amplitude map is missing (shown in Fig. 5).


Some of these maps present well-identified patterns at specific wavelengths, which can be related to some of the pigment-ratio maps, suggesting a linear relationship between the pigment ratio and the derivative values of the spectrum at given wavelengths.

A. Fucoxanthin/Tchl-a Concentration Ratio

The fucoxanthin/Tchl-a concentration ratio provides information about the phytoplankton cell size structure,23 since fucoxanthin is the main carotenoid of diatoms, which are usually large cells. It must be emphasized, however, that ambiguities in this approach may occur, as fucoxanthin may also be found in some prymnesiophytes, chrysophytes, and pelagophytes. The SOM of the rv fucoxanthin/Tchl-a concentration ratio shows patterns that are similar to those of the derivative of the log10 of the absorption at 510 nm computed as log10(a510/a500) (Fig. 8). In Fig. 9, the scatterplot between the two quantities suggests that the fucoxanthin/Tchl-a ratio is actually correlated with this derivative. We computed both a linear and a log-linear regression from the original data set (2163 samples). The most accurate fit was provided by the log-linear relationship

$$\text{fucoxanthin/Tchl-}a = 1.311\,(a_{510}/a_{500})^{8.19} \tag{2}$$

($R^2 = 0.59$, $s = 0.284$; see the definitions of these coefficients in Appendix A.)
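As an illustration of how Eq. (2) might be applied, the sketch below evaluates the fitted relationship both in physical space and in its equivalent log-linear form. The two coefficients are those of Eq. (2); the function names are hypothetical, and the result should be read as the rough estimate the regression statistics imply, not as a measurement.

```python
import math

A_FUCO = 1.311  # multiplicative coefficient of Eq. (2)
B_FUCO = 8.19   # exponent of Eq. (2)


def fuco_ratio_from_absorption(a510, a500):
    """Estimate fucoxanthin/Tchl-a from absorption at 510 and 500 nm (Eq. 2)."""
    if a510 <= 0 or a500 <= 0:
        raise ValueError("absorption values must be positive")
    return A_FUCO * (a510 / a500) ** B_FUCO


def log10_fuco_ratio(derivative_510):
    """Same relationship in log space, where it is linear in the derivative.

    `derivative_510` is log10(a510 / a500), as defined by Eq. (1).
    """
    return math.log10(A_FUCO) + B_FUCO * derivative_510
```

The two forms agree by construction: taking log10 of Eq. (2) gives an intercept of log10(1.311) and a slope of 8.19 with respect to the derivative at 510 nm.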

The log-linear regression is illustrated in Fig. 9. The dots display the observations for the individual samples, and the crosses, the different reference vectors (which are well spread among the samples, indicating that the reference vectors summarize the observations well).

The above-mentioned regression suggests that it is possible to get a rough estimate of the fucoxanthin/Tchl-a ratio from the derivative of the logarithm of the absorption spectrum at 510 nm. We nevertheless note that the dispersion is quite high for the very small values of the fucoxanthin/Tchl-a ratio. This can be explained by the fact that, for these small ratio values, the relationship may depend on other pigments. The topological map is able to take this dependence into account. The above-mentioned relationship could hardly have been derived from heuristic arguments based on the specific pigment absorption spectra displayed in Fig. 1, in which the fucoxanthin absorption is maximum at λ = 490 nm and the fucoxanthin absorption derivatives are maximum at λ = 535 nm and λ = 410 nm. Regressions with respect to the absorption at λ = 490 nm are not significant, probably because the interference of other pigments is much stronger at this wavelength (see Fig. 1). The SOM allows us to determine visually the most significant spectrum derivative according to the constraints due to the influence of the other pigments. This visual determination is more objectively confirmed by computing the correlation on the map. We note that 510 nm is just in the middle of the interval 488–532 nm, which corresponds to the wavelength interval for which Eisner et al.24 computed the slope of the absorption spectra of particles and showed that it was linearly related to the TPPC/TPSC ratio.

B. TPPC/TPSC Concentration Ratio

The TPPC/TPSC ratio has been studied by Eisner et al.24 in surface water (0–20 m depth range) samples taken off the Oregon coast in June 1998. The authors showed a strong linear relationship between this ratio and the normalized absorption spectrum slope at approximately λ = 510 nm, computed as $(a_{488} - a_{532})/[a_{676}(488 - 532)]$. By computing a proxy of this slope for the rv spectra of the SOM and the TPPC/TPSC ratio associated with the rvs, we found a stronger correlation between this slope and the log-transformed TPPC/TPSC ratio than that given by Eisner et al.24 This correlation is illustrated in Fig. 10. It is improved if we consider samples only from the first optical depth (between 2 and 25 m). In that case we found a correlation coefficient of 0.71 (instead of 0.58), allowing us to estimate an empirical relationship between these two quantities. In the physical space, the logarithm relationship becomes

$$\frac{a_{490} - a_{530}}{a_{680}\,(490 - 530)} = -0.0416 - 0.0262 \times \log_{10}(\text{TPPC/TPSC}). \tag{3}$$
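The normalized slope on the left-hand side of the relationship above can be computed as in the following sketch. The default wavelengths (490, 530, and 680 nm) are the proxy used with the rv spectra, while Eisner et al. used 488, 532, and 676 nm; the function name and argument names are hypothetical.

```python
def normalized_slope(a_short, a_long, a_red, wl_short=490.0, wl_long=530.0):
    """Slope of the absorption spectrum between two wavelengths,
    normalized by the absorption in the red Tchl-a band.

    a_short, a_long: absorption at wl_short and wl_long (e.g., 490 and 530 nm)
    a_red: absorption used for normalization (e.g., at 680 nm)
    """
    if a_red <= 0:
        raise ValueError("normalizing absorption must be positive")
    if wl_short == wl_long:
        raise ValueError("the two wavelengths must differ")
    return (a_short - a_long) / (a_red * (wl_short - wl_long))
```

Because the wavelength difference 490 − 530 is negative, a spectrum that decreases from 490 to 530 nm yields a negative slope, so the sign convention matters when comparing with published coefficients.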

Fig. 8. (Color online) (a) Fucoxanthin/Tchl-a concentration ratios for the neurons of the SOM. (b) Derivative at 510 nm of the log10 absorption spectra. The correspondence between the patterns displayed by the two maps suggests a strong link between the two values.

Fig. 9. Fucoxanthin/Tchl-a ratio as a function of the derivative of the log10 of the absorption spectrum of phytoplankton at 510 nm. The water sample data are represented by dots (·). The reference vector data (provided by the SOM) are represented by crosses (×). The regression line linking the two quantities is shown as a solid line. Two estimators of the quality of the regression are provided ($s = 0.284$, $R^2 = 0.591$).


This relationship is illustrated in Fig. 11, which shows the scatterplot of the two quantities. This result may be considered as an extension of the work of Eisner et al.,24 who processed a small amount of data (32) whose characteristics were similar. Moreover, the Eisner et al. TPPC/TPSC ratio ranged between 0 and 1, whereas in our data set it ranges between 0 and 10. Even if the determination coefficient obtained in our study is smaller than that obtained by Eisner et al.24 (0.93), the confidence level is similar, owing to the fact that the sample size in the present data set is much higher. Note also that the similarity between the variations in the fucoxanthin/Tchl-a and the TPPC/TPSC ratios with respect to the absorption spectrum slope is due to the fact that fucoxanthin is usually the predominant pigment in the TPSC.

C. Tchl-b/Tchl-a Concentration Ratio

We now compare the Tchl-b/Tchl-a ratio to the term log10(a650/a640) (Fig. 12). There is a strong similarity between the two maps, suggesting the existence of a significant relationship between the two quantities. The correlation coefficient between the Tchl-b/Tchl-a ratio associated with the reference vectors and the absorption derivative at λ = 650 nm of the reference vectors is 0.82. Owing to the number of data used in the computation of the correlations (100), these correlations are highly significant, still suggesting a potential linear relationship between these variables. This led us to determine, as before, a log-linear relationship of the following form in physical space:

$$\text{Tchl-}b/\text{Tchl-}a = 0.019\,(a_{650}/a_{640})^{13.33} \tag{4}$$

with ($R^2 = 0.28$, $s = 0.84$).
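The significance statement made above for the correlation on the 100 reference vectors can be checked with a standard Student t statistic for a Pearson correlation. The sketch below is only indicative: it assumes independent, roughly bivariate-normal data, which neighboring SOM neurons do not strictly satisfy, and the function names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r = 0 with n samples (n - 2 degrees of freedom)."""
    if n <= 2 or abs(r) >= 1:
        raise ValueError("need n > 2 and |r| < 1")
    return r * math.sqrt((n - 2) / (1.0 - r * r))
```

For r = 0.82 and n = 100, `t_statistic` gives a value of about 14, far above the usual critical values for 98 degrees of freedom, which is consistent with the text describing the correlation as highly significant.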

This regression poorly models the relationship between the Tchl-b/Tchl-a concentration ratio and the derivative of the logarithm of the absorption spectrum at 650 nm. This is confirmed by Fig. 13(a), which shows the regression line, the observations (dots), and the quantities associated with the reference vectors (crosses).

Three reference vectors (crosses) corresponding to a Tchl-b/Tchl-a ratio smaller than 10⁻² are far from the regression line. The physical characteristics of these three reference vectors are given in Table 2. The Tchl-b concentration was 10⁻⁴ mg m⁻³, which corresponds to values of Tchl-b equal to zero arbitrarily set to 10⁻⁴ mg m⁻³ to avoid zero values, as mentioned in Section 2. The Tchl-c concentrations are also very low. All the spectra captured by these three neurons were sampled just below the first optical depth during the PROSOPE cruise. Figure 5 shows that these three neurons are close together and form an isolated pattern on the SOM [lighter neurons at the top of Fig. 5(c)] far away from the large pattern corresponding to low concentration values of Tchl-b and Tchl-c near the bottom-right corner, which is

Fig. 10. (Color online) (a) Values of log10(TPPC/TPSC) for the SOM neurons. (b) Absorption spectrum slope computed as $(a_{490} - a_{530})/[a_{680}(490 - 530)]$. A correspondence between the patterns displayed by the two maps suggests a strong link between the two quantities.

Fig. 11. Absorption ratio of Eisner et al.24 $(a_{490} - a_{530})/[a_{680}(490 - 530)]$ as a function of the log10 of the TPPC/TPSC ratio, for the samples collected in the first optical depth. The water sample data are represented by dots (·). The reference vector data (provided by the SOM) are represented by crosses (×). The log-linear regression line linking the two quantities is shown as a solid line. Two estimators of the quality of the regression are displayed ($s = 0.011$, $R^2 = 0.515$).

Fig. 12. (Color online) (a) Tchl-b/Tchl-a concentration ratios for the SOM neurons. (b) Derivative of the log10 of the absorption spectra at 650 nm. A correspondence between the patterns displayed by the two maps suggests a strong link between the two quantities.

Table 2. Characteristics of Three Outlier Neurons

                 Pigment Concentration(a) (mg m⁻³)
Neuron Number    Tchl-a    Tchl-b    Tchl-c    TPSC      TPPC
31               0.3231    0.0001    0.0001    0.2210    0.0208
32               0.8815    0.0001    0.0106    0.5958    0.0336
41               0.3546    0.0001    0.0001    0.2528    0.0343

(a) Pigment concentrations associated with the three reference vectors considered as potential outliers in the Tchl-b/Tchl-a ratio analysis. The Tchl-b and two Tchl-c concentrations are nearly equal to zero (10⁻⁴ mg m⁻³).


associated with samples of low Tchl-a content taken at depth and coming from the same cruise. Besides the zero Tchl-b concentration, the slope of the absorption spectrum is dependent on other pigment concentrations. It implies that the noise caused by other pigments plays a strong role in determining the value of the derivative, which is no longer related to the Tchl-b/Tchl-a ratio. These data can thus be considered as outliers with respect to the regression.

We removed the 19 samples associated with these reference vectors and determined a new regression line similar to the one above:

$$\text{Tchl-}b/\text{Tchl-}a = 0.090\,(a_{650}/a_{640})^{6.838}, \tag{5}$$

with an improved goodness-of-fit coefficient ($R^2 = 0.531$) and a smaller $s$ ($s = 0.246$).
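A log-linear fit of this kind, together with the removal of samples captured by outlier neurons, could be sketched as follows. This is an assumption-laden illustration: it fits y = A·x^B by ordinary least squares on the log10-transformed variables, which may differ in detail from the authors' procedure, and the names `fit_power_law` and `drop_samples` are hypothetical.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = A * x**B by least squares on log10(y) = log10(A) + B*log10(x).

    xs could be the absorption ratios a650/a640 and ys the Tchl-b/Tchl-a ratios.
    Returns the pair (A, B).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    b = sxy / sxx
    log_a = my - b * mx
    return 10 ** log_a, b


def drop_samples(xs, ys, neuron_ids, excluded_neurons):
    """Remove the samples whose winning neuron is flagged as an outlier."""
    kept = [(x, y) for x, y, k in zip(xs, ys, neuron_ids)
            if k not in excluded_neurons]
    return [x for x, _ in kept], [y for _, y in kept]
```

In the setting described above, `excluded_neurons` would be the set of the three outlier neurons of Table 2, and the fit would be repeated on the remaining samples.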

The skill of the above regression is now improved and is presented in Fig. 13(b). Moreover, the relationships are similar when limiting the analysis to the first-optical-depth data. In these three examples, we show how we can use the SOM algorithm to establish new links between the features of the absorption spectra of phytoplankton and the pigment ratios and how to improve them by analyzing the large clustering of the data provided by the SOM. It is well known that, because of the package effect, the relationships linking the derivative of the logarithm of the spectrum and pigment concentrations are not strictly linear. This is confirmed by the values of R2 and s for the various regression analyses. Progress in understanding and establishing new links among these above-mentioned variables implies the use of methods more sophisticated than linear regression. In Subsection 4.D we compare the performance of the SOM method with those of linear regressions.

D. Retrieval of the Pigment Composition from the Spectral Characteristics of Absorption

In Subsection 4.C we showed that visualizing the reference vectors of the SOM allowed us to establish relationships between the optical and the biological properties. Figures 8, 10, and 12 suggest that the retrieval of pigment concentration information from optical properties is a possible task but more complex than the simple regressions we tested, if we need to obtain a high accuracy in the inverted data. Nonlinearity and the combination of several variables may play an important role. The SOM method offers us a way to overcome these difficulties by using its classification capabilities, which can model multivariate and nonlinear processes.

The inversion procedure is as follows. The phytoplankton absorption spectrum of a given sample is processed by the SOM algorithm and is associated with a specific reference vector. Then the pigment concentrations of this reference vector are attributed to the sample. The proposed method is similar to the analog methodology employed in meteorology for weather forecasting.25,26 We tested this procedure on the learning and validation sets defined in Section 2. The results of the inversion are

Fig. 13. Tchl-b/Tchl-a ratio as a function of the slope of the log10 of the absorption spectrum at 650 nm. The water sample data are represented by dots (·). The reference vector data (provided by the SOM) are represented by crosses (×). The log-linear regression linking the two values is shown by a solid line. Two estimators of the quality of the regression, s and R2, are displayed. In (a) all the data are processed ($s = 0.842$, $R^2 = 0.277$). In (b) the outliers are removed, which greatly improves the quality of the regression ($s = 0.246$, $R^2 = 0.531$).


presented in Table 3 for the learning and the validation data sets. The error performances are given in terms of relative root-mean-square error (RRMSE) for each pigment. We have also computed the vector RRMSE (VRRMSE), which is a multivariate estimator taking into account the estimation error of the pigment concentration vector globally and not separately for each pigment. The mathematical expressions of the RRMSE and the VRRMSE are given in Appendix B.
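The inversion step described above amounts to a nearest-reference-vector lookup. The sketch below assumes a Euclidean distance in the coded spectrum space and assumes that one pigment record has already been attached to each neuron (for instance, from the learning samples it captured); both choices, and all names, are illustrative assumptions not confirmed by the text.

```python
def best_matching_unit(coded_spectrum, reference_vectors):
    """Index of the reference vector closest to the coded spectrum
    (squared Euclidean distance, an assumed metric)."""
    best_index = None
    best_dist = float("inf")
    for k, rv in enumerate(reference_vectors):
        if len(rv) != len(coded_spectrum):
            raise ValueError("reference vector and spectrum lengths differ")
        dist = sum((c - r) ** 2 for c, r in zip(coded_spectrum, rv))
        if dist < best_dist:
            best_index, best_dist = k, dist
    if best_index is None:
        raise ValueError("no reference vectors given")
    return best_index


def invert_spectrum(coded_spectrum, reference_vectors, neuron_pigments):
    """Attribute to the sample the pigment record of its winning neuron.

    neuron_pigments[k] is the (assumed precomputed) pigment record of neuron k,
    e.g. {"Tchl-a": ..., "Tchl-b": ..., "Tchl-c": ..., "TPSC": ..., "TPPC": ...}.
    """
    return neuron_pigments[best_matching_unit(coded_spectrum, reference_vectors)]
```

Here `coded_spectrum` would use the same coding as the map itself, so that distances between a sample and the reference vectors are meaningful.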

To assess the efficiency of the SOM approach, we also compared the SOM performances to a rough inversion given by a classical log-linear regression between the different pigment concentrations and the Tchl-a content. The Tchl-a content was estimated from the absorption coefficient of phytoplankton at 440 nm using a log-linear regression similar to that proposed by Bricaud et al.8 As suggested by the results in Subsection 3.B, the other pigment concentrations were estimated by a linear regression between the estimated Tchl-a concentration and that of each pigment. These empirical linear functions were calibrated on the learning data set. These regressions have been computed only to provide threshold values for the SOM retrieval to assess the confidence in the retrieval procedure.

The SOM retrieval gives a better statistical estimator for each pigment than those computed from the linear regressions. The improvement is meaningful for the VRRMSE, which is a multivariate estimator taking into account the pigment concentration estimation globally and not separately for each pigment, meaning that a link exists between the retrieval of the different pigments. This confirms the well-known fact that the relationship between pigments and absorption spectra is better modeled by a multivariate nonlinear procedure, such as SOM, than by a univariate linear procedure. Improving the pigment retrieval should imply the use of methods other than linear regression, which are able to take into account the nonlinearity and the multiple dimensions of the problem. A major advantage of the SOM approach is that it allows us to retrieve the pigments globally without any a priori hypothesis with the same algorithm. One could imagine more efficient algorithms, each one being specific to a particular pigment.

5. Discussion and Conclusions

By processing a large data set of phytoplankton absorption spectra with the SOM algorithm, we were able to extract pertinent information linking the pigment concentrations of the water samples and the absorption spectrum values. When we started this work, an important question was raised on the coding of the absorption data set. To relate pigment composition variations with phytoplankton species, analysis of the absorption spectrum is usually done by normalizing it to the Tchl-a concentration or to the absorption at λ = 440 nm of the samples to remove the effect of Tchl-a, which is important and can hide that of the other pigments. We used nonnormalized spectra. To find the optimum coding for processing the SOM algorithm, we tested several SOMs learned with different codings for the retrieval of the pigment concentrations as described in Subsection 4.D. The results of this sensitivity

Table 4. Error Performances for Various Codings

                                                                                             Pigment Concentration Error Performance, RMSE(a)
Coding of Spectra Used to Learn the Topological Map                        Coding Number    Tchl-a    Tchl-b    Tchl-c    TPSC     TPPC
Spectrum                                                                   1                0.726     0.042     0.123     0.544    0.1002
Log10 (Absorption Spectrum)                                                2                0.849     0.042     0.139     0.611    0.1013
Spectrum Divided by the Chlorophyll-a Content                              3                1.132     0.054     0.178     0.764    0.1608
Derivatives of log10 (Absorption Spectrum) and log10 (Maximum Amplitude)   4                0.728     0.037     0.123     0.524    0.0923

(a) RMSE performances (see Appendix B for the definition) for the pigment retrieval using different SOMs trained with different codings of the spectrum. The error performances are computed for the different pigment concentrations to determine which coding best represents the pigment composition information embedded in the spectrum. In the original table, the best performance for each pigment concentration is shown in boldface. The result of this sensitivity study shows that coding 4 gives the best performances.

Table 3. Error Performances on Pigment Retrieval

                                      RRMSE(a) (%)
Retrieving Algorithm    Data Set      Tchl-a    Tchl-b    Tchl-c    TPSC     TPPC    VRRMSE
SOM                     Learn         36.7      70.5      73.3      128.9    57.8    27
SOM                     Validation    38.8      70.9      43.2      74.8     84.6    24
Least-squares fit       Learn         41.0      90.3      77.3      124.1    76.0    45
Least-squares fit       Validation    49.5      74.6      58.5      103.6    95.4    36

(a) RRMSE and VRRMSE performances (see Appendix B for definitions), expressed as percentages, of the SOM and of the least-squares fit, for the learning and validation data sets. In the original table, the best skill values are shown in boldface.


study are presented in Table 4. We computed error performances when the coding is the raw spectrum, the log10 of the absorption spectrum, the spectrum divided by the chlorophyll-a concentration and, finally, the log10 of the derivatives of the absorption spectra plus the log10 of the maximum amplitude of the spectrum. The best performance for the pigment concentration retrieval was given for the last coding (coding number 4 in Table 4), which is the coding we used in the present study. We note that this coding is a normalized quantity since it can be written as $\log_{10}[a(\lambda_j)/a(\lambda_{j-1})]$.
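The sense in which the derivative components are normalized can be made explicit with a one-line check. For any positive scaling factor $c$ applied to the whole spectrum (the symbol $c$ is introduced here only for illustration),

$$\log_{10}\frac{c\,a(\lambda_j)}{c\,a(\lambda_{j-1})} = \log_{10}\frac{a(\lambda_j)}{a(\lambda_{j-1})},$$

so the 30 derivative components depend only on the shape of the spectrum. Under this reading, the amplitude information enters the coding solely through the additional component, the log10 of the maximum amplitude.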

The SOM algorithm decomposed the set of absorption spectra into a reduced number of prototype reference vectors (rvs) by considering the specific statistical characteristics of the rvs for the reduced data set derived from the whole data set. It is easier to extract significant objective characteristics from the reduced data set rvs than from the whole data set.

The data processing presented in this paper was carried out by using the information embedded only in the absorption spectra. The physical and biogeochemical parameters helped us to interpret the classification of the spectra but were not used as inputs to the classification. First we validated the SOM algorithm by showing that it was able to retrieve well-established relationships corresponding to first-order variations in the absorption spectrum, i.e., covarying with the chlorophyll-a concentration. It also allowed us to expose more complex second-order patterns corresponding to variations in the accessory-pigment/chlorophyll-a ratios.

We then showed the possibility of associating the variability of some pigment concentrations with the derivative value (or slope) of the log of the absorption spectrum. By examining the SOMs of the spectral derivatives, we were able to propose two simple empirical models, one relating the fucoxanthin/Tchl-a ratio to the spectral derivative at 510 nm, the other relating the Tchl-b/Tchl-a ratio to the spectral derivative at 640 nm. These new empirical laws could hardly have been derived by reasoning alone, as discussed in Subsection 4.A for the fucoxanthin/Tchl-a ratio. The optimal wavelength for the derivative was found by carefully inspecting the maps of the derivatives provided by the SOM algorithm and comparing them to the map of the fucoxanthin/Tchl-a ratio.

We also revisited the Eisner et al.24 relationship, showing the existence of a linear relationship between the slope of the spectrum at approximately 510 nm and the log10(TPPC/TPSC), rather than with the TPPC/TPSC ratio as argued by Eisner et al.24 This was possible because of the wider range of the TPPC/TPSC ratio in the present data set.

The SOM method allowed us to retrieve the pigment concentrations with respect to the spectrum characteristics. To our knowledge, this challenging problem has never been solved in a global manner. The SOM was able to retrieve the concentrations of the main groups of pigments with reasonable accuracy. To assess its performance, we compared it with that given by a rough pigment retrieval estimation from Tchl-a by using a regression for which Tchl-a was obtained from a440. These regressions are used only to provide benchmarks for statistical estimator values to assess the skill of the SOM method. The SOM retrieval gives better statistical estimator values for each pigment than those computed from linear regressions. A major advantage of the SOM method is that it allows us to retrieve the pigments globally without any a priori hypothesis with the same algorithm. Improving the pigment retrieval should imply the use of methodologies specific to each pigment, such as multilayer perceptrons, which are other types of neural network well adapted for inverting nonlinear processes.27

Since specific pigment concentrations characterize the phytoplankton species and modulate the spectral variation in marine reflectance,6 it is expected that this work will contribute to identifying some phytoplankton species from space by processing the signal received by multispectral ocean color sensors. This methodology is general and can be used to analyze a large class of complex data.

Appendix A. Linear Regression

To establish the quality of a linear or log-linear relationship, two values, $R^2$ and $s$, are usually calculated. We consider a data set of $n$ samples $(x_i, y_i^{\text{observed}})$. The determination coefficient $R^2$, which is referred to as goodness of fit, represents that part of the variation in $y$ explained by $x$. It is given by

$$R^2 = \frac{\sum_i \left(y_i^{\text{estimated}} - \bar{y}\right)^2}{\sum_i \left(y_i^{\text{observed}} - \bar{y}\right)^2}, \tag{A1}$$

where $\bar{y}$ is the mean of the $y_i^{\text{observed}}$; $s$, which is referred to as the rms, is an estimation of the error on $y$ and is given by

$$s = \left[\frac{\sum_i \left(y_i^{\text{observed}} - y_i^{\text{estimated}}\right)^2}{n - 2}\right]^{1/2}. \tag{A2}$$
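The two estimators of this appendix translate directly into code. The sketch below follows the definitions as written here (explained sum of squares over total sum of squares for R2, and n − 2 degrees of freedom for s); the function names are hypothetical, and for a log-linear fit the y values would presumably be the log10-transformed quantities, which is an assumption.

```python
import math


def r_squared(observed, estimated):
    """Determination coefficient: explained variation over total variation."""
    if len(observed) != len(estimated) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    mean_obs = sum(observed) / len(observed)
    explained = sum((e - mean_obs) ** 2 for e in estimated)
    total = sum((o - mean_obs) ** 2 for o in observed)
    return explained / total


def residual_s(observed, estimated):
    """rms error on y, with n - 2 degrees of freedom (two fitted parameters)."""
    n = len(observed)
    if n != len(estimated) or n <= 2:
        raise ValueError("need two sequences of equal length > 2")
    ss_res = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    return math.sqrt(ss_res / (n - 2))
```

Note that this form of R2 coincides with the more common 1 − SS_res/SS_tot only when the estimates come from a least-squares fit with an intercept.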

Appendix B. Error Performances

The relative performances were calculated after discarding the $x_i^{\text{observed}}$ values under 0.05, for which relative error performances are not appropriate. A small variation is still a small variation, even if it is a large percentage.

Let us consider a variable $x_i$, $i = 1, 2, 3, \ldots, N$. The RMSE is defined as

$$\text{RMSE} = \left[\frac{1}{N} \sum_{i=1}^{N} \left(x_i^{\text{estimated}} - x_i^{\text{observed}}\right)^2\right]^{1/2}. \tag{B1}$$

The RRMSE is defined as

$$\text{RRMSE} = \left[\frac{1}{N} \sum_{i=1}^{N} \frac{\left(x_i^{\text{estimated}} - x_i^{\text{observed}}\right)^2}{\left(x_i^{\text{observed}}\right)^2}\right]^{1/2}, \tag{B2}$$


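and the VRRMSE of dimension $M$ is defined as

$$\text{VRRMSE} = \frac{1}{N} \sum_{j=1}^{N} \left[\frac{1}{M} \sum_{i=1}^{M} \frac{\left(x_{j,i}^{\text{estimated}} - x_{j,i}^{\text{observed}}\right)^2}{\left(x_{j,i}^{\text{observed}}\right)^2}\right]^{1/2}. \tag{B3}$$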

The VRRMSE computes an estimate of the relative vector error in the following manner. Let us define the pigment concentration vector whose components are different pigment concentrations. Let us define a vector error, which is the sum of the relative error of the component of the pigment concentration error. The VRRMSE is the mean of the vector errors. The VRRMSE takes into account a potential link between the different components of the vector.
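The three error measures of this appendix can be sketched as follows. The RMSE and RRMSE follow their definitions directly, with the 0.05 threshold on observed values applied to the relative measure. For the VRRMSE, the code follows the verbal description above (one relative error per sample vector, then the mean over samples); this reading of the formula, the absence of thresholding inside the vector error, and all function names are assumptions.

```python
import math

MIN_OBSERVED = 0.05  # observed values below this are discarded for relative errors


def rmse(estimated, observed):
    """Root-mean-square error over N paired values."""
    n = len(observed)
    return math.sqrt(sum((e - o) ** 2 for e, o in zip(estimated, observed)) / n)


def rrmse(estimated, observed, min_observed=MIN_OBSERVED):
    """Relative RMSE, ignoring pairs whose observed value is too small."""
    pairs = [(e, o) for e, o in zip(estimated, observed) if o >= min_observed]
    if not pairs:
        raise ValueError("no observed value above the threshold")
    return math.sqrt(sum(((e - o) / o) ** 2 for e, o in pairs) / len(pairs))


def vrrmse(estimated_vectors, observed_vectors):
    """Mean over samples of the per-sample relative vector error.

    Each vector holds the M pigment concentrations of one sample; observed
    components are assumed to be nonzero.
    """
    errors = []
    for est, obs in zip(estimated_vectors, observed_vectors):
        m = len(obs)
        errors.append(math.sqrt(sum(((e - o) / o) ** 2
                                    for e, o in zip(est, obs)) / m))
    return sum(errors) / len(errors)
```

Multiplying the RRMSE and VRRMSE by 100 gives percentages comparable in form to those reported in Table 3.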

This work is a contribution to the European Neural Algorithms for Ocean Color project (EVG1-CT-2000-00034) and to several projects that have been funded by the Processus Biogéochimiques dans l'Océan et Flux national program (EUMELI, EPOPE, FRONTAL, PROSOPE, POMME). The authors thank the chief scientists of the cruises during which the in situ data were collected and H. Claustre, J. Ras, K. Oubelkheir, K. Allali, C. Cailliau, J. C. Marty, N. Sadoudi, D. Tailliez, and F. Vidussi, who participated in sample collection, HPLC measurements, or absorption measurements. The authors also thank the two anonymous reviewers and H. Claustre for their help and suggestions.

References

1. C. S. Roesler and M. J. Perry, "In situ phytoplankton absorption, fluorescence emission, and particulate backscattering spectra determined from reflectance," J. Geophys. Res. 100, 22767–22767 (1995).
2. Z. Lee, K. L. Carder, C. D. Mobley, R. G. Steward, and J. S. Patch, "Hyperspectral remote sensing for shallow waters: I. A semianalytical model," Appl. Opt. 37, 6329–6338 (1998).
3. A. M. Ciotti and A. Bricaud, "Retrievals of a size parameter for phytoplankton and spectral light absorption by colored detrital matter from water-leaving radiances at SeaWiFS channels in a continental shelf region off Brazil," Limnol. Oceanogr. Methods 4, 237–253 (2006).
4. N. Hoepffner and S. Sathyendranath, "Determination of the major groups of phytoplankton pigments from the absorption spectra of total particulate matter," J. Geophys. Res. 98, 22789–22803 (1993).
5. V. Stuart, S. Sathyendranath, T. Platt, H. Mass, and B. D. Irwin, "Pigments and species composition of natural phytoplankton populations: effect on the absorption spectra," J. Plankton Res. 20, 187–217 (1998).
6. S. Alvain, C. Moulin, Y. Dandonneau, and F. M. Bréon, "Remote sensing of phytoplankton groups in case 1 waters from global SeaWiFS imagery," Deep-Sea Res. Part I 52, 1989–2004 (2005).
7. A. Morel and A. Bricaud, "Theoretical results concerning light absorption in a discrete medium, and application to specific absorption of phytoplankton," Deep-Sea Res. Part A 28, 1375–1393 (1981).
8. A. Bricaud, H. Claustre, J. Ras, and K. Oubelkheir, "Natural variability of phytoplankton absorption in oceanic waters: influence of the size structure of algal population," J. Geophys. Res. 109, C11010, doi:10.1029/2004JC002419 (2004).
9. T. Kohonen, Self-Organizing Maps (Springer-Verlag, 1984).
10. K. Allali, A. Bricaud, M. Babin, A. Morel, and P. Chang, "A new method for measuring spectral absorption coefficients of marine particles," Limnol. Oceanogr. 40, 1526–1532 (1995).
11. K. Allali, A. Bricaud, and H. Claustre, "Spatial variations in the chlorophyll-specific absorption coefficients of phytoplankton and photosynthetically active pigments in the equatorial Pacific," J. Geophys. Res. 102, 12413–12423 (1997).
12. A. Bricaud and D. Stramski, "Spectral absorption coefficients of living phytoplankton and non-algal biogenous matter: a comparison between the Peru upwelling area and the Sargasso Sea," Limnol. Oceanogr. 35, 562–582 (1990).
13. M. Kishino, M. Takahashi, N. Okami, and S. Ichimura, "Estimation of the spectral absorption coefficients of phytoplankton in the sea," Bull. Mar. Sci. 37, 634–642 (1985).
14. F. Vidussi, H. Claustre, J. Bustillos-Guzman, C. Cailliau, and J. C. Marty, "Rapid HPLC method for determination of phytoplankton chemotaxonomic pigments: separation of chlorophyll a from divinyl-chlorophyll-a, and zeaxanthin from lutein," J. Plankton Res. 18, 2377–2382 (1996).
15. A. Morel and S. Maritorena, "Bio-optical properties of oceanic waters: a reappraisal," J. Geophys. Res. 106, 7763–7780 (2001).
16. A. Niang, F. Badran, C. Moulin, M. Crepon, and S. Thiria, "Retrieval of aerosol type and optical thickness over the Mediterranean from SeaWiFS images using an automatic neural classification method," Remote Sens. Environ. 100, 82–94 (2006).
17. R. R. Bidigare, J. H. Morrow, and D. A. Kiefer, "Derivative analysis of spectral absorption by photosynthetic pigments in the western Sargasso Sea," J. Mar. Res. 47, 323–341 (1989).
18. M. A. Faust and K. H. Norris, "Rapid in vivo spectrophotometric analysis of chlorophyll pigments in intact phytoplankton cultures," Br. Phycol. J. 17, 351–361 (1982).
19. M. A. Faust and K. H. Norris, "In vivo spectrophotometric analysis of photosynthetic pigments in natural populations of phytoplankton," Limnol. Oceanogr. 30, 1316–1322 (1985).
20. A. Niang, S. Thiria, F. Badran, and C. Moulin, "Automatic neural classification of ocean colour reflectance spectra at the top of the atmosphere with introduction of expert knowledge," Remote Sens. Environ. 86, 257–271 (2003).
21. S. A. Garver, D. A. Siegel, and B. G. Mitchell, "Variability in near surface particulate absorption spectra: what can a satellite ocean color image see?" Limnol. Oceanogr. 39, 1349–1367 (1994).
22. A. Bricaud, M. Babin, A. Morel, and H. Claustre, "Variability in the chlorophyll-specific absorption coefficients of natural phytoplankton: analysis and parameterization," J. Geophys. Res. 100, 13331–13332 (1995).
23. F. Vidussi, H. Claustre, B. B. Manca, A. Luchetta, and J. C. Marty, "Phytoplankton pigment distribution in relation to upper thermocline circulation in the eastern Mediterranean Sea during winter," J. Geophys. Res. 106, 19939–19956 (2001).
24. L. B. Eisner, M. S. Twardowski, and T. J. Cowles, "Resolving phytoplankton photoprotective:photosynthetic carotenoid ratios on fine scales using in situ spectral absorption measurements," Limnol. Oceanogr. 48, 632–646 (2003).
25. E. N. Lorenz, "Atmospheric predictability as revealed by naturally occurring analogues," J. Atmos. Sci. 26, 636–646 (1969).
26. R. T. Clark and M. Déqué, "Conditional probability seasonal predictions of precipitation," Q. J. R. Meteorol. Soc. 129, 1–15 (2003).
27. S. Thiria, C. Mejia, F. Badran, and M. Crepon, "A neural network approach for modelling nonlinear transfer function: application for dealiasing spaceborne scatterometer data," J. Geophys. Res. 98, 827–841 (1993).
