
Chapter 9

The Use of Neural Networks, Principal Component Analysis and Universal Process Modeling for the Interpretation of Environmental Data

Devon A. Cancilla and Xingdmn Fang

In recent years, numerous applications of computer-based methods to environmental chemistry have been developed. These include the use of principal component analysis (PCA), soft independent modeling of class analogy (SIMCA), geographical information systems (GIS), neural networks and expert systems (Natusch et al., 1983; Breen and Robinson, 1985; James, 1993). The use of these techniques has been driven by the need to convert complex environmental analytical data into useful information. Regulatory efforts, clean-up strategies, monitoring programs and other environmental efforts all depend on the successful conversion of analytical data into a form that contains the information needed to make decisions. Among other uses, analytical measurements serve to evaluate loadings of toxic chemicals into ecosystems, to gauge the effectiveness of remediation efforts and to assess drinking water treatment standards.

Cancilla, D.A. and X. Fang. 1996. "The Use of Neural Networks, Principal Component Analysis and Universal Process Modeling for the Interpretation of Environmental Data." Journal of Water Management Modeling R191-09. doi: 10.14796/JWMMR191-09. © CHI 1996 www.chijournal.org ISSN: 2292-6062 (Formerly in Advances in Modeling the Management of Stormwater Impacts. ISBN: 0-9697422-5-8)

Unfortunately, differing analytical methodologies, varying degrees of control in the analytical process, and the complexity of environmental data have all challenged the environmental scientist's ability to adequately translate data into environmentally useful information. This is illustrated by the fact that there can be greater than 65% relative standard deviation in the amount of specific contaminants reported by laboratories when the contaminants are at the parts per billion (ng/L) level (Garfield, 1991). These types of problems have led to situations where entire data sets, covering years of analysis, have been declared useless (Bennoit, 1994). The first step in interpreting environmental data, therefore, is to ensure that the analytical variability is much less than the environmental variability being measured. This can only be done if laboratories adhere to strict quality control principles. Computational tools can then successfully be used to detect trends associated with changing environmental conditions.
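The relative standard deviation mentioned in the introduction is the sample standard deviation expressed as a percentage of the mean. The following is a minimal sketch of that calculation; the function name and the five concentrations are illustrative assumptions, not data from the study.

```python
import statistics

def relative_std_dev(values):
    """Return the relative standard deviation (%) of a set of measurements."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("RSD is undefined when the mean is zero")
    # statistics.stdev is the sample (n - 1) standard deviation.
    return 100.0 * statistics.stdev(values) / mean

# Hypothetical results (ng/L) reported by five laboratories for one sample.
reported = [2.0, 4.0, 6.0, 8.0, 10.0]
print(f"RSD = {relative_std_dev(reported):.1f}%")
```

Comparing this figure for replicate analyses with the spread seen between sites or sampling dates is one simple way to check that analytical variability is small relative to environmental variability.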

The Niagara River Toxics Management Plan is a program established by Environment Canada, the U.S. Environmental Protection Agency Region II, the Ontario Ministry of the Environment and the New York State Department of Environmental Conservation. One of the plan's stated goals is to achieve a significant reduction of toxic contaminants in the Niagara River and to reduce the inputs of specific toxic chemicals from point and non-point sources by 50% by 1996 (Williams et al., 1994). Associated with this plan is an upstream/downstream monitoring program designed specifically to measure target organic compounds. The analytical procedures used to support this monitoring program are prescribed by the Niagara River Analytical Protocol and contain specific guidelines controlling the analytical methodologies and associated quality control procedures used to generate analytical data (Analytical Protocol Group of River Monitoring Committee, 1992). Because the program has been in place since 1987 and its monitoring component follows a rigorous analytical protocol, the data it generates are of suitable quality for analysis by specific chemometric methods. This chapter describes the use of neural networks (NN), PCA and universal process modeling (UPM) for the evaluation of analytical data generated from three locations along the Niagara River (Figure 9.1).

This project had four specific goals:

1. To use NN, PCA and UPM techniques to detect variations in the levels of target organic compounds over time between specific locations along the Niagara River.

2. To use NN, PCA and UPM techniques to identify the source of water samples collected from locations along the Niagara River.

3. To use UPM techniques to detect variations in the levels of target organic compounds over time within specific locations along the Niagara River.

4. To evaluate the use of NN, PCA and UPM techniques as tools for identifying non-target contaminants using a broad spectrum analytical approach.

9.1 Experimental Design

The Niagara River is a major interconnecting waterway between Lake Erie and Lake Ontario. Flowing northerly from the former to the latter, the Niagara River drops some 100 meters in elevation over a distance of 58 kilometers. The river drains an urban region which is heavily industrialized and contains numerous chemical dump sites. The river includes Niagara Falls, which physically divides the river into an upper and a lower section. Two permanent sampling stations were established in 1987 and are located to collect representative samples of water entering the Niagara River from Lake Erie and exiting the Niagara River into Lake Ontario. These stations are Fort Erie (FE) and Niagara-on-the-Lake (NOTL). A third station, the Buffalo Water Intake (BWI), was established in the early 1990s and is located above the head of the Niagara River (Figure 9.1).

Figure 9.1 Sampling sites along the Niagara River (FE: Fort Erie; BWI: Buffalo Water Intake; NOTL: Niagara-on-the-Lake).

Weekly water samples are collected, extracted and analyzed by gas chromatography-mass spectrometry (GC-MS) following the procedures described by the Niagara River Analytical Protocol (Analytical Protocol Group, 1992). In general, 24 hour composite samples are collected and extracted using Goulden Large Volume Extractors. The extracts are analyzed by GC-MS for specific target organic compounds such as Chlorinated Pesticides (OCs) and Polynuclear Aromatic Hydrocarbons (PAHs). The GC-MS data from these samples can be described in terms of a multivariate problem (Lavine, 1992; Lavine et al., 1993). That is, a large number of data points or variables (chromatographic and spectral data representing different compounds) are used to describe an object (water or environmental quality of a site). The analytical data, usually reported as concentrations in ng/L, are transferred into Microsoft Excel for analysis by NN, PCA and UPM methodologies (Figure 9.2).

The entire data set consisted of samples collected from FE, NOTL and BWI between 1987 and 1994 (Table 9.1). The data set contained 359 samples each measuring 23 target compounds from Fort Erie, 338 samples each measuring 21 target organic compounds from NOTL, and 42 samples each containing 32 variables for the Buffalo Water Intake. A subset of this data, consisting of samples collected from BWI, FE and NOTL between 1993 and 1994, was used to determine between-site variability using PCA, NN and UPM techniques. This subset consisted of 149 samples, each measuring 32 target compounds. The entire data set (samples collected from 1987 through 1994) was used to determine within-site variability at the FE and NOTL locations.

Table 9.1 Data set used in the study.

Name of Data Set     Region   Number of Samples   Number of Variables   Date
Niagara River Data   BWI       42                 32                    1993-1994
                     FE        55                 32                    1993-1994
                     NOTL      52                 32                    1993-1994
NOTL Data            NOTL     338                 21                    1987-1994
FE Data              FE       359                 23                    1987-1994

Figure 9.2 Illustration of data analysis. Legible boxes in the flowchart: Original Data Edited by Excel; Data Matrix; Neural Network; Visualization; Diagnostic Information; Result Interpretation.

9.1.1 Neural Networks


A neural network is a computer program designed to link a variety of inputs through a series of interconnected associations into a specific output. The output produced from the associations can be used for problems related to prediction, classification, transformation and modeling (Zupan et al., 1993; Lawrence, 1993). Neural networks have been used in a variety of chemical applications, including chromatographic and spectral pattern recognition (Long et al., 1991; Wienke et al., 1994; Zupan et al., 1993; Lawrence, 1993). For neural networks to be successfully applied to specific problems, the problems must be appropriately defined, a data set must be established and the network must be trained. BrainMaker Professional, a commercially available neural network package, was used in this project (California Scientific Software, 1993).

9.1.2 Principal Component Analysis (PCA)

PCA is a display method for mapping multivariate data into a two-dimensional plane. This method first calculates the correlation matrix, then diagonalizes it to obtain the eigenvalues and eigenvectors. Finally, it transforms the original data into new variables by using the matrix of eigenvectors as a transformation matrix. The map is obtained by plotting the transformed data against the two new components that carry the largest portion of the information in the correlation matrix, that is, those with the largest eigenvalues. Similar samples lie close together in pattern space, forming clusters (Marssart, 1988). The variables modeled in this project include concentration and compound, generating a two-dimensional space. INSPECT 0.73 was used in this project (Lohninger, 1994).
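The steps described above (correlation matrix, diagonalization, projection onto the leading components) can be sketched with NumPy as follows. This is an illustrative sketch only, not the INSPECT implementation; the function name `pca_scores` and the synthetic data are assumptions, and the code assumes no variable is constant.

```python
import numpy as np

def pca_scores(data, n_components=2):
    """Project samples onto the leading principal components.

    data: (n_samples, n_variables) array, e.g. concentrations per compound.
    Returns (scores, explained_fraction).
    """
    data = np.asarray(data, dtype=float)
    # Autoscale each variable; the covariance of z is the correlation matrix.
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    corr = np.corrcoef(data, rowvar=False)
    # eigh suits symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1][:n_components]
    # The eigenvector matrix acts as the transformation matrix.
    scores = z @ eigvecs[:, order]
    explained = eigvals[order] / eigvals.sum()
    return scores, explained

rng = np.random.default_rng(0)
# Synthetic stand-in for a samples-by-compounds matrix (not the Niagara data).
demo = rng.normal(size=(30, 5))
demo[:, 1] += 2.0 * demo[:, 0]  # make two variables correlated
scores, explained = pca_scores(demo)
print(scores.shape, explained)
```

Plotting the first column of `scores` against the second gives a score plot of the kind used to look for clusters and outliers.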

9.1.3 Universal Process Modeling (UPM)

UPM is an M-in, M-out algorithm. The algorithm receives its input data vector of size M and responds with an output vector of the same dimensionality. To construct the response vector, UPM makes use of a reference library which is a database of exemplar patterns. Each time it is presented with a new test signal, UPM creates a localized model based on a subset of patterns selected from among the patterns stored in the reference library. The selection of exemplars for the localized model is based on the similarity of the test vector to each pattern in the reference library and the relative position of the exemplars. The similarity, which is calculated by the advanced metric, is used in two places: in selecting nearest neighbour images from the reference library, and in constructing the coefficients used to linearly combine those images into the predicted image. Once the exemplars are selected, the model is evaluated to determine the response vector and output diagnostic or classification information.

Universal Process Modeling is a proprietary empirical modeling technique which requires an historical data set that adequately describes the system. UPM provides a predicted output based upon the comparison of the historical data set with those obtained as contemporary data sets. Modelware Professional was used for this project (Teranet IA Inc, 1992).
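Because UPM is proprietary, its similarity metric and selection rules are not available. The sketch below only illustrates the general idea described above: pick the library patterns most similar to the test vector and combine them linearly into a predicted vector of the same size. The similarity measure (an inverse Euclidean distance), the value of `k` and all names and data are assumptions made for illustration.

```python
import numpy as np

def local_model_predict(test_vec, library, k=3):
    """Predict an M-element response from the k most similar library patterns.

    Assumption: similarity = 1 / (1 + Euclidean distance). The actual UPM
    metric is proprietary and is not reproduced here.
    """
    test_vec = np.asarray(test_vec, dtype=float)
    library = np.asarray(library, dtype=float)
    dist = np.linalg.norm(library - test_vec, axis=1)
    similarity = 1.0 / (1.0 + dist)
    # Select the nearest-neighbour exemplars (highest similarity first).
    nearest = np.argsort(similarity)[::-1][:k]
    # Use normalized similarities as the linear combination coefficients.
    weights = similarity[nearest] / similarity[nearest].sum()
    predicted = weights @ library[nearest]
    # Second value: predicted minus observed, element by element.
    return predicted, predicted - test_vec

# Hypothetical reference library: four historical patterns of three variables.
library = np.array([[1.0, 2.0, 3.0],
                    [1.1, 2.1, 2.9],
                    [5.0, 5.0, 5.0],
                    [0.9, 1.9, 3.1]])
observed = np.array([1.0, 2.0, 6.0])  # third variable is unusually high
predicted, difference = local_model_predict(observed, library)
print(predicted, difference)
```

A large difference for one element while the others agree with the library is the kind of situation a trend or deviation display would be expected to highlight.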

9.2 Between-Site Variability

The initial question to be addressed was whether the analytical data from three locations along the Niagara River could be used to observe between-site variability. With the variability between sites determined, a second question was whether the location of a sample could be determined based solely upon the analytical data.

Figures 9.3 and 9.4 show the principal component (PC) score plots that characterize the samples. Figure 9.3 is the score plot of the first PC (X-axis) against the second PC (Y-axis). Labeled points on the graph indicate individual samples which are outliers from the expected cluster and show the date of collection. For example, N940414 indicates a sample collected at Niagara-on-the-Lake on April 14, 1994. Two samples from FE and three from NOTL were observed to contain data outside the normal range. In each case, higher concentrations of specific target compounds were observed. These higher values could be associated with either true increases of these compounds as a result of a spill or release, or could be analytical outliers. Figure 9.4 is a magnified plot around the clusters in Figure 9.3. In general, two distinct clusters were observed, and were associated with the FE and NOTL locations. Data from the BWI was distributed between the FE and the NOTL clusters with more BWI data associated with the FE cluster. As the BWI and FE locations are in relatively close proximity, it is not surprising to observe this effect.

Figure 9.3 PCA plot of 149 water samples from the Niagara River (FE: Fort Erie; BWI: Buffalo Water Intake; NOTL: Niagara-on-the-Lake). Axes: PC1 (horizontal) and PC2 (vertical). Legible outlier labels: N940414, N930610, F940406, N930930.

Figure 9.4 Magnified plot of clusters in Figure 9.3 (FE: Fort Erie; BWI: Buffalo Water Intake; NOTL: Niagara-on-the-Lake). Axes: PC1 (horizontal) and PC2 (vertical).

NN and UPM analyses were undertaken on the same data set of 149 samples. Twenty-five percent of the data, with data coming from each of the three locations, were randomly selected for training input. The prediction rate for each method was determined from the number of times the system could correctly identify the source of the analytical data when presented with the remaining 75% of the samples. Table 9.2 shows the results of this study. In general, the NN could correctly identify the source of the data 94.4% of the time, while UPM analysis had a prediction rate of 91.7%.

Table 9.2 The classification rate (%) of samples from three locations.

Method                          NN      UPM
Recognition Rate (%)            100     (not legible)
Prediction Rate (%)             94.4    91.7
Total Classification Rate (%)   98.7    93.3
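The three rates in Table 9.2 can be computed from a single set of classifications. The sketch below assumes the usual meanings (recognition: training samples classified correctly; prediction: held-out samples classified correctly; total: all samples), which the chapter does not define explicitly. A nearest-centroid classifier stands in for the NN and UPM packages, the data are synthetic, and the split is deterministic rather than random so the example is reproducible.

```python
import numpy as np

def nearest_centroid_fit(x, y):
    """Store one mean vector (centroid) per class label."""
    labels = np.unique(y)
    centroids = np.array([x[y == lab].mean(axis=0) for lab in labels])
    return labels, centroids

def nearest_centroid_predict(model, x):
    """Assign each row of x to the class with the closest centroid."""
    labels, centroids = model
    d = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
    return labels[d.argmin(axis=1)]

def classification_rates(x, y, train_idx):
    """Return recognition, prediction and total classification rates (%)."""
    mask = np.zeros(len(y), dtype=bool)
    mask[train_idx] = True
    model = nearest_centroid_fit(x[mask], y[mask])
    correct = nearest_centroid_predict(model, x) == y
    return {
        "recognition": 100.0 * correct[mask].mean(),
        "prediction": 100.0 * correct[~mask].mean(),
        "total": 100.0 * correct.mean(),
    }

rng = np.random.default_rng(0)
# Synthetic, well-separated stand-in for three sampling sites.
sites = np.repeat(np.array(["BWI", "FE", "NOTL"]), 20)
centres = np.repeat(np.array([0.0, 10.0, 20.0]), 20)
x = centres[:, None] + rng.normal(size=(60, 4))
train_idx = np.arange(0, 60, 4)  # every fourth sample, 25% of the data
rates = classification_rates(x, sites, train_idx)
print(rates)
```

The total rate is the sample-weighted average of the recognition and prediction rates, so it always lies between the two.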


In order to observe the effect of the size of the training set on prediction rate, the training set size was varied from 2% (3 samples) to 75% (112 samples) of the total number of samples. This is an important consideration, as many data sets have a limited number of samples available for training, and collecting and analyzing large numbers of samples is costly. Ideally, variability would be determined using the smallest training set possible. Figure 9.5 shows the result of this study. In general, a 65% prediction rate was achieved with only 2% of the samples for both NN and UPM methods. The prediction rate increased steadily up to the 90% level using only 20% of the samples. An unusual observation was the drop in prediction rate using UPM from 90% to 85% when between 20% and 50% of the samples were used for training. This may be a result of using samples which were collected at different times rather than samples which were collected sequentially. It is unclear why this did not affect the NN analysis. A smaller dip in prediction rate (73-70%) was observed using the NN system when between 15 and 20% of the samples were used as training sets. Both the UPM and NN analyses were undertaken using default classification features. It is believed that higher prediction rates can be achieved when parameters within both the UPM and NN systems are optimized.

These results demonstrate that it is possible to use PCA, NN and UPM methods to identify data from separate locations. This is an important first step in being able to monitor real changes in chemical contamination over time. That is, if a normal set of conditions cannot be defined, then it will be impossible to determine changes within a system.

Figure 9.5 The effect of training set size on the prediction rate. Horizontal axis: sample number in training set (3, 9, 16, 23, 30, 37, 46, 76, 112); series: BrainMaker and ModelWare.
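An experiment of the kind summarized in Figure 9.5 can be sketched as a loop over training set sizes. The example below is a hedged illustration: it uses a one-nearest-neighbour classifier in place of the NN and UPM packages, synthetic data with the same sample counts as the 1993-1994 subset, and the training set sizes read from the figure axis. The function names and class centres are assumptions.

```python
import numpy as np

def one_nn_predict(train_x, train_y, test_x):
    """Label each test sample with the class of its closest training sample."""
    d = np.linalg.norm(test_x[:, None, :] - train_x[None, :, :], axis=2)
    return train_y[d.argmin(axis=1)]

def prediction_rate_curve(x, y, train_sizes, seed=0):
    """Return (training size, prediction rate %) for random splits."""
    rng = np.random.default_rng(seed)
    curve = []
    for size in train_sizes:
        order = rng.permutation(len(y))
        train, test = order[:size], order[size:]
        predicted = one_nn_predict(x[train], y[train], x[test])
        curve.append((size, 100.0 * np.mean(predicted == y[test])))
    return curve

rng = np.random.default_rng(1)
# Synthetic stand-in: 42 BWI, 55 FE and 52 NOTL samples of 32 variables.
sites = np.repeat(np.array(["BWI", "FE", "NOTL"]), [42, 55, 52])
centres = {"BWI": 0.0, "FE": 1.0, "NOTL": 3.0}  # hypothetical class means
x = np.array([rng.normal(centres[s], 1.0, size=32) for s in sites])
sizes = [3, 9, 16, 23, 30, 37, 46, 76, 112]
curve = prediction_rate_curve(x, sites, sizes)
for size, rate in curve:
    print(f"{size:4d} training samples -> {rate:5.1f}% prediction rate")
```

Repeating the loop with several seeds and averaging would show how much of any dip in the curve is due to the particular random selection of training samples.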

9.3 Within-Site Variability

UPM was used to evaluate within-site variability at both FE and NOTL. Baseline data from 1987 were used as the learning set, which consisted of 47 samples for each location. The prediction data for FE consisted of 312 samples targeting 11 specific compounds, collected from January 1988 through June 1994. The prediction data for NOTL consisted of 288 samples targeting 18 specific compounds, collected over the same period.

The UPM output consists of two graphs, a Trend plot and a Deviation plot. Trend plots, as shown in Figure 9.6A, contain a variety of information about the behavior of specific target compounds relative to the other target compounds within a chromatographic run. Each division on the X-axis of the plot represents a sampling event (one chromatographic run), while the Y-axis represents the concentration. Negative values appear on the Y-axis because the software autoscales the axis. Two lines are observed: the dark line is the actual measured concentration and the light line is the predicted concentration based on data from the training set. The bottom edge of the graph represents the behavior of the specific target compound, with the light color indicating that the compound is within normal limits. The dark color indicates that the behavior of the target compound is different from that predicted. The upper edge of the graph shows the behavior of all of the target compounds relative to predicted values developed from the training set. Dark sections indicate that the data set is outside the expected range, while the light areas indicate that the observed data set is within the range predicted by the model. Note that a specific compound may be out of range (lower edge) while the entire data set is within normal limits (upper edge); conversely, the data set may be out of range while the particular target compound is within its normal limits.

Figure 9.6A Part of a typical ModelWare status plot (see Figure 9.6B for complement). Trend plot of BHD from Niagara-on-the-Lake (dark bottom line: measured concentrations; light upper line: predicted concentrations). Horizontal axis: sampling event from January 1988 to June 1994.

Figure 9.6B Part of a typical ModelWare status plot (see Figure 9.6A for complement). Deviation bar plot of 18 compounds from NOTL (+5 deviation units indicates a significant deviation from historical levels). Each bar on the X-axis represents one compound.

Deviation plots, as shown in Figure 9.6B, were used to observe the behavior of all compounds within a chromatographic run relative to each other, based on data from the training set. The Deviation plot provides additional information relative to the Trend plot. Deviation plots contain the System Health Box (the small box on the left of the Deviation plot), which shows how the entire data set relates to values predicted from the training set. A value of one would indicate that the predicted and observed values match exactly. The level at which the system is determined to be unhealthy, meaning significantly different from expected, can be set by the user and for the purpose of this study was set at 0.85. The X-axis of the Deviation plot shows each individual compound, with each bar representing one of the compounds within a chromatographic run (from left to right: 1,4-dichlorobenzene, 1,2,4-trichlorobenzene, 1,2,3-trichlorobenzene, 1,2,3,4-tetrachlorobenzene, pentachlorobenzene, hexachlorobenzene, BHD, lindane, heptachlor epoxide, dieldrin, hexachlorobutadiene, polychlorinated biphenyls, fluoranthene, pyrene, benzo(a)anthracene, chrysene, bis(2-ethylhexyl)phthalate, dioctylphthalate). The Y-axis represents deviation units.

The Deviation plot also provides a view of the deviation of each compound making up the data set. Individual compounds were defined as within normal limits if they fell within two deviation units; the warning level for each compound was set at three deviation units; and the compound was defined as out of control at four deviation units. The deviation of all compounds relative to the training set ultimately determines the system health. For example, while one compound may show an out-of-control condition, the system health could still be normal (above 0.85) if the combined deviations of the other compounds were still within normal or warning levels.
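The limits quoted above can be written as a small decision rule. This is a sketch under stated assumptions rather than the ModelWare logic: it takes the absolute deviation, treats the unspecified band between two and three deviation units as a hypothetical "elevated" state, and treats a system health exactly equal to the threshold as unhealthy. The function names and sample values are illustrative.

```python
def compound_status(deviation):
    """Classify one compound from its deviation (in deviation units)."""
    magnitude = abs(deviation)  # assumption: limits apply in both directions
    if magnitude <= 2:
        return "normal"
    if magnitude < 3:
        return "elevated"  # assumption: the text leaves this band undefined
    if magnitude < 4:
        return "warning"
    return "out of control"

def system_is_healthy(system_health, threshold=0.85):
    """True when the system health value is above the user-set threshold."""
    return system_health > threshold

# Hypothetical deviations for three compounds in one chromatographic run.
deviations = {"pyrene": 0.7, "lindane": -3.2, "BHD": 4.6}
for name, value in deviations.items():
    print(f"{name}: {compound_status(value)}")
print("system healthy:", system_is_healthy(0.91))
```

Keeping the per-compound rule separate from the system health check mirrors the point made in the text: one compound can be out of control while the run as a whole is still classed as normal.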

Figure 9.7 shows a Trend plot generated for FE for the compound BHD. The numbers on the X-axis indicate sampling event (date) covering the specified time period. In general, the observed levels of BHD can be seen to be significantly lower than the prediction line. This would indicate that BHD levels at the FE location are indeed dropping from those initially found in 1987. This trend was also observed at the NOTL location (Figure 9.6A). In contrast, values obtained for pyrene (Figures 9.8 and 9.9) at both the FE and NOTL locations show a different trend. In general, there is little change between the predicted and observed levels over time. Specific increased concentration events, appearing as spikes relative to the prediction line, were observed at both locations. These events appear to occur between October and March of each year, with the absolute magnitude of each event appearing to drop off with time.

Figure 9.7 Seven year trend plot of BHD from Fort Erie (dark bottom line: measured concentrations; light upper line: predicted concentrations). Horizontal axis: sampling event from January 1988 to June 1994.

Figure 9.8 Seven year trend plot of pyrene from Fort Erie (dark line: measured concentrations; light line: predicted concentrations). Horizontal axis: sampling event from January 1988 to June 1994.

Figure 9.9 Seven year trend plot of pyrene from Niagara-on-the-Lake (dark line: measured concentrations; light line: predicted concentrations). Horizontal axis: sampling event from 1988 to 1994.

9.4 Conclusions

PCA, NN and UPM methods were shown to be useful tools for the prediction and detection of variation in the concentration of target compounds both within and between sampling sites. NN and UPM methods correctly identified the source of analytical data based upon minor differences observed in the analytical data generated from different sources. NN techniques achieved a higher identification rate than UPM. This is an important first step in determining changes in the levels of environmental contaminants over time. That is, existing conditions have to be reliably and consistently classified before changes can be observed. Both PCA and UPM provided tools for the direct visualization of data. The visualization of data provides a useful tool for detecting outliers and trends such as the decrease in contaminant concentration over time.

The use of these systems for interpreting trends in target compound analysis will assist the environmental scientist's ability to convert analytical data into useful information for making specific decisions which relate to the management of specific ecosystems. An additional benefit is the potential for using these systems for identifying and classifying non-target compounds. Once specific patterns have been recognized, the appearance and relative level of non-target compounds can also be classified. The appearance and levels of these compounds will aid the environmental analytical chemist in focusing appropriate chemical techniques necessary for identifying these compounds.

Acknowledgements

We would like to thank the Ontario Region of Environment Canada, in particular Mr. F. Philbert, Ms. M. Neilson and Mr. K. Kuntz, for allowing us to use the analytical results from the Niagara River project. We are also grateful to Dr. H. Lohninger, who provided us with a copy of INSPECT, the PCA software used in this project.

References

Analytical Protocol Group of River Monitoring Committee. 1992. Analytical Protocol for Monitoring Ambient Water Quality at the Niagara-on-the-Lake and Fort Erie Station, Environment Canada - Ontario Region, CCIW, Burlington, Ontario.

Bennoit, G. 1994. Clean Technique Measurement of Pb, Ag, and Cd in Freshwater: A Redefinition of Metal Pollution, Environ. Sci. Technol. 28: 1987.

Breen, J.J. and Robinson, P.E. 1985. Environmental Application of Chemometrics, American Chemical Association, Washington D.C.

California Scientific Software. 1993. BrainMaker Professional User's Guide and Reference Manual, 4th Edition, Nevada City, CA, USA.

Garfield, E.M. 1991. Quality Assurance Principles for Analytical Laboratories, AOAC International.

James, W. 1993. New Techniques for Modelling the Management of Stormwater Quality Impacts. Lewis Publishers. p395-515.

Lawrence, J. 1993. Introduction to Neural Networks: Design, Theory, and Applications, California Scientific Software, Nevada City, CA, USA.

Lavine, B.K. 1992. Environmental Applications of Pattern Recognition, Chemometrics and Intelligent Laboratory Systems, 15: 219-230.

Lavine, B.K., Stine, A., Mayfield, H.T. 1993. Gas Chromatography-Pattern Recognition Techniques in Pollution Monitoring, Analytica Chimica Acta, 277: 357-367.

Lohninger, H. 1994. INSPECT Users Manual, Institute of General Chemistry, Technical University, Vienna, Austria.

Long, J.R., Mayfield, H.T., Henley, M.V. 1991. Pattern Recognition of Jet Fuel Chromatographic Data by Artificial Neural Networks with Back-propagation of Error, Analytical Chemistry, 63: 1256-1261.

Marssart, D.L. 1988. Chemometrics: a Textbook, Elsevier Scientific Publisher, Amsterdam, Netherlands.

Natusch, D.F.S. and Hopke, P.K. 1983. Analytical Aspects of Environmental Chemistry, John Wiley & Sons, p219-262.

Teranet IA Inc. 1992. ModelWare™ and ModelWare Professional User's Manual, Nanaimo, B.C., Canada.

Williams, D.J., Kuntz, K.W., Neilson, M., Philbert, F., Glumac, V., Richman, L. and Suns, K. 1994. The Niagara River Toxics Management Plan: An Approach to Measure and Communicate Progress, Environment Canada - Ontario Region, CCIW, Burlington, Ontario.

Wienke, D., Hopke, P.K. 1994. Visual Neural Mapping Techniques for Locating Fine Airborne Particles Sources, Environ. Sci. Technol. 28: 1015-1022.

Zupan, J., Gasteiger, J. 1993. Neural Networks for Chemists, VCH Publishers, New York, NY, USA.