
PREDICTION OF SOIL CORROSIVITY

USING LINEAR POLARIZATION

by

EUGENIA KALANTZIS

Department of Civil Engineering and Applied Mechanics

McGill University, Montreal

May 1997

A THESIS SUBMITTED TO THE FACULTY OF GRADUATE STUDIES

AND RESEARCH IN PARTIAL FULFILLMENT OF THE REQUIREMENTS

FOR THE DEGREE OF

MASTER OF ENGINEERING

© Eugenia Kalantzis, 1997

National Library of Canada, Acquisitions and Bibliographic Services, 395 Wellington Street, Ottawa ON K1A 0N4, Canada

The author has granted a non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats.

The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.


This report presents the results of a study on the benefit of chloride ion testing in the prediction of soil corrosivity, which is determined using the method of linear polarization.

Existing industry standards such as AWWA C105 and PACE 82-3 are currently being used to evaluate the corrosivity of soils. These standards consist of various tests, whose results permit the calculation of a corrosivity index. The following tests are suggested: pH, oxidation-reduction potential, sulfide ion content, resistivity, drainage ability, soil type, and moisture content. Up to this point, no standards have incorporated chloride ion testing into their testing procedure, even though the effect of chloride ions on the corrosion rate is well documented. It is the goal of this project to determine whether there is enough evidence to suggest that chloride ion content be introduced into existing standards.

In total, 153 soils were tested following the AWWA C105 and PACE 82-3 standards, as well as for the chloride ion content. Of these, 75 soils were tested using linear polarization to determine the "true" corrosivity of the soils.

The analysis results showed that the information provided by the chloride ion content was not significant enough to suggest that this variable be added to the existing grids. This is because soil resistivity, which is a required test in both standards, accounts for the presence of chloride ions. However, it should be noted that the chloride ion content is a better predictor of corrosivity than soil resistivity, and it is suggested that chloride ion content be tested whenever possible.

This report presents the results of a study on the need to determine the chloride content when evaluating the corrosivity of soils, the latter being obtained by the method of linear polarization.

At present, the standards used by industry are based on evaluation grids that permit the calculation of a corrosivity index, e.g., the AWWA C105 and PACE 82-3 grids. Each evaluation grid is built from a series of tests, notably soil type, pH, redox potential, sulfide content, resistivity, drainage, and soil moisture. To date, no standard incorporates the chloride content in its evaluation grid, even though the effect of chlorides on the rate of corrosion is well documented. The objective of this study is therefore to determine whether the chloride content should be introduced into the evaluation grids.

In total, 153 specimens of various soils were tested according to the AWWA C105 and PACE 82-3 grids, as well as for chloride content. Of these, 75 were tested by the method of linear polarization, and the corrosivity of these soils was thereby formally determined.

After analysis of the test results, it was determined that the additional information provided by the chloride content was not significant enough to suggest that this parameter be incorporated into the existing grids. This is because soil resistivity, a measurement already included in both standards, indirectly represents the chloride content. On the other hand, it is important to note that the chloride content is a variable that predicts soil corrosivity better than resistivity does. It is therefore strongly suggested that the chloride content be evaluated and studied where possible.


TABLE OF CONTENTS

ABSTRACT
RÉSUMÉ
LIST OF FIGURES
LIST OF TABLES
NOTATION AND ABBREVIATIONS
ACKNOWLEDGMENTS

1. INTRODUCTION

2. CORROSION AND CORROSION CONTROL

2.1 Principles of Electrochemical Corrosion

2.1.1 Necessary Elements for Corrosion
2.1.2 Physical Forms of Corrosion
2.1.2.1 Uniform Attack
2.1.2.2 Galvanic Attack
2.1.2.3 Crevice Corrosion
2.1.2.4 Pitting Corrosion
2.1.2.5 Erosion Corrosion
2.1.2.6 Selective Leaching
2.1.2.7 Stress Corrosion
2.1.3 Why Do Metals Corrode?
2.1.4 Determining the Rate of Corrosion
2.1.5 The Exchange Current Density, i_0
2.1.6 Determination of i_corr and φ_corr
2.1.6.1 Activation Polarization
2.1.6.2 Concentration Polarization
2.1.7 Effect of Varying Parameters Using Polarization Diagrams
2.1.7.1 P_O2 and H+ Concentration
2.1.7.2 O2 Solubility
2.1.7.3 Multiple Corrodents
2.1.7.4 Galvanic Attack
2.1.7.5 Passivity
2.1.7.6 Chloride Content

2.2 Measuring Corrosion Rates
2.2.1 Tafel Extrapolation
2.2.2 Linear Polarization

2.3 Soil Corrosion and Its Effects on Underground Infrastructure
2.3.1 Differential Aeration Cells
2.3.2 Galvanic Attack
2.3.3 Selective Leaching
2.3.4 Stress-Corrosion Cracking

2.4 Standards for Determining Corrosivity of Soils
2.4.1 AWWA C105
2.4.2 PACE 82-3

3. PROCEDURES AND APPARATUS

3.1 Soil Samples
3.2 Soil Type
3.3 Drainage Ability/Moisture Content
3.4 pH
3.5 Oxidation-Reduction Potential (Redox Potential)
3.6 Resistivity, ρ
3.7 Sulfide Content
3.8 Chloride Concentration
3.8.1 Necessary Equipment
3.8.2 Sample Preparation
3.8.3 Electrode Preparation
3.8.4 Preparation of Calibrating Solutions
3.8.5 Calibration of Electrode
3.8.6 Calibration Curve and Equation
3.8.7 Testing Soil Samples
3.8.8 Determination of Concentration of Chloride Ions of Soil
3.9 Linear Polarization
3.9.1 Necessary Equipment
3.9.2 Trial Runs and Reproducibility of Results
3.9.3 Sample Preparation
3.9.4 Preparation of the Working Electrode
3.9.5 Polarization of the Steel Specimen
3.10 Calculating the Corrosivity Indices According to AWWA and PACE

4. ANALYSIS OF EXPERIMENTAL RESULTS AND DISCUSSION

4.1 Analysis of Preliminary Data
4.1.1 Data Exploration
4.1.2 Transformation of Variables
4.1.3 Regression of the Individual Variables
4.1.4 Correlation Matrix
4.1.5 RSQUARE Results
4.1.6 Categorical Variables
4.1.7 Variables Retained For Further Analysis
4.2 Consideration of Chlorides in Predicting the Corrosion Rate
4.2.1 Determining Significance
4.2.2 The Effect of Removing Outliers
4.3 Power Analysis

5. CONCLUSIONS AND RECOMMENDATIONS

5.1 Summary of Results
5.2 Recommendation for Future Work

APPENDIX A: DERIVATION OF POTENTIAL EQUATIONS
A.1 Equation for φ_Zn
A.2 Equation for φ_Cu
A.3 Equation for φ_H+

APPENDIX B: TESTING FOR CHLORIDE ION CONCENTRATION
B.1 Creating a Concentration vs. Potential Curve

APPENDIX C: TRIALS FOR REPRODUCIBILITY

APPENDIX D: PRINCIPLES OF REGRESSION ANALYSIS
D.1 Data Exploration
D.2 Simple Linear Regression Analysis
D.3 Data Transformations
D.4 Multiple Variable Regression
D.5 Categorical Variables
D.6 Outliers
D.7 Variable Selection
D.8 Model Validation
D.9 Power
D.10 The SAS Statistical Package

LIST OF FIGURES

The Two Stages of Crevice Corrosion
Pitting Corrosion
Erosion Corrosion
Microstructure of Gray Cast Iron
Intercrystalline Crack
Transcrystalline Crack
Stable and Unstable Positions
Schematic of a Cu/Zn Battery
φ vs. log I for Cu/Zn Battery
Polarization Diagram for Corrosion in Acidic Solution
Polarization Diagram for Corrosion in Neutral Aerated Water
Dependence of I_corr on the Value of I_0
Activation Polarization Diagram
Concentration Polarization Diagram
Distribution of H+ in Time
Effect of Varying P_O2
Variation of O2 Solubility with NaCl Concentration
Effect of Multiple Corrodents
Galvanic Attack
Polarization Diagram of a Metal Exhibiting Passivity

Variation of I_a and I_c with Potential φ
Tafel Curve
Schematic of Setup for Tafel Test
Tafel Curve Obtained by Varying φ
Tafel Regions
Obtaining i_corr from the Tafel Curve
Linear Polarization Curve
Components of the Working Electrode
Typical Tafel Plot
Typical Linear Polarization Curve
Spreadsheet for Quick Calculation of Corrosivity Indices

SAS Output: Univariate Procedure using pHdir
SAS Output: Univariate Procedure using pHsat
SAS Output: Univariate Procedure using Reddir
SAS Output: Univariate Procedure using Reddir, with Extreme Values Removed
SAS Output: Univariate Procedure using Redsat
SAS Output: Univariate Procedure using Redsat, with Extreme Values Removed
SAS Output: Univariate Procedure using Resdir
SAS Output: Univariate Procedure using Ressat
SAS Output: Univariate Procedure using Chloride
SAS Output: Univariate Procedure using CorrRate
SAS Output: Univariate Procedure using LChl
SAS Output: Univariate Procedure using LResdir
SAS Output: Univariate Procedure using LRessat
SAS Output: Univariate Procedure using LCorr
SAS Output: Univariate Procedure using pHdir Residual
SAS Output: pHdir Residual vs. Predicted Value of CorrRate
SAS Output: Univariate Procedure using pHsat Residual
SAS Output: pHsat Residual vs. Predicted Value of CorrRate
SAS Output: Univariate Procedure using Reddir Residual
SAS Output: Reddir Residual vs. Predicted Value of CorrRate
SAS Output: Univariate Procedure using Redsat Residual
SAS Output: Redsat Residual vs. Predicted Value of CorrRate
SAS Output: Univariate Procedure using LResdir Residual
SAS Output: LResdir Residual vs. Predicted Value of CorrRate
SAS Output: Univariate Procedure using LRessat Residual
SAS Output: LRessat Residual vs. Predicted Value of CorrRate
SAS Output: Univariate Procedure using LChl Residual
SAS Output: LChl Residual vs. Predicted Value of CorrRate
SAS Output: Correlation Matrix
SAS Output: Correlation Matrix for Clay Samples
SAS Output: Correlation Matrix for Sand Samples
SAS Output: Correlation Matrix for Sand/Clay Samples
Potentials Obtained from Calibrating Solutions: Series 1
Calibration Curve for Series 1
Trial No. 1: Tafel Results
Trial No. 1: Linear Polarization Results
Trial No. 2: Tafel Results
Trial No. 2: Linear Polarization Results
Stem and Leaf Diagram
Stem and Leaf Diagram and Boxplot
Normal Probability Plot
Y vs. X Plot
Example of an Insignificant Predictor
ANOVA Table
Normal Distribution
Normally Distributed Y Values
Y Values Not Distributed Normally
Ideal Y vs. X Distribution
Non-Linear Relationship Between X and Y
Venn Diagrams for 1 and 2 Independent Variables
Effect of Outliers on R²

LIST OF TABLES

Electromotive Series
Galvanic Series for Seawater
Typical ρ Values
Soil Type Results
Drainage Ability Results
Moisture Content Results
pH-direct Results
pH-saturated Results
Redox-direct Results
Redox-saturated Results
ρ-saturated Results
ρ-direct Results
Sulfide Content Results Using Iodine Solution
Sulfide Content Results Using HCl and Lead Acetate Paper
Preparation of Calibrating Solutions
Chloride Ion Concentrations
Values Specified for Tafel Test
Values Specified for Linear Polarization Test
Results Obtained for Soil Sample No. 96
Corrosion Rates
Corrosion Indices According to AWWA
Corrosion Rates According to PACE
Values of LChl
Values of LResdir
Values of LRessat
Values of LCorr
Residual Characteristics: pHdir
Residual Characteristics: pHsat
Residual Characteristics: Reddir
Residual Characteristics: Redsat
Residual Characteristics: LResdir
Residual Characteristics: LRessat
Residual Characteristics: LChl
Possible 1, 2, 3, and 4-Variable Models
Possible Models with Corresponding SSE and DOF Values
Critical Values for F
Information about Possible Models
Correlation Matrix
Dummy Variables for SoilType: 2 Variable Case
Dummy Variables for SoilType: 3 Variable Case
Relationship Between α, β, and Power
Relationship Between n and Power
Values of L for α = 0.5

NOTATION AND ABBREVIATIONS

Type I error
Type II error
Point of intersection of a line with y = 0
Slope of a line
Overvoltage
Current density
Current density at the anode
Current density at the cathode
Corrosion current density
Exchange current density
Mean
Resistivity
Summation
Standard deviation
Potential
Nernst potential
Standard Nernst potential
Corrosion potential
Oxidation potential
Reduction potential
Benchmark model in a significance test
Model being tested in a significance test
Ohm
Degrees Celsius
Activity
Amperes

ANOVA      Analysis of Variance
a_ox       Activity of species being oxidized
a_red      Activity of species being reduced
atm        Atmosphere
AWWA       American Water Works Association
C          Coulomb
cal        Calories
CD         Cook's distance
Chloride   Variable representing chloride content of soil
cm         Centimeter
CorrRate   Variable representing the corrosion rate of a metal in a soil
Cp         Mallows' number
dec        Decade
deg        Degrees
DOF        Degrees of freedom
Drainage   Variable representing the drainage ability of a soil

Correlation coefficient
Residual
Electron
Equivalent
Faraday's Constant
Effect size
Current
Corrosion Current
Joule
Kelvin
Number of variables in the R-model
Mass transfer coefficient
Kilogram
Liters

LChl          Variable representing the logarithm of the chloride concentration of a soil
LCorr         Variable representing the logarithm of the corrosion rate of a metal in a soil
LResdir       Variable representing the logarithm of the resistivity of a soil, when measured as received in the laboratory
LRessat       Variable representing the logarithm of the resistivity of a soil, when measured after saturation with distilled water
M             Metal
mm            Millimeter
M^n+          Metal ion
Moisture      Variable representing the moisture content of a soil
mV            Millivolt
N             Number of observations
N             Solution normality
P             Number of variables in a model
pHdir         Variable representing the pH of a soil, when measured as received in the laboratory
pHsat         Variable representing the pH of a soil, when measured after saturation with distilled water
P_O2          Partial pressure of oxygen
ppm           Parts per million
PRESS         Predicted residual sum of squares
PROC          Procedure statement in SAS
R²_adjusted   Adjusted partial correlation
Reddir        Variable representing the reduction potential of a soil, when measured as received in the laboratory
Redox         Oxidation-reduction
Redsat        Variable representing the reduction potential of a soil, when measured after saturation with distilled water
Resdir        Variable representing the resistivity of a soil, when measured as received in the laboratory
Ressat        Variable representing the resistivity of a soil, when measured after saturation with distilled water
R_p           Polarization resistance
Sand/Clay     A soil composed of a mixture of sand and clay particles

SAS       Statistical Analysis System
SSE       Error sum of squares
SulfHCl   Variable representing the result of the sulfide content test using HCl and Lead Acetate Paper
SulfI     Variable representing the result of the sulfide content test using the iodine solution
T         Temperature
V         Volt
VIF       Variance Inflation Factor
wt. %     Weight percentage
X̄         Mean of X
X_d       Dummy variable
X         Independent variable
Y         Dependent variable
Ȳ         Mean of Y
yr        Year

ACKNOWLEDGMENTS

First and foremost, I would like to express my gratitude to my supervisor, Prof. Saeed M. Mirza, whose unending guidance and encouragement proved invaluable in the successful realization of this research program.

I am also deeply indebted to Mr. Nourredine Kadoum of COREXCO, Montreal, for suggesting the research topic, and for devoting considerable attention to its progress. Furthermore, I am very grateful to Mr. Gérard Benchétrit and to COPEXCO, Montreal for the unrestricted access to equipment and materials, without which this project would not have been possible.

Finally, I would like to thank my family and friends for their support and encouragement.

The research project was supported by the Natural Sciences and Engineering Research Council's PGS-A Scholarship held by the author.

CHAPTER 1:

INTRODUCTION

The corrosion of underground infrastructure is a very widespread problem. Structures such as water mains, natural gas pipelines, and gasoline storage containers are only some of the many structures affected by soil corrosion all around the world. When a natural gas pipeline or a gasoline storage container fails, there is a high danger of fire and subsequent explosion. Furthermore, the environmental damage caused by such failures is often devastating and irreparable. Failure of water mains can be equally disruptive, as Canadians depend on drinking water for domestic, industrial and fire fighting purposes. The physical integrity of the water distribution system is an essential component for the health and economic well being of Canadians.

Every year, $200 million is spent on renewing iron water mains in Canada. The majority of the problems occur on water mains made of cast or ductile iron, which account for 70% of the water mains. The fundamental cause of the deterioration of the pipes is soil corrosion [1]. There is therefore a great need to determine the causes of soil corrosion, and to establish a quick and easy method of evaluating the corrosivity of soils.

There has been much research done in the field of corrosion and, in particular, soil corrosion. Certain standards are now in use by the industry to determine the extent to which a soil is considered corrosive. Standards such as that of the American Water Works Association (AWWA C105) and PACE 82-3 are widely used to determine whether or not a metal subjected to a given soil will suffer deterioration. In all of these standards, certain soil characteristics are measured and a standard grid allows the technician to calculate a corrosivity index for the soil. The term grid refers to the established method of calculating the corrosivity index, and is composed of the test results in combination with the appropriate points allocated to each. However, none of the standards take into account the chloride ion content of the soil. It has been argued that chloride ion content is measured indirectly through the measurement of the soil resistivity, which is incorporated in some form in all the grids. This variable accounts for the total ion content responsible for the conductive nature of the soil. However, chloride ions have a dual role in the corrosion process. They not only promote corrosion because they are conductive by nature, but they also inhibit passivity of the metal, i.e. they inhibit the formation of an oxide layer on the metal surface which protects the metal from corrosion [2]. For this reason, it is suspected that the measurement of chloride ion concentration will permit prediction of the corrosivity of a soil more accurately than is possible without the knowledge of this parameter. It is the main goal of this research program to determine whether the chloride ion concentration can provide the information that the variables already being tested in the standards do not provide. If the answer is affirmative, then this variable can be recommended for incorporation into the existing grids, or a new grid can be created to adequately account for the soil chloride ion content.
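The idea of a grid, as described above, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the test names, band limits, and point values in `HYPOTHETICAL_GRID` are invented for the example and are not the values published in AWWA C105 or PACE 82-3, and the function names are hypothetical.

```python
# Hypothetical point grid for illustration only. These bands and points are
# NOT the values published in AWWA C105 or PACE 82-3.
HYPOTHETICAL_GRID = {
    # test name: list of (upper bound of band, points awarded), ascending
    "resistivity_ohm_cm": [(1000.0, 10), (2000.0, 5), (float("inf"), 0)],
    "ph": [(4.0, 3), (8.5, 0), (float("inf"), 3)],
    "redox_mv": [(0.0, 5), (100.0, 3), (float("inf"), 0)],
}


def points_for(value, bands):
    """Return the points of the first band whose upper bound exceeds value."""
    for upper_bound, points in bands:
        if value < upper_bound:
            return points
    raise ValueError(f"value {value!r} does not fall in any band")


def corrosivity_index(measurements, grid=HYPOTHETICAL_GRID):
    """Sum the points allocated to each test result in the grid."""
    return sum(points_for(measurements[test], bands) for test, bands in grid.items())


# Made-up soil sample, not a measurement from the thesis.
sample = {"resistivity_ohm_cm": 800.0, "ph": 7.0, "redox_mv": 50.0}
index = corrosivity_index(sample)  # 10 + 0 + 3 points under the hypothetical grid
```

Adding chloride ion content to such a grid would amount to adding one more entry to the dictionary, which is why the question of whether chloride carries information beyond resistivity matters.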

The linear polarization test (an accelerated electrochemical test which can be used to evaluate the corrosion rate) will be used to determine the soil corrosivity. This method has been used extensively in the examination of steel corrosion in reinforced concrete [3,4,5,6], and has recently been used in the investigation of soil corrosion [7]. In this project, the variable obtained using the method of linear polarization is considered the "true" corrosion rate of the pipe in the given soil, and it will be compared with the other soil characteristics. The following soil characteristics are measured: soil type, drainage ability, pH, oxidation-reduction potential, sulfide content, resistivity, and chloride ion content. The above variables are analyzed using the statistical package SAS, and the relationship between the soil characteristics and the "true" corrosion rate will enable the analyst to determine the extent to which each soil characteristic predicts the actual corrosion rate [8].
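As a rough sketch of how a linear polarization measurement is commonly turned into a corrosion rate, the Python below applies the Stern-Geary relation and Faraday's law. It is an assumption that the thesis follows this exact form; the Tafel slopes of 0.12 V/decade, the polarization resistance, and all function names are placeholders chosen for illustration, and the material constants are standard values for iron.

```python
FARADAY_C_PER_MOL = 96485.0           # Faraday's constant
SECONDS_PER_YEAR = 365.0 * 24 * 3600


def stern_geary_constant(beta_a, beta_c):
    """Stern-Geary constant B (V) from anodic and cathodic Tafel slopes (V/decade)."""
    return (beta_a * beta_c) / (2.303 * (beta_a + beta_c))


def corrosion_current_density(r_p, beta_a=0.12, beta_c=0.12):
    """i_corr (A/cm^2) from polarization resistance r_p (ohm*cm^2).

    The default Tafel slopes of 0.12 V/decade are an assumed placeholder.
    """
    return stern_geary_constant(beta_a, beta_c) / r_p


def penetration_rate_mm_per_year(i_corr, molar_mass=55.85, electrons=2, density=7.87):
    """Faraday's-law conversion of i_corr (A/cm^2) to mm/year.

    Defaults describe iron oxidizing to Fe2+ (g/mol, electrons per atom, g/cm^3).
    """
    cm_per_second = i_corr * molar_mass / (electrons * FARADAY_C_PER_MOL * density)
    return cm_per_second * SECONDS_PER_YEAR * 10.0  # 10 mm per cm


# Hypothetical polarization resistance, not a thesis measurement.
i_corr = corrosion_current_density(r_p=26000.0)
rate = penetration_rate_mm_per_year(i_corr)
```

The slope of the potential-current curve near the corrosion potential supplies the polarization resistance, so a single short scan is enough to estimate the rate.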

The objectives of this project are the following:

• To study the method of linear polarization (applications and limitations), and to determine the extent to which it can be used in the field of soil corrosion.

• To become familiar with the AWWA and PACE standards for soil testing, and to outline the limitations and advantages of each standard.

• To study the relationship between the soil characteristics and the corrosion rate of the soil, and to determine which variables play the most important role in the corrosion process. What is the role of the variables which are expected to be the most influential? What is the importance of the chloride ion content of the soil in predicting the corrosion rate?

• To determine whether the chloride ion concentration provides information that the variables already being tested in the standards do not provide and, if so, to suggest that this variable be incorporated into the existing grids or that a new grid be created to include this variable.
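One common way to ask whether a variable adds information beyond those already in a model is a partial F-test that compares the error sum of squares (SSE) of a model without the variable against one with it. The sketch below assumes that approach for illustration; it is not a reproduction of the SAS analysis, the data are synthetic, and the function names are hypothetical.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (aug[i][n] - tail) / aug[i][i]
    return solution


def sse_of_fit(predictors, y):
    """Least-squares fit of y on the predictor columns plus an intercept; return SSE."""
    rows = [[1.0] + [col[i] for col in predictors] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[j] * r[k] for r in rows) for k in range(p)] for j in range(p)]
    xty = [sum(r[j] * yi for r, yi in zip(rows, y)) for j in range(p)]
    beta = solve_linear_system(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))


def partial_f(sse_reduced, sse_full, dof_full, extra_terms=1):
    """F statistic for the terms present in the full model but not the reduced one."""
    return ((sse_reduced - sse_full) / extra_terms) / (sse_full / dof_full)


# Synthetic values for illustration only, not thesis data.
log_resistivity = [3.9, 3.5, 3.2, 3.0, 2.8, 2.6, 2.4, 2.2]
log_chloride = [0.8, 1.5, 1.1, 1.9, 1.6, 2.4, 2.1, 2.8]
log_corr_rate = [0.35, 0.62, 0.70, 0.95, 0.98, 1.25, 1.30, 1.52]

sse_reduced = sse_of_fit([log_resistivity], log_corr_rate)
sse_full = sse_of_fit([log_resistivity, log_chloride], log_corr_rate)
dof_full = len(log_corr_rate) - 3  # observations minus fitted coefficients
f_stat = partial_f(sse_reduced, sse_full, dof_full)
```

The statistic would then be compared with a tabulated critical value of F for (1, `dof_full`) degrees of freedom at the chosen significance level.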

The report is divided into two main sections: the measurement of the soil characteristics, and the analysis of the collected data using SAS. Chapter 2 introduces the basic phenomena underlying the corrosion process, and provides the background information essential to understand the variables being studied and their role in the corrosion process (Chapter 2: Corrosion and Corrosion Testing). In Chapter 3: Procedures and Apparatus, the methods and equipment used to measure the various soil characteristics are presented. The statistical analysis of the experimental data obtained in Chapter 3 is presented in Chapter 4: Analysis of Experimental Results, and the results are discussed. Finally, conclusions and recommendations for future work are made in Chapter 5: Conclusions and Recommendations.

CHAPTER 2: CORROSION AND

CORROSION TESTING

To fully understand the factors that contribute to corrosion in a particular environment, a thorough knowledge of the various corrosion mechanisms is essential. A sound knowledge of the basic principles will allow the corrosion engineer to predict the aggressiveness of a given environment, to alter the environment to decrease its corrosivity to a particular material, to protect the materials from corrosion, or to choose materials which will not be affected by the existing aggressive environment.

The basics of electrochemistry with respect to corrosion of metals in aqueous media are briefly reviewed, along with the information deemed essential to understanding the variables selected for this study, and their role in the corrosion process. The second section of this chapter introduces the reader to the method of linear polarization, and examines the principles underlying the determination of the corrosion rate. The following section introduces the causes and effects of soil corrosion, and the final section discusses the AWWA and PACE standards currently being used by the industry to determine the corrosivity of soils.

2.1 Principles of Electrochemical Corrosion

2.1.1 Necessary Elements for Corrosion

Corrosion can take various forms, and can occur under different circumstances. However, there are certain constants in all corrosion processes. Four elements must be present for corrosion to occur: an anode, a cathode, an electrical conductor, and an ionic conductor [2,7,9].

The anode consists of a metal (Fe, Cu, etc.) which is oxidized in the presence of an oxidizing agent, or a corrodent. The metal, denoted by M, undergoes the following reaction:

Oxidation of metal M (anodic reaction): $$\mathrm{M} \rightarrow \mathrm{M}^{n+} + n\,e^{-} \qquad (2.1)$$

It is the anode that undergoes damage. The metal M dissolves, releasing ions ($\mathrm{M}^{n+}$) and n electrons. Some examples of metal oxidations are:
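Illustrative instances, assumed here by reversing electrode reactions listed in Table 2.1 rather than taken from the author's own list, include:

$$\mathrm{Fe} \rightarrow \mathrm{Fe}^{2+} + 2\,e^{-}$$

$$\mathrm{Zn} \rightarrow \mathrm{Zn}^{2+} + 2\,e^{-}$$

$$\mathrm{Al} \rightarrow \mathrm{Al}^{3+} + 3\,e^{-}$$

Each is equation (2.1) with a specific metal and a specific value of n.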

The cathode can consist of a metal, or a solution rich in oxygen or hydrogen ions. While the anode is undergoing oxidation, the cathode is undergoing reduction. During reduction, the cathode or corrodent is consuming the electrons released by the oxidation of the metal. The two corrodents that are of major importance are the acidic solution, and the neutral aerated water (e.g. rainwater or seawater) [2,7,9]. The reduction equations are as follows:

Acid solution (reduction of H⁺): $$2\,\mathrm{H}^{+} + 2\,e^{-} \rightarrow \mathrm{H}_2 \qquad (2.3a)$$

Neutral aerated water (reduction of O₂): $$\tfrac{1}{2}\,\mathrm{O}_2 + \mathrm{H}_2\mathrm{O} + 2\,e^{-} \rightarrow 2\,\mathrm{OH}^{-} \qquad (2.3b)$$

The complete corrosion equation is obtained by combining the equation of the oxidation of metal M with one of the above reduction equations. In an acidic environment, the complete equation becomes:

A product of this reaction is hydrogen gas, which can often cause problems such as hydrogen blistering, or hydrogen embrittlement of metals [2,9].
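As a worked sketch of that combination (the general form below is derived here from (2.1) and (2.3a), and is not necessarily the form or numbering used by the author), multiplying (2.3a) by n/2 so that the electrons cancel against (2.1) gives:

$$\mathrm{M} + n\,\mathrm{H}^{+} \rightarrow \mathrm{M}^{n+} + \tfrac{n}{2}\,\mathrm{H}_2$$

For iron, assuming n = 2, this reads:

$$\mathrm{Fe} + 2\,\mathrm{H}^{+} \rightarrow \mathrm{Fe}^{2+} + \mathrm{H}_2$$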

In neutral aerated water, the complete reaction is as follows:

The term 1/2 O₂ in the above equation refers to the dissolved oxygen present in the water. Furthermore, the products of the above reaction often combine to form a precipitate:

If the metal M represents iron (Fe), then Fe(OH)₂, or rust, is precipitated when oxygen is the corrodent.
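The same balancing can be sketched for neutral aerated water. Again these are forms derived here from (2.1) and (2.3b) under the assumption that the metal releases n electrons, not necessarily the author's own equations. Multiplying (2.3b) by n/2 and adding (2.1) gives:

$$\mathrm{M} + \tfrac{n}{4}\,\mathrm{O}_2 + \tfrac{n}{2}\,\mathrm{H}_2\mathrm{O} \rightarrow \mathrm{M}^{n+} + n\,\mathrm{OH}^{-}$$

and the products can then combine as:

$$\mathrm{M}^{n+} + n\,\mathrm{OH}^{-} \rightarrow \mathrm{M(OH)}_n$$

For iron with n = 2, the two steps read:

$$\mathrm{Fe} + \tfrac{1}{2}\,\mathrm{O}_2 + \mathrm{H}_2\mathrm{O} \rightarrow \mathrm{Fe}^{2+} + 2\,\mathrm{OH}^{-} \rightarrow \mathrm{Fe(OH)}_2$$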

Another element essential for corrosion to occur is an electrical conductor, which allows electrons to move from the anode, where they are released, to the cathode, where they are consumed [2,7,9]. If this movement of electrons cannot proceed, then the reduction reaction would stop. Furthermore, the anode would now be negatively charged due to the presence of the electrons released, and this disequilibrium would stop any further oxidation and release of electrons [2,7,9].

In the case when a piece of metal is the site of both the anodic and cathodic reactions, or when the two sites are located on separate pieces of metal which are in physical and electrical contact with one another, then the metal itself is the electrical conductor. However, if the two sites are found on separate pieces of metal, then any metal wire connecting the two will act as the electrical conductor through which the electrons will move [2,7,9].

The last essential element in the corrosion process is an ionic conductor, or the electrolyte. The electrolyte, which is the aqueous solution in contact with both the anode and the cathode, allows the movement of ions from the anode to the cathode thus ensuring electrical neutrality and allowing the corrosion process to continue [2,7,9].

In summary, the four essential elements to the corrosion process are the anode, the cathode, the electrical conductor, and the ionic conductor. The anode is the site where damage occurs as the metal is oxidized and electrons and ions are released. The electrons travel from the anode to the cathode via the electrical conductor, which is usually the metal itself, or a metal wire connecting the two sites. The cathode is the site where electrons are consumed while oxygen or hydrogen are reduced. As the ions move from the anode to the cathode via an ionic conductor, which is an aqueous solution simultaneously in contact with the anode and the cathode, electrical neutrality is established. Corrosion cannot occur unless all of these four elements are present.

2.1.2 Physical Forms of Corrosion

Corrosion can take various forms. The most common forms of corrosion are the following [2,9]:

• Uniform attack
• Galvanic attack
• Crevice corrosion
• Pitting corrosion
• Erosion corrosion
• Selective leaching
• Stress corrosion

2.1.2.1 Uniform Attack

Uniform attack is the most common form of corrosion, making up 80-90% of the cases in practice [2,9]. It is normally characterized by a reaction which proceeds uniformly over the entire surface of the metal. All points on the surface corrode at a similar rate because every point acts alternately as an anode and a cathode. There is no fixed point acting as the anode, and therefore no fixed point of deterioration. This form of corrosion is the easiest to predict, and can be prevented or slowed down most easily.

2.1.2.2 Galvanic Attack

Galvanic attack occurs when two different metals are placed in electrical contact in a corrosive environment [2,9,10]. If the two metals were not in contact with one another, they would each corrode at their own rate. However, when they are placed in electrical contact, the more anodic of the two metals suffers accelerated corrosion (anodic reaction) while the corrosion rate of the more cathodic metal decreases.

In order to determine which of the two metals will corrode, the electromotive series, which is an ordered list of elements accompanied by their reduction potentials, is consulted. Table 2.1 is a reproduction of this series.

Electrode Reaction          Standard Potential φ° (in volts) at 25°C
Au³⁺ + 3e⁻ = Au             1.50
Pt²⁺ + 2e⁻ = Pt             ca. 1.2
Pd²⁺ + 2e⁻ = Pd             0.987
Hg²⁺ + 2e⁻ = Hg             0.854
Ag⁺ + e⁻ = Ag               0.800
Hg₂²⁺ + 2e⁻ = 2Hg           0.789
Cu⁺ + e⁻ = Cu               0.521
Cu²⁺ + 2e⁻ = Cu             0.337
2H⁺ + 2e⁻ = H₂              0.000
Pb²⁺ + 2e⁻ = Pb             -0.126
Sn²⁺ + 2e⁻ = Sn             -0.136
Mo³⁺ + 3e⁻ = Mo             ca. -0.2
Ni²⁺ + 2e⁻ = Ni             -0.250
Co²⁺ + 2e⁻ = Co             -0.277
Tl⁺ + e⁻ = Tl               -0.336
In³⁺ + 3e⁻ = In             -0.342
Cd²⁺ + 2e⁻ = Cd             -0.403
Fe²⁺ + 2e⁻ = Fe             -0.440
Ga³⁺ + 3e⁻ = Ga             -0.53
Cr³⁺ + 3e⁻ = Cr             -0.74
Cr²⁺ + 2e⁻ = Cr             -0.91
Zn²⁺ + 2e⁻ = Zn             -0.763
Nb³⁺ + 3e⁻ = Nb             ca. -1.1
Mn²⁺ + 2e⁻ = Mn             -1.18
Zr⁴⁺ + 4e⁻ = Zr             -1.53
Ti²⁺ + 2e⁻ = Ti             -1.63
Al³⁺ + 3e⁻ = Al             -1.66
Hf⁴⁺ + 4e⁻ = Hf             -1.70
U³⁺ + 3e⁻ = U               -1.80
Be²⁺ + 2e⁻ = Be             -1.85
Mg²⁺ + 2e⁻ = Mg             -2.37
Na⁺ + e⁻ = Na               -2.71

Table 2.1 Electromotive Series [2]
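Consulting the series can be sketched in Python as a lookup and comparison: of two coupled metals, the one with the lower reduction potential is taken as the anode. The potentials below are copied from Table 2.1 for a few metals; the function name `galvanic_couple` is hypothetical, and the sketch deliberately ignores alloying and protective films.

```python
# Standard reduction potentials (volts, 25 °C) copied from Table 2.1.
STANDARD_POTENTIAL_V = {
    "Au": 1.50,
    "Ag": 0.800,
    "Cu": 0.337,
    "Pb": -0.126,
    "Ni": -0.250,
    "Fe": -0.440,
    "Zn": -0.763,
    "Mg": -2.37,
}


def galvanic_couple(metal_a, metal_b, table=STANDARD_POTENTIAL_V):
    """Return (anode, cathode, potential difference in volts) for two coupled metals.

    The metal with the lower reduction potential is the more anodic one.
    """
    anode, cathode = sorted((metal_a, metal_b), key=table.__getitem__)
    return anode, cathode, table[cathode] - table[anode]


# Iron coupled to copper: iron is the more anodic metal and corrodes.
anode, cathode, difference = galvanic_couple("Fe", "Cu")
```

The potential difference only ranks the tendency to corrode under standard conditions; it says nothing about the rate.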


A shortcoming of the electromotive series is that it fails to take into account any alloying, or the effect of the formation of protective films which occur in the various environments. A more practical alternative to the electromotive series is the galvanic series, which is specific to a given environment. Table 2.2 indicates the galvanic series for seawater.

Active (read down)
Magnesium
Magnesium alloys
Zinc
Aluminum 5052H
Aluminum 3004
Aluminum 3003
Aluminum 1100
Aluminum 6053T
Alclad
Cadmium
Aluminum 2017T
Aluminum 2024T
Mild steel
Wrought iron
Cast iron
Ni-Resist
13% Chromium stainless steel, type 410 (active)
50-50 lead-tin solder
18-8 stainless steel, type 305 (active)
18-8, 3% Mo stainless steel, type 316 (active)
Lead
Tin
Muntz metal
Manganese bronze
Naval brass
Nickel (active)
76% Ni-16% Cr-7% Fe (Inconel 600) (active)
Yellow brass
Aluminum bronze
Red brass
Copper
Silicon bronze
5% Zn-20% Ni, Bal. Cu (Ambrac)
70% Cu-30% Ni
88% Cu-2% Zn-10% Sn (composition G-bronze)
88% Cu-3% Zn-6.5% Sn-1.5% Pb (composition M-bronze)
Nickel (passive)
76% Ni-16% Cr-7% Fe (Inconel 600) (passive)
70% Ni-30% Cu (Monel)
Titanium
18-8 stainless steel, type 305 (passive)
18-8, 3% Mo stainless steel, type 316 (passive)
Noble (read up)

Table 2.2 Galvanic Series for Seawater [2]

2.1.2.3 Crevice Corrosion

Crevice corrosion is highly localized, and reflects the site at which it occurs. As the name implies, corrosion occurs at crevices (openings of about 1 mm), or at points of contact between two surfaces [2,9]. The opening is sufficient to allow the corrodent to enter, but not large enough to allow the corrodent to flow. Corrosion occurs in two stages, which are illustrated in Figures 2.1a and 2.1b.


Figure 2.1 The Two Stages of Crevice Corrosion

In stage 1, uniform attack occurs in the crevice. However, after some time the stagnant water is depleted of dissolved oxygen, and stage 2 begins. Within the crevice, the reduction reaction cannot proceed because the dissolved oxygen is depleted. However, the oxidation of the metal continues. The electrons released in the crevice travel through the metal to a site outside the crevice where dissolved oxygen is present. The result is that the crevice continuously acts as the anode and suffers corrosion, while the remaining metal acts as the cathode and suffers no further damage.

The danger associated with this form of corrosion is that it is unpredictable, and that the damage proceeds undetected because its location is well hidden. Furthermore, the rate at which the crevice metal deteriorates is quite high when the crevice area is small with respect to the surface area in contact with the corrodent. This occurs because the crevice metal (anode) must produce electrons at a rate that satisfies the demand of the entire cathodic area.
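The area effect described above can be illustrated with a minimal sketch. It assumes only that the total anodic current must equal the total cathodic current; the function name and all numerical values are hypothetical and are not taken from the thesis.

```python
def anodic_current_density(cathodic_current_density, cathode_area, anode_area):
    """Current density the anode must supply so that the electrons it releases
    match those consumed over the whole cathode (total currents balance)."""
    total_current = cathodic_current_density * cathode_area  # A
    return total_current / anode_area                        # A/cm^2

# Hypothetical numbers: the same cathodic demand (1e-6 A/cm^2 over 1000 cm^2)
# served by crevices of decreasing area.
for anode_area in (100.0, 10.0, 1.0):
    i_a = anodic_current_density(1e-6, 1000.0, anode_area)
    print(f"crevice area {anode_area:6.1f} cm^2 -> anodic current density {i_a:.1e} A/cm^2")
```

Under this assumption, shrinking the crevice area by a factor of ten raises the anodic current density, and hence the local rate of attack, by the same factor.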

2.1.2.4 Pitting Corrosion

Pitting corrosion is a highly localized form of corrosion. It generally starts on horizontal surfaces which can hold water under gravity, and at a surface discontinuity (scratch or dent), and grows downward. As in crevice corrosion, two local sites are involved [2,9]. The stagnant water within the pit is depleted of oxygen, and the tip of the pit becomes the anodic site. Electrons move through the metal to the surface of the metal which is in contact with aerated water (cathodic site), enabling the reduction reaction to proceed.

Figure 2.2 Pitting Corrosion

Pitting is one of the most destructive forms of corrosion. It can cause equipment to fail because of perforation, and it can be extremely dangerous when it occurs on vessels whose contents are under pressure. Furthermore, it can be difficult to detect because the corrosion products often cover the pits, which continue to grow undetected. Figure 2.2 illustrates schematically a metal undergoing pitting corrosion.

2.1.2.5 Erosion Corrosion

Erosion corrosion is normally associated with moving slurries [2,9]. Solids in the slurry erode (or scrape off) the protective oxide layers which form on metal surfaces. These protective surface films provide metals such as aluminum, lead, and stainless steel with their ability to resist corrosive environments. Corrosion occurs in the areas where the protective layer has been scraped off. The exposed metal is anodic to the metal protected by the surface film and, therefore, suffers corrosion as shown in Figure 2.3. This form of corrosion is usually accompanied by surface striations, i.e. grooves following a distinct direction.

Figure 2.3 Erosion Corrosion

2.1.2.6 Selective Leaching

Selective leaching is the removal of one element from a solid alloy. It occurs when an alloy is composed of two elements far apart from one another in the electrochemical series. The more anodic of the two metals will be the anode and will suffer accelerated corrosion, leaving behind the more cathodic metal [2,9].

An example of a metal subject to selective leaching is brass, which is made up of copper and zinc. Zinc, the more anodic metal, is "leached out" and the resulting material is a porous copper matrix.

Another example of selective leaching is the well known phenomenon of graphitization of gray cast iron. Gray cast iron is composed of a network of graphite within a matrix of iron or steel. Figure 2.4 shows the microstructure of gray cast iron. The graphite is in the form of flakes connected in such a way that the material is able to hold its shape as the iron dissolves [2,9,12,13]. This dissolution occurs because graphite is cathodic to iron and a galvanic cell develops. Iron dissolves, leaving behind a porous mass consisting of graphite, voids and rust, which can be easily cut with a knife. In contrast, the graphite in ductile or malleable irons is in the shape of nodules or spheres, and a porous matrix cannot form. As such, these materials are not subject to graphitization.

Figure 2.4 Microstructure of Gray Cast Iron

2.1.2.7 Stress Corrosion

Stress corrosion is the result of the combined effect of a weak applied or residual tensile stress and a weak corrodent [2,9]. Each of these two components alone would not be problematic, but together they accelerate the rate of corrosion. It has been observed that, in most cases, no corrosion would occur when a metal subjected to a weak corrodent is not subjected simultaneously to a tensile stress. Stresses from 5-70% of the yield stress are sufficient to cause severe damage [2,9]. Another point of interest is that the corrodent is metal specific, i.e. not all corrodents will affect all metals. For example, a weak chloride solution will cause severe damage to stainless steels, but will not affect plain carbon steel at all. In addition, a weak nitrate solution will damage plain carbon steels, but will not affect stainless steel at all [2,9].

Like pitting corrosion, the crack starts at a surface pit or scratch, and moves downward. The crack follows an anodic path. One example of an anodic path is that of zinc in brass, which is an alloy of zinc and copper. An anodic path can also be created when an element of an alloy precipitates at either the grain boundary or within the grain itself, leaving one of the two areas anodic to the other. When the grain boundary is anodic to the grain, the crack is said to be intercrystalline [2,9]. When the grain itself is anodic to the boundary, the crack is said to be transcrystalline [2,9]. Figures 2.5a and 2.5b illustrate the difference between the two types of cracks.

When cracks begin to form, the reduced cross-sectional areas are unable to withstand the design loads. Furthermore, solid corrosion products which often accompany the corrosion process cause additional stresses by their expansive nature. As the cracks grow under the combined action of corrosion and stress, the tensile stress in the uncracked section grows exponentially and can lead to sudden unexpected failures [2,9].

Figure 2.5a Intercrystalline Crack        Figure 2.5b Transcrystalline Crack

2.1.3 Why do metals corrode?

The electrochemical series indicates whether a metal is more anodic compared to another, and it can provide the potential $\phi$ of a reaction. But what does this potential represent, and why does a metal corrode in the first place?

Corrosion of a metal occurs because of the element's tendency to attain the natural state, which is the ionic form. The metallic form of most elements is unstable, and there is a potential for these metals to be oxidized:

$$\underset{\text{unstable}}{M} \rightarrow \underset{\text{natural state, stable ore}}{M^{n+}} + ne^- \qquad \text{(oxidation)} \tag{2.1}$$

This potential can be compared to the potential energy of a sphere when held at an elevated position [9]. As seen in Figure 2.6, at position 1 the sphere equilibrium is unstable and it possesses potential energy. Some of this energy will be used up as the ball moves to position 2, a point of lower potential energy. This is the spontaneous direction for this particular system. Movement from position 2 to position 1 would not occur spontaneously in nature. Energy from an external source must be provided for such a movement to occur.

Figure 2.6 Stable and Unstable Positions

Similarly, electrochemical reactions are accompanied by a potential $\phi$, indicating the potential for the reaction to proceed spontaneously. It must be noted that the potential $\phi$ is not an absolute value, but a relative one. Potentials of reactions are always measured with respect to a standard. The standard most often used is the reduction of hydrogen ions:

$$2H^+ + 2e^- \rightarrow H_2 \qquad \text{where } \phi = 0.000\ \text{V} \tag{2.3a}$$

By convention, the value of the potential of this equation is chosen to be equal to zero volts, and the potentials of other reactions are measured against this standard. Another convention adopted is the use of reduction potentials, $\phi_{red}$, instead of oxidation potentials, $\phi_{ox}$, in tables such as those for the electrochemical and galvanic series. To obtain the potential of a particular oxidation reaction, the value of $\phi_{red}$ is simply multiplied by -1. For example:


The potentials listed in Tables 2.1 and 2.2 are termed half-cell potentials, because they accompany only half of the overall reaction. A complete reaction is made up of two reaction halves. One reaction-half is a reduction reaction, and the other is an oxidation reaction. For example, for the following two reaction halves:

It is useful to determine which of the two reaction-halves will be reversed such that the potential of the entire system will be non-negative, i.e. proceed spontaneously. It is easily noted that if Equation 2.7a is reversed, the total potential of the system will be equal to $\phi_{ox} + \phi_{red}$ = -(-0.440 V) + 0.000 V = 0.440 V, which is positive. The system will spontaneously behave according to the following equation:

When the two reaction halves are combined, the one with the smaller reduction potential will be reversed and the element will undergo oxidation.
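The rule stated above can be sketched in a few lines of Python. This is only an illustration: the dictionary holds a handful of reduction potentials copied from Table 2.1, and the function name `spontaneous_cell` and the couple labels are hypothetical.

```python
# Standard reduction potentials (V) taken from Table 2.1.
REDUCTION_POTENTIALS = {
    "Cu2+/Cu": 0.337,
    "H+/H2": 0.000,
    "Fe2+/Fe": -0.440,
    "Zn2+/Zn": -0.763,
}

def spontaneous_cell(couple_a, couple_b, table=REDUCTION_POTENTIALS):
    """Return (oxidized couple, reduced couple, cell potential in volts)."""
    # The couple with the smaller reduction potential is reversed (oxidized).
    anode, cathode = sorted((couple_a, couple_b), key=lambda c: table[c])
    phi_ox = -table[anode]    # oxidation potential = -1 * reduction potential
    phi_red = table[cathode]
    return anode, cathode, phi_ox + phi_red

# Iron in an acidic solution: Fe2+/Fe is reversed, giving about 0.440 V.
print(spontaneous_cell("Fe2+/Fe", "H+/H2"))
```

Because the couple with the smaller reduction potential is always the one reversed, the returned cell potential is never negative, which matches the spontaneity criterion used in the text.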

Returning to the two most commonly encountered corrodents, the acidic solution and neutral aerated water, it is evident from the electrochemical series that the reduction potentials of both hydrogen ion reduction and oxygen reduction are higher than those of most metals of interest to engineers:

The combination of one of the above corrodents with a metal whose reduction potential is lower than that of the corrodent will result in the oxidation, or corrosion, of that element.

2.1.4 Determining the Rate of Corrosion

Examination of the electrochemical or galvanic series enables one to determine whether or not a metal will corrode in a given environment. But of more interest to the corrosion engineer is the determination of the rate at which this corrosion will proceed.

Corrosion rates are determined by studying the polarization behavior of the two reaction halves. As seen previously, the two reaction halves are the following:

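ANODIC REACTION:

$$M \rightarrow M^{n+} + ne^- \qquad \text{(oxidation of metal M)}$$

CATHODIC REACTION:

Acid solution: $$2H^+ + 2e^- \rightarrow H_2 \qquad \text{(reduction of } H^+\text{)}$$

OR

Neutral aerated water: $$\tfrac{1}{2}O_2 + H_2O + 2e^- \rightarrow 2OH^- \qquad \text{(reduction of } O_2\text{)}$$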

In order to fully understand the above corrosion system, an analogy will first be made with the copper/zinc battery [2,7,9,14,15]. As seen in Figure 2.7, a Cu/Zn battery is made up of a copper rod immersed in a solution of $Cu^{2+}$ ions (a solution of $CuSO_4$), and a Zn rod immersed in a solution of $Zn^{2+}$ ions (a solution of $ZnSO_4$). The two solutions are connected by a diaphragm which allows the passage of ions, ensuring electrical neutrality. From the electromotive series, it is observed that the reduction potential of Zn is lower than that of Cu, and therefore Zn will be anodic to Cu and suffer oxidation. The two resulting equations are:

$$Zn \rightarrow Zn^{2+} + 2e^- \qquad \text{Oxidation} \qquad \phi_{ox} = 0.76\ \text{V} \tag{2.9a}$$

$$Cu^{2+} + 2e^- \rightarrow Cu \qquad \text{Reduction} \qquad \phi_{red} = 0.34\ \text{V} \tag{2.9b}$$

Prior to electrical contact of the metal rods, the two separate systems are at equilibrium, and no corrosion is occurring. Once the two metals at different potentials are placed in electrical contact, the system will attempt to reach a point of equilibrium at a potential somewhere between $\phi_{Zn}$ and $\phi_{Cu}$. The driving force of ($\phi_{Cu} - \phi_{Zn}$) volts will cause Zn to be oxidized, and copper to be reduced, according to the above equations.

Figure 2.7 Schematic of a Cu/Zn Battery

When Zn is oxidized, electrons and $Zn^{2+}$ ions are released. The electrons travel through the wire to the surface of the copper rod, where they combine with the $Cu^{2+}$ ions from solution to form Cu. As the electrons travel through the wire, a current is registered by an ammeter.

In order to study the variation of the potential with current, the system is manipulated by varying the current permitted to flow through the wire, via various resistors. Figure 2.8 is obtained by plotting the potential of Cu and Zn versus the registered current [2,9].

Three distinct points on the diagram are of interest:

1. the open circuit at I = 0,

2. a point of restricted current flow, and

3. the short circuit at I = I_max.

Figure 2.8 $\phi$ vs. log I for Cu/Zn Battery

1. Open Circuit

The point on the diagram representing an open circuit is at I = 0, i.e. when no current flows. This represents the behavior of the system when electrical contact is not provided, and the two metals behave independently. It is observed that the potential of each of the metals is the Nernst potential, which is defined as [2,9]:

$$\phi_N = \phi_N^{o} + 2.303\,\frac{RT}{nF}\,\log\left[\frac{a_{ox}}{a_{red}}\right] \tag{2.10}$$

where $\phi_N$ = Nernst potential

$\phi_N^{o}$ = standard Nernst potential (equilibrium potential of metal in contact with its own ions, at unit activity)

R = gas constant (8.314 J/deg mole)

T = absolute temperature (K)

n = number of electrons transferred

F = Faraday's constant (96500 C/eq)

$a_{ox}$ = activity, or concentration, of oxidized species

$a_{red}$ = activity, or concentration, of reduced species

For the Cu/Zn battery, the Nernst potential is calculated for both the Zn and Cu electrodes. The derivation of the following equations can be found in Appendix A.

For the Zn electrode at 25 °C ($Zn \rightarrow Zn^{2+} + 2e^-$), Equation 2.10 becomes:

$$\phi_{N,Zn} = \phi_{N,Zn}^{o} + \frac{0.0592}{2}\,\log\,[Zn^{2+}] \tag{2.11}$$

For the Cu electrode at 25 °C ($Cu \rightarrow Cu^{2+} + 2e^-$), Equation 2.10 becomes:

$$\phi_{N,Cu} = \phi_{N,Cu}^{o} + \frac{0.0592}{2}\,\log\,[Cu^{2+}] \tag{2.12}$$
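As a numerical check of Equation 2.10 and its 25 °C forms, the following sketch evaluates the Nernst potential directly. The standard potentials come from Table 2.1; the function name, the default temperature of 298.15 K, and the 0.01 M ion concentrations are assumptions chosen for illustration only.

```python
import math

R = 8.314      # gas constant, J/(deg mole)
F = 96500.0    # Faraday's constant, C/eq

def nernst_potential(phi_standard, n, activity_ox, activity_red=1.0, temperature_k=298.15):
    """Equation 2.10: phi_N = phi_N0 + 2.303 RT/(nF) * log10(a_ox / a_red)."""
    slope = 2.303 * R * temperature_k / (n * F)   # about 0.0592/n volts at 25 C
    return phi_standard + slope * math.log10(activity_ox / activity_red)

# Standard potentials from Table 2.1; the ion concentrations are hypothetical.
phi_zn = nernst_potential(-0.763, n=2, activity_ox=0.01)
phi_cu = nernst_potential(0.337, n=2, activity_ox=0.01)
print(f"phi_N,Zn = {phi_zn:.3f} V, phi_N,Cu = {phi_cu:.3f} V")
```

At unit activity the logarithmic term vanishes and the function returns the standard potential, which is a convenient sanity check on the implementation.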

2. Point of Restricted Flow

Between I = 0 and I = I_max, the current is manipulated to flow at a predetermined rate. Using resistors, the current is allowed to vary and the potential of each metal is measured and plotted versus the current. From the diagram, the change in potential for each metal, termed the overvoltage $\eta$, can be calculated as [2,9]:

$$\eta_{Cu} = \phi_a - \phi_{N,Cu} \tag{2.13}$$

$$\eta_{Zn} = \phi_b - \phi_{N,Zn} \tag{2.14}$$

3. Short Circuit

The point of short circuit is the point at which the current is allowed to flow unrestrictedly, i.e. the resistance R = 0 and the current I = I_max. The current is governed only by the potential difference of the system. This situation is the free corrosion situation. The equilibrium potential is called the free corrosion potential, $\phi_{corr}$, and the current associated with $\phi_{corr}$ is the corrosion current, $I_{corr}$.

The above analogy can serve to better understand the two corrodents that are most commonly encountered by corrosion engineers:

• the reduction of $H^+$ ions in an acidic solution, and

• the reduction of $O_2$ in a neutral aerated solution.

When a metal M is placed in an acidic solution, the following reactions occur:

Since the reduction potential of $H^+$ is larger than the reduction potential of most metals of interest, the metal will undergo oxidation (anodic reaction). The following diagram will result:

Figure 2.9 Polarization Diagram for Corrosion in Acidic Solution

When the system is allowed to corrode freely, the potential $\phi_{corr}$ and the corrosion current $I_{corr}$ will apply. Furthermore, the Nernst potential of $H^+$ reduction becomes [2,9]:

$$\phi_{N,H^+} = -2.303\,\frac{RT}{F}\,\mathrm{pH}$$

It is very interesting to note that, in the case of corrosion in acidic environments, the Nernst potential depends only on the temperature and on the pH. The derivation of the above equation can be found in Appendix A.

When a metal M is placed in neutral aerated water, e.g. rainwater or seawater, the following reactions occur:

Since the reduction potential of $O_2$ is larger than the reduction potential of most metals of interest, the metal will undergo oxidation (anodic reaction). The following diagram will result:

Figure 2.10 Polarization Diagram for Corrosion in Neutral Aerated Water

When the system is allowed to corrode freely, the potential $\phi_{corr}$ and the corrosion current $I_{corr}$ will apply. Furthermore, the Nernst potential of $O_2$ reduction becomes [2,9]:

$$\phi_{N,O_2} = \phi_{N,O_2}^{o} + 2.303\,\frac{RT}{4F}\,\log\left\{\frac{P_{O_2}}{[OH^-]^4}\right\}$$

where $P_{O_2}$ = partial pressure of oxygen in the solution.

In the case of corrosion in neutral aerated water, the Nernst potential depends on the temperature, pH (or pOH), and the partial pressure of oxygen in the water.
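The dependence of the two cathodic Nernst potentials on pH and oxygen pressure can be made concrete with a short sketch. It relies on assumptions that are not stated in the text: a standard potential of 0.401 V for oxygen reduction to hydroxide, the relation [OH-] = 10^(pH - 14) valid near 25 °C, and hypothetical function names.

```python
import math

R, F = 8.314, 96500.0   # J/(deg mole), C/eq

def nernst_hydrogen(ph, temperature_k=298.15):
    """phi_N,H+ = -2.303 (RT/F) pH."""
    return -2.303 * R * temperature_k / F * ph

def nernst_oxygen(ph, p_o2, phi_standard=0.401, temperature_k=298.15):
    """phi_N,O2 = phi0 + 2.303 RT/(4F) * log10(P_O2 / [OH-]^4).

    phi_standard = 0.401 V and [OH-] = 10**(pH - 14) are assumptions of this sketch.
    """
    log_oh = ph - 14.0                              # log10 of [OH-]
    log_ratio = math.log10(p_o2) - 4.0 * log_oh     # log10(P_O2 / [OH-]^4)
    return phi_standard + 2.303 * R * temperature_k / (4 * F) * log_ratio

for ph in (4.0, 7.0):
    print(f"pH {ph}: phi_N,H+ = {nernst_hydrogen(ph):.3f} V, "
          f"phi_N,O2 (air) = {nernst_oxygen(ph, 0.21):.3f} V")
```

Working with the logarithm of [OH-] directly, rather than raising a very small concentration to the fourth power, keeps the arithmetic well conditioned.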

2.1.5 The Exchange Current Density, $i_o$

A term that appeared often in the previous polarization diagrams was $i_o$, which represents the exchange current density. When a metal is in equilibrium with its own ions, the Nernst potential, $\phi_N$, and the exchange current density, $i_o$, apply [2,9]. An example of this is a Cu rod placed in a solution of $Cu^{2+}$ ions (a solution of $CuSO_4$). The equilibrium reached is a dynamic one. Although no changes are visible to the naked eye, reduction and oxidation of the metal are taking place at equal rates. This rate is termed the exchange current density, $i_o$. Electrons travel through the metal from the anodic to the cathodic sites, which are continuously changing locations.

In reduction reactions such as $H^+$ and $O_2$ reduction, the exchange current density, $i_o$, is very sensitive to the condition of the metal surface. Furthermore, the corrosion current, $I_{corr}$, is highly dependent on the value of $i_o$. As shown schematically in Figure 2.11, $I_{corr}$ increases as $i_o$ increases. Consequently, the rate at which a metal will corrode varies as the surface preparation of the metal varies [2,9]. When testing metal samples to obtain $I_{corr}$, it is very important to ensure that the surfaces of the samples are prepared consistently, so that variations in $i_o$ will not introduce errors in the determination of $I_{corr}$.

Figure 2.11 Dependence of $I_{corr}$ on the Value of $i_o$

2.1.6 Determination of $I_{corr}$ and $\phi_{corr}$

As mentioned previously, the values of $I_{corr}$ and $\phi_{corr}$ are obtained from the intersection point between the anodic and the cathodic lines of polarization diagrams. Up to this point, the diagrams are similar in that each of the two lines is represented by a straight line. This will be correct in approximately 90% of cases, in which activation polarization behavior governs [2,9]. However, an alternative to this situation warrants some attention. This behavior is termed concentration polarization. The shape of the polarization lines is determined by either concentration or activation polarization.

For the sake of compatibility in calculations, the term $i_{corr}$ is used instead of $I_{corr}$. The term $i_{corr}$ represents the corrosion current density, and it is directly proportional to $I_{corr}$, the corrosion current. In fact, $I_{corr} = i_{corr} \cdot A$, where A is the surface area of the anode.

2.1.6.1 Activation Polarization

Activation polarization, or Tafel behavior, makes up 90% of the cases, and it occurs when the rate of a reaction is controlled by the slowest of the steps in the reaction sequence, i.e. the electrochemistry of the system governs the rate [2,9]. This behavior occurs in well stirred solutions, where the reaction rate is not limited by the speed at which a slow species can move through the solution.

In activation polarization, both the reduction and the anodic reactions display Tafel behavior, i.e. both plot as straight lines against log i. The Tafel equation relates the overvoltage $\eta$ to the current density i by the following equation:

$$\eta = \beta \log\left(\frac{i}{i_o}\right) \tag{2.17}$$

where $\beta$ = Tafel constant, or Tafel slope [2,9]. The values of $\beta$ have been tabulated for the various metals in different media. Table 2.3 shows some typical values.

Metal    Temperature (°C)    Solution

1N HCl

0.1N HCl

0.1N HCl

1N HCl

0.1N HCl

2N H2SO4

1N HCl

1N

0.01-8N HCl

Table 2.3 Typical $\beta$ Values

Figure 2.12 shows a typical diagram in which activation polarization governs. In order to solve for $i_{corr}$, the following two equations are solved simultaneously, and the values of $\phi_{corr}$ and $i_{corr}$ are obtained:

Figure 2.12 Activation Polarization Diagram
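The simultaneous solution described above amounts to finding the intersection of two straight lines on a plot of potential versus log i. The sketch below is one way to do this under stated assumptions: both lines follow Equation 2.17, the Tafel slopes are entered as positive magnitudes (the cathodic line falls as current rises), and all numerical values and names are hypothetical rather than taken from the thesis.

```python
import math

def tafel_intersection(phi_n_anode, beta_a, i0_anode, phi_n_cathode, beta_c, i0_cathode):
    """Intersect the anodic and cathodic Tafel lines.

    Anodic line:   phi = phi_n_anode   + beta_a * log10(i / i0_anode)
    Cathodic line: phi = phi_n_cathode - beta_c * log10(i / i0_cathode)
    Returns (i_corr, phi_corr).
    """
    log_i0_a = math.log10(i0_anode)
    log_i0_c = math.log10(i0_cathode)
    # Setting the two line equations equal and solving for log10(i):
    log_i_corr = (phi_n_cathode - phi_n_anode
                  + beta_a * log_i0_a + beta_c * log_i0_c) / (beta_a + beta_c)
    phi_corr = phi_n_anode + beta_a * (log_i_corr - log_i0_a)
    return 10.0 ** log_i_corr, phi_corr

# Hypothetical metal (phi_N = -0.44 V) in an acid whose H+ line starts at 0.00 V.
i_corr, phi_corr = tafel_intersection(-0.44, 0.1, 1e-6, 0.0, 0.1, 1e-6)
print(f"i_corr = {i_corr:.2e} A/cm^2, phi_corr = {phi_corr:.3f} V")
```

Raising the cathodic exchange current density in this sketch shifts the intersection to a higher current, which is the dependence of the corrosion current on $i_o$ described in Section 2.1.5.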

2.1.6.2 Concentration Polarization

In concentration polarization, only the reduction reaction is affected. The oxidation reaction exhibits Tafel behavior as it did in the case of activation polarization. A typical diagram is shown in Figure 2.13.

Figure 2.13 Concentration Polarization Diagram [2,9]

Concentration polarization usually governs in cases where the solution is stagnant. The rate of the reaction depends on how quickly certain species are capable of diffusing through the stagnant solution, towards the metal surface where corrosion occurs [2,9]. For example, in corrosion due to $H^+$ reduction where the solution is stagnant, the initial condition is represented by Figure 2.14a. However, as $H^+$ reduction occurs at the surface of the metal, $H^+$ ions are used up. As a consequence, a thin boundary layer is formed in which the concentration of $H^+$ ions varies from the concentration of $H^+$ in the bulk solution, $[H^+]_b$, to zero. This concentration gradient causes ions to diffuse towards the surface, where they are then consumed by the corrosion process. Figure 2.14b illustrates the boundary layer in question.

Figure 2.14 Distribution of $H^+$ Ions in Time (panels a and b: $[H^+]$ versus distance from metal surface)

The rate at which certain species (in this case the $H^+$ ion) are able to diffuse through the boundary layer will govern the rate of the corrosion reaction. In concentration polarization, it is said that mass transfer controls the rate of the reaction. The maximum expected current in concentration polarization cases is called the limiting current, $i_L$. As can be seen in Figure 2.13, when concentration polarization governs, $i_{corr}$ is smaller than it would be if activation polarization governed, i.e. if the solution had been well stirred.

The limiting current densities for the reduction reaction can be calculated from the following equation [2,9]:

$$i_L = k\,n\,F\,[a_{red}]_b \tag{2.19}$$

where k = mass transfer coefficient (cm/sec)

n = number of electrons transferred in the reduction

F = Faraday's constant (96500 C/eq)

$[a_{red}]_b$ = activity, or concentration, of the reduced species
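Equation 2.19 can be evaluated directly, as in the following sketch. The units are an assumption of the example: with k in cm/sec and the concentration in mol/cm^3, the result is in A/cm^2. The mass transfer coefficient and the dissolved oxygen concentration below are hypothetical values, not data from the thesis.

```python
F = 96500.0  # Faraday's constant, C/eq

def limiting_current_density(k_cm_per_s, n, bulk_concentration_mol_per_cm3):
    """Equation 2.19: i_L = k n F [a]_b, returned in A/cm^2."""
    return k_cm_per_s * n * F * bulk_concentration_mol_per_cm3

# Hypothetical case: O2 reduction (n = 4 electrons per O2 molecule),
# k = 1e-3 cm/s, dissolved O2 = 2.5e-7 mol/cm^3 (roughly 8 mg/L).
i_L = limiting_current_density(1e-3, 4, 2.5e-7)
print(f"i_L = {i_L:.2e} A/cm^2")
```

Since $i_L$ is linear in the bulk concentration, doubling the dissolved oxygen in this sketch doubles the limiting current density.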

2.1.7 Effect of Varying Parameters Using Polarization Diagrams

Polarization diagrams have many uses. They help one visualize electrochemical phenomena which would otherwise be quite abstract. Polarization diagrams also help one understand and predict the effect of varying certain parameters that influence the corrosion rate in the two corrodents of interest, acidic solutions and neutral aerated water. The parameters discussed in the following sections are:

• $P_{O_2}$ and $H^+$ concentration,

• $O_2$ solubility,

• multiple corrodents,

• passivity,

• galvanic attack, and

• chloride content.

2.1.7.1 $P_{O_2}$ and $H^+$ Concentration

In cases of corrosion in neutral aerated water, the value of $P_{O_2}$, the partial pressure of oxygen (1 atm for pure oxygen, 0.2 atm in air), affects only the value of the Nernst potential, $\phi_{N,O_2}$ [2,9]:

$$\phi_{N,O_2} = \phi_{N,O_2}^{o} + 2.303\,\frac{RT}{4F}\,\log\left\{\frac{P_{O_2}}{[OH^-]^4}\right\}$$

As the value of $P_{O_2}$ increases, so does the value of $\phi_{N,O_2}$, and this causes the cathode line to shift upwards. This is illustrated in Figures 2.15a and 2.15b, for activation and concentration polarization, respectively. In the case of activation polarization, the results of increasing $P_{O_2}$ are as follows:

• increase in $i_{corr}$,

• increase in $\phi_{corr}$,

• no change in $i_o$,

• no change in the anode line.

The above results also apply in the case of concentration polarization because, as $P_{O_2}$ increases, so does the value of $[O_2]_b$, and this leads to an increase in $i_L$. The vertical line is therefore shifted to the right.

In cases of corrosion in acidic solutions, as the concentration of $H^+$ ions increases and the pH decreases, the Nernst potential $\phi_{N,H^+}$ increases ($\phi_{N,H^+} = -2.303\,(RT/F)\,\mathrm{pH}$). This results in the reduction lines shifting upwards. In the case of activation polarization, the results of increasing the concentration of $H^+$ are as follows:

• increase in $i_{corr}$,

• increase in $\phi_{corr}$,

• no change in $i_o$,

• no change in the anode line.

The above results also apply in the case of concentration polarization, because as the value of $[H^+]_b$ increases, the value of $i_L$ increases as well, and the vertical portion of the reduction curve is shifted to the right.

Figure 2.15 Effect of Varying $P_{O_2}$ (cathode lines shown for $P_{O_2}$ = 0.21 atm, air saturated water, and $P_{O_2}$ = 1 atm, bubbling $O_2$; axes: $\phi$ versus log i)

2.1.7.2 $O_2$ Solubility

This parameter must not be confused with $P_{O_2}$ studied previously. In this case, the $P_{O_2}$ is kept constant, but the solubility of $O_2$ varies depending on the presence of impurities such as chloride ions in the aqueous medium [2,9].

The solubility of $O_2$ affects only the cases of corrosion in neutral aerated water, where concentration polarization governs. The solubility of $O_2$ varies with chloride content as illustrated in Figure 2.16. The $O_2$ solubility is highest at approximately 3% NaCl content, which corresponds to typical seawater [2,9]. Assuming a constant $P_{O_2}$, $[OH^-]$ and temperature, an increase in the concentration of dissolved $O_2$ in the water causes an increase in $i_L$ ($i_L = knF[O_2]_b$). Consequently, this will result in an increase in $i_{corr}$.

Figure 2.16 Variation of $O_2$ Solubility with NaCl Concentration

2.1.7.3 Multiple Corrodents

Situations where a metal is subjected to the effects of more than one corrodent are not uncommon. For example, acid rain is a corrodent rich in both oxygen and $H^+$ ions. In such a case, two cathodic reactions occur simultaneously [2,9]:

However, only one anodic reaction is involved:

$$M \rightarrow M^{n+} + ne^-$$

Figure 2.17a illustrates the situation of multiple corrodents in cases where activation polarization governs. The terms $i_a$ and $i_b$ on the polarization diagram represent the current densities that would apply if one corrodent were acting at a time. In situations of multiple corrodents acting simultaneously, a new reduction line must be drawn. This line is constructed by adding $i_a$ and $i_b$ at any given value of $\phi$. This line is used to determine the actual current density existing at the metal surface.

The value of $i_{corr}$, the total current density of the oxidation of the metal, is equal to $i_{O_2} + i_{H^+}$, the current densities of the reduction of $O_2$ and $H^+$, respectively. Figure 2.17a shows clearly that when a second corrodent is introduced in a system, the values of $i_{corr}$ and $\phi_{corr}$ increase. Another interesting point to note is that the reduction of $O_2$ makes a higher contribution to $i_{corr}$ than does the reduction of $H^+$ ions.

In cases where concentration polarization governs, the result of adding a second corrodent to a system is to increase both $i_{corr}$ and $\phi_{corr}$. This conclusion is more easily reached when studying the polarization diagram in Figure 2.17b. The new line representing the situation of multiple corrodents has, as before, a constant value of $i_{corr,total}$, which is equal to $i_{O_2} + i_{H^+}$.

Figure 2.17 Effect of Multiple Corrodents [9]
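The construction of the combined reduction line for the activation polarization case can be sketched numerically: the cathodic current densities are added at each potential, and the corrosion potential is the value at which the anodic current equals that sum. The code below is a minimal illustration that assumes Tafel behavior for every line and uses bisection to locate the intersection; all parameter values and names are hypothetical.

```python
def tafel_current(phi, phi_n, beta, i0, anodic):
    """Current density on a Tafel line at potential phi (beta is a positive magnitude)."""
    sign = 1.0 if anodic else -1.0
    return i0 * 10.0 ** (sign * (phi - phi_n) / beta)

def mixed_potential(anode, cathodes, tol=1e-12):
    """Find phi where the anodic current equals the sum of all cathodic currents.

    anode and each cathode are (phi_N, beta, i0) tuples. Returns (phi_corr, i_corr).
    """
    lo = anode[0]                        # anodic current is smallest here
    hi = max(c[0] for c in cathodes)     # cathodic currents are smallest here

    def net(phi):
        i_a = tafel_current(phi, *anode, anodic=True)
        i_c = sum(tafel_current(phi, *c, anodic=False) for c in cathodes)
        return i_a - i_c

    while hi - lo > tol:                 # net() rises monotonically with phi
        mid = 0.5 * (lo + hi)
        if net(mid) > 0:
            hi = mid
        else:
            lo = mid
    phi_corr = 0.5 * (lo + hi)
    return phi_corr, tafel_current(phi_corr, *anode, anodic=True)

metal = (-0.44, 0.10, 1e-6)      # hypothetical anode line
hydrogen = (-0.24, 0.10, 1e-6)   # hypothetical H+ reduction line
oxygen = (0.80, 0.12, 1e-14)     # hypothetical O2 reduction line

print(mixed_potential(metal, [hydrogen]))
print(mixed_potential(metal, [hydrogen, oxygen]))
```

With these hypothetical lines, adding the oxygen reaction raises both the corrosion potential and the corrosion current density, in line with the behavior described for Figure 2.17a.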

2.1.7.4 Galvanic Attack

Galvanic attack occurs when two metals are placed in electrical contact in the presence of a corrodent. In this case, there is one cathodic reaction (reduction of $H^+$ or $O_2$), and two anodic reactions:

The resulting corrosion system is illustrated in Figure 2.18. If only metal $M_1$ is present, lines 2 and 4 would apply and $i_{M_1}$ would result, while if only metal $M_2$ is present, lines 1 and 5 would apply and $i_{M_2}$ would result [9]. When both metals are involved, lines 3 and 6 would apply. Lines 3 and 6 are obtained by adding the values of the current densities of lines 1 and 2, and lines 4 and 5, respectively. It can be seen that when two metals are involved, the rate of corrosion of the more anodic of the two metals, $M_1$, will increase and the rate of corrosion of the more cathodic of the two, $M_2$, will decrease. Metal $M_1$ is said to suffer accelerated corrosion, or galvanic attack [9].

Figure 2.18 Galvanic Attack [9]

2.1.7.5 Passivity

Many metals, such as Fe, Cr, Ni, Ti, and Al, exhibit passivity in various corrodents. Passivity is the formation of a protective oxide layer on the surface of the metal which causes it to corrode at a much slower rate than that predicted by Tafel behavior [9]. Figure 2.19 illustrates a typical polarization line of a metal which exhibits passivity. Three distinct regions can be discerned: the active region, the passive region, and the transpassive region.

Figure 2.19 Polarization Diagram of a Metal Exhibiting Passivity [2,9]

The active region, considered so far, is the region limited by the Nernst potential, $\phi_{NM}$, and the passive potential, $\phi_{pp}$. The currents in this region vary between the exchange current density, $i_o$, and the critical current density, $i_c$. In this region, the metal exhibits standard Tafel behavior.

The passive region is the region limited by the passive potential, $\phi_{pp}$, and the transpassive potential, $\phi_{tp}$. In this region, the current is equal to the passive current density, $i_p$, and does not vary with potential.

Passivity is due to the adsorption of $O_2$ onto the metal surface. This adsorption occurs at potentials between $\phi_{pp}$ and $\phi_{tp}$, at which point passivity begins to break down. As can be seen in Figure 2.19, the lower the value of $i_p$, the lower the value of $i_{corr}$ obtained when the intersection of the two polarization lines occurs within the passive region.

The transpassive region is the region where the potential is higher than $\phi_{tp}$. The breakdown of passivity begins at $\phi_{tp}$, when the adsorbed layer of $O_2$ is no longer stable and begins to disintegrate [2,9]. The value of the current is not constant in this region, but increases with increasing potential.

2.1.7.6 Chloride Content

The effect of the presence of chloride ions in a solution, and to a lesser extent halogen ions, is to increase the value of the exchange current density, $i_o$, of the metal in the given solution, and to break down its passive layer [2].

Chloride ions break down, and/or prevent the formation of, a passive layer in metals such as Fe, Cr, Ni, Co, and stainless steels. The passive layer forms due to the adsorption of oxygen onto the metal surface. When chlorides are introduced into the solution, they compete with $O_2$ for adsorption [2]. Unlike the adsorbed $O_2$, which causes the rate of the metal dissolution to decrease, chloride ions favour hydration of the metal ions and therefore increase the rate of dissolution [2].

The value of the potential of the system will determine whether $O_2$ or $Cl^-$ ions will be adsorbed, i.e. whether passivity will form or break down. Below a certain potential, chloride ions cannot displace the adsorbed $O_2$, so the passive layer will remain stable and corrosion will be negligible. This potential is termed the critical potential [2]. At potentials higher (or more noble) than the critical potential, $Cl^-$ ions are capable of displacing adsorbed $O_2$, thus destroying the passive layer.

Breakdown of passivity occurs locally and is not spread uniformly over the
metal surface. Destruction of the passive layer typically starts at a point of discontinuity
in the passive film. The result is localized attack and the formation of pits [2]. This
combination of a small anodic area (the pit) and a large cathodic area (the remaining metal
surface) results in accelerated corrosion. Furthermore, the higher the current
flow at any pit, the less likely it is that other pits will form nearby, i.e. the number of pits per
unit area is smaller for deeper pits than for shallower ones [2]. An effective way to inhibit
Cl⁻ ion attack is the addition of extraneous anions to the solution. Species such as NO₃⁻
and SO₄²⁻, which do not break down the passive layer, compete with Cl⁻ ions for sites on
the passive film and, consequently, inhibit the formation of pits ['l.

The effect of Cl⁻ ions can be so pronounced that in some cases stainless steels,
which are known for their resistance to most corrosive environments, have been observed
to corrode at rates similar to those of metals that do not exhibit passivity at all ['l.

2.2 Measuring Corrosion Rates

In this section, the theory behind corrosion rate measurements is outlined. It is
on these basic principles that corrosion-measuring equipment is developed. Essentially,
two methods are used to obtain the corrosion rate electrochemically: Tafel
Extrapolation and Linear Polarization [9,16].

A metal which is exposed to a corrodent such as an acidic solution or neutral
aerated water will acquire a certain potential, φ_corr. The polarization
diagram of Figure 2.20a shows that, at this potential, the current resulting from the metal
oxidation is equal in magnitude to the current feeding the reduction of the corrodent, i.e.
at this point of equilibrium, electrons are being produced and consumed at the same
rate. This current is termed the corrosion current density, i_corr.

If the system is manipulated such that a potential φ, other than φ_corr, is applied,
then the anodic and cathodic currents, i_a and i_c, will no longer be equal and a net current,
i, will flow. Figures 2.20b and 2.20c illustrate this point. When the potential increases
above φ_corr, the current leaving the anode increases, causing the metal to dissolve
more quickly. This phenomenon is called anodic polarization [9,16]. Conversely, if the
potential is decreased below φ_corr, the current leaving the anode decreases and the
metal dissolves at a slower rate. This phenomenon is called cathodic polarization [9,16].

If the imposed potential is varied and each value is plotted against the logarithm
of the resulting current, a curve resembling Figure 2.21 is obtained. The section
of the curve below φ_corr represents the region of cathodic polarization, and the section
above it represents the region of anodic polarization. When the potential is equal to φ_corr, no net current is expected to flow.

The above theory forms the basis of the two methods used to determine corrosion
rates electrochemically: Tafel Extrapolation and Linear Polarization.

Figure 2.20 Variation of i_a and i_c with Potential (panels a to c; axes: potential φ versus log i, with i_a, i_c and i_corr marked)

Figure 2.21 Tafel Curve (potential versus log i; branches: anodic polarization of metal M, cathodic polarization)
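
The behaviour described above can be summarized by a standard mixed-potential relation. It is not given in this thesis and is offered only as a sketch, assuming that both the anodic and the cathodic reaction are under activation control; β_a and β_c denote the anodic and cathodic Tafel slopes in volts per decade of current:

\[
i = i_a - i_c = i_{corr}\left[\,10^{(\phi - \phi_{corr})/\beta_a} - 10^{-(\phi - \phi_{corr})/\beta_c}\right]
\]

At φ = φ_corr both terms equal 1 and no net current flows. For φ well above φ_corr the first term dominates (anodic polarization), and for φ well below φ_corr the second term dominates (cathodic polarization), which gives the two straight branches on a plot of φ against log i.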

2.2.1 Tafel Extrapolation

In Tafel Extrapolation, corrosion rates are measured using data obtained by

polarizing a metal sample cathodically and then anodically. The very simplified

schematic diagram in Figure 2.22 illustrates the typical setup.

The metal under study is called the working electrode. It is placed in the
corrodent along with the auxiliary and reference electrodes. The auxiliary electrode is
usually made of an inert material, such as graphite or platinum. Its purpose
is to act as either a source, in the case of anodic polarization, or a sink, in the
case of cathodic polarization, for the resulting current i. The reference electrode
measures the potential φ of the metal, and a potentiometer records these values.
Simultaneously, an ammeter records the current flow to or from the working electrode.
Finally, a potentiostat is used to impose the desired potential on the system.

Figure 2.22 Schematic of Setup for Tafel Test [16]

The first step, prior to polarizing the metal sample, is to determine the value of
φ_corr. The metal sample is placed in the corrodent, the potential is allowed to attain its
equilibrium, and the anodic and cathodic reactions are allowed to proceed undisturbed.
There is no net flow of electrons, i.e. i_a = i_c = i_corr and φ = φ_corr. This potential is called
the open circuit corrosion potential, and it is measured by the reference electrode.

Once the value of φ_corr is recorded, the potentiostat imposes a potential of
φ_corr - Δφ. This situation is represented by point a in Figure 2.23. The potential remains at
φ_corr - Δφ for a specified amount of time, and the value of the resulting current, i, is
recorded. The potential is then increased by a predetermined increment, and the
resulting current is again plotted at this new value of φ. This continues until the
potential reaches φ_corr + Δφ, and thus all potential values between φ_corr - Δφ and φ_corr + Δφ
have been scanned. The result of plotting the imposed potential versus the logarithm of
the resulting current is the complete curve illustrated in Figure 2.23.

Figure 2.23 Tafel Curve Obtained by Varying φ [9]

Another typical curve is illustrated in Figure 2.24. At low currents, this curve is
non-linear. However, the two branches of the curve become linear at higher current
values. This region of linearity is called the Tafel region. The slopes of the cathodic and
anodic polarization lines in the Tafel regions are termed β_c and β_a, respectively. The
value of Δφ can range from 50 to 250 mV, or more. Typically, the Tafel region begins at
φ_corr ± 50 mV and ends when other phenomena cause the linearity of the curve to
be lost, e.g. the potential attained encourages the formation of a passive layer and the
curve suddenly continues vertically upward (current does not increase with increasing
potential) [9,16].

The value of i_corr is obtained by extrapolating the Tafel regions back to the
corrosion potential, φ_corr, where the two lines intersect. Figure 2.25 shows the intersection
of the two dashed lines at the point where φ = φ_corr and i = i_corr. Once the value of i_corr is
known, the corrosion rate in mm/yr can be computed.

Figure 2.24 Tafel Regions [9,16]

Figure 2.25 Obtaining i_corr from the Tafel Curve [9,16]
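
The extrapolation and the conversion to mm/yr can be sketched numerically. The following is a minimal illustration and not the procedure or software used in this thesis: the polarization data are synthetic, generated from an assumed activation-controlled model, the function names are hypothetical, and the conversion uses Faraday's law with values assumed for iron dissolving as Fe²⁺ (molar mass 55.845 g/mol, 2 electrons, density 7.87 g/cm³).

```python
import numpy as np

F = 96485.0  # Faraday constant, C/mol


def net_current(phi, phi_corr, i_corr, beta_a, beta_c):
    """Synthetic net current density in A/cm^2 (assumed activation-controlled model)."""
    eta = phi - phi_corr
    return i_corr * (10.0 ** (eta / beta_a) - 10.0 ** (-eta / beta_c))


def tafel_extrapolate(phi, i, phi_corr, start=0.05):
    """Fit phi versus log10|i| on each branch beyond +/- start volts and intersect the lines."""
    eta = phi - phi_corr
    anodic = eta >= start
    cathodic = eta <= -start
    # np.polyfit returns [slope, intercept] for a first-degree fit
    slope_a, icpt_a = np.polyfit(np.log10(np.abs(i[anodic])), phi[anodic], 1)
    slope_c, icpt_c = np.polyfit(np.log10(np.abs(i[cathodic])), phi[cathodic], 1)
    log_i_corr = (icpt_c - icpt_a) / (slope_a - slope_c)
    # Return i_corr and the Tafel slopes as positive magnitudes (V per decade)
    return 10.0 ** log_i_corr, slope_a, -slope_c


def corrosion_rate_mm_per_year(i_corr, molar_mass=55.845, n=2, density=7.87):
    """Faraday's law; defaults assume iron dissolving as Fe2+ (g/mol, electrons, g/cm^3)."""
    cm_per_second = i_corr * molar_mass / (n * F * density)
    return cm_per_second * 10.0 * 365.25 * 24 * 3600


# Synthetic scan of +/- 250 mV around an assumed corrosion potential of -0.60 V
phi_corr, i_corr_true = -0.60, 1.0e-6
phi = np.linspace(phi_corr - 0.25, phi_corr + 0.25, 101)
i = net_current(phi, phi_corr, i_corr_true, beta_a=0.10, beta_c=0.12)

i_est, beta_a_est, beta_c_est = tafel_extrapolate(phi, i, phi_corr, start=0.15)
rate = corrosion_rate_mm_per_year(i_est)
print(f"i_corr = {i_est:.2e} A/cm^2, rate = {rate:.4f} mm/yr")
```

In this idealized model the branches only become straight once the reverse reaction is negligible, so the demonstration fits from 150 mV rather than from the 50 mV default; with measured data the fitting window would have to be chosen by inspecting the curve.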

2.2.2 Linear Polarization

An alternative to Tafel Extrapolation is the method of Linear Polarization,
which has been studied extensively to date. The procedure is the same as that for Tafel
Extrapolation, with the following exceptions [9,16]:

• The value of Δφ is approximately 10-20 mV,

• The values of β_a and β_c are not obtained automatically, but must be known or
estimated beforehand,

• The data points obtained during polarization are plotted on a linear-linear graph, and
not on a linear-log plot.

In the method of Linear Polarization, once the value of φ_corr is recorded, the
potential is dropped to (φ_corr - 20 mV). It is then raised incrementally up to a potential of
(φ_corr + 20 mV), and the current is recorded at each step. The φ values and corresponding
i values are plotted on a linear scale, and the resulting graph resembles Figure 2.26.

Figure 2.26 Linear Polarization Curve (potential axis marked at φ_corr - Δφ, φ_corr and φ_corr + Δφ)

Under these conditions of slight polarization, i.e. with Δφ up to about 20 mV, the potential
varies linearly with the resulting current. Stern and Geary (1957) derived the following
relationship to obtain the value of i_corr [~~ '~- '~ ':

\[
i_{corr} = \frac{\beta_a \beta_c}{2.3\,(\beta_a + \beta_c)} \cdot \frac{\Delta i}{\Delta \phi}
\]

where the term (Δφ / Δi) is also called the polarization resistance, R_p, given in ohms.
The values of β_a and β_c can either be determined by the method of Tafel Extrapolation or
be estimated. The value of i_corr is determined by the Stern-Geary equation, and the
corrosion rate in mm/yr can then be computed.
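
The calculation can be sketched as follows. This is a minimal illustration under stated assumptions and not the instrument software used in this thesis: the data are synthetic, the Tafel slopes are assumed known, the function names are hypothetical, and currents are expressed as current densities, so that R_p comes out in ohm·cm² rather than ohms.

```python
import numpy as np


def polarization_resistance(phi, i):
    """Slope of a straight-line fit of phi (V) against i; ohm*cm^2 when i is in A/cm^2."""
    slope, _intercept = np.polyfit(i, phi, 1)
    return slope


def stern_geary(r_p, beta_a, beta_c):
    """i_corr from the Stern-Geary relation, with Tafel slopes in V per decade."""
    return beta_a * beta_c / (2.303 * (beta_a + beta_c) * r_p)


# Synthetic scan of +/- 20 mV around an assumed corrosion potential of -0.60 V
phi_corr, i_corr_true, beta_a, beta_c = -0.60, 1.0e-6, 0.10, 0.12
phi = np.linspace(phi_corr - 0.020, phi_corr + 0.020, 21)
eta = phi - phi_corr
i = i_corr_true * (10.0 ** (eta / beta_a) - 10.0 ** (-eta / beta_c))

r_p = polarization_resistance(phi, i)
i_est = stern_geary(r_p, beta_a, beta_c)
print(f"R_p = {r_p:.0f} ohm*cm^2, i_corr = {i_est:.2e} A/cm^2")
```

Because the underlying curve is only approximately linear over ±20 mV, the estimate differs from the value used to generate the data by a few percent; a narrower scan reduces this error at the cost of smaller, noisier currents.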

2.3 Soil Corrosion and Its Effects on Underground Infrastructure

This section deals with the principles of soil corrosion and its effects on
underground infrastructure. Underground pipelines make up the greatest proportion of
the metals threatened by soil corrosion. The various mechanisms of soil corrosion are
outlined and explained from an electrochemical perspective.

The deterioration of metal pipelines in soils can be due to many phenomena. The
most important are the following:

• the formation of differential aeration cells,
• galvanic attack,
• selective leaching, and
• stress-corrosion cracking.

2.3.1 Differential Aeration Cells

When a pipeline is exposed to conditions which vary along its length, it can be
subjected to variations in O₂ exposure [1,2,9]. This results in potential differences and,
consequently, in corrosion of the pipe section located in the area of low O₂ content.

A situation which is often faced is a pipe which encounters different soil types
along its path. Different soils have different porosities and therefore different O₂
contents. For example, clays typically have very low porosities and, consequently, low
O₂ concentrations. Sands, on the other hand, are highly porous and well aerated, and
generally contain higher levels of O₂. When a pipe runs through both of these soils, a
corrosion cell is created. The section of pipe located in the clay will have a lower
potential (since the O₂ concentration is lower) than the section in the sand. As a result,
the section in the clay will be anodic to the section in the sand, and corrosion will occur
in the pipe located in the clay. The pipe itself serves as the electrical conductor,
allowing electrons to move from the anode to the cathode, and the groundwater serves
as the ionic conductor. The circuit is completed, and localized corrosion proceeds at
an accelerated pace [1].
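
As a rough illustration of why the less aerated section sits at the lower potential, the Nernst equation for the oxygen reduction reaction can be applied. This is standard electrochemistry rather than a result from this thesis, and it assumes equilibrium conditions, equal pH in both soils, and a temperature of 25 °C; p_sand and p_clay are hypothetical symbols for the O₂ partial pressures in the two soils:

\[
\mathrm{O_2 + 2H_2O + 4e^- \rightleftharpoons 4OH^-}, \qquad
\phi = \phi^{\circ} + \frac{RT}{4F}\ln\frac{p_{O_2}}{a_{OH^-}^{4}}
\]

\[
\phi_{sand} - \phi_{clay} = \frac{RT}{4F}\ln\frac{p_{sand}}{p_{clay}}
\approx \frac{0.0592\ \mathrm{V}}{4}\,\log_{10}\frac{p_{sand}}{p_{clay}}
\]

Under these assumptions a tenfold difference in O₂ partial pressure corresponds to only about 15 mV. The potential differences that develop on a real pipe also depend on polarization and surface films, so this figure indicates the direction of the effect rather than its magnitude.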

A similar situation may be created when a pipe passes under a paved surface, such
as a parking lot or a street ['l. The soil beneath the paved surface generally has a lower
oxygen content than the soil beneath the unpaved surface, which is more readily
exposed to air and oxygen-rich rainwater. A corrosion cell is therefore set up, with the
pipe beneath the pavement being anodic to the surrounding pipe. Once again, the pipe
itself acts as the electrical conductor, and the groundwater as the ionic conductor.

Another cause of differential aeration cells is the improper installation of new
pipes [7]. Pipes are usually rested directly on undisturbed soil and then covered with
relatively loose backfill. The backfill is generally more permeable than the compacted,
undisturbed soil, and will contain higher concentrations of oxygen. A cell is therefore
formed, with the pipe bottom being anodic to the pipe crown. Electrons move through the
pipe itself, from the bottom to the more aerated crown, with the groundwater acting as the
ionic conductor. This explains why most corrosive attacks on pipelines occur on the
bottom 1/4 of the pipe.

2.3.2 Galvanic Attack

Another very common mechanism of soil corrosion is the phenomenon of
galvanic attack. As described previously in Section 2.1.2.2, galvanic attack occurs
when dissimilar metals are placed in electrical contact and exposed to a corrosive
environment. The more anodic of the metals suffers accelerated corrosion, while the rate
of corrosion of the more cathodic metal decreases.

A common example of galvanic attack is the corrosion of steel (iron) water and
gas mains at the point of contact with copper service pipes [1]. Copper, being cathodic
to iron, causes the iron pipe to suffer accelerated corrosion. Fortunately, this situation
does not cause too much damage because the area of the anode (the iron pipe) is much
larger than the area of the cathode (the smaller copper line), and the corrosion is spread
out over a large area.

Galvanic attack can also occur when a new pipe is placed in electrical contact
with an old pipe, even if the pipes are made of the same material [1]. At first glance,
galvanic attack may not be suspected because the materials are not different. However,
over the years a protective film has formed on the surface of the old pipe,
providing passivity and resistance to corrosion. The old steel is therefore cathodic to the
new steel, which will suffer accelerated corrosion when the pipes are in contact with one
another. Before long, the new pipe may be in worse condition than the old one, leading to
the erroneous conclusion that the pipe material itself is to blame. This situation is often
encountered when the capacity of a water pipe is insufficient and an additional water pipe
is laid parallel to the old one, with the two connected by cross-overs. The old pipe is
the cathode, the new pipe is the anode, the metallic cross-over is the electrical conductor,
and the groundwater is the ionic conductor.

Another example of galvanic attack is the accelerated corrosion of iron pipes
placed in contact with a soil containing cinders [1]. Cinders are essentially made up of
carbon, and are therefore cathodic to the iron pipe. The potential difference between the
two materials is in the range of 0.8 to 1.1 V, which can cause very serious damage to the
pipe.

2.3.3 Selective Leaching

Selective leaching, as described in Section 2.1.2.6, is the removal of one element
from a solid alloy. This occurs because the alloy is composed of elements whose
potentials are very different, resulting in the more anodic of the two being "corroded",
leaving behind a porous mass consisting of the more cathodic element.

An example of this is the graphitization of cast iron pipes [1]. Cast iron is
composed of graphite flakes within a matrix of iron. Graphite is cathodic to iron,
and a galvanic cell therefore exists. As the iron dissolves, it leaves behind a weak, porous
material which is characterized by a dark gray color.

2.3.4 Stress-Corrosion Cracking

Stress-corrosion cracking (SCC) results when a metal is subjected to a
combination of a weak corrodent and a weak tensile stress [1,2,9]. As described in Section
2.1.2.7, failure can appear quite suddenly because no general surface corrosion is
apparent.

An example of localized stresses in buried pipes is the "cold bending" of pipes ['l.
When underground pipes are manufactured, they are often subjected to "cold bending" to
produce bends. This can result in significant residual stresses forming at the bends of the
pipes. Pipes can also be subjected to localized stresses when they are forced into
alignment once placed in the ground. These forces are sufficiently large to cause serious
SCC problems. The weak corrodent is usually neutral aerated groundwater or weakly
acidic groundwater. The result is the accelerated corrosion of the pipe in the areas where
the pipe is subjected to tensile stresses.

2.4 Standards for Determining Corrosivity of Soils

The majority of the standards for determining soil corrosivity were designed to
respond to a particular need, and as such, many different standards have been developed
in North America, France, and Germany. Typically, the variables tested are the same,
although the testing procedure may vary. Two standards which are used extensively in
Quebec are AWWA C105 and PACE 82-3. This project focuses on these two standards.

2.4.1 AWWA C105

The American Water Works Association (AWWA) Standard was designed to
assist the engineer in deciding whether or not to use polyethylene pipes instead of
traditional materials. It must be kept in mind that the AWWA is a private organization
and not an independent national entity, and as such, the grid developed may be biased to
some extent. Nonetheless, the AWWA standard is used extensively in North America.
The soil characteristics examined in the AWWA Standard are the following:

• soil type,
• drainage ability,
• soil resistivity,
• pH,
• oxidation-reduction potential, and
• sulfide content.

These soil characteristics are evaluated separately, and points are allocated to
each result depending on the extent to which the factor contributes to the corrosivity of
the soil. The points are then summed, and a final corrosivity index is reported.
According to the AWWA standard, an index of 10 or more indicates that the soil tested is
corrosive, whereas an index below 10 suggests that the soil is not corrosive [25].

A detailed description of the testing procedure is presented in Chapter 3:
Procedures and Apparatus. This section deals with the factors tested and the points
allocated to each. The following soil characteristics are considered:

• Soil type is a characteristic which is recorded in the AWWA grid, but which is not
allocated any points. The type of soil (sand, clay, silt) is reported along with the
following characteristics: color, odor, presence of rocks or pebbles, and presence
of organic materials.

• The drainage ability of a soil estimates the ease with which the soil is penetrated by
water. The better the drainage ability of a soil, the less likely it is that the soil will become
anaerobic and permit bacterial corrosion. The drainage ability is classified as either
excellent, good or poor, and the following points are allocated:

Excellent: 0

• Soil resistivity is a measure of the ability of a soil to conduct a current. The lower the
resistivity of a soil, the better the soil's electrolytic properties, and the higher
the rate at which corrosion can proceed. Soil resistivity is measured in ohm-cm,
and the following points are allocated:

• The pH of a soil is a measure of the H⁺ ion content of the soil. H⁺ ion reduction is an
important reaction in the corrosion process. The following points are allocated to this
factor:

• Oxidation-reduction potential, or redox potential, is a measure of the potential φ of
the soil. The potential of a soil indicates whether or not a soil is capable of sustaining
sulfate-reducing bacteria, which contribute greatly to the corrosion problem. A low
potential indicates that the oxygen content of the soil is low and, consequently, that the
conditions are ideal for the proliferation of sulfate-reducing bacteria. The following
points are allocated:

• The sulfide content of a soil serves as an indicator of the presence of sulfate-reducing
bacteria. The greater the sulfide content, the greater the possibility of the presence of
sulfate-reducing bacteria. The following points are allocated:

When sulfides are present and the pH of the soil lies between 6.5 and 7.5, an
additional 3 points shall be added to the calculated index. These points are added to
account for the fact that the conditions are optimal for the proliferation of
sulfate-reducing bacteria.
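
The aggregation rule described above can be sketched in a few lines. This is only an illustration of the summation, the 3-point adjustment and the threshold of 10 as stated in the text; the function names are hypothetical, the point values in the example are invented for illustration and are not taken from the AWWA grid, and treating the pH bounds 6.5 and 7.5 as inclusive is an assumption.

```python
def awwa_index(points, sulfides_present, ph):
    """Sum the points of all factors, then apply the 3-point sulfide/pH adjustment."""
    total = sum(points.values())
    if sulfides_present and 6.5 <= ph <= 7.5:  # inclusive bounds are an assumption
        total += 3
    return total


def is_corrosive(index, threshold=10):
    """An index of 10 or more marks the soil as corrosive."""
    return index >= threshold


# Hypothetical point values, for illustration only (not taken from the AWWA grid)
example_points = {
    "drainage": 1,
    "resistivity": 5,
    "pH": 0,
    "redox": 0,
    "sulfide": 2,
}
index = awwa_index(example_points, sulfides_present=True, ph=7.0)
print(index, is_corrosive(index))  # prints: 11 True
```

In this made-up case the five factors sum to 8, which is below the threshold, and it is the 3-point adjustment that moves the soil into the corrosive class.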

2.4.2 PACE 82-3

The PACE 82-3 standard was designed to assist the engineer in deciding whether to
provide protection to buried steel reservoirs, such as petroleum tanks. In the original
standard, three soil samples are taken from the site and tested in the laboratory. Each soil
sample is tested individually, and the results are compared with those of the other two
samples. The three samples are originally located at a distance of 30 meters from one
another, and their locations form an equilateral triangle when viewed from above.

An adaptation of this test was used in this project. The soil samples were received
and tested individually, with no comparison made between samples. The soil
characteristics examined in the PACE standard are the following:

• moisture content,
• soil resistivity,
• pH, and
• sulfide content.

These soil characteristics are evaluated separately, and points are allocated to
each result depending on the extent to which the factor contributes to the corrosivity of
the soil. The points are then summed, and a final corrosivity index is reported.

A detailed description of the testing procedure is presented in Chapter 3:
Procedures and Apparatus. This section presents the factors tested and the points
allocated to each. The following soil characteristics are considered:

• The moisture content of a soil describes the state in which the soil is received in the
laboratory. This parameter indicates the extent to which a soil is saturated during the
year. The soil is classified as either dry, moist or saturated, and the following points
are allocated:

Moist: 2

• Soil resistivity is measured in ohm-cm, and the following points are allocated:

• The pH of a soil is a measure of the H⁺ ion content of the soil. The following points are
allocated to this factor:

• The sulfide content is classified as positive or negative. The following points are
allocated:

Sulfide Content | Points

CHAPTER 3: PROCEDURES AND APPARATUS

The various laboratory experiments performed, the apparatus and the materials
used, the purpose of each experiment, and the results obtained are described in this chapter.

For each soil sample collected, the following variables were evaluated:

• soil type,
• drainage ability and moisture content,
• pH: direct and saturated,
• oxidation-reduction (redox) potential: direct and saturated,
• resistivity: direct and saturated,
• sulfide content: using HCl + lead acetate paper, and a solution of iodine + NaN₃,
• concentration of Cl⁻ ions,
• rate at which a standard metal sample will corrode in the given soil using the method
of linear polarization, and
• calculated corrosion indices according to the AWWA and PACE methods.

3.1 Soil Samples

The soil samples tested were obtained from various regions of Quebec. In
most cases, the samples were taken for the purpose of being tested according to the
AWWA or the PACE methods by COREXCO, Montreal, to determine the need for
cathodic protection of various metallic structures embedded in the given soil.

In total, 153 soil samples were tested. Of these, only 75 were available in
quantities sufficient to permit testing for the corrosion rate using the method of
linear polarization.

3.2 Soil Type

During the course of all of the tests to follow, the technician should observe
certain characteristics that will enable the determination of the soil type, i.e. a sand, a
clay, or a mixture of both (sand/clay). For example:

• The ability of water to penetrate a soil is a good clue to the soil type. For example, a
sand is very quickly penetrated by water, a sand/clay is penetrated slowly, and a clay
is almost not penetrated at all.

• The consistency of the soil when manipulated in one's hands: fine sand forms clumps
that crumble easily, whereas clayey materials typically form clumps that are either
hard or malleable, but do not crumble easily.

• The ease with which the soil is washed off the equipment, e.g. electrodes, plastic
bowls, soil box, metal spatulas, etc. Sand rinses off equipment easily, requiring no
scrubbing at all. Clays, on the other hand, require significant brushing to be removed,
and sand/clays are relatively easy to wash off, but not as easily as pure sand.

Experience will enable the technician to confidently classify a soil as a sand, a
sand/clay, or a clay. The soil types of the samples tested are presented in Table 3.1, in
which a sand is represented by S, a clay by C, and a sand/clay by SC.

Soil #:    1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39

Soil #:    40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78
Soil type: SC S S S S SC SC SC S SC S S S S S S S S S S S S C SC C S S SC SC SC SC SC S S SC SC SC SC SC

Soil #:    79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117
Soil type: SC S SC S S S S SC S S SC SC SC C SC SC SC SC SC SC SC SC SC SC SC SC S SC S S S SC S SC SC C S SC SC

Soil #:    118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 247
Soil type: SC SC SC C SC C SC SC SC S S SC SC SC SC SC SC SC S SC S C SC SC S S S S S S C C C SC C S

Table 3.1 Soil Type Results

3.3 Drainage Ability / Moisture Content

The AWWA and PACE standards define humidity differently. According to
AWWA, humidity is the ability of a soil to be penetrated by, or to drain, water. This variable
is referred to as the drainage ability of the soil. According to PACE, humidity refers to
the moisture content of a soil on site, or as it is received in the laboratory. This variable is
termed the moisture content of the soil in this thesis.

Drainage Ability

The definition of the humidity index in the AWWA Standard is the drainage
ability of a soil. In the laboratory, this parameter is determined very subjectively. Soil is
placed in a bowl, and distilled water is added slowly to the soil. The speed with which
the water penetrates the soil is observed. The drainage ability of the soil is then classified
in one of the following three categories:

Excellent: a soil that is easily penetrated by water, e.g. a sand
Good: a soil that is penetrated slowly by water, e.g. a sand/clay
Bad: a soil that is almost not penetrated by water at all, e.g. a clay

The drainage abilities of the soil samples tested are presented in Table 3.2, in which
excellent drainage is represented by E, good drainage by G, and poor drainage by B.

Moisture Content

Unlike AWWA, the humidity index in the PACE grid is a measure of the moisture
content of the soil sample as it is received in the laboratory. Again, this is a subjective
evaluation, and it is dependent on the experience of the technician. The moisture content
of the soil is determined by visual inspection, and by rolling the soil in one's hands. The
moisture content of the soil is then classified in one of the following three categories:

Saturated

Moist

Dry

This parameter estimates the moisture content of the soil under usual
circumstances. Knowledge of the moisture conditions that a soil is subjected to
throughout the year enables the engineer to determine how corrosive the soil is to a
water pipe placed permanently in that soil. For example, irrespective of the corrosivity of
a soil in saturated condition, if it is kept very dry throughout the year, the pipe will not
suffer any corrosion. However, the state of one sample does not indicate the general
year-round conditions. This test should therefore be used in conjunction with interviews with
individuals who are knowledgeable about the condition of the soil in general, i.e. the
percentage of time that the soil is saturated, moist, and dry. The moisture contents of the
soil samples tested are presented in Table 3.3, in which a dry soil is represented by D, a
moist soil by M, and a saturated soil by S.

Drainage ability: G E E E E E E E E G E E E E E E E E E E E E B G B E E B B B G B E E G G G G G

Soil #:           79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117
Drainage ability: G G G E E E E G E E B B B B B G G G G G G G G G G B G G E E E E E G G B E G G

Drainage ability: G G E B G B G G G E E G G G G G G G E G E G G G E E E E E E B B B B B E

Table 3.2 Drainage Ability Results

Soil #: 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78

Table 3.3 Moisture Content Results

The pH of the soil samples was measured in two different ways. The first method
consisted of testing the soil in the state in which it was received in the laboratory, i.e.
pH-direct. This test serves to represent the conditions found on site. The second method
consisted of testing the soil once it had been saturated with distilled water, i.e.
pH-saturated. This test may better represent the case in which the soil is saturated after a
heavy rainfall or snow melt. Furthermore, it represents the conditions in which the soil
is found during the linear polarization test. Although both procedures have their
limitations in applicability, the pH of the soil was determined according to these two
procedures because they were recommended by the AWWA and PACE grids, and were
required to calculate the corrosivity index according to each of these grids.

Necessary Equipment

pH meter

30 ml plastic container with cap

Distilled water

The pH was measured using a pH meter, an electronic device with a probe that
can be inserted into a solution of unknown pH. A pH meter is an example of an
ion-selective, or ion-specific, electrode. It is based on the principle that the measured
potential of a solution depends on the concentration of the reactants and the products
involved in a cell reaction.

The pH meter has three main components: a standard electrode of known
potential, a special glass electrode that changes potential depending on the concentration
of H⁺ ions in the solution into which it is dipped, and a potentiometer that measures the
potential between the two electrodes. The potentiometer reading is automatically
converted electronically to a direct reading of the pH of the solution being tested.

The glass electrode contains a reference solution of dilute hydrochloric acid in
contact with a thin glass membrane. A silver wire coated with silver chloride is
embedded in the solution. The electrical potential of the glass electrode depends on the
difference in H⁺ concentration between the reference solution and the solution being used
in the test. Thus the electrical potential varies with the pH of the solution tested [14,15].
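
The electronic conversion from potential to pH can be sketched with the Nernst relation. This is a generic illustration and not a description of the meter used in this project: it assumes an ideal electrode response of 2.303·RT/F per pH unit (about 59.16 mV at 25 °C), a single calibration buffer, and a potential that falls as pH rises; the function and parameter names are hypothetical.

```python
import math

R = 8.314462618  # gas constant, J/(mol*K)
F = 96485.33212  # Faraday constant, C/mol


def nernst_slope_mv(temp_c=25.0):
    """Ideal electrode response in mV per pH unit at the given temperature."""
    return 1000.0 * math.log(10.0) * R * (temp_c + 273.15) / F


def ph_from_potential(e_mv, e_cal_mv, ph_cal=7.0, temp_c=25.0):
    """pH from a measured potential, using one calibration buffer.

    Assumed sign convention: the measured potential falls as pH rises.
    """
    return ph_cal - (e_mv - e_cal_mv) / nernst_slope_mv(temp_c)


# A reading about 59 mV below the pH 7 calibration point is one pH unit higher
print(round(nernst_slope_mv(), 2))
print(round(ph_from_potential(-59.16, 0.0), 2))
```

Real electrodes respond with somewhat less than the ideal slope, which is why meters are normally calibrated against two or more buffers rather than one.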

The AWWA recommends that the pH of the soil be determined for the soil as it is
found in its natural state. The pH electrode is simply immersed into the soil and the value
obtained is noted once it has stabilized. Extreme care must be taken when attempting to
plunge the pH meter into dry clay, or into a soil containing small pebbles, because of the
delicate nature of the glass bulb. The values of pH-direct of the soil samples tested are
listed in Table 3.4.

PACE recommends that the pH of the soil be determined by testing a slurry
consisting of soil and distilled water. The 30 ml plastic container is filled halfway with
soil and then filled almost to the top with distilled water. The container is then capped
and shaken vigorously. The mixture is allowed to rest for approximately 5 minutes. The
pH of the slurry is then determined by immersing the pH electrode into the saturated soil,
and allowing the value to stabilize. The values of pH-saturated of the soil samples tested
are listed in Table 3.5.

Soil #:    40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78
pH-direct: 6.8 7.2 7.5 7.6 7.4 7.8 8.8 8.1 6.1 6.7 7.6 6.7 6.8 7.9 6.9 7.7 7.3 7.7 6.9 6.9 7.5 7.1 5.9 6.7 5.8 6.1 6.2 7 7.2 7.1 7.8 7.7 5.9 7.7 7.4 8.2 7.7 7.5 7.6

Soil #:    79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117
pH-direct: 7.3 7.9 7.4 8.2 7.2 6.7 6.1 5.7 6 5.9 7 7.2 7.3 7.4 7.4 7.6 7.4 8 7.4 7.8 7.2 7.6 7.9 7.5 6.7 6.8 7.8 7.7 8 7.5 7.1 8.1 8.3 7.1 6.6 7.3 7.3 6.8 7.2

Table 3.4 pH-direct Results

Soil #: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39

Soil #: 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78

Table 3.5 pH-saturated Results

3.5 Oxidation-Reduction Potential (Redox Potential)

Like pH, the redox potential is measured in two different ways for each soil
sample. The first method is the direct measurement, redox-direct, in which the soil is
tested as it is received in the laboratory to represent the conditions found on site. This
variable is also required to calculate the corrosivity index according to the AWWA
standard. The second method involves testing the soil once it has been saturated with
distilled water. This measurement is referred to as redox-saturated. This method serves
to represent the conditions in which the soil is found during the linear polarization tests.

Digital voltmeter

30 ml plastic container with cap

Distilled water

The oxidation-reduction potential is measured using a digital voltmeter. This
instrument measures the "driving force" or the "pull" of the soil on electrons. These
electrons would be supplied by the oxidation of an anode placed in contact with the soil,
i.e. metal objects embedded in the soil. This potential is the electromotive force (emf) of
the cell, and it is a measure of the tendency of the soil to corrode a metal. The unit of
electrical potential is the volt, V.

A traditional voltmeter measures the potential by drawing current through a wire of known resistance [14,15]. However, when current flows through a wire, the frictional heating that occurs wastes some of the potentially useful energy of the cell. A traditional voltmeter will therefore measure a potential that is less than the maximum cell potential. The key to determining the maximum potential is to perform the measurement under conditions of zero current, so that no energy is wasted. Traditionally, this has been accomplished by inserting a variable voltage device, powered from an external source, in opposition to the cell potential. The voltage on this instrument, called a potentiometer, is adjusted until no current flows in the cell circuit. Under such conditions, the cell potential is equal in magnitude and opposite in sign to the voltage setting of the potentiometer, and is the maximum cell potential since no energy is wasted in heating the wire. More recently, advances in electronic technology have allowed the design of digital voltmeters, such as the one used in this project, that draw only a negligible amount of current [14,15]. These instruments have since replaced potentiometers in the modern laboratory due to their ease of use.

Redox-Direct

The AWWA recommends that the redox potential be determined for the soil as it is received in the laboratory. The platinum electrode is immersed into the soil, and the redox value is noted once it has stabilized, i.e. once it does not vary by more than 1 mV per minute. The values of redox-direct for the soil samples tested are listed in Table 3.6.
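The stabilization criterion of 1 mV per minute can be expressed as a simple check on successive readings. The following is a minimal sketch, assuming the voltmeter is read once per minute; the function name and the sample readings are hypothetical and are not part of the AWWA procedure.

```python
def first_stable_reading(readings_mv, max_drift_mv_per_min=1.0):
    """Return the first reading that differs from the previous one-minute
    reading by no more than the allowed drift, or None if none qualifies."""
    for previous, current in zip(readings_mv, readings_mv[1:]):
        if abs(current - previous) <= max_drift_mv_per_min:
            return current
    return None


# Hypothetical readings (mV), one per minute after inserting the electrode.
readings = [231.0, 224.5, 221.0, 219.8, 219.2]
print(first_stable_reading(readings))
```

A stricter variant could require several consecutive stable minutes before accepting the value.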

Redox-Saturated

The slurry prepared for pH testing according to the pH-saturated method is used to test for the redox-saturated value. The platinum electrode is immersed into the saturated soil, and the value of the potential is noted once it has stabilized. The values of redox-saturated for the soil samples tested are listed in Table 3.7.

In testing for the redox potential, an attempt was made to limit the exposure of the soil to the ambient air. Redox tests on soil samples were always performed first, as soon as the container of soil was opened, and this container was closed as soon as possible after retrieval of the soil sample. It has been observed in the laboratory that a soil whose redox potential is below 0 mV may, once left open to ambient air for half an hour to an hour, later register a potential above 100 mV. It is essential to keep the soil container well sealed, to ensure that the reading taken is not affected by exposure to the oxygen in the air.

Redox-direct (mV): 214 -38 150 184 178 118 175 191 219 232 200 183 180 200 220 231 203 210 201 224 -33 -64 178 240 134 14 -38 219 194 144 112 104 164 170 154 187 192 214 169

Redox-direct (mV): 209 208 220 194 181 216 204 263 216 200 210 183 21G -49 132 185 208 81 228 228 274 260 228 219 228 185 193 171 155 175 190 147 121 183 229 180 260 225 218

Redox-direct (mV): 160 230 230 184 190 50 237 281 320 156 192 208 190 185 197 181 167 188 180 147 178 148 191 219 165 195 189 155 139 150 130 246 247 154 28 217

Table 3.6 Redox-direct Results

Soil #:               1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39
Redox-saturated (mV): 196 188 225 168 258 194 252 256 218 198 153 165 149 121 140 156 176 186 180 214 264 229 163 191 206 187 179 162 155 154 171 111 155 165 226 115 270 204 197

Soil #:               118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 247
Redox-saturated (mV): 206 197 223 80 187 40 262 267 167 134 175 188 191 180 190 173 159 177 145 174 183 188 193 205 136 161 185 128 99 112 152 225 230 93 -15 222

Table 3.7 Redox-saturated Results

3.6 Resistivity, ρ

It is found that the nature of the electrolyte, in this case the soil, has a significant

influence on the rate of corrosion of a metal exposed to it. The types and the amounts of

the various dissolved salts in a soil, particularly those which ionize most readily, are

estimated by measuring the electrical resistivity of the soil. The lower the resistivity, the

more the electrolyte contributes to corrosion.

The resistivity of a soil is measured in two different ways. The first method involves measuring the resistivity of the soil as it is received in the laboratory, ρ-direct. The second method is to measure the resistivity of the soil once it has been saturated with distilled water, ρ-saturated. The latter represents the worst case, when the soil conductivity is at its highest. Both measurements are necessary to calculate the corrosivity indices according to AWWA and PACE.

Soil box

Ohmmeter

Four wires with clamps at both ends

The resistivity of a given object is calculated by making use of the relationship between resistivity, resistance and geometry. Resistance, R, is the property of a body or mass with discernible geometry, e.g. a piece of wire, or a block of soil of a given size. Resistivity, ρ, on the other hand, is a characteristic property of the material, e.g. copper, or a specific soil. While resistance is a function of geometry, resistivity is not dependent on the geometry of the body [1].

The resistance of a rectangular body of any substance, when measured between parallel faces, is directly proportional to its length and inversely proportional to its cross-sectional area. In other words, as the depth and width increase, resistance decreases. The following equation shows the relationship between these variables:

$$R = \rho \, \frac{L}{W \, D} \qquad (3.1)$$

where R = the resistance of the rectangular body (Ohms)
ρ = the resistivity of the substance making up that body (Ohm-cm)
W = the width of the body (cm)
D = the depth of the body (cm)
L = the length of the body (cm)

When measuring ρ, what is actually being measured is R, and ρ is then calculated using Equation 3.1. R is measured using a soil box and an ohmmeter. The soil box is a rectangular box with an open top, made of a non-conducting material (usually plastic) with metal ends and two metal pins inserted into the side of the box [1]. The box is filled to the top with soil, such that the values of W, D, and L are known. It is then connected to the ohmmeter; the current is introduced by means of the two end plates and the potential is measured across the two pins. The value of R is then calculated according to the following relationship [1]:

$$R = \frac{V}{I}$$

where V is the potential measured across the two pins and I is the current introduced through the end plates. The value of the resistivity, ρ, is then calculated automatically and displayed by the ohmmeter.
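The calculation that the ohmmeter performs internally can be sketched as follows. This is a minimal illustration, assuming that the resistance follows from Ohm's law and that resistivity is obtained by rearranging the proportionality between resistance, length and cross-sectional area described above; the function names, the box dimensions and the readings are hypothetical.

```python
def resistance_from_ohms_law(potential_v, current_a):
    """Resistance (ohm) from the measured potential (V) and applied current (A)."""
    return potential_v / current_a


def resistivity_ohm_cm(resistance_ohm, width_cm, depth_cm, length_cm):
    """Rearranged R = rho * L / (W * D), i.e. rho = R * W * D / L."""
    return resistance_ohm * width_cm * depth_cm / length_cm


# Hypothetical soil box (cm) and hypothetical readings.
R = resistance_from_ohms_law(potential_v=0.5, current_a=0.00025)
rho = resistivity_ohm_cm(R, width_cm=4.0, depth_cm=3.0, length_cm=12.0)
print(f"R = {R:.0f} ohm, rho = {rho:.0f} ohm-cm")
```

With these assumed dimensions the ratio W·D/L equals 1 cm, so the resistance reading in ohms is numerically equal to the resistivity in ohm-cm.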

ρ-saturated

The AWWA method suggests that the resistivity of the soil be determined when the soil is saturated. This represents the worst possible case, i.e. when the conductivity of the soil is a maximum.

A sufficient quantity of soil is placed in a bowl, and distilled water is added gradually in small quantities. The soil and water are mixed continuously to encourage penetration of the water into the soil. Some experience is needed to ensure that the soil has reached saturation, and extreme care must be exercised when adding water to avoid supersaturating the soil. When a soil is supersaturated, the excess water may separate from the body of soil and it will not be transferred to the box with the rest of the soil. Ions such as chlorides, which are found in this excess water and which give the water its conductive properties, will be absent from the soil box. This will result in a higher resistivity, which will not be truly representative of the soil's ability to conduct ions.

When the soil is saturated, it is transferred to the soil box a little at a time, where it is compacted well to eliminate any air bubbles or voids, and to ensure uniformity and reproducibility of the measurement. The box is then attached to the ohmmeter, and a reading is taken. The values of ρ-saturated for the soil samples tested are listed in Table 3.8.

According to the PACE method, the soil is tested for resistivity in the state in which it is received in the laboratory. Therefore, the wet or dry soil is added to the soil box and compacted. Once more, air bubbles must be avoided, as they will result in higher values of ρ. The box is then wired to the ohmmeter, and the reading is taken. The values of ρ-direct for the soil samples tested are listed in Table 3.9.

ρ-saturated (ohm-cm): 228 190 780 165 5300 4330 4770 2343 2097 1834 1284 1856 2396 1748 3580 367 61400 409 132600 5820 4020 90800 299 4080 3650 4960 3200 778 682 613 2076 1008 1112 1706 1040 2890 1591 2685 2275

Soil #: 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78

Soil #:               118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 247
ρ-saturated (ohm-cm): 1947 1377 7410 1571 1888 1417 2623 2181 2211 2359 3060 5980 5140 2625 250 87.2 102.6 81.9 1838 5780 6490 2309 4800 4720 11000 8220 34400 18310 9370 17150 3070 2589 2523 1572 2135 2269

Table 3.8 ρ-saturated Results

ρ-direct (ohm-cm): 3820 1410 9280 12550 31800 1987 2198 4120 62100 44700 10440 6040 9040 61800 14960 16610 11230 3830 1389 2121 2688 1131 2159 7140 1504 2078 1790 2064 623 146700 224.8 2449 7340 1901 8240 2599

ρ-direct (ohm-cm): 4100 3170 2218 82400 9460 129300 1943 1618 1247 1205 3820 3220 5660 3140 3160 1271 205.3 1482 1736 841 496 4000 2012 8230 67900 26970 40200 120700 77200 6040 11930 4080 8900 1247 2980

Soil #: 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78

Soil #: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39

ρ-direct (ohm-cm): 1947 1683 13300 1577 5140 1429 3750 4400 7600 4850 90100 13360 27180 42000 1058 568 779 281.4 5000 151800 7490 22340 16230 37600 27860 40900 30400 35100 3860 3460 2612 1548 2032 3870

ρ-direct (ohm-cm): 280 228 1592 173 5300 4340 4780 2277 1940 1713 1223 1904 3550 1813 3580 1443 101300 833 220800 5820 180000 309 18570 3840 13820 10580 825 707 723 2235 1311 1568 3830 1210 7750 2593 10870 3210

Table 3.9 ρ-direct Results

3.7 Sulfide Content

The sulfide content of the soil is determined in two different ways. The first method uses a solution of iodine and 3% NaN3 (sodium azide), and the second a solution of HCl along with a strip of lead acetate paper. These two methods were chosen because they are recommended by AWWA and PACE, respectively, and are required to calculate the corrosivity indices.

Necessary Equipment

• 2 standard test tubes
• Concentrated HCl acid (15%)
• A strip of lead acetate paper
• A solution of I2 (aq) + 3% NaN3

The AWWA procedure for testing for sulfides is to saturate a small quantity of the soil with a solution of iodine and 3% NaN3, and to observe the resulting reaction. A small amount of soil is placed in a test tube, and the iodine solution is poured into the test tube to cover the soil. The mixture is then shaken well, and the degree of reaction is observed and classified as either violent, normal, or absent.

This test is a qualitative one, and may be quite subjective. This is because the reaction is never very violent, and it is often difficult to differentiate between the degrees of reaction, especially between the normal and the violent. Although only visual observation is recommended by AWWA, sound was also used to help distinguish between a violent and a normal reaction. If bubbles can be heard bursting at a quick pace, the soil is considered to undergo a violent reaction. If only a slight sound can be heard (or none at all), and bubbles can be seen, then the soil is classified as reacting normally. If no bubbles are seen or heard, then the soil is assumed to contain no sulfides at all.

The degree of the reaction was then used to establish the sulfide content of the soil. If the reaction was classified as violent, then the sulfide content of the soil was assumed to be high. If the reaction was classified as normal, then the soil was assumed to contain traces of sulfide. Finally, if no reaction was observed, then the soil was assumed to contain no sulfides. The sulfide contents of the soil samples, as determined by the AWWA method, are presented in Table 3.10, in which N represents no sulfides, T represents traces of sulfides, and P represents the presence of sulfides.
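The mapping from the observed reaction to the code recorded in Table 3.10 can be written as a small lookup. This is only an illustrative sketch; the dictionary and function names are hypothetical, and the classification itself remains a subjective observation made by the tester.

```python
# Observed degree of reaction -> code used in Table 3.10.
SULFIDE_CODE_BY_REACTION = {
    "violent": "P",  # sulfides present (high content assumed)
    "normal": "T",   # traces of sulfides
    "absent": "N",   # no sulfides
}


def classify_sulfide(reaction):
    """Return the table code (P, T or N) for an observed reaction."""
    key = reaction.strip().lower()
    if key not in SULFIDE_CODE_BY_REACTION:
        raise ValueError(f"unknown reaction: {reaction!r}")
    return SULFIDE_CODE_BY_REACTION[key]


print([classify_sulfide(r) for r in ("violent", "normal", "absent")])
```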

HCl and Lead Acetate Paper

PACE recommends using concentrated HCl in combination with lead acetate paper to determine whether the soil contains sulfides. A small amount of soil is placed in a test tube, and then 15% HCl is added to cover the soil. A strip of lead acetate paper is introduced and held at the top, and the test tube is then covered at the top with the thumb of the tester. The mixture is then shaken gently, and care is taken not to wet the indicator paper. After a couple of minutes, the paper is observed for signs of a brown discoloration, usually present along the edges of the paper. Any discoloration indicates the presence of sulfides. Another indication of the presence of sulfides is the smell of rotten eggs, characteristic of H2S gas. However, as this product is extremely toxic, it is highly recommended that one avoid breathing it, and that the room be kept well ventilated or, better still, that this experiment be carried out under a fume hood.

When the acid is added to the soil, it is very common to observe a violent reaction, and a lot of bubbling. This may be the result of the reaction between HCl and any carbonates that may be present in the soil. In that case the bubbling is caused by the release of carbon dioxide gas, and does not indicate the presence of sulfides. Only the discoloration of the lead acetate paper can correctly determine whether or not sulfides are present. The sulfide contents of the soil samples, as determined by the PACE method, are presented in Table 3.11, where N represents no sulfides, and P represents the presence of sulfides.

Sulfide content (iodine): N T N T N P P P P P P P N N T N N N P P P P N N N N N N N N N P N N N T T T

Sulfide content (iodine): T T N N N N N N T T N N N N N N N N T T P P N T T P P N T T P P N N N T N T P

Sulfide content (iodine): N T N N N N N N N N N T N P P N N N N T N N N N N P T P N N N P N T N T N N N

Soil #:                   118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 247
Sulfide content (iodine): N N T T N P N T N P P P P T T T P T T P P P T T N T T P P P P T P P P T

Table 3.10 Sulfide Content Results Using Iodine Solution

3.8 Chloride Concentration

The test for chloride content is not a part of either the AWWA or the PACE procedures. It is one of the goals of this project to determine whether knowledge of the Cl- ion content would permit a better evaluation of the corrosivity of a soil than is possible when only the other parameters are known, i.e. ρ, redox potential, pH, etc.

The overall procedure can be summarized as follows. The sample is dried and pulverized. The finest particles are kept and combined with distilled water. The mixture is allowed to sit overnight, permitting the free Cl- ions to enter into the water. The potential of the solution is recorded with a chloride-specific electrode, and the recorded value is compared with the pretabulated values to obtain the chloride ion concentration of the solution. The chloride concentration of the soil itself is then calculated.

3.8.1 Necessary Equipment

• Potentiometer
• Chloride-specific electrode
• Electrode wetting agents, i.e. solutions of 1 M (NH4)2SO4 or 1 M KNO3
• Ceramic bowl and hammer
• 30 ml plastic containers
• 2 mm and 200 µm sieves
• 200 ml beakers
• Microwave oven
• Scale
• Powdered KCl
• Distilled water
• J-cloth and an elastic band
• Stop watch

The chloride ion concentration is calculated indirectly from the potential that is measured using an ion-specific electrode, i.e. an electrode that is sensitive to the concentration of a particular ion. It is based on the principle that the measured potential of a solution depends on the concentration of the reactants and the products involved in a cell reaction [14,15].

An example of an ion-specific electrode is the pH meter discussed in Section 3.4.1. Glass electrodes can be made sensitive to ions such as Na+, K+, NH4+, and Cl- by changing the composition of the membrane. In this case, a Cl- ion-specific electrode was used to determine the potential resulting from the presence of Cl- ions only [14,15]. Unlike the pH meter, the potential is not converted automatically, but must be converted through a series of steps which are discussed in the following sections.

3.8.2 Sample Preparation

A 200 ml beaker is half filled with soil, and covered with a piece of J-cloth which is secured in place with an elastic band. The sample is dried in a microwave at a high temperature for 3-5 minutes, or as long as necessary to thoroughly dry the soil. Extreme care must be exercised when handling the beaker as it reaches very high temperatures.

When the soil has cooled sufficiently to permit handling, some of it is transferred to a ceramic bowl. The soil is pounded with a hammer for a few minutes to separate any larger pebbles from the finer soil. The pulverized soil is passed through a 2 mm sieve to remove the pebbles that cannot be pulverized. The recovered fine soil is then returned to the bowl where it is pulverized further to a fine consistency. The soil is then passed through a 200 µm sieve, and the soil recovered is transferred to a clean, tared 30 ml plastic container. Approximately 5 g of soil should be recovered. If the quantity is insufficient, the above procedure can be repeated until such an amount is retrieved.

The exact weight, in grams, of soil in the tared container is recorded, and distilled water is added to the soil in a ratio of 2:1, i.e. 10 g of water are added to 5 g of soil. The container is then capped, and shaken vigorously for 30 seconds. The sample is then allowed to sit overnight.

The potential of the previously prepared sample is recorded with the aid of an ion-specific electrode, which is sensitive to Cl- ions only. This electrode, when attached to a voltmeter, registers the potential of a solution due to the presence of only the Cl- ions.

The electrode is rinsed thoroughly with distilled water, and filled with a wetting agent. The wetting agents used in these experiments were 1 M KNO3, 1 M (NH4)2SO4, or a commercially prepared wetting agent of unknown composition. The choice of wetting agent appears to make no difference in the final results.

When filling the electrode, care must be exercised to ensure that no air bubbles are present in the wetting agent. When the electrode is ready, it must then be calibrated using solutions of known chloride ion concentrations.

3.8.4 Preparation of Calibrating Solutions

In order to calibrate the electrode, the potentials of different solutions of known Cl- concentrations are recorded. These solutions are prepared by simply adding KCl or NaCl crystals to distilled water in the correct quantities such that solutions with the desired concentration of Cl- ions are obtained. Solutions of 0.01%, 0.03%, 0.33%, 0.65%, and 1.3% Cl- ions are required. Table 3.12 shows the weight of KCl to be added to 1 kg of distilled water in order to obtain the desired concentrations, as well as the equivalent concentration in ppm.

Table 3.12 Preparation of Calibrating Solutions
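The quantities of KCl needed for the calibrating solutions can be estimated from the atomic masses of potassium and chlorine. The sketch below is a hedged illustration, assuming that the percentage refers to the mass of Cl- per mass of final solution; the function names are hypothetical, and the masses listed in Table 3.12 may follow a slightly different convention.

```python
CL_ATOMIC_MASS = 35.453  # g/mol
K_ATOMIC_MASS = 39.098   # g/mol
KCL_MOLAR_MASS = K_ATOMIC_MASS + CL_ATOMIC_MASS


def kcl_mass_per_kg_water(cl_percent, water_g=1000.0):
    """Grams of KCl to dissolve in water_g of water so that Cl- makes up
    cl_percent of the final solution mass (assumed definition)."""
    cl_fraction = cl_percent / 100.0
    cl_share_of_kcl = CL_ATOMIC_MASS / KCL_MOLAR_MASS
    # Solve m * share = fraction * (water + m) for m.
    return water_g * cl_fraction / (cl_share_of_kcl - cl_fraction)


def percent_to_ppm(percent):
    """1 % by mass corresponds to 10,000 ppm."""
    return percent * 10_000


for pct in (0.01, 0.03, 0.33, 0.65, 1.3):
    grams = kcl_mass_per_kg_water(pct)
    print(f"{pct:>5} % Cl-  {percent_to_ppm(pct):>8.0f} ppm  {grams:7.3f} g KCl")
```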

3.8.5 Calibration of Electrode

Once the calibration solutions and the electrode are prepared, the electrode is calibrated. This is done by reading the potential of each of the calibration solutions in turn. The electrode is placed into a solution, and held upright for a predetermined amount of time, e.g. 1 minute is usually sufficient, but 3 minutes may be needed to achieve stability of the reading. This time must be chosen prior to taking the first reading, and must remain the same for all subsequent readings. Each of the five calibration solutions is tested in turn, from the most concentrated to the least concentrated solution, and then in random order. In order to avoid contaminating the calibration solutions, the electrode should be rinsed with distilled water and tapped dry before taking the next reading.

Each solution is tested twice, and the reproducibility of the potential is determined. If the potentials are approximately equal, calibration is complete. If the values vary significantly, then the electrode should be checked closely for any problems such as the presence of an air bubble in the wetting solution of the electrode, the lack of wetting agent due to a leak, etc. Further measurements are then taken until reproducibility of the potentials is obtained, and the technician is confident of the results obtained. An example of the potentials obtained during one calibration exercise is given in Appendix B.

3.8.6 Calibration Curve and Equation

The calibration curve is constructed from the potentials registered for each of the five calibration solutions. For each solution (0.01, 0.03, 0.33, 0.65, and 1.3% Cl-) the average potential is calculated. The values of the chloride ion concentration (%) are plotted against the average potential values, and an exponential curve is fitted to the five points. This curve, along with the corresponding equation, will be used to obtain the Cl- concentrations of the solution of the samples. A calibration curve, along with its equation, is presented in Appendix B.
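An exponential calibration curve of this kind can be fitted with a short least-squares routine. The sketch below is an assumption-laden illustration: it fits C = a·exp(b·E) by linear regression on ln(C), which is one common way of fitting an exponential and not necessarily the method used for Appendix B, and the potentials shown are hypothetical placeholders rather than measured values.

```python
import math


def fit_exponential(potentials_mv, concentrations_pct):
    """Fit C = a * exp(b * E) by least squares on ln(C) versus E."""
    logs = [math.log(c) for c in concentrations_pct]
    n = len(potentials_mv)
    mean_e = sum(potentials_mv) / n
    mean_log = sum(logs) / n
    sxx = sum((e - mean_e) ** 2 for e in potentials_mv)
    sxy = sum((e - mean_e) * (l - mean_log)
              for e, l in zip(potentials_mv, logs))
    b = sxy / sxx
    a = math.exp(mean_log - b * mean_e)
    return a, b


# Concentrations of the five calibrating solutions (% Cl-).
concentrations = [0.01, 0.03, 0.33, 0.65, 1.3]
# Hypothetical average potentials (mV), for illustration only.
potentials = [250.0, 223.0, 164.0, 147.0, 130.0]

a, b = fit_exponential(potentials, concentrations)
print(f"C(%) = {a:.4g} * exp({b:.4g} * E)")
```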

3.8.7 Testing Soil Samples

The samples prepared the previous day are now ready to be tested, given that the standards indicate that 2-6 hours are sufficient to allow all Cl- ions to enter into the distilled water. The mixture of soil and distilled water has now separated into two parts: the liquid part, which contains the Cl- ions, and the deposited soil particles. Care must be taken not to disturb the settled layer before testing the liquid part.

The calibrated electrode is lowered into the 30 ml container, until the membrane at the tip is fully immersed in the liquid above the settled soil. The electrode is then held upright for the time chosen during calibration (1-3 minutes), and the potential of the solution is registered.

3.8.8 Determination of Concentration of Chloride Ions of Soil

Once the potential of the liquid fraction of each sample is obtained, it must be transformed into a value more intuitively understandable: percentage concentration, or ppm of Cl- ions. This is easily and quickly done by reading off the concentration value in percentage terms from the calibration curve, or by calculating it using the calibration equation. An example of this is shown in Appendix B.

The variable of interest is the concentration of Cl- ions in the soil, and not in the liquid fraction of the prepared sample. This value is obtained by simply doubling the concentration of Cl- ions in the liquid fraction. This is due to the fact that a ratio of 2:1 between the water and soil weights was used during the sample preparation.

Finally, the concentration of Cl- ions of the soil, in ppm, is obtained by multiplying the concentration in percentage by 10,000. The values in ppm are retained for further data analysis, although the concentration in % could equally have been used. The chloride ion concentrations of the soil samples are presented in Table 3.13.
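The three conversion steps (potential to percentage in the liquid, liquid to soil, percentage to ppm) can be chained in a few lines. This is a minimal sketch, assuming an exponential calibration equation of the form C = a·exp(b·E); the coefficients and the potential used below are hypothetical and do not come from Appendix B.

```python
import math


def soil_chloride_ppm(potential_mv, a, b, water_to_soil_ratio=2.0):
    """Convert an electrode potential (mV) into the Cl- content of the soil (ppm)."""
    liquid_pct = a * math.exp(b * potential_mv)  # calibration equation (assumed form)
    soil_pct = liquid_pct * water_to_soil_ratio  # 2:1 water-to-soil weight ratio
    return soil_pct * 10_000                     # percent to ppm


# Hypothetical calibration coefficients and reading.
print(round(soil_chloride_ppm(potential_mv=200.0, a=250.0, b=-0.04), 1))
```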

Soil #:            40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78
Cl- content (ppm): 80 1835 155 168 81 2310 4592 1094 249 436 38 46 58 77 54 81 148 111 148 716 345 3257 123 758 448 2030 537 391 423 160 380 1345 7161 12556 210 56 407 36 157

Soil #:            79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117
Cl- content (ppm): 172 190 222 17 5 0 0 6 3 5 1 362 283 272 253 142 207 8 1 22 1 09 265 5294 326 320 2067 5394 481 2712 340 191 42 1 1830 756 233 65 17 9 28 759 374

Soil #:            118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 247
Cl- content (ppm): 243 347 59 160 282 310 268 419 328 258 125 52 34 82 9223 13652 22664 17754 759 75 53 29 149 30 30 28 14 26 72 47 95 154 155 73 35 1042

Table 3.13 Chloride Ion Concentrations

3.9 Linear Polarization

The final test is the linear polarization of a standard steel sample exposed to each of the soil samples. The result of this test is the annual corrosion rate that the steel sample will undergo in the given soil.

Although very informative and precise, linear polarization is a test that is time consuming, and that requires a very experienced technician. Furthermore, the equipment is quite expensive. All this makes the test generally inaccessible, and encourages corrosion engineers to depend on corrosivity indices such as those proposed by AWWA and PACE, which can be calculated from the results of very simple and inexpensive tests.

The corrosion cell setup is made up of the following elements:

• A working electrode: the steel specimen which plays the role of the anode during anodic polarization, and the cathode during cathodic polarization,
• An auxiliary electrode: the graphite rod which plays the role of the anode in cathodic polarization, and the cathode in anodic polarization,
• A reference electrode: the Cu/CuSO4 electrode which measures the potential of the working electrode at any point during linear polarization,
• An ionic conductor: the saturated soil sample which allows ions to travel from the cathode to the anode, and
• The potentiostat which controls the potential of the system, and which acts as the electrical conductor between the anode and the cathode.

The procedure for testing the soil samples can be summarized as follows: the soil

sample is saturated and a glass jar is filled up to a specific height with the saturated soil.

The surface of the metal specimen (working electrode) is prepared according to a specific

procedure and immersed into the saturated soil, along with the reference and the auxiliary

electrode. The electrodes are wired to the potentiostat, and the corrosion rate is obtained

from Tafel and polarization resistance diagrams.

3.9.1 Necessary Equipment

• Potentiostat: the CMS100 Electrochemical Measurement System by Gamry, Inc.
• Graphite electrode (auxiliary electrode)
• Cu/CuSO4 electrode (reference electrode)
• Standard metal specimen, and specimen mount
• A hand drill and support
• No. 200 and 400 sand paper
• 1.0 micron, agglomerate-free, alpha alumina powder by Leco
• Acetone
• Caliper
• Glass jar

The reference electrode used was the Cu/CuSO4 electrode, which consists of metallic copper immersed in a saturated solution of copper sulfate. This means that, instead of measuring the potential of the system against the standard of hydrogen ion reduction (whose potential is 0.000 V by convention), the potential is measured against the reduction of copper ions, Cu2+. This half-cell reaction is the reduction of Cu2+ ions:

$$\mathrm{Cu^{2+} + 2e^{-} \rightarrow Cu} \qquad E^{0} = +0.337\ \mathrm{V}$$

In order to obtain the potential against the standard of hydrogen ion reduction, the value of 0.337 V is added to the value of the potential obtained against the Cu/CuSO4 electrode, because the copper half-cell itself lies 0.337 V above the hydrogen standard. For example, a potential of 0.400 V vs. Cu2+ reduction is equivalent to a potential of 0.737 V against H+ reduction. Regardless of which electrode is chosen, it is used to measure the potential of the working electrode at any given time during the experiment.

The standard metal specimen used in the experiments consisted of a small cylinder with an approximate height of 14.2 mm and a diameter of 9 mm. It was created from material cut from a ductile iron pipe removed from the ground for testing by the authority of COREXCO, Montreal. The specimen was machined in the Civil Engineering Materials Testing Laboratory. It was formed with a "thread" running through the center, such that it could be screwed onto the end of a rod. This rod consists of a long thin tube through which runs a wire connecting the metal specimen at one end to the potentiostat at the other. This wire ensures that the steel specimen, which is later immersed into the soil, is in constant contact with the potentiostat. Figure 3.1 shows a schematic of the working electrode made up of a steel specimen screwed onto the rod.

[Figure 3.1: schematic with labels "to potentiostat", "Rod" and "Steel Specimen"]

Figure 3.1 Components of the Working Electrode

3.9.2 Trial Runs and Reproducibility of Results

Before testing any of the soils that were retained for further analysis, trial runs were performed on expendable soil samples to determine the exact procedure to be followed such that reproducibility of results is ensured. This process is very important, because if the technician is unable to perform the test in a repeatable manner within a prescribed tolerance, then the result would be unreliable.

The trial runs served the following purposes:

• To identify the method of preparing the soil samples,
• To identify the method of preparing the surface of the steel specimen,
• To provide the technician with experience and the ability to perform the tests quickly and consistently,
• To determine the scan rate and the scan range to be used in obtaining the Tafel and polarization resistance diagrams, and
• To determine the time to be provided for the steel specimen to stabilize in the soil prior to polarization.

Trial runs were performed on three different soils. The complete procedure was

established and is presented in the following section. The results of the trial run

performed on sample No. 123 are presented in Appendix C. From these results, it can be

seen that the procedure established yielded reproducible results to a satisfactory degree.

The soil sample is tested under conditions of saturation. Soil is placed in a bowl, and distilled water is added gradually in small quantities, and worked into the soil, until the soil is saturated. Care must be taken not to oversaturate the soil because the conductive properties of the soil may be incorrectly estimated if the excess water bleeds to the surface of the soil during the polarization testing. The soil is saturated in order to represent the worst case scenario in which the soil's conductive properties are highest. Furthermore, saturation eliminates one of the variables that differs between the samples, i.e. moisture content. It should be noted that, if a soil is completely dry, linear polarization of the steel specimen is not possible because the system is missing a key element: the ionic conductor.

Once the soil is saturated, it is transferred to the mason jar, which is filled to a specified height. The height requirement is intended to ensure reproducibility of the cathodic area, which consists of the area of the graphite rod which is in contact with the soil. If the graphite rod is immersed into the soil such that its end touches the bottom, and the height of the soil is always the same, then the same area of graphite will be in contact with the soil, i.e. constant cathodic area.

The soil must be observed for signs of air bubbles. Lightly shaking the jar may

consolidate the saturated soil and eliminate any air bubbles, which tend to increase the

overall resistivity of the soil. Furthermore, if the steel specimen is in contact with an air

bubble, the actual anode area will be smaller than what has been assumed, and therefore

the corrosion rate will be underestimated. The soil sample is prepared first and the

reference electrode and graphite rod are secured into place within the mason jar. The

steel specimen is prepared next.

3.9.4 Preparation of the Working Electrode

This section serves to outline the method of preparing the surface of the specimen

prior to each polarization sequence. As observed previously, surface preparation plays an

extremely important part in the corrosion process, i.e. it can greatly affect the rate at

which the corrosion will proceed. For example, the presence of a protective surface film

would result in a lower corrosion rate. If such a film is not properly removed, or if the

steel sample is exposed to ambient air after cleaning such that a protective film is allowed

to form prior to testing, then the results obtained would be misleading. For this reason,

the specimen must be prepared in a consistent manner each time to ensure reproducibility

of the results.

For each soil sample, the steel specimen is polarized four times. The surface must

be prepared thoroughly prior to each of the tests. The first step in the surface preparation

is the sanding of the surface. In order to obtain a uniform sanding, an ordinary hand drill

is mounted securely on a stand and a screw, whose diameter is compatible with that of the

steel specimen thread, is inserted into the "nose". The steel specimen is then secured

onto the end of the screw. Two sanding papers are used: sizes 400 and 600. As the drill

rotates the specimen, it is sanded on all sides with the size 400 paper first, and then with

the size 600 paper. The specimen is then sanded with alumina paste, which ensures a

smooth preparation of 1.0 µm. The specimen is then removed from the drill, and screwed

onto the end of the working electrode rod. When a tight seal is ensured, the specimen is

rinsed thoroughly with acetone to eliminate any greases, and then rinsed with distilled

water. The specimen is then quickly immersed into the saturated soil sample, and the

appropriate test is run.

3.9.5 Polarization of the Steel Specimen

Once the soil sample has been prepared and the working, auxiliary and reference

electrodes have been immersed into the soil, the first of the four polarization tests is

initiated. The goal of this test is to obtain the Tafel diagram, and to extract from it the

values of the Tafel constants, βc and βa.

The potentiostat used enables the technician to introduce the desired values of the

scan rate, the scan range, etc. The following variables were specified:

± 250 mV from Open Circuit Potential, Eoc

Delay provided to attain Eoc: 1000 s or 0.017 mV/s (1 mV/min)

IR drop compensation

Anodic area, i.e. metal surface area: approximately 4.5 cm² (subject to change)

Table 3.14 Values Specified for Tafel Test

Once this test is completed, a graph such as that illustrated in Figure 3.2 is

obtained. The values of βc and βa are obtained by plotting the anode and cathode lines

such that their slopes coincide with the slope of the linear Tafel regions.

Figure 3.2 Typical Tafel Plot

Once the values of βc and βa are obtained, the steel specimen is removed from the

soil and cleaned according to the standard method. The specimen is then inserted into the

soil again, and the second test is initiated with the goal of determining the corrosion rate

of the steel sample by the method of linear polarization. The following variables are

introduced into the program prior to polarization:

± 20 mV from Open Circuit Potential, Eoc

Delay provided to attain Eoc: 1000 s or 0.017 mV/s (1 mV/min)

IR drop compensation: On

Anodic area, i.e. metal surface area: approximately 4.5 cm² (subject to change)

Density of metal: 7.87 g/cm³

Equivalent weight of metal: 27.92 g

Table 3.15 Values Specified for Linear Polarization Test

Once the test is completed, a curve such as that illustrated in Figure 3.3 is

obtained. The values of Rp, icorr, and the corrosion rate are obtained by plotting a line

whose slope coincides with that of the line in the region immediately surrounding the

point on the curve at which the current equals zero.
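
The step from the polarization resistance to a corrosion rate can be sketched as follows. This is only an illustration and not the routine used by the potentiostat software: it assumes the standard Stern-Geary relation and the Faraday's-law conversion constant of ASTM G102, and the function names are hypothetical. The density and equivalent weight are those of Table 3.15, and the corrosion current density is the value reported for soil sample No. 96.

```python
def stern_geary_icorr(beta_a, beta_c, rp):
    """Corrosion current density (A/cm^2) from the Tafel constants (V/decade)
    and the polarization resistance (ohm*cm^2), using the Stern-Geary relation."""
    b = (beta_a * beta_c) / (2.303 * (beta_a + beta_c))  # Stern-Geary coefficient, V
    return b / rp


def corrosion_rate_mm_per_year(icorr, equivalent_weight, density):
    """Penetration rate (mm/yr) from icorr (A/cm^2), equivalent weight (g)
    and density (g/cm^3). The constant is the ASTM G102 value (assumed here)."""
    k = 3.27e-3  # mm*g/(uA*cm*yr)
    return k * (icorr * 1e6) * equivalent_weight / density


# icorr reported for soil sample No. 96, steel properties from Table 3.15
rate = corrosion_rate_mm_per_year(8.914e-6, 27.92, 7.87)
print(round(rate, 3))  # 0.103
```

Under these assumptions the result is of the same order as the corrosion rates analyzed in Chapter 4.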

Once the corrosion rate is obtained, the saturated soil is discarded and the entire

process is repeated a second time with a fresh sample of the same soil. The soil and the

steel specimen are prepared according to the specified methods, and the two tests are run

again to obtain new values of βc and βa, and then the corrosion rate. Table 3.16 gives the

values obtained for soil sample No. 96. The results obtained indicate that the procedure

followed yielded reproducible results. The values of the corrosion rate for each of the

soil samples tested are presented in Table 3.17.

Figure 3.3 Typical Linear Polarization Curve

Linear Polarization I

Eoc (mV): -851

Ecorr (mV): -852.1

icorr (× 10⁻⁶ A/cm²): 8.914

Rp (× 10³ ohm·cm²): 2.970

Table 3.16 Results Obtained for Soil Sample # 96

Corrosion Rate (mm/yr)

Table 3.17 Corrosion Rates

3.10 Calculating the corrosivity indices according to AWWA and PACE

The definitions of the variables included in each of the corrosivity grids have been

discussed in Chapter 2, and reviewed in the previous sections of this chapter. This

section introduces the spreadsheets used to obtain the corrosivity indices quickly and

without error.

Figure 3.4 shows the spreadsheet used to calculate the corrosivity indices

according to AWWA and PACE. The values of the appropriate variables are entered in

lines A, C, and E, and the corrosivity indices are given automatically in lines D and G.

The corrosivity indices of the soils are presented in Tables 3.18 and 3.19.

[Spreadsheet: "Analysis of Soil Corrosivity", soil sample JK-33 (silty clay, light brown), with Method 1: AWWA C-105 and Method 2: PACE. Note on Method 1: if the pH is between 6.5 and 7.5, and sulfides are present and/or the redox is negative, add 3 points.]

Figure 3.4 Spreadsheet Used for Quick Calculation of Corrosivity Indices

Table 3.18 Corrosivity Indices According to AWWA

AWWA index -

Table 3.19 Corrosivity Indices According to PACE

CHAPTER 4: ANALYSIS OF EXPERIMENTAL

RESULTS AND DISCUSSION

4.1 Analysis of Preliminary Data

The statistical package SAS was used to analyze the data collected during the

experimental phase of this project. These data were presented in Chapter 3. The

information presented in this chapter is selective and consists only of the material

deemed to be essential. Appendix D: Principles of Regression Analysis is

included for the information of the reader, and it is recommended that Appendix D be

consulted prior to reading this chapter.

The analysis consists primarily of regressing the variables, individually and in

combination, with the dependent variable. The dependent variable (Y) in the analysis is

the corrosion rate obtained by the method of linear polarization. This variable is denoted

'CorrRate', and it is considered to be the "true" corrosivity of a soil. It was the

objective of this study to derive the relationships of the other variables with CorrRate,

both individually and in appropriate combinations. Once the relationships between the

variables are understood, the importance of the chloride content of a soil is evaluated, and

a decision is made on whether or not this variable provides sufficient information to be

considered significant.

There is a total of 12 independent variables (Xi), seven discrete and five

categorical:

1. pHdir: pH of the soil, obtained by testing the soil in the state in which it is received in

the laboratory (discrete),

2. pHsat: pH of the soil, obtained by testing a portion of soil supersaturated with

distilled water (discrete),

3. Reddir: redox potential of the soil, in mV, obtained by testing the soil in the state in

which it is received in the laboratory (discrete),

4. Redsat: redox potential of the soil, in mV, obtained by testing a portion of soil

supersaturated with distilled water (discrete),

5. Resdir: resistivity of the soil, in ohm-cm, obtained by testing the soil in the state in

which it is received in the laboratory (discrete),

6. Ressat: resistivity of the soil, in ohm-cm, once it had been saturated with distilled

water (discrete),

7. Chl: chloride ion content of the soil in ppm (discrete),

8. SoilType: categorical variable representing soil type (S for sand, SC for sand/clay,

and C for clay),

9. Moisture: categorical variable representing moisture content of the soil as it is

received in the laboratory (D for dry, M for moist, and S for saturated),

10. SulfI: categorical variable representing sulfide content obtained by testing the soil

using a solution of iodine and NaN3 (N for negative, T for trace, and P for positive),

11. SulfHCl: categorical variable representing sulfide content obtained by testing the soil

using concentrated HCl and lead acetate paper (N for negative and P for positive), and

12. Drainage: categorical variable representing ability of the soil to 'drain' water (E for

excellent, G for good, and B for bad).

Of the 12 variables, only 10 will be used in this analysis. Drainage will not be

included in any of the following analyses because the information it provides is almost

identical to that of the variable SoilType. In the majority of the cases, a sand will have an

excellent drainage ability, a sand/clay will have a good drainage ability, and a clay will

have a poor drainage ability. One of the two variables is therefore redundant, and it was

decided to retain the variable SoilType. Furthermore, the variable SulfHCl will also be

eliminated from the list because it is felt that errors were made during testing for this

parameter. As a consequence, the SulfHCl value is unavailable for many observations,

and this results in a decrease in the reliability of the results of the statistical analyses

obtained using this variable.

The analysis of the data is divided into the following sections:

Data Exploration

Transformation of Variables

Regressing Discrete Variables One At A Time

Correlation Matrix

RSQUARE Procedure

Including Categorical Variables

Variables Retained for Further Analysis

Determining Significance

Discussion of Results

4.1.1 Data Exploration

The first step in any analysis is the familiarization with the experimental data.

Each of the eight discrete variables is studied individually and the distribution of the

values is observed for signs of normality, outliers, skewness, etc. The distribution of the

data plays a very important role in ensuring that the results of regression analyses are

consistent. Furthermore, outliers are also very influential, and they must be identified and

observed during the course of the statistical tests that follow.

For each variable, the following information is extracted from SAS output files,

and examined:

• The number of observations (N), the mean, the standard deviation, the variance, and

the skewness,

• The five highest and five lowest observations,

• The five quantiles, the range and the hinge spread,

• The stem and leaf diagram, the box plot, and the normal distribution plot, and

• The outliers.
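
The numerical items in this list can be sketched in a few lines of code. The example below is an illustration only and does not reproduce the SAS procedure: the function name is hypothetical, the skewness is the adjusted Fisher-Pearson coefficient, and the quartiles are computed with Python's "inclusive" method, which may differ slightly from the hinges reported by SAS. The sample values are invented.

```python
import statistics


def summarize(values):
    """Descriptive statistics of the kind examined for each variable."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    # Adjusted Fisher-Pearson skewness coefficient
    skew = n / ((n - 1) * (n - 2)) * sum(((v - mean) / sd) ** 3 for v in values)
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "n": n, "mean": mean, "sd": sd, "variance": sd ** 2, "skewness": skew,
        "min": min(values), "q1": q1, "median": median, "q3": q3,
        "max": max(values), "range": max(values) - min(values),
        "hinge_spread": q3 - q1,
    }


# Invented pH-like values with one low observation
summary = summarize([6.8, 7.1, 7.4, 7.7, 7.9, 8.2, 4.2])
print(summary["mean"] < summary["median"], summary["skewness"] < 0)
```

A mean below the median together with a negative skewness coefficient is the pattern described as negative skew in the discussion that follows.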

Figure 4.1 displays the information produced by SAS for the variable pHdir,

including the stem and leaf diagram, the box plot and the normal distribution plot.

The pHdir values range between 4.2 and 8.8. The data are slightly negatively

skewed, i.e. the mean is slightly smaller than the median. This generally indicates the

presence of outliers in the lower end of the distribution, and this is quite evident when

the stem and leaf diagram and the box plot are studied. There are seven outliers: one in

the upper end, and six in the lower. Besides the outliers, the data points seem to be well

distributed and the box plot appears to have a standard shape. Finally, the normal

distribution plot is not exactly linear, in fact it appears to be slightly curved. This

indicates a small deviation from normality. The usefulness of a transformation is

examined in the next section.

Figure 4.2 displays the information produced by SAS for the variable pHsat,

including the stem and leaf diagram, the box plot and the normal distribution plot.

The pHsat values range between 4.7 and 9.2. As with pHdir, the data are slightly

negatively skewed, with four outliers in the lower end only. Besides the outliers, the data

seem to be well distributed, with a relatively good box plot. The normal distribution

plot is a little less curved than that of pHdir, but a slight deviation from normality is still

observable.

Reddir

Figure 4.3a displays the information produced by SAS for the variable Reddir,

including the stem and leaf diagram, the box plot and the normal distribution plot.

The Reddir values range between -528 mV and 320 mV. It may appear from the

various diagrams that the situation is unacceptable and that the distribution is not at all

normal. This may not be the case. The very large extreme values (-528, -138 and 320

mV) may be the cause of the box plot and the normal distribution plot having such a

distorted form. The presence of these three observations forces the diagrams to be drawn

with large intervals, and as such, the remaining values tend to be lumped together. A

clue to this can be drawn from the observation of the quantiles. If the extreme values are

ignored and only the values within the hinge range are examined (between Q3 and Q1),

an equal number of observations are noted to be above and below the median. This is

characteristic of the symmetric normal distribution. In this case, the hinge range is equal

to Q3 - Q1 = 61.5 mV. Dividing this value in two gives a value of 30.8 mV. For a

normal distribution, the values obtained by adding and subtracting this value from the

median will correspond approximately to the values of Q3 and Q1. The values obtained

by 187.5 ± 30.8 mV are 218.3 mV and 156.7 mV. These values are very close to the

actual ones of 216.5 mV and 155 mV, and therefore, the values are well distributed

within the hinge area. However, it cannot be concluded from the above test that the data

set is normally distributed.
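
The quantile check used above is simple enough to express as a short function. The sketch below uses hypothetical function names and the Reddir quantiles quoted in the text; it only tests the symmetry of the middle half of the data and, as noted above, is not a test of normality.

```python
def predicted_hinges(median, q1, q3):
    """Lower and upper hinges expected if the middle half of the data
    were symmetric about the median."""
    half_spread = (q3 - q1) / 2.0
    return median - half_spread, median + half_spread


def hinge_mismatch(median, q1, q3):
    """Largest gap between predicted and actual hinges,
    expressed as a fraction of the hinge spread."""
    low, high = predicted_hinges(median, q1, q3)
    return max(abs(low - q1), abs(high - q3)) / (q3 - q1)


# Reddir: median 187.5 mV, Q1 155 mV, Q3 216.5 mV
print(predicted_hinges(187.5, 155.0, 216.5))          # (156.75, 218.25)
print(round(hinge_mismatch(187.5, 155.0, 216.5), 3))  # 0.028
```

A mismatch of a few percent of the hinge spread corresponds to the "very close" agreement described for Reddir.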

In order to determine the normality of the variable, the three extreme values are

removed and the process is repeated. Figure 4.3b displays the relevant information,

including the stem and leaf diagram, the box plot, and the normal probability plot for

Reddir without the three extreme values. The data are slightly negatively skewed, with

13 outliers in the lower end. This is a somewhat high number. However, aside from the

outliers, the data points are well distributed and the box plot appears to have a standard

shape. Finally, the normal distribution plot is not quite linear, in fact it is significantly

curved. This indicates a deviation from normality which may be corrected by a

transformation in the next section.

Redsat

Figure 4.4a displays the information produced by SAS for the variable Redsat.

The Redsat values range between -475 mV and 282 mV. Once more, it may

appear from the various diagrams that the situation is unacceptable and that the

distribution is not at all normal. However, an examination of the quantiles yields the

following: the value of half the hinge range is equal to 59/2 = 29.5 mV, and 174 ± 29.5 gives

203.5 mV and 144.5 mV. These values are quite close to the true values of 196 mV and

137 mV, respectively. This was the result anticipated, and further analysis is therefore

warranted.

In this case, it appears that the cause of the distortion in the diagrams is the one

extreme value of -475 mV. When this value is removed and the process repeated, the

resulting boxplot and normal distribution plot are greatly improved. The results obtained

are presented in Figure 4.4b. The data are slightly negatively skewed, with the outliers in

the lower end. However, aside from the outliers, the data points are well distributed and

the box plot appears to have a standard shape. Finally, the normal distribution plot is

fairly linear, except for the lower end outliers. This indicates a slight deviation from

normality.

Resdir

Figure 4.5 displays the information produced by SAS for the variable Resdir.

The Resdir values range between 173 and 22080 ohm-cm. Once more, it appears

from the various diagrams that the situation is unacceptable and that the distribution is

not at all normal. Unlike the case of the redox potentials, the situation is not the result of

one or two extreme values. In fact, it appears that the entire set of values contributes to

the problem. This can be concluded from the examination of the quantiles. The value of

half the hinge range is equal to 9252/2 = 4626 ohm-cm and the predicted quantiles are

equal to 3750 ± 4626 ohm-cm, that is 8376 and -876 ohm-cm. There is a very large

difference between these values and the actual quantiles reported in Figure 4.5 and

therefore, it is not only the extreme points that contribute to the distortion of the

diagrams, but also the entire body of the values.

Furthermore, examination of the normal probability plot suggests that the majority

of the values are concentrated below 5000 ohm-cm, but that there are a significant

number of points which are several orders of magnitude larger. It appears that a

logarithmic transformation may be indicated. This will be studied in the next section.

Ressat

Figure 4.6 displays the information produced by SAS for the variable Ressat.


The Ressat values range between 73 and 183400 ohm-cm. As in the case of

Resdir, it appears that the whole set of observations contributes to the distorted shape of

the boxplot and normal probability plot. Examination of the quantiles reveals the

following: half the hinge range is equal to 3504/2 = 1752 ohm-cm and 2259 ± 1752 =

4011 and 507 ohm-cm. These values are far from the calculated quantiles of 4800 and

1296 ohm-cm, respectively. Furthermore, examination of the normal probability plot

seems to suggest, as in the case of Resdir, that a logarithmic transformation may be able

to correct the normality problem.

Chloride

Figure 4.7 displays the information produced by SAS for the variable Chloride.


The Chloride values range between 0 and 22664 ppm. As in the case of Resdir

and Ressat, it appears that the whole set of observations contributes to the distorted shape

of the boxplot and normal probability plot. Examination of the quantiles reveals the

following: half the hinge range is equal to 423/2 = 211.5 ppm and 190 ± 211.5 = 401.5

ppm and -21.5 ppm. These values are far from the calculated quantile values of 481 and

58 ppm, respectively. Furthermore, examination of the normal probability plot seems to

suggest, as in the case of Resdir and Ressat, that a logarithmic transformation may be able

to correct the normality problem.

CorrRate

Figure 4.8 displays the information produced by SAS for the variable CorrRate.

The CorrRate values range between 0.06 and 0.26 mm/yr. The data are positively

skewed, i.e. the mean is slightly larger than the median. This generally indicates the

presence of outliers in the upper end of the distribution, and this is quite evident when

the stem and leaf diagram and the box plot are studied. There are two outliers: 0.19 and

0.26 mm/yr. Besides the outliers, the data points seem to be well distributed and the box

plot appears to have a standard shape. Finally, the normal distribution plot is not exactly

linear, in fact it appears to be slightly curved. This indicates a small deviation from

normality. The usefulness of a transformation will be studied in the next section.

4.1.2 Transformation of Variables

There are many different ways to transform data. The values can be logged,

inversed, square-rooted, and so forth. Although a transformation can be applied to any

variable, each one seems to work best on a particular type of variable. Of all the

transformations, the one that may be of value is the logarithmic (or log) transformation,

which is typically applied to variables representing physical characteristics such as length,

weight and concentrations. The variables being studied represent concentrations (as in

the case of pH, redox potential and chlorides) and physical characteristics (such as soil

resistivity), and may be rendered 'normal' by the log transformation.

The variables which appear to be in most need of transformation, Chloride,

Resdir, and Ressat, are studied first. The normal probability plots of these three variables

are very similar, and it is suspected that the same transformation can be useful for all

three. The transformation proposed is the replacement of the original value with the

logarithm of that value. These new variables, referred to as LChl, LResdir and LRessat,

are presented in Tables 4.1, 4.2, and 4.3. The analysis performed on the original data

(Section 4.1.1) is repeated using the logged data, and the results are presented in Figures

4.9, 4.10, and 4.11.
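
The transformation itself can be sketched as below. The base of the logarithm is not stated in the text; base 10 is assumed here because tabulated LChl values such as 0.699, 1.000 and 1.477 coincide with the base-10 logarithms of 5, 10 and 30. The treatment of a zero chloride content is also not stated, so the sketch simply returns None for non-positive values. The function name and the input list are illustrative.

```python
import math


def log10_transform(values):
    """Return log10 of each value; non-positive values map to None
    because the logarithm is undefined for them."""
    return [math.log10(v) if v > 0 else None for v in values]


# Illustrative chloride contents in ppm
chloride_ppm = [5, 10, 30, 0, 6500]
logged = log10_transform(chloride_ppm)
print([None if v is None else round(v, 3) for v in logged])
# [0.699, 1.0, 1.477, None, 3.813]
```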

In the case of LResdir and LRessat, the distribution has greatly improved. The

shape of the boxplots is acceptable, and the normal probability plots are fairly linear. The

transformation is therefore considered a success and, from this point forward, the

variables LResdir and LRessat will be used instead of Resdir and Ressat.

In the case of LChl, the distribution has also improved dramatically. The boxplot

has a shape which is almost perfect, and the normal probability plot is very close to being

perfectly linear. This transformation is considered a success, and the variable LChl will

replace Chloride from this point forward.

The transformation of the remaining variables, pHdir, pHsat, Reddir, and Redsat, is considered next.

Like Chloride, these variables represent concentrations: hydrogen ions in the case of pH,

and oxygen in the case of the redox potential. However, unlike Chloride, the

concentration has already been logged in obtaining the pH and the redox potential. The

formulae for obtaining the pH and the redox potential, E, of a solution are the following:
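
In their standard textbook form, these relations can be written as shown below. The notation is an assumption, since the symbols are not defined elsewhere in this chapter: a denotes an activity (approximated by concentration), E⁰ the standard potential, R the gas constant, T the absolute temperature, n the number of electrons transferred, and F the Faraday constant.

```latex
\mathrm{pH} = -\log_{10} a_{\mathrm{H^+}}
\qquad
E = E^{0} + \frac{RT}{nF}\,\ln\frac{a_{\mathrm{ox}}}{a_{\mathrm{red}}}
```

In both expressions the measured quantity varies with the logarithm of the concentration rather than with the concentration itself.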

It can be clearly seen that the pH and the potential E are not direct measurements

of the concentration, but represent the concentration indirectly given that these

concentrations have been logged. For this reason, it is considered unreasonable to

perform a second logarithmic transformation on these variables, which are already the

result of a logarithmic transformation. It is possible that a transformation will render the

data more attractive, but it must be understood why a transformation is performed. Any

data, through a series of transformations, may be made to exhibit characteristics of

'normality'. However, if these transformations cannot be justified or understood

intuitively, it is better not to include them at all.

Finally, another variable which is considered a candidate for the log

transformation is CorrRate. The values of the new variable, denoted LCorr, are

presented in Table 4.4. The LCorr values were analyzed and the results are presented in

Figure 4.12. It can be seen that the distribution of the data has not improved much. For

this reason, the original variable will be retained for the analysis to follow.

In conclusion, the following discrete variables will be used from this point

forward: pHdir, pHsat, Reddir, Redsat, LResdir, LRessat, LChl, and CorrRate.

LChl (ppm): 3.813 3.994 3.474 3.951 2.464 2.173 2.255 1.255 0.845 1.204 1.477 1.380 1.000 0.699 1.041 1.623 1.903 2.117 3.294 2.201 1.544 1.954 1.892 2.806 2.857 3.311 2.467 2.816 2.744 2.072 2.771 2.525 2.913 2.342 2.431

LChl (ppm): 1.903 3.264 2.190 2.225 1.908 3.364 3.662 3.039 2.396 2.639 1.580 1.663 1.763 1.886 1.732 1.908 2.170 2.045 2.170 2.855 2.538 3.513 2.090 2.880 2.651 3.307 2.730 2.592 2.626 2.204 2.580 3.129 3.855 4.099 2.322 1.748 2.610 1.556 2.196

Table 4.1 Values of LChl

Table 4.2 Values of LResdir

Table 4.3 Values of LRessat

Table 4.4 Values of LCorr

In Tables 4.1 to 4.4 the values are listed by soil number in four columns: Soil # 1 to 39, 40 to 78, 79 to 117, and 118 to 152 followed by 247.

4.1.3 Regression of the Individual Variables

Now that the distribution of each variable has been studied, it is time to study the

distribution of the residuals arising from the regression of the independent variables with

the dependent variable CorrRate. The first step is to regress each X variable individually

and to observe the residual distribution. Here, the goal is to study the distribution of the

residuals for signs of anomalies, and to keep track of the outliers on Y. In the previous

section, the outliers obtained were outliers on the X variable only. In this section, the

outliers on the residuals are examined, i.e. outliers on the model chosen to fit the data.

It is most probable that the distribution will not be perfectly normal. This usually

suggests that another X variable should be added to the model to account for the variance

which was not accounted for by the first variable. In a later section, a complete model

(consisting of a combination of X variables) is regressed against CorrRate and a normal

distribution is anticipated. If this does not occur, it will be assumed that the model is not

correct, and the search for the missing X variable to be added to the model will continue.

For each variable, the following information was extracted and examined:

• Number of observations N, SSE, PRESS statistic, R², R²-adjusted, F-ratio,

• Outliers identified by a z-score > 3,

• Outliers identified by a Cook's Distance (CD) > 1,

• Plot of residuals vs. predicted value of y (e vs. y),

• Stem and leaf diagram, box plot and normal distribution plot constructed from the set

of residuals.

pHdir and pHsat

Tables 4.5a and 4.5b display the information related to the variables pHdir and

pHsat, respectively. Furthermore, Figures 4.13a through 4.13d display the SAS output

for the variables pHdir and pHsat, including the e vs. y plot, the stem and leaf diagram,

the box plot, and the normal probability plot for the residuals.

Residual Information: Variable pHdir

N, SSE, PRESS statistic, R², R²-adjusted, F-ratio, outliers with z > 3, outliers with CD > 1

Table 4.5a Residual Characteristics: pHdir

Residual Information: Variable pHsat

N, SSE, PRESS statistic, R², R²-adjusted, F-ratio, outliers with z > 3, outliers with CD > 1

Table 4.5b Residual Characteristics: pHsat

Only 74 observations were available for regressing the variable pHdir against

CorrRate. The increase in the error sum of squares was 8%, which is quite acceptable.

This value is obtained by the following equation:

$$\%\ \text{increase in the error sum of squares} = \frac{\text{PRESS} - \text{SSE}}{\text{SSE}} \times 100$$

The above percentage gives an idea of the ability of the equation to fit foreign data, i.e. to

fit data which were not a part of the observations used to create the equation itself.

Another way to measure this is to compare R²-adjusted to R². The decrease in R² is called the

shrinkage, and it represents the decrease in predictive power of the equation when used

on the population as a whole. In this case, the shrinkage is 9%, which is acceptable.
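
For a regression on a single X variable, SSE and PRESS can be computed as sketched below. This is an illustration rather than the SAS procedure: the function names are hypothetical, and PRESS is obtained with the leave-one-out shortcut, in which each residual is divided by one minus the leverage of its observation.

```python
def fit_line(x, y):
    """Least-squares intercept and slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope


def sse_and_press(x, y):
    """SSE and PRESS for a one-variable regression."""
    n = len(x)
    mx = sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    a, b = fit_line(x, y)
    sse = press = 0.0
    for xi, yi in zip(x, y):
        e = yi - (a + b * xi)               # ordinary residual
        h = 1.0 / n + (xi - mx) ** 2 / sxx  # leverage of this observation
        sse += e ** 2
        press += (e / (1.0 - h)) ** 2       # leave-one-out prediction error
    return sse, press


def percent_increase(sse, press):
    """Percentage increase in the error sum of squares, (PRESS - SSE) / SSE."""
    return 100.0 * (press - sse) / sse
```

Because every leave-one-out residual is at least as large in magnitude as the ordinary residual, PRESS is never smaller than SSE, and a small percentage increase indicates that no single observation dominates the fit.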

The F-ratio is used to determine if the variable is significant in predicting y. The

value obtained for pHdir is equal to 10.61, which is substantially higher than the critical

value of 4.00 (see Figure F.1) and, as suspected, pHdir is considered significant in

predicting CorrRate.
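
For a one-variable regression, the F-ratio, R² and R²-adjusted are tied together algebraically, so the quoted figures can be cross-checked. The sketch below uses hypothetical function names and assumes that the shrinkage is expressed as the decrease from R² to R²-adjusted relative to R²; under that assumption, the F-ratio of 10.61 with 74 observations reproduces the shrinkage of 9% quoted above.

```python
def r2_from_f(f_ratio, n, p=1):
    """R^2 implied by the overall F-ratio of a regression with p predictors
    fitted to n observations."""
    return p * f_ratio / (p * f_ratio + (n - p - 1))


def adjusted_r2(r2, n, p=1):
    """R^2 adjusted for the number of predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def shrinkage(r2, n, p=1):
    """Relative drop from R^2 to adjusted R^2 (assumed definition)."""
    return (r2 - adjusted_r2(r2, n, p)) / r2


r2 = r2_from_f(10.61, 74)
print(round(r2, 3), round(100 * shrinkage(r2, 74)))  # 0.128 9
```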

The outliers are identified by two different methods: by the z-score of the residual

and by Cook's Distance (CD). When the z-score of a residual is larger than 3 and/or

when the value of CD is larger than 1, the observation is considered an outlier. For every

variable, observation #72 is considered an outlier with a z-score > 5. Furthermore,

observation #149 has a z-score close to 3 and should also be examined in future analyses.
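
The two outlier criteria can be sketched for a straight-line fit as follows. The function name is hypothetical, and the z-score is taken here as the residual divided by the root mean square error, which is an assumption about how the z-scores were computed; Cook's Distance uses the usual formula with two fitted parameters.

```python
import math


def flag_outliers(x, y, z_limit=3.0, cd_limit=1.0):
    """Indices of observations flagged by the standardized residual
    or by Cook's Distance for a straight-line fit of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in resid) / (n - 2)  # two fitted parameters
    flagged = []
    for i, (xi, e) in enumerate(zip(x, resid)):
        h = 1.0 / n + (xi - mx) ** 2 / sxx             # leverage
        z = e / math.sqrt(mse)                         # standardized residual
        cd = e ** 2 * h / (2 * mse * (1.0 - h) ** 2)   # Cook's Distance
        if abs(z) > z_limit or cd > cd_limit:
            flagged.append(i)
    return flagged
```

The two criteria are complementary: the z-score picks out observations that lie far from the fitted line, while Cook's Distance also weights the leverage of the observation and therefore picks out points that pull the line toward themselves.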

Finally, normality of the residuals is determined using the e vs. y plot, and the
normal distribution plot. The e vs. y plot is used to identify any underlying trends in the
distribution of the residuals. If the plot shows a particular pattern in the residuals, this
generally indicates that the proposed model is not a complete one, i.e. that not all the
variance in the system has been accounted for by the model proposed, and that the addition of
another variable may be necessary. This can also be determined by studying the shape of
the curve of the normal distribution plot. A straight line is characteristic of a normally
distributed set of values, and therefore any non-linear curve would indicate a deviation
from normality which may be corrected by the addition of another variable. In the case of
pHdir, the e vs. y plot does not show any clear trends in the distribution of the points.
However, it does show that the majority of the residual values are within ±0.05 mm/yr,
and that one point in particular (#72) is much higher than the norm. This can also be seen
on the boxplot where observation #72 is clearly an outlier, and on the normal probability
plot where the outlier is located far above the curve. Finally, the curve on the normal
probability plot is slightly curved which, as suspected, indicates that some other variable
should be added to the model of pHdir alone.
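The idea behind the normal probability plot can be sketched without plotting: sorted residuals are paired with standard normal quantiles, and the straightness of the resulting line is summarized by a correlation. The plotting positions (rank - 0.375)/(n + 0.25) and the function names are assumptions made for this illustration; SAS may use a different convention.

```python
import math
from statistics import NormalDist


def normal_plot_points(residuals):
    """Pair each sorted residual with its theoretical standard normal quantile."""
    n = len(residuals)
    ordered = sorted(residuals)
    standard_normal = NormalDist()
    quantiles = [
        standard_normal.inv_cdf((rank - 0.375) / (n + 0.25))
        for rank in range(1, n + 1)
    ]
    return list(zip(quantiles, ordered))


def straightness(points):
    """Pearson correlation of the plot; values near 1 indicate a straight line."""
    n = len(points)
    mean_q = sum(q for q, _ in points) / n
    mean_e = sum(e for _, e in points) / n
    sqq = sum((q - mean_q) ** 2 for q, _ in points)
    see = sum((e - mean_e) ** 2 for _, e in points)
    sqe = sum((q - mean_q) * (e - mean_e) for q, e in points)
    return sqe / math.sqrt(sqq * see)


# Synthetic residuals: an exactly normal-shaped set, then the same set with one
# residual replaced by a large value to mimic a single strong outlier.
template = [q for q, _ in normal_plot_points(list(range(20)))]
clean = [0.02 * q for q in template]
spoiled = clean[:-1] + [0.5]
clean_score = straightness(normal_plot_points(clean))
spoiled_score = straightness(normal_plot_points(spoiled))
```

A single large residual is enough to pull the score well below 1, which mirrors the way observation #72 sits far from the curve in the plots discussed above.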

In the case of pHsat, 74 observations were analyzed and revealed an F-ratio of
4.12. This value is larger than the critical value of 4.00 and, as such, pHsat is considered
to be significant in predicting CorrRate. Furthermore, there is an 8% increase in the error
sum of squares, and a 24% decrease in R². From the above results, it is clear that pHsat is
not as good as pHdir in predicting CorrRate, even though the distribution of the residuals
is very similar. However, because one variable is being studied at a time, it cannot be
concluded that pHsat will not be better in predicting CorrRate when it is used in
combination with other variables. This will be discussed in a later section.


Reddir and Redsat

Tables 4.6a and 4.6b display the information related to the variables Reddir and
Redsat, respectively. Furthermore, Figures 4.14a through 4.14d display the SAS output
for the variables Reddir and Redsat, including the e vs. y plot, the stem and leaf diagram,
the box plot, and the normal probability plot for the residuals.

Residual Information: Variable Reddir
Rows: N, SSE, PRESS statistic, R², R²-adjusted, F-ratio, outliers with z > 3, outliers with CD > 1
(numerical entries are not legible in the source)

Table 4.6a Residual Characteristics: Reddir

Residual Information: Variable Redsat
Rows: N, SSE, PRESS statistic, R², R²-adjusted, F-ratio
(numerical entries are not legible in the source)

Table 4.6b Residual Characteristics: Redsat

Only 74 observations were available for regressing the variable Reddir against
CorrRate. The F-ratio is equal to 0.324, which is far below the critical ratio of 4.00 and
which indicates that Reddir is not significant in predicting CorrRate. This is also
indicated by the value of 0.0044 for R², which shows that the correlation between Reddir
and CorrRate is very low. The results obtained from Reddir appear to indicate that this
variable will not play an important role in future analyses. This will be shown to be true
in the following sections.

The e vs. y plot shows that the majority of the residuals are within ±0.05 mm/yr.
There are two points which appear to be located far from the rest: observations #72 and
#149. According to the z-score and Cook's Distance, the only true outlier is observation
#72, but #149 should also be watched because of its high z-score. Finally, the normal
probability plot indicates a deviation from normality, as well as the presence of the two
outliers which are located far from the line.

In the case of Redsat, the results indicate that this variable performs worse than

Reddir. The F-ratio of 0.0012 clearly shows that this variable is not significant in

predicting CorrRate, and the extremely low value of R2 indicates that there is little

correlation between Redsat and CorrRate.


LResdir and LRessat

Tables 4.7a and 4.7b display the information related to the variables LResdir
and LRessat, respectively. Furthermore, Figures 4.15a through 4.15d display the SAS
output for the variables LResdir and LRessat, including the e vs. y plot, the stem and leaf
diagram, the box plot, and the normal probability plot for the residuals.

Residual Information: Variable LResdir
Rows: N, SSE, PRESS statistic, R², R²-adjusted, F-ratio, outliers with z > 3, outliers with CD > 1
(numerical entries are not legible in the source)

Table 4.7a Residual Characteristics: LResdir

Residual Information: Variable LRessat
Rows: N, SSE, PRESS statistic, R², R²-adjusted, F-ratio
(numerical entries are not legible in the source)

Table 4.7b Residual Characteristics: LRessat

In the case of LResdir, 70 observations were used. The F-ratio obtained is 2.553,
which is below the critical F-ratio of 4.00 and indicates that the variable is not
significant in predicting CorrRate. This is contrary to expectation, considering the
importance of resistivity in the corrosion process.

There is a 39% decrease in the R² value and a 14% increase in the error sum of
squares, when applying the equation to the population as a whole. These values are
somewhat higher than expected. Furthermore, observation #72 is identified as an outlier,
and observation #149 also has a high z-score.

The e vs. y plot shows that 90% of the residuals are within ±0.04 mm/yr, with
only two residuals being above 0.06 mm/yr. These two residuals also appear on the
normal probability plot as points located off the curve, as seen in Figure 4.16d. The
curve of the normal probability plot also suggests a slight deviation from normality.

In the case of LRessat, 74 observations were analyzed. The F-ratio of 2.598 is
also below the critical value and therefore, contrary to what is expected, the analysis
shows that LRessat is not significant in predicting CorrRate. Furthermore, there is a 39%
decrease in the R² value and a 4% increase in the error sum of squares. Once more,
observation #72 is identified as an outlier, with observations #149 and #42 exhibiting
outlier behavior.

The e vs. y plot shows that 95% of the residuals are within ±0.04 mm/yr, with
only two residuals above 0.06 mm/yr. In general, the results suggest that LRessat is better
than LResdir in predicting CorrRate. This is the expected result because CorrRate was
obtained by testing a soil which has been saturated with distilled water and, as such,
whose resistivity during testing was better represented by Ressat than by Resdir.


LChl

Table 4.8 displays the information related to the variable LChl. Furthermore,
Figures 4.16a and 4.16b display the SAS output, including the e vs. y plot, the stem and
leaf diagram, the box plot, and the normal probability plot for the residuals.

Residual Information: Variable LChl
Rows: N, SSE, PRESS statistic, R², R²-adjusted, F-ratio
(numerical entries are not legible in the source)

Table 4.8 Residual Characteristics: LChl

In total, 73 observations were used to analyze the relationship between LChl and
CorrRate. The F-ratio of 8.572 is much higher than the critical value of 4.00 and, as
such, this variable is considered very significant in predicting CorrRate. The decrease in
R² is only 12% and the increase in the error sum of squares is 7%. These values are quite
acceptable. Furthermore, observations #72 and #149 are identified as outliers, with #42
exhibiting outlier behavior. This can also be seen on the e vs. y plot, where 95% of the
residuals are within ±0.04 mm/yr, with only the two outliers above 0.06 mm/yr.
Finally, the normal distribution plot shows a slight deviation from normality which may
be corrected by the addition of another variable to the model.

The results analyzed up to this point suggest that the pHdir variable performs the
best, followed by LChl, pHsat and LRessat. The variables which appear to add the least
information are Reddir and Redsat. It is not at all surprising that the pH value plays such
an important role, as the reduction of H⁺ ions is one of the two reactions expected to
contribute to the corrosion problem. The other reaction expected is the reduction of O₂
and, as such, the insignificant role of Reddir comes as a surprise.

The influence of the chloride content was also expected. Chlorides have a dual
effect on the corrosion rate. Firstly, they decrease the resistivity of the soil because they
are ions and conductive by nature, and secondly, they inhibit the formation of the
protective passive layer on the steel specimen. On the other hand, what is very surprising
is the insignificant role played by the variable LRessat. It was thought that, because
resistivity indirectly measures the chloride content as well as the general ionic content of
the soil, LRessat would provide almost as much information as LChl. However, this
has not been shown yet.

4.1.4 Correlation Matrix

In the previous section, the relationship between CorrRate and each of the
independent variables was studied. One could easily proceed and regress each variable in
turn with all the others in order to identify the extent to which the variables are
intercorrelated. An alternative to this is to study the correlation matrix of the set of
independent variables, plus the dependent one. The correlation matrix for the data under
study is presented in Figure 4.17.

The ideal situation is one in which the correlations between the dependent
variable (CorrRate) and each independent variable are high, and the correlation between
the independent variables themselves is low. This would result in the least amount of
multicollinearity, i.e. redundant information, and would lead to a situation where each
variable that is added to an equation would provide new information and would serve to
significantly increase the effectiveness of the equation.
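A correlation matrix of this kind can be sketched in a few lines. The version below assumes pairwise deletion, i.e. each coefficient uses only the observations for which both variables were measured; this is an assumption about the SAS output, suggested by the differing observation counts it reports for each pair. The variable names and values are hypothetical.

```python
import math


def pearson_pairwise(xs, ys):
    """Return (r, n) using only the pairs in which neither value is missing (None)."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    return sxy / math.sqrt(sxx * syy), n


def correlation_matrix(data):
    """Map every (row, column) pair of variable names to its (r, n) tuple."""
    names = list(data)
    return {(a, b): pearson_pairwise(data[a], data[b]) for a in names for b in names}


# Hypothetical columns; None marks a measurement that was not taken.
demo = {
    "response": [0.10, 0.08, 0.06, 0.05, 0.03],
    "predictor_1": [4.0, 5.0, 6.0, 7.0, 8.0],
    "predictor_2": [2.1, None, 1.7, 1.9, 1.2],
}
matrix = correlation_matrix(demo)
r_value, n_pairs = matrix[("response", "predictor_2")]
```

High values in the response row and low values among the predictors correspond to the ideal situation described above.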

An examination of the correlation between CorrRate and the independent
variables will quickly reveal that the results are the same as those obtained in the previous
section: the pHdir variable performs the best, followed by LChl, pHsat and LRessat. The
variables which appear to add the least information are Reddir and Redsat. Another
important point is the sign of the correlation between CorrRate and LResdir. One would
expect that the correlation would be negative, i.e. the higher the resistivity, the lower the
corrosion rate, but this is not the case. However, it must be kept in mind that the variable
LResdir was obtained by testing the soil in the state in which it was received in the
laboratory, which means that some soils were tested when dry and others were tested in a
saturated condition. The results obtained are therefore misleading.

Another important fact observed from the correlation matrix is the presence of
high correlations between pHdir and pHsat, between Reddir and Redsat, and between LResdir and
LRessat. As each of these pairs essentially measures the same soil property, it is not at all
surprising to see high correlations. This indicates that the information provided by the
variables is essentially the same, and that only one of the two variables needs to be
included in a model. The choice of the variable to be retained will be discussed later.

Another correlation of particular interest is that between LChl and each of the
resistivity variables, LRessat and LResdir. The correlation between LChl and LRessat is
very high, and this indicates that the two variables essentially provide the same
information. Although LChl is the better of the two variables, it remains to be
determined whether or not the extra information provided by LChl is sufficient to
consider this variable significant when the variable LResdir is already known.

The correlation matrix provides information about the interaction of the variables.

This is the first step in the determination of a model which describes the corrosion

phenomenon well. The next step consists of comparing the possible models, which will

be done using the RSQUARE procedure in SAS.

4.1.5 RSQUARE Results

The RSQUARE procedure in SAS is used to obtain a list of the 10 best 1-variable,
2-variable, 3-variable models, etc. This procedure is considered better than the stepwise,
forward, and backward regression procedures because it does not present one final model
as the best model. Instead, it provides the analyst with a set of models that perform best,
and allows the analyst to compare the models and to select the one shown to be the most
logical.
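The all-subsets idea behind RSQUARE can be sketched in plain Python. This is not the SAS implementation: it simply fits every subset of a given size by ordinary least squares (normal equations solved by Gaussian elimination) and ranks the subsets by R². The function names, the generic variable names and the synthetic data are assumptions made for the illustration.

```python
from itertools import combinations


def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution


def r_squared(predictors, response):
    """R^2 of a least-squares fit of response on an intercept plus the predictors."""
    n = len(response)
    design = [[1.0] + [column[i] for column in predictors] for i in range(n)]
    width = len(design[0])
    xtx = [[sum(r[a] * r[b] for r in design) for b in range(width)] for a in range(width)]
    xty = [sum(r[a] * y for r, y in zip(design, response)) for a in range(width)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in design]
    mean_y = sum(response) / n
    sse = sum((y - f) ** 2 for y, f in zip(response, fitted))
    sst = sum((y - mean_y) ** 2 for y in response)
    return 1.0 - sse / sst


def best_subsets(data, response_name, size, keep=10):
    """Return up to `keep` (R^2, subset) tuples for subsets of the given size."""
    candidates = [name for name in data if name != response_name]
    scored = [
        (r_squared([data[name] for name in subset], data[response_name]), subset)
        for subset in combinations(candidates, size)
    ]
    scored.sort(reverse=True)
    return scored[:keep]


# Synthetic data in which y is exactly 2*x1 + 3*x2.
demo = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "x2": [2.0, 1.0, 4.0, 3.0, 6.0, 5.0],
    "x3": [1.0, 0.0, 1.0, 0.0, 1.0, 1.0],
    "y": [8.0, 7.0, 18.0, 17.0, 28.0, 27.0],
}
ranking = best_subsets(demo, "y", size=2)
```

Returning a ranked list rather than a single winner reflects the point made above: the analyst, not the procedure, chooses the most logical model.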

Table 4.9 presents several 1, 2, 3 and 4-variable models, each made up of a
combination of variables which is considered acceptable by the analyst. In the case of the
one-variable models, the results obtained are similar to those obtained in the previous
exercises. As expected, the variables which appear to be correlated best with CorrRate
are the pH variables and LChl. However, it is surprising that the variable LResdir
performs better than LRessat, and that LResdir appears in almost all the best 2 and
3-variable models, when LRessat appears in only a few. Furthermore, these two variables often
appear in the same models, which suggests that the information provided by each of the
variables is not necessarily repetitive. And finally, even though LResdir often appears
together with LChl, LRessat never does. These results will be considered further in a
later section.

Another important result is that pHsat and pHdir never appear in the same model.
This suggests that the information provided by one variable is not necessary when the
other variable is already in the model. This was an expected result. Furthermore, the
models containing pHdir almost always perform better than those containing pHsat. For
this reason, it can be safely concluded that pHdir outperforms pHsat.

Finally, Redsat does not appear in any of the models and, although Reddir does, it
does not appear to play a very important role. This is certainly a surprising result that will
be considered further in a later section.

Number of variables   Variables in Model
1                     PHDIR
1                     PHSAT
1                     LCHL
1                     LRESDIR
1                     LRESSAT
1                     REDDIR
1                     REDSAT
2                     LRESDIR LCHL
2                     PHDIR LCHL
2                     PHDIR LRESDIR
2                     PHSAT LCHL
2                     PHDIR LRESSAT
2                     PHDIR REDSAT
2                     PHDIR REDDIR
2                     PHSAT LRESDIR
3                     PHDIR LRESDIR LCHL
3                     PHSAT LRESDIR LCHL
3                     PHSAT LRESSAT LRESDIR
3                     REDSAT LRESDIR LCHL
3                     REDDIR LRESDIR LCHL
3                     REDSAT LRESSAT LRESDIR
3                     REDDIR LRESSAT LRESDIR
4                     PHDIR REDDIR LRESSAT LRESDIR
4                     PHDIR REDDIR LRESDIR LCHL
4                     PHSAT REDDIR LRESDIR LCHL

Table 4.9 Possible 1, 2, 3, and 4-Variable Models

4.1.6 Categorical Variables

Up to this point, only discrete variables have been considered. The influence of
the categorical variables such as Soiltype, Moisture and Sulfi has been ignored. One
way of including categorical variables in the analysis is to transform each one into a set of
dummy variables which can then be treated like discrete variables (see Section D.5). The
results of such an analysis are not very obvious, and for this reason, a simpler exercise
will be performed to investigate the general effect of the categorical variables.

This exercise consists simply of calculating the correlation matrix and performing
the RSQUARE procedure on the data which has been sorted. For example, to examine
the effect of the variable Soiltype, the data is first sorted into the three categories: sand,
sand/clay, and clay. Then, for each of these three categories, the correlation matrix is
calculated and the RSQUARE procedure is performed. The correlation matrices
obtained for the variable Soiltype are presented in Figures 4.18a through 4.18c. The
following important points are extracted from the matrices:

• For clays, the variables which perform best are pHdir and LChl. Conversely, LRessat
  performs very poorly. Furthermore, the variable Reddir performs quite well.
• For sand/clays, pHdir performs best, followed by LChl and LRessat, which perform
  equally well. The redox variables appear not to be very helpful.
• For sands, the variables which perform best are LChl and LRessat. The pHdir
  variable seems to perform poorly.

The above results suggest that the variable Soiltype plays an important role. For
example, the pH of a soil seems to be more important when the soil is a clay, and the
resistivity is more important when the soil is a sand. Furthermore, both variables are
important when the soil is a sand/clay, and the chloride content appears to be important
irrespective of the soil type. It is therefore concluded that the variable Soiltype should be
included in future analyses, and as such, this categorical variable will be transformed into
a set of three dummy variables and treated in the same manner as the other discrete
variables.
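The transformation of Soiltype into dummy variables can be sketched as follows. The function name and the sample values are hypothetical; the three category labels are the ones named in the text.

```python
def dummy_variables(values, categories):
    """Return one 0/1 indicator column per category."""
    return {
        category: [1 if value == category else 0 for value in values]
        for category in categories
    }


# Hypothetical soil type observations.
soil_types = ["sand", "clay", "sand/clay", "sand"]
dummies = dummy_variables(soil_types, ["sand", "sand/clay", "clay"])
# dummies["sand"] marks the observations that are sands, and so on.
```

Because every observation belongs to exactly one category, the three indicator columns always sum to 1. In a model that already contains an intercept, only two of them carry independent information, which is consistent with the two degrees of freedom attributed to Soiltype in the ANOVA tables of Section 4.2.1.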


Correlation Analysis

Pearson Correlation Coefficients / Prob > |R| under Ho: Rho=0 / Number of Observations

             PHDIR     PHSAT    REDDIR    REDSAT   LRESDIR   LRESSAT      LCHL

PHDIR      1.00000   0.61974  -0.03843  -0.03715  -0.06464  -0.13583  -0.01416
               0.0    0.0001    0.7826    0.7897    0.6660    0.3419    0.9223
                54        51        54        54        47        51        50

PHSAT     (row not legible in the source)

REDDIR    -0.03843  -0.06473   1.00000   0.82414   0.38833   0.37746  -0.41076
            0.7826    0.6452       0.0    0.0001    0.0070    0.0063    0.0030
                54        53        54        54        47        51        50

REDSAT    -0.03715  -0.23635   0.82414   1.00000   0.22604   0.40600  -0.48817
            0.7897    0.0884    0.0001       0.0    0.1266    0.0031    0.0001
                54        53        54        54        47        51        50

LRESDIR   -0.06464  -0.02610   0.38833   0.22604   1.00000   0.74594  -0.57108
            0.6660    0.8633    0.0070    0.1266       0.0    0.0001    0.0001
                47        46        47        47        47        47        45

LRESSAT   -0.13583  -0.26105   0.37746   0.40600   0.74594   1.00000  -0.88587
            0.3419    0.0671    0.0063    0.0031    0.0001       0.0    0.0001
                51        50        51        51        47        51        48

LCHL      -0.01416   0.20294  -0.41076  -0.48817  -0.57108  -0.88587   1.00000
            0.9223    0.1620    0.0030    0.0003    0.0001    0.0001       0.0
                50        49        50        50        45        48        50

Figure 4.18b SAS Output: Correlation Matrix for Sand Samples


The results obtained from the RSQUARE procedure indicate that the variables
LChl and pHdir play a very important role, and that both the resistivity variables
contribute new information. In some cases, LRessat performs better than LResdir, and
in others the opposite is true. Furthermore, the variables LChl and LResdir often appear
in the same model, whereas LRessat rarely appears in a model with LChl. It is becoming
clear that the variables LRessat and LChl provide overlapping information, and that
LResdir appears to represent some other inherent characteristic of the soil.

After a similar analysis involving the categorical variables Moisture and Sulfi, it
was concluded that these variables do not introduce new information, and it was decided
that they will not be considered in future analyses. In the case of the variable Moisture, it
is not at all surprising that the variable is unnecessary. Moisture is a measure of the
saturation state of the soil as it was received in the laboratory. This is not the state in
which the soil was tested to obtain the value of CorrRate, the dependent variable.
Therefore, the variable Moisture cannot be useful in predicting the value of CorrRate
when it is so wholly unrelated to the conditions under which CorrRate was obtained.

The variable Sulfi appeared to provide no new information, and was considered
not to be useful. However, it is believed that this conclusion is not one that would
necessarily apply to future studies. The sulfide content of a soil is generally considered
an important influence on corrosivity. In fact, sulfides (S²⁻) are very corrosive and can
cause severe damage to metal surfaces. The fact that the variable Sulfi does not play an
important role in this analysis may be a result of errors in the testing procedure. Another
reason why the effect of sulfides is minimized may be that sulfides are
only measured qualitatively. Perhaps sulfide content should be measured with more
precision, as in the case of chloride content. This simply means that it may be
insufficient to qualify a soil as containing either no sulfides (N), trace amounts of sulfide
(T), or a lot of sulfides (P). In the case of chloride content, it was observed in Section
4.1.2 that the variable characteristics, prior to the logarithmic transformation, were simply
unacceptable. The variable Chloride did not exhibit normality, and could not be used in
regression analyses. Like the chloride content, sulfide content is a concentration, and it
may be necessary to perform a similar transformation on this variable before it can be
used. This will be discussed further in a later section.
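The effect of a logarithmic transformation on a concentration variable can be illustrated with the sample skewness, which is zero for a symmetric distribution. The concentrations below are hypothetical and the base-10 logarithm is an assumption; the base used to form LChl is not stated in this section.

```python
import math


def skewness(values):
    """Moment-based sample skewness; 0 for a perfectly symmetric sample."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical concentrations spanning several orders of magnitude.
concentration = [5, 8, 12, 20, 35, 60, 110, 250, 600, 1500]
log_concentration = [math.log10(c) for c in concentration]

raw_skew = skewness(concentration)
log_skew = skewness(log_concentration)
```

For data of this shape the raw values are strongly right-skewed while the logged values are much closer to symmetric, which is the behavior the text relies on when it suggests treating sulfide content the same way as chloride content.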

4.1.7 Variables Retained For Further Analyses

Up to this point, all of the discrete variables were included in the analyses.
However, it has become clear that some variables perform better than others. From this
point on, only the variables which are considered useful in predicting the dependent
variable CorrRate will be retained. This step is one of "cleaning-up". Many different
variables were measured during the experimental portion of this project, but the reason
they were measured should be remembered. Not all of the variables can provide
information on the phenomenon measured by the variable CorrRate.

The first step is to choose between the pH variables. Which one of the two
represents most accurately the conditions under which the variable CorrRate was
obtained? pHdir was obtained by testing the soil as it was received in the laboratory,
whereas pHsat was measured when the soil was supersaturated. It is felt that the method
used to obtain pHsat is unacceptable. When the soil is mixed with water at a ratio of 1:1,
the soil is greatly beyond saturation. However, the soil tested for CorrRate is just barely
saturated. In most cases, the soil received in the laboratory is already moist, and only a
small amount of water is added prior to testing for CorrRate. For this reason, pHdir is
considered to be the more representative variable of the two.

In the case of Reddir vs. Redsat, it is felt that Redsat is not a good measure of the
oxidation-reduction potential of the soil because of the large quantity of water added to
the soil prior to testing. The water added contains a certain amount of oxygen which will
influence the reading. For this reason, the variable Reddir is considered the more
representative variable of the two.

In the case of LResdir and LRessat, it was decided to include both variables in
future analyses. Although LRessat is the variable which represents the condition of
CorrRate testing most accurately, the variable LResdir appears to introduce information
that LRessat does not. It is suspected that LResdir represents some inherent property of
the soil which influences its resistivity. This suspicion arises from the fact that the
majority of the soils obtained in the laboratory are already moist, i.e. they have a very
similar moisture content. However, they do not have a moisture content which optimizes
their conductive properties until they are saturated, i.e. until the movement of ions is
optimized. It may be possible that LResdir measures some conductive property of the soil
which is not the result of the movement of the ions in the soil. It is further suspected that
the property represented by LResdir may be the soil type, or the soil content. Perhaps
certain soil particles are inherently more conductive than others and LResdir
measures this phenomenon. It is for this reason that both LRessat and LResdir are
retained for further analyses.

In summary, the following variables are retained for further analyses: CorrRate,
LChl, pHdir, LResdir, LRessat, Reddir, and Soiltype.

4.2 Consideration of Chlorides in Predicting the Corrosion Rate

4.2.1 Determining Significance

Now that the general behavior of each variable is understood, it is time to
determine the significance of the chosen variables. The term significance refers to the
statistical significance of a variable. A variable is considered significant if the
information provided by this variable is sufficiently important, such that its addition to a
set of other variables increases the ability of the set to explain the phenomenon under
consideration. Determining significance consists of studying the results of a series of
ANOVA tables and determining, at each step, whether the variable added is significant.
An ANOVA table permits rapid calculation of the F-ratio and the correlation coefficient,
R² (see Section D.4).

The variables on which attention is focused are the following: pHdir, LChl,
LRessat, LResdir and Soiltype. The goal of the analysis is to answer the following two
questions: 'Is the variable LChl necessary when LRessat is already included in the
model?', and 'Does the variable LResdir provide the same information as Soiltype and, if
so, should it be included in a model containing the variable Soiltype?'.

The following information is entered in the ANOVA table, and is required to
determine significance of a model Ω:
• A benchmark model, ω, to which model Ω is compared,
• The error sum of squares, SSE, of each of the two models,
• The degrees of freedom, DOF, of each of the two models, and
• The critical F-ratio with which the calculated F-ratio is compared.

The SSE and the DOF of the possible models are listed in Table 4.10. The critical
F-ratio is obtained from Table D.1. For the sake of simplicity, the critical F-ratio will
be taken as 4.00 when we are comparing two models with a difference of 1 DOF, and
3.15 when comparing two models with a difference of 2 DOF. These values correspond
to an α value of 0.05 and a DOF of 60 for the Ω-model. This is a conservative choice.

Variables in Model (the DOF and SSE columns are not legible in the source)
Intercept
Intercept + Soiltype
Intercept + pHdir
Intercept + LChl
Intercept + Soiltype + pHdir
Intercept + Soiltype + LChl
Intercept + Soiltype + LRessat
Intercept + Soiltype + Reddir
Intercept + Soiltype + LResdir
Intercept + pHdir + LResdir
Intercept + Soiltype + pHdir + LChl
Intercept + Soiltype + pHdir + LRessat
Intercept + Soiltype + pHdir + LResdir
Intercept + Soiltype + pHdir + Reddir
Intercept + Soiltype + pHdir + LChl + LRessat

Table 4.10 Possible Models with Corresponding SSE and DOF Values

A great many ANOVA tables can be constructed from
these variables, but only the tables considered relevant are presented here. The model
tested (Ω) and the model against which it was tested (ω) are presented along with the
F-ratio and the result of the significance test, i.e. whether it is significant or not.
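The comparison of a model Ω with its benchmark ω reduces to a short calculation. The sketch below uses the SSE and DOF values reported in this section for the intercept-only model (ω) and the intercept + Soiltype model (Ω); the function name and argument names are hypothetical, and the ratio labelled r_squared is taken here as the share of ω's SSE removed by Ω, which is an assumption about how the R² column of the ANOVA tables is formed.

```python
def partial_f_test(sse_omega, dof_omega, sse_big_omega, dof_big_omega, f_critical):
    """Compare model Omega (tested) with benchmark omega; return (F, R^2, significant)."""
    dof_difference = dof_omega - dof_big_omega
    sse_difference = sse_omega - sse_big_omega
    ms_difference = sse_difference / dof_difference
    ms_big_omega = sse_big_omega / dof_big_omega
    f_ratio = ms_difference / ms_big_omega
    r_squared = sse_difference / sse_omega
    return f_ratio, r_squared, f_ratio > f_critical


# Values reported in this section: intercept only (omega) vs. intercept + Soiltype (Omega).
f_ratio, r_squared, significant = partial_f_test(
    sse_omega=0.06185, dof_omega=70,
    sse_big_omega=0.05433, dof_big_omega=68,
    f_critical=3.15,  # two models differing by 2 DOF
)
```

The computed ratio agrees, to rounding, with the F-ratio of 4.70 and the R² of 0.12 reported for the Soiltype comparison, and it exceeds the critical value of 3.15.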

The first step of the analysis is to determine if the variable Soiltype is significant.
This variable is selected first because of the need to determine if the variable LResdir
represents some aspect of Soiltype. The following ANOVA table results:

SOURCE                      DOF   SSE       MS        F      R²
Difference                    2   0.00752   0.00376   4.70   0.12
Intercept + Soiltype (Ω)     68   0.05433   0.00079
Intercept (ω)                70   0.06185

The results indicate that the variable Soiltype is indeed significant. The next step is to
determine which variables can be added to Soiltype significantly. Each variable is tested
in turn, and the results indicate that only pHdir, LChl and LRessat can be added, with
variable pHdir performing best.

The next step consists of determining which variables can be added to Soiltype
and pHdir significantly. Each of the remaining variables was tested in turn, and the
results indicate that both LChl and LRessat can be added. As suspected, LResdir cannot
be added to pHdir and Soiltype with significance. It would be interesting to see whether
or not LResdir could have been added to pHdir if Soiltype were not already in the model.
The following ANOVA table results:

The results indicate that LResdir could have been added to pHdir if Soiltype were not
already in the model. It appears that these two variables contribute similar information.
Perhaps LResdir measures some property of the soil independent of conductivity related
to the ion content. LResdir was determined by testing a soil which was not saturated with
water, but was only moist. It may be possible that, in this state, resistivity measures the
conductivity of a soil due to particle charge, conductivity of certain types of soil particles,
or even the air content. Perhaps these soil properties are taken into account by the
variable Soiltype and, as such, one variable is unnecessary when the other is present in
the model.

As stated earlier, both LRessat and LChl can be added to pHdir and Soiltype
significantly, with LChl performing better than LRessat. The question that must now be
answered is whether or not LChl is necessary when LRessat is already in the model, and
vice versa. The corresponding ANOVA table, in which the model pHdir+Soiltype+LChl
+LRessat is compared to the model pHdir+Soiltype+LChl, follows:

SOURCE                                               DOF   SSE       MS        F      R²
Difference                                             1   0.00000   0.00000   0.00   0.00
Intercept + Soiltype + pHdir + LChl + LRessat (Ω)          0.04032   0.00062
Intercept + Soiltype + pHdir + LChl (ω)                    0.04032

It is very clear that the variable LRessat need not be added to the model containing
LChl. Is the addition of LChl to a model containing LRessat necessary? The following
ANOVA table illustrates the situation:

The results indicate that LChl need not be added to the model when LRessat is already
included. Although either one of the two variables can be added to the pHdir+Soiltype
model, when one of the two variables is included in the model, the other is not needed.
The better variable is LChl, but it is also a variable which is much more difficult to
obtain. As described in Chapter 2, obtaining the chloride concentration takes more
time and effort. It was one of the objectives of this thesis to determine
whether or not the determination of chloride concentration is essential to estimate the
corrosivity of a soil. If so, a suggestion would be made to incorporate the chloride ion
concentration into the existing grids used to estimate corrosivity, i.e. PACE and AWWA.

It was initially suspected that the concentration of conductive chloride ions was
already incorporated in the resistivity measurement, but the effect of chloride ions is so
important that further research is warranted. In fact, it is strongly believed that the effect of
chloride ions is very important and that, even though the present results show that
chloride ion concentration need not be added to the existing grids, the measurement of
chloride concentration provides invaluable information about the potential corrosion
problem.

In conclusion, the model which appears to provide the most information with the
least amount of redundancy is the following: Soiltype + pHdir + LRessat. For the
moment, it is suggested that the variable LChl need not be added to the existing
corrosivity grids. However, it must be remembered that the variable LChl performed
better than the variable LRessat, and that the only reason the suggestion of replacing
LRessat with LChl was not made is that the chloride content is more difficult and
time consuming to obtain. If a soil testing laboratory is equipped to test for the chloride
ion content, then it is recommended that it do so. The information provided by this
parameter can be invaluable in certain cases.

4.2.2 The Effect of Removing Outliers

Outliers can have a very large influence on the results obtained using regression
analysis. For this reason, outliers must be identified and studied carefully. It is incorrect
to eliminate an outlier from a data set simply because its behavior is different from
the rest of the data. In fact, in a data set made up of 150 observations, the presence of
one or two deviant observations is not unusual. A popular way of dealing with outliers is
to perform two separate studies, one including the outliers and the other excluding them.
The results are then compared and the final conclusions are drawn.

In this case, the outliers have been identified as observations #72 and #149.
These observations have unusually high corrosion rates which the variables studied were
unable to explain fully. This does not mean that they must be eliminated from the set.
On the contrary, these two soils should be studied further because they may provide
information unlike all the other soils. However, for the sake of completeness, the analysis
was repeated on the data set without the two outliers.

The conclusions drawn from the results obtained are very similar to those
obtained from the complete data set. Certain differences did appear and they warrant
some attention. These differences are:
• The variable Soiltype does not appear to be very significant, and is not included in the
  final model.
• In general, the variable LRessat performs more poorly and, as a consequence,
• The variable LChl appears to be even more important than before.

The conclusion that would have been drawn from the above data set would be that
the variable LRessat is not useful in determining corrosivity, and that only LChl and
pHdir can predict the corrosivity of a soil. These results are difficult to accept. How can
the variable LRessat be useless? It is well known that the resistivity of a soil is a key
indicator of its corrosivity. Then why is this parameter badly represented in this data set?
Furthermore, can the two deviant observations be eliminated without further study? It is
felt that, in this preliminary analysis, the outliers should not be ignored. It is also
suggested that these two soils be studied further to determine what variable(s), which
have not been studied up to this point, are responsible for the corrosive nature of the soils.

Finally, it has already been determined that the variable LChl performs better than
LRessat. The goal of this report was not to convince the industry to begin testing for
chloride content. Any legitimate corrosion testing company is already aware of the
importance of chloride ions in the corrosion process, and is most probably already testing
for this parameter. The goal of this report was to determine whether or not the
suggestion should be put forward to add this parameter to the existing grids. The results
obtained up to this point do not suggest this conclusively.

4.3 Power Analysis

The power of a statistical test is the probability of finding a variable significant
when it is in fact so. In cases when a variable is not significant, it is important to
determine the power of the statistical test. A low power may be the reason why a variable
did not prove to be significant, and consequently, the analyst may choose to disregard the
results obtained. In this case, the variable LChl proved not to be significant when the
variable LRessat was already in the model. The power will therefore be checked to
ensure that the probability of finding the variable significant is adequate. A power of 0.70
is generally considered acceptable.

When the variable LChl was added to the model consisting of pHdir, LRessat and
the two soil type variables, the values of the power parameters were as follows:

K1=4

ks= 1

K = 5

The value of N, the number of observations, is taken conservatively as 70 and the value
of L is determined to be 9.6 (see Equation D.13). The power is obtained by interpolating
between values obtained from Table D.8. The power of this statistical test is 0.87, which
means that there is an 87% chance of finding LChl significant, if it is so. This result
indicates that the power of the statistical tests is not responsible for finding LChl
insignificant when LRessat is already in the model.
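The tabulated power can be cross-checked numerically. The sketch below is not the interpolation in Table D.8; it swaps in a large-sample normal approximation for a test of a single added variable, and it assumes a significance level of 0.05, which is not stated in this section. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def approx_power(noncentrality, alpha=0.05):
    """Approximate power of a test of one added variable.

    Large-sample approximation: the test statistic is treated as the
    square of a normal variable with mean sqrt(L) and unit variance.
    """
    normal = NormalDist()
    z_crit = normal.inv_cdf(1 - alpha / 2)   # two-sided critical value
    shift = math.sqrt(noncentrality)
    return normal.cdf(shift - z_crit) + normal.cdf(-shift - z_crit)


power = approx_power(9.6)   # L = 9.6 as quoted in the text
print(f"approximate power = {power:.2f}")
```

With L = 9.6 the approximation agrees with the interpolated value of 0.87 to two decimal places, which supports the conclusion that the test was adequately powered.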

CHAPTER 5:

CONCLUSIONS AND RECOMMENDATIONS

In total, 153 soils were tested for the following: pH, oxidation-reduction
potential, sulfide content, resistivity, soil type, drainage ability, moisture content, and
chloride ion content. Of these, 75 soils were tested using the method of linear
polarization, an accelerated electrochemical test used to evaluate the corrosion rate of
ductile iron embedded in soil. This testing method proved to be a powerful tool in the
evaluation of soil corrosivity, and the applications in this field appear endless (see Section
5.2 for more details on possible future work).

However, certain limitations of linear polarization testing must be remembered.
The corrosion rate obtained using this method is the corrosion rate of the soil as it is
found during testing. Any future changes to the soil, or the presence of any external
influences affecting the corrosion rate, cannot be accounted for by the method of linear
polarization. For example, the following possibilities cannot be accounted for:
• The presence of stray current corrosion,
• Galvanic attack of a ductile iron pipe when connected to copper service laterals,
• The future migration of chloride ions from the surface to the depth of the embedded
  metal, and
• The potential establishment and proliferation of sulfate-reducing bacteria.
In sum, only the corrosivity of the soil itself is measured and, as such, this parameter must
be considered one part of a complete study in the determination of the corrosion potential.

5.1 Summary of Results

Each soil was tested according to the AWWA C105 and PACE 82-3 Standards. In
addition, the chloride ion content was determined and the corrosion rate was obtained
using the linear polarization test. The data were analyzed using the Statistical Analysis
System (SAS) and the following results were obtained:

• The variable which plays the most important role in the prediction of the corrosion
  rate is the pH of the soil. Furthermore, of the two pH testing procedures, saturated vs.
  unaltered, the method in which the soil is tested in its unaltered state proved to be the
  better predictor of the corrosion rate.
• The chloride ion content proved to be an excellent predictor of the corrosion rate.
  Second only to pH, this variable was highly correlated with the corrosion rate, and
  appeared in all the best predictor models. The information provided by the chloride
  ion content and the resistivity of the soil when saturated overlapped and, as such,
  when one variable is included in a predictor model, the other variable is insignificant.
  Although chloride ion content outperformed soil resistivity, the additional information
  provided by this variable was not significant enough to suggest that it be added to a
  model containing soil resistivity. Furthermore, the power of the significance test was
  examined and was shown to be acceptable.
• Soil resistivity was retained instead of the chloride ion content, because this variable
  is currently being used in the industry standards, and it can be determined rapidly and
  easily. Conversely, chloride ion testing is time consuming and requires an
  experienced technician. Furthermore, soil resistivity can be measured in-situ, whereas
  chloride ion content can only be measured under laboratory conditions. For these
  reasons of practicality, it is suggested that chloride ion content not replace resistivity
  in the existing standards.
• The important role of chloride ion content in the prediction of the corrosion rate has
  been established. It is therefore strongly recommended that this variable be tested
  whenever possible. Although not included in the industry standards, the information
  provided by this variable will provide a better understanding of the soil and its
  corrosive properties.
• Soil resistivity proved to be a good predictor of the corrosion rate, although not as
  good as expected. Of the two testing procedures, unaltered vs. saturated, the
  resistivity measured when the soil was saturated with distilled water represented the
  better of the two in predicting the corrosion rate.
• The variable representing the resistivity of the unaltered soil, i.e. measured as
  received in the laboratory, proved to be an insignificant predictor of the corrosion rate
  when the soil type was included in the predictor model. It appears that when the
  resistivity of a moist soil is measured, the result may indirectly represent some
  inherent resistivity which depends on the soil type, e.g. the inherent conductivity of
  clay particles vs. that of sand particles.
• The variable which proved to be the least important in the prediction of the corrosion
  rate is the oxidation-reduction potential of the soil. This is a very surprising result,
  given the importance of this parameter in the corrosion process. It is strongly
  suspected that the method by which the soil is handled, the time the soil is exposed to
  air before testing, and the addition of distilled water all served to alter the potential of
  the soil and, as a consequence, only a small correlation between the
  oxidation-reduction potential and the corrosion rate was observed.
• All statistical analysis steps were performed on two data sets: one containing all
  observations, and another from which outliers were excluded. Although the
  numerical results varied, the conclusions drawn were essentially the same.

5.2 Recommendations for Future Work

• First and foremost, a new set of soil samples should be created in the laboratory. Soil
  samples should have a predetermined pH, chloride content, sulfide content,
  oxidation-reduction potential, resistivity, and clay content. This will enable the
  researcher to study the effect of varying one parameter at a time, which is not possible
  in a set of randomly selected soil samples.
• The accuracy of the method of linear polarization can be studied further. Soil
  samples obtained from sites of pipe failures can be tested using this method, and the
  results can be compared to the actual record of "years to break" or "breaks per year".
  This requires a well organized, long term study in which a minimum of 100 soil
  samples must be analyzed and the background information related to the corroded
  pipe must be gathered. Furthermore, a thorough knowledge of the other influencing
  phenomena will enable the researcher to identify cases of stray current corrosion,
  galvanic attack, and sulfide attack, which may cause the pipe to fail prematurely, and
  which are not measurable using the method of linear polarization.
• The tests used to determine the sulfide content should be investigated further.
  Inconsistencies in the results obtained from the two tests suggest that the tests may be
  improved for future use of the AWWA and PACE standards. It is also strongly
  suggested that the sulfide ion content be determined more accurately in future
  laboratory experiments, since the results obtained using the tests described in this
  report may not be sufficient to represent the true effect of sulfide ions in the corrosion
  process.
• Study the various methods currently being used to determine the chloride ion content
  (e.g. reading potentials, and titration). Determine which method is most accurate,
  which is least subjective, and which is the least subject to human error. This project
  is a very important one because it will create a proper base for further research in the
  area of chloride content. Furthermore, in the testing procedure outlined in this report,
  only the finest particles of soil were retained for the chloride ion test. The effect of
  retaining only the finest particles, as opposed to using a more representative
  specimen, should be studied further.
• Based on the method of linear polarization, develop an in-situ test for corrosion rate.
• Repeat the study undertaken in this report with the following changes:
    • Sulfide test is replaced by the actual sulfide ion concentration,
    • Soils are tested for pH, oxidation-reduction potential and resistivity
      immediately after, or before, linear polarization testing, in order to ensure that
      all tests are performed under the same conditions,
    • Use the chloride ion test which yields the most reproducible results, and which
      is subject to the least human error,
    • Upgrade the soil type classification to include more types of soils, e.g. organic,
      and silt, and
    • Include the various types of metals in the study.
• Consider incorporating temperature and moisture content in the existing grids to
  account for seasonal variations in these factors. Using the method of linear
  polarization, laboratory testing can be done to determine the effect of temperature and
  moisture content on the corrosion rate, and the knowledge of in-situ conditions will
  enable the engineer to determine the potential risk for corrosion with more accuracy.
• Using the method of linear polarization, study the variation of the corrosion rate in
  time. Time-dependent phenomena, such as the development and subsequent
  proliferation of a sulfate-reducing bacteria colony, can be studied by creating the
  proper environment and testing for the corrosion rate at given intervals.

BIBLIOGRAPHY

[1] Parker, M.E., "Corrosion by Soils", NACE Basic Corrosion Course, National
Association of Corrosion Engineers, Houston, Texas, 1969, p. 6-1.

[2] Uhlig, H.H., and Revie, R.W., Corrosion and Corrosion Control, John Wiley &
Sons, New York, 1985.

[3] Funahashi, M., and Young, W.T., "Investigation of E-LOG I Tests and Cathodically
Polarized Steel in Concrete", Proceedings of NACE Conference CORROSION-94,
paper no. 301, National Association of Corrosion Engineers, Houston, Texas, 1994,
p. 301/1.

[4] Sehgal, A.D., Kho, Y.T., Osseo-Asare, K., and Pickering, H.W., "Reproducibility
of Polarization Resistance Measurements in Steel-in-Concrete Systems",
Corrosion, Vol. 48, No. 9, September 1992, p. 706.

[5] Feliu, S., Gonzalez, J.A., Andrade, C., and Feliu, V., "Polarization Resistance
Measurements in Large Concrete Specimens: Mathematical Solution for a
Unidirectional Current Distribution", Materials and Structures, Vol. 22, 1989,
p. 199.

[6] Macdonald, D.D., Urquidi-Macdonald, M., Rocha-Filho, R.C., and El-Tantawy, Y.,
"Determination of the Polarization Resistance of Rebar in Reinforced Concrete",
Corrosion, Vol. 47, No. 5, May 1991, p. 330.

[7] Lavrenko, V.A., and Shvets, V.A., "Determination of the Corrosion Activity of Soil
in Relation to Steel by the Polarization Resistance Method", Institute of Problems
of Material Science, Academy of Sciences of the Ukraine, Kiev, Translated from
Fiziko-Khimicheskaya Mekhanika Materialov, No. 3, May-June 1992, p. 108.

[8] Rogers, W.F., "Statistical Predictions of Corrosion Failures", Proceedings of
NACE Conference CORROSION-89 (New Orleans), paper no. 596, National
Association of Corrosion Engineers, Houston, Texas, 1989, p. 596/1.

[9] Fontana, M.G., and Greene, N.D., Corrosion Engineering, McGraw-Hill, New
York, 1967.

[10] Wakelin, R.G., and Gummow, R.A., "The Effect of Copper on the Corrosion of
Iron Watermains", Proceedings of NACE Conference CORROSION-90 (Las
Vegas), paper no. 383, National Association of Corrosion Engineers, Houston,
Texas, 1990, p. 383/1.

[11] ASTM, Standard Guide for Examination and Evaluation of Pitting Corrosion,
ASTM Specification G 46-94.

[12] Sears, E.C., "Comparison of the Soil Corrosion Resistance of Ductile Iron Pipe and
Gray Cast Iron Pipe", Materials Protection, Vol. 7, No. 10, October 1968, p. 33.

[13] De Rosa, P.J., and Parkinson, R.W., "Corrosion of Ductile Iron Pipe", Water
Research Center External Report TR 241, United Kingdom, October 1986.

[14] Segal, B.G., Chemistry: Experiment and Theory, John Wiley & Sons, New York,
1985.

[15] Zumdahl, S.S., Chemistry, D.C. Heath and Company, Massachusetts, 1989.

[16] Ailor, W.H., Handbook on Corrosion Testing and Evaluation, John Wiley & Sons,
New York, 1971.

[17] Oldham, K.B., and Mansfeld, F., "On the So-Called Linear Polarization Method for
Measurement of Corrosion Rates", Corrosion, Vol. 27, No. 10, October 1971,
p. 434.

[18] Mansfeld, F., and Oldham, K.B., "A Modification of the Stern-Geary Linear
Polarization Equation", Corrosion Science, Vol. 11, 1971, p. 787.

[19] Stern, M., "A Method for Determining Corrosion Rates From Linear Polarization
Data", Corrosion, Vol. 14, 1958, p. 440.

[20] Townley, D.W., "Determination of Maximum Scan Rate for Linear Polarization
Measurements", Corrosion, Vol. 47, No. 10, October 1991, p. 737.

[21] Oldham, K.B., and Mansfeld, F., "Corrosion Rates from Polarization Curves: A
New Method", Corrosion Science, Vol. 13, 1973, p. 813.

[22] ASTM, Standard Practice for Calculation of Corrosion Rates and Related
Information from Electrochemical Measurements, ASTM Specification G 102-89.

[23] ASTM, Standard Reference Test Method for Making Potentiostatic and
Potentiodynamic Anodic Polarization Measurements, ASTM Specification G 5-87.

[24] Fitzgerald III, J.H., "Evaluating Soil Corrosivity - Then and Now", Proceedings of

NACE Conference CORROSION-93 (New Orleans), paper no. 4, National


[25] Stroud, T.F., "Corrosion Control Measures for Ductile Iron Pipe", Proceedings of

NACE Conference CORROSION-93 (Las Vegas), paper no. 585, National


[26] Stevens, J., Applied Multivariate Statistics for the Social Sciences, Lawrence
Erlbaum Associates, Mahwah, New Jersey, 1996.

[27] Draper, N.R., and Smith, H., Applied Regression Analysis, John Wiley & Sons,
New York, 1966.

[28] Montgomery, D.C., and Peck, E.A., Introduction to Linear Regression Analysis,
John Wiley & Sons, New York, 1982.

[29] SAS Institute, SAS/STAT User's Guide, Volume 1, Version 6, SAS Institute Inc.,
North Carolina, 1990.

[30] SAS Institute, SAS/STAT User's Guide, Volume 2, Version 6, SAS Institute Inc.,
North Carolina, 1990.

[31] Cohen, J., "A Power Primer", Psychological Bulletin, Vol. 112, No. 1, 1992,
p. 155.

APPENDIX A:

DERIVATION OF POTENTIAL EQUATIONS

A.1 Equation for φ_N,Zn

The reduction of Zn is represented by the following equation:

$$Zn^{2+} + 2e^- \rightarrow Zn(s) \qquad (A.1)$$

The Nernst potential is obtained by substituting the appropriate values into the
following equation:

$$\phi_N = \phi^0 + \frac{2.303\,RT}{nF}\,\log\frac{[a_{ox}]}{[a_{red}]} \qquad (A.2)$$

For the reduction of Zn, the following values are substituted into Equation A.2:
• R = 8.314 J/deg mole,
• T = 298 Kelvin,
• n = 2 electrons transferred,
• F = 96500 C/eq,
• [a_ox] = [Zn²⁺], and
• [a_red] = [Zn(s)] = 1, because the concentration of a solid is equal to 1.

Once the above values are substituted, Equation A.2 becomes:

$$\phi_{Zn} = \phi_{Zn}^0 + \frac{0.0592}{2}\,\log\,[Zn^{2+}]$$
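The factor 0.0592 in the equation above is simply 2.303 RT/F evaluated with the constants listed for Equation A.2. The sketch below checks this arithmetic; the function name is hypothetical, and the standard potential of -0.76 V and the concentration of 0.01 are illustrative values that do not come from this appendix.

```python
import math

R = 8.314      # gas constant, J/deg mole
T = 298.0      # temperature, Kelvin
F = 96500.0    # Faraday constant, C/eq


def nernst_potential(phi_standard, n, a_ox, a_red=1.0):
    """Nernst potential in volts, following the form of Equation A.2."""
    slope = 2.303 * R * T / (n * F)
    return phi_standard + slope * math.log10(a_ox / a_red)


# Slope for one electron; dividing by n = 2 gives the factor used for Zn.
slope_per_electron = 2.303 * R * T / F

# Illustrative values only: standard potential -0.76 V, [Zn2+] = 0.01.
phi_zn = nernst_potential(-0.76, n=2, a_ox=0.01)
print(f"2.303RT/F = {slope_per_electron:.4f} V, phi_Zn = {phi_zn:.3f} V")
```

The same function applies to the copper and hydrogen reductions in the sections that follow by changing the standard potential and the activity of the oxidized species.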

A.2 Equation for φ_N,Cu

The reduction of Cu is represented by the following equation:

$$Cu^{2+} + 2e^- \rightarrow Cu(s)$$

The Nernst potential is obtained by substituting the appropriate values into the
following equation:

$$\phi_N = \phi^0 + \frac{2.303\,RT}{nF}\,\log\frac{[a_{ox}]}{[a_{red}]} \qquad (A.2)$$

For the reduction of Cu, the following values are substituted into Equation A.2:
• R = 8.314 J/deg mole,
• T = 298 Kelvin,
• n = 2 electrons transferred,
• F = 96500 C/eq,
• [a_ox] = [Cu²⁺], and
• [a_red] = [Cu(s)] = 1, because the concentration of a solid is equal to 1.

Once the above values are substituted, Equation A.2 becomes:

$$\phi_{Cu} = \phi_{Cu}^0 + \frac{0.0592}{2}\,\log\,[Cu^{2+}]$$

A.3 Equation for φ_N,H+

The reduction of H⁺ is represented by the following equation:

$$2H^+ + 2e^- \rightarrow H_2(g)$$

The Nernst potential is obtained by substituting the appropriate values into the
following equation:

$$\phi_N = \phi^0 + \frac{2.303\,RT}{nF}\,\log\frac{[a_{ox}]}{[a_{red}]} \qquad (A.2)$$

For the reduction of H⁺, the following values are substituted into Equation A.2:
• n = 2 electrons transferred for each H₂ released,
• [a_ox] = [H⁺]², and
• [a_red] = partial pressure of H₂(g) = 1 atm, because the reduced species is a gas under
  normal pressure.

Once the above values are substituted, Equation A.2 becomes:

$$\phi_{H^+} = \phi_{H^+}^0 + \frac{2.303\,RT}{2F}\,\log\frac{[H^+]^2}{1}$$

Furthermore, given the following two facts:
• reduction of H⁺ is taken as the baseline potential, i.e. φ⁰_H+ = 0, and
• pH = - log [H⁺],
the final equation becomes:

$$\phi_{H^+} = -0.0592\,\mathrm{pH}$$

APPENDIX B:

TESTING FOR CHLORIDE ION CONCENTRATION

B.1 Creating a Concentration vs. Potential Curve

The first step in creating a concentration vs. potential curve is to record the
potentials of five calibrating solutions of known concentration: 0.01%, 0.03%, 0.33%,
0.65%, and 1.3%. This is done a minimum of two times, and the average potential for
each calibrating solution is calculated. The standard deviations of the concentrations
obtained for each calibrating solution are then compared to the maximum values
permitted: 1.5 for 0.01%, and 1.0 for the remaining solutions. An example of the
potentials obtained during a calibration exercise is presented in Figure B.1.

For each calibration solution, the average potential is plotted versus the known
chloride ion concentration, and a curve, such as the one presented in Figure B.2, is
obtained. The curve which fits the five data points best is an exponential one. The
equation of this curve is presented in the upper right-hand corner of Figure B.2.

The chloride ion concentrations of the soil samples tested are obtained from the
curve or from the equation in Figure B.2. The measured potential of the soil sample is
located on the curve and the corresponding concentration is obtained either from the
equation, or from the curve itself. For example, for a potential of -40 mV, the
following concentrations are obtained:

• From the curve, the concentration is approximately equal to 0.82 %
• From Equation B.2, the concentration is equal to $0.1622\,e^{-0.0404 \times (-40)} = 0.816$ %

The choice of which method to use depends on the degree of precision required.
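The exponential calibration curve can be reproduced from the five points listed in Figure B.1. The sketch below assumes that the curve was fitted by ordinary least squares on the logarithm of the concentration, which is one common way of fitting an exponential; the fitting method actually used for Figure B.2 is not stated, and the function name is hypothetical.

```python
import math

# Calibration points listed in Figure B.1: (potential in mV, chloride in %).
calibration = [(-51.3, 1.3), (-34.8, 0.65), (-16.9, 0.33), (40.2, 0.03), (70.1, 0.01)]


def fit_exponential(points):
    """Fit C = a * exp(b * E) by least squares on ln(C) versus E."""
    n = len(points)
    xs = [potential for potential, _ in points]
    ys = [math.log(conc) for _, conc in points]
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b = sxy / sxx                        # slope of ln(C) against E
    a = math.exp(y_mean - b * x_mean)    # concentration at E = 0 mV
    return a, b


a, b = fit_exponential(calibration)
conc_at_minus_40 = a * math.exp(b * -40.0)
print(f"C = {a:.4f} * exp({b:.4f} * E); C(-40 mV) = {conc_at_minus_40:.3f} %")
```

Under this assumption the fitted leading coefficient agrees with the value of 0.1622 quoted above, and the concentration at -40 mV falls within rounding of the quoted 0.816 %.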

ELECTRODE: JACQUES-CARTIER
DATE: MAY 26, 1995
SAMPLES: JK-01 TO JK-82

[Table of potentials recorded in Trial 1 through Trial 10; individual readings not legible]

VALUES TO BE PLOTTED:

Potential (mV)    Chloride Concentration (%)
-51.3             1.3
-34.8             0.65
-16.9             0.33
 40.2             0.03
 70.1             0.01

Figure B.1 Potentials Obtained for Calibrating Solutions: Series 1

[Plot: SERIES 1, Chloride Concentration (%) vs Potential]

Figure B.2 Calibration Curve for Series 1

APPENDIX C:

TRIALS FOR REPRODUCIBILITY

Trial runs were performed on the various soil samples in order to establish a complete
procedure for testing subsequent soil samples. Soil sample No. 123 is presented for
analysis. The corrosion rate was obtained twice, through two independent sets of tests.
Each test is composed of two parts: the Tafel test, from which the values of βa and βc are
obtained, and the Linear Polarization test, from which the corrosion rate is obtained.

The results indicate that the corrosion rate of a metal sample placed in soil No.
123 is equal to 0.119 mm/yr. This result was obtained from both Trial No. 1 and Trial
No. 2. The results of each test are presented in Figures B.3 through B.6. As can be
seen, the results obtained from the two trial runs are very similar, indicating that the
procedure followed yields reproducible results.

Figure B.3 Trial No. 1: Tafel Results

Figure B.4 Trial No. 1: Linear Polarization Results

Figure B.5 Trial No. 2: Tafel Results

Figure B.6 Trial No. 2: Linear Polarization Results

APPENDIX D: PRINCIPLES OF

REGRESSION ANALYSIS

This appendix presents the techniques used in analyzing the data presented in
Chapter 3: Procedures and Apparatus, and includes the following:
• Data exploration
• Simple linear regression analysis
• Data transformation
• Multiple variable regression
• Categorical data
• Outliers
• Variable Selection
• Model Validation
• Power
• The SAS Statistical Package

D.1 Data Exploration

Prior to beginning statistical analysis of the data using sophisticated computer
packages and advanced statistical techniques, it is very important to be familiar with the
data set. Each variable should be examined individually, and the following quantities
should be obtained for each [26,29]:
• the usual descriptive statistics: number of data points, mean, standard deviation,
  variance, skewness, etc.,
• quantiles including the median,
• stem-and-leaf diagram and box-and-whisker plot, and
• the normal probability plot.

Quantities such as the number of data points, the mean, and the standard deviation
of a variable are easily calculated; they provide considerable information about the data.
They are also the values that will be needed in subsequent calculations. For example, the
number of data points, N, plays an important part in almost every statistical
calculation: it is used to determine significance in the ANOVA table, it is necessary to
calculate the power of a statistical test, and it plays a key role in selecting the number of
variables that will make up an equation.

The quantiles obtained (median, upper and lower hinges, etc.) provide
considerable information about the range of values of a given variable. Are the values all
within a limited range, or are they spread out? Are there any values that are remarkably
different from the general trend? Quantiles are also used to calculate the range outside
which a value is considered an outlier. The identification of outliers is extremely
important in statistical analysis.
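The exploratory quantities described above can be computed in a few lines. The following sketch uses hypothetical sample values, and it assumes the common rule that a value lying more than 1.5 times the interquartile range beyond the quartiles is flagged as an outlier; the exact rule applied by the statistical package used in this study may differ.

```python
import statistics

# Hypothetical sample values, used only to illustrate the calculations.
values = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 3.5]

n = len(values)
mean = statistics.mean(values)
std_dev = statistics.stdev(values)    # sample standard deviation
median = statistics.median(values)

# Quartiles by linear interpolation between the ordered data points.
q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

# Assumed rule: flag values more than 1.5 * IQR beyond the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in values if v < lower_fence or v > upper_fence]
print(f"N = {n}, median = {median:.2f}, outliers = {outliers}")
```

In this illustration only the value 3.5 falls outside the fences, so it would be set aside for closer study rather than discarded.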

Stem-and-leaf diagrams, such as the one presented in Figure D.1, are
constructed with the values of a given variable, and can help summarize the distribution
of the data in a visual way, which is usually easier to understand [26,29]. Furthermore,
such a diagram makes the calculation of the quantiles, quantile ranges and outliers quick
and easy.

Stem  Leaf        #
  16  6           1
  14
  12  61          2
  10  36          2
   8  679         3
   6  01148       5
   4  0           1
   2  0033683     7
   0  2345        4
  -0  75          2
  -2  8497        4
  -4  87          2
  -6  900866      6
  -8  30          2
 -10  0           1
 -12  80          2
 -14  5           1
 -16  5           1
      ----+----+----+----+
Multiply Stem.Leaf by 10**-1

Figure D.1 Stem and Leaf Diagram

It is almost always a good idea to display numerical information graphically
where it is possible. The stem-and-leaf diagram is an excellent tool for the graphical
display of the distribution of the data, but it contains more information than is often
needed. The box and whisker plot aims to display the elementary information (median,
hinges and outliers) graphically [26,29]. A box and whisker plot is illustrated in Figure
D.2.

[Figure: stem-and-leaf diagram of the same variable shown alongside its box and whisker plot]

Figure D.2 Stem and Leaf Diagram and Boxplot

The normal probability plot is a graphical display that permits the analyst to
determine if the data values are distributed normally. The observations are arranged in an
increasing order of magnitude and then plotted against expected normal distribution
values. The plot should resemble a straight line if normality is tenable [26,29]. Figure D.3
shows a normal probability plot.

Figure D.3 Normal Probability Plot
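The coordinates of a normal probability plot can be generated as sketched below. The plotting position (i - 0.5)/N is an assumption, since several conventions exist and the one used by the statistical package is not stated here; the function name and the sample values are hypothetical.

```python
from statistics import NormalDist


def normal_plot_points(observations):
    """Pair each ordered observation with its expected standard normal value."""
    ordered = sorted(observations)
    n = len(ordered)
    std_normal = NormalDist()
    # Assumed plotting position (i - 0.5) / n; other conventions exist.
    expected = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(expected, ordered))


# Hypothetical values for illustration.
points = normal_plot_points([0.3, -1.2, 0.8, -0.1, 1.5, -0.6, 0.1])
for expected_value, observed_value in points:
    print(f"{expected_value:6.2f}  {observed_value:5.1f}")
```

If the observations are close to normally distributed, the printed pairs fall near a straight line when plotted against each other.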

D.2 Simple Linear Regression Analysis

In simple terms, linear regression analysis is the action of fitting a straight line to
the data. Simple regression analysis involves a dependent variable, Y, and an
independent variable, X. For each value $x_i$ observed, there is a corresponding value of $y_i$.
The goal is to derive an equation that will link the values of $x_i$ and $y_i$ with the least
amount of error possible. This is better understood graphically. Figure D.4 shows a plot
of Y vs. X. It is our goal to find the line which runs through these points such that the
vertical distances between the line and the y values are minimized, i.e. the values of $e_i$
are minimized. The equation relating the values of each observation of y and x is of the
following form:

$$y_i = (\beta_0 + \beta_1 x_i) + e_i \qquad (D.1)$$

where $(\beta_0 + \beta_1 x_i)$ is the portion of $y_i$ predicted by the straight line, and $e_i$ is the portion of
$y_i$ that the straight line fails to predict, which is the error or the residual. The term $\beta_0$
represents the intercept, i.e. the value of y at the point where the straight line meets the
Y-axis. The term $\beta_1$ represents the slope of the straight line.

How can one determine if the line represents a good estimate of the relationship between X
and Y? The answer lies in the study of the residuals, $e_i$. The method most commonly
used to calculate the magnitude of the error of the equation is to add the squared values of
each of the individual errors. This value is referred to as the Error Sum of Squares, or SSE:

$$SSE = \sum e_i^2 \qquad (D.2)$$

Figure D.4 Y vs. X Plot

It should be noted that when the errors are squared, the effect of the larger values is
emphasized. The result of this is that outliers, whose residuals are high, can have an
enormous effect on the value of SSE and, consequently, on the best fit line. It is,
therefore, important that outliers not be ignored, but studied closely. Outliers will be
discussed later.

For the case when the line is to be chosen such that the SSE is minimized, there
exists a closed form solution for the parameters $\beta_0$ and $\beta_1$, given by [26,29]:

$$\beta_0 = \bar{y} - \beta_1 \bar{x} \qquad (D.3)$$

$$\beta_1 = \frac{\sum(xy) - (\sum x)(\sum y)/N}{\sum x^2 - (\sum x)^2/N} \qquad (D.4)$$

where $\bar{x}$ = the mean of the x values,
$\bar{y}$ = the mean of the y values,
N = the number of observations of x and y,
$\sum(xy)$ = the sum of the product of x and y for the N observations,
$\sum x$ = the sum of the x values for the N observations,
$\sum y$ = the sum of the y values for the N observations, and
$\sum x^2$ = the sum of the $x^2$ for the N observations.

Besides SSE, other quantities are computed to determine the measure of fit of a
line. The variance of the estimate, $S^2$, represents the average of the squared errors
$\{\sum e_i^2/(N-2)\}$, and the standard error of the estimate, $S_{y.x}$, is simply the square root of the
variance [26,29].
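The least-squares fit and the measures of fit described above can be sketched as follows. The slope and intercept are computed from the sums listed in the text using the standard least-squares expressions; the function names and the data are hypothetical and serve only as an illustration.

```python
import math


def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 from the sums of x, y, xy and x^2."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1


def fit_quality(xs, ys, b0, b1):
    """Return SSE, the variance of the estimate, and its standard error."""
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    variance = sse / (len(xs) - 2)
    return sse, variance, math.sqrt(variance)


# Hypothetical data for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = fit_line(xs, ys)
sse, variance, std_error = fit_quality(xs, ys, b0, b1)
print(f"y = {b0:.2f} + {b1:.2f} x, SSE = {sse:.3f}, S = {std_error:.3f}")
```

The divisor N - 2 in the variance reflects the two parameters, intercept and slope, that are estimated from the data.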

Up to this point, it has been assumed that the variable X has an influence on the
value of Y, and an equation has been chosen which includes the variable X. It is not
always the case that a variable X provides any information about the behavior of the
variable Y.

Figure D.5 illustrates such a case. The slope of the line, $\beta_1$, is very small or even
insignificant. In fact, the straight line which best represents the points seems to be the
horizontal line which runs through the mean value of y, $\bar{y}$. It will not always
be so obvious that a variable X is insignificant in predicting Y. So, how can one
determine if the variable X is significant? This is done by comparing the SSE of the model
including X to the SSE of the model based only on the mean value of y. This last model
is referred to as the benchmark model.

Figure D.5 Example of an Insignificant Predictor

The equation relating X and Y using the benchmark model based only on the mean value of y, i.e. the horizontal line, is the following:

$$y_i = \beta_0 + e_i \tag{D.5}$$

where β0 is equal to y'. The value of the error sum of squares is referred to, in the case of this model only, as SSY.

A measure of fit that is very commonly used is the squared multiple correlation, R². This value represents how well a model predicts Y in comparison to the benchmark model. R² is calculated as follows [26,29]:

$$R^2 = \frac{\mathrm{SSY} - \mathrm{SSE}}{\mathrm{SSY}} \tag{D.6}$$

The value of R² is always positive, and ranges between 0 and 1. For example, R² = 0.10 means that taking X into consideration in predicting Y will result in a 10 % decrease in the error sum of squares, i.e. a 10 % improvement in the prediction of Y. However, considering that this improvement is compared to a model which is based on nothing but the mean, it does not appear to be such an important improvement. Is the improvement important enough to consider the variable X significant? This significance is determined by examining another important parameter in statistics: the F-ratio.

Prior to introducing the F-ratio, it should be mentioned that the benchmark model used in this computation does not have to be based only on y'. A model being tested can be compared to any model that is a submodel of itself. This simply means that the model being tested is an extension of the benchmark model. For example, if it is required to determine whether or not a variable X3 can be added significantly to a model which includes the variables X1 and X2, then the benchmark model is the one that contains X1 and X2, and the model to be tested is the one containing all three. From this point onward, the benchmark model will be termed the ω-model, and the model being tested the Ω-model.

Like R², the F-ratio is a measure of the improvement of one model over another; however, the F-ratio takes into account the number of variables that were added to obtain this improvement. For example, it is certainly better if an investment of $100 yielded a return of $10 000, rather than if this return were obtained from an investment of $1000. The investment made can be compared to the variables added to obtain a better prediction. It is preferable that fewer variables be added and, as such, the significance of a model must be determined by taking into account the number of variables added. This is done by including the degrees of freedom in the equation. The degrees of freedom of a model are equal to the number of observations, N, minus the number of parameters being fitted by the model. The value of the F-ratio is calculated as follows [26,29]:

$$F = \frac{(\mathrm{SSE}_\omega - \mathrm{SSE}_\Omega)\,/\,(df_\omega - df_\Omega)}{\mathrm{SSE}_\Omega\,/\,df_\Omega} \tag{D.7}$$

The ideal situation is a large drop in the SSE accompanied by a small drop in the degrees of freedom. This will result in a large F-ratio. F = 1 is considered the baseline performance. If the ratio is close to 1, this signifies that almost no significant improvement was made, i.e. there has been no return on the investment made. Table D.1 shows the critical values of F that must be obtained in order to consider the Ω-model significant. The term df for Numerator refers to the drop in the degrees of freedom in going from the ω-model to the Ω-model. The term df error refers to the degrees of freedom of the Ω-model. If the F-ratio calculated is larger than the appropriate critical value, the Ω-model is considered significant [26,29].

A tool that allows the analyst to quickly calculate the values of R², F, and $S_{y|x}$, and to check all the relevant information at a single glance, is the analysis of variance table, or ANOVA table.

Figure D.6 shows a typical ANOVA table. The degrees of freedom and SSE of the ω-model and the Ω-model are entered into the table. The degrees of freedom and SSE of the Diff-model, i.e. the difference model, are obtained by subtracting the entries of the Ω-model from those of the ω-model. The mean squares (MS) for each model are obtained by dividing the SSE by the degrees of freedom, and the value of $S_{y|x}$ can be obtained by taking the square root of the mean squares. The R² value is obtained by dividing the SSE of the Diff-model by the SSE of the ω-model, and the F-ratio is obtained by dividing the MS of the Diff-model by the MS of the Ω-model. The ANOVA table permits rapid calculation of the relevant parameters, and presents the information in an organized format.

(1) DOF(Diff) = DOF(ω) − DOF(Ω)
(2) SSE(Diff) = SSE(ω) − SSE(Ω)
(3) MS(Diff) = SSE(Diff) / DOF(Diff)
(4) MS(Ω) = SSE(Ω) / DOF(Ω)
(5) MS(ω) = SSE(ω) / DOF(ω)
(6) F = MS(Diff) / MS(Ω)
(7) R² = SSE(Diff) / SSE(ω)

Figure D.6 ANOVA Table
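The ANOVA bookkeeping lends itself to a short routine. The Python sketch below follows the seven relations of Figure D.6 as described in the text; the function name `anova` and the input values are assumptions chosen for illustration, and the critical value of F would still have to be read from a table such as Table D.1.

```python
def anova(dof_bench, sse_bench, dof_full, sse_full):
    """ANOVA quantities for a benchmark (omega) model versus an extended (Omega) model.

    dof_* are degrees of freedom: number of observations minus fitted parameters.
    """
    dof_diff = dof_bench - dof_full          # (1)
    sse_diff = sse_bench - sse_full          # (2)
    ms_diff = sse_diff / dof_diff            # (3)
    ms_full = sse_full / dof_full            # (4)
    ms_bench = sse_bench / dof_bench         # (5)
    return {
        "dof_diff": dof_diff,
        "sse_diff": sse_diff,
        "ms_diff": ms_diff,
        "ms_full": ms_full,
        "ms_bench": ms_bench,
        "F": ms_diff / ms_full,              # (6)
        "R2": sse_diff / sse_bench,          # (7)
    }

# Illustrative inputs: 45 observations, benchmark with 2 parameters, extended model with 3
table = anova(dof_bench=43, sse_bench=30.0, dof_full=42, sse_full=29.0)
print(round(table["F"], 2), round(table["R2"], 3))
```

With these inputs the extra variable removes only one unit of SSE for one degree of freedom, so F stays close to the baseline value of 1.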

Table D.1 Critical Values for F [26]


D.3 Data Transformations

Linear regression is used to calculate the best-fit equation relating a set of x variables to a set of y variables, i.e. β0, β1, and e_i are determined. Once this equation is obtained, the next step is to determine if this model is significant. When a model is being tested for significance, there are certain assumptions that must be checked in order to ensure that the result obtained is credible. If these assumptions do not hold true, then we cannot depend on the results obtained from the test.

The following three assumptions must be checked [26,29]:

• The set of residuals for all x values is normally distributed, with a mean value of zero and a standard deviation of σ.

As can be seen in Figure D.7, when all the individual residuals, e_i, are ordered and plotted, they must exhibit normality. A normal distribution has a mean of zero and a standard deviation of σ, with only about 5% of the residuals falling outside of 0 ± 2σ. A tool which is very helpful in determining normality is the normal distribution plot, as discussed in Section D.1.

Figure D.7 Normal Distribution

• For each individual x value, the y values are normally distributed, with a mean value of zero and a standard deviation of σ.

As can be seen in Figure D.8, the y values must be distributed normally for each value of x. Figure D.9 shows an example of data failing to meet this criterion. Again, the normal probability plot is a useful tool in determining normality of the y values.

Figure D.8 Normally Distributed Y Values

Figure D.9 Y Values Not Distributed Normally

• The residuals are distributed independently from one another.

This statement implies that the behavior of the residuals is independent from one another, e.g. the reason that one point has a high residual has nothing to do with the fact that another point has a high residual. However, this is not often true. Most commonly, there is another factor, not yet accounted for, which would link the two phenomena.

The first two assumptions usually go hand in hand. If one holds true, usually the other will as well. When the raw data is received, Y is usually plotted versus X and the characteristics of the resulting curve are studied. The ideal situation is that the plot resembles the one presented in Figure D.10a. Ideally, the relationship between X and Y is a linear one. However, this is not often the case. When a non-linear relationship exists, such as those presented in Figures D.10b and D.10c, linear regression cannot be used. Fortunately, with an appropriate transformation most relationships can be made linear.

Figure D.10a Ideal Y vs. X Distribution

The most commonly used transformation is the logarithm of the data, either of the x variable, or the y variable, or both, if necessary. It is generally agreed that the data that responds best to this transformation is data representing physical magnitudes such as weight, temperature, concentration, length, etc. Furthermore, the data must be non-negative, with values which are not very close to zero.

Other transformations include the following [26,29]:

• reciprocal, where x_i becomes 1/x_i: usually for physical measurements,

• square root, where x_i becomes √x_i: usually for frequencies,

• arcsine, where x_i becomes arcsin √x_i: usually for proportions, and

• log odds, where x_i becomes log{x_i / (1 − x_i)}: usually for proportions, where no 0 or 1 values are present.
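The transformations listed above are simple to express in code. The following Python sketch is illustrative only: the function names are assumptions, and the natural logarithm is used because the text does not specify a base. The last lines show how a power-law relationship (here the made-up relation y = 3x²) becomes linear once both variables are log-transformed.

```python
import math

def log_transform(x):
    """Logarithm; requires x > 0 and works poorly for values very close to zero."""
    return math.log(x)

def reciprocal(x):
    """1/x, typically for physical measurements."""
    return 1.0 / x

def square_root(x):
    """sqrt(x), typically for frequencies (counts)."""
    return math.sqrt(x)

def arcsine(p):
    """arcsin(sqrt(p)), typically for proportions 0 <= p <= 1."""
    return math.asin(math.sqrt(p))

def log_odds(p):
    """log(p / (1 - p)), for proportions strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

# A power law y = 3 * x**2 is a straight line of slope 2 in log-log coordinates.
x1, x2 = 2.0, 4.0
y1, y2 = 3.0 * x1 ** 2, 3.0 * x2 ** 2
slope = (log_transform(y2) - log_transform(y1)) / (log_transform(x2) - log_transform(x1))
print(slope)
```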

Any data can be manipulated to eventually appear linear; however, the key to a simple, effective transformation is knowledge of the phenomenon being studied.

Figure D.10b,c Non-linear Relationships Between X and Y

D.4 Multiple Variable Regression

Multiple variable regression involves one dependent variable, Y, and two or more independent variables, X1, X2, etc. The equation relating Y to the X's is of the following form:

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik} + e_i \tag{D.8}$$

The value of the error sum of squares, SSE, is calculated according to the following equation:

$$\mathrm{SSE} = \sum_i \left( y_i - \beta_0 - \beta_1 x_{i1} - \beta_2 x_{i2} - \dots - \beta_k x_{ik} \right)^2 \tag{D.9}$$

The parameters of the equation are obtained easily by making use of certain basic principles of matrix algebra. These lengthy calculations are reserved for software packages such as SAS, which will be introduced in a later section.
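To give a feel for the matrix calculation involved, the sketch below solves the normal equations (X'X)β = X'y in plain Python. It is a teaching illustration under stated assumptions, not the procedure used by SAS: the helper names (`solve`, `fit_multiple`, `sse`) and the data are hypothetical, and the data were constructed so that y = 1 + 2·x1 + 3·x2 exactly.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                   # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_multiple(rows, y):
    """Least-squares coefficients [b0, b1, ..., bk] from the normal equations."""
    X = [[1.0] + list(r) for r in rows]              # leading 1 carries the intercept
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)

def sse(rows, y, beta):
    """Error sum of squares of Equation D.9 for the given coefficients."""
    return sum((yi - beta[0] - sum(b * v for b, v in zip(beta[1:], r))) ** 2
               for r, yi in zip(rows, y))

# Hypothetical observations (x1, x2) and responses y = 1 + 2*x1 + 3*x2
rows = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
y = [1.0, 3.0, 4.0, 6.0, 8.0]
beta = fit_multiple(rows, y)
print(beta, sse(rows, y, beta))
```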

The concepts presented in Section D.1 on simple linear regression also apply to

multiple variable regression. Although more difficult to visualize, multiple regression

can be thought of as fitting a line through a set of points in a three dimensional space, or

one of a higher dimension. The residual can be thought of as the distance in the y-

direction between a point in space and the line.

As in simple regression, the SSE is a measure of the accumulated error of a model predicting Y. In multiple regression, SSE is used to determine which combination of variables best predicts Y. This can be best explained using an example:

The gas consumption (Y) of 45 automobiles is studied. The independent variables considered to best predict gas consumption are the weight of the automobile (W) and the automobile length (L). The analyst must determine which of the two variables best predicts the gas consumption, and whether or not both variables should be used together.

The analyst begins by obtaining the equation, and the SSE, of all the possible combinations of L, W, and the intercept. The following results were obtained:

Table D.2 Information about Possible Models

The results indicate that the best one-variable model is the one consisting of only the weight, because it has the smaller SSE. The best two-variable model is the one consisting of weight + intercept. Finally, the best (and only) three-variable model is the one consisting of all three variables. It is obvious that the lowest SSE is obtained for the model in which all three variables are involved. This will always be the case. However, the analyst must decide whether or not adding a variable produces a decrease in the SSE which is significant. The first step is to decide between the one-variable model and the two-variable model. The following ANOVA table shows all the relevant information:

The F-ratio obtained is larger than the critical F-ratio; therefore the addition of the weight is significant, and the two-variable model is retained. The next step is to compare the two-variable model with the three-variable model to determine if the variable L is significant. The following ANOVA table shows all of the relevant information:

The F-ratio obtained is smaller than the critical F-ratio; therefore the addition of L is not significant. This means that the best model to predict gas consumption is the one containing only weight and the intercept.

It should be noted that, had the one-variable model containing only the intercept been used initially and L tested for addition to that model, the answer would have been affirmative. Continuing the exercise to check whether adding W to the resulting two-variable model would be significant, it would have been found not to be so. The conclusion would have been that the model consisting of automobile length and the intercept was the best model. An explanation of the significance of the various models follows.

When two independent variables, such as L and W, provide redundant information, it is easy to understand that only one of the two variables will be needed in the model. One of the two might be better than the other, as W is in this case, but in the absence of this variable, the second one may provide almost as much information. This concept is called multicollinearity, and it can be better understood with the aid of the Venn diagrams presented in Figure D.11. Figure D.11a shows the case of one independent variable, X. If each of the two circles is of unit area, the shaded area represents R², or $R^2_{yx}$, i.e. the proportion of the variance in Y that can be explained by the variable X. In the case when X is insignificant in predicting Y, the shaded area is very small. Conversely, the higher the correlation between X and Y, the larger is the shaded area.

         Difference   Intercept + W + L (Ω)   Intercept + W (ω)
DOF          1                 42                    43
SSE          1                 29                    30
MS           1                0.69                  0.70
F          1.45 < critical F

Figure D.11b shows the case when two independent variables are involved. The total proportion of the variance in Y that can be explained by the two variables, X1 and X2, is equal to the total shaded area. This value is called the squared multiple correlation, $R^2_{y.12}$. The proportion of the variance of Y accounted for by X2, with X1 partialled out, is indicated by the shaded area in Figure D.11c. This represents the extra information that X2 provides when X1 is already in the equation. It is referred to as the squared partial correlation of X2 with Y, with X1 partialled out, $R^2_{y2.1}$. It is easy to see that the less X1 and X2 overlap, the higher the usefulness of each of the two variables, and the larger is the proportion of Y accounted for on an overall basis [26].

Figure D.11 Venn Diagrams for 1 and 2 Independent Variables [26]

A good tool for examining the extent of overlapping is to study the correlation matrix of a set of variables, including the dependent and the independent variables. One such matrix is presented in Table D.3. The ideal situation is to have high correlations between the Y variable and each of the X variables, and to have low correlations between the X variables themselves. This will most probably result in a large part of the variance of Y being accounted for, i.e. a large growth in R² as each of the variables is added to the model.

Table D.3 Correlation Matrix

D.5 Categorical Variables

The techniques studied so far take only discrete variables into account. Discrete variables represent a specific quantity, e.g. a pH of 7.4 or a chloride content of 4763 ppm. Parameters such as the mean, standard deviation, and SSE can be calculated for such variables, and a linear equation can be determined. But how can this be achieved for categorical variables, which represent a category instead of a specific quantity, e.g. soil type (sand, clay, or sand/clay) or sulfide content (positive, trace, or negative)? This is achieved by "expanding" the categorical variable into an appropriate number of dummy variables. This can be best explained with an example.

The variable Soiltype is a categorical variable with the following classes: sand, clay, and sand/clay. In its present form, the variable cannot be studied in the same way as the discrete variables. For this to be possible, categorical variables such as this one must be "expanded" into a set of dummy variables. The number of dummy variables to be created will depend on the number of classes. In this case three classes exist, and the analyst can choose to use either two or three variables. In the two-variable case, the dummy variables will assume the following values [26]:

Table D.4 Dummy Variables for Soiltype: 2 Variable Case

In this case, the class 'clay' is considered the reference category against which the behavior of 'sand' and 'sand/clay' is assessed. In the three-variable case, no reference category exists, and the dummy variables will assume the following values:

Table D.5 Dummy Variables for Soiltype: 3 Variable Case

Soil type     Sand   Sand/Clay   Clay
Sand            1        0         0
Sand/Clay       0        1         0
Clay            0        0         1

The difference between the discrete and categorical variables is the effect they have on the final equation relating Y to the X's. For example, if the discrete variable 'pH' and the categorical variable 'soiltype' are used to predict the variable 'CorrRate', the following overall equation would result (for the model with three dummy variables):

$$\mathrm{CorrRate} = \beta_0 + \beta_1\,\mathrm{pH} + \beta_S\,\mathrm{Sand} + \beta_{SC}\,\mathrm{Sand/Clay} + \beta_C\,\mathrm{Clay} \tag{D.10}$$

The variable pH has an effect on both β0 and β1, i.e. on the intercept and the slope of the line. However, the dummy variables can be viewed as having a direct effect on only the intercept because they assume a value equal to 0 or 1. For example, for a sand the equation would become:

$$\mathrm{CorrRate} = (\beta_0 + \beta_S) + \beta_1\,\mathrm{pH} \tag{D.11}$$

In essence, the term β_S represents the 'jump' in CorrRate resulting from the soil being a sand. Similarly, β_SC and β_C represent the jumps resulting from a sand/clay and a clay, respectively.
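The expansion of a categorical variable into dummy variables can be sketched as follows. The Python code is an illustration only: the function names, the coefficient values, and the `jumps` dictionary are hypothetical and are not results of this study.

```python
CLASSES = ["sand", "sand/clay", "clay"]

def dummy_code(soil_type, reference=None):
    """Expand a soil type into 0/1 dummy variables.

    With reference=None every class gets its own dummy (three-variable case);
    with a reference class, that class is coded as all zeros (two-variable case).
    """
    columns = [c for c in CLASSES if c != reference]
    return [1 if soil_type == c else 0 for c in columns]

def predict_corr_rate(ph, soil_type, b0, b1, jumps):
    """CorrRate = b0 + b1*pH + (jump of the soil class): the dummy shifts only the intercept."""
    dummies = dummy_code(soil_type)
    shift = sum(jumps[c] * d for c, d in zip(CLASSES, dummies))
    return b0 + b1 * ph + shift

# Hypothetical coefficients, for illustration only
jumps = {"sand": 2.0, "sand/clay": 1.0, "clay": 0.5}
print(dummy_code("sand"))                       # three-variable coding
print(dummy_code("sand", reference="clay"))     # two-variable coding
print(predict_corr_rate(7.0, "sand", b0=1.0, b1=0.5, jumps=jumps))
```

Changing the soil type leaves the slope on pH untouched and moves the line up or down by the corresponding jump.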

The technique of creating dummy variables for coding categorical variables is used to extend the use of multiple regression analysis to include variables that could not be included otherwise. Other techniques are also available, e.g. interaction variables and non-linear combinations relating x and y. These methods did not prove to be useful in this project, but could be beneficial in future research on the subject. References 29 and 30 should be consulted for further information on these techniques.

D.6 Outliers

Outliers are data points that split off, or are very different from the rest of the data. They can occur because of two fundamental reasons: (1) a data recording or entry error was made, or (2) the subjects are simply different from the rest. The first type of outlier

can be identified by always listing the data and checking to ensure that the data has been

entered accurately. The amount of time it takes to list and check the data for accuracy is

well worth the effort, and the computer time is minimal.

Statistical procedures in general can be quite sensitive to outliers. This is

particularly true for the regression techniques. It is very important to be able to identify

outliers and then decide how to consider them. This is quite important, because the results of the statistical analysis must reflect most of the data, and not be highly influenced by just one or two errant points [26].

Outliers can have a very large effect on the correlation coefficients, R². Figure D.12 shows graphically how the inclusion of an outlier can drastically change the interpretation of the relationship between X and Y. In case A, there is no relationship without the outlier, but there is a strong relationship with the outlier. Conversely, in case B the relationship changes from strong, without the outlier, to weak when the outlier is included.

Figure D.12 Effect of Outliers on R² [26]

Besides the graphical method, outliers can be detected by studying z scores. For each variable being studied, the z score can be calculated as follows:

$$z_{ij} = \frac{x_{ij} - \mu_j}{\sigma_j} \tag{D.12}$$

where z_ij = the z score of observation i for variable j,

x_ij = the recorded value of observation i for variable j,

μ_j = the mean value of the observations of variable j, and

σ_j = the standard deviation of the observations of variable j.

If the variable is approximately normally distributed, then z scores with absolute values near 3 should be considered as potential outliers. This is because, in a distribution which is normal, about 99 % of the scores should lie within three standard deviations of the mean. Therefore, any z score value larger than 3 indicates a value very unlikely to occur. Of course, if the number of observations is large (say > 100), then simply by chance, it may be reasonable to expect a few subjects to have z scores of over three. However, the above rule is generally considered reasonable [26].
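A minimal Python sketch of this z-score screening is given below. The pH readings are invented for the example, the function names are assumptions, and the sample standard deviation is used because the text does not say whether the sample or population form is intended.

```python
import statistics

def z_scores(values):
    """z score of each observation: (value - mean) / standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)      # sample standard deviation (an assumption)
    return [(v - mean) / sd for v in values]

def flag_outliers(values, threshold=3.0):
    """Indices of observations whose |z| exceeds the threshold."""
    return [i for i, z in enumerate(z_scores(values)) if abs(z) > threshold]

# Hypothetical pH readings; the last one looks like a recording error
ph = [6.8, 7.0, 7.2, 7.1, 6.9] * 4 + [12.0]
print(flag_outliers(ph))
```

Flagged observations are only candidates: the decision to correct, keep, or drop them still has to be made by examining the data.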

Up to this point, only outliers on the predictor variables, Xj, have been considered. The z scores can also be calculated for the residuals obtained when a model is fitted to the data. These standardized residuals are used for finding observations whose predicted y values are quite different from their actual y values, i.e. they do not fit the model well. As in the previous case, an observation whose standardized residual is greater than three in absolute value is considered an outlier [26].

Alternatively, an outlier can be defined as a point which, if deleted, can produce a substantial change in at least one of the regression coefficients. That is, the prediction equations with and without the point are quite different. A quantity that measures this change is Cook's distance (CD). Unlike the z scores, which identify the outliers on Y or on the X's individually, Cook's distance measures the combined effect of a point being an outlier on Y and on the set of predictors. Cook and Weisberg (1982) indicate that a CD > 1 would generally be considered too large, and would therefore identify probable outliers [26].
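For a simple (one-predictor) regression, Cook's distance can be computed directly from the residuals and the leverage of each point. The sketch below uses the standard textbook expression CD_i = e_i² h_i / (p s² (1 − h_i)²), with p = 2 fitted parameters; the formula is not given in the text above, so it is stated here as an assumption, and the data are made up.

```python
def cooks_distances(x, y):
    """Cook's distance of every observation for the line y = b0 + b1*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    b0 = mean_y - b1 * mean_x
    residuals = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    p = 2                                              # intercept and slope
    s2 = sum(e * e for e in residuals) / (n - p)       # variance of the estimate
    distances = []
    for xi, e in zip(x, residuals):
        h = 1.0 / n + (xi - mean_x) ** 2 / sxx         # leverage of the point
        distances.append(e * e * h / (p * s2 * (1.0 - h) ** 2))
    return distances

# Hypothetical data: the last point lies far from the others in x and off their trend
x = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]
y = [1.0, 2.0, 3.0, 4.0, 5.0, 5.0]
cd = cooks_distances(x, y)
print([i for i, d in enumerate(cd) if d > 1.0])
```

Note that the last point has only a modest residual; it is its leverage that makes its Cook's distance large, which is why a residual check alone would miss it.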

Once the outliers are identified, a decision must be made on whether or not the errant point should be eliminated from the set. This action must not be taken lightly or without serious consideration. If one finds, after further investigation of the outlying points, that an outlier was due to a recording or entry error, then of course the appropriate correction should be implemented and the analysis must be repeated with the corrected data. If the errant data is due to an instrumentation error, then it is legitimate to drop the outlier. However, if neither of these appears to be the case, then one should not drop the outlier, but report two analyses (one including the outlier and the other excluding it). Outliers should not necessarily be regarded as 'bad'. As a matter of fact, it has been argued that outliers can provide some of the most interesting cases for further research [26].

D.7 Variable Selection

The number and type of variables which should be included in a model need to be considered. Most of the methods of model selection are strongly based on the concepts of multicollinearity and semipartial correlations, which were introduced in Section D.4.

Prior to introducing the techniques for model selection, it must be emphasized that the single most important tool in selecting a subset of variables for use in a model is knowledge of the area under study. Furthermore, it is important for the investigator to be judicious in the selection of predictors. If too many variables are used, the prospects of cross validation may be influenced negatively. The analyst can exercise his/her judgment in the creation of new variables from the existing ones. If, for example, the analyst knows that two different variables essentially measure the same thing, a new variable may be created by averaging them, or by adding the z scores of the two. An alternative is the removal of one of the variables from the set.

A quantity which measures the extent to which a variable provides redundant information is the variance inflation factor (VIF), which is based on the calculation of the correlation between the independent variables only. Each independent variable is regressed in turn against the remaining X's, and the correlation is obtained. A high correlation indicates that the remaining X variables account for a large amount of the variation in the variable under study. This means that the variable provides little information that the remaining variables do not already provide, i.e. it provides redundant information. It is suggested that a variable be removed if VIF > 10. Variables should be eliminated one at a time, and the new VIF values should be calculated prior to the removal of any subsequent variables [26].
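The VIF is usually defined as 1 / (1 − R²), where R² comes from regressing the variable under study on the remaining X's; this definition is not spelled out above and is taken here as an assumption. With only two predictors the R² reduces to the squared simple correlation between them, which keeps the following Python sketch short. The data and function names are hypothetical.

```python
import math

def pearson_r(a, b):
    """Simple (Pearson) correlation between two equally long lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    saa = sum((ai - mean_a) ** 2 for ai in a)
    sbb = sum((bi - mean_b) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)

def vif_two_predictors(x1, x2):
    """VIF shared by both predictors when the model contains exactly two X's."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)

# Hypothetical predictor values with a correlation of 0.8
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]
print(pearson_r(x1, x2), vif_two_predictors(x1, x2))
```

Under this definition the VIF > 10 rule corresponds to an R² above 0.90 between a variable and the other predictors.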

The methods most commonly used to select a model are the forward, backward and stepwise selection procedures. All these procedures involve examining the

contribution of a predictor with the effect of the other predictors partialled out, or held

constant. Through the use of semipartial correlations, as was obtained in the ANOVA

tables presented in Section D.4, the correlations among the predictors are disentangled

and the unique variance of each predictor related to the variance of y is determined.

The automobile example of Section D.4 is a good example of the forward selection procedure. The first predictor that enters the equation is the one with the highest simple correlation with y. If this predictor is significant, the predictor with the largest semipartial correlation with y is considered, etc. At some point, a given predictor will not be significant and the procedure will be terminated. In the forward selection procedure, once a variable enters the equation, it is not removed [26].

The stepwise procedure is basically a variation of the forward selection procedure.

However, at each stage of the procedure, a test is made for the least useful predictor. The

importance of each predictor is constantly reassessed, and a predictor that may have been

the best entry candidate earlier may now be superfluous, and is removed.

The backward selection procedure involves the removal of predictors from an equation initially containing all the predictors. At each step, the partial F-ratio is calculated for every predictor. The smallest value is compared to the critical F-ratio, and the appropriate variable is removed. The new equation is computed, and the process continues until all insignificant variables are removed.

The forward, stepwise and backward selection procedures do not necessarily propose the same final model. In general, the stepwise procedure is considered the best of the three methods because it verifies all of the variables at each step and removes the one(s) that are redundant. A mistake that is commonly made by analysts is to consider the final model proposed by these methods as the best model possible. This is not the case. The model proposed may be just one of the many models which provide the best prediction for Y. For this reason, these methods are limited in their use. The one technique that appears to offer the analyst the most choice in the model is the R-square procedure. This technique does not propose a model, but simply lists the 10 best combinations of one-variable, two-variable, three-variable models, etc., ranked according to their overall R² value. The analyst can then compare the various combinations with high R² values, and is free to use his/her judgment in selecting the most reasonable model.

It is generally agreed that the number of variables, k, to be included in an equation depends on the number of observations, n, in the data set. The rule of thumb proposed is to choose k such that n/k > 10 [26]. Another criterion often used is Mallows' Cp. This measure was introduced by Mallows (1973) as a criterion for selecting a model. It measures the total squared error, and chooses the model(s) where Cp = p, where p = k+1. For these models, the amount of underfitting and/or overfitting is minimized, i.e. there are neither too many nor too few predictors in the equation. Mallows' Cp is given in the output file created whenever a SAS program is used to propose a model [26].

It is suggested that all of the above methods be examined individually prior to deciding on a final model. Ultimately, however, it is the researcher's knowledge of the phenomena under study that will ensure that the best and most reasonable model is selected.

D.8 Model Validation

It is crucial for the researcher to obtain some measure of how well the regression

equation will predict on an independent sample of data, i.e. can the equation be

generalized? There are essentially three ways of validating a model: data splitting,

computing the adjusted R2, and the PRESS statistic.

Data splitting involves randomly splitting the available data into two parts (roughly 1/3 and 2/3). The regression equation is derived using the so-called derivation data (2/3), and then applied to the other set of data, the validation data. The predicted values of y for the validation data are compared with the recorded y values, and the correlation between the two sets is calculated. This correlation represents how well the equation works on an independent sample of data [26].

The adjusted R² value measures the shrinkage in predictive power. Shrinkage refers to the decrease in R² as it is measured in the sample with the equation derived from it, versus what it would be in the population as a whole using the same equation. Certainly the equation will not predict as well. The adjusted R² value estimates how well a prediction equation derived from one sample would work on the population sample, i.e. the theoretical sample consisting of all possible data points. It does not indicate how well the derived equation will predict for other samples from the same population. The adjusted R² value of the population is compared to the R² value of the sample, and the percentage of decrease is noted.
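The text does not give the formula for the adjusted R². A commonly used form, assumed here for illustration, is 1 − (1 − R²)(n − 1)/(n − k − 1), where n is the number of observations and k the number of predictors. The Python sketch below applies it to made-up values and reports the percentage of shrinkage; the function names are hypothetical.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k predictors (assumed formula)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

def shrinkage_percent(r2, n, k):
    """Percentage decrease from the sample R-squared to the adjusted value."""
    return 100.0 * (r2 - adjusted_r2(r2, n, k)) / r2

# Hypothetical case: R2 = 0.50 obtained from 21 observations with 2 predictors
print(adjusted_r2(0.50, n=21, k=2), shrinkage_percent(0.50, n=21, k=2))
```

The shrinkage grows as the number of predictors approaches the number of observations, which is consistent with the rule of thumb relating k to n.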

In many cases, there is not enough data to permit random splitting. One can still obtain a good measure of the predictive power by the use of the PRESS statistic (predicted residual sum of squares). In this approach, the y value for each observation is set aside and a predictive equation is derived with the remaining data. This is done for each of the n observations, and as a result, n prediction equations are derived and n true errors are determined. The PRESS statistic is simply the sum of the squares of these errors. Unlike the SSE, the PRESS statistic is more representative of the true error because the equation of the line was obtained without the observation under study, i.e. the line was not fitted to this particular point prior to computing the error.
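The leave-one-out procedure behind the PRESS statistic can be sketched for a one-predictor model as follows. The Python code is an illustration under stated assumptions: the function names and the data are hypothetical, and a simple straight-line fit stands in for whatever model is actually being validated.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1

def sse(x, y):
    """Ordinary error sum of squares: every point helps fit the line it is judged by."""
    b0, b1 = fit_line(x, y)
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

def press(x, y):
    """PRESS: each point is predicted by a line fitted to all the other points."""
    total = 0.0
    for i in range(len(x)):
        b0, b1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        total += (y[i] - (b0 + b1 * x[i])) ** 2
    return total

# Hypothetical observations
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
print(sse(x, y), press(x, y))
```

For a least-squares fit PRESS is never smaller than SSE, so the gap between the two gives a quick impression of how optimistic the in-sample error is.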

D.9 Power

Type I error, or the level of significance (α), is the probability of rejecting the null hypothesis when it is true, i.e. finding a variable to be significant when in fact it is not [26, 31]. The α level set by the experimenter is a subjective decision, but it is usually set at .05 or .01 to minimize the probability of making that kind of error. There is, however, another type of error that can be made in conducting a statistical test: type II error, denoted β, which is the probability of accepting the null hypothesis when it is false, i.e. finding a variable to be insignificant when, in fact, it is significant. Not only can either of these errors occur, but they are inversely related. An example of the two-group problem with 15 observations follows:

Table D.6 Relationship between α, β, and Power [26]

The entries in the last column, (1 - β), are called the power of the experiment. Power
is the probability of rejecting the null hypothesis when it is false, i.e. finding a variable to
be significant when it is. Depending on the circumstance, power analysis can be
undertaken before or after the data have been collected and analyzed. For
example, if a researcher is going to invest a lot of time and money in carrying out a study,
then he or she would certainly want to have a high power, i.e. a high probability of
finding what they are looking for if it is really there. Alternatively, if a researcher has
already completed a study and has found that a certain variable is insignificant, it is
important to know whether or not the power was high enough. If the power was low, the
chances of finding significance may have been too low, and as such, significance was not
found even though it may have been there. A low power may lead the researcher to draw
false conclusions about the significance of a variable [26].
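The inverse relationship between α and β can be illustrated numerically. The sketch below is not the source of Table D.6; it assumes a two-sided two-group comparison with 15 observations per group, a standardized mean difference d = 0.5, and a normal approximation to the test statistic. All three are assumptions made here for illustration, and `two_group_power` is a hypothetical helper name.

```python
from statistics import NormalDist


def two_group_power(alpha, n_per_group, d):
    """Approximate power of a two-sided two-group test (normal approximation).

    d is the standardized difference between the two group means (assumed).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)          # two-sided critical value
    shift = d * (n_per_group / 2) ** 0.5       # expected value of the statistic
    # Probability that the statistic falls in either rejection region.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


# Assumed setting: 15 observations per group, d = 0.5.
for alpha in (0.10, 0.05, 0.01):
    power = two_group_power(alpha, 15, 0.5)
    print(f"alpha = {alpha:.2f}   beta = {1 - power:.2f}   power = {power:.2f}")
```

As α is made smaller, the printed β grows and the power falls, which is the trade-off described in the text.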

The power of a statistical test depends on the following factors [26]:

- The α level set by the experimenter,

- The sample size n, and

- The effect size, i.e. the extent to which the effect of the variable is observable.

Power is heavily dependent on the sample size. For example, for a medium effect
size and α = .05, the power of a test for different values of n is presented in Table D.7.


Table D.7 Relationship between n and Power [26]

As the above example suggests, when a sample size is large, power is rarely a
problem. It is only when small sample sizes are evaluated that power can influence the
results obtained.
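The dependence of power on sample size can also be checked by simulation. The sketch below does not reproduce Table D.7; it assumes two groups drawn from normal populations with unit standard deviation, a standardized difference d = 0.5 taken as a medium effect, and a two-sided z-test at α = .05. The function name `simulated_power` and the sample sizes are hypothetical choices.

```python
import random


def simulated_power(n_per_group, d, reps=1000, z_crit=1.96, seed=0):
    """Fraction of simulated experiments in which the null hypothesis is rejected."""
    rng = random.Random(seed)                  # fixed seed for repeatability
    se = (2.0 / n_per_group) ** 0.5            # std. error of the mean difference
    rejections = 0
    for _ in range(reps):
        group1 = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        group2 = [rng.gauss(d, 1.0) for _ in range(n_per_group)]
        diff = sum(group2) / n_per_group - sum(group1) / n_per_group
        if abs(diff / se) > z_crit:            # two-sided test at alpha = .05
            rejections += 1
    return rejections / reps


# Assumed medium effect (d = 0.5) at three hypothetical sample sizes.
for n in (10, 50, 200):
    print(f"n per group = {n:3d}   estimated power = {simulated_power(n, 0.5):.2f}")
```

The estimated power rises steeply with n, so a true effect that is usually missed with a small sample is almost always detected with a large one.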

The effect size is usually classified as small (f² = 0.02), medium (f² = 0.15), or
large (f² > 0.35) [31]. A large effect size is usually associated with a phenomenon which,
when present, is very easy to detect. In general, the effect sizes of phenomena are
considered medium. The equation relating the sample size n, the effect size f², and the
number of variables in the Ω-model, K, is [31]:

$$n = \frac{L}{f^2} + K + 1 \qquad \text{(D.13)}$$

where L is a parameter which depends on the α value chosen, the difference in the
number of variables between the Ω-model and the ω-model (k_B), and on the power of the
statistical test. L is obtained by consulting tables such as Table D.8 for α = .05 [31].

It is generally considered true that a sample size of 50 or larger is sufficient to
detect a medium effect, i.e. the power of the test would be approximately 0.70.
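The arithmetic behind this rule of thumb can be sketched as follows, reading Equation D.13 as n = L/f² + K + 1. The tabulated values of L are not reproduced here; instead, `approx_L` is a hypothetical helper that uses a large-sample normal approximation valid only when the two models differ by a single variable. It is an assumption of this sketch, and the tabulated L should be used in practice.

```python
from math import ceil
from statistics import NormalDist


def approx_L(alpha, power):
    """Large-sample approximation of L when the models differ by one variable.

    This stands in for the tabulated L and is an assumption of this sketch.
    """
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2


def required_n(L, f2, K):
    """Equation D.13: sample size for effect size f2 and K model variables."""
    return L / f2 + K + 1


L = approx_L(0.05, 0.70)                       # alpha = .05, power = 0.70
print(f"approximate L = {L:.2f}")
for K in (1, 3, 5):                            # hypothetical numbers of variables
    n = required_n(L, 0.15, K)                 # medium effect, f2 = 0.15
    print(f"K = {K}   required n = {ceil(n)}")
```

Under these assumptions the required sample size for a medium effect falls somewhat below 50 for a handful of variables, which is consistent with the rule of thumb stated above.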

D.10 The SAS Statistical Package

The Statistical Analysis System (SAS) was selected for use in this project because [26, 29, 30]:

- It is very widely distributed,

- It is easy to use,

- It can be used for a very wide range of analyses, from very simple statistics to
  complex multivariate analyses, and

- It is a well documented package, having been in development and use for over two
  decades.

Essentially, the SAS program reads a file created by the analyst and performs the
various analyses requested. Structurally, a SAS program is composed of three
fundamental blocks: the statements setting up the data, the data lines, and a series of
procedure (PROC) statements which describe the statistical analyses to be performed on
the data entered [29, 30].

For a list of the procedures and a complete description, it is suggested that the
reader refer to two volumes: the SAS/STAT USER'S GUIDE, VOLUMES 1 and 2 [29, 30].
The volume used most in this project is VOLUME 2, which contains the fundamental
regression procedures. However, it is suggested that both volumes be consulted to fully
understand the scope of this statistical package, and to become familiar with all of the
possible techniques that may be used to analyze the data.

Table D.8 Values of L for α = .05 [31]
