PREDICTION OF SOIL CORROSIVITY
USING LINEAR POLARIZATION
by
EUGENIA KALANTZIS
Department of Civil Engineering and Applied Mechanics
McGill University, Montreal
May 1997
A THESIS SUBMITTED TO THE FACULTY OF GRADUATE STUDIES
AND RESEARCH IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF
MASTER OF ENGINEERING
© Eugenia Kalantzis, 1997
National Library of Canada / Bibliothèque nationale du Canada
Acquisitions and Bibliographic Services / Acquisitions et services bibliographiques
395 Wellington Street / 395, rue Wellington, Ottawa ON K1A 0N4, Canada
The author has granted a non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats.
The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.
ABSTRACT
This report presents the results of a study on the benefit of chloride ion testing in
the prediction of soil corrosivity, which is determined using the method of linear
polarization.
Existing industry standards such as AWWA C105 and PACE 82-3 are currently
being used to evaluate the corrosivity of soils. These standards consist of various tests,
whose results permit the calculation of a corrosivity index. The following tests are
suggested: pH, oxidation-reduction potential, sulfide ion content, resistivity, drainage
ability, soil type, and moisture content. Up to this point, no standards have incorporated
chloride ion testing into their testing procedure, even though the effect of chloride ions on
the corrosion rate is well documented. It is the goal of this project to determine whether
there is enough evidence to suggest that chloride ion content be introduced into existing
standards.
In total, 153 soils were tested following the AWWA C105 and PACE 82-3
standards, as well as for the chloride ion content. Of these, 75 soils were tested using
linear polarization to determine the "true" corrosivity of the soils.
The analysis results showed that the information provided by the chloride ion
content was not significant enough to suggest that this variable be added to the existing
grids. This is due to the fact that soil resistivity, which is a required test in both
standards, accounts for the presence of chloride ions. However, it should be noted that
the chloride ion content is a better predictor of corrosivity than soil resistivity, and it is
suggested that chloride ion content be tested whenever possible.
RÉSUMÉ
This report presents the results of a study on the need to determine the chloride
content when evaluating the corrosivity of soils, the latter being obtained by the
method of linear polarization.
At present, the standards used by industry are based on evaluation grids that
permit the calculation of a corrosivity index, e.g., the AWWA C105 and PACE 82-3
grids. Each evaluation grid is built from a series of tests, notably soil type, pH,
redox potential, sulfide content, resistivity, drainage, and soil moisture. To date, no
standard incorporates the chloride content into its evaluation grid, even though the
effect of chlorides on the corrosion rate is well documented. The objective of this
study is therefore to determine whether the chloride content should be introduced
into the evaluation grids.
In total, 153 specimens of various soils were tested according to the AWWA C105
and PACE 82-3 grids, as well as for chloride content. Of these, 75 were tested by the
method of linear polarization, and the corrosivity of these soils was thereby formally
determined.
Analysis of the test results showed that the additional information provided by the
chloride content was not significant enough to suggest that this parameter be
incorporated into the existing grids. This is because soil resistivity, a measurement
already included in both standards, indirectly represents the chloride content. On the
other hand, it is important to note that the chloride content is a variable that predicts
soil corrosivity better than resistivity does. It is therefore strongly suggested that the
chloride content be evaluated and studied whenever possible.
TABLE OF CONTENTS
ABSTRACT
RÉSUMÉ
LIST OF FIGURES
LIST OF TABLES
NOTATION AND ABBREVIATIONS
ACKNOWLEDGMENTS
1. INTRODUCTION
2. CORROSION AND CORROSION CONTROL
2.1 Principles of Electrochemical Corrosion
2.1.1 Necessary Elements for Corrosion
2.1.2 Physical Forms of Corrosion
2.1.2.1 Uniform Attack
2.1.2.2 Galvanic Attack
2.1.2.3 Crevice Corrosion
2.1.2.4 Pitting Corrosion
2.1.2.5 Erosion Corrosion
2.1.2.6 Selective Leaching
2.1.2.7 Stress Corrosion
2.1.3 Why Do Metals Corrode?
2.1.4 Determining the Rate of Corrosion
2.1.5 The Exchange Current Density, i_0
2.1.6 Determination of i_corr and φ_corr
2.1.6.1 Activation Polarization
2.1.6.2 Concentration Polarization
2.1.7 Effect of Varying Parameters Using Polarization Diagrams
2.1.7.1 PO2 and H+ Concentration
2.1.7.2 O2 Solubility
2.1.7.3 Multiple Corrodents
2.1.7.4 Galvanic Attack
2.1.7.5 Passivity
2.1.7.6 Chloride Content
2.2 Measuring Corrosion Rates
2.2.1 Tafel Extrapolation
2.2.2 Linear Polarization
2.3 Soil Corrosion and Its Effects on Underground Infrastructure
2.3.1 Differential Aeration Cells
2.3.2 Galvanic Attack
2.3.3 Selective Leaching
2.3.4 Stress-Corrosion Cracking
2.4 Standards for Determining Corrosivity of Soils
2.4.1 AWWA C105
2.4.2 PACE 82-3
3. PROCEDURES AND APPARATUS
3.1 Soil Samples
3.2 Soil Type
3.3 Drainage Ability/Moisture Content
3.4 pH
3.5 Oxidation-Reduction Potential (Redox Potential)
3.6 Resistivity, ρ
3.7 Sulfide Content
3.8 Chloride Concentration
3.8.1 Necessary Equipment
3.8.2 Sample Preparation
3.8.3 Electrode Preparation
3.8.4 Preparation of Calibrating Solutions
3.8.5 Calibration of Electrode
3.8.6 Calibration Curve and Equation
3.8.7 Testing Soil Samples
3.8.8 Determination of Concentration of Chloride Ions of Soil
3.9 Linear Polarization
3.9.1 Necessary Equipment
3.9.2 Trial Runs and Reproducibility of Results
3.9.3 Sample Preparation
3.9.4 Preparation of the Working Electrode
3.9.5 Polarization of the Steel Specimen
3.10 Calculating the Corrosivity Indices According to AWWA and PACE
4. ANALYSIS OF EXPERIMENTAL RESULTS AND DISCUSSION
4.1 Analysis of Preliminary Data
4.1.1 Data Exploration
4.1.2 Transformation of Variables
4.1.3 Regression of the Individual Variables
4.1.4 Correlation Matrix
4.1.5 RSQUARE Results
4.1.6 Categorical Variables
4.1.7 Variables Retained For Further Analysis
4.2 Consideration of Chlorides in Predicting the Corrosion Rate
4.2.1 Determining Significance
4.2.2 The Effect of Removing Outliers
4.3 Power Analysis
5. CONCLUSIONS AND RECOMMENDATIONS
5.1 Summary of Results
5.2 Recommendation for Future Work
APPENDIX A: DERIVATION OF POTENTIAL EQUATIONS
A.1 Equation for φ_Zn
A.2 Equation for φ_Cu
A.3 Equation for φ+
APPENDIX B: TESTING FOR CHLORIDE ION CONCENTRATION
B.1 Creating a Concentration vs. Potential Curve
APPENDIX C: TRIALS FOR REPRODUCIBILITY
APPENDIX D: PRINCIPLES OF REGRESSION ANALYSIS
D.1 Data Exploration
D.2 Simple Linear Regression Analysis
D.3 Data Transformations
D.4 Multiple Variable Regression
D.5 Categorical Variables
D.6 Outliers
D.7 Variable Selection
D.8 Model Validation
D.9 Power
D.10 The SAS Statistical Package
LIST OF FIGURES
The Two Stages of Crevice Corrosion
Pitting Corrosion
Erosion Corrosion
Microstructure of Gray Cast Iron
Intercrystalline Crack
Transcrystalline Crack
Stable and Unstable Positions
Schematic of a Cu/Zn Battery
φ vs. log i for Cu/Zn Battery
Polarization Diagram for Corrosion in Acidic Solution
Polarization Diagram for Corrosion in Neutral Aerated Water
Dependence of i_corr on the Value of i_0
Activation Polarization Diagram
Concentration Polarization Diagram
Distribution of H+ in Time
Effect of Varying PO2
Variation of O2 Solubility with NaCl Concentration
Effect of Multiple Corrodents
Galvanic Attack
Polarization Diagram of a Metal Exhibiting Passivity
Variation of i_a and i_c with Potential φ
Tafel Curve
Schematic of Setup for Tafel Test
Tafel Curve Obtained by Varying Q
Tafel Regions
Obtaining i_corr from the Tafel Curve
Linear Polarization Curve
Components of the Working Electrode
Typical Tafel Plot
Typical Linear Polarization Curve
Spreadsheet for Quick Calculation of Corrosivity Indices
SAS Output: Univariate Procedure using pHdir
SAS Output: Univariate Procedure using pHsat
SAS Output: Univariate Procedure using Reddir
SAS Output: Univariate Procedure using Reddir, with Extreme Values Removed
SAS Output: Univariate Procedure using Redsat
SAS Output: Univariate Procedure using Redsat, with Extreme Values Removed
SAS Output: Univariate Procedure using Resdir
SAS Output: Univariate Procedure using Ressat
SAS Output: Univariate Procedure using Chloride
SAS Output: Univariate Procedure using CorrRate
SAS Output: Univariate Procedure using LChl
SAS Output: Univariate Procedure using LResdir
SAS Output: Univariate Procedure using LRessat
SAS Output: Univariate Procedure using LCorr
SAS Output: Univariate Procedure using pHdir Residual
SAS Output: pHdir Residual vs. Predicted Value of CorrRate
SAS Output: Univariate Procedure using pHsat Residual
SAS Output: pHsat Residual vs. Predicted Value of CorrRate
SAS Output: Univariate Procedure using Reddir Residual
SAS Output: Reddir Residual vs. Predicted Value of CorrRate
SAS Output: Univariate Procedure using Redsat Residual
SAS Output: Redsat Residual vs. Predicted Value of CorrRate
SAS Output: Univariate Procedure using LResdir Residual
SAS Output: LResdir Residual vs. Predicted Value of CorrRate
SAS Output: Univariate Procedure using LRessat Residual
SAS Output: LRessat Residual vs. Predicted Value of CorrRate
SAS Output: Univariate Procedure using LChl Residual
SAS Output: LChl Residual vs. Predicted Value of CorrRate
SAS Output: Correlation Matrix
SAS Output: Correlation Matrix for Clay Samples
SAS Output: Correlation Matrix for Sand Samples
SAS Output: Correlation Matrix for Sand/Clay Samples
Potentials Obtained from Calibrating Solutions: Series 1
Calibration Curve for Series 1
Trial No. 1: Tafel Results
Trial No. 1: Linear Polarization Results
Trial No. 2: Tafel Results
Trial No. 2: Linear Polarization Results
Stem and Leaf Diagram
Stem and Leaf Diagram and Boxplot
Normal Probability Plot
Y vs. X Plot
Example of an Insignificant Predictor
ANOVA Table
Normal Distribution
Normally Distributed Y Values
Y Values Not Distributed Normally
Ideal Y vs. X Distribution
Non-Linear Relationship Between X and Y
Venn Diagrams for 1 and 2 Independent Variables
Effect of Outliers on R²
LIST OF TABLES
Electromotive Series
Galvanic Series for Seawater
Typical ρ Values
Soil Type Results
Drainage Ability Results
Moisture Content Results
pH-direct Results
pH-saturated Results
Redox-direct Results
Redox-saturated Results
ρ-saturated Results
ρ-direct Results
Sulfide Content Results Using Iodine Solution
Sulfide Content Results Using HCl and Lead Acetate Paper
Preparation of Calibrating Solutions
Chloride Ion Concentrations
Values Specified for Tafel Test
Values Specified for Linear Polarization Test
Results Obtained for Soil Sample No. 96
Corrosion Rates
Corrosion Indices According to AWWA
Corrosion Rates According to PACE
Values of LChl
Values of LResdir
Values of LRessat
Values of LCorr
Residual Characteristics: pHdir
Residual Characteristics: pHsat
Residual Characteristics: Reddir
Residual Characteristics: Redsat
Residual Characteristics: LResdir
Residual Characteristics: LRessat
Residual Characteristics: LChl
Possible 1, 2, 3, and 4-Variable Models
Possible Models with Corresponding SSE and DOF Values
Critical Values for F
Information about Possible Models
Correlation Matrix
Dummy Variables for SoilType: 2 Variable Case
Dummy Variables for SoilType: 3 Variable Case
Relationship Between α, β, and Power
Relationship Between n and Power
Values of L for α = 0.5
NOTATION AND ABBREVIATIONS
α             Type I error
β             Type II error
[illegible]   Point of intersection of a line with y = 0
[illegible]   Slope of a line
η             Overvoltage
i             Current density
i_a           Current density at the anode
i_c           Current density at the cathode
i_corr        Corrosion current density
i_0           Exchange current density
μ             Mean
ρ             Resistivity
Σ             Summation
σ             Standard deviation
φ             Potential
[illegible]   Nernst potential
[illegible]   Standard Nernst potential
φ_corr        Corrosion potential
[illegible]   Oxidation potential
[illegible]   Reduction potential
[illegible]   Benchmark model in a significance test
[illegible]   Model being tested in a significance test
Ω             Ohm
°C            Degrees Celsius
a             Activity
A             Amperes
ANOVA         Analysis of Variance
a_ox          Activity of species being oxidized
a_red         Activity of species being reduced
atm           Atmosphere
AWWA          American Water Works Association
C             Coulomb
cal           Calories
CD            Cook's distance
Chloride      Variable representing chloride content of soil
cm            Centimeter
CorrRate      Variable representing the corrosion rate of a metal in a soil
Cp            Mallow's number
dec           Decade
deg           Degrees
DOF           Degrees of freedom
Drainage      Variable representing the drainage ability of a soil
[illegible]   Correlation coefficient
[illegible]   Residual
e-            Electron
eq            Equivalent
F             Faraday's Constant
[illegible]   Effect size
I             Current
I_corr        Corrosion Current
J             Joule
K             Kelvin
[illegible]   Number of variables in the R-model
[illegible]   Mass transfer coefficient
kg            Kilogram
L             Liters
LChl          Variable representing the logarithm of the chloride concentration of a soil
LCorr         Variable representing the logarithm of the corrosion rate of a metal in a soil
LResdir       Variable representing the logarithm of the resistivity of a soil, when measured as received in the laboratory
LRessat       Variable representing the logarithm of the resistivity of a soil, when measured after saturation with distilled water
M             Metal
mm            Millimeter
Mn+           Metal ion
Moisture      Variable representing the moisture content of a soil
mV            Millivolt
N             Number of observations
N             Solution normality
P             Number of variables in a model
pHdir         Variable representing the pH of a soil, when measured as received in the laboratory
pHsat         Variable representing the pH of a soil, when measured after saturation with distilled water
PO2           Partial pressure of oxygen
ppm           Parts per million
PRESS         Predicted residual sum of squares
PROC          Procedure statement in SAS
R²adjusted    Adjusted partial correlation
Reddir        Variable representing the reduction potential of a soil, when measured as received in the laboratory
Redox         Oxidation-reduction
Redsat        Variable representing the reduction potential of a soil, when measured after saturation with distilled water
Resdir        Variable representing the resistivity of a soil, when measured as received in the laboratory
Ressat        Variable representing the resistivity of a soil, when measured after saturation with distilled water
Rp            Polarization resistance
Sand/Clay     A soil composed of a mixture of sand and clay particles
SAS           Statistical Analysis System
SSE           Error sum of squares
SulfHCl       Variable representing the result of the sulfide content test using HCl and Lead Acetate Paper
SulfI         Variable representing the result of the sulfide content test using the iodine solution
T             Temperature
V             Volt
VIF           Variance Inflation Factor
wt. %         Weight percentage
X̄             Mean of X
x_d           Dummy variable
X             Independent variable
Y             Dependent variable
Ȳ             Mean of Y
yr            Year
ACKNOWLEDGMENTS
First and foremost, I would like to express my gratitude to my supervisor,
Prof. Saeed M. Mirza, whose unending guidance and encouragement proved invaluable in
the successful realization of this research program.
I am also deeply indebted to Mr. Nourredine Kadoum of COREXCO,
Montreal, for suggesting the research topic, and for devoting considerable attention to its
progress. Furthermore, I am very grateful to Mr. Gérard Benchétrit and to COPEXCO,
Montreal for the unrestricted access to equipment and materials, without which this
project would not have been possible.
Finally, I would like to thank my family and friends for their support and
encouragement.
The research project was supported by the Natural Sciences and Engineering
Research Council's PGS-A Scholarship held by the author.
CHAPTER 1:
INTRODUCTION
The corrosion of underground infrastructure is a very widespread problem.
Structures such as water mains, natural gas pipelines, and gasoline storage containers are
only some of the many structures affected by soil corrosion all around the world. When
a natural gas pipeline or a gasoline storage container fails, there is a high danger of fire
and subsequent explosion. Furthermore, the environmental damage caused by such
failures is often devastating and irreparable. Failure of water mains can be equally
disruptive, as Canadians depend on drinking water for domestic, industrial and fire
fighting purposes. The physical integrity of the water distribution system is an essential
component for the health and economic well being of Canadians.
Every year, $200 million are spent on renewing iron water mains in Canada. The
majority of the problems occur on water mains made of cast or ductile iron, which
account for 70% of the water mains. The fundamental cause of the deterioration of the
pipes is soil corrosion [1]. There is therefore a great need to determine the causes of soil
corrosion, and to establish a quick and easy method of evaluating the corrosivity of soils.
There has been much research done in the field of corrosion and, in particular, soil
corrosion. Certain standards are now in use by the industry to determine the extent to
which a soil is considered corrosive. Standards such as that of the American Water
Works Association (AWWA C105) and PACE 82-3 are widely used to determine
whether or not a metal subjected to a given soil will suffer deterioration. In all of these
standards, certain soil characteristics are measured and a standard grid allows the
technician to calculate a corrosivity index for the soil. The term grid refers to the
established method of calculating the corrosivity index, and is composed of the test
results in combination with the appropriate points allocated to each. However, none of
the standards take into account the chloride ion content of the soil. It has been argued
that chloride ion content is measured indirectly through the measurement of the soil
resistivity, which is incorporated in some form in all the grids. This variable accounts for
the total ion content responsible for the conductive nature of the soil. However, chloride
ions have a dual role in the corrosion process. They not only promote corrosion because
they are conductive by nature, but they also inhibit passivity of the metal, i.e. they inhibit
the formation of an oxide layer on the metal surface which protects the metal from
corrosion [2]. For this reason, it is suspected that the measurement of chloride ion
concentration will permit prediction of the corrosivity of a soil more accurately than is
possible without the knowledge of this parameter. It is the main goal of this research
program to determine whether the chloride ion concentration can provide
information that the variables already being tested in the standards do not provide. If the
answer is affirmative, then this variable can be recommended for incorporation into the
existing grids, or a new grid can be created to adequately account for the soil chloride ion
content.
The linear polarization test (an accelerated electrochemical test which can be
used to evaluate the corrosion rate) will be used to determine the soil corrosivity. This
method has been used extensively in the examination of steel corrosion in reinforced
concrete [3,4,5,6], and has recently been used in the investigation of soil corrosion [7]. In this
project, the variable obtained using the method of linear polarization is considered the
"true" corrosion rate of the pipe in the given soil, and it will be compared with the other
soil characteristics. The following soil characteristics are measured: soil type, drainage
ability, pH, oxidation-reduction potential, sulfide content, resistivity, and chloride ion
content. The above variables are analyzed using the statistical package SAS, and the
relationship between the soil characteristics and the "true" corrosion rate will enable the
analyst to determine the extent to which each soil characteristic predicts the actual
corrosion rate [8].
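The analysis in this thesis is carried out in SAS. Purely as an illustration, the question of whether one additional variable (such as chloride content) adds information beyond the variables already in a model can be framed as a partial F-test between two nested regression models, using each model's error sum of squares (SSE) and degrees of freedom (DOF). The sketch below is written in Python rather than SAS; the function name and the numbers are hypothetical and are not taken from the thesis.

```python
def partial_f_statistic(sse_reduced, dof_reduced, sse_full, dof_full):
    """Partial F statistic comparing two nested regression models.

    sse_reduced, dof_reduced: error sum of squares and error degrees of
        freedom of the smaller (benchmark) model.
    sse_full, dof_full: the same quantities for the larger model, which
        contains every term of the benchmark plus the variable(s) tested.
    """
    extra_terms = dof_reduced - dof_full
    if extra_terms <= 0 or dof_full <= 0:
        raise ValueError("the full model must use more terms than the reduced one")
    if sse_full <= 0:
        raise ValueError("the full model SSE must be positive")
    # Drop in SSE per added term, relative to the full model's error variance.
    return ((sse_reduced - sse_full) / extra_terms) / (sse_full / dof_full)


# Hypothetical numbers, not results from the thesis.
f_value = partial_f_statistic(sse_reduced=120.0, dof_reduced=72,
                              sse_full=100.0, dof_full=71)
print(round(f_value, 2))
```

The resulting value would then be compared with a critical F value at the chosen significance level; a value below the critical one would suggest that the added variable contributes little beyond what the benchmark model already explains.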
The objectives of this project are the following:
• To study the method of linear polarization (applications and limitations), and to
determine the extent to which it can be used in the field of soil corrosion.
• To become familiar with the AWWA and PACE standards for soil testing, and to
outline the limitations and advantages of each standard.
• To study the relationship between the soil characteristics and the corrosion rate of the
soil, and to determine which variables play the most important role in the corrosion
process. What is the role of the variables which are expected to be the most
influential? What is the importance of the chloride ion content of the soil in
predicting the corrosion rate?
• To determine whether the chloride ion concentration provides information that the
variables already being tested in the standards do not provide and, if so, to suggest
that this variable be incorporated into the existing grids or that a new grid be created
to include this variable.
The report is divided into two main sections: the measurement of the soil
characteristics, and the analysis of the collected data using SAS. Chapter 2 introduces the
basic phenomena underlying the corrosion process, and provides the background
information essential to understand the variables being studied and their role in the
corrosion process (Chapter 2: Corrosion and Corrosion Testing). In Chapter 3:
Procedures and Apparatus, the methods and equipment used to measure the various soil
characteristics are presented. The statistical analysis of the experimental data obtained in
Chapter 3 is presented in Chapter 4: Analysis of Experimental Results, and the results are
discussed. Finally, conclusions and recommendations for future work are made in
Chapter 5: Conclusions and Recommendations.
CHAPTER 2: CORROSION AND
CORROSION TESTING
To fully understand the factors that contribute to corrosion in a particular
environment, a thorough knowledge of the various corrosion mechanisms is essential. A
sound knowledge of the basic principles will allow the corrosion engineer to predict the
aggressiveness of a given environment, to alter the environment to decrease its corrosivity
to a particular material, to protect the materials from corrosion, or to choose materials
which will not be affected by the existing aggressive environment.
The basics of electrochemistry with respect to corrosion of metals in aqueous
media are briefly reviewed, along with the information deemed essential to
understanding the variables selected for this study, and their role in the corrosion process.
The second section of this report introduces the reader to the method of linear
polarization, and examines the principles underlying the determination of the corrosion
rate. The following section introduces the causes and effects of soil corrosion, and the
final section discusses the AWWA and PACE standards currently being used by the
industry to determine the corrosivity of soils.
2.1 Principles of Electrochemical Corrosion
2.1.1 Necessary Elements for Corrosion
Corrosion can take various forms, and can occur under different circumstances.
However, there are certain constants in all corrosion processes. Four elements must be
present for corrosion to occur: an anode, a cathode, an electrical conductor, and an ionic
conductor [2,7,9].
The anode consists of a metal (Fe, Cu, etc.) which is oxidized in the presence of
an oxidizing agent, or a corrodent. The metal, denoted by M, undergoes the following
anodic reaction:
$\mathrm{M} \rightarrow \mathrm{M}^{n+} + n\,e^{-}$   Oxidation of metal M   (2.1)
It is the anode that undergoes damage. The metal M dissolves, releasing ions ($\mathrm{M}^{n+}$) and n
electrons. Some examples of metal oxidations are:
The cathode can consist of a metal, or a solution rich in oxygen or hydrogen ions.
While the anode is undergoing oxidation, the cathode is undergoing reduction. During
reduction, the cathode or corrodent is consuming the electrons released by the oxidation
of the metal. The two corrodents that are of major importance are the acidic solution, and
the neutral aerated water (e.g. rainwater or seawater) [2,7,9]. The reduction equations are as
follows:
Acid solution: $2\,\mathrm{H}^{+} + 2\,e^{-} \rightarrow \mathrm{H}_2$   Reduction of H+   (2.3a)
Neutral aerated water: $\tfrac{1}{2}\,\mathrm{O}_2 + \mathrm{H_2O} + 2\,e^{-} \rightarrow 2\,\mathrm{OH}^{-}$   Reduction of O2   (2.3b)
The complete corrosion equation is obtained by combining the equation of the
oxidation of metal M with one of the above reduction equations. In an acidic
environment, the complete equation becomes:
A product of this reaction is hydrogen gas, which can often cause problems such as
hydrogen blistering, or hydrogen embrittlement of metals [2,9].
In neutral aerated water, the complete reaction is as follows:
The term 1/2 O2 in the above equation refers to the dissolved oxygen present in the
water. Furthermore, the products of the above reaction often combine to form a
precipitate:
If the metal M represents iron (Fe), then Fe(OH)2, or rust, is precipitated when oxygen is
the corrodent.
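The displayed equations for the combined reactions did not survive in this transcript. As a sketch only, assuming a divalent metal (n = 2, as for iron) so that the two electrons released in equation (2.1) balance the two consumed in equations (2.3a) and (2.3b), the three reactions described in this section can be written as follows. The exact form and numbering used in the original may differ.

```latex
\begin{align*}
\text{Acidic solution:} \quad
  & \mathrm{M} + 2\,\mathrm{H}^{+} \rightarrow \mathrm{M}^{2+} + \mathrm{H}_2 \\
\text{Neutral aerated water:} \quad
  & \mathrm{M} + \tfrac{1}{2}\,\mathrm{O}_2 + \mathrm{H_2O} \rightarrow \mathrm{M}^{2+} + 2\,\mathrm{OH}^{-} \\
\text{Precipitation:} \quad
  & \mathrm{M}^{2+} + 2\,\mathrm{OH}^{-} \rightarrow \mathrm{M(OH)}_2
\end{align*}
```

With M = Fe, the last line gives the Fe(OH)2 precipitate mentioned in the text.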
Another element essential for corrosion to occur is an electrical conductor, which
allows electrons to move from the anode, where they are released, to the cathode, where
they are consumed [2,7,9]. If this movement of electrons cannot proceed, then the
reduction reaction would stop. Furthermore, the anode would now be negatively charged
due to the presence of the electrons released, and this disequilibrium would stop any
further oxidation and release of electrons [2,7,9].
When a single piece of metal is the site of both the anodic and cathodic
reactions, or when the two sites are located on separate pieces of metal which are in
physical and electrical contact with one another, the metal itself is the electrical
conductor. However, if the two sites are found on separate pieces of metal that are not
in direct contact, then any metal wire connecting the two will act as the electrical
conductor through which the electrons will move [2,7,9].
The last essential element in the corrosion process is an ionic conductor, or the
electrolyte. The electrolyte, which is the aqueous solution in contact with both the anode
and the cathode, allows the movement of ions from the anode to the cathode, thus
ensuring electrical neutrality and allowing the corrosion process to continue [2,7,9].
In summary, the four essential elements of the corrosion process are the anode, the
cathode, the electrical conductor, and the ionic conductor. The anode is the site where
damage occurs as the metal is oxidized and electrons and ions are released. The electrons
travel from the anode to the cathode via the electrical conductor, which is usually the
metal itself, or a metal wire connecting the two sites. The cathode is the site where
electrons are consumed while oxygen or hydrogen ions are reduced. As the ions move from
the anode to the cathode via an ionic conductor, which is an aqueous solution
simultaneously in contact with the anode and the cathode, electrical neutrality is
maintained. Corrosion cannot occur unless all four of these elements are present.
2.1.2 Physical Forms of Corrosion
Corrosion can take various forms. The most common forms of corrosion are the
following [2,9]:
• Uniform attack
• Galvanic attack
• Crevice corrosion
• Pitting corrosion
• Erosion corrosion
• Selective leaching
• Stress corrosion
2.1.2.1 Uniform Attack
Uniform attack is the most common form of corrosion, making up 80-90% of the
cases in practice [2,9]. It is normally characterized by a reaction which proceeds uniformly
over the entire surface of the metal. All points on the surface corrode at a similar rate
because every point acts alternately as an anode and a cathode. There is not one fixed
point acting as the anode, and therefore not one fixed point of deterioration. This form of
corrosion is easiest to predict, and can be prevented or slowed down most easily.
2.1.2.2 Galvanic Attack
Galvanic attack occurs when two different metals are placed in electrical contact
in a corrosive environment [2,9,10]. If the two metals were not in contact with one another,
they would each corrode at their own rate. However, when they are placed in electrical
contact, the more anodic of the two metals suffers accelerated corrosion (anodic reaction)
while the corrosion rate of the more cathodic metal decreases.
In order to determine which of the two metals will corrode, the electromotive
series, which is an ordered list of the elements accompanied by their reduction
potentials, is consulted. Table 2.1 is a reproduction of this series.
Electrode Reaction        Standard Potential φ° (in volts) at 25°C
Au3+ + 3e- = Au            1.50
Pt2+ + 2e- = Pt            ca. 1.2
Pd2+ + 2e- = Pd            0.987
Hg2+ + 2e- = Hg            0.854
Ag+ + e- = Ag              0.800
Hg2(2+) + 2e- = 2Hg        0.789
Cu+ + e- = Cu              0.521
Cu2+ + 2e- = Cu            0.337
2H+ + 2e- = H2             0.000
Pb2+ + 2e- = Pb           -0.126
Sn2+ + 2e- = Sn           -0.136
Mo3+ + 3e- = Mo           ca. -0.2
Ni2+ + 2e- = Ni           -0.250
Co2+ + 2e- = Co           -0.277
Tl+ + e- = Tl             -0.336
In3+ + 3e- = In           -0.342
Cd2+ + 2e- = Cd           -0.403
Fe2+ + 2e- = Fe           -0.440
Ga3+ + 3e- = Ga           -0.53
Cr3+ + 3e- = Cr           -0.74
Cr2+ + 2e- = Cr           -0.91
Zn2+ + 2e- = Zn           -0.763
Nb3+ + 3e- = Nb           ca. -1.1
Mn2+ + 2e- = Mn           -1.18
Zr4+ + 4e- = Zr           -1.53
Ti2+ + 2e- = Ti           -1.63
Al3+ + 3e- = Al           -1.66
Hf4+ + 4e- = Hf           -1.70
U3+ + 3e- = U             -1.80
Be2+ + 2e- = Be           -1.85
Mg2+ + 2e- = Mg           -2.37
Na+ + e- = Na             -2.71
Table 2.1 Electromotive Series
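As a rough illustration of how the series is consulted, the sketch below looks up a few of the standard potentials listed in Table 2.1 and reports the metal with the lower (more negative) reduction potential as the anode of the couple. The helper name `galvanic_couple` is hypothetical, and the result only holds under the assumptions of the electromotive series itself: pure metals at standard conditions.

```python
# Standard reduction potentials in volts at 25 °C, copied from Table 2.1.
STANDARD_POTENTIAL_V = {
    "Cu": 0.337,
    "Pb": -0.126,
    "Sn": -0.136,
    "Ni": -0.250,
    "Fe": -0.440,
    "Zn": -0.763,
    "Al": -1.66,
    "Mg": -2.37,
}


def galvanic_couple(metal_a, metal_b):
    """Return (anode, cathode, potential difference in volts) for two metals.

    The metal with the lower reduction potential is oxidized, so it is the
    anode and suffers the accelerated corrosion.
    """
    e_a = STANDARD_POTENTIAL_V[metal_a]
    e_b = STANDARD_POTENTIAL_V[metal_b]
    if e_a == e_b:
        raise ValueError("equal potentials: no galvanic driving force")
    if e_a < e_b:
        anode, cathode = metal_a, metal_b
    else:
        anode, cathode = metal_b, metal_a
    return anode, cathode, abs(e_a - e_b)


anode, cathode, difference = galvanic_couple("Cu", "Fe")
print(f"anode: {anode}, cathode: {cathode}, difference: {difference:.3f} V")
```

For an iron pipe in electrical contact with copper, the lookup identifies iron as the anode, which is the member of the couple expected to corrode faster.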
A shortcoming of the electromotive series is that it fails to take into account any
alloying, or the effect of the formation of protective films which occur in the various
environments. A more practical alternative to the electromotive series is the galvanic
series, which is specific to a given environment. Table 2.2 indicates the galvanic series
for seawater.
Active (read down)
Magnesium
Magnesium alloys
Zinc
Aluminum 5052H
Aluminum 3004
Aluminum 3003
Aluminum 1100
Aluminum 6053T
Alclad
Cadmium
Aluminum 2017T
Aluminum 2024T
Mild steel
Wrought iron
Cast iron
Ni-Resist
13% Chromium stainless steel, type 410 (active)
50-50 lead-tin solder
18-8 stainless steel, type 305 (active)
18-8, 3% Mo stainless steel, type 316 (active)
Lead
Tin
Muntz metal
Manganese bronze
Naval brass
Nickel (active)
76% Ni-16% Cr-7% Fe (Inconel 600) (active)
Yellow brass
Aluminum bronze
Red brass
Copper
Silicon bronze
5% Zn-20% Ni, Bal. Cu (Ambrac)
70% Cu-30% Ni
88% Cu-2% Zn-10% Sn (composition G-bronze)
88% Cu-3% Zn-6.5% Sn-1.5% Pb (comp. M-bronze)
Nickel (passive)
76% Ni-16% Cr-7% Fe (Inconel 600) (passive)
70% Ni-30% Cu (Monel)
Titanium
18-8 stainless steel, type 305 (passive)
18-8, 3% Mo stainless steel, type 316 (passive)
Noble (read up)
Table 2.2 Galvanic Series for Seawater [2]
2.1.2.3 Crevice Corrosion
Crevice corrosion is highly localized, and reflects the site at which it occurs. As
the name implies, corrosion occurs at crevices (openings of about 1 mm), or at points of
contact between two surfaces [2,9]. The opening is sufficient to allow the corrodent to
enter, but not large enough to allow the corrodent to flow. Corrosion occurs in two
stages, which are illustrated in Figures 2.1a and 2.1b.
Figure 2.1 The Two Stages of Crevice Corrosion
In stage 1, uniform attack occurs in the crevice. However, after some time the
stagnant water is depleted of the dissolved oxygen, and stage 2 begins. Within the
crevice, the reduction reaction cannot proceed because the dissolved oxygen is depleted.
However, the oxidation of the metal continues. The electrons released in the crevice
travel through the metal to a site outside the crevice where dissolved oxygen is present.
The result is that the crevice continuously acts as the anode and suffers corrosion, while
the remaining metal acts as the cathode and suffers no further damage.
The danger associated with this form of corrosion is that it is unpredictable, and
that the damage proceeds undetected because its location is well hidden. Furthermore,
the rate at which the crevice metal deteriorates is quite high when the crevice area is
small with respect to the surface area in contact with the corrodent. This occurs because
the crevice metal (anode) must produce electrons at a rate to satisfy the demand of the
entire cathodic area.
2.1.2.4 Pitting Corrosion
Pitting corrosion is a highly localized form of corrosion. It generally starts on
horizontal surfaces which can hold water under gravity, and at a surface discontinuity
(scratch or dent), and grows downward. As in crevice corrosion, two local sites are
involved [2,9]. The stagnant water within the pit is depleted of oxygen, and the tip of the
pit becomes the anodic site. Electrons move through the metal to the surface of the metal
which is in contact with aerated water (cathodic site) and enable the reduction reaction to
proceed.
Figure 2.2 Pitting Corrosion
Pitting is one of the most destructive forms of corrosion. It can cause equipment
to fail because of perforation and it can be extremely dangerous when it occurs on vessels
whose contents are under pressure. Furthermore, it can be difficult to detect because the
corrosion products often cover the pits, which continue to grow undetected. Figure 2.2
illustrates schematically a metal undergoing pitting corrosion.
2.1.2.5 Erosion Corrosion
Erosion corrosion is normally associated with moving slurries [2,9]. Solids in the
slurry erode (or scrape off) the protective oxide layers which form on metal surfaces.
These protective surface films provide metals such as aluminum, lead, and stainless steel
with their ability to resist corrosive environments. Corrosion occurs in the areas where
the protective layer has been scraped off. The exposed metal is anodic to the metal
protected by the surface film and, therefore, suffers corrosion as shown in Figure 2.3.
This form of corrosion is usually accompanied by surface striations, i.e. grooves
following a distinct direction.
Figure 2.3 Erosion Corrosion
2.1.2.6 Selective Leaching
Selective leaching is the removal of one element from a solid alloy. It occurs
when an alloy is composed of two elements far apart from one another in the
electrochemical series. The more anodic of the two metals will be the anode and will
suffer accelerated corrosion, leaving behind the more cathodic metal [2,9].
An example of a metal subject to selective leaching is brass, which is made up of
copper and zinc. Zinc, the more anodic metal, is "leached out" and the resulting
material is a porous copper matrix.
Another example of selective leaching is the well known phenomenon of graphitization
of gray cast iron. Gray cast iron is composed of a network of graphite within a matrix
of iron or steel. Figure 2.4 shows the microstructure of gray cast iron. The graphite is
in the form of flakes connected in such a way that the material is able to hold its shape
as the iron dissolves [2,9,12,13]. This dissolution occurs because graphite is cathodic to
iron and a galvanic cell develops. Iron dissolves, leaving behind a porous mass
consisting of graphite, voids and rust, which can be easily cut with a knife. In contrast,
the graphite in ductile or malleable irons is in the shape of nodules or spheres, and a
porous matrix cannot form. As such, these materials are not subject to graphitization.
Figure 2.4 Microstructure of Gray Cast Iron
2.1.2.7 Stress Corrosion
Stress corrosion is the result of the combined effect of a weak applied or residual
tensile stress and a weak corrodent [2,9]. Each of these two components alone would not
be problematic, but together they accelerate the rate of corrosion. It has been observed
that, in most cases, no corrosion would occur when a metal subjected to a weak corrodent
is not subjected simultaneously to a tensile stress. Stresses from 5-70% of the yield
stress are sufficient to cause severe damage [2,9]. Another point of interest is that the
corrodent is metal specific, i.e. not all corrodents will affect all metals. For example, a
weak chloride solution will cause severe damage to stainless steels, but will not affect
plain carbon steel at all. In addition, a weak nitrate solution will damage plain carbon
steels, but will not affect stainless steel at all [2,9].
Like pitting corrosion, the crack starts at a surface pit or scratch, and moves
downward. The crack follows an anodic path. One example of an anodic path is that of
zinc in brasses, which are alloys of zinc and copper. An anodic path can also be created
when an element of an alloy precipitates at either the grain boundary or within the grain
itself, leaving one of the two areas anodic to the other. When the grain boundary is
anodic to the grain, the crack is said to be intercrystalline [2,9]. When the grain itself is
anodic to the boundary, then the crack is said to be transcrystalline [2,9]. Figures 2.5a and
2.5b illustrate the difference between the two types of cracks.
When cracks begin to form, the reduced cross-sectional areas are unable to
withstand the design loads. Furthermore, solid corrosion products which often
accompany the corrosion process cause additional stresses by their expansive nature. As
the cracks grow under the combined action of corrosion and stress, the tensile stress in the
uncracked section grows exponentially and can lead to sudden unexpected failures [2,9].
Figure 2.5a Intercrystalline Crack    Figure 2.5b Transcrystalline Crack
2.1.3 Why do metals corrode?
The electrochemical series indicates whether a metal is more anodic compared to
another, and it can provide the potential $\phi$ of a reaction. But what does this potential
represent and why does a metal corrode in the first place?
Corrosion of a metal occurs because of the element's tendency to attain the natural
state, which is the ionic form. The metallic form of most elements is unstable, and there
is a potential for these metals to be oxidized:
$M \rightarrow M^{n+} + n\,e^-$   (oxidation)   (2.1)
where M is the unstable metallic form and $M^{n+}$ is the natural state (stable, as in the ore).
This potential can be compared to the potential energy of a sphere when held at an
elevated position [9]. As seen in Figure 2.6, at position 1 the sphere equilibrium is
unstable and it possesses potential energy. Some of this energy will be used up as the ball
moves to position 2, a point of lower potential energy. This is the spontaneous direction
for this particular system. Movement from position 2 to position 1 would not occur
spontaneously in nature. Energy from an external source must be provided for such a
movement to occur.
Figure 2.6 Stable and Unstable Positions
Similarly, electrochemical reactions are accompanied by a potential $\phi$, indicating
the potential for the reaction to proceed spontaneously. It must be noted that the
potential $\phi$ is not an absolute value, but a relative one. Potentials of reactions are
always measured with respect to a standard. The standard most often used is the reduction
of hydrogen ions:
$2H^+ + 2e^- \rightarrow H_2$, where $\phi = 0.000$ V   (2.3a)
By convention, the value of the potential of this equation is chosen to be equal to
zero volts, and the potentials of other reactions are measured against this standard.
Another convention adopted is the use of reduction potentials, $\phi_{red}$, instead of oxidation
potentials, $\phi_{ox}$, in tables such as those for the electrochemical and galvanic series. To obtain
the potential of a particular oxidation reaction, the value of $\phi_{red}$ is simply multiplied by -1.
For example:
The potentials listed in Tables 2.1 and 2.2 are termed half-cell potentials, because
they accompany only half of the overall reaction. A complete reaction is made up of two
reaction halves. One reaction-half is a reduction reaction, and the other is an oxidation
reaction. For example, for the following two reaction halves:
It is useful to determine which of the two reaction-halves will be reversed such
that the potential of the entire system will be non-negative, i.e. the reaction will proceed
spontaneously. It is easily noted that if Equation 2.7a is reversed, the total potential of
the system will be equal to $\phi_{ox} + \phi_{red} = -(-0.440\ \mathrm{V}) + 0.000\ \mathrm{V} = 0.440\ \mathrm{V}$,
which is positive. The system will spontaneously behave according to the following equation:
When the two reaction halves are combined, the one with the smaller reduction potential
will be reversed and the element will undergo oxidation.
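The rule above can be sketched in a few lines of Python. This is only an illustration: the function name `combine_half_cells` is hypothetical, and the two inputs are the reduction potentials quoted in the text (-0.440 V and 0.000 V). Treating the -0.440 V half-reaction as the iron couple is an assumption, since Equation 2.7a itself is not reproduced here.

```python
def combine_half_cells(phi_red_a, phi_red_b):
    """Return (index of the half-reaction that is reversed, cell potential in V).

    The half-reaction with the smaller reduction potential is reversed
    (it becomes the oxidation), so the overall potential is non-negative.
    """
    if phi_red_a <= phi_red_b:
        return 0, phi_red_b - phi_red_a
    return 1, phi_red_a - phi_red_b


# Reduction potentials quoted in the text, in volts.
# Assumption: -0.440 V belongs to the metal (taken here as Fe2+/Fe),
# 0.000 V to hydrogen ion reduction.
reversed_index, cell_potential = combine_half_cells(-0.440, 0.000)
print(reversed_index, round(cell_potential, 3))
```

With these inputs the metal half-reaction (index 0) is the one reversed, which matches the worked sum in the paragraph above.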
Returning to the two most commonly encountered corrodents, the acidic solution
and neutral aerated water, it is evident from the electrochemical series that the reduction
potentials of both hydrogen ion reduction and oxygen reduction are higher than those of
most metals of interest to engineers:
The combination of one of the above corrodents with a metal whose reduction potential is
lower than that of the corrodent will result in the oxidation, or corrosion, of that element.
2.1.4 Determining the Rate of Corrosion
Examination of the electrochemical or galvanic series enables one to determine
whether or not a metal will corrode in a given environment. But of more interest to the
corrosion engineer is the determination of the rate at which this corrosion will proceed.
Corrosion rates are determined by studying the polarization behavior of the two
reaction halves. As seen previously, the two reaction halves are the following:
ANODIC REACTION: $M \rightarrow M^{n+} + n\,e^-$   (oxidation of metal M)
CATHODIC REACTION:
Acid solution: $2H^+ + 2e^- \rightarrow H_2$   (reduction of H+)
OR
Neutral aerated water: $\tfrac{1}{2}O_2 + H_2O + 2e^- \rightarrow 2OH^-$   (reduction of O2)
In order to fully understand the above corrosion system, an analogy will first be
made with the copper/zinc battery [2,7,9,14,15]. As seen in Figure 2.7, a Cu/Zn battery is
made up of a copper rod immersed in a solution of $Cu^{2+}$ ions (a solution of $CuSO_4$), and a
Zn rod immersed in a solution of $Zn^{2+}$ ions (a solution of $ZnSO_4$). The two solutions are
connected by a diaphragm which allows the passage of ions, ensuring electrical neutrality.
From the electromotive series, it is observed that the reduction potential of Zn is lower
than that of Cu, and therefore Zn will be anodic to Cu and suffer oxidation. The two
resulting equations are:
$Zn \rightarrow Zn^{2+} + 2e^-$   Oxidation   $\phi_{ox} = 0.76$ V   (2.9a)
$Cu^{2+} + 2e^- \rightarrow Cu$   Reduction   $\phi_{red} = 0.34$ V   (2.9b)
Prior to electrical contact of the metal rods, the two separate systems are at
equilibrium, and no corrosion is occurring. Once the two metals at different potentials
are placed in electrical contact, the system will attempt to reach a point of equilibrium
at a potential somewhere between $\phi_{Zn}$ and $\phi_{Cu}$. The driving force of
($\phi_{Cu} - \phi_{Zn}$) volts will cause Zn to be oxidized, and copper to be reduced according
to the above equations.
Figure 2.7 Schematic of a Cu/Zn Battery
When Zn is oxidized, electrons and $Zn^{2+}$ ions are released. The electrons travel
through the wire to the surface of the copper rod, where they combine with the $Cu^{2+}$ ions
from solution, to form Cu. As the electrons travel through the wire, a current is registered
by an ammeter.
In order to study the variation of the potential with current, the system is
manipulated by varying the current permitted to flow through the wire, via various
resistors. Figure 2.8 is obtained by plotting the potential of Cu and Zn versus the
registered current [2,9].
Three distinct points on the diagram are of interest:
1. the open circuit at I=0,
2. a point of restricted current flow, and
3. the short circuit at I=Imax.
Figure 2.8 $\phi$ vs. log I for Cu/Zn Battery
1. Open Circuit
The point on the diagram representing an open circuit is at I=0, i.e. when no
current flows. This represents the behavior of the system when electrical contact is not
provided, and the two metals behave independently. It is observed that the potential of
each of the metals is the Nernst potential, which is defined as [2,9]:
$\phi_N = \phi_N^0 + \frac{2.303\,RT}{nF} \log \frac{a_{ox}}{a_{red}}$   (2.10)
where $\phi_N$ = Nernst potential
$\phi_N^0$ = Standard Nernst potential (equilibrium potential of metal in
contact with its own ions, at unit activity)
R = Gas constant (8.314 J/deg mole)
T = absolute temperature (K)
n = number of electrons transferred
F = Faraday's constant (96500 C/eq)
$a_{ox}$ = activity, or concentration, of oxidized species
$a_{red}$ = activity, or concentration, of reduced species
For the Cu/Zn battery, the Nernst potential is calculated for both the Zn and Cu electrodes.
The derivation of the following equations can be found in Appendix A.
For the Zn electrode at 25 °C ($Zn \rightarrow Zn^{2+} + 2e^-$), Equation 2.10 becomes:
$\phi_{N,Zn} = \phi_{N,Zn}^0 + \frac{0.0592}{2} \log [Zn^{2+}]$   (2.11)
For the Cu electrode at 25 °C ($Cu \rightarrow Cu^{2+} + 2e^-$), Equation 2.10 becomes:
$\phi_{N,Cu} = \phi_{N,Cu}^0 + \frac{0.0592}{2} \log [Cu^{2+}]$   (2.12)
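As a rough numerical check of Equation 2.10, the sketch below evaluates the Nernst potential for the two electrodes of the Cu/Zn battery. The constants R and F are the values listed above; the standard reduction potentials follow from Equations 2.9a and 2.9b. The ion concentrations (0.01 mol/L) and the function name `nernst_potential` are assumptions made only for illustration.

```python
import math

R = 8.314      # gas constant, J/(K*mol), as listed in the text
F = 96500.0    # Faraday's constant, C/eq, as listed in the text


def nernst_potential(phi_standard, n, a_ox, a_red=1.0, temp_k=298.15):
    """Equation 2.10: phi_N = phi_N0 + 2.303*R*T/(n*F) * log10(a_ox/a_red)."""
    return phi_standard + 2.303 * R * temp_k / (n * F) * math.log10(a_ox / a_red)


# Standard reduction potentials implied by Equations 2.9a and 2.9b (V).
PHI0_ZN = -0.76
PHI0_CU = 0.34

# Hypothetical ion concentrations (mol/L), chosen only for illustration.
phi_zn = nernst_potential(PHI0_ZN, n=2, a_ox=0.01)
phi_cu = nernst_potential(PHI0_CU, n=2, a_ox=0.01)

# The factor 2.303*R*T/F at 25 C is the 0.0592 V that appears in
# Equations 2.11 and 2.12.
slope_per_decade = 2.303 * R * 298.15 / F
print(round(slope_per_decade, 4), round(phi_zn, 3), round(phi_cu, 3))
```

Because both ions are given the same concentration here, the open-circuit driving force phi_cu - phi_zn equals the difference of the standard potentials, 1.10 V.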
2. Point of Restricted Flow
Between I=0 and I=Imax, the current is manipulated to flow at a predetermined
rate. Using resistors, the current is allowed to vary and the potential of each metal is
measured and plotted versus the current. From the diagram, the change in potential for
each metal, termed the overvoltage $\eta$, can be calculated as [2,9]:
$\eta_{Cu} = \phi_a - \phi_{N,Cu}$   (2.13)
$\eta_{Zn} = \phi_b - \phi_{N,Zn}$   (2.14)
3. Short Circuit
The point of short circuit is the point when the current is allowed to flow
unrestrictedly, i.e. the resistance R=0 and the current I=Imax. The current is governed only
by the potential difference of the system. This situation is the free corrosion situation.
The equilibrium potential is called the free corrosion potential, $\phi_{corr}$, and the current
associated with $\phi_{corr}$ is the corrosion current, $I_{corr}$.
The above analogy can serve to better understand the two corrodents that are most
commonly encountered by corrosion engineers:
• the reduction of H+ ions in an acidic solution, and
• the reduction of O2 in a neutral aerated solution.
When a metal M is placed in an acidic solution, the following reactions occur:
Since the reduction potential of H+ is larger than the reduction potential of most
metals of interest, the metal will undergo oxidation (anodic reaction). The following
diagram will result:
Figure 2.9 Polarization Diagram for Corrosion in Acidic Solution
When the system is allowed to corrode freely, the potential $\phi_{corr}$ and the corrosion
current $I_{corr}$ will apply. Furthermore, the Nernst potential of H+ reduction becomes [2,9]:
$\phi_{N,H^+} = -2.303\,\frac{RT}{F}\,\mathrm{pH}$
It is very interesting to note that, in the case of corrosion in acidic environments,
the Nernst potential depends only on the temperature and on the pH. The derivation of
the above equation can be found in Appendix A.
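To make the pH dependence concrete, the following sketch evaluates the relation $\phi_{N,H^+} = -2.303\,(RT/F)\,\mathrm{pH}$, which is quoted in Section 2.1.7.1. The function name and the sample pH values are illustrative assumptions; R and F are the constants listed with Equation 2.10.

```python
R = 8.314      # gas constant, J/(K*mol)
F = 96500.0    # Faraday's constant, C/eq


def hydrogen_nernst_potential(ph, temp_k=298.15):
    """Nernst potential of H+ reduction in volts: -2.303 * R * T / F * pH."""
    return -2.303 * R * temp_k / F * ph


# Illustrative pH values: a moderately acidic solution and a neutral one.
for ph in (2.0, 7.0):
    print(ph, round(hydrogen_nernst_potential(ph), 3))
```

The potential falls by roughly 0.059 V per pH unit at 25 °C, so a more acidic solution gives a higher (less negative) Nernst potential for the cathodic reaction.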
When a metal M is placed in neutral aerated water, e.g. rainwater or seawater, the
following reactions occur:
Since the reduction potential of O2 is larger than the reduction potential of most metals of
interest, the metal will undergo oxidation (anodic reaction). The following diagram will
result:
Figure 2.10 Polarization Diagram for Corrosion in Neutral Aerated Water
When the system is allowed to corrode freely, the potential $\phi_{corr}$ and the corrosion
current $I_{corr}$ will apply. Furthermore, the Nernst potential of O2 reduction becomes [2,9]:
$\phi_{N,O_2} = \phi_{N,O_2}^0 + \frac{2.303\,RT}{4F} \log \frac{P_{O_2}}{[OH^-]^4}$
where $P_{O_2}$ = partial pressure of oxygen in the solution.
In the case of corrosion in neutral aerated water, the Nernst potential depends on the
temperature, pH (or pOH), and the partial pressure of oxygen in the water.
2.1.5 The Exchange Current Density, $i_o$
A term that appeared often in the previous polarization diagrams was $i_o$, which
represents the exchange current density. When a metal is in equilibrium with its own
ions, the Nernst potential, $\phi_N$, and exchange current density, $i_o$, apply [2,9]. An example of
this is a Cu rod placed in a solution of $Cu^{2+}$ ions (a solution of $CuSO_4$). The equilibrium
reached is a dynamic one. Although no changes are visible to the naked eye, reduction
and oxidation of the metal are taking place at equal rates. This rate is termed the
exchange current density, $i_o$. Electrons travel through the metal from the anodic to the
cathodic sites, which are continuously changing locations.
In reduction reactions such as H+ and O2 reduction, the exchange current density,
$i_o$, is very sensitive to the condition of the metal surface. Furthermore, the corrosion
current, $I_{corr}$, is highly dependent on the value of $i_o$. As shown schematically in
Figure 2.11, $I_{corr}$ increases as $i_o$ increases. Consequently, the rate at which a metal
will corrode varies as the surface preparation of the metal varies [2,9]. When testing metal
samples to obtain $I_{corr}$, it is very important to ensure that the surfaces of the samples are
prepared consistently, so that variations in $i_o$ will not introduce errors in the
determination of $I_{corr}$.
Figure 2.11 Dependence of $I_{corr}$ on the value of $i_o$
2.1.6 Determination of $I_{corr}$ and $\phi_{corr}$
As mentioned previously, the values of $I_{corr}$ and $\phi_{corr}$ are obtained from the
intersection point between the anodic and the cathodic lines of polarization diagrams.
Up to this point, the diagrams are similar in that each of the two lines is represented by a
straight line. This will be correct in approximately 90% of cases, in which activation
polarization behavior governs [2,9]. However, an alternative to this situation warrants
some attention. This behavior is termed concentration polarization. The shape of the
polarization lines is determined by either concentration or activation polarization.
For the sake of compatibility in calculations, the term $i_{corr}$ is used instead of
$I_{corr}$. The term $i_{corr}$ represents the corrosion current density, and it is directly proportional
to $I_{corr}$, the corrosion current. In fact, $I_{corr} = i_{corr} \times A$, where A is the surface area of the
anode.
2.1.6.1 Activation Polarization
Activation polarization, or Tafel behavior, makes up 90% of the cases, and it
occurs when the rate of a reaction is controlled by the slowest of the steps in the reaction
sequence, i.e. the electrochemistry of the system governs the rate [2,9]. This behavior
occurs in well stirred solutions, where the reaction rate is not limited by the speed at
which a slow species can move through the solution.
In activation polarization, both the reduction and the anodic reactions display
Tafel behavior, i.e. both plot as straight lines on a $\phi$ vs. log $i$ diagram. The Tafel
equation relates the overvoltage $\eta$ to the current density $i$ by the following equation:
$\eta = \beta \log (i/i_o)$   (2.17)
where $\beta$ = Tafel constant, or Tafel slope [2,9]. The values of $\beta$ have been tabulated for the
various metals in different media. Table 2.3 shows some typical values.
Metal    Temperature (°C)    Solution
                             1N HCl
                             0.1N HCl
                             0.1N HCl
                             1N HCl
                             0.1N HCl
                             2N H2SO4
                             1N HCl
                             1N
                             0.01-SN HCl
Table 2.3 Typical $\beta$ Values [4]
Figure 2.12 shows a typical diagram in which activation polarization governs. In
order to solve for $i_{corr}$, the following two equations are solved simultaneously, and the
values of $\phi_{corr}$ and $i_{corr}$ are obtained:
Figure 2.12 Activation Polarization Diagram
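One way to picture the simultaneous solution is to write Equation 2.17 once for the anodic line and once for the cathodic line and intersect them. The sketch below does this in closed form. It is an assumption-laden illustration, not the thesis's own procedure: the function name, the sign convention (positive Tafel slope magnitudes for both lines) and every numerical input are hypothetical.

```python
import math


def tafel_intersection(phi_n_anode, beta_a, i0_anode,
                       phi_n_cathode, beta_c, i0_cathode):
    """Intersect two Tafel lines and return (phi_corr, i_corr).

    Anodic line:   phi = phi_n_anode   + beta_a * log10(i / i0_anode)
    Cathodic line: phi = phi_n_cathode - beta_c * log10(i / i0_cathode)
    beta_a and beta_c are positive magnitudes in V per decade of current.
    """
    log_i = (phi_n_cathode - phi_n_anode
             + beta_a * math.log10(i0_anode)
             + beta_c * math.log10(i0_cathode)) / (beta_a + beta_c)
    i_corr = 10.0 ** log_i
    phi_corr = phi_n_anode + beta_a * (log_i - math.log10(i0_anode))
    return phi_corr, i_corr


# Hypothetical inputs: a metal with a Nernst potential of -0.44 V corroding
# by H+ reduction at 0.00 V, both lines with 0.10 V/decade slopes and
# exchange current densities of 1e-6 A/cm^2.
phi_corr, i_corr = tafel_intersection(-0.44, 0.10, 1e-6, 0.00, 0.10, 1e-6)
print(round(phi_corr, 3), i_corr)
```

With equal slopes and equal exchange current densities, the corrosion potential lands midway between the two Nernst potentials, which is a convenient sanity check on the algebra.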
2.1.6.2 Concentration Polarization
In concentration polarization, only the reduction reaction is affected. The
oxidation reaction exhibits Tafel behavior as it did in the case of activation polarization.
A typical diagram is shown in Figure 2.13.
Figure 2.13 Concentration Polarization Diagram [2,9] ($\phi$ vs. log $i$, with $i_{corr} = i_L$)
Concentration polarization usually governs in cases where the solution is stagnant.
The rate of the reaction depends on how quickly certain species are capable of diffusing
through the stagnant solution, towards the metal surface where corrosion occurs [2,9]. For
example, in corrosion due to H+ reduction where the solution is stagnant, the initial
condition is represented by Figure 2.14a. However, as H+ reduction occurs at the surface
of the metal, H+ ions are used up. As a consequence, a thin boundary layer is formed in
which the concentration of H+ ions varies from the concentration of H+ in the bulk solution,
$[H^+]_b$, to zero. This concentration gradient causes ions to diffuse towards the surface
where they are then consumed by the corrosion process. Figure 2.14b illustrates the
boundary layer in question.
Figure 2.14 Distribution of H+ ions in Time (panels a and b: [H+] vs. distance from metal surface)
The rate at which certain species (in this case the H+ ion) are able to diffuse
through the boundary layer will govern the rate of the corrosion reaction. In
concentration polarization, it is said that mass transfer controls the rate of the reaction.
The maximum expected current in concentration polarization cases is called the limiting
current, $i_L$. As can be seen in Figure 2.13, when concentration polarization
governs, $i_{corr}$ is smaller than it would be if activation polarization governed, i.e. if the
solution had been well stirred.
The limiting current densities for the reduction reaction can be calculated from the
following equation [2,9]:
$i_L = k\,n\,F\,[a_{red}]_b$   (2.19)
where k = mass transfer coefficient (cm/sec)
n = number of electrons transferred in the reduction
F = Faraday's constant (96500 C/eq)
$[a_{red}]_b$ = activity, or concentration, of the reduced species in the bulk solution
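Equation 2.19 is a straight product, so a short sketch is enough to show the units involved. The mass transfer coefficient and the bulk oxygen concentration below are assumed values picked for illustration, not data from this study, and the function name is hypothetical. With k in cm/s and concentration in mol/cm^3, the result is in A/cm^2.

```python
F = 96500.0  # Faraday's constant, C/eq


def limiting_current_density(k_cm_per_s, n, bulk_conc_mol_per_cm3):
    """Equation 2.19: i_L = k * n * F * [a]_b, returned in A/cm^2."""
    return k_cm_per_s * n * F * bulk_conc_mol_per_cm3


# Hypothetical inputs for dissolved O2 in stagnant water.
k = 1.0e-3        # mass transfer coefficient, cm/s (assumed)
n = 4             # electrons transferred per O2 molecule reduced
c_bulk = 2.5e-7   # bulk O2 concentration, mol/cm^3 (assumed, 0.25 mmol/L)

i_limit = limiting_current_density(k, n, c_bulk)
print(i_limit)
```

Because the relation is linear, doubling either the bulk concentration or the mass transfer coefficient doubles the limiting current density, which is why stirring or a higher dissolved oxygen level raises the corrosion rate under concentration polarization.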
2.1.7 Effect of Varying Parameters Using Polarization Diagrams
Polarization diagrams have many uses. They help one visualize electrochemical
phenomena which would otherwise be quite abstract. Polarization diagrams also help one
understand and predict the effect of varying certain parameters
influencing the corrosion rate in the two corrodents of interest, acidic solutions and
neutral aerated water. The parameters discussed in the following sections are:
• PO2 and H+ concentration,
• O2 solubility,
• multiple corrodents,
• passivity,
• galvanic attack, and
• chloride content.
2.1.7.1 PO2 and H+ Concentration
In cases of corrosion in neutral aerated water, the value of PO2, the partial
pressure of oxygen (1 atm for pure oxygen, 0.2 atm in air), affects only the value of the
Nernst potential, $\phi_{N,O_2}$ [2,9]:
$\phi_{N,O_2} = \phi_{N,O_2}^0 + \frac{2.303\,RT}{4F} \log \frac{P_{O_2}}{[OH^-]^4}$
As the value of PO2 increases, so does the value of $\phi_{N,O_2}$, and this causes the cathode line
to shift upwards. This is illustrated in Figures 2.15a and 2.15b, for activation and
concentration polarization, respectively. In the case of activation polarization, the results
of increasing PO2 are as follows:
• increase in $i_{corr}$,
• increase in $\phi_{corr}$,
• no change in $i_o$,
• no change in the anode line.
The above results also apply in the case of concentration polarization because, as
PO2 increases, so does the value of $[O_2]_b$, and this leads to an increase in $i_L$. The vertical
line is therefore shifted to the right.
In cases of corrosion in acidic solutions, as the concentration of H+ ions increases
and the pH decreases, the Nernst potential $\phi_{N,H^+}$ increases ($\phi_{N,H^+} = -2.303\,(RT/F)\,\mathrm{pH}$).
This results in the reduction lines shifting upwards. In the case of activation polarization,
the results of increasing the concentration of H+ are as follows:
• increase in $i_{corr}$,
• increase in $\phi_{corr}$,
• no change in $i_o$,
• no change in the anode line.
The above results also apply in the case of concentration polarization, because as
the value of $[H^+]_b$ increases, the value of $i_L$ increases as well, and the vertical portion of
the reduction curve is shifted to the right.
Figure 2.15 Effect of Varying PO2 (a: activation polarization, b: concentration polarization; PO2 = 0.21 atm for air-saturated water, PO2 = 1 atm for bubbling O2)
2.1.7.2 O2 Solubility
This parameter must not be confused with PO2 studied previously. In this case,
the PO2 is kept constant, but the solubility of O2 varies depending on the presence of
impurities such as chloride ions in the aqueous medium [2,9].
The solubility of O2 affects only the cases of corrosion in neutral aerated water,
where concentration polarization governs. The solubility of O2 varies with chloride
content as illustrated in Figure 2.16. The O2 solubility is highest at approximately 3%
NaCl content, which corresponds to typical seawater [2,9]. Assuming a constant PO2,
[OH-] and temperature, an increase in the concentration of dissolved O2 in the water
causes an increase in $i_L$ ($i_L = k\,n\,F\,[O_2]_b$). Consequently, this will result in an increase in
$i_{corr}$.
Figure 2.16 Variation of O2 Solubility with NaCl Concentration
2.1.7.3 Multiple Corrodents
Situations where a metal is subjected to the effects of more than one corrodent are
not uncommon. For example, acid rain is a corrodent rich in both oxygen and H+ ions.
In such a case, two cathodic reactions occur simultaneously [2,9]:
$2H^+ + 2e^- \rightarrow H_2$
$\tfrac{1}{2}O_2 + H_2O + 2e^- \rightarrow 2OH^-$
However, only one anodic reaction is involved:
$M \rightarrow M^{n+} + n\,e^-$
Figure 2.17a illustrates the situation of multiple corrodents in cases where
activation polarization governs. The terms $i_a$ and $i_b$ on the polarization diagram
represent the current density that would apply if one corrodent was acting at a time. In
situations of multiple corrodents acting simultaneously, a new reduction line must be
drawn. This line is constructed by adding $i_a$ and $i_b$ at any given value of $\phi$. This line is
used to determine the actual current density existing at the metal surface.
The value of $i_{corr,total}$, the total current density of the oxidation of the metal, is equal
to $i_{O_2} + i_{H^+}$, the current densities of the reduction of O2 and H+, respectively. Figure 2.17a
shows clearly that when a second corrodent is introduced in a system, the values of $i_{corr}$
and $\phi_{corr}$ increase. Another interesting point to note is that the reduction of O2 has a
higher contribution to $i_{corr}$ than does the reduction of H+ ions.
In cases where concentration polarization governs, the result of adding a second
corrodent to a system is to increase both $i_{corr}$ and $\phi_{corr}$. This conclusion is more easily
reached when studying the polarization diagram in Figure 2.17b. The new line
representing the situation of multiple corrodents has, as before, a constant value of
$i_{corr,total}$, which is equal to $i_{O_2} + i_{H^+}$.
Figure 2.17 Effect of Multiple Corrodents [9]
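The construction described for multiple corrodents (adding the cathodic current densities at each potential, then intersecting the sum with the anodic line) can be sketched numerically. The example below assumes Tafel behavior for all three lines and uses bisection to find the crossing. All names and parameter values are hypothetical and serve only to show the direction of the effect.

```python
def tafel_current(phi, phi_n, beta, i0, anodic):
    """Current density on a Tafel line at potential phi (beta > 0, V/decade)."""
    eta = phi - phi_n
    sign = 1.0 if anodic else -1.0
    return i0 * 10.0 ** (sign * eta / beta)


def corrosion_point(anode, cathodes, lo, hi, tol=1e-12):
    """Bisect for the potential where the anodic current density equals the
    sum of the cathodic current densities. Returns (phi_corr, i_corr)."""
    def net(phi):
        i_anodic = tafel_current(phi, *anode, anodic=True)
        i_cathodic = sum(tafel_current(phi, *c, anodic=False) for c in cathodes)
        return i_anodic - i_cathodic

    # net() rises monotonically with phi, so plain bisection is sufficient.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if net(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    phi_corr = 0.5 * (lo + hi)
    return phi_corr, tafel_current(phi_corr, *anode, anodic=True)


# Hypothetical (phi_N in V, beta in V/decade, i_o in A/cm^2) for each line.
ANODE = (-0.44, 0.10, 1e-6)
H_CATHODE = (-0.12, 0.10, 1e-6)
O2_CATHODE = (0.80, 0.10, 1e-9)

phi_h, i_h = corrosion_point(ANODE, [H_CATHODE], -0.44, 0.80)
phi_o2, i_o2 = corrosion_point(ANODE, [O2_CATHODE], -0.44, 0.80)
phi_both, i_both = corrosion_point(ANODE, [H_CATHODE, O2_CATHODE], -0.44, 0.80)
print(phi_h, phi_o2, phi_both)
```

With these assumed parameters, both the corrosion potential and the corrosion current density for the combined case exceed the values obtained with either corrodent alone, which is the behavior the text attributes to Figure 2.17a.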
2.1.7.4 Galvanic Attack
Galvanic attack occurs when two metals are placed in electrical contact in the
presence of a corrodent. In this case, there is one cathodic reaction (reduction of H+ or
O2), and two anodic reactions:
The resulting corrosion system is illustrated in Figure 2.18. If only metal $M_1$ is
present, lines 2 and 4 would apply and $i_{M_1}$ would result, while if only metal $M_2$ is present,
lines 1 and 5 would apply and $i_{M_2}$ would result [9]. When both metals are involved, then
lines 3 and 6 would apply. Lines 3 and 6 are obtained by adding the values of the current
densities of lines 1 and 2, and lines 4 and 5, respectively. It can be seen that when two
metals are involved, the rate of corrosion of the more anodic of the two metals, $M_1$, will
increase and the rate of corrosion of the more cathodic of the two, $M_2$, will decrease.
Metal $M_1$ is said to suffer accelerated corrosion, or galvanic attack [9].
Figure 2.18 Galvanic Attack [9]
2.1.7.5 Passivity
Many metals, such as Fe, Cr, Ni, Ti, and Al, exhibit passivity in various
corrodents. Passivity is the formation of a protective oxide layer on the surface of the
metal which causes it to corrode at a much slower rate than that predicted by Tafel
behavior [9]. Figure 2.19 illustrates a typical polarization line of a metal which exhibits
passivity. Three distinct regions can be discerned: the active region, the passive region,
and the transpassive region.
Figure 2.19 Polarization Diagram of a Metal Exhibiting Passivity [2,9]
The active region, considered so far, is the region limited by the Nernst potential,
$\phi_{NM}$, and the passive potential, $\phi_{pp}$. The currents in this region vary between the exchange
current density, $i_o$, and the critical current density, $i_c$. In this region, the metal exhibits
standard Tafel behavior.
The passive region is the region limited by the passive potential, $\phi_{pp}$, and the
transpassive potential, $\phi_{tp}$. In this region, the current is equal to the passive current
density, $i_p$, and does not vary with potential.
Passivity is due to the adsorption of O2 onto the metal surface. This adsorption
occurs at potentials between $\phi_{pp}$ and $\phi_{tp}$, at which point passivity begins to break down.
As can be seen in Figure 2.19, the lower the value of $i_p$, the lower the value of $i_{corr}$
obtained when the intersection of the two polarization lines occurs within the passive
region.
The transpassive region is the region where the potential is higher than $\phi_{tp}$. The
breakdown of passivity begins at $\phi_{tp}$, when the adsorbed layer of O2 is no longer stable
and begins to disintegrate [2,9]. The value of the current is not constant in this region, but
increases with increasing potential.
2.1.7.6 Chloride Content
The effect of the presence of chloride ions in a solution, and to a lesser extent
other halogen ions, is to increase the value of the exchange current density, $i_o$, of the metal in
the given solution, and to break down its passive layer [2].
Chloride ions break down, and/or prevent the formation of, a passive layer in
metals such as Fe, Cr, Ni, Co, and stainless steels. The passive layer forms due to the
adsorption of oxygen onto the metal surface. When chlorides are introduced into the
solution, they compete with O2 for adsorption [2]. Unlike the adsorbed O2, which causes
the rate of the metal dissolution to decrease, chloride ions favour hydration of the metal
ions and therefore increase the rate of dissolution [2].
The value of the potential of the system will determine whether O2 or Cl- ions will
be adsorbed, i.e. whether passivity will form or break down. Below a certain potential,
chloride ions cannot displace the adsorbed O2 and the passive layer will remain stable and
corrosion will be negligible. This potential is termed the critical potential [2]. At
potentials higher (or more noble) than the critical potential, Cl- ions are capable of
displacing adsorbed O2, thus destroying the passive layer.
Breakdown of passivity occurs locally and is not spread out uniformly over the
metal surface. Destruction of the passive layer typically starts at a point of discontinuity
in the passive film. The result is localized attack and the formation of pits [2]. This
combination of small anodic area, the pit, and large cathodic area, the remaining metal
surface, results in a situation of accelerated corrosion. Furthermore, the higher the current
flow at any pit, the less likely that other pits will form nearby, i.e. the number of pits per
unit area is smaller for deeper pits than for shallower ones [2]. An effective inhibitor for
Cl- ion attack is the addition of extraneous anions to the solution. Species such as $NO_3^-$
and $SO_4^{2-}$, which will not break down the passive layer, compete with Cl- ions for sites on
the passive film and, consequently, inhibit the formation of pits [2].
The effect of Cl- ions can be so pronounced that in some cases stainless steels,
which are known for their resistance to most corrosive environments, have been observed
to corrode at rates similar to those of metals that do not exhibit passivity at all [2].
2.2 Measuring Corrosion Rates
In this section, the theory behind the corrosion rate measurements is outlined. It is
on these basic principles that corrosion-measuring equipment is developed. Essentially,
there are two methods used to obtain the corrosion rate electrochemically: Tafel
Extrapolation, and Linear Polarization [9,16].
A metal which is exposed to a corrodent such as an acidic solution or neutral
aerated water will acquire a certain potential, 4,. This can be seen on the polarization
diagram of Figure 2.20a that at this potential, the current resulting fiom the metal
oxidation is equal in magnitude to the current feediig the reduction of the corrodent, i.e.
at this point of equilibrium, the electrons are being produced and consumed at the same
rate. This current is termed the corrosion current density, i,,.
If the system is manipulated such that a potential φ, other than φ_corr, is applied,
then the anodic and cathodic currents, i_a and i_c, will no longer be equal and a net current,
i, will flow. Figures 2.20b and 2.20c illustrate this point. When the potential increases
above φ_corr, the current leaving the anode will increase, causing the metal to dissolve
more quickly. This phenomenon is called anodic polarization [9,16]. Conversely, if the
potential is decreased below φ_corr, the current leaving the anode will decrease and the
metal will dissolve at a slower rate. This phenomenon is called cathodic polarization [9,16].
If the imposed potential is varied and each value is plotted against the logarithm
of the resulting current, a curve resembling Figure 2.21 would be obtained. The section
of the curve below φ_corr represents the region of cathodic polarization, and the section
above it represents the region of anodic polarization. When the potential is equal to φ_corr,
no net current is expected to flow.
The above theory forms the basis of the two methods used to determine corrosion
rates electrochemically: Tafel Extrapolation and Linear Polarization.
Figure 2.20 Variation of i_a and i_c with Potential φ (panels a, b and c; potential plotted against log i)
Figure 2.21 Tafel Curve (anodic and cathodic polarization of metal M; potential plotted against log i)
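The balance between the anodic and cathodic currents described above can be illustrated numerically. The following is a minimal sketch, assuming that each partial current follows an ideal Tafel law around the corrosion potential; the text does not give an explicit expression, so this form, the function name, and all numerical values are illustrative assumptions.

```python
def partial_currents(phi, phi_corr, i_corr, beta_a, beta_c):
    """Return (i_a, i_c, i_net) at the imposed potential phi.

    Assumes ideal Tafel behaviour for both reactions (illustrative only):
      i_a = i_corr * 10**((phi - phi_corr) / beta_a)
      i_c = i_corr * 10**(-(phi - phi_corr) / beta_c)
    Potentials in V, Tafel slopes in V/decade, currents in A/cm^2.
    """
    eta = phi - phi_corr                  # polarization away from phi_corr
    i_a = i_corr * 10 ** (eta / beta_a)   # anodic (metal oxidation) current
    i_c = i_corr * 10 ** (-eta / beta_c)  # cathodic (corrodent reduction) current
    return i_a, i_c, i_a - i_c


# Hypothetical values chosen only for illustration
PHI_CORR = -0.60   # V
I_CORR = 1e-5      # A/cm^2
BETA_A = 0.10      # V/decade
BETA_C = 0.12      # V/decade

for phi in (PHI_CORR - 0.05, PHI_CORR, PHI_CORR + 0.05):
    i_a, i_c, i_net = partial_currents(phi, PHI_CORR, I_CORR, BETA_A, BETA_C)
    print(f"phi = {phi:+.2f} V  i_a = {i_a:.2e}  i_c = {i_c:.2e}  i_net = {i_net:+.2e}")
```

Under these assumptions the net current is zero at φ_corr, positive (anodic polarization) above it, and negative (cathodic polarization) below it.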
2.2.1 Tafel Extrapolation
In Tafel Extrapolation, corrosion rates are measured using data obtained by
polarizing a metal sample cathodically and then anodically. The very simplified
schematic diagram in Figure 2.22 illustrates the typical setup.
The metal under study is called the working electrode. It is placed in the
corrodent along with the auxiliary and the reference electrodes. The auxiliary electrode is
usually made of an inert material, such as graphite or platinum. The purpose of this
electrode is to act as either a source, in the case of anodic polarization, or a sink, in the
case of cathodic polarization, for the resulting current i. The reference electrode
measures the potential φ of the metal, and a potentiometer records these values.
Simultaneously, an ammeter records the current flow to or from the working electrode.
Finally, a potentiostat is used to impose the desired potential on the system.
Figure 2.22 Schematic of Setup for Tafel Test [16]
The first step, prior to polarizing the metal sample, is to determine the value of
φ_corr. The metal sample is placed in the corrodent, the potential is allowed to attain its
equilibrium, and the anodic and cathodic reactions are allowed to proceed undisturbed.
There is no net flow of electrons, i.e. i_a = i_c = i_corr and φ = φ_corr. This potential is called
the open circuit corrosion potential, and it is measured by the reference electrode.
Once the value of φ_corr is recorded, the potentiostat imposes a potential of
φ_corr - Δφ. This situation is represented by point a in Figure 2.23. The potential remains at
φ_corr - Δφ for a specified amount of time, and the value of the resulting current, i, is
recorded. The potential is then increased by a predetermined increment, and the
resulting current is again plotted at this new φ value. This continues until the
potential reaches φ_corr + Δφ, and thus all potential values between φ_corr - Δφ and φ_corr + Δφ
have been scanned. The result of plotting the imposed potential versus the logarithm of
the resulting current is the complete curve illustrated in Figure 2.23.
Figure 2.23 Tafel Curve Obtained by Varying φ [9]
Another typical curve is illustrated in Figure 2.24. At low currents, this curve is
non-linear. However, the two branches of the curve become linear at higher current
values. This region of linearity is called the Tafel region. The slopes of the cathodic and
anodic polarization lines in the Tafel regions are termed β_c and β_a, respectively. The
value of Δφ can range from 50 to 250 mV, or more. Typically, the Tafel region begins at
φ_corr ± 50 mV, and ends when various phenomena cause the linearity of the curve to
be lost, e.g., the potential attained encourages the formation of a passive layer and the
curve suddenly continues vertically upward (current does not increase with increasing
potential) [9,16].
The value of i_corr is obtained by extrapolating the Tafel regions back to the
corrosion potential, φ_corr, where the two lines intersect. Figure 2.25 shows the intersection
of the two dashed lines at a point where φ = φ_corr and i = i_corr. Once the value of i_corr is
known, the corrosion rate in mm/yr can be computed.
Figure 2.24 Tafel Regions [9,16]
Figure 2.25 Obtaining i_corr from the Tafel Curve [9,16]
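The extrapolation step can be sketched in code. The example below is a minimal illustration, not the procedure used by any particular instrument: it fits a straight line to each Tafel region on a potential versus log10(i) plot and intersects the two lines. The function names and the synthetic data are hypothetical, and the data are generated from assumed values of φ_corr, i_corr, β_a and β_c purely so that the result can be checked.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def tafel_extrapolation(anodic, cathodic):
    """Estimate (phi_corr, i_corr, beta_a, beta_c) from Tafel-region data.

    anodic, cathodic: lists of (phi, |i|) points taken only from the linear
    (Tafel) part of each branch. Each branch is fitted as
    phi = slope * log10(|i|) + intercept and the two lines are intersected.
    """
    slope_a, icpt_a = fit_line([math.log10(i) for _, i in anodic],
                               [phi for phi, _ in anodic])
    slope_c, icpt_c = fit_line([math.log10(i) for _, i in cathodic],
                               [phi for phi, _ in cathodic])
    log_i_corr = (icpt_c - icpt_a) / (slope_a - slope_c)
    phi_corr = slope_a * log_i_corr + icpt_a
    return phi_corr, 10 ** log_i_corr, slope_a, -slope_c


# Synthetic Tafel-region points built from assumed (hypothetical) values
TRUE_PHI, TRUE_I, TRUE_BA, TRUE_BC = -0.60, 1e-5, 0.10, 0.12
currents = [1e-4, 3e-4, 1e-3]
anodic = [(TRUE_PHI + TRUE_BA * math.log10(i / TRUE_I), i) for i in currents]
cathodic = [(TRUE_PHI - TRUE_BC * math.log10(i / TRUE_I), i) for i in currents]

phi_corr, i_corr, beta_a, beta_c = tafel_extrapolation(anodic, cathodic)
print(f"phi_corr = {phi_corr:.3f} V, i_corr = {i_corr:.2e} A/cm^2")
print(f"beta_a = {beta_a:.3f} V/decade, beta_c = {beta_c:.3f} V/decade")
```

With real measurements, the points passed to the fit would have to be restricted to the linear portions of the curve, which is a judgment the sketch leaves to the user.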
2.2.2 Linear Polarization
An alternative to Tafel Extrapolation is the method of Linear Polarization,
which has been studied extensively to date. The procedure is the same as that for Tafel
Extrapolation with the following exceptions [9,16]:
• The value of Δφ is approximately 10-20 mV,
• The values of β_a and β_c are not obtained automatically, but must be known or
estimated beforehand,
• The data points obtained during polarization are plotted on a linear-linear graph, and
not on a linear-log plot.
In the method of Linear Polarization, once the value of φ_corr is recorded, the
potential is dropped to (φ_corr - 20 mV). It is then raised incrementally up to a potential of
(φ_corr + 20 mV), and the current is recorded at each step. The φ values and corresponding
i values are plotted on a linear scale and the resulting graph resembles Figure 2.26.
Figure 2.26 Linear Polarization Curve (potential from φ_corr - Δφ to φ_corr + Δφ plotted against current)
Under these conditions of slight polarization, i.e. with Δφ ≤ 20 mV, the potential
varies linearly with the resulting current. Stern and Geary (1957) derived the following
relationship to obtain the value of i_corr:
$$ i_{corr} = \frac{\beta_a \beta_c}{2.3\,(\beta_a + \beta_c)} \cdot \frac{\Delta i}{\Delta \phi} $$
where the term (Δφ / Δi) is also called the polarization resistance, R_p, given in ohms.
The values of β_a and β_c can either be determined by the method of Tafel Extrapolation, or
they can be estimated. The value of i_corr is determined by the Stern-Geary equation, and the
corrosion rate in mm/yr can then be computed.
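The Stern-Geary calculation and the subsequent conversion to a penetration rate can be sketched as follows. This is a minimal sketch, assuming the commonly quoted form of the relationship, i_corr = β_a β_c / (2.3 R_p (β_a + β_c)) with R_p = Δφ / Δi, and assuming Faraday's law for the conversion to mm/yr, which the text does not spell out. The function names, the sample readings, and the default properties for iron (molar mass 55.85 g/mol, 2 electrons per atom, density 7.87 g/cm^3) are illustrative assumptions.

```python
FARADAY = 96485.0  # C/mol


def stern_geary_i_corr(delta_phi, delta_i, beta_a, beta_c):
    """Corrosion current density (A/cm^2) from the linear polarization slope.

    delta_phi in V, delta_i in A/cm^2, Tafel slopes in V/decade.
    With current densities, R_p comes out in ohm*cm^2.
    """
    r_p = delta_phi / delta_i  # polarization resistance
    return (beta_a * beta_c) / (2.3 * (beta_a + beta_c) * r_p)


def corrosion_rate_mm_per_year(i_corr, molar_mass=55.85, n_electrons=2,
                               density=7.87):
    """Penetration rate from Faraday's law.

    Defaults assume iron dissolving as Fe2+ (an assumption, not from the text).
    i_corr in A/cm^2, molar_mass in g/mol, density in g/cm^3.
    """
    seconds_per_year = 365.25 * 24 * 3600
    cm_per_second = i_corr * molar_mass / (n_electrons * FARADAY * density)
    return cm_per_second * seconds_per_year * 10.0  # cm -> mm


# Hypothetical reading: 20 mV of polarization produces 2 microamps per cm^2
i_corr = stern_geary_i_corr(delta_phi=0.020, delta_i=2e-6,
                            beta_a=0.10, beta_c=0.12)
rate = corrosion_rate_mm_per_year(i_corr)
print(f"i_corr = {i_corr:.2e} A/cm^2, corrosion rate = {rate:.4f} mm/yr")
```

If the measured quantity is a total current rather than a current density, the exposed area of the working electrode would have to be divided out before the conversion to mm/yr.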
2.3 Soil Corrosion and Its Effects on Underground Infrastructure
This section deals with the principles of soil corrosion and its effects on the
underground infrastructure. Underground pipelines make up the greatest proportion of
the metals threatened by soil corrosion. The various mechanisms of soil corrosion are
outlined and explained from an electrochemical perspective.
The deterioration of metal pipelines in soils can be due to many phenomena. The
most important ones are the following:
• the formation of differential aeration cells,
• galvanic attack,
• selective leaching, and
• stress-corrosion cracking.
2.3.1 Differential Aeration Cells
When a pipeline is exposed to conditions which vary along its length, it can be
subjected to variations in the O2 exposure [1,2,9]. This results in potential differences and,
consequently, in corrosion of the pipe section located in the area of low O2 content.
A situation which is often encountered is that of a pipe which passes through different
soil types along its path. Different soils have different porosities and therefore different O2
contents. For example, clays typically have very low porosities and, consequently, low
O2 concentrations. On the other hand, sands are highly porous and well aerated, and
generally contain higher levels of O2. When a pipe runs through both of these soils, a
corrosion cell is created. The section of pipe located in the clay will have a lower
potential (since the O2 concentration is lower) than the section in the sand. As a result,
the section in the clay will be anodic to the section in the sand, and corrosion will occur
in the pipe located in the clay. The pipe itself will serve as the electrical conductor
allowing electrons to move from the anode to the cathode, and the groundwater will serve
as the ionic conductor. The circuit is completed, and localized corrosion will proceed at
an accelerated pace [1].
A similar situation may be created when a pipe passes under a paved surface, such
as a parking lot or a street [1]. The soil beneath the paved surface generally has a lower
oxygen content than does the soil beneath the unpaved surface, which is more readily
exposed to air and oxygen-rich rainwater. A corrosion cell is therefore set up with the
pipe beneath the pavement being anodic to the surrounding pipe. Once again, the pipe
itself acts as the electrical conductor, and the groundwater as the ionic conductor.
Another cause of differential aeration cells is the improper installation of new
pipes [7]. Pipes are usually rested directly on undisturbed soil and then covered with
relatively loose backfill. The backfill is generally more permeable than the compacted,
undisturbed soil, and will contain higher concentrations of oxygen. A cell is, therefore,
formed with the pipe bottom being anodic to the pipe crown. Electrons move through the
pipe itself, from the bottom to the more aerated crown, with the groundwater acting as the
ionic conductor. This explains why most corrosive attacks on pipelines occur on the
bottom 1/4 of the pipe.
2.3.2 Galvanic Attack
Another very common mechanism of soil corrosion is the phenomenon of
galvanic attack. As described previously in Section 2.1.2.2, galvanic attack occurs
when dissimilar metals are placed in electrical contact and exposed to a corrosive
environment. The more anodic of the metals suffers accelerated corrosion, while the rate
of corrosion of the more cathodic metal decreases.
A common example of galvanic attack is the corrosion of steel (iron) water and
gas mains at the point of contact with the copper pipe services [1]. Copper, being cathodic
to iron, causes the iron pipe to suffer accelerated corrosion. Luckily, this situation
does not cause too much damage because the area of the anode (the iron pipe) is much
larger than the area of the cathode (the smaller copper line), and the corrosion is spread
out over a large area.
Galvanic attack can also occur when a new pipe is placed in electrical contact
with an old pipe, even if the pipes are made of the same material [1]. At first glance,
galvanic attack may not be suspected because the materials are not different. However,
over the years a protective film has formed on the surface of the old pipe,
providing passivity and resistance to corrosion. The old steel is therefore cathodic to the
new steel, which will suffer accelerated corrosion when the pipes are in contact with one
another. Before long, the new pipe may be in worse condition than the old one, leading to
the erroneous conclusion that the pipe material itself is to blame. This situation is often
encountered when the capacity of a water pipe is insufficient and an additional water pipe
is laid parallel to the old one and the two are connected by cross-overs. The old pipe is
the cathode, the new pipe is the anode, the metallic cross-over is the electrical conductor,
and the groundwater is the ionic conductor.
Another example of galvanic attack is the accelerated corrosion of iron pipes
placed in contact with a soil containing cinders [1]. Cinders are essentially made up of
carbon, and are therefore cathodic to the iron pipe. The potential difference between the
two materials is in the range of 0.8 to 1.1 V, which can cause very serious damage to the
pipe.
2.3.3 Selective Leaching
Selective leaching, as described in Section 2.1.2.6, is the removal of one element
from a solid alloy. This occurs because the alloy is composed of elements whose
potentials are very different, resulting in the more anodic of the two being "corroded",
leaving behind a porous mass consisting of the more cathodic element.
An example of this is the graphitization of cast iron pipes [1]. Cast iron is
composed of graphite flakes within a matrix of iron. Graphite is cathodic to iron,
therefore a galvanic cell exists. As iron dissolves, it leaves behind a weak porous
material which is characterized by a dark gray color.
2.3.4 Stress-Corrosion Cracking
Stress-corrosion cracking (SCC) results when a metal is subjected to a
combination of a weak corrodent and a weak tensile stress [1,2,9]. As described in Section
2.1.2.7, failure can appear quite suddenly because no general surface corrosion is
apparent.
An example of localized stresses in buried pipes is the "cold bending" of pipes [1].
When underground pipes are manufactured, they are often subjected to "cold bending" to
produce bends. This can result in significant residual stresses forming at the bends of the
pipes. Also, pipes can be subjected to localized stresses when they are forced into
alignment once placed in the ground. These forces are sufficiently large to cause serious
SCC problems. The weak corrodent is usually neutral aerated groundwater or a weakly
acidic groundwater. The result is the accelerated corrosion of the pipe in the areas where
the pipe is subjected to tensile stresses.
2.4 Standards for Determining Corrosivity of Soils
The majority of the standards for determining soil corrosivity were designed to
respond to a particular need, and as such, many different standards have been developed
in North America, France, and Germany. Typically, the variables tested are the same,
although the testing procedure may vary. Two standards which are used extensively in
Quebec are AWWA C105 and PACE 82-3. This project focuses on these two standards.
2.4.1 AWWA C105
The American Water Works Association (AWWA) Standard was designed to
assist the engineer in deciding whether or not to use polyethylene pipes instead of
traditional materials. It must be kept in mind that the AWWA is a private organization
and not an independent national entity, and as such, the grid developed may be biased to
some extent. Nonetheless, the AWWA standard is used extensively in North America.
The soil characteristics examined in the AWWA Standard are the following:
• soil type,
• drainage ability,
• soil resistivity,
• pH,
• oxidation-reduction potential, and
• sulfide content.
These soil characteristics are evaluated separately and the appropriate points are allocated to
each result depending on the extent to which the factor contributes to the corrosivity of
the soil. The points are then summed, and a final corrosivity index is reported.
According to the AWWA standard, an index of 10 or more indicates that the soil tested is
corrosive, whereas an index below 10 suggests that the soil is not corrosive [25].
A detailed description of the testing procedure is presented in Chapter 3:
Procedures and Apparatus. This section deals with the factors tested and the points
allocated to each. The following soil characteristics are considered:
• Soil type is a characteristic which is recorded in the AWWA grid, but which is not
allocated any points. The type of soil (sand, clay, silt) is reported along with the
following characteristics: color, odor, presence of rocks or pebbles, and the presence
of organic materials.
• The drainage ability of a soil estimates the ease with which the soil is penetrated by
water. The better the drainage ability of a soil, the less likely that a soil will become
anaerobic and permit bacterial corrosion. The drainage ability is classified as either
excellent, good or poor, and the following points are allocated:
Excellent | 0
• Soil resistivity is a measure of the resistance of a soil to the flow of current. The lower the
resistivity of a soil, the better are the soil's electrolytic properties, and the higher is
the rate at which the corrosion can proceed. Soil resistivity is measured in ohm-cm,
and the following points are allocated:
• The pH of a soil is a measure of the H+ ion content of the soil. H+ ion reduction is an
important reaction in the corrosion process. The following points are allocated to this
factor:
• Oxidation-reduction potential, or redox potential, is a measure of the potential φ of
the soil. The potential of a soil indicates whether or not a soil is capable of sustaining
sulfate-reducing bacteria, which contribute greatly to the corrosion problem. A low
potential indicates that the oxygen content of the soil is low and, consequently, the
conditions are ideal for the proliferation of sulfate-reducing bacteria. The following
points are allocated:
• The sulfide content of a soil serves as an indicator of the presence of sulfate-reducing
bacteria. The greater the sulfide content, the greater the possibility of the presence of
sulfate-reducing bacteria. The following points are allocated:
When sulfides are present and the pH of the soil lies between 6.5 and 7.5, an
additional 3 points shall be added to the calculated index. These points are added to
account for the fact that the conditions are optimal for the proliferation of sulfate-
reducing bacteria.
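The index calculation described in this section can be sketched as follows. This is a minimal illustration of the summation, the 3-point sulfide and pH adjustment, and the threshold of 10 as they are stated in the text; the function names are hypothetical, and the point values in the example are placeholders chosen only for illustration, not values taken from the AWWA grid.

```python
def awwa_index(points, sulfides_present, ph):
    """Sum the points of the scored factors and apply the rule stated in
    the text: add 3 points when sulfides are present and 6.5 <= pH <= 7.5.

    points: dict mapping each scored factor to the points it was allocated.
    Soil type is recorded in the grid but carries no points, so it is omitted.
    """
    total = sum(points.values())
    if sulfides_present and 6.5 <= ph <= 7.5:
        total += 3
    return total


def is_corrosive(index, threshold=10):
    """An index of 10 or more indicates a corrosive soil."""
    return index >= threshold


# Placeholder point values (hypothetical, NOT taken from the AWWA grid)
example_points = {
    "drainage": 1,
    "resistivity": 5,
    "pH": 0,
    "redox": 2,
    "sulfide": 2,
}
index = awwa_index(example_points, sulfides_present=True, ph=7.0)
print(index, "corrosive" if is_corrosive(index) else "not corrosive")
```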
2.4.2 PACE 82-3
The PACE 82-3 standard was designed to assist the engineer in the decision to
provide protection to buried steel reservoirs, such as petroleum tanks. In the original
standard, three soil samples are taken from the site and tested in the laboratory. Each soil
sample is tested individually, and the results are compared with those of the other two
samples. The three samples are originally located at a distance of 30 meters from one
another, and their locations form an equilateral triangle when viewed from above.
An adaptation of this test was used in this project. The soil samples were received
and tested individually, with no comparison made between samples. The soil
characteristics examined in the PACE standard are the following:
• moisture content,
• soil resistivity,
• pH, and
• sulfide content.
These soil characteristics are evaluated separately and the appropriate points are allocated to
each result depending on the extent to which the factor contributes to the corrosivity of
the soil. The points are then summed, and a final corrosivity index is reported.
A detailed description of the testing procedure is presented in Chapter 3:
Procedures and Apparatus. This section presents the factors tested and the points
allocated to each. The following soil characteristics are considered:
• The moisture content of a soil describes the state in which the soil is received in the
laboratory. This parameter indicates the extent to which a soil is saturated during the
year. The soil is classified as either dry, moist or saturated, and the following points
are allocated:
Moist | 2
• Soil resistivity is measured in ohm-cm, and the following points are allocated:
• The pH of a soil is a measure of the H+ ion content of a soil. The following points are
allocated to this factor:
• The sulfide content is classified as positive or negative. The following points are
allocated:
Sulfide Content | Points
CHAPTER 3: PROCEDURES AND APPARATUS
The various laboratory experiments performed, the apparatus and the materials
used, the purpose of each experiment, and the results obtained are described in this chapter.
For each soil sample collected, the following variables were evaluated:
• soil type,
• drainage ability and moisture content,
• pH: direct and saturated,
• oxidation-reduction (redox) potential: direct and saturated,
• resistivity: direct and saturated,
• sulfide content: using HCl + lead acetate paper, and a solution of iodine + NaN3,
• concentration of Cl- ions,
• rate at which a standard metal sample will corrode in the given soil using the method
of linear polarization, and
• calculated corrosion indices according to the AWWA and PACE methods.
3.1 Soil Samples
The soil samples tested were obtained from various regions of Quebec. In
most cases, the samples were taken for the purpose of being tested according to the
AWWA or the PACE methods by COREXCO, Montreal, to determine the need for
cathodic protection of various metallic structures embedded in the given soil.
In total, 153 soil samples were tested. Of these, only 75 were available in
quantities sufficient to permit testing for the corrosion rate using the method of
linear polarization.
3.2 Soil Type
During the course of all of the tests to follow, the technician should observe
certain characteristics that will enable the determination of the soil type, i.e. a sand, a
clay, or a mixture of both (sand/clay). For example:
• The ability of water to penetrate a soil is a good clue to the soil type. For example, a
sand is very quickly penetrated by water, a sand/clay is penetrated slowly, and a clay
is almost not penetrated at all.
• The consistency of the soil when manipulated in one's hands: fine sand forms clumps
that crumble easily, whereas clayey materials typically form clumps that are either
hard or malleable, but do not crumble easily.
• The ease with which the soil is washed off the equipment, e.g. electrodes, plastic
bowls, soil box, metal spatulas, etc. Sand rinses off equipment easily, requiring no
scrubbing at all. Clays, on the other hand, require significant brushing to be removed,
and sand/clays are relatively easy to wash off, but not as easily as pure sand.
Experience will enable the technician to confidently classify a soil as a sand, a
sand/clay, or a clay. The soil types of the samples tested are presented in Table 3.1, in
which a sand is represented by S, a clay by C, and a sand/clay by SC.
Soil #       1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39

Soil #      40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72  73  74  75  76  77  78
Soil type   SC   S   S   S   S  SC  SC  SC   S  SC   S   S   S   S   S   S   S   S   S   S   S   S   C  SC   C   S   S  SC  SC  SC  SC  SC   S   S  SC  SC  SC  SC  SC

Soil #      79  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117
Soil type   SC   S  SC   S   S   S   S  SC   S   S  SC  SC  SC   C  SC  SC  SC  SC  SC  SC  SC  SC  SC  SC  SC  SC   S  SC   S   S   S  SC   S  SC  SC   C   S  SC  SC

Soil #     118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 247
Soil type   SC  SC  SC   C  SC   C  SC  SC  SC   S   S  SC  SC  SC  SC  SC  SC  SC   S  SC   S   C  SC  SC   S   S   S   S   S   S   C   C   C  SC   C   S

Table 3.1 Soil Type Results
3.3 Drainage Ability / Moisture Content
The AWWA and PACE standards define humidity differently. According to
AWWA, humidity is the ability of a soil to be penetrated by, or to drain, water. This variable
is referred to as the drainage ability of the soil. According to PACE, humidity refers to
the moisture content of a soil on site, or as it is received in the laboratory. This variable is
termed the moisture content of the soil in this thesis.
Drainage Ability
The humidity index in the AWWA Standard is defined as the drainage
ability of a soil. In the laboratory, this parameter is determined very subjectively. Soil is
placed in a bowl, and distilled water is added slowly to the soil. The speed with which
the water penetrates the soil is observed. The drainage ability of the soil is then classified
in one of the following three categories:
• Excellent: a soil that is easily penetrated by water, e.g. a sand
• Good: a soil that is penetrated slowly by water, e.g. a sand/clay
• Bad: a soil that is almost not penetrated by water at all, e.g. a clay
The drainage abilities of the soil samples tested are presented in Table 3.2, in which
excellent drainage is represented by E, good drainage by G, and poor drainage by B.
Moisture Content
Unlike AWWA, the humidity index in the PACE grid is a measure of the moisture
content of the soil sample as it is received in the laboratory. Again, this is a subjective
evaluation, and it is dependent on the experience of the technician. The moisture content
of the soil is determined by visual inspection, and by rolling the soil in one's hands. The
moisture content of the soil is then classified in one of the following three categories:
Saturated
Moist
Dry
This parameter estimates the moisture content of the soil under usual
circumstances. Knowledge of the moisture conditions that a soil is subjected to
throughout the year will enable the engineer to determine how corrosive the soil is to a
water pipe placed permanently in that soil. For example, irrespective of the corrosivity of
a soil in saturated condition, if it is kept very dry throughout the year, the pipe will not
suffer any corrosion. However, the state of one sample does not indicate the general year-
round conditions. This test should therefore be used in conjunction with interviews with
individuals who are knowledgeable about the condition of the soil in general, i.e.,
the percentage of time that a soil is saturated, moist, and dry. The moisture contents of the
soil samples tested are presented in Table 3.3, in which a dry soil is represented by D, a
moist soil by M, and a saturated soil by S.
Drainage ability     G   E   E   E   E   E   E   E   E   G   E   E   E   E   E   E   E   E   E   E   E   E   B   G   B   E   E   B   B   B   G   B   E   E   G   G   G   G   G

Soil #              79  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117
Drainage ability     G   G   G   E   E   E   E   G   E   E   B   B   B   B   B   G   G   G   G   G   G   G   G   G   G   B   G   G   E   E   E   E   E   G   G   B   E   G   G

Drainage ability     G   G   E   B   G   B   G   G   G   E   E   G   G   G   G   G   G   G   E   G   E   G   G   G   E   E   E   E   E   E   B   B   B   B   B   E

Table 3.2 Drainage Ability Results
Soil #      40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72  73  74  75  76  77  78

Table 3.3 Moisture Content Results
3.4 pH
The pH of the soil samples was measured in two different ways. The first method
consisted of testing the soil in the state in which it was received in the laboratory, i.e.
pH-direct. This test serves to represent the conditions found on site. The second method
consisted of testing the soil once it had been saturated with distilled water, i.e.
pH-saturated. This test may better represent the case in which the soil is saturated after a
heavy rainfall, or snow melt. Furthermore, it represents the conditions in which the soil
is found during the linear polarization test. Although both procedures have their
limitations in applicability, the pH of the soil was determined according to these two
procedures because they were recommended by the AWWA and PACE grids, and were
required to calculate the corrosivity index according to each of these grids.
Necessary Equipment
pH meter
30 ml plastic container with cap
Distilled water
The pH was measured using a pH meter, an electronic device with a probe that
can be inserted into a solution of unknown pH. A pH meter is an example of an
ion-selective, or ion-specific, electrode. It is based on the principle that the measured
potential of a solution depends on the concentration of the reactants and the products
involved in a cell reaction.
The pH meter has three main components: a standard electrode of known
potential, a special glass electrode that changes potential depending on the concentration
of H+ ions in the solution into which it is dipped, and a potentiometer that measures the
potential between the two electrodes. The potentiometer reading is automatically
converted electronically to a direct reading of the pH of the solution being tested.
The glass electrode contains a reference solution of dilute hydrochloric acid in
contact with a thin glass membrane. A silver wire coated with silver chloride is
embedded in the solution. The electrical potential of the glass electrode depends on the
difference in H+ concentration between the reference solution and the solution being used
in the test. Thus the electrical potential varies with the pH of the solution tested [14,15].
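The dependence of the electrode potential on pH can be illustrated with the Nernst relationship. The sketch below is a simplified model, assuming an ideal glass electrode whose potential changes by one Nernst slope (about 59.16 mV at 25 °C) per pH unit; the calibration offset at pH 7, the function names, and the sample readings are hypothetical, and a real meter would be calibrated against buffer solutions.

```python
import math

GAS_CONSTANT = 8.314462618  # J/(mol*K)
FARADAY = 96485.33212       # C/mol


def nernst_slope_mv(temp_c=25.0):
    """Ideal electrode response in mV per pH unit at the given temperature."""
    return 1000.0 * math.log(10) * GAS_CONSTANT * (temp_c + 273.15) / FARADAY


def ph_from_potential(e_mv, e_at_ph7_mv=0.0, temp_c=25.0):
    """Convert a measured potential (mV) to pH for an ideal electrode.

    e_at_ph7_mv is a hypothetical calibration offset: the reading obtained
    in a pH 7 buffer. The potential is assumed to rise as the pH falls.
    """
    return 7.0 - (e_mv - e_at_ph7_mv) / nernst_slope_mv(temp_c)


print(f"slope at 25 C: {nernst_slope_mv():.2f} mV per pH unit")
for reading_mv in (-59.16, 0.0, 118.32):
    print(f"{reading_mv:+8.2f} mV -> pH {ph_from_potential(reading_mv):.2f}")
```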
The AWWA recommends that the pH of the soil be determined for the soil as it is
found in its natural state. The pH electrode is simply immersed into the soil and the value
obtained is noted once it has stabilized. Extreme care must be taken when attempting to
plunge the pH electrode into dry clay, or into a soil containing small pebbles, because of the
delicate nature of the glass bulb. The values of pH-direct of the soil samples tested are
listed in Table 3.4.
PACE recommends that the pH of the soil be determined by testing a slurry
consisting of soil and distilled water. The 30 ml plastic container is filled halfway with
soil and then filled almost to the top with distilled water. The container is then capped
and shaken vigorously. The mixture is allowed to rest for approximately 5 minutes. The
pH of the slurry is then determined by immersing the pH electrode into the saturated soil,
and allowing the value to stabilize. The values of pH-saturated of the soil samples tested
are listed in Table 3.5.
Soil #       40   41   42   43   44   45   46   47   48   49   50   51   52   53   54   55   56   57   58   59   60   61   62   63   64   65   66   67   68   69   70   71   72   73   74   75   76   77   78
pH-direct   6.8  7.2  7.5  7.6  7.4  7.8  8.8  8.1  6.1  6.7  7.6  6.7  6.8  7.9  6.9  7.7  7.3  7.7  6.9  6.9  7.5  7.1  5.9  6.7  5.8  6.1  6.2    7  7.2  7.1  7.8  7.7  5.9  7.7  7.4  8.2  7.7  7.5  7.6

Soil #       79   80   81   82   83   84   85   86   87   88   89   90   91   92   93   94   95   96   97   98   99  100  101  102  103  104  105  106  107  108  109  110  111  112  113  114  115  116  117
pH-direct   7.3  7.9  7.4  8.2  7.2  6.7  6.1  5.7    6  5.9    7  7.2  7.3  7.4  7.4  7.6  7.4    8  7.4  7.8  7.2  7.6  7.9  7.5  6.7  6.8  7.8  7.7    8  7.5  7.1  8.1  8.3  7.1  6.6  7.3  7.3  6.8  7.2

Table 3.4 pH-direct Results
Soil #        1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20   21   22   23   24   25   26   27   28   29   30   31   32   33   34   35   36   37   38   39

Soil #       40   41   42   43   44   45   46   47   48   49   50   51   52   53   54   55   56   57   58   59   60   61   62   63   64   65   66   67   68   69   70   71   72   73   74   75   76   77   78

Table 3.5 pH-saturated Results
3.5 Oxidation-Reduction Potential (Redox Potential)
Like pH, the redox potential is measured in two different ways for each soil
sample. The first method is the direct measurement, redox-direct, in which the soil is
tested as it is received in the laboratory to represent the conditions found on site. Also,
this variable is required to calculate the corrosivity index according to the AWWA
standard. The second method involves testing the soil once it has been saturated with
distilled water. This measurement is referred to as redox-saturated. This method serves
to represent the conditions in which the soil is found during the linear polarization tests.
Necessary Equipment
Digital voltmeter
30 ml plastic container with cap
Distilled water
The oxidation-reduction potential is measured using a digital voltmeter. This
instrument measures the "driving force" or the "pull" of the soil on electrons. These
electrons would be supplied by the oxidation of an anode placed in contact with the soil,
i.e. metal objects embedded in the soil. This potential is the electromotive force (emf) of
the cell, and it is a measure of the tendency of the soil to corrode a metal. The unit of
electrical potential is the volt, V.
A traditional voltmeter measures the potential by drawing current through a wire of
known resistance [14,15]. However, when the current flows through a wire, the frictional
heating that occurs wastes some of the potentially useful energy of the cell. A traditional
voltmeter will therefore measure a potential that is less than the maximum cell potential.
The key to determining the maximum potential is to perform the measurement under
conditions of zero current, so that no energy is wasted. Traditionally, this has been
accomplished by inserting a variable voltage device, powered from an external source, in
opposition to the cell potential. The voltage on this instrument, called a potentiometer, is
adjusted until no current flows in the cell circuit. Under such conditions, the cell
potential is equal in magnitude and opposite in sign to the voltage setting of the
potentiometer, and is the maximum cell potential since no energy is wasted in heating the
wire. More recently, advances in electronic technology have allowed the design of
digital voltmeters, such as the one used in this project, that draw only a negligible amount
of current [14,15]. These instruments have since replaced potentiometers in the modern
laboratory due to their ease of use.
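The effect of the current drawn by the meter can be illustrated with a simple voltage-divider model. This is a minimal sketch, assuming the cell behaves as an ideal emf in series with a fixed internal resistance; the function name and all resistance and emf values are hypothetical.

```python
def measured_potential(emf, cell_resistance, meter_resistance):
    """Voltage read by a meter of finite input resistance.

    The meter and the internal resistance of the cell form a voltage
    divider, so the reading is always somewhat below the true emf.
    emf in V, resistances in ohms.
    """
    current = emf / (cell_resistance + meter_resistance)  # current drawn
    return current * meter_resistance


EMF = 0.200              # V, hypothetical cell potential
CELL_RESISTANCE = 1.0e4  # ohms, hypothetical resistance of the soil cell

for meter_resistance in (1.0e5, 1.0e7, 1.0e9):
    reading = measured_potential(EMF, CELL_RESISTANCE, meter_resistance)
    print(f"meter {meter_resistance:.0e} ohm -> reading {reading * 1000:.2f} mV")
```

As the input resistance of the meter grows, the current drawn approaches zero and the reading approaches the maximum cell potential, which is the condition that the potentiometer achieves by balancing and that the digital voltmeter approximates.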
Redox-Direct
The AWWA recommends that the redox potential be determined for the soil as
it is received in the laboratory. The platinum electrode is immersed into the soil, and the
redox value is noted once it has stabilized, i.e. once it varies by no more than
1 mV per minute. The values of redox-direct for the soil samples tested are listed
in Table 3.6.
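The stabilization criterion above (a change of no more than 1 mV per minute) can be sketched in code. This is only an illustration, not part of the AWWA procedure: the function name, the fixed sampling interval, and the sample readings are all hypothetical.

```python
def first_stable_reading(readings_mv, interval_s=60.0, max_drift_mv_per_min=1.0):
    """Return the first reading whose drift from the previous one is within the limit.

    readings_mv: potentials (mV) sampled at a fixed interval of interval_s seconds.
    Returns None if the readings never settle.
    """
    for previous, current in zip(readings_mv, readings_mv[1:]):
        # Convert the change between two samples into a drift per minute.
        drift_per_min = abs(current - previous) * 60.0 / interval_s
        if drift_per_min <= max_drift_mv_per_min:
            return current
    return None


# Hypothetical readings taken once per minute after immersing the electrode.
readings = [230.0, 221.0, 216.0, 214.5, 214.0]
print(first_stable_reading(readings))
```

The same check applies to the redox-saturated measurement, since both readings are noted only once the potential has stabilized.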
Redox-Saturated
The slurry prepared for pH testing according to the pH-saturated method is used to
test for the redox-saturated value. The platinum electrode is immersed into the saturated
soil, and the value of the potential is noted once it has stabilized. The values of
redox-saturated for the soil samples tested are listed in Table 3.7.
In testing for the redox potential, an attempt was made to limit the exposure of the
soil to the ambient air. Redox tests on soil samples were always performed first, as soon
as the container of soil was opened, and this container was closed as soon as possible
after retrieval of the soil sample. It has been observed in the laboratory that a soil whose
redox potential is below 0 mV may, once left open to ambient air for half an hour to an
hour, later register a potential above 100 mV. It is essential to keep the soil container well
sealed, to ensure that the reading taken is not affected by exposure to the oxygen in
the air.
Redox-direct (mV):
214 -38 150 184 178 118 175 191 219 232 200 183 180 200 220 231 203 210 201 224 -33 -64 178 240 134 14 -38 219 194 144 112 104 164 170 154 187 192 214 169
Redox-direct (mV):
209 208 220 194 181 216 204 263 216 200 210 183 21G -49 132 185 208 81 228 228 274 260 228 219 228 185 193 171 155 175 190 147 121 183 229 180 260 225 218
Redox-direct (mV):
160 230 230 184 190 50 237 281 320 156 192 208 190 185 197 181 167 188 180 147 178 148 191 219 165 195 189 155 139 150 130 246 247 154 28 217
Table 3.6 Redox-direct Results
Soil #:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39
Redox-saturated (mV):
196 188 225 168 258 194 252 256 218 198 153 165 149 121 140 156 176 186 180 214 264 229 163 191 206 187 179 162 155 154 171 111 155 165 226 115 270 204 197
Soil #:
118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 247
Redox-saturated (mV):
206 197 223 80 187 40 262 267 167 134 175 188 191 180 190 173 159 177 145 174 183 188 193 205 136 161 185 128 99 112 152 225 230 93 -15 222
Table 3.7 Redox-saturated Results
3.6 Resistivity, ρ
It is found that the nature of the electrolyte, in this case the soil, has a significant
influence on the rate of corrosion of a metal exposed to it. The types and the amounts of
the various dissolved salts in a soil, particularly those which ionize most readily, are
estimated by measuring the electrical resistivity of the soil. The lower the resistivity, the
more the electrolyte contributes to corrosion.
The resistivity of a soil is measured in two different ways. The first method
involves measuring the resistivity of the soil as it is received in the laboratory, ρ-direct.
The second method is to measure the resistivity of the soil once it has been saturated with
distilled water, ρ-saturated. The latter represents the worst case, when the soil
conductivity is at its highest. Both measurements are necessary to calculate the corrosivity
indices according to AWWA and PACE.
Soil box
Ohmmeter
Four wires with clamps at both ends
The resistivity of a given object is calculated by making use of the relationship
between resistivity, resistance and geometry. Resistance, R, is the property of a body or
mass with discernible geometry, e.g. a piece of wire, or a block of soil of a given size.
Resistivity, ρ, on the other hand, is a characteristic property of the material, e.g. copper,
or a specific soil. While resistance is a function of geometry, resistivity is not
dependent on the geometry of the body [1].
The resistance of a rectangular body of any substance, when measured between
parallel faces, is directly proportional to its length and inversely proportional to its
cross-sectional area. In other words, as the depth and width increase, resistance decreases. The
following equation shows the relationship between these variables:
$$R = \frac{\rho L}{W D} \qquad (3.1)$$
where R = the resistance of the rectangular body (Ohms)
ρ = the resistivity of the substance making up that body (Ohm-cm)
W = the width of the body (cm)
D = the depth of the body (cm)
L = the length of the body (cm)
When measuring ρ, what is actually being measured is R, and ρ is then calculated
using Equation 3.1. R is measured using a soil box and an ohmmeter. The soil box is a
rectangular box with an open top, made of a non-conducting material (usually plastic)
with metal ends and two metal pins inserted into the side of the box [1]. The box is filled
to the top with soil, such that the values of W, D, and L are known. It is then connected
to the ohmmeter; the current is introduced by means of the two end plates and the
potential is measured across the two pins. The value of R is then calculated according to
the following relationship [1]:
$$R = \frac{V}{I}$$
where V is the potential measured across the two pins and I is the current introduced
through the end plates. The value of the resistivity, ρ, is then calculated automatically
and displayed by the ohmmeter.
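As a minimal sketch of the calculation the ohmmeter performs, the proportionality described above (resistance proportional to length and inversely proportional to the cross-section W × D) can be rearranged to ρ = R·W·D/L. The function name and the box dimensions below are hypothetical and are not the dimensions of the soil box used in this work.

```python
def resistivity_ohm_cm(resistance_ohm, width_cm, depth_cm, length_cm):
    """Resistivity (Ohm-cm) of a rectangular body from its resistance and geometry.

    Rearranges R = rho * L / (W * D) into rho = R * W * D / L.
    """
    if min(width_cm, depth_cm, length_cm) <= 0:
        raise ValueError("box dimensions must be positive")
    return resistance_ohm * width_cm * depth_cm / length_cm


# Hypothetical soil box: 4 cm wide, 3 cm deep, 12 cm between the measuring faces.
print(resistivity_ohm_cm(resistance_ohm=1000.0, width_cm=4.0, depth_cm=3.0, length_cm=12.0))
```

With these illustrative dimensions W·D/L equals 1 cm, so the resistance in Ohms and the resistivity in Ohm-cm are numerically equal.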
ρ-saturated
The AWWA method suggests that the resistivity of the soil be determined when
the soil is saturated. This represents the worst possible case, i.e. when the conductivity of
the soil is at a maximum.
A sufficient quantity of soil is placed in a bowl, and distilled water is added
gradually in small quantities. The soil and water are mixed continuously to encourage
penetration of the water into the soil. Some experience is needed to ensure that the soil
has reached saturation, and extreme care must be exercised when adding water to avoid
supersaturating the soil. When a soil is supersaturated, the excess water may separate
from the body of soil and will not be transferred to the box with the rest of the soil.
Ions such as chlorides, which are found in this excess water and which give the water its
conductive properties, will be absent from the soil box. This will result in a higher
resistivity, which will not be truly representative of the soil's ability to conduct ions.
When the soil is saturated, it is transferred to the soil box a little at a time, where
it is compacted well to eliminate any air bubbles or voids, and to ensure uniformity and
reproducibility of the measurement. The box is then attached to the ohmmeter, and a
reading is taken. The values of ρ-saturated for the soil samples tested are listed in Table
3.8.
According to the PACE method, the soil is tested for resistivity in the state in
which it is received in the laboratory. Therefore, the wet or dry soil is added to the soil
box and compacted. Once more, air bubbles must be avoided, as they will
result in higher values of ρ. The box is then wired to the ohmmeter, and the reading is
taken. The values of ρ-direct for the soil samples tested are listed in Table 3.9.
ρ-saturated (ohm-cm):
228 190 780 165 5300 4330 4770 2343 2097 1834 1284 1856 2396 1748 3580 367 61400 409 132600 5820 4020 90800 299 4080 3650 4960 3200 778 682 613 2076 1008 1112 1706 1040 2890 1591 2685 2275
Soil #:
40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78
Soil #:
118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 247
ρ-saturated (ohm-cm):
1947 1377 7410 1571 1888 1417 2623 2181 2211 2359 3060 5980 5140 2625 250 87.2 102.6 81.9 1838 5780 6490 2309 4800 4720 11000 8220 34400 18310 9370 17150 3070 2589 2523 1572 2135 2269
Table 3.8 ρ-saturated Results
ρ-direct (ohm-cm):
3820 1410 9280 12550 31800 1987 2198 4120 62100 44700 10440 6040 9040 61800 14960 16610 11230 3830 1389 2121 2688 1131 2159 7140 1504 2078 1790 2064 623 146700 224.8 2449 7340 1901 8240 2599
ρ-direct (ohm-cm):
4100 3170 2218 82400 9460 129300 1943 1618 1247 1205 3820 3220 5660 3140 3160 1271 205.3 1482 1736 841 496 4000 2012 8230 67900 26970 40200 120700 77200 6040 11930 4080 8900 1247 2980
Soil #:
40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78
Soil #:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39
ρ-direct (ohm-cm):
1947 1683 13300 1577 5140 1429 3750 4400 7600 4850 90100 13360 27180 42000 1058 568 779 281.4 5000 151800 7490 22340 16230 37600 27860 40900 30400 35100 3860 3460 2612 1548 2032 3870
ρ-direct (ohm-cm):
280 228 1592 173 5300 4340 4780 2277 1940 1713 1223 1904 3550 1813 3580 1443 101300 833 220800 5820 180000 309 18570 3840 13820 10580 825 707 723 2235 1311 1568 3830 1210 7750 2593 10870 3210
Table 3.9 ρ-direct Results
3.7 Sulfide Content
The sulfide content of the soil is determined in two different ways. The first
method uses a solution of iodine and 3% NaN3 (sodium azide), and the second a solution
of HCl along with a strip of lead acetate paper. These two methods were chosen because
they are recommended by AWWA and PACE, and are required to calculate the corrosivity
indices.
Necessary Equipment
2 standard test tubes
Concentrated HCl acid (15%)
A strip of lead acetate paper
A solution of I2 (aq) + 3% NaN3
The AWWA procedure for testing for sulfides is to saturate a small quantity of the
soil with a solution of iodine and 3% NaN3, and to observe the resulting reaction. A
small amount of soil is placed in a test tube, and the iodine solution is poured into the test
tube to cover the soil. The mixture is then shaken well, and the degree of reaction is
observed and classified as either violent, normal, or absent.
This test is a qualitative one, and may be quite subjective. This is because the
reaction is never very violent, and it is often difficult to differentiate between the degrees
of reaction, especially between the normal and the violent. Although only visual
observation is recommended by AWWA, sound was also used to help distinguish
between a violent and a normal reaction. If bubbles can be heard bursting at a
quick pace, the soil is considered to undergo a violent reaction. If only a slight sound can
be heard (or none at all), and bubbles can be seen, then the soil is classified as reacting
normally. If no bubbles are seen or heard, then the soil is assumed to contain no sulfides
at all.
The degree of the reaction was then used to establish the sulfide content of the
soil. If the reaction was classified as violent, then the sulfide content of the soil was
assumed to be high. If the reaction was classified as normal, then the soil was assumed to
contain traces of sulfide. Finally, if no reaction was observed, then the soil was assumed
to contain no sulfides. The sulfide contents of the soil samples, as determined by AWWA,
are presented in Table 3.10, in which N represents no sulfides, T represents traces of
sulfides, and P represents the presence of sulfides.
HCl and Lead Acetate Paper
PACE recommends using concentrated HCl in combination with lead acetate
paper to determine whether the soil contains sulfides. A small amount of soil is placed in
a test tube, and then 15% HCl is added to cover the soil. A strip of lead acetate paper is
introduced and held at the top, and the test tube is then covered at the top with the thumb
of the tester. The mixture is then shaken gently, and care is taken not to wet the indicator
paper. After a couple of minutes, the paper is observed for signs of a brown
discoloration, usually present along the edges of the paper. Any discoloration indicates
the presence of sulfides. Another indication of the presence of sulfides is the smell of
rotten eggs, characteristic of H2S gas. However, as this gas is extremely toxic, it
is highly recommended that one avoid breathing it, and that the room be kept well
ventilated or, better still, that this experiment be carried out under a fume hood.
When the acid is added to the soil, it is very common to observe a violent
reaction, and a lot of bubbling. This may be the result of the reaction between HCl and
any carbonates that may be present in the soil. In that case the bubbling is the result of
the formation of carbon dioxide gas, and does not indicate the presence of sulfides. Only
the discoloration of the lead acetate paper can correctly determine whether or not sulfides
are present. The sulfide contents of the soil samples, as determined by PACE, are
presented in Table 3.11, where N represents no sulfides, and P represents the presence of
sulfides.
Sulfide content (iodine):
N T N T N P P P P P P P N N T N N N P P P P N N N N N N N N N P N N N T T T
Sulfide content (iodine):
T T N N N N N N T T N N N N N N N N T T P P N T T P P N T T P P N N N T N T P
Sulfide content (iodine):
N T N N N N N N N N N T N P P N N N N T N N N N N P T P N N N P N T N T N N N
Soil #:
118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 247
Sulfide content (iodine):
N N T T N P N T N P P P P T T T P T T P P P T T N T T P P P P T P P P T
Table 3.10 Sulfide Content Results Using Iodine Solution
3.8 Chloride Concentration
The test for chloride content is not a part of either the AWWA or the PACE
procedures. It is one of the goals of this project to determine whether knowledge of the
Cl- ion content would permit the corrosivity of a soil to be evaluated better than if only
the other parameters were known, i.e. ρ, redox potential, pH, etc.
The overall procedure can be summarized as follows: The sample is dried and
pulverized. The finest particles are kept and combined with distilled water. The mixture
is allowed to sit overnight, permitting the free Cl- ions to enter into the water. The
potential of the solution is recorded with a chloride-specific electrode, and the recorded
value is compared with the pretabulated values to obtain the chloride ion concentration
of the solution. The chloride concentration of the soil itself is then calculated.
3.8.1 Necessary Equipment
Potentiometer
Chloride-specific electrode
Electrode wetting agents, i.e. solutions of 1M (NH4)2SO4 or 1M KNO3
Ceramic bowl and hammer
30 ml plastic containers
2 mm and 200 µm sieves
200 ml beakers
Microwave oven
Scale
Powdered KCl
Distilled water
A J-cloth and an elastic band
Stop watch
The chloride ion concentration is calculated indirectly from the potential that is
measured using an ion-specific electrode, i.e. an electrode that is sensitive to the
concentration of a particular ion. It is based on the principle that the measured potential
of a solution depends on the concentration of the reactants and the products involved in a
cell reaction [14,15].
An example of an ion-specific electrode is the pH meter discussed in Section
3.4.1. Glass electrodes can be made sensitive to ions such as Na+, K+, NH4+, and Cl- by
changing the composition of the membrane. In this case, a Cl- ion-specific electrode was
used to determine the potential resulting from the presence of Cl- ions only [14,15]. Unlike
the pH meter, the potential is not converted automatically, but must be obtained through a
series of steps which are discussed in the following sections.
3.8.2 Sample Preparation
A 200 ml beaker is half filled with soil, and covered with a piece of J-cloth which
is secured in place with an elastic band. The sample is dried in a microwave at a high
temperature for 3-5 minutes, or as long as necessary to thoroughly dry the soil. Extreme
care must be exercised when handling the beaker as it reaches very high temperatures.
When the soil has cooled sufficiently to permit handling, some of it is transferred
to a ceramic bowl. The soil is pounded with a hammer for a few minutes to separate any
larger pebbles from the finer soil. The pulverized soil is passed through a 2 mm sieve to
remove the pebbles that cannot be pulverized. The recovered fine soil is then returned
to the bowl where it is pulverized further to a fine consistency. The soil is then passed
through a 200 µm sieve, and the soil recovered is transferred to a clean, tared 30 ml
plastic container. Approximately 5 g of soil should be recovered. If the quantity is
insufficient, the above procedure can be repeated until such an amount is retrieved.
The exact weight, in grams, of soil in the tared container is recorded, and
distilled water is added to the soil in a ratio of 2:1, i.e. 10 g of water are added to 5 g of
soil. The container is then capped, and shaken vigorously for 30 seconds. The sample is
then allowed to sit overnight.
The potential of the previously prepared sample is recorded with the aid of an
ion-specific electrode, which is sensitive to Cl- ions only. This electrode, when attached to a
voltmeter, registers the potential of a solution due to the presence of only the Cl- ions.
The electrode is rinsed thoroughly with distilled water, and filled with a wetting
agent. The wetting agents used in these experiments were 1M KNO3, 1M (NH4)2SO4, or
a commercially prepared wetting agent of unknown composition. The choice of wetting
agent appears to make no difference in the final results.
When filling the electrode, care must be exercised to ensure that no air bubbles
are present in the wetting agent. When the electrode is ready, it must then be calibrated
using solutions of known chloride ion concentrations.
3.8.4 Preparation of Calibrating Solutions
In order to calibrate the electrode, the potentials of different solutions of known Cl-
concentrations are recorded. These solutions are prepared by simply adding KCl or NaCl
crystals to distilled water in the correct quantities such that solutions with the desired
concentration of Cl- ions are obtained. Solutions of 0.01%, 0.03%, 0.33%, 0.65%, and
1.3% Cl- ions are required. Table 3.12 shows the weight of KCl to be added to 1 kg of
distilled water in order to obtain the desired concentrations, as well as the equivalent
concentration in ppm.
Table 3.12 Preparation of Calibrating Solutions
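The quantities of KCl for the five calibrating solutions can be estimated from molar masses. The sketch below is illustrative only: it assumes that the percentage refers to the mass of Cl- per mass of distilled water, which the text does not state explicitly, so the printed values should not be read as the contents of Table 3.12. The function names are hypothetical.

```python
M_K = 39.098   # molar mass of potassium, g/mol
M_CL = 35.453  # molar mass of chlorine, g/mol


def kcl_mass_g(cl_percent, water_mass_g=1000.0):
    """Mass of KCl (g) supplying cl_percent % Cl- relative to the mass of water.

    Assumption: the percentage is mass of Cl- per mass of water, not per mass of solution.
    """
    cl_mass_g = water_mass_g * cl_percent / 100.0
    # Each gram of Cl- requires (M_K + M_CL) / M_CL grams of KCl.
    return cl_mass_g * (M_K + M_CL) / M_CL


def percent_to_ppm(cl_percent):
    """Convert a concentration in percent to parts per million (1 % = 10,000 ppm)."""
    return cl_percent * 10_000.0


for pct in (0.01, 0.03, 0.33, 0.65, 1.3):
    print(f"{pct} % Cl-: {percent_to_ppm(pct):.0f} ppm, {kcl_mass_g(pct):.3f} g KCl per kg water")
```

If NaCl is used instead, the same approach applies with the molar mass of sodium in place of that of potassium.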
3.8.5 Calibration of Electrode
Once the calibration solutions and the electrode are prepared, the electrode is
calibrated. This is done by alternately reading the potential of each of the calibration
solutions. The electrode is placed into a solution, and held upright for a predetermined
amount of time, e.g. 1 minute is usually sufficient, but 3 minutes may be needed to
achieve stability of the reading. This time must be chosen prior to taking the first
reading, and must remain the same for all subsequent readings. Each of the five
calibration solutions is tested in turn, from the most concentrated to the least
concentrated solution, and then in random order. In order to avoid contaminating the
calibration solutions, the electrode should be rinsed with distilled water and tapped dry
before taking the next reading.
Each solution is tested twice, and the reproducibility of the potential is
determined. If the potentials are approximately equal, calibration is complete. If the
values vary significantly, then the electrode should be checked closely for any problems
such as the presence of an air bubble in the wetting solution of the electrode, the lack of
wetting agent due to a leak, etc. Further measurements are then taken until
reproducibility of the potentials is obtained, and the technician is confident of the results
obtained. An example of the potentials obtained during one calibration exercise is given
in Appendix B.
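The reproducibility check can be written as a simple comparison of the two passes. The text only requires the potentials to be "approximately equal", so the 2 mV tolerance below is a hypothetical value chosen for illustration, as are the function name and the readings.

```python
def calibration_is_reproducible(first_pass_mv, second_pass_mv, tolerance_mv=2.0):
    """True if every solution gave two potentials within tolerance_mv of each other.

    tolerance_mv is an assumed threshold; the procedure itself gives no number.
    """
    if len(first_pass_mv) != len(second_pass_mv):
        raise ValueError("both passes must cover the same calibration solutions")
    return all(
        abs(first - second) <= tolerance_mv
        for first, second in zip(first_pass_mv, second_pass_mv)
    )


# Hypothetical potentials (mV) for the five solutions, read twice.
first_pass = [178.0, 151.0, 93.0, 76.0, 59.0]
second_pass = [177.2, 151.9, 92.5, 76.4, 58.1]
print(calibration_is_reproducible(first_pass, second_pass))
```

If the check fails, the electrode is inspected for air bubbles or leaks and further readings are taken, as described above.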
3.8.6 Calibration Curve and Equation
The calibration curve is constructed from the potentials registered for each of the
five calibration solutions. For each solution (0.01, 0.03, 0.33, 0.65, and 1.3 % Cl-) the
average potential is calculated. The values of the chloride ion concentration (%) are
plotted against the average potential values, and an exponential curve is fitted to the five
points. This curve, along with the corresponding equation, will be used to obtain the Cl-
concentrations of the solution of the samples. A calibration curve, along with its
equation, is presented in Appendix B.
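One way to fit the exponential calibration curve is a least-squares line through ln(concentration) against potential. This is a sketch under assumptions: the form C = a·exp(b·E) and the log-linear fitting method are choices made here, not necessarily those used for Appendix B, and the potentials are hypothetical placeholders. Only the five concentrations come from the text.

```python
import math


def fit_exponential(potentials_mv, concentrations_pct):
    """Least-squares fit of ln(C) = ln(a) + b * E; returns (a, b) for C = a * exp(b * E)."""
    n = len(potentials_mv)
    log_c = [math.log(c) for c in concentrations_pct]
    mean_e = sum(potentials_mv) / n
    mean_y = sum(log_c) / n
    s_ee = sum((e - mean_e) ** 2 for e in potentials_mv)
    s_ey = sum((e - mean_e) * (y - mean_y) for e, y in zip(potentials_mv, log_c))
    b = s_ey / s_ee
    a = math.exp(mean_y - b * mean_e)
    return a, b


def concentration_pct(potential_mv, a, b):
    """Chloride concentration (%) read off the fitted calibration curve."""
    return a * math.exp(b * potential_mv)


# Concentrations from the text; average potentials are hypothetical.
concentrations = [0.01, 0.03, 0.33, 0.65, 1.3]
average_potentials = [178.0, 151.0, 93.0, 76.0, 59.0]
a, b = fit_exponential(average_potentials, concentrations)
print(concentration_pct(120.0, a, b))
```

A spreadsheet trendline of the exponential type performs essentially the same log-linear fit.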
3.8.7 Testing Soil Samples
The samples prepared the previous day are now ready to be tested, given that the
standards indicate that 2-6 hours are sufficient to allow all Cl- ions to enter into the
distilled water. The mixture of soil and distilled water has now separated into two parts:
the liquid part, which contains the Cl- ions, and the deposited soil particles. Care must be
taken not to disturb the settled layer before testing the liquid part.
The calibrated electrode is lowered into the 30 ml container, until the membrane
at the tip is fully immersed in the liquid above the precipitate. The electrode is
then held upright for the time chosen during calibration (1-3 minutes), and the potential
of the solution is registered.
3.8.8 Determination of Concentration of Chloride Ions of Soil
Once the potential of the liquid fraction of each sample is obtained, it must be
transformed into a value more intuitively understandable: percentage concentration, or
ppm of Cl- ions. This is very easily and quickly done by reading off the concentration
value in percentage terms from the calibration curve, or by calculating it using the
calibration equation. An example of this is shown in Appendix B.
The variable of interest is the concentration of Cl- ions in the soil, and not in the
liquid fraction of the prepared sample. This value is obtained by simply doubling the
concentration of Cl- ions in the liquid fraction. This is due to the fact that a ratio of 2:1
between the water and soil weights was used during the sample preparation.
Finally, the concentration of Cl- ions of the soil, in ppm, is obtained by
multiplying the concentration in percentage by 10,000. The values in ppm are retained
for further data analysis, although the concentration in % could have equally been used.
The chloride ion concentrations of the soil samples are presented in Table 3.13.
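The two conversion steps described above (doubling for the 2:1 water-to-soil ratio, then multiplying by 10,000 to pass from percent to ppm) amount to the following short calculation. The function name and the example value are hypothetical.

```python
def soil_chloride_ppm(liquid_pct, water_to_soil_ratio=2.0):
    """Chloride content of the soil (ppm) from the concentration in the liquid fraction (%).

    The liquid concentration is scaled by the water-to-soil weight ratio (2:1 in this
    procedure) and then converted from percent to ppm (1 % = 10,000 ppm).
    """
    soil_pct = liquid_pct * water_to_soil_ratio
    return soil_pct * 10_000.0


# Hypothetical liquid-fraction concentration of 0.01 % Cl-.
print(soil_chloride_ppm(0.01))
```

Keeping the ratio as a parameter makes the dependence on the sample preparation explicit: if a different weight of water were added, the factor would change accordingly.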
Soil #:
40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78
Cl- content (ppm):
80 1835 155 168 81 2310 4592 1094 249 436 38 46 58 77 54 81 148 111 148 716 345 3257 123 758 448 2030 537 391 423 160 380 1345 7161 12556 210 56 407 36 157
Soil #:
79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117
Cl- content (ppm):
172 190 222 17 5 O O 6 3 5 1 362 283 272 253 142 207 8 1 22 1 O9 265 5294 326 320 2067 5394 481 2712 340 191 42 1 1830 756 233 65 17 9 28 759 374
Soil #:
118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 247
Cl- content (ppm):
243 347 59 160 282 310 268 419 328 258 125 52 34 82 9223 13652 22664 17754 759 75 53 29 149 30 30 28 14 26 72 47 95 154 155 73 35 1042
Table 3.13 Chloride Ion Concentrations
3.9 Linear Polarization
The final test is the linear polarization of a standard steel sample exposed to each
of the soil samples. The result of this test is the corrosion rate, expressed per year, that
the steel sample will undergo in the given soil.
Although very informative and precise, linear polarization is a test that is time
consuming, and that requires a very experienced technician. Furthermore, the equipment
is quite expensive. All this makes the test generally inaccessible, and encourages
corrosion engineers to depend on corrosivity indices such as those proposed by AWWA
and PACE, which can be calculated from the results of very simple and inexpensive tests.
The corrosion cell set-up is made up of the following elements:
• A working electrode: the steel specimen, which plays the role of the anode during
anodic polarization, and the cathode during cathodic polarization,
• An auxiliary electrode: the graphite rod, which plays the role of the anode in cathodic
polarization, and the cathode in anodic polarization,
• A reference electrode: the Cu/CuSO4 electrode, which measures the potential of the
working electrode at any point during linear polarization,
• An ionic conductor: the saturated soil sample, which allows ions to travel from the
cathode to the anode, and
• The potentiostat, which controls the potential of the system, and which acts as the
electrical conductor between the anode and the cathode.
The procedure for testing the soil samples can be summarized as follows: the soil
sample is saturated and a glass jar is filled up to a specific height with the saturated soil.
The surface of the metal specimen (working electrode) is prepared according to a specific
procedure and immersed into the saturated soil, along with the reference and the auxiliary
electrode. The electrodes are wired to the potentiostat, and the corrosion rate is obtained
from Tafel and polarization resistance diagrams.
3.9.1 Necessary Equipment
Potentiostat: the CMS100 Electrochemical Measurement System by Gamry, Inc.
Graphite electrode (auxiliary electrode)
Cu/CuSO4 electrode (reference electrode)
Standard metal specimen, and specimen mount
A hand drill and support
No. 200 and 400 sand paper
1.0 micron, agglomerate-free, alpha alumina powder by Leco
Acetone
Caliper
Glass jar
The reference electrode used was the Cu/CuSO4 electrode, which consists of
metallic copper immersed in a saturated solution of copper sulfate. This means that,
instead of measuring the potential of the system against the standard of hydrogen ion
reduction (whose potential is 0.000 V by convention), the potential is measured against
the reduction of copper ions, Cu2+. This half-cell reaction is the reduction of Cu2+ ions:
$$\mathrm{Cu^{2+} + 2e^{-} \rightarrow Cu} \qquad E^{\circ} = +0.337\ \mathrm{V}$$
In order to obtain the potential against the standard of hydrogen ion reduction, the
value of 0.337 V is added to the value of the potential obtained against the
Cu/CuSO4 electrode, since the copper half-cell itself lies 0.337 V above the hydrogen
standard. For example, a potential of 0.400 V vs. Cu2+ reduction is
equivalent to a potential of 0.737 V against H+ reduction. Regardless of which electrode
is chosen, it is used to measure the potential of the working electrode at any given time
during the experiment.
The standard metal specimen used in the experiments consisted of a small
cylinder with an approximate height of 14.2 mm and a diameter of 9 mm. It was created
from material cut from a ductile iron pipe removed from the ground for testing by the
authority of COREXCO, Montreal. The specimen was machined in the Civil Engineering
Materials Testing Laboratory. It was formed with a "thread" running through the center,
such that it could be screwed onto the end of a rod. This rod consists of a long thin tube
through which runs a wire connecting the metal specimen at one end to the potentiostat
at the other. This wire ensures that the steel specimen, which is later immersed into the
soil, is in constant contact with the potentiostat. Figure 3.1 shows a schematic of the
working electrode made up of a steel specimen screwed onto the rod.
Figure 3.1 Components of the Working Electrode (rod leading to the potentiostat, with the steel specimen at its end)
3.9.2 Trial Runs and Reproducibility of Results
Before testing any of the soils that were retained for further analysis, trial runs
were performed on expendable soil samples to determine the exact procedure to be
followed such that reproducibility of results is ensured. This process is very important,
because if the technician is unable to perform the test in a repeatable manner within a
prescribed tolerance, then the results would be unreliable.
The trial runs served the following purposes:
• To identify the method of preparing the soil samples,
• To identify the method of preparing the surface of the steel specimen,
• To provide the technician with experience and the ability to perform the tests quickly and
consistently,
• To determine the scan rate and the scan range to be used in obtaining the Tafel and
polarization resistance diagrams, and
• To determine the time to be provided for the steel specimen to stabilize in the soil
prior to polarization.
Trial runs were performed on three different soils. The complete procedure was
established and is presented in the following section. The results of the trial run
performed on sample No. 123 are presented in Appendix C. From these results, it can be
seen that the procedure established yielded reproducible results to a satisfactory degree.
The soil sample is tested under conditions of saturation. Soil is placed in a bowl,
and distilled water is added gradually in small quantities, and worked into the soil, until
the soil is saturated. Care must be taken not to oversaturate the soil because the
conductive properties of the soil may be incorrectly estimated if the excess water bleeds
to the surface of the soil during the polarization testing. The soil is saturated in order to
represent the worst case scenario in which the soil's conductive properties are highest.
Furthermore, saturation eliminates one of the variables that differs between the samples, i.e.
moisture content. It should be noted that, if a soil is completely dry, linear polarization of
the steel specimen is not possible because the system is missing a key element: the ionic
conductor.
Once the soil is saturated, it is transferred to the mason jar, which is filled to a
specified height. The height requirement is intended to ensure reproducibility of the
cathodic area, which consists of the area of the graphite rod which is in contact with the
soil. If the graphite rod is immersed into the soil such that its end touches the bottom,
and the height of the soil is always the same, then the same area of graphite will be in
contact with the soil, i.e. constant cathodic area.
The soil must be observed for signs of air bubbles. Lightly shaking the jar may
consolidate the saturated soil and eliminate any air bubbles, which tend to increase the
overall resistivity of the soil. Furthermore, if the steel specimen is in contact with an air
bubble, the actual anode area will be smaller than what has been assumed, and therefore
the corrosion rate will be underestimated. The soil sample is prepared first and the
reference electrode and graphite rod are secured into place within the mason jar. The
steel specimen is prepared next.
3.9.4 Preparation of the Working Electrode
This section serves to outline the method of preparing the surface of the specimen
prior to each polarization sequence. As observed previously, surface preparation plays an
extremely important part in the corrosion process, i.e. it can greatly affect the rate at
which the corrosion will proceed. For example, the presence of a protective surface film
would result in a lower corrosion rate. If such a film is not properly removed, or if the
steel sample is exposed to ambient air after cleaning such that a protective film is allowed
to form prior to testing, then the results obtained would be misleading. For this reason,
the specimen must be prepared in a consistent manner each time to ensure reproducibility
of the results.
For each soil sample, the steel specimen is polarized four times. The surface must
be prepared thoroughly prior to each of the tests. The first step in the surface preparation
is the sanding of the surface. In order to obtain a uniform sanding, an ordinary hand drill
is mounted securely on a stand and a screw, whose diameter is compatible with that of the
steel specimen thread, is inserted into the "nose". The steel specimen is then secured
onto the end of the screw. Two sanding papers are used: sizes 400 and 600. As the drill
rotates the specimen, it is sanded on all sides with the size 400 paper first, and then with
the size 600 paper. The specimen is then sanded with alumina paste, which ensures a
smooth preparation of 1.0 µm. The specimen is then removed from the drill, and screwed
onto the end of the working electrode rod. When a tight seal is ensured, the specimen is
rinsed thoroughly with acetone to eliminate any greases, and then rinsed with distilled
water. The specimen is then quickly immersed into the saturated soil sample, and the
appropriate test is run.
3.9.5 Polarization of the Steel Specimen
Once the soil sample has been prepared and the working, auxiliary and reference
electrodes have been immersed into the soil, the first of the four polarization tests is
initiated. The goal of this test is to obtain the Tafel diagram, and to extract from it the
values of the Tafel constants, βc and βa.
The potentiostat used enables the technician to introduce the desired values of the
scan rate, the scan range, etc. The following variables were specified:
± 250 mV from Open Circuit Potential, Eoc
Delay provided to attain Eoc: 1000 s or 0.017 mV/s (1 mV/min)
IR drop compensation
Anodic area, i.e. metal surface area: approximately 4.5 cm² (subject to change)
Table 3.14 Values Specified for Tafel Test
Once this test is completed, a graph such as that illustrated in Figure 3.2 is
obtained. The values of βc and βa are obtained by plotting the anode and cathode lines
such that their slopes coincide with the slope of the linear Tafel regions.
Figure 3.2 Typical Tafel Plot
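The graphical fitting of the Tafel lines can also be done numerically. The following is only a minimal sketch, assuming that the points belonging to a linear Tafel region have already been selected by eye; the function name `tafel_slope` and the data are hypothetical and are not taken from the potentiostat software. The slope of potential against the base-10 logarithm of the current density gives the Tafel constant in volts per decade.

```python
import math

def tafel_slope(potentials_v, current_densities):
    """Least-squares slope of E against log10|i|, in V per decade."""
    xs = [math.log10(abs(i)) for i in current_densities]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(potentials_v) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, potentials_v))
    return sxy / sxx

# Hypothetical anodic branch built with a slope of 0.120 V/decade.
anodic_i = [1e-5, 2e-5, 5e-5, 1e-4]                     # A/cm2, illustrative only
anodic_e = [-0.85 + 0.120 * math.log10(i / 8.9e-6) for i in anodic_i]
beta_a = tafel_slope(anodic_e, anodic_i)                # recovers about 0.120
```

The same function applied to the cathodic branch returns a negative slope, whose absolute value is taken as βc.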
Once the values of βc and βa are obtained, the steel specimen is removed from the
soil and cleaned according to the standard method. The specimen is then inserted into the
soil again, and the second test is initiated with the goal of determining the corrosion rate
of the steel sample by the method of linear polarization. The following variables are
introduced into the program prior to polarization:
± 20 mV from Open Circuit Potential, Eoc
Delay provided to attain Eoc: 1000 s or 0.017 mV/s (1 mV/min)
IR drop compensation: On
Anodic area, i.e. metal surface area: approximately 4.5 cm² (subject to change)
Density of metal: 7.87 g/cm³
Equivalent weight of metal: 27.92 g
Table 3.15 Values Specified for Linear Polarization Test
Once the test is completed, a curve such as that illustrated in Figure 3.3 is
obtained. The values of Rp, icorr, and the corrosion rate are obtained by plotting a line
whose slope coincides with that of the curve in the region immediately surrounding the
point on the curve at which the current equals zero.
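The conversion from the fitted slope to a corrosion rate can be sketched as follows. This is an illustration under stated assumptions, not the algorithm of the potentiostat software: it uses the standard Stern-Geary relation, icorr = βa·βc / (2.303·Rp·(βa + βc)), and the usual Faraday's-law conversion with the constant 3.27 × 10^-3 mm·g/(µA·cm·yr). The function names are hypothetical, and the Tafel constants of 0.12 V/decade are placeholders rather than measured values.

```python
def stern_geary_icorr(rp_ohm_cm2, beta_a_v, beta_c_v):
    """Corrosion current density (A/cm2) from the polarization resistance."""
    b = (beta_a_v * beta_c_v) / (2.303 * (beta_a_v + beta_c_v))
    return b / rp_ohm_cm2

def corrosion_rate_mm_per_yr(icorr_a_cm2, equivalent_weight_g, density_g_cm3):
    """Penetration rate (mm/yr) from the corrosion current density."""
    k = 3.27e-3                       # mm g / (uA cm yr)
    icorr_ua_cm2 = icorr_a_cm2 * 1e6  # convert A/cm2 to uA/cm2
    return k * icorr_ua_cm2 * equivalent_weight_g / density_g_cm3

# Placeholder Tafel constants (assumed) with a polarization resistance of 2970 ohm cm2.
icorr_example = stern_geary_icorr(2970.0, 0.12, 0.12)

# Steel constants from Table 3.15 and a current density of 8.914e-6 A/cm2.
rate_example = corrosion_rate_mm_per_yr(8.914e-6, 27.92, 7.87)
```

With the density and equivalent weight of Table 3.15, a corrosion current density of 8.914 × 10^-6 A/cm² corresponds to a rate of about 0.10 mm/yr.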
Once the corrosion rate is obtained, the saturated soil is discarded and the entire
process is repeated a second time with a fresh sample of the same soil. The soil and the
steel specimen are prepared according to the specified methods, and the two tests are run
again to obtain new values of βc and βa, and then the corrosion rate. Table 3.16 gives the
values obtained for soil sample No. 96. The results obtained indicate that the procedure
followed yielded reproducible results. The values of the corrosion rate for each of the
soil samples tested are presented in Table 3.17.
Figure 3.3 Typical Linear Polarization Curve
Linear Polarization I
Eoc (mV)                  -851
E(I=0) (mV)               -852.1
icorr (× 10^-6 A/cm²)     8.914
Rp (× 10^3 ohm·cm²)       2.970
Table 3.16 Results Obtained for Soil Sample # 96
Table 3.17 Corrosion Rates
3.10 Calculating the corrosivity indices according to AWWA and PACE
The definitions of the variables included in each of the corrosivity grids have been
discussed in Chapter 2, and reviewed in the previous sections of this chapter. This
section introduces the spreadsheets used to obtain the corrosivity indices quickly and
without error.
Figure 3.4 shows the spreadsheet used to calculate the corrosivity indices
according to AWWA and PACE. The values of the appropriate variables are entered in
lines A, C, and E, and the corrosivity indices are given automatically in lines D and G.
The corrosivity indices of the soils are presented in Tables 3.18 and 3.19.
[Spreadsheet "Analysis of Soil Corrosivity" for soil sample JK-33 (description: silty clay, light brown), with one block for Method 1: AWWA C-105 and one block for Method 2: PACE. Note to Method 1: if the pH is between 6.5 and 7.5, and sulfides are present and/or the redox potential is negative, add 3 points.]
Figure 3.4 Spreadsheet Used for Quick Calculation of Corrosivity Indices
Table 3.18 Corrosivity Indices According to AWWA
Table 3.19 Corrosivity Indices According to PACE
CHAPTER 4: ANALYSIS OF EXPERIMENTAL
RESULTS AND DISCUSSION
4.1 Analysis of Preliminary Data
The statistical package SAS was used to analyze the data collected during the
experimental phase of this project. These data were presented in Chapter 3. The
information presented in this chapter is selective and consists only of the material
deemed to be essential. Furthermore, Appendix D: Principles of Regression Analysis is
included for the information of the reader, and it is recommended that Appendix D be
consulted prior to reading this chapter.
The analysis consists primarily of regressing the variables, individually and in
combination, with the dependent variable. The dependent variable (Y) in the analysis is
the corrosion rate obtained by the method of linear polarization. This variable is denoted
'CorrRate', and it is considered to be the "true" corrosivity of a soil. It was the
objective of this study to derive the relationships of the other variables with CorrRate,
both individually and in appropriate combinations. Once the relationships between the
variables are understood, the importance of the chloride content of a soil is evaluated, and
a decision is made on whether or not this variable provides sufficient information to be
considered significant.
There is a total of 12 independent variables (Xi), seven discrete and five
categorical:
1. pHdir: pH of the soil, obtained by testing the soil in the state in which it is received in
the laboratory (discrete),
2. pHsat: pH of the soil, obtained by testing a portion of soil supersaturated with
distilled water (discrete),
3. Reddir: redox potential of the soil, in mV, obtained by testing the soil in the state in
which it is received in the laboratory (discrete),
4. Redsat: redox potential of the soil, in mV, obtained by testing a portion of soil
supersaturated with distilled water (discrete),
5. Resdir: resistivity of the soil, in ohm-cm, obtained by testing the soil in the state in
which it is received in the laboratory (discrete),
6. Ressat: resistivity of the soil, in ohm-cm, once it has been saturated with distilled
water (discrete),
7. Chl: chloride ion content of the soil in ppm (discrete),
8. SoilType: categorical variable representing soil type (S for sand, SC for sand/clay,
and C for clay),
9. Moisture: categorical variable representing moisture content of the soil as it is
received in the laboratory (D for dry, M for moist, and S for saturated),
10. SulfI: categorical variable representing sulfide content obtained by testing the soil
using a solution of iodine and sodium azide, NaN3 (N for negative, T for trace, and P for positive),
11. SulfHCl: categorical variable representing sulfide content obtained by testing the soil
using concentrated HCl and lead acetate paper (N for negative and P for positive), and
12. Drainage: categorical variable representing ability of the soil to 'drain' water (E for
excellent, G for good, and B for bad).
Of the 12 variables, only 10 will be used in this analysis. Drainage will not be
included in any of the following analyses because the information it provides is almost
identical to that of the variable SoilType. In the majority of the cases, a sand will have an
excellent drainage ability, a sand/clay will have a good drainage ability, and a clay will
have a poor drainage ability. One of the two variables is therefore redundant, and it was
decided to retain the variable SoilType. Furthermore, the variable SulfHCl will also be
eliminated from the list because it is felt that errors were made during testing for this
parameter. As a consequence, the SulfHCl value is unavailable for many observations,
and this results in a decrease in the reliability of the results of the statistical analyses
obtained using this variable.
The analysis of the data is divided into the following sections:
Data Exploration
Transformation of Variables
Regressing Discrete Variables One At A Time
Correlation Matrix
RSQUARE Procedure
Including Categorical Variables
Variables Retained for Further Analysis
Determining Significance
Discussion of Results
4.1.1 Data Exploration
The first step in any analysis is the familiarization with the experimental data.
Each of the eight discrete variables is studied individually and the distribution of the
values is observed for signs of normality, outliers, skewness, etc. The distribution of the
data plays a very important role in ensuring that the results of regression analyses are
consistent. Furthermore, outliers are also very influential, and they must be identified and
observed during the course of the statistical tests that follow.
For each variable, the following information is extracted from SAS output files,
and examined:
• The number of observations (N), the mean, the standard deviation, the variance, and
the skewness,
• The five highest and five lowest observations,
• The five quantiles, the range and the hinge spread,
• The stem and leaf diagram, the box plot, and the normal distribution plot, and
• The outliers.
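The quantities in the list above can be reproduced outside SAS. The following is a rough sketch using only the Python standard library; the function name `summarize` and the pH readings are hypothetical. It assumes the adjusted skewness formula n/((n-1)(n-2)) Σ((x - mean)/s)³ and the common box plot rule that flags points lying more than 1.5 hinge spreads beyond the quartiles. The quartile definition used here may differ slightly from the one applied by SAS.

```python
import statistics

def summarize(values):
    """Univariate summary: moments, quantiles, hinge spread and box plot outliers."""
    data = sorted(values)
    n = len(data)
    mean = statistics.fmean(data)
    sd = statistics.stdev(data)                      # sample standard deviation
    skew = n / ((n - 1) * (n - 2)) * sum(((x - mean) / sd) ** 3 for x in data)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    spread = q3 - q1                                 # hinge spread
    low_fence, high_fence = q1 - 1.5 * spread, q3 + 1.5 * spread
    return {
        "n": n, "mean": mean, "sd": sd, "variance": sd ** 2, "skewness": skew,
        "min": data[0], "q1": q1, "median": median, "q3": q3, "max": data[-1],
        "range": data[-1] - data[0], "hinge_spread": spread,
        "outliers": [x for x in data if x < low_fence or x > high_fence],
    }

# Hypothetical pH readings with one low value.
ph_values = [4.2, 6.9, 7.1, 7.3, 7.5, 7.7, 7.9, 8.1, 8.8]
summary = summarize(ph_values)
```

For these illustrative readings the low value of 4.2 is the only point flagged, and the skewness is negative because the mean lies below the median.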
Figure 4.1 displays the information produced by SAS for the variable pHdir,
including the stem and leaf diagram, the box plot and the normal distribution plot.
The pHdir values range between 4.2 and 8.8. The data are slightly negatively
skewed, i.e. the mean is slightly smaller than the median. This generally indicates the
presence of outliers in the lower end of the distribution, and this is quite evident when
the stem and leaf diagram and the box plot are studied. There are seven outliers: one in
the upper end, and six in the lower. Besides the outliers, the data points seem to be well
distributed and the box plot appears to have a standard shape. Finally, the normal
distribution plot is not exactly linear, in fact it appears to be slightly curved. This
indicates a small deviation from normality. The usefulness of a transformation is
examined in the next section.
Figure 4.2 displays the information produced by SAS for the variable pHsat,
including the stem and leaf diagram, the box plot and the normal distribution plot.
The pHsat values range between 4.7 and 9.2. As with pHdir, the data are slightly
negatively skewed, with four outliers in the lower end only. Besides the outliers, the data
seem to be well distributed, with a relatively good box plot. The normal distribution
plot is a little less curved than that of pHdir, but a slight deviation from normality is still
observable.
[Figures 4.1 and 4.2: SAS output for pHdir and pHsat (stem and leaf diagram, box plot, normal distribution plot)]
Reddir
Figure 4.3a displays the information produced by SAS for the variable Reddir,
including the stem and leaf diagram, the box plot and the normal distribution plot.
The Reddir values range between -528 mV and 320 mV. It may appear from the
various diagrams that the situation is unacceptable and that the distribution is not at all
normal. This may not be the case. The very large extreme values (-528, -138 and 320
mV) may be the cause of the box plot and the normal distribution plot having such a
distorted form. The presence of these three observations forces the diagrams to be drawn
with large intervals, and as such, the remaining values tend to be lumped together. A
clue to this can be drawn from the observation of the quantiles. If the extreme values are
ignored and only the values within the hinge range are examined (between Q3 and Q1),
an equal number of observations are noted to be above and below the median. This is
characteristic of the symmetric normal distribution. In this case, the hinge range is equal
to Q3 - Q1 = 61.5 mV. Dividing this value in two gives a value of 30.8 mV. For a
normal distribution, the values obtained by adding and subtracting this value from the
median will correspond approximately to the values of Q3 and Q1. The values obtained
by 187.5 ± 30.8 mV are 218.3 mV and 156.7 mV. These values are very close to the
actual ones of 216.5 mV and 155 mV, and therefore, the values are well distributed
within the hinge area. However, it cannot be concluded from the above test that the data
set is normally distributed.
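The quantile check described above is simple enough to script. The sketch below is only an illustration, with a hypothetical function name; it takes the three quantiles quoted for Reddir and compares the quartiles predicted from the median and half the hinge range with the observed ones.

```python
def hinge_check(q1, median, q3):
    """Compare the observed quartiles with median +/- half the hinge range."""
    half = (q3 - q1) / 2
    predicted_q1 = median - half
    predicted_q3 = median + half
    return {
        "half_hinge_range": half,
        "predicted_q1": predicted_q1,
        "predicted_q3": predicted_q3,
        "q1_error": predicted_q1 - q1,
        "q3_error": predicted_q3 - q3,
    }

# Quantiles quoted in the text for Reddir (mV).
reddir = hinge_check(q1=155.0, median=187.5, q3=216.5)
# half_hinge_range is 30.75; the predicted quartiles are 156.75 and 218.25
```

Both errors are always equal to the median minus the average of Q1 and Q3, so the check measures only the symmetry of the middle half of the data. This is why it cannot, on its own, establish normality.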
In order to determine the normality of the variable, the three extreme values are
removed and the process is repeated. Figure 4.3b displays the relevant information,
including the stem and leaf diagram, the box plot, and the normal probability plot for
Reddir without the three extreme values. The data are slightly negatively skewed, with
13 outliers in the lower end. This is a somewhat high number. However, aside from the
outliers, the data points are well distributed and the box plot appears to have a standard
shape. Finally, the normal distribution plot is not quite linear, in fact it is significantly
curved. This indicates a deviation from normality which may be corrected by a
transformation in the next section.
Redsat
Figure 4.4a displays the information produced by SAS for the variable Redsat,
including the stem and leaf diagram, the box plot and the normal distribution plot.
The Redsat values range between -475 mV and 282 mV. Once more, it may
appear from the various diagrams that the situation is unacceptable and that the
distribution is not at all normal. However, an examination of the quantiles results in the
following: the value of half the hinge range is equal to 59/2 = 29.5 mV, and 174 ± 29.5
gives 203.5 mV and 144.5 mV. These values are quite close to the true values of 196 mV and
137 mV, respectively. This was the result anticipated, and further analysis is therefore
warranted.
In this case, it appears that the cause of the distortion in the diagrams is the one
extreme value of -475 mV. When this value is removed and the process repeated, the
resulting boxplot and normal distribution plot are greatly improved. The results obtained
are presented in Figure 4.4b. The data are slightly negatively skewed, with the outliers in
the lower end. However, aside from the outliers, the data points are well distributed and
the box plot appears to have a standard shape. Finally, the normal distribution plot is
fairly linear, except for the lower end outliers. This indicates a slight deviation from
normality.
Resdir
Figure 4.5 displays the information produced by SAS for the variable Resdir,
including the stem and leaf diagram, the box plot and the normal distribution plot.
The Resdir values range between 173 and 22080 ohm-cm. Once more, it appears
from the various diagrams that the situation is unacceptable and that the distribution is
not at all normal. Unlike the case of the redox potentials, the situation is not the result of
one or two extreme values. In fact, it appears that the entire set of values contributes to
the problem. This can be concluded from the examination of the quantiles. The value of
half the hinge range is equal to 9252/2 = 4626 ohm-cm and the predicted quantiles are
equal to 3750 ± 4626 ohm-cm, that is 8376 and -876 ohm-cm. There is a very large
difference between these values and the actual quantiles reported in Figure 4.5 and
therefore, it is not only the extreme points that contribute to the distortion of the
diagrams, but also the entire body of the values.
Furthermore, examination of the normal probability plot suggests that the majority
of the values are concentrated below 5000 ohm-cm, but that there are a significant
number of points which are several orders of magnitude larger. It appears that a
logarithmic transformation may be indicated. This will be studied in the next section.
Ressat
Figure 4.6 displays the information produced by SAS for the variable Ressat,
including the stem and leaf diagram, the box plot and the normal distribution plot.
The Ressat values range between 73 and 183400 ohm-cm. As in the case of
Resdir, it appears that the whole set of observations contributes to the distorted shape of
the boxplot and normal probability plot. Examination of the quantiles reveals the
following: half the hinge range is equal to 3504/2 = 1752 ohm-cm, and 2259 ± 1752
gives 4011 and 507 ohm-cm. These values are far from the calculated quantiles of 4800 and
1296 ohm-cm, respectively. Furthermore, examination of the normal probability plot
seems to suggest, as in the case of Resdir, that a logarithmic transformation may be able
to correct the normality problem.
Chloride
Figure 4.7 displays the information produced by SAS for the variable Chloride,
including the stem and leaf diagram, the box plot and the normal distribution plot.
The Chloride values range between 0 and 22664 ppm. As in the case of Resdir
and Ressat, it appears that the whole set of observations contributes to the distorted shape
of the boxplot and normal probability plot. Examination of the quantiles reveals the
following: half the hinge range is equal to 423/2 = 211.5 ppm, and 190 ± 211.5 gives 401.5
ppm and -21.5 ppm. These values are far from the calculated quantile values of 481 and
58 ppm, respectively. Furthermore, examination of the normal probability plot seems to
suggest, as in the case of Resdir and Ressat, that a logarithmic transformation may be able
to correct the normality problem.
CorrRate
Figure 4.8 displays the information produced by SAS for the variable CorrRate,
including the stem and leaf diagram, the box plot and the normal distribution plot.
The CorrRate values range between 0.06 and 0.26 mm/yr. The data are positively
skewed, i.e. the mean is slightly larger than the median. This generally indicates the
presence of outliers in the upper end of the distribution, and this is quite evident when
the stem and leaf diagram and the box plot are studied. There are two outliers: 0.19 and
0.26 mm/yr. Besides the outliers, the data points seem to be well distributed and the box
plot appears to have a standard shape. Finally, the normal distribution plot is not exactly
linear, in fact it appears to be slightly curved. This indicates a small deviation from
normality. The usefulness of a transformation will be studied in the next section.
[Figures 4.3a through 4.8: SAS output for Reddir, Redsat, Resdir, Ressat, Chloride and CorrRate (stem and leaf diagram, box plot, normal distribution plot)]
4.1.2 Transformation of Variables
There are many different ways to transform data. The values can be logged,
inverted, square-rooted, and so forth. Although a transformation can be applied to any
variable, each one seems to work best on a particular type of variable. Of all the
transformations, the one that may be of value is the logarithmic (or log) transformation,
which is typically applied to variables representing physical characteristics such as length,
weight and concentrations. The variables being studied represent concentrations (as in
the case of pH, redox potential and chlorides) and physical characteristics (such as soil
resistivity), and may be rendered 'normal' by the log transformation.
The variables which appear to be in most need of transformation, Chloride,
Resdir, and Ressat, are studied first. The normal probability plots of these three variables
are very similar, and it is suspected that the same transformation can be useful for all
three. The transformation proposed is the replacement of the original value with the
logarithm of that value. These new variables, referred to as LChl, LResdir and LRessat,
are presented in Tables 4.1, 4.2, and 4.3. The analysis performed on the original data
(Section 4.2) is repeated using the logged data, and the results are presented in Figures
4.9, 4.10, and 4.11.
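The transformation itself is a one-line operation. The sketch below assumes base-10 logarithms, which is an inference from the tabulated LChl values (for example, 1.000 and 0.699 correspond to 10 and 5 ppm) rather than a statement found in the text. Because the Chloride values start at 0 and the logarithm of zero is undefined, the sketch returns a missing value for non-positive inputs; this treatment is also an assumption, and the function name is hypothetical.

```python
import math

def log10_transform(values):
    """Base-10 logarithm of each value; non-positive values become None."""
    return [math.log10(v) if v > 0 else None for v in values]

# Illustrative chloride contents in ppm.
chloride_ppm = [5, 7, 10, 16, 30]
lchl = [round(x, 3) for x in log10_transform(chloride_ppm)]
# lchl == [0.699, 0.845, 1.0, 1.204, 1.477]
```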
In the case of LResdir and LRessat, the distribution has greatly improved. The
shape of the boxplots is acceptable, and the normal probability plots are fairly linear. The
transformation is therefore considered a success and, from this point forward, the
variables LResdir and LRessat will be used instead of Resdir and Ressat.
In the case of LChl, the distribution has also improved dramatically. The boxplot
has a shape which is almost perfect, and the normal probability plot is very close to being
perfectly linear. This transformation is considered a success, and the variable LChl will
replace Chloride from this point forward.
The transformation of the remaining variables, pHdir, pHsat, Reddir, and Redsat, is considered next.
Like Chloride, these variables represent concentrations: hydrogen ions in the case of pH,
and oxygen in the case of the redox potential. However, unlike Chloride, the
concentration has already been logged in obtaining the pH and the redox potential. The
formulae for obtaining the pH and the redox potential, φ, of a solution are the following:
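The standard textbook forms of these two relations are given here as a reference sketch. The notation (φ° for the standard potential, n for the number of electrons transferred, [Ox] and [Red] for the concentrations of the oxidized and reduced species) is assumed rather than taken from the original.

```latex
\mathrm{pH} = -\log_{10}\,[\mathrm{H^{+}}]
\qquad\qquad
\phi = \phi^{\circ} + \frac{2.303\,RT}{nF}\,\log_{10}\frac{[\mathrm{Ox}]}{[\mathrm{Red}]}
```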
It can be clearly seen that the pH and the potential φ are not direct measurements
of the concentration, but represent the concentration indirectly given that these
concentrations have been logged. For this reason, it is considered unreasonable to
perform a second logarithmic transformation on these variables, which are already the
result of a logarithmic transformation. It is possible that a transformation will render the
data more attractive, but it must be understood why a transformation is performed. Any
data, through a series of transformations, may be made to exhibit characteristics of
'normality'. However, if these transformations cannot be justified or understood
intuitively, it is better not to include them at all.
Finally, another variable which is considered a candidate for the log
transformation is CorrRate. The values of the new variable, denoted LCorr, are
presented in Table 4.4. The LCorr values were analyzed and the results are presented in
Figure 4.12. It can be seen that the distribution of the data has not improved much. For
this reason, the original variable will be retained for the analysis to follow.
In conclusion, the following discrete variables will be used from this point
forward: pHdir, pHsat, Reddir, Redsat, LResdir, LRessat, LChl, and CorrRate.
Table 4.1 Values of LChl
LChl (ppm): 3.813 3.994 3.474 3.951 2.464 2.173 2.255 1.255 0.845 1.204 1.477 1.380 1.000 0.699 1.041 1.623 1.903 2.117 3.294 2.201 1.544 1.954 1.892 2.806 2.857 3.311 2.467 2.816 2.744 2.072 2.771 2.525 2.913 2.342 2.431
LChl (ppm): 1.903 3.264 2.190 2.225 1.908 3.364 3.662 3.039 2.396 2.639 1.580 1.663 1.763 1.886 1.732 1.908 2.170 2.045 2.170 2.855 2.538 3.513 2.090 2.880 2.651 3.307 2.730 2.592 2.626 2.204 2.580 3.129 3.855 4.099 2.322 1.748 2.610 1.556 2.196
Table 4.2 Values of LResdir
Table 4.3 Values of LRessat
Table 4.4 Values of LCorr
Soil #: 1 to 152, and 247
4.1.3 Regression of the Individual Variables
Now that the distribution of each variable has been studied, it is time to study the
distribution of the residuals arising from the regression of the independent variables with
the dependent variable CorrRate. The first step is to regress each X variable individually
and to observe the residual distribution. Here, the goal is to study the distribution of the
residuals for signs of anomalies, and to keep track of the outliers on Y. In the previous
section, the outliers obtained were outliers on the X variable only. In this section, the
outliers on the residuals are examined, i.e. outliers on the model chosen to fit the data.
It is most probable that the distribution will not be perfectly normal. This usually
suggests that another X variable should be added to the model to account for the variance
which was not accounted for by the first variable. In a later section, a complete model
(consisting of a combination of X variables) is regressed against CorrRate and a normal
distribution is anticipated. If this does not occur, it will be assumed that the model is not
correct, and the search for the missing X variable to be added to the model will continue.
For each variable, the following information was extracted and examined:
• Number of observations N, SSE, PRESS statistic, R², R²-adjusted, F-ratio,
• Outliers identified by a z-score > 3,
• Outliers identified by a Cook's Distance (CD) > 1,
• Plot of residuals vs. predicted value of y (e vs. y),
• Stem and leaf diagram, box plot and normal distribution plot constructed from the set
of residuals.
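For a regression on a single X variable, all of the quantities in the list above follow from closed-form expressions. The sketch below is a plain-Python illustration and not a reproduction of the SAS procedures used in this study: the function name and data are hypothetical, and the "z-score" is assumed to be the internally studentized residual, e / sqrt(MSE (1 - h)), where h is the leverage of the observation.

```python
import math

def simple_regression_diagnostics(x, y):
    """Fit y = a + b x and return the fit statistics and outlier measures."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    leverage = [1 / n + (xi - x_mean) ** 2 / sxx for xi in x]
    sse = sum(e ** 2 for e in residuals)             # error sum of squares
    sst = sum((yi - y_mean) ** 2 for yi in y)        # total sum of squares
    mse = sse / (n - 2)
    # PRESS: sum of squared leave-one-out prediction errors
    press = sum((e / (1 - h)) ** 2 for e, h in zip(residuals, leverage))
    r2 = 1 - sse / sst
    r2_adj = 1 - (1 - r2) * (n - 1) / (n - 2)
    f_ratio = (sst - sse) / mse
    z = [e / math.sqrt(mse * (1 - h)) for e, h in zip(residuals, leverage)]
    # Cook's Distance with p = 2 fitted parameters (intercept and slope)
    cooks = [zi ** 2 * h / (2 * (1 - h)) for zi, h in zip(z, leverage)]
    return {"n": n, "sse": sse, "press": press, "r2": r2, "r2_adj": r2_adj,
            "f_ratio": f_ratio, "z_scores": z, "cooks_distance": cooks}

# Small illustrative data set (not from this study).
diag = simple_regression_diagnostics([0, 1, 2, 3], [0, 1, 2, 4])
flagged = [i for i, (zi, cd) in enumerate(zip(diag["z_scores"], diag["cooks_distance"]))
           if abs(zi) > 3 or cd > 1]
```

The last line applies the two outlier criteria of the list (z-score > 3 or CD > 1) and returns the positions of the flagged observations.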
pHdir and pHsat
Tables 4.5a and 4.5b display the information related to the variables pHdir and
pHsat, respectively. Furthermore, Figures 4.13a through 4.13d display the SAS output
for the variables pHdir and pHsat, including the e vs. y plot, the stem and leaf diagram,
the box plot, and the normal probability plot for the residuals.
Residual Information: Variable pHdir
N, SSE, PRESS statistic, R², R²-adjusted, F-ratio, outliers with z > 3, outliers with CD > 1
Table 4.5a Residual Characteristics: pHdir
Residual Information: Variable pHsat
PRESS statistic, outliers with z > 3, outliers with CD > 1
Table 4.5b Residual Characteristics: pHsat
Only 74 observations were available for regressing the variable pHdir against
CorrRate. The increase in the error sum of squares was 8%, which is quite acceptable.
This value is obtained by the following equation:
% increase in the error sum of squares = (PRESS - SSE) / SSE × 100
The above percentage gives an idea of the ability of the equation to fit foreign data, i.e. to
fit data which were not a part of the observations used to create the equation itself.
Another way to measure this is to compare R²-adjusted to R². The decrease in R² is called the
shrinkage, and it represents the decrease in predictive power of the equation when used
on the population as a whole. In this case, the shrinkage is 9%, which is acceptable.
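The two percentages can be checked with a short calculation. The sketch below assumes that the shrinkage is the relative decrease from R² to R²-adjusted; this reading is an interpretation, supported by the fact that it reproduces the 9% and 24% values quoted in this section for pHdir and pHsat. For a regression on a single variable, R² can be recovered from the F-ratio and N through R² = F / (F + N - 2). The function names are hypothetical.

```python
def pct_increase_in_sse(press, sse):
    """Percentage increase from SSE to PRESS."""
    return 100 * (press - sse) / sse

def r2_from_f(f_ratio, n):
    """R-squared of a one-variable regression from its F-ratio and sample size."""
    return f_ratio / (f_ratio + n - 2)

def shrinkage_pct(f_ratio, n):
    """Relative decrease (%) from R-squared to adjusted R-squared."""
    r2 = r2_from_f(f_ratio, n)
    r2_adj = 1 - (1 - r2) * (n - 1) / (n - 2)
    return 100 * (r2 - r2_adj) / r2

shrink_phdir = shrinkage_pct(10.61, 74)   # about 9.4 %
shrink_phsat = shrinkage_pct(4.12, 74)    # about 24.3 %
```

Under this definition the shrinkage of a one-variable regression reduces algebraically to 100 / F, so a variable with a low F-ratio necessarily shows a large shrinkage.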
The F-ratio is used to determine if the variable is significant in predicting y. The
value obtained for pHdir is equal to 10.61, which is substantially higher than the critical
value of 4.00 (see Figure F.1) and, as suspected, pHdir is considered significant in
predicting CorrRate.
The outliers are identified by two different methods: by the z-score of the residual
and by Cook's Distance (CD). When the z-score of a residual is larger than 3 and/or
when the value of CD is larger than 1, the observation is considered an outlier. For every
variable, observation #72 is considered an outlier with a z-score > 5. Furthermore,
observation #149 has a z-score close to 3 and should also be examined in future analyses.
Finally, normality of the residuals is determined using the e vs. y plot, and the
normal distribution plot. The e vs. y plot is used to identify any underlying trends in the
distribution of the residuals. If the plot shows a particular pattern in the residuals, this
generally indicates that the proposed model is not a complete one, i.e. that not all the
variance in the system has been accounted for by the model proposed, and that the addition of
another variable may be necessary. This can also be determined by studying the shape of
the curve of the normal distribution plot. A straight line is characteristic of a normally
distributed set of values, and therefore, any non-linear curve would indicate a deviation
from normality which may be corrected by the addition of another variable. In the case of
pHdir, the e vs. y plot does not show any clear trends in the distribution of the points.
However, it does show that the majority of the residual values are between ± 0.05 mm/yr,
and that one point in particular (#72) is much higher than the norm. This can also be seen
on the boxplot where observation #72 is clearly an outlier, and on the normal probability
plot where the outlier is located far above the curve. Finally, the curve on the normal
probability plot is slightly curved which, as suspected, indicates that some other variable
should be added to the model of pHdir alone.
In the case of pHsat, 74 observations were analyzed and revealed an F-ratio of
4.12. This value is larger than the critical value of 4.00 and, as such, pHsat is considered
to be significant in predicting CorrRate. Furthermore, there is an 8% increase in the error
sum of squares, and a 24% decrease in R². From the above results, it is clear that pHsat is
not as good as pHdir in predicting CorrRate, even though the distribution of the residuals
is very similar. However, because one variable is being studied at a time, it cannot be
concluded that pHsat will not be better in predicting CorrRate when it is used in
combination with other variables. This will be discussed in a later section.
[Figures 4.13a through 4.13d: SAS residual output for pHdir and pHsat (e vs. y plot, stem and leaf diagram, box plot, normal probability plot)]
Reddir and Redsat
Tables 4.6a and 4.6b display the information related to the variables Reddir and
Redsat, respectively. Furthermore, Figures 4.14a through 4.14d display the SAS output
for the variables Reddir and Redsat, including the e vs. y plot, the stem and leaf diagram,
the box plot, and the normal probability plot for the residuals.
Residual Information: Variable Reddir
N, SSE, PRESS statistic, R², R²-adjusted, F-ratio, outliers with z > 3, outliers with CD > 1
Table 4.6a Residual Characteristics: Reddir
Residual Information: Variable Redsat
N, SSE, PRESS statistic, R², R²-adjusted, F-ratio, outliers with z > 3, outliers with CD > 1
Table 4.6b Residual Characteristics: Redsat
Only 74 observations were available for regressing the variable Reddir against
CorrRate. The F-ratio is equal to 0.324, which is far below the critical ratio of 4.00 and
which indicates that Reddir is not significant in predicting CorrRate. This is also
indicated by the value of 0.0044 for R², which shows that the correlation between Reddir
and CorrRate is very low. The results obtained from Reddir appear to indicate that this
variable will not play an important role in future analyses. This will be shown to be true
in the following sections.
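The link between the reported R² and the F-ratio can be checked with a short calculation. The sketch below assumes a regression with an intercept and a single predictor, so that F = R²(n - 2) / (1 - R²); the function name is illustrative and is not part of the SAS output.

```python
def f_ratio_from_r2(r2, n_obs, n_predictors=1):
    """F-ratio of a regression with an intercept, computed from its R-squared."""
    dof_error = n_obs - n_predictors - 1
    return (r2 / n_predictors) / ((1.0 - r2) / dof_error)


CRITICAL_F = 4.00  # critical value quoted in the text for one added variable

# Reddir: R-squared = 0.0044 with 74 observations, as reported above.
f_reddir = f_ratio_from_r2(0.0044, 74)
print(f"F = {f_reddir:.3f}, significant: {f_reddir > CRITICAL_F}")
```

The value obtained is about 0.32, close to the reported 0.324; the small gap is what would be expected from R² having been rounded to two significant figures.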
The e vs. y plot shows that the majority of the residuals are between ±0.05 mm/yr.
There are two points which appear to be located far from the rest: observations #72 and
#149. According to the z-score and Cook's Distance, the only true outlier is observation
#72, but #149 should also be watched because of its high z-score. Finally, the normal
probability plot indicates a deviation from normality, as well as the presence of the two
outliers which are located far from the line.
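The two screening rules used in the tables (z > 3 and CD > 1) can be illustrated for a one-variable regression. The sketch below is only an assumption about how such diagnostics are typically computed: it takes the z-score to be the internally standardized residual and uses the usual definition of Cook's Distance. The data are invented for the example and are not the thesis observations.

```python
def outlier_diagnostics(x, y):
    """Return (z, cook_d) per observation for a one-variable least-squares fit."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e * e for e in residuals) / (n - 2)  # two fitted parameters
    diagnostics = []
    for xi, e in zip(x, residuals):
        leverage = 1.0 / n + (xi - mean_x) ** 2 / sxx
        z = e / (mse * (1.0 - leverage)) ** 0.5  # standardized residual
        cook_d = (z * z / 2.0) * leverage / (1.0 - leverage)
        diagnostics.append((z, cook_d))
    return diagnostics


# Hypothetical data: a straight line with small scatter and one deviant point.
x = [float(i) for i in range(30)]
y = [0.5 * xi + 0.1 * ((i * 7) % 5 - 2) for i, xi in enumerate(x)]
y[15] += 5.0

diag = outlier_diagnostics(x, y)
flagged = [i for i, (z, cd) in enumerate(diag) if abs(z) > 3 or cd > 1]
print(flagged)
```

A point can exceed the z-score threshold without exceeding the Cook's Distance threshold when its leverage is low, which is one way an observation such as #149 could show a high z-score without being classified as a true outlier.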
In the case of Redsat, the results indicate that this variable performs worse than
Reddir. The F-ratio of 0.0012 clearly shows that this variable is not significant in
predicting CorrRate, and the extremely low value of R² indicates that there is little
correlation between Redsat and CorrRate.
LResdir and LRessat
Tables 4.7a and 4.7b display the information related to the variables LResdir
and LRessat, respectively. Furthermore, Figures 4.15a through 4.15d display the SAS
output for the variables LResdir and LRessat, including the e vs. y plot, the stem and leaf
diagram, the box plot, and the normal probability plot for the residuals.
Residual Information: Variable LResdir
N    SSE
PRESS statistic    R²
R²-adjusted    F-Ratio
Outliers with z > 3    Outliers with CD > 1
Table 4.7a Residual Characteristics: LResdir
Residual Information: Variable LRessat
N    SSE
PRESS statistic    R²
R²-adjusted    F-Ratio
Outliers with z > 3    Outliers with CD > 1
Table 4.7b Residual Characteristics: LRessat
In the case of LResdir, 70 observations were used. The F-ratio obtained is 2.553,
which is below the critical F-ratio of 4.00 and indicates that the variable is not
significant in predicting CorrRate. This is contrary to the expectation, considering the
importance of resistivity in the corrosion process.
There is a 39% decrease in the R² value and a 14% increase in the error sum of
squares, when applying the equation to the population as a whole. These values are
somewhat higher than expected. Furthermore, observation #72 is identified as an outlier,
and observation #149 also has a high z-score.
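The percentage changes quoted for R² and the error sum of squares appear to compare the ordinary fit with its leave-one-out counterpart, which is what the PRESS statistic in the tables measures; this reading is an assumption, since the exact definition is given elsewhere in the thesis. Under that assumption, a minimal sketch for a one-variable regression is shown below, using invented data.

```python
def press_summary(x, y):
    """Compare SSE and R-squared with their leave-one-out (PRESS) versions."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    sst = sum((yi - mean_y) ** 2 for yi in y)
    sse = 0.0
    press = 0.0
    for xi, yi in zip(x, y):
        e = yi - (intercept + slope * xi)
        leverage = 1.0 / n + (xi - mean_x) ** 2 / sxx
        sse += e * e
        press += (e / (1.0 - leverage)) ** 2  # leave-one-out prediction error
    r2 = 1.0 - sse / sst
    r2_press = 1.0 - press / sst
    return {
        "sse": sse,
        "press": press,
        "r2": r2,
        "r2_press": r2_press,
        "sse_increase_pct": 100.0 * (press - sse) / sse,
        "r2_decrease_pct": 100.0 * (r2 - r2_press) / r2,
    }


# Hypothetical pH readings and corrosion rates (mm/yr), for illustration only.
x_demo = [5.5, 6.1, 6.4, 6.8, 7.0, 7.2, 7.5, 7.9, 8.1, 8.4]
y_demo = [0.080, 0.061, 0.066, 0.050, 0.052, 0.041, 0.045, 0.030, 0.034, 0.022]

summary = press_summary(x_demo, y_demo)
print(summary["sse_increase_pct"], summary["r2_decrease_pct"])
```

Because every leverage lies between 0 and 1, PRESS can never be smaller than SSE, so the two percentages are always non-negative; large values indicate that the fit depends heavily on a few observations.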
The e vs. y plot shows that 90% of the residuals are between ±0.04 mm/yr, with
only two residuals being above 0.06 mm/yr. These two residuals also appear on the
normal probability plot as points located off the curve, as seen in Figure 4.16d. The
curve of the normal probability plot also suggests a slight deviation from normality.
In the case of LRessat, 74 observations were analyzed. The F-ratio of 2.598 is
also below the critical value and therefore, contrary to what is expected, the analysis
shows that LRessat is not significant in predicting CorrRate. Furthermore, there is a 39%
decrease in the R² value and a 4% increase in the error sum of squares. Once more,
observation #72 is identified as an outlier, with observations #149 and #42 exhibiting
outlier behavior.
The e vs. y plot shows that 95% of the residuals are between ±0.04 mm/yr, with
only two residuals above 0.06 mm/yr. In general, the results suggest that LRessat is better
than LResdir in predicting CorrRate. This is the expected result because CorrRate was
obtained by testing a soil which has been saturated with distilled water and, as such,
whose resistivity during testing was better represented by Ressat than by Resdir.
LChl
Table 4.8 displays the information related to the variable LChl. Furthermore,
Figures 4.16a and 4.16b display the SAS output, including the e vs. y plot, the stem and
leaf diagram, the box plot, and the normal probability plot for the residuals.
Residual Information: Variable LChl
N    SSE
PRESS statistic    R²
R²-adjusted    F-Ratio
Outliers with z > 3    Outliers with CD > 1
Table 4.8 Residual Characteristics: LChl
In total, 73 observations were used to analyze the relationship between LChl and
CorrRate. The F-ratio of 8.572 is much higher than the critical value of 4.00 and, as
such, this variable is considered very significant in predicting CorrRate. The decrease in
R² is only 12% and the increase in the error sum of squares is 7%. These values are quite
acceptable. Furthermore, observations #72 and #149 are identified as outliers, with #42
exhibiting outlier behavior. This can also be seen on the e vs. y plot, where 95% of the
residuals are between ±0.04 mm/yr, with only the two outliers above 0.06 mm/yr.
Finally, the normal distribution plot shows a slight deviation from normality which may
be corrected by the addition of another variable to the model.
The results analyzed up to this point suggest that the pHdir variable performs the
best, followed by LChl, pHsat and LRessat. The variables which appear to add the least
information are Reddir and Redsat. It is not at all surprising that the pH value plays such
an important role, as the reduction of H+ ions is one of the two reactions expected to
contribute to the corrosion problem. The other reaction expected is the reduction of O2
and, as such, the insignificant role of Reddir comes as a surprise.
The influence of the chloride content was also expected. Chlorides have a dual
effect on the corrosion rate. Firstly, they decrease the resistivity of the soil because they
are ions and conductive by nature, and secondly, they inhibit the formation of the
protective passive layer on the steel specimen. On the other hand, what is very surprising
is the insignificant role played by the variable LRessat. It was thought that, because
resistivity indirectly measures the chloride content, as well as the general ionic content of
the soil, LRessat would provide almost as much information as LChl. However, this
has not been shown yet.
4.1.4 Correlation Matrix
In the previous section, the relationship between CorrRate and each of the
independent variables was studied. One could easily proceed and regress each variable in
turn with all the others in order to identify the extent to which the variables are
intercorrelated. An alternative to this is to study the correlation matrix of the set of
independent variables, plus the dependent one. The correlation matrix for the data under
study is presented in Figure 4.17.
The ideal situation is one in which the correlations between the dependent
variable (CorrRate) and each independent variable are high, and the correlation between
the independent variables themselves is low. This would result in the least amount of
multicollinearity, i.e. redundant information, and would lead to a situation where each
variable that is added to an equation would provide new information and would serve to
significantly increase the effectiveness of the equation.
An examination of the correlation between CorrRate and the independent
variables will quickly reveal that the results are the same as those obtained in the previous
section: the pHdir variable performs the best, followed by LChl, pHsat and LRessat. The
variables which appear to add the least information are Reddir and Redsat. Another
important point is the sign of the correlation between CorrRate and LResdir. One would
expect that the correlation would be negative, i.e. the higher the resistivity, the lower the
corrosion rate, but this is not the case. However, it must be kept in mind that the variable
LResdir was obtained by testing the soil in the state in which it was received in the
laboratory, which means that some soils were tested when dry and others were tested in a
saturated condition. The results obtained are therefore misleading.
Another important fact observed from the correlation matrix is the presence of
high correlations between pHdir and pHsat, Reddir and Redsat, and between LResdir and
LRessat. As each of these pairs essentially measures the same soil property, it is not at all
surprising to see high correlations. This indicates that the information provided by the
variables is essentially the same, and that only one of the two variables needs to be
included in a model. The choice of the variable to be retained will be discussed later.
Another correlation of particular interest is that between LChl and each of the
resistivity variables, LRessat and LResdir. The correlation between LChl and LRessat is
very high, and this indicates that the two variables essentially provide the same
information. Although LChl is the better of the two variables, it remains to be
determined whether or not the extra information provided by LChl is sufficient to
consider this variable significant when the variable LResdir is already known.
The correlation matrix provides information about the interaction of the variables.
This is the first step in the determination of a model which describes the corrosion
phenomenon well. The next step consists of comparing the possible models, which will
be done using the RSQUARE procedure in SAS.
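A correlation matrix of this kind can be sketched in a few lines. Because the SAS output reports a different number of observations for each pair of variables, the sketch below assumes pairwise deletion of missing values (each coefficient uses only the rows where both variables are present). The data values are invented; only the variable names are taken from the text.

```python
from math import sqrt


def pearson_pairwise(a, b):
    """Pearson r and pair count, using only rows where both values are present."""
    pairs = [(u, v) for u, v in zip(a, b) if u is not None and v is not None]
    n = len(pairs)
    mean_u = sum(u for u, _ in pairs) / n
    mean_v = sum(v for _, v in pairs) / n
    suv = sum((u - mean_u) * (v - mean_v) for u, v in pairs)
    suu = sum((u - mean_u) ** 2 for u, _ in pairs)
    svv = sum((v - mean_v) ** 2 for _, v in pairs)
    return suv / sqrt(suu * svv), n


def correlation_matrix(columns):
    """Map every (row name, column name) pair to (r, number of pairs used)."""
    names = list(columns)
    return {(p, q): pearson_pairwise(columns[p], columns[q])
            for p in names for q in names}


# Hypothetical values; None marks a missing measurement.
data = {
    "CorrRate": [0.02, 0.05, 0.03, 0.08, 0.04],
    "pHdir": [7.9, 6.1, 7.2, 5.5, None],
    "LChl": [1.2, 2.0, None, 2.6, 1.5],
}

matrix = correlation_matrix(data)
for (p, q), (r, n) in matrix.items():
    print(f"{p:>8} {q:>8} r = {r:+.3f} n = {n}")
```

High values in the CorrRate row together with low values among the predictors correspond to the ideal situation described above; high values between two predictors signal redundant information.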
4.1.5 RSQUARE Results
The RSQUARE procedure in SAS is used to obtain a list of the 10 best 1-variable,
2-variable, 3-variable models, etc. This procedure is considered better than the stepwise,
forward, and backward regression procedures because it does not present one final model
as the best model. Instead, it provides the analyst with a set of models that perform best,
and allows the analyst to compare the models and to select the one shown to be the most
logical.
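The idea behind an all-subsets search can be sketched as follows. This is not the SAS procedure itself, only a minimal illustration of the principle: every combination of a given number of predictors is fitted by least squares and the combinations are ranked by R². The helper names and the data are assumptions made for the example; the response is constructed to depend on PHDIR and LCHL only.

```python
from itertools import combinations


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(columns, names, y):
    """R-squared of the least-squares fit of y on an intercept plus `names`."""
    rows = [[1.0] + [columns[nm][i] for nm in names] for i in range(len(y))]
    k = len(names) + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    mean_y = sum(y) / len(y)
    sse = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
              for r, yi in zip(rows, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - sse / sst


def best_subsets(columns, y, size, top=10):
    """Rank all models with `size` predictors by R-squared, best first."""
    scored = [(r_squared(columns, combo, y), combo)
              for combo in combinations(sorted(columns), size)]
    return sorted(scored, reverse=True)[:top]


# Hypothetical predictor values, for illustration only.
predictors = {
    "PHDIR": [7.9, 6.1, 7.2, 5.5, 6.8, 8.1, 5.9, 7.0],
    "LCHL": [1.2, 2.0, 1.4, 2.6, 1.5, 1.0, 2.2, 1.8],
    "LRESSAT": [3.5, 3.0, 3.4, 2.7, 3.3, 3.7, 2.9, 3.1],
}
corr_rate = [0.1 - 0.01 * p + 0.02 * c
             for p, c in zip(predictors["PHDIR"], predictors["LCHL"])]

for size in (1, 2):
    for r2, combo in best_subsets(predictors, corr_rate, size):
        print(size, combo, round(r2, 4))
```

The output is a ranked list rather than a single winner, which leaves the choice among closely performing models to the analyst, as described above.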
Table 4.9 presents several 1, 2, 3 and 4-variable models which are made up of a
combination of variables which is considered acceptable by the analyst. In the case of the
one-variable models, the results obtained are similar to those obtained in the previous
exercises. As expected, the variables which appear to be correlated best with CorrRate
are the pH variables, and LChl. However, it is surprising that the variable LResdir
performs better than LRessat, and that LResdir appears in almost all the best 2 and
3-variable models, when LRessat appears in a few. Furthermore, these two variables often
appear in the same models, which suggests that the information provided by each of the
variables is not necessarily repetitive. And finally, even though LResdir often appears
together with LChl, LRessat never does. These results will be considered further in a
later section.
Another important result is that pHsat and pHdir never appear in the same model.
This suggests that the information provided by one variable is not necessary when the
other variable is already in the model. This was an expected result. Furthermore, the
models containing pHdir almost always perform better than those containing pHsat. For
this reason, it can be safely concluded that pHdir outperforms pHsat.
Finally, Redsat does not appear in any of the models and, although Reddir does, it
does not appear to play a very important role. This is certainly a surprising result that will
be considered further in a later section.
Possible Variables in Model

1-variable models:
PHDIR
PHSAT
LCHL
LRESDIR
LRESSAT
REDDIR
REDSAT

2-variable models:
LRESDIR LCHL
PHDIR LCHL
PHDIR LRESDIR
PHSAT LCHL
PHDIR LRESSAT
PHDIR REDSAT
PHDIR REDDIR
PHSAT LRESDIR

3-variable models:
PHDIR LRESDIR LCHL
PHSAT LRESDIR LCHL
PHSAT LRESSAT LRESDIR
REDSAT LRESDIR LCHL
REDDIR LRESDIR LCHL
REDSAT LRESSAT LRESDIR
REDDIR LRESSAT LRESDIR

4-variable models:
PHDIR REDDIR LRESSAT LRESDIR
PHDIR REDDIR LRESDIR LCHL
PHSAT REDDIR LRESDIR LCHL

Table 4.9 1, 2, 3, and 4-Variable Models
4.1.6 Categorical Variables
Up to this point, only discrete variables have been considered. The influence of
the categorical variables such as Soiltype, Moisture and Sulfi has been ignored. One
way of including categorical variables in the analysis is to transform each one into a set of
dummy variables which can then be treated like discrete variables (see Section D.5). The
results of such an analysis are not very obvious, and for this reason, a simpler exercise
will be performed to investigate the general effect of the categorical variables.
This exercise consists simply of calculating the correlation matrix and performing
the RSQUARE procedure on the data which has been sorted. For example, to examine
the effect of the variable Soiltype, the data is first sorted into the three categories: sand,
sand/clay, and clay. Then, for each of these three categories, the correlation matrix is
calculated and the RSQUARE procedure is performed. The correlation matrices
obtained for the variable Soiltype are presented in Figures 4.18a through 4.18c. The
following important points are extracted from the matrices:
• For clays, the variables which perform best are pHdir and LChl. Conversely, LRessat
  performs very poorly. Furthermore, the variable Reddir performs quite well.
• For sand/clays, pHdir performs best, followed by LChl and LRessat, which perform
  equally well. The redox variables appear not to be very helpful.
• For sands, the variables which perform best are LChl and LRessat. The pHdir
  variable seems to perform poorly.
The above results suggest that the variable Soiltype plays an important role. For
example, the pH of a soil seems to be more important when the soil is a clay, and the
resistivity is more important when the soil is a sand. Furthermore, both variables are
important when the soil is a sand/clay, and the chloride content appears to be important
irrespective of the soil type. It is therefore concluded that the variable Soiltype should be
included in future analyses, and as such, this categorical variable will be transformed into
a set of three dummy variables and treated in the same manner as the other discrete
variables.
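The two treatments of Soiltype described in this section, sorting the data by category and coding the category as dummy variables, can be sketched as follows. The function names and the sample labels are assumptions made for illustration; the coding actually used is described in Section D.5.

```python
SOIL_CATEGORIES = ("sand", "sand/clay", "clay")


def soiltype_dummies(soiltypes):
    """Map each soil type label to a row of 0/1 indicators, one per category."""
    return [[1 if label == cat else 0 for cat in SOIL_CATEGORIES]
            for label in soiltypes]


def split_by_soiltype(records):
    """Group (soiltype, values) records by category, as in the sorting exercise."""
    groups = {cat: [] for cat in SOIL_CATEGORIES}
    for soiltype, values in records:
        groups[soiltype].append(values)
    return groups


samples = ["clay", "sand", "sand/clay", "clay"]  # hypothetical labels
print(soiltype_dummies(samples))
# [[0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
```

Each row of indicators sums to one, so in a model that also contains an intercept only two of the three dummy variables carry independent information. This is consistent with the difference of 2 DOF attributed to Soiltype in the ANOVA comparison of Section 4.2.1.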
Correlation Analysis
Pearson Correlation Coefficients / Prob > |R| under Ho: Rho=0 / Number of Observations

             PHDIR     PHSAT    REDDIR    REDSAT   LRESDIR   LRESSAT      LCHL
PHDIR      1.00000   0.61974  -0.03843  -0.03715  -0.06464  -0.13583  -0.01416
               0.0    0.0001    0.7826    0.7897    0.6660    0.3419    0.9223
                54        53        54        54        47        51        50
PHSAT      0.61974   1.00000  -0.06473  -0.23635  -0.02610  -0.26105   0.20294
            0.0001       0.0    0.6452    0.0884    0.8633    0.0671    0.1620
                53        53        53        53        46        50        49
REDDIR    -0.03843  -0.06473   1.00000   0.82414   0.38833   0.37746  -0.41076
            0.7826    0.6452       0.0    0.0001    0.0070    0.0063    0.0030
                54        53        54        54        47        51        50
REDSAT    -0.03715  -0.23635   0.82414   1.00000   0.22604   0.40600  -0.48817
            0.7897    0.0884    0.0001       0.0    0.1266    0.0031    0.0003
                54        53        54        54        47        51        50
LRESDIR   -0.06464  -0.02610   0.38833   0.22604   1.00000   0.74594  -0.57108
            0.6660    0.8633    0.0070    0.1266       0.0    0.0001    0.0001
                47        46        47        47        47        47        45
LRESSAT   -0.13583  -0.26105   0.37746   0.40600   0.74594   1.00000  -0.88587
            0.3419    0.0671    0.0063    0.0031    0.0001       0.0    0.0001
                51        50        51        51        47        51        48
LCHL      -0.01416   0.20294  -0.41076  -0.48817  -0.57108  -0.88587   1.00000
            0.9223    0.1620    0.0030    0.0003    0.0001    0.0001       0.0
                50        49        50        50        45        48        50

Figure 4.18b SAS Output: Correlation Matrix for Sand Samples
The results obtained from the RSQUARE procedure indicate that the variables
LChl and pHdir play a very important role, and that both the resistivity variables
contribute new information. In some cases, LRessat performs better than LResdir, and
in others the opposite is true. Furthermore, the variables LChl and LResdir often appear
in the same model, whereas LRessat rarely appears in a model with LChl. It is becoming
clear that the variables LRessat and LChl provide overlapping information, and that
LResdir appears to represent some other inherent characteristic of the soil.
After a similar analysis involving the categorical variables Moisture and Sulfi, it
was concluded that these variables do not introduce new information, and it was decided
that they will not be considered in future analyses. In the case of the variable Moisture, it
is not at all surprising that the variable is unnecessary. Moisture is a measure of the
saturation state of the soil as it was received in the laboratory. This is not the state in
which the soil was tested to obtain the value of CorrRate, the dependent variable.
Therefore, the variable Moisture cannot be useful in predicting the value of CorrRate
when it is so wholly unrelated to the conditions under which CorrRate was obtained.
The variable Sulfi appeared to provide no new information, and was considered
not to be useful. However, it is believed that this conclusion is not one that would
necessarily apply to future studies. The sulfide content of a soil is generally considered
an important influence on corrosivity. In fact, sulfides (S²⁻) are very corrosive and can
cause severe damage to metal surfaces. The fact that the variable Sulfi does not play an
important role in this analysis may be a result of errors in the testing procedure. Another
reason why the effect of sulfides is minimized may be that sulfides are
only measured qualitatively. Perhaps sulfide content should be measured with more
precision, as in the case of chloride content. This simply means that it may be
insufficient to qualify a soil as containing either no sulfides (N), trace amounts of sulfides
(T), or a lot of sulfides (P). In the case of chloride content, it was observed in Section
4.1.2 that the variable characteristics, prior to the logarithmic transformation, were simply
unacceptable. The variable Chloride did not exhibit normality, and could not be used in
regression analyses. Like the chloride content, sulfide content is a concentration, and it
may be necessary to perform a similar transformation on this variable before it can be
used. This will be discussed further in a later section.
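The effect of a logarithmic transformation on a concentration variable can be illustrated with a simple skewness calculation. The sketch below assumes a base-10 logarithm (the base used for LChl is defined in Section 4.1.2, not here) and uses invented concentrations; it only shows why a strongly right-skewed concentration becomes closer to symmetric after the transformation.

```python
from math import log10


def skewness(values):
    """Moment-based skewness: zero for a perfectly symmetric sample."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical chloride concentrations, for illustration only.
chloride = [5, 8, 12, 20, 35, 60, 150, 400, 1200]
lchl = [log10(c) for c in chloride]

print(round(skewness(chloride), 2), round(skewness(lchl), 2))
```

If sulfide content were measured as a concentration rather than as the categories N, T and P, the same check could be applied before deciding whether a transformation is needed.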
4.1.7 Variables Retained For Further Analyses
Up to this point, all of the discrete variables were included in the analyses.
However, it has become clear that some variables perform better than others. From this
point on, only the variables which are considered useful in predicting the dependent
variable CorrRate will be retained. This step is one of "cleaning-up". Many different
variables were measured during the experimental portion of this project, but the reason
they were measured should be remembered. Not all of the variables can provide
information on the phenomenon measured by the variable CorrRate.
The first step is to choose between the pH variables. Which one of the two
represents most accurately the conditions under which the variable CorrRate was
obtained? pHdir was obtained by testing the soil as it was received in the laboratory,
whereas pHsat was measured when the soil was supersaturated. It is felt that the method
used to obtain pHsat is unacceptable. When the soil is mixed with water at a ratio of 1:1,
the soil is greatly beyond saturation. However, the soil tested for CorrRate is just barely
saturated. In most cases, the soil received in the laboratory is already moist, and only a
small amount of water is added prior to testing for CorrRate. For this reason, pHdir is
considered to be the more representative variable of the two.
In the case of Reddir vs. Redsat, it is felt that Redsat is not a good measure of the
oxidation-reduction potential of the soil because of the large quantity of water added to
the soil prior to testing. The water added contains a certain amount of oxygen which will
influence the reading. For this reason, the variable Reddir is considered the more
representative variable of the two.
In the case of LResdir and LRessat, it was decided to include both variables in
future analyses. Although LRessat is the variable which represents the condition of
CorrRate testing most accurately, the variable LResdir appears to introduce information
that LRessat does not. It is suspected that LResdir represents some inherent property of
the soil which influences its resistivity. This suspicion arises from the fact that the
majority of the soils obtained in the laboratory are already moist, i.e. they have a very
similar moisture content. However, they do not have a moisture content which optimizes
their conductive properties until they are saturated, i.e. until the movement of ions is
optimized. It may be possible that LResdir measures some conductive property of the soil
which is not the result of the movement of the ions in the soil. It is further suspected that
the property represented by LResdir may be the soil type, or the soil content. Perhaps
certain soil particles are inherently more conductive than others and LResdir
measures this phenomenon. It is for this reason that both LRessat and LResdir are
retained for further analyses.
In summary, the following variables are retained for further analyses: CorrRate,
LChl, pHdir, LResdir, LRessat, Reddir, and Soiltype.
4.2 Consideration of Chlorides in Predicting the Corrosion Rate
4.2.1 Determining Significance
Now that the general behavior of each variable is understood, it is time to
determine the significance of the chosen variables. The term significance refers to the
statistical significance of a variable. A variable is considered significant if the
information provided by this variable is sufficiently important, such that its addition to a
set of other variables increases the ability of the set to explain the phenomenon under
consideration. Determining significance consists of studying the results of a series of
ANOVA tables and determining, at each step, whether the variable added is significant.
An ANOVA table permits rapid calculation of the F-ratio and the correlation coefficient,
R² (see Section D.4).
The variables on which attention is focused are the following: pHdir, LChl,
LRessat, LResdir and Soiltype. The goal of the analysis is to answer the following two
questions: 'Is the variable LChl necessary when LRessat is already included in the
model?', and 'Does the variable LResdir provide the same information as Soiltype and, if
so, should it be included in a model containing the variable Soiltype?'.
The following information is entered in the ANOVA table, and is required to
determine significance of a model Ω:
• A benchmark model, ω, to which model Ω is compared,
• The error sum of squares, SSE, of each of the two models,
• The degrees of freedom, DOF, of each of the two models, and
• The critical F-ratio with which the calculated F-ratio is compared.
The SSE and the DOF of the possible models are listed in Table 4.10. The critical
F-ratio is obtained from Table D.1. For the sake of simplicity, the critical F-ratio will
be taken as 4.00 when we are comparing two models with a difference of 1 DOF, and
3.15 when comparing two models with a difference of 2 DOF's. These values correspond
to an α value of 0.05 and a DOF of 60 for the Ω-model. This is a conservative choice.

Variables in Model                                      DOF    SSE
Intercept
Intercept + Soiltype
Intercept + pHdir
Intercept + LChl
Intercept + Soiltype + pHdir
Intercept + Soiltype + LChl
Intercept + Soiltype + LRessat
Intercept + Soiltype + Reddir
Intercept + Soiltype + LResdir
Intercept + pHdir + LResdir
Intercept + Soiltype + pHdir + LChl
Intercept + Soiltype + pHdir + LRessat
Intercept + Soiltype + pHdir + LResdir
Intercept + Soiltype + pHdir + Reddir
Intercept + Soiltype + pHdir + LChl + LRessat

Table 4.10 Possible Models with Corresponding SSE and DOF Values
There are a great many ANOVA tables which can be constructed from
these variables, but only the tables considered relevant are presented here. The model
tested (Ω) and the model against which it was tested (ω) are presented along with the
F-ratio and the result of the significance test, i.e. whether it is significant or not.
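The comparison of a model with its benchmark can be sketched as a short calculation. The function below is an illustrative assumption about how the ANOVA table entries are obtained (mean square of the difference divided by the mean square of the tested model); it is checked against the Soiltype versus intercept comparison reported in this section, where the intercept-only model has SSE = 0.06185 with 70 DOF and the model with Soiltype has SSE = 0.05433 with 68 DOF.

```python
def compare_models(sse_benchmark, dof_benchmark, sse_tested, dof_tested, critical_f):
    """Partial F-test of a tested model against its benchmark (nested) model."""
    diff_dof = dof_benchmark - dof_tested
    diff_sse = sse_benchmark - sse_tested
    ms_difference = diff_sse / diff_dof
    ms_tested = sse_tested / dof_tested
    f_ratio = ms_difference / ms_tested
    # Fraction of the benchmark SSE removed by the added variable(s).
    r2 = diff_sse / sse_benchmark
    return f_ratio, r2, f_ratio > critical_f


# Soiltype added to the intercept-only model; a difference of 2 DOF calls for
# the critical value 3.15 quoted in the text.
f_ratio, r2, significant = compare_models(0.06185, 70, 0.05433, 68, 3.15)
print(f"F = {f_ratio:.2f}, R2 = {r2:.2f}, significant: {significant}")
```

The calculation gives F of about 4.7 and R² of about 0.12, in agreement with the tabulated values, and F exceeds the critical value of 3.15.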
The first step of the analysis is to determine if the variable Soiltype is significant.
This variable is selected first because of the need to determine if the variable LResdir
represents some aspect of Soiltype. The following ANOVA table results:

SOURCE                       DOF    SSE        MS         F       R²
Difference                     2    0.00752    0.00376    4.70    0.12
Intercept + Soiltype (Ω)      68    0.05433    0.00079
Intercept (ω)                 70    0.06185

The results indicate that the variable Soiltype is indeed significant. The next step is to
determine which variables can be added to Soiltype significantly. Each variable is tested
in turn, and the results indicate that only pHdir, LChl and LRessat can be added, with
variable pHdir performing best.
The next step consists of determining which variables can be added to Soiltype
and pHdir significantly. Each of the remaining variables was tested in turn, and the
results indicate that both LChl and LRessat can be added. As suspected, LResdir cannot
be added to pHdir and Soiltype with significance. It would be interesting to see whether
or not LResdir could have been added to pHdir if Soiltype were not already in the model.
The following ANOVA table results:
The results indicate that LResdir could have been added to pHdir if Soiltype were not
already in the model. It appears that these two variables contribute similar information.
Perhaps LResdir measures some property of the soil independent of conductivity related
to the ion content. LResdir was determined by testing a soil which was not saturated with
water, but was only moist. It may be possible that, in this state, resistivity measures the
conductivity of a soil due to particle charge, conductivity of certain types of soil particles,
or even the air content. Perhaps these soil properties are taken into account by the
variable Soiltype and, as such, one variable is unnecessary when the other is present in
the model.
As stated earlier, both LRessat and LChl can be added to pHdir and Soiltype
significantly, with LChl performing better than LRessat. The question that must now be
answered is whether or not LChl is necessary when LRessat is already in the model, and
vice versa. The corresponding ANOVA table, in which the model pHdir+Soiltype+LChl
+LRessat is compared to the model pHdir+Soiltype+LChl, follows:
SOURCE                                                DOF    SSE        MS         F       R²
Difference                                              1    0.00000    0.00000    0.00    0.00
Intercept + Soiltype + pHdir + LChl + LRessat (Ω)            0.04032    0.00062
Intercept + Soiltype + pHdir + LChl (ω)                      0.04032
It is very clear that the variable LRessat need not be added to the model containing
LChl. Is the addition of LChl to a model containing LRessat necessary? The following
ANOVA table illustrates the situation:
The results indicate that LChl need not be added to the model when LRessat is already
included. Although either one of the two variables can be added to the pHdir+Soiltype
model, when one of the two variables is included in the model, the other is not needed.
The better variable is LChl, but it is also a variable which is much more difficult to
obtain. As described in Chapter 2, obtaining the chloride concentration requires more
time and effort. It was one of the objectives of this thesis to determine
whether or not the determination of chloride concentration is essential to estimate the
corrosivity of a soil. If so, a suggestion would be made to incorporate the chloride ion
concentration into the existing grids to estimate corrosivity, i.e. PACE and AWWA.
It was initially suspected that the concentration of conductive chloride ions was
already incorporated in the resistivity measurement, but the effect of chloride ions is so
important that further research is warranted. In fact, it is strongly believed that the effect of
chloride ions is very important and that, even though the present results show that
chloride ion concentration need not be added to the existing grids, the measurement of
chloride concentration provides invaluable information on the potential corrosion
problem.
In conclusion, the model which appears to provide the most information with the
least amount of redundancy is the following: Soiltype + pHdir + LRessat. For the
moment, it is suggested that the variable LChl need not be added to the existing
corrosivity grids. However, it must be remembered that the variable LChl performed
better than the variable LRessat and that the only reason that the suggestion of replacing
LRessat with LChl was not made was that the chloride content is more difficult and
time consuming to obtain. If a soil testing laboratory is equipped to test for the chloride
ion content, then it is recommended that they do so. The information provided by this
parameter can be invaluable in certain cases.
4.2.2 The Effect of Removing Outliers
Outliers can have a very large influence on the results obtained using regression
analysis. For this reason, outliers must be identified and studied carefully. It is incorrect
to eliminate an outlier from a data set simply because its behavior is different from
the rest of the data. In fact, in a data set made up of 150 observations, the presence of
one or two deviant observations is not unusual. A popular way of dealing with outliers is
to perform two separate studies, one including the outliers and the other excluding them.
The results are then compared and the final conclusions are drawn.
in this case, the outliers have been identified as observations # 72 and # 149.
These observations have unusually high corrosion rates which the variables studied were
unable to explain fully. This does not mean that they must be eliminated from the set.
On the contrary, these two soils should be studied further because they may provide
information unlike all the other soils. However, for the sake of completeness, the analysis
was repeated on the data set without the two outliers.
The conclusions drawn from the results obtained are very similar to those
obtained from the complete data set. Certain differences did appear and they warrant
some attention. These differences are:
• The variable Soiltype does not appear to be very significant, and is not included in the
final model.
• In general, the variable LRessat performs more poorly and, as a consequence,
• The variable LChl appears to be even more important than before.
The conclusion that would have been drawn from the above data set would be that
the variable LRessat is not useful in determining corrosivity, and that only LChl and
pHdir can predict the corrosivity of a soil. These results are difficult to accept. How can
the variable LRessat be useless? It is well known that the resistivity of a soil is a key
indicator of its corrosivity. Then why is this parameter badly represented in this data set?
Furthermore, can the two deviant observations be eliminated without further study? It is
felt that, in this preliminary analysis, the outliers should not be ignored. It is also
suggested that these two soils be studied further to determine what variable(s), which
have not been studied up to this point, are responsible for the corrosive nature of the soils.
Finally, it has already been determined that the variable LChl performs better than
LRessat. The goal of this report was not to convince the industry to begin testing for
chloride content. Any legitimate corrosion testing company is already aware of the
importance of chloride ions in the corrosion process, and is most probably already testing
for this parameter. The goal of this report was to determine whether or not the
suggestion should be put forward to add this parameter to the existing grids. The results
obtained up to this point do not suggest this conclusively.
4.3 Power Analysis
The power of a statistical test is the probability of finding a variable significant
when it is in fact so. In cases when a variable is not significant, it is important to
determine the power of the statistical test. A low power may be the reason why a variable
did not prove to be significant, and consequently, the analyst may choose to disregard the
results obtained. In this case, the variable LChl proved not to be significant when the
variable LRessat was already in the model. The power will therefore be checked to
ensure that the probability of finding the variable significant is adequate. A power of .70
is generally considered acceptable.
When the variable LChl was added to the model consisting of pHdir, LRessat and
the two soil type variables, the values of the power parameters were as follows:
K1=4
ks= 1
K = 5
The value of N, the number of observations, is taken conservatively as 70 and the value
of L is determined to be 9.6 (see Equation D.13). The power is obtained by interpolating
between values obtained from Table D.8. The power of this statistical test is 0.87, which
means that there is an 87% chance of finding LChl significant, if it is so. This result
indicates that the power of the statistical tests is not responsible for finding LChl
insignificant when LRessat is already in the model.
CHAPTER 5:
CONCLUSIONS AND RECOMMENDATIONS
In total, 153 soils were tested for the following: pH, oxidation-reduction
potential, sulfide content, resistivity, soil type, drainage ability, moisture content, and
chloride ion content. Of these, 75 soils were tested using the method of linear
polarization, an accelerated electrochemical test used to evaluate the corrosion rate of
ductile iron embedded in soil. This testing method proved to be a powerful tool in the
evaluation of soil corrosivity, and the applications in this field appear endless (see Section
5.2 for more details on possible future work).
However, certain limitations of linear polarization testing must be remembered.
The corrosion rate obtained using this method is the corrosion rate of the soil as it is
found during testing. Any future changes to the soil, or the presence of any external
influences affecting the corrosion rate, cannot be accounted for by the method of linear
polarization. For example, the following possibilities cannot be accounted for:
• The presence of stray current corrosion,
• Galvanic attack of a ductile iron pipe when connected to copper service laterals,
• The future migration of chloride ions from the surface to the depth of the embedded
metal, and
• The potential establishment and proliferation of sulfate-reducing bacteria.
In sum, only the corrosivity of the soil itself is measured, and as such, this parameter must
be considered one part of a complete study in the determination of the corrosion potential.
5.1 Summary of Results
Each soil was tested according to the AWWA C105 and PACE 82-3 Standards,
and additionally, the chloride ion content and the corrosion rate were obtained using the
linear polarization test. This data was analyzed using the Statistical Analysis System
(SAS) and the following results were obtained:
• The variable which plays the most important role in the prediction of the corrosion
rate is the pH of the soil. Furthermore, of the two pH testing procedures, saturated vs.
unaltered, the method in which the soil is tested in its unaltered state proved to be the
better predictor of the corrosion rate.
• The chloride ion content proved to be an excellent predictor of the corrosion rate.
Second only to pH, this variable was highly correlated with the corrosion rate, and
appeared in all the best predictor models. The information provided by the chloride
ion content and the resistivity of the soil when saturated overlapped, and as such,
when one variable is included in a predictor model, the other variable is insignificant.
Although chloride ion content outperformed soil resistivity, the additional information
provided by this variable was not significant enough to suggest that it be added to a
model containing soil resistivity. Furthermore, the power of the significance test was
examined and was shown to be acceptable.
• Soil resistivity was retained instead of the chloride ion content, because this variable
is currently being used in the industry standards, and it can be determined rapidly and
easily. Conversely, chloride ion testing is time consuming and requires an
experienced technician. Furthermore, soil resistivity can be measured in-situ, whereas
chloride ion content can only be measured under laboratory conditions. For these
reasons of practicality, it is suggested that chloride ion content not replace resistivity
in the existing standards.
• The important role of chloride ion content in the prediction of the corrosion rate has
been established. It is therefore strongly recommended that this variable be tested
whenever possible. Although not included in the industry standards, the information
provided by this variable will provide a better understanding of the soil and its
corrosive properties.
• Soil resistivity proved to be a good predictor of the corrosion rate, although not as
good as expected. Of the two testing procedures, unaltered vs. saturated, the
resistivity measured when the soil was saturated with distilled water represented the
better of the two in predicting the corrosion rate.
• The variable representing the resistivity of the unaltered soil, i.e. measured as
received in the laboratory, proved to be an insignificant predictor of the corrosion rate
when the soil type was included in the predictor model. It appears that when the
resistivity of a moist soil is measured, the result may indirectly represent some
inherent resistivity which depends on the soil type, e.g. the inherent conductivity of
clay particles vs. that of sand particles.
• The variable which proved to be the least important in the prediction of the corrosion
rate is the oxidation-reduction potential of the soil. This is a very surprising result,
given the importance of this parameter in the corrosion process. It is strongly
suspected that the method by which the soil is handled, the time the soil is exposed to
air before testing, and the addition of distilled water, all served to alter the potential of
the soil, and as a consequence, only a small correlation between the oxidation-
reduction potential and the corrosion rate was observed.
• All statistical analysis steps were performed on two data sets: one containing all
observations, and another from which outliers were excluded. Although the
numerical results varied, the conclusions drawn were essentially the same.
5.2 Recommendations for Future Work
• First and foremost, a new set of soil samples should be created in the laboratory. Soil
samples should have a predetermined pH, chloride content, sulfide content, oxidation-
reduction potential, resistivity, and clay content. This will enable the researcher to
study the effect of varying one parameter at a time, which is not possible in a set of
randomly selected soil samples.
• The accuracy of the method of linear polarization can be studied further. Soil
samples obtained from sites of pipe failures can be tested using this method, and the
results can be compared to the actual record of "years to break" or "breaks per year".
This requires a well organized, long term study in which a minimum of 100 soil
samples must be analyzed and the background information related to the corroded
pipe must be gathered. Furthermore, a thorough knowledge of the other influencing
phenomena will enable the researcher to identify cases of stray current corrosion,
galvanic attack, and sulfide attack, which may cause the pipe to fail prematurely, and
which are not measurable using the method of linear polarization.
• The tests used to determine the sulfide content should be investigated further.
Inconsistencies in the results obtained from the two tests suggest that the tests may be
improved for future use of the AWWA and PACE standards. It is also strongly
suggested that the sulfide ion content be determined more accurately in future
laboratory experiments, and that the results obtained using the tests described in this
report may not be sufficient to represent the true effect of sulfide ions in the corrosion
process.
• Study the various methods currently being used to determine the chloride ion content
(e.g. reading potentials, and titration). Determine which method is most accurate,
which is least subjective, and which is the least subject to human error. This project
is a very important one because it will create a proper base for further research in the
area of chloride content. Furthermore, in the testing procedure outlined in this report,
only the finest particles of soil were retained for the chloride ion test. The effect of
retaining only the finest particles, as opposed to using a more representative
specimen, should be studied further.
• Based on the method of linear polarization, develop an in-situ test for corrosion rate.
• Repeat the study undertaken in this report with the following changes:
  - Sulfide test is replaced by the actual sulfide ion concentration,
  - Soils are tested for pH, oxidation-reduction potential and resistivity
    immediately after, or before, linear polarization testing, in order to ensure that
    all tests are performed under the same conditions,
  - Use the chloride ion test which yields the most reproducible results, and which
    is subject to the least human error,
  - Upgrade the soil type classification to include more types of soils, e.g. organic,
    and silt, and
  - Include the various types of metals in the study.
• Consider incorporating temperature and moisture content in the existing grids to
account for seasonal variations in these factors. Using the method of linear
polarization, laboratory testing can be done to determine the effect of temperature and
moisture content on the corrosion rate, and the knowledge of in-situ conditions will
enable the engineer to determine the potential risk for corrosion with more accuracy.
• Using the method of linear polarization, study the variation of the corrosion rate with
time. Time-dependent phenomena such as the development and subsequent
proliferation of a sulfate-reducing bacteria colony, can be studied by creating the
proper environment, and testing for the corrosion rate at given intervals.
BIBLIOGRAPHY
[1] Parker, M.E., "Corrosion by Soils", NACE Basic Corrosion Course, National
Association of Corrosion Engineers, Houston, Texas, 1969, p. 6-1.
[2] Uhlig, H.H., and Revie, R.W., Corrosion and Corrosion Control, John Wiley &
Sons, New York, 1985.
[3] Funahashi, M., and Young, W.T., "Investigation of E-LOG I Tests and Cathodically
Polarized Steel in Concrete", Proceedings of NACE Conference CORROSION-94,
paper no. 301, National Association of Corrosion Engineers, Houston, Texas, 1994,
p. 301/1.
[4] Sehgal, A.D., Kho, Y.T., Osseo-Asare, K., and Pickering, H.W., "Reproducibility
of Polarization Resistance Measurements in Steel-in-Concrete Systems",
Corrosion, Vol. 48, No. 9, September 1992, p. 706.
[5] Feliu, S., Gonzalez, J.A., Andrade, C., and Feliu, V., "Polarization Resistance
Measurements in Large Concrete Specimens: Mathematical Solution for a
Unidirectional Current Distribution", Materials and Structures, Vol. 22, 1989, p.
199.
[6] Macdonald, D.D., Urquidi-Macdonald, M., Rocha-Filho, R.C., and El-Tantawy, Y.,
"Determination of the Polarization Resistance of Rebar in Reinforced Concrete",
Corrosion, Vol. 47, No. 5, May 1991, p. 330.
[7] Lavrenko, V.A., and Shvets, V.A., "Determination of the Corrosion Activity of Soil
in Relation to Steel by the Polarization Resistance Method", Institute of Problems
of Material Science, Academy of Sciences of the Ukraine, Kiev, Translated from
Fiziko-Khimicheskaya Mekhanika Materialov, No. 3, May-June 1992, p. 108.
[8] Rogers, W.F., "Statistical Predictions of Corrosion Failures", Proceedings of
NACE Conference CORROSION-89 (New Orleans), paper no. 596, National
Association of Corrosion Engineers, Houston, Texas, 1989, p. 596/1.
[9] Fontana, M.G., and Greene, N.D., Corrosion Engineering, McGraw-Hill, New
York, 1967.
[10] Wakelin, R.G., and Gummow, R.A., "The Effect of Copper on the Corrosion of
Iron Watermains", Proceedings of NACE Conference CORROSION-90 (Las
Vegas), paper no. 383, National Association of Corrosion Engineers, Houston,
Texas, 1990, p. 383/1.
[11] ASTM, Standard Guide for Examination and Evaluation of Pitting Corrosion,
ASTM Specification G 46-94.
[12] Sears, E.C., "Comparison of the Soil Corrosion Resistance of Ductile Iron Pipe and
Gray Cast Iron Pipe", Materials Protection, Vol. 7, No. 10, October 1968, p. 33.
[13] De Rosa, P.J., and Parkinson, R.W., "Corrosion of Ductile Iron Pipe", Water
Research Center External Report TR 241, United Kingdom, October 1986.
[14] Segal, B.G., Chemistry: Experiment and Theory, John Wiley & Sons, New York,
1985.
[15] Zumdahl, S.S., Chemistry, D.C. Heath and Company, Massachusetts, 1989.
[16] Ailor, W.H., Handbook on Corrosion Testing and Evaluation, John Wiley & Sons,
New York, 1971.
[17] Oldham, K.B., and Mansfeld, F., "On the So-Called Linear Polarization Method for
Measurement of Corrosion Rates", Corrosion, Vol. 27, No. 10, October 1971, p.
434.
[18] Mansfeld, F., and Oldham, K.B., "A Modification of the Stern-Geary Linear
Polarization Equation", Corrosion Science, Vol. 11, 1971, p. 787.
[19] Stern, M., "A Method for Determining Corrosion Rates From Linear Polarization
Data", Corrosion, Vol. 14, 1958, p. 440.
[20] Townley, D.W., "Determination of Maximum Scan Rate for Linear Polarization
Measurements", Corrosion, Vol. 47, No. 10, October 1991, p. 737.
[21] Oldham, K.B., and Mansfeld, F., "Corrosion Rates from Polarization Curves: A
New Method", Corrosion Science, Vol. 13, 1973, p. 813.
[22] ASTM, Standard Practice for Calculation of Corrosion Rates and Related
Information from Electrochemical Measurements, ASTM Specification G 102-89.
[23] ASTM, Standard Reference Test Method for Making Potentiostatic and
Potentiodynamic Anodic Polarization Measurements, ASTM Specification G 5-87.
[24] Fitzgerald III, J.H., "Evaluating Soil Corrosivity - Then and Now", Proceedings of
NACE Conference CORROSION-93 (New Orleans), paper no. 4, National
Association of Corrosion Engineers, Houston, Texas, 1993, p. 4/1.
[25] Stroud, T.F., "Corrosion Control Measures for Ductile Iron Pipe", Proceedings of
NACE Conference CORROSION-93 (Las Vegas), paper no. 585, National
Association of Corrosion Engineers, Houston, Texas, 1993, p. 585/1.
[26] Stevens, J., Applied Multivariate Statistics for the Social Sciences, Lawrence
Erlbaum Associates, Mahwah, New Jersey, 1996.
[27] Draper, N.R., and Smith, H., Applied Regression Analysis, John Wiley & Sons,
New York, 1966.
[28] Montgomery, D.C., and Peck, E.A., Introduction to Linear Regression Analysis,
John Wiley & Sons, New York, 1982.
[29] SAS Institute, SAS/STAT User's Guide, Volume 1, Version 6, SAS Institute Inc.,
North Carolina, 1990.
[30] SAS Institute, SAS/STAT User's Guide, Volume 2, Version 6, SAS Institute Inc.,
North Carolina, 1990.
[31] Cohen, J., "A Power Primer", Psychological Bulletin, Vol. 112, No. 1, 1992, p.
155.
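APPENDIX A:
DERIVATION OF POTENTIAL EQUATIONS
A.1 Equation for $\phi_{N,\mathrm{Zn}}$
The reduction of Zn is represented by the following equation:
$$\mathrm{Zn^{2+}} + 2e^- \rightarrow \mathrm{Zn(s)}$$
The Nernst potential is obtained by substituting the appropriate values into the
following equation:
$$\phi_N = \phi^{\circ} + \frac{2.303\,RT}{nF}\,\log\frac{[a_{ox}]}{[a_{red}]} \qquad \text{(A.2)}$$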
For the reduction of Zn, the following values are substituted into Equation A.2:
• R = 8.314 J/(deg·mole),
• T = 298 Kelvin,
• n = 2 electrons transferred,
• F = 96500 C/eq,
• $[a_{ox}] = [\mathrm{Zn^{2+}}]$, and
• $[a_{red}] = [\mathrm{Zn(s)}] = 1$, because the concentration of a solid is equal to 1.
Once the above values are substituted, Equation A.2 becomes:
$$\phi_{N,\mathrm{Zn}} = \phi^{\circ}_{\mathrm{Zn}} + \frac{0.0592}{2}\,\log[\mathrm{Zn^{2+}}]$$
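The substitution above can be checked numerically. The following Python sketch evaluates the coefficient 2.303RT/F from the constants listed and applies the Nernst expression to zinc. The standard potential of -0.763 V and the zinc ion activity of 0.01 are illustrative assumptions that do not come from this report, and the function name `nernst_potential` is likewise hypothetical.

```python
import math

R = 8.314    # gas constant, J/(deg·mole), as listed above
T = 298.0    # temperature, Kelvin
F = 96500.0  # Faraday constant, C/eq


def nernst_potential(phi_standard, n, a_ox, a_red=1.0):
    """Evaluate phi = phi0 + (2.303 R T / n F) * log10(a_ox / a_red)."""
    slope = 2.303 * R * T / (n * F)
    return phi_standard + slope * math.log10(a_ox / a_red)


# Coefficient that appears as 0.0592 in the simplified equation.
coefficient = 2.303 * R * T / F

# Assumed values for illustration only: phi0 = -0.763 V, [Zn2+] = 0.01.
phi_zn = nernst_potential(phi_standard=-0.763, n=2, a_ox=0.01)

print(f"2.303RT/F = {coefficient:.4f} V")
print(f"phi_Zn    = {phi_zn:.4f} V")
```

With the rounded constants given in the list, the coefficient evaluates to roughly 0.059 V, which agrees with the value of 0.0592 used in the simplified equation to within rounding.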
A.2 Equation for $\phi_{N,\mathrm{Cu}}$
The reduction of Cu is represented by the following equation:
$$\mathrm{Cu^{2+}} + 2e^- \rightarrow \mathrm{Cu(s)}$$
The Nernst potential is obtained by substituting the appropriate values into the
following equation:
$$\phi_N = \phi^{\circ} + \frac{2.303\,RT}{nF}\,\log\frac{[a_{ox}]}{[a_{red}]} \qquad \text{(A.2)}$$
For the reduction of Cu, the following values are substituted into Equation A.2:
• R = 8.314 J/(deg·mole),
• T = 298 Kelvin,
• n = 2 electrons transferred,
• F = 96500 C/eq,
• $[a_{ox}] = [\mathrm{Cu^{2+}}]$, and
• $[a_{red}] = [\mathrm{Cu(s)}] = 1$, because the concentration of a solid is equal to 1.
Once the above values are substituted, Equation A.2 becomes:
$$\phi_{N,\mathrm{Cu}} = \phi^{\circ}_{\mathrm{Cu}} + \frac{0.0592}{2}\,\log[\mathrm{Cu^{2+}}]$$
A.3 Equation for $\phi_{N,\mathrm{H^+}}$
The reduction of H⁺ is represented by the following equation:
$$2\,\mathrm{H^+} + 2e^- \rightarrow \mathrm{H_2(g)}$$
The Nernst potential is obtained by substituting the appropriate values into the
following equation:
$$\phi_N = \phi^{\circ} + \frac{2.303\,RT}{nF}\,\log\frac{[a_{ox}]}{[a_{red}]} \qquad \text{(A.2)}$$
For the reduction of H⁺, the following values are substituted into Equation A.2:
• n = 2 electrons transferred for each H₂ released,
• $[a_{ox}] = [\mathrm{H^+}]^2$, and
• $[a_{red}]$ = partial pressure of H₂(g) = 1 atm, because the reduced species is a gas under
normal pressure.
Once the above values are substituted, Equation A.2 becomes:
$$\phi_{N,\mathrm{H^+}} = \phi^{\circ}_{\mathrm{H^+}} + \frac{2.303\,RT}{2F}\,\log\frac{[\mathrm{H^+}]^2}{1}$$
Furthermore, given the following two facts:
• reduction of H⁺ is taken as the baseline potential, i.e. $\phi^{\circ}_{\mathrm{H^+}} = 0$, and
• pH = - log [H⁺],
the final equation becomes:
$$\phi_{N,\mathrm{H^+}} = -0.0592\,\mathrm{pH}$$
APPENDIX B:
TESTING FOR CHLORIDE ION CONCENTRATION
B.1 Creating a Concentration vs. Potential Curve
The first step in creating a concentration vs. potential curve is to record the
potentials of five calibrating solutions of known concentration: 0.01%, 0.03%, 0.33%,
0.65%, and 1.3%. This is done a minimum of two times, and the average potential for
each calibrating solution is calculated. The standard deviations of the concentrations
obtained for each calibrating solution are then compared to the maximum values
permitted: 1.5 for 0.01%, and 1.0 for the remaining solutions. An example of the
potentials obtained during a calibration exercise is presented in Figure B.1.
For each calibration solution, the average potential is plotted versus the known
chloride ion concentration, and a curve, such as the one presented in Figure B.2, is
obtained. The curve which fits the five data points best is an exponential one. The
equation of this curve is presented in the upper right hand corner of Figure B.2.
The chloride ion concentrations of the soil samples tested are obtained from the
curve or from the equation in Figure B.2. The measured potential of the soil sample is
located on the curve and the corresponding concentration is obtained either from the
equation, or from the curve itself. For example, for a potential of -40 mV, the
following concentrations are obtained:
• From the curve, the concentration is approximately equal to 0.82 %
• From Equation B.2, the concentration is equal to 0.1622 e = 0.816 %
The choice of which method to use depends on the degree of precision required.
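As a rough illustration of how such an exponential calibration curve can be obtained, the Python sketch below fits C = a·exp(b·V) to the five calibration points listed under "values to be plotted" in Figure B.1. The fitting method, ordinary least squares on ln(C) versus V, is an assumption; the report does not state how the curve in Figure B.2 was fitted. The function names are hypothetical.

```python
import math

# Calibration points from Figure B.1 (Series 1).
potentials_mv = [-51.3, -34.8, -16.9, 40.2, 70.1]
concentrations_pct = [1.3, 0.65, 0.33, 0.03, 0.01]


def fit_exponential(potentials, concentrations):
    """Fit C = a * exp(b * V) by least squares on ln(C) versus V (assumed method)."""
    logs = [math.log(c) for c in concentrations]
    n = len(potentials)
    mean_v = sum(potentials) / n
    mean_log = sum(logs) / n
    sxx = sum((v - mean_v) ** 2 for v in potentials)
    sxy = sum((v - mean_v) * (l - mean_log) for v, l in zip(potentials, logs))
    b = sxy / sxx
    a = math.exp(mean_log - b * mean_v)
    return a, b


a, b = fit_exponential(potentials_mv, concentrations_pct)


def concentration_at(potential_mv):
    """Chloride concentration (%) predicted by the fitted curve."""
    return a * math.exp(b * potential_mv)


estimate = concentration_at(-40.0)
print(f"C = {a:.4f} * exp({b:.5f} * V)")
print(f"C(-40 mV) = {estimate:.3f} %")
```

Under this assumption the fitted pre-factor comes out close to the 0.1622 quoted above, and the concentration predicted at -40 mV is close to 0.82 %, which suggests that a fit of this kind is consistent with the curve in Figure B.2.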
ELECTRODE: JACQUES-CARTIER
DATE: MAY 26, 1995
SAMPLES: JK-01 TO JK-82
[Potential readings for Trial 1 to Trial 10: values not legible]
VALUES TO BE PLOTTED:
Potential (mV)    Chloride Concentration (%)
-51.3             1.3
-34.8             0.65
-16.9             0.33
40.2              0.03
70.1              0.01
Figure B.1 Potentials Obtained for Calibrating Solutions: Series 1
[Plot: SERIES 1: Chloride Concentration vs Potential; axis: Chloride Concentration (%)]
Figure B.2 Calibration Curve for Series 1
APPENDIX C:
TRIALS FOR REPRODUCIBILITY
Trial runs were performed on the various soil samples in order to establish a complete
procedure for testing subsequent soil samples. Soil sample No. 123 is presented for
analysis. The corrosion rate was obtained twice, through two independent sets of tests.
Each test is composed of two parts: the Tafel test from which the values of $\beta_a$ and $\beta_c$ are
obtained, and the Linear Polarization test from which the corrosion rate is obtained.
The results indicate that the corrosion rate of a metal sample placed in soil No.
123 is equal to 0.119 mm/yr. This result was obtained from both Trial No. 1 and Trial
No. 2. The results of each test are presented in Figures B.3 through B.6. As can be
seen, the results obtained from the two trial runs are very similar, indicating that the
procedure followed yields reproducible results.
[Plot: Tafel Curve]
Figure B.3 Trial No. 1: Tafel Results
Figure B.4 Trial No. 1: Linear Polarization Results
Figure B.5 Trial No. 2: Tafel Results
Figure B.6 Trial No. 2: Linear Polarization Results
APPENDIX D: PRINCIPLES OF
REGRESSION ANALYSIS
This chapter presents the techniques used in analyzing the data presented in
Chapter 3: Procedures and Apparatus, and includes the following:
• Data exploration
• Simple linear regression analysis
• Data transformation
• Multiple variable regression
• Categorical data
• Outliers
• Variable Selection
• Model Validation
• Power
• The SAS Statistical Package
D.1 Data Exploration
Prior to beginning statistical analysis of the data using sophisticated computer
packages and advanced statistical techniques, it is very important to be familiar with the
data set. Each variable should be examined individually, and the following quantities
should be obtained for each [26,29]:
• the usual descriptive statistics: number of data points, mean, standard deviation,
variance, skewness, etc.,
• quantiles including the median,
• a stem-and-leaf diagram and box-and-whisker plot, and
• the normal probability plot.
Quantities such as the number of data points, the mean, and the standard deviation
of a variable are easily calculated; they provide considerable information about the data.
They are also the values that will be needed in subsequent calculations, e.g. the number of
data points, N, is a quantity that plays an important part in almost every statistical
calculation: it is used to determine significance in the ANOVA table, it is necessary to
calculate the power of a statistical test, and it plays a key role in selecting the number of
variables that will make up an equation.
The quantiles obtained (median, upper and lower hinges, etc.) provide
considerable information about the range of values of a given variable. Are the values all
within a limited range, or are they spread out? Are there any values that are remarkably
different from the general trend? Quantiles are also used to calculate the range outside
which a value is considered an outlier. The identification of outliers is extremely
important in statistical analysis.
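The use of quantiles to bound the range of "ordinary" values can be sketched as follows. The rule used here, flagging values more than 1.5 times the interquartile spread beyond the quartiles, is a common convention and is an assumption; the text above does not state which multiplier was used. The quartiles returned by Python's `statistics.quantiles` are also only an approximation of the hinges, and the data are hypothetical.

```python
from statistics import median, quantiles


def outlier_fences(values, k=1.5):
    """Return (low, high) limits outside which a value is flagged (assumed rule)."""
    q1, _, q3 = quantiles(values, n=4)  # lower quartile, median, upper quartile
    spread = q3 - q1
    return q1 - k * spread, q3 + k * spread


# Hypothetical observations with one value far from the rest.
observations = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]

low, high = outlier_fences(observations)
flagged = [v for v in observations if v < low or v > high]

print("median :", median(observations))
print("fences :", low, high)
print("flagged:", flagged)
```

A value flagged in this way is a candidate for closer study, not for automatic removal.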
Stem-and-leaf diagrams, such as the one presented in Figure D.1, are
constructed with the values of a given variable, and can help summarize the distribution
of the data in a visual way, which is usually easier to understand [26,29]. Furthermore,
they make the calculation of the quantiles, quantile ranges and outliers quick and easy.
Stem  Leaf       #
  16  6          1
  14
  12  61         2
  10  36         2
   8  679        3
   6  01148      5
   4  0          1
   2  0033683    7
   0  2345       4
  -0  75         2
  -2  8497       4
  -4  87         2
  -6  900866     6
  -8  30         2
 -10  0          1
 -12  80         2
 -14  5          1
 -16  5          1
      ----+----+----+----+
Multiply Stem.Leaf by 10**-1
Figure D.1 Stem and Leaf Diagram
It is almost always a good idea to display numerical information graphically
where it is possible. The stem-and-leaf diagram is an excellent tool for the graphical
display of the distribution of the data, but it contains more information than is often
needed. The box and whisker plot aims to display the elementary information (median,
hinges and outliers) graphically [26,29]. A box and whisker plot is illustrated in Figure
D.2.
Stem  Leaf       #
  16  6          1
  14
  12  61         2
  10  36         2
   8  679        3
   6  01148      5
   4  0          1
   2  0023683    7
   0  2345       4
  -0  75         2
  -2  8497       4
  -4  87         2
  -6  900866     6
  -8  30         2
 -10  0          1
 -12  80         2
 -14  5          1
 -16  5          1
      ----+----+----+----+
Multiply Stem.Leaf by 10**-1
[Boxplot column not legible]
Figure D.2 Stem and Leaf Diagram and Boxplot
The normal probability plot is a graphical display that permits the analyst to
determine if the data values are distributed normally. The observations are arranged in an
increasing order of magnitude and then plotted against expected normal distribution
values. The plot should resemble a straight line if normality is tenable [26,29]. Figure D.3
shows a normal probability plot.
[Plot: Normal Probability Plot]
Figure D.3 Normal Probability Plot
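The construction of a normal probability plot can be sketched without any plotting library. In the Python fragment below, the observations are sorted and paired with expected standard normal values; the plotting positions (i - 0.375)/(N + 0.25) are one common choice and are an assumption here, as is the use of a correlation coefficient to summarize how straight the plot is. The sample values are hypothetical.

```python
from statistics import NormalDist, mean


def normal_probability_points(values):
    """Pair each sorted observation with its expected standard normal value."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()
    # Plotting positions (i - 0.375) / (n + 0.25): an assumed convention.
    expected = [std_normal.inv_cdf((i - 0.375) / (n + 0.25))
                for i in range(1, n + 1)]
    return list(zip(expected, ordered))


def straightness(points):
    """Correlation between expected and observed values (1.0 = straight line)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical sample.
sample = [-1.2, -0.4, 0.1, 0.3, 0.8, 1.5, -0.9, 0.0, 0.6, -0.2]
points = normal_probability_points(sample)
r = straightness(points)
print(f"correlation of the probability plot: {r:.3f}")
```

Plotting the pairs in `points` gives the display described above; a correlation close to 1 corresponds to the nearly straight line expected when normality is tenable.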
D.2 Simple Linear Regression Analysis
In simple terms, linear regression analysis is the action of fitting a straight line to
the data. Simple regression analysis involves a dependent variable, Y, and an
independent variable, X. For each value $x_i$ observed, there is a corresponding value $y_i$.
The goal is to derive an equation that will link the values of $x_i$ and $y_i$ with the least
amount of error possible. This is better understood graphically. Figure D.4 shows a plot
of Y vs. X. It is our goal to find the line which runs through these points such that the
vertical distances between the line and the y values are minimized, i.e. the values of $e_i$
are minimized. The equation relating the values of each observation of y and x is of the
following form:
$$y_i = (\beta_0 + \beta_1 x_i) + e_i \qquad \text{(D.1)}$$
where $(\beta_0 + \beta_1 x_i)$ is the portion of $y_i$ predicted by the straight line, and $e_i$ is the portion of
$y_i$ that the straight line fails to predict, which is the error or the residual. The term $\beta_0$
represents the intercept, i.e. the value of y at the point where the straight line meets the
Y-axis. The term $\beta_1$ represents the slope of the straight line.
Figure D.4 Y vs. X Plot
How can one determine if the line represents a good estimate of the relationship
between X and Y? The answer lies in the study of the residuals, $e_i$. The method most
commonly used to calculate the magnitude of the error of the equation is to add the
squared values of each of the individual errors. This value is referred to as the Error
Sum of Squares, or SSE:
$$SSE = \sum e_i^2 \qquad \text{(D.2)}$$
It should be noted that when the errors are squared, the effect of the larger values is
emphasized. The result of this is that outliers, whose residuals are high, can have an
enormous effect on the value of SSE and, consequently, on the best fit line. It is,
therefore, important that outliers not be ignored, but studied closely. Outliers will be
discussed later.
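For the case when the line is to be chosen such that the SSE is minimized, there
exists a closed form solution for the parameters $\beta_0$ and $\beta_1$, given by [26,29]:
$$\beta_1 = \frac{\sum (xy) - \left(\sum x\right)\left(\sum y\right)/N}{\sum x^2 - \left(\sum x\right)^2/N}$$
$$\beta_0 = \bar{y} - \beta_1\,\bar{x}$$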
where $\bar{x}$ = the mean of the x values,
$\bar{y}$ = the mean of the y values,
N = the number of observations of x and y,
$\sum (xy)$ = the sum of the product of x and y for the N observations,
$\sum x$ = the sum of the x values for the N observations,
$\sum y$ = the sum of the y values for the N observations, and
$\sum x^2$ = the sum of the $x^2$ for the N observations.
Besides SSE, other quantities are computed to determine the measure of fit of a
line. The variance of the estimate, $S_{y|x}^2$, represents the average of the squared errors,
$\sum e_i^2/(N-2)$, and the standard error of the estimate, $S_{y|x}$, is simply the square root of the
variance [26,29].
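The quantities defined above can be combined into a short calculation. The Python sketch below uses the standard closed-form least-squares expressions written in terms of the sums N, Σx, Σy, Σ(xy) and Σx², which are assumed here to be the ones intended by the text, and then computes SSE, the variance of the estimate and its square root. The data are hypothetical and the function name is illustrative.

```python
import math


def simple_regression(xs, ys):
    """Closed-form least-squares fit of y = b0 + b1*x, plus SSE and S_y|x."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    # Slope and intercept from the sums (standard least-squares result).
    b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    b0 = sum_y / n - b1 * sum_x / n

    # Residuals e_i, error sum of squares, and variance of the estimate.
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    sse = sum(e * e for e in residuals)
    variance = sse / (n - 2)
    return b0, b1, sse, math.sqrt(variance)


# Hypothetical observations.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1, sse, s_yx = simple_regression(x, y)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, SSE = {sse:.4f}, S_y|x = {s_yx:.4f}")
```

The divisor N - 2 reflects the two parameters, intercept and slope, that are estimated from the data.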
Up to this point, it has been assumed that the variable X has an influence on the
value of Y, and an equation has been chosen which includes the variable X. It is not
always the case that a variable X provides any information about the behavior of the
variable Y.
Figure D.5 illustrates such a case. The slope of the line, $\beta_1$, is very small or even
insignificant. In fact, the straight line which best represents the points seems to be the
horizontal line which runs through the mean value of y, $\bar{y}$. It will not always be so
obvious that a variable X is insignificant in predicting Y. So, how can one determine if the
variable X is significant? This is done by comparing the SSE of the model including X,
to the SSE of the model based only on the mean value of y. This last model is referred
to as the benchmark model.
Figure D.5 Example of an Insignificant Predictor
The equation relating X and Y using the benchmark model based only on the
mean value of y, i.e. the horizontal line, is the following:
$$y_i = \beta_0 + e_i$$
where $\beta_0$ is equal to $\bar{y}$. The value of the error sum of squares is referred to, in the case of
this model only, as SSY.
A measure of fit that is very commonly used is the squared multiple correlation,
$R^2$. This value represents how well a model predicts Y in comparison to the benchmark
model. $R^2$ is calculated as follows [26,29]:
$$R^2 = \frac{SSY - SSE}{SSY} \qquad \text{(D.6)}$$
The value of $R^2$ is always positive, and ranges between 0 and 1. For example,
$R^2$ = 0.10 means that taking X into consideration in predicting Y will result in a 10 %
decrease in the error sum of squares, i.e. a 10 % improvement in the prediction of Y.
However, considering that this improvement is compared to a model which is based on
nothing but the mean, it does not appear to be such an important improvement. Is the
improvement important enough to consider the variable X significant? This significance
is determined by examining another important parameter in statistics: the F-ratio.
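The comparison against the benchmark model can be made concrete with a short calculation. In the Python sketch below, SSY is the error sum of squares of the mean-only model and SSE is that of the model under test; the observed and predicted values are hypothetical and are chosen only to keep the arithmetic easy to follow.

```python
def r_squared(observed, predicted):
    """R^2 = (SSY - SSE) / SSY, with SSY taken from the mean-only benchmark."""
    mean_y = sum(observed) / len(observed)
    ssy = sum((y - mean_y) ** 2 for y in observed)              # benchmark model
    sse = sum((y - p) ** 2 for y, p in zip(observed, predicted))  # tested model
    return (ssy - sse) / ssy


# Hypothetical observations and model predictions.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 4.0, 6.0, 7.0]

r2 = r_squared(observed, predicted)

# Predicting the mean everywhere reproduces the benchmark, so R^2 is zero.
r2_benchmark = r_squared(observed, [5.0, 5.0, 5.0, 5.0])

print(f"R^2 of the tested model: {r2:.2f}")
print(f"R^2 of the benchmark   : {r2_benchmark:.2f}")
```

Here SSY = 20 and SSE = 2, so the tested model removes 90 % of the benchmark error sum of squares.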
Prior to introducing the F-ratio, it should be mentioned that the benchmark model
used in this computation does not have to be based only on $\bar{y}$. A model being tested can
be compared to any model that is a submodel of itself. This simply means that the model
being tested is an extension of the benchmark model. For example, if it is required to
prove whether or not a variable X3 can be added significantly to a model which includes
the variables X1 and X2, then the benchmark model is the one that contains X1 and X2,
and the model to be tested is the one containing all three. From this point onward, this
benchmark model will be termed the ω-model, and the model being tested the Ω-
model.
Like $R^2$, the F-ratio is a measure of the improvement of one model over another,
however the F-ratio takes into account the number of variables that were added to obtain
this improvement. For example, it is certainly better if an investment of $100 yielded a
return of $10 000, rather than if this return were obtained from an investment of $1000.
The investment made can be compared to the variables added to obtain a better
prediction. It is preferable that fewer variables be added and, as such, the significance of
a model must be determined by taking into account the number of variables added. This
is done by including the degrees of freedom in the equation. The degrees of freedom of a
model are equal to the number of observations, N, minus the number of parameters being
fitted by the model. The value of the F-ratio is calculated as follows [26,29]:
$$F = \frac{(SSE_{\omega} - SSE_{\Omega})/(df_{\omega} - df_{\Omega})}{SSE_{\Omega}/df_{\Omega}}$$
The ideal situation is a large drop in the SSE accompanied by a small drop in
the degrees of freedom. This will result in a large F-ratio. F = 1 is considered the baseline
performance. If the ratio is close to 1, this signifies that almost no
improvement was made, i.e. there has been no return on the investment made. Table D.1
shows the critical values of F that must be exceeded in order to consider the Ω-model
significant. The term df for Numerator refers to the drop in the degrees of freedom in
going from the ω-model to the Ω-model. The term df error refers to the degrees of
freedom of the Ω-model. If the F-ratio calculated is larger than the appropriate critical
value, the Ω-model is considered significant [26,29].
A tool that allows the analyst to quickly calculate the values of R², F,
and $S_{y|x}$, and check all the relevant information at a single glance is the analysis of
variance table, or ANOVA table.
Figure D.6 shows a typical ANOVA table. The degrees of freedom and SSE of
the ω-model and the Ω-model are entered into the table. The degrees of freedom and
SSE of the Diff-model, i.e. difference model, are obtained by subtracting the entries of the
Ω-model from those of the ω-model. The mean squares (MS) for each model are
obtained by dividing the SSE by the degrees of freedom, and the value of $S_{y|x}$ can be
obtained by taking the square root of the mean squares. The R² value is obtained by
dividing the SSE of the Diff-model by the SSE of the ω-model, and the F-ratio is obtained
by dividing the MS of the Diff-model by the MS of the Ω-model. The ANOVA table
permits rapid calculation of the relevant parameters, and presents the information in an
organized format.
(1) DOF(Diff) = DOF(ω) - DOF(Ω)
(2) SSE(Diff) = SSE(ω) - SSE(Ω)
(3) MS(Diff) = SSE(Diff) / DOF(Diff)
(4) MS(Ω) = SSE(Ω) / DOF(Ω)
(5) MS(ω) = SSE(ω) / DOF(ω)
(6) F = MS(Diff) / MS(Ω)
(7) R² = SSE(Diff) / SSE(ω)
Figure D.6 ANOVA Table
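The ANOVA calculations described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the function name `anova_compare`, its argument names, and the numbers passed to it are chosen for illustration only, and the difference entries are taken as benchmark minus tested model, as described in the text.

```python
def anova_compare(dof_bench, sse_bench, dof_test, sse_test):
    """Compare a benchmark (omega) model with its extension (Omega).

    dof_bench, sse_bench: degrees of freedom and SSE of the benchmark model.
    dof_test, sse_test:   degrees of freedom and SSE of the model being tested.
    """
    dof_diff = dof_bench - dof_test      # drop in degrees of freedom
    sse_diff = sse_bench - sse_test      # drop in the error sum of squares
    ms_diff = sse_diff / dof_diff        # mean square of the difference model
    ms_test = sse_test / dof_test        # mean square of the tested model
    ms_bench = sse_bench / dof_bench     # mean square of the benchmark model
    return {
        "dof_diff": dof_diff,
        "sse_diff": sse_diff,
        "ms_diff": ms_diff,
        "ms_test": ms_test,
        "ms_bench": ms_bench,
        "F": ms_diff / ms_test,          # F-ratio
        "R2": sse_diff / sse_bench,      # improvement over the benchmark
        "s_yx": ms_test ** 0.5,          # square root of the mean square
    }


# Illustrative numbers: 45 observations, two versus three fitted parameters.
table = anova_compare(dof_bench=43, sse_bench=30.0, dof_test=42, sse_test=29.0)
print(round(table["F"], 2), round(table["R2"], 3))
```

The resulting F-ratio is then compared with the critical value read from a table of F for `dof_diff` numerator and `dof_test` error degrees of freedom.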
Table D.1 Critical Values for F [26]
D.3 Data Transformations
Linear regression is used to calculate the best-fit equation relating a set of x
variables to a set of y variables, i.e. $\beta_0$, $\beta_1$, and $e_i$ are determined. Once this equation is
obtained, the next step is to determine if this model is significant. When a model is being
tested for significance, there are certain assumptions that must be checked in order to
ensure that the result obtained is credible. If these assumptions do not hold true, then we
cannot depend on the results obtained from the test.
The following three assumptions must be checked [26,29]:
• The set of residuals for all x values is normally distributed, with a mean value of
zero and a standard deviation of σ.
As can be seen in Figure D.7, when all the individual residuals, $e_i$, are ordered and
plotted, they must exhibit normality. Such a distribution has a mean of zero and a
standard deviation of σ, with only about 5% of the residuals falling outside of 0 ± 2σ.
A tool which is very helpful in determining normality is the normal probability plot,
as discussed in Section D.1.
Figure D.7 Normal Distribution
• For each individual x value, the y values are normally distributed, with a mean value
of zero and a standard deviation of σ.
As can be seen in Figure D.8, the y values must be distributed normally for each
value of x. Figure D.9 shows an example of data failing to meet this criterion. Again,
the normal probability plot is a useful tool in determining normality of the y values.
Figure D.8 Normally Distributed Y values
Figure D.9 Y Values Not Distributed Normally
• The residuals are distributed independently from one another.
This statement implies that the residuals behave independently of one
another, e.g. the reason that one point has a high residual has nothing to do with the
fact that another point has a high residual. However, this is not often true.
Most commonly, there is another factor, not yet accounted for, which links the two
phenomena.
The first two assumptions usually go hand in hand. If one holds true, usually the other
will as well. When the raw data is received, Y is usually plotted versus X and the
characteristics of the resulting curve are studied. The ideal situation is that the plot
resembles the one presented in Figure D.10a. Ideally, the relationship between X and Y is a
linear one. However, this is not often the case. When a non-linear relationship exists,
such as those presented in Figures D.10b and D.10c, linear regression cannot be used
directly. Fortunately, with an appropriate transformation most relationships can
be made linear.
Figure D.10a Ideal Y vs. X Distribution
The most commonly used transformation is the logarithm of the data, either of
the x variable, or the y variable, or both, if necessary. It is generally agreed that the data
that responds best to this transformation is the data representing physical magnitudes such
as weight, temperature, concentration, length, etc. Furthermore, the data must be
non-negative, with values which are not very close to zero.
Other transformations include the following [26,29]:
• reciprocal, where $x_i$ becomes $1/x_i$: usually for physical measurements,
• square root, where $x_i$ becomes $\sqrt{x_i}$: usually for frequencies,
• arcsine, where $x_i$ becomes $\arcsin\sqrt{x_i}$: usually for proportions, and
• log odds, where $x_i$ becomes $\log\{x_i/(1-x_i)\}$: usually for proportions, where no 0 or 1
values are present.
Any data can be manipulated to eventually appear linear; however, the key to a
simple, effective transformation is knowledge of the phenomenon being studied.
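The transformations listed above can be written as small Python helpers. This is only a sketch: the function names are invented here, and the use of base-10 logarithms for the log transformation is an assumption, since the text does not specify a base. The short example shows a power-law relationship becoming linear after both variables are log-transformed.

```python
import math


def log_transform(x):
    """Logarithm (base 10 assumed); requires x > 0."""
    return math.log10(x)


def reciprocal(x):
    """Reciprocal; requires x != 0."""
    return 1.0 / x


def square_root(x):
    """Square root; requires x >= 0."""
    return math.sqrt(x)


def arcsine(p):
    """Arcsine of the square root; requires 0 <= p <= 1."""
    return math.asin(math.sqrt(p))


def log_odds(p):
    """Log odds; requires 0 < p < 1."""
    return math.log(p / (1.0 - p))


# Hypothetical non-linear data following y = 3 * x**2.
xs = [1.0, 10.0, 100.0]
ys = [3.0 * x ** 2 for x in xs]
log_x = [log_transform(x) for x in xs]
log_y = [log_transform(y) for y in ys]
# After the transformation the points lie on a straight line of slope 2.
slope = (log_y[1] - log_y[0]) / (log_x[1] - log_x[0])
print(slope)
```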
Figure D.10b,c Non-linear Relationships Between X and Y
D.4 Multiple Variable Regression
Multiple variable regression involves one dependent variable, Y, and two or more
independent variables, X1, X2, etc. The equation relating Y to the X's is of the following
form:
$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \dots + \beta_k X_{ik} + e_i \qquad \text{(D.8)}$$
The value of the error sum of squares, SSE, is calculated according to the following
equation:
$$SSE = \sum_i \left( Y_i - \beta_0 - \beta_1 X_{i1} - \beta_2 X_{i2} - \dots - \beta_k X_{ik} \right)^2 \qquad \text{(D.9)}$$
The parameters of the equation are obtained by making use of certain basic
principles of matrix algebra. These lengthy calculations are reserved for software packages
such as SAS, which is introduced in Section D.10.
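For reference, the matrix algebra alluded to here is the standard least-squares solution, given below as a sketch. The notation is an assumption made for this illustration: $\mathbf{Y}$ is the column of n observed values, $\mathbf{X}$ is the n by (k+1) matrix whose first column is all ones and whose remaining columns hold the independent variables, and $\hat{\boldsymbol{\beta}}$ is the column of fitted parameters.

```latex
\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{e}, \qquad
\hat{\boldsymbol{\beta}} = (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{Y}, \qquad
SSE = (\mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\beta}})^{T}(\mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\beta}})
```

The estimate $\hat{\boldsymbol{\beta}}$ is the parameter vector that minimizes the SSE.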
The concepts presented in Section D.1 on simple linear regression also apply to
multiple variable regression. Although more difficult to visualize, multiple regression
can be thought of as fitting a plane through a set of points in a three dimensional space, or
a hyperplane in a space of higher dimension. The residual can be thought of as the distance
in the y-direction between a point in space and the fitted plane.
As in simple regression, the SSE is a measure of the accumulated error of a model
predicting Y. In multiple regression, SSE is used to determine which combination of
variables best predicts Y. This can be best explained using an example:
The gas consumption (Y) of 45 automobiles is studied. The independent variables
considered to best predict gas consumption are the weight of the automobile (W) and the
automobile length (L). The analyst must determine which of the two variables best
predicts the gas consumption, and whether or not both variables should be used together.
The analyst begins by obtaining the equation, and the SSE, of all the possible
combinations of L, W, and the intercept. The following results were obtained:
Table D.2 Information about Possible Models
The results indicate that the best one-variable model is the one consisting of only
the weight, because it has the smallest SSE. The best two-variable model is the one
consisting of weight + intercept. Finally, the best (and only) three-variable model is the
one consisting of all three variables. It is obvious that the lowest SSE is obtained for the
model in which all three variables are involved. This will always be the case. However,
the analyst must decide whether or not adding a variable produces a decrease in the SSE
which is significant. The first step is to decide between the one-variable model and the
two-variable model. The following ANOVA table shows all the relevant information:
The F-ratio obtained is larger than the critical F-ratio; therefore, the addition of
the weight is significant and the two-variable model is retained. The next
step is to compare the two-variable model with the three-variable model to determine if
the variable L is significant. The following ANOVA table shows all of the relevant
information:
The F-ratio obtained is smaller than the critical F-ratio; therefore, the addition of L
is not significant. This means that the best model to predict gas consumption is the one
containing only weight and the intercept.
It should be noted that, had the one-variable model containing only the intercept
been used initially and the significance of adding L been checked, the
answer would have been affirmative. Continuing the exercise to check whether adding W
to that two-variable model would be significant, it would have been noted that it would
not be so. The conclusion would have been that the model consisting of automobile
length and the intercept was the best model. An explanation of the significance of the
various models follows.
When two independent variables, such as L and W, provide redundant information,
it is easy to understand that only one of the two variables will be needed in the
model. One of the two might be better than the other, as W is in this case, but in the
absence of this variable, the second one may provide almost as much information. This
concept is called multicollinearity, and it can be better understood with the aid of the
Venn diagrams presented in Figure D.11. Figure D.11a shows the case of one
independent variable, X. If each of the two circles is of unit area, the shaded area
represents R², or $R^2_{YX}$, i.e. the proportion of the variance in Y that can be explained by
the variable X. In the case when X is insignificant in predicting Y, the shaded area is
very small. Conversely, the higher the correlation between X and Y, the larger is the
shaded area.
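ANOVA table for adding L to the model containing the intercept and W:
Model                      DOF    SSE    MS
Intercept + W (ω)           43     30    0.70
Intercept + W + L (Ω)       42     29    0.69
Difference                   1      1    1
F = 1.45 < $F_{critical}$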
Figure D.11b shows the case when two independent variables are involved. The
total proportion of the variance in Y that can be explained by the two variables, X1 and
X2, is equal to the total shaded area. This value is called the squared multiple correlation,
$R^2_{Y \cdot 12}$. The proportion of the variance of Y accounted for by X2, with X1 partialled out,
is indicated by the shaded area in Figure D.11c. This represents the extra information
that X2 provides when X1 is already in the equation. It is referred to as the squared partial
correlation of X2 with Y, with X1 partialled out, $R^2_{Y2 \cdot 1}$. It is easy to see that the less
X1 and X2 overlap, the higher the usefulness of each of the two variables, and the larger
the proportion of the variance of Y accounted for overall [26].
Figure D.11 Venn Diagrams for 1 and 2 Independent Variables [26]
A good tool for examining the extent of overlapping is to study the correlation
matrix of a set of variables, including the dependent and the independent variables. One
such matrix is presented in Table D.3. The ideal situation is to have high correlations
between the Y variable and each of the X variables, and to have low correlations between
the X variables themselves. This will most probably result in a large part of the variance
of Y being accounted for, i.e. a large growth in R² as each of the variables is added to the
model.
Table D.3 Correlation Matrix
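A correlation matrix of this kind can be computed with a short Python sketch such as the one below. The variable names and data values are hypothetical and serve only to show the layout; they are not the values of Table D.3.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


def correlation_matrix(columns):
    """Nested dict of pairwise correlations for a dict of named columns."""
    names = list(columns)
    return {r: {c: pearson(columns[r], columns[c]) for c in names} for r in names}


# Hypothetical observations of one dependent and two independent variables.
data = {
    "Y": [1.0, 2.0, 3.0, 4.0, 5.0],
    "X1": [2.0, 4.0, 5.0, 4.0, 5.0],
    "X2": [5.0, 3.0, 4.0, 1.0, 2.0],
}
matrix = correlation_matrix(data)
for row in matrix:
    print(row, [round(matrix[row][col], 2) for col in matrix])
```

High entries in the Y row together with a low X1 versus X2 entry would correspond to the ideal situation described above.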
D.5 Categorical Variables
The techniques studied so far take only discrete variables into account. Discrete
variables represent a specific quantity, e.g. a pH of 7.4 or a chloride content of 4763
ppm. Parameters such as the mean, standard deviation, and SSE can be calculated for
such variables, and a linear equation can be determined. But how can this be achieved for
categorical variables, which represent a category instead of a specific quantity, e.g.
soiltype (sand, clay, or sand/clay) or sulfide content (positive, trace, or negative)? This is
achieved by "expanding" the categorical variable into an appropriate number of dummy
variables. This can be best explained with an example.
The variable Soiltype is a categorical variable with the following classes: sand,
clay, and sand/clay. In its present form, the variable cannot be studied in the same way
as the discrete variables. For this to be possible, categorical variables such as this one
must be "expanded" into a set of dummy variables. The number of dummy variables to
be created will depend on the number of classes. In this case three classes exist and the
analyst can choose to use either two or three variables. In the two variable case, the
dummy variables will assume the following values [26]:
Table D.4 Dummy Variables for Soiltype: 2 Variable Case
In this case, the class 'clay' is considered the reference category against which the
behavior of 'sand' and 'sand/clay' is assessed. In the three variable case, no reference
category exists, and the dummy variables will assume the following values:
Table D.5 Dummy Variables for Soiltype: 3 Variable Case
Soiltype      $D_s$    $D_{sc}$    $D_c$
Sand           1       0       0
Sand/Clay      0       1       0
Clay           0       0       1
The difference between the discrete and categorical variables is the effect they
have on the final equation relating Y to the X's. For example, if the discrete variable
'pH' and the categorical variable 'Soiltype' are used to predict the variable 'CorrRate', the
following overall equation would result (for the model with three dummy variables):
$$CorrRate = \beta_0 + \beta_1\,pH + \beta_s D_s + \beta_{sc} D_{sc} + \beta_c D_c \qquad \text{(D.10)}$$
where $D_s$, $D_{sc}$, and $D_c$ are the dummy variables for sand, sand/clay, and clay.
The variable pH has an effect on both $\beta_0$ and $\beta_1$, i.e. on the intercept and the slope
of the line. However, the dummy variables can be viewed as having a direct effect on
only the intercept because they assume a value equal to 0 or 1. For example, for a sand
the equation would become:
$$CorrRate = (\beta_0 + \beta_s) + \beta_1\,pH \qquad \text{(D.11)}$$
In essence, the term $\beta_s$ represents the 'jump' in CorrRate resulting from the soil
being a sand. Similarly, $\beta_{sc}$ and $\beta_c$ represent the jumps resulting from a sand/clay and
clay, respectively.
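The expansion of Soiltype into dummy variables, and the resulting jump in the intercept, can be sketched in Python as follows. The helper names `dummy_code` and `corr_rate` and all coefficient values are hypothetical, chosen only to make the mechanics visible.

```python
def dummy_code(values, classes, reference=None):
    """Expand a categorical variable into 0/1 dummy variables.

    If a reference class is given, it receives no column of its own,
    so it is coded as all zeros (the two variable case).
    """
    kept = [c for c in classes if c != reference]
    return [[1 if v == c else 0 for c in kept] for v in values]


def corr_rate(ph, soil, b0, b1, jumps):
    """CorrRate = b0 + b1 * pH + jump for the soil class (three dummy form)."""
    return b0 + b1 * ph + jumps[soil]


classes = ["sand", "sand/clay", "clay"]
soils = ["sand", "clay", "sand/clay"]
three_case = dummy_code(soils, classes)                  # three variable case
two_case = dummy_code(soils, classes, reference="clay")  # two variable case
print(three_case)
print(two_case)

# Hypothetical coefficients, chosen only to show the shift in the intercept.
jumps = {"sand": 0.5, "sand/clay": 0.25, "clay": 0.0}
print(corr_rate(7.0, "sand", b0=1.0, b1=0.5, jumps=jumps))
```

For a sand, the intercept becomes `b0 + jumps["sand"]` while the slope on pH is unchanged.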
The technique of creating dummy variables for coding categorical variables is
used to extend the use of multiple regression analysis to include variables that could not
be included otherwise. Other techniques are also available, e.g. interaction variables, and
non-linear combinations relating x and y. These methods did not prove to be useful in
this project, but could be beneficial in future research on the subject. References 29 and
30 should be consulted for further information on these techniques.
D.6 Outliers
Outliers are data points that split off, or are very different from the rest of the data.
They can occur for two fundamental reasons: (1) a data recording or entry error
was made, or (2) the subjects are simply different from the rest. The first type of outlier
can be identified by always listing the data and checking to ensure that the data has been
entered accurately. The amount of time it takes to list and check the data for accuracy is
well worth the effort, and the computer time is minimal.
Statistical procedures in general can be quite sensitive to outliers. This is
particularly true for the regression techniques. It is very important to be able to identify
outliers and then decide how to treat them, because the
results of the statistical analysis must reflect most of the data, and not be highly
influenced by just one or two errant points [26].
Outliers can have a very large effect on the correlation coefficient, R. Figure
D.12 shows graphically how the inclusion of an outlier can drastically change the
interpretation of the relationship between X and Y. In case A, there is no relationship
without the outlier, but there is a strong relationship with the outlier. Conversely, in case
B the relationship changes from strong, without the outlier, to weak when the outlier is
included.
Figure D.12 Effect of Outliers on R² [26]
Besides the graphical method, outliers can be detected by studying z scores. For
each variable being studied, the z score can be calculated as follows:
$$z_{ij} = \frac{x_{ij} - \mu_j}{\sigma_j} \qquad \text{(D.12)}$$
where $z_{ij}$ = the z score of observation i for variable j,
$x_{ij}$ = the recorded value of observation i for variable j,
$\mu_j$ = the mean value of the observations of variable j, and
$\sigma_j$ = the standard deviation of the observations of variable j.
If the variable is approximately normally distributed, then z scores with absolute
values near 3 should be considered as potential outliers. This is because, in a distribution
which is normal, about 99 % of the scores should lie within three standard deviations of
the mean. Therefore, any z score larger than 3 in absolute value indicates a value very
unlikely to occur. Of course, if the number of observations is large (say >100), then simply by
chance, it may be reasonable to expect a few subjects to have z scores of over three.
However, the above rule is generally considered reasonable [26].
Up to this point, only outliers on the predictor variables, $X_j$,
have been considered. The z scores can also be calculated for the residuals obtained
when a model is fitted to the data. These standardized residuals are used for finding
observations whose predicted y values are quite different from their actual y values, i.e.
observations that do not fit the model well. As in the previous case, an observation whose
standardized residual is greater than three in absolute value is considered an outlier [26].
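The z score screening described above can be sketched in Python as follows. Two assumptions are made for this illustration: the sample standard deviation (n - 1 in the denominator) is used, and an observation is flagged when its absolute z score is at least 3. The data values are invented. The same function can be applied to a list of residuals to obtain standardized residuals.

```python
import statistics


def z_scores(values):
    """z score of each observation, using the sample standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)
    return [(v - mu) / sigma for v in values]


def flag_outliers(values, threshold=3.0):
    """Indices of observations whose |z| reaches the threshold."""
    return [i for i, z in enumerate(z_scores(values)) if abs(z) >= threshold]


# Hypothetical pH readings with one suspicious entry at the end.
ph = [6.8, 7.0, 7.2] * 7 + [12.0]
print(flag_outliers(ph))
```

Note that in a very small sample no observation can reach a z score of 3, so the rule is only meaningful once a reasonable number of observations is available.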
Alternatively, an outlier can be defined as a point which, if deleted, produces a
substantial change in at least one of the regression coefficients. That is, the prediction
equations with and without the point are quite different. A quantity that measures this
change is Cook's distance (CD). Unlike the z scores, which identify the outliers on Y
or on the X's individually, Cook's distance measures the combined effect of a point being
an outlier on Y and on the set of predictors. Cook and Weisberg (1982) indicate that a
$CD_i$ > 1 would generally be considered large, and would therefore identify probable
outliers [26].
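The text does not give the formula for Cook's distance; for reference, its usual textbook form is shown below. The symbols are assumptions introduced for this illustration: $e_i$ is the residual of observation i, $p = k + 1$ is the number of fitted parameters, $s^2$ is the mean square error of the fitted model, and $h_{ii}$ is the leverage of observation i, i.e. the i-th diagonal element of $\mathbf{X}(\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}$.

```latex
CD_i = \frac{e_i^{2}}{p\,s^{2}} \cdot \frac{h_{ii}}{(1 - h_{ii})^{2}},
\qquad s^{2} = \frac{SSE}{n - p}
```

The first factor grows when the point is an outlier on Y, and the second grows when it is an outlier on the predictors, which is why the measure captures the combined effect.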
Once the outliers are identified, a decision must be made on whether or not the
errant point should be eliminated from the set. This action must not be taken lightly or
without serious consideration. If one finds after further investigation of the outlying
points that an outlier was due to a recording or entry error, then of course the appropriate
correction should be implemented and the analysis must be repeated with the corrected
data. If the errant data is due to an instrumentation error, then it is legitimate to
drop the outlier. However, if neither of these appears to be the case, then one should not
drop the outlier, but report two analyses (one including the outlier and the other
excluding it). Outliers should not necessarily be regarded as 'bad'. As a matter of fact, it
has been argued that outliers can provide some of the most interesting cases for further
research [26].
D.7 Variable Selection
The number and type of variables which should be included in a model need to
be considered. Most of the methods of model selection are strongly based on the concepts
of multicollinearity and semipartial correlations, which were introduced in Section D.4.
Prior to introducing the techniques for model selection, it must be emphasized that
the single most important tool in selecting a subset of variables for use in a model is
knowledge of the area under study. Furthermore, it is important for the investigator to be
judicious in the selection of predictors. If too many variables are used, the prospects of
cross validation may be influenced negatively. The analyst can exercise his/her judgment
in the creation of new variables from the existing ones. If, for example, the analyst
knows that two different variables essentially measure the same thing, a new variable may
be created by averaging them, or by adding the z scores of the two. An alternative is the
removal of one of the variables from the set.
A quantity which measures the extent to which a variable provides redundant
information is the variance inflation factor (VIF), which is based on the calculation of
the correlation between the independent variables only. Each independent variable is
regressed in turn against the remaining X's, and the correlation is obtained. A high
correlation indicates that the remaining X variables account for a large amount of the
variation in the variable under study. This means that the variable provides little
information that the remaining variables do not already provide, i.e. it provides redundant
information. It is suggested that a variable be removed if VIF > 10. Variables should be
eliminated one at a time, and the new VIF values should be calculated prior to the
removal of any subsequent variables [26].
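The text does not state the formula for the VIF. The sketch below uses the standard definition, VIF = 1 / (1 - R²), where R² is obtained by regressing the predictor in question on the remaining predictors. The function names and the R² values are hypothetical.

```python
def vif(r_squared_j):
    """Variance inflation factor of predictor j.

    r_squared_j is the R^2 obtained when predictor j is regressed on
    the remaining predictors (standard definition, assumed here).
    """
    return 1.0 / (1.0 - r_squared_j)


def is_redundant(r_squared_j, limit=10.0):
    """Apply the VIF > 10 rule of thumb quoted in the text."""
    return vif(r_squared_j) > limit


# Hypothetical R^2 values for three predictors.
for name, r2 in [("X1", 0.20), ("X2", 0.50), ("X3", 0.95)]:
    print(name, round(vif(r2), 2), is_redundant(r2))
```

Under this definition, the VIF > 10 rule corresponds to more than 90 % of the variation in a predictor being explained by the other predictors.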
The methods most commonly used to select a model are the forward, backward
and stepwise selection procedures. All these procedures involve examining the
contribution of a predictor with the effect of the other predictors partialled out, or held
constant. Through the use of semipartial correlations, as was done in the ANOVA
tables presented in Section D.4, the correlations among the predictors are disentangled
and the unique variance of each predictor related to the variance of y is determined.
The automobile example of Section D.4 is a good example of the forward
selection procedure. The first predictor that enters the equation is the one with the
highest simple correlation with y. If this predictor is significant, the predictor with the
largest semipartial correlation with y is considered next, and so on. At some point, a given
predictor will not be significant and the procedure will be terminated. In the forward selection
procedure, once a variable enters the equation, it is not removed [26].
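The forward selection procedure can be sketched in Python as shown below. This is an illustration under several assumptions: the SSE of every candidate model is supplied by the caller through the hypothetical function `sse_of`, the intercept is always in the model, and a single critical F value is used at every step. In the example, the SSE values of 30 and 29 follow the automobile example, while the other two SSE values and the critical value of 4.07 are invented for illustration.

```python
def partial_f(sse_small, sse_big, dof_big, added=1):
    """F-ratio for extending a model by `added` predictors."""
    return ((sse_small - sse_big) / added) / (sse_big / dof_big)


def forward_select(candidates, sse_of, n, f_crit):
    """Forward selection on top of an intercept-only model.

    sse_of(frozenset of names) must return the SSE of the model made of
    the intercept plus those predictors; n is the number of observations.
    """
    chosen = []
    remaining = list(candidates)
    while remaining:
        current_sse = sse_of(frozenset(chosen))
        # The best entry candidate is the one giving the lowest SSE.
        best = min(remaining, key=lambda v: sse_of(frozenset(chosen + [v])))
        new_sse = sse_of(frozenset(chosen + [best]))
        dof_big = n - (len(chosen) + 2)  # intercept + chosen + new predictor
        if partial_f(current_sse, new_sse, dof_big) <= f_crit:
            break  # the candidate is not significant: stop
        chosen.append(best)
        remaining.remove(best)
    return chosen


# SSE of each model; only the values 30 and 29 come from the example.
sse_table = {
    frozenset(): 100.0,
    frozenset({"W"}): 30.0,
    frozenset({"L"}): 40.0,
    frozenset({"W", "L"}): 29.0,
}
selected = forward_select(["L", "W"], lambda s: sse_table[s], n=45, f_crit=4.07)
print(selected)
```

With these numbers W enters first, and the subsequent test for L gives an F-ratio of about 1.45, so the procedure stops with W alone.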
The stepwise procedure is basically a variation of the forward selection procedure.
However, at each stage of the procedure, a test is made for the least useful predictor. The
importance of each predictor is constantly reassessed, and a predictor that may have been
the best entry candidate earlier may now be superfluous, in which case it is removed.
The backward selection procedure involves the removal of predictors from an
equation initially containing all the predictors. At each step, the partial F-ratio is
calculated for every predictor. The smallest value is compared to the critical F-ratio, and
the corresponding variable is removed if its F-ratio falls below the critical value. The new
equation is computed, and the process continues until all insignificant variables are removed.
The forward, stepwise and backward selection procedures do not necessarily
propose the same final model. In general, the stepwise procedure is considered the best
of the three methods because it verifies all of the variables at each step and removes the
one(s) that are redundant. A mistake that is commonly made by analysts is to consider
the final model proposed by these methods as the best model possible. This is not the
case. The model proposed may be just one of many models which predict Y almost
equally well. For this reason, these methods are limited in their use. The one
technique that appears to offer the analyst the most choice in the model is the R-square
procedure. This technique does not propose a model, but simply lists the 10 best
combinations of one-variable, two-variable, three-variable models, etc., ranked according
to their overall R² value. The analyst can then compare the various
combinations with high R² values, and is free to use his/her judgment in selecting the
most reasonable model.
It is generally agreed that the number of variables, k, to be included in an equation
depends on the number of observations, n, in the data set. The rule of thumb proposed is
to choose k such that n/k > 10 [26]. Another criterion often used is Mallows' $C_p$. This
measure was introduced by Mallows (1973) as a criterion for selecting a model. It
measures the total squared error, and points to the model(s) for which $C_p \approx p$, where p = k+1.
For these models, the amount of underfitting and/or overfitting is minimized, i.e. there are
neither too many nor too few predictors in the equation. Mallows' $C_p$ is given in the
output file created whenever a SAS program is used to propose a model [26].
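The formula for Mallows' statistic is not given in the text; its usual form is shown below for reference. The notation is an assumption made for this illustration: $SSE_p$ is the error sum of squares of the candidate model with p fitted parameters, $s^2$ is the mean square error of the model containing all candidate predictors, and n is the number of observations.

```latex
C_p = \frac{SSE_p}{s^{2}} - (n - 2p)
```

When a candidate model has little bias, $SSE_p / s^2$ is close to $n - p$, so that $C_p$ is close to p, which is the criterion quoted above.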
It is suggested that all of the above methods be examined individually prior to
deciding on a final model. Ultimately, however, it is the researcher's knowledge of
the phenomena under study that will ensure that the best and the most reasonable model
is selected.
D.8 Model Validation
It is crucial for the researcher to obtain some measure of how well the regression
equation will predict on an independent sample of data, i.e. can the equation be
generalized? There are essentially three ways of validating a model: data splitting,
computing the adjusted R², and the PRESS statistic.
Data splitting involves randomly splitting the available data into two parts
(roughly 1/3 and 2/3). The regression equation is derived using the so-called derivation
data (2/3), and then applied to the other set of data, the validation data. The predicted
values of y for the validation data are compared with the recorded y values, and the
correlation between the two sets is calculated. This correlation represents how well the
equation works on an independent sample of data [26].
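Data splitting can be sketched in Python for the simple case of one predictor. The function names, the fixed random seed, and the data are assumptions made for this illustration only.

```python
import math
import random


def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def pearson(a, b):
    """Pearson correlation coefficient between two lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb)


def split_validate(x, y, seed=0):
    """Derive the line on a random 2/3 of the data, validate on the rest."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    cut = round(len(idx) * 2 / 3)
    derivation, validation = idx[:cut], idx[cut:]
    b0, b1 = fit_line([x[i] for i in derivation], [y[i] for i in derivation])
    predicted = [b0 + b1 * x[i] for i in validation]
    recorded = [y[i] for i in validation]
    return pearson(predicted, recorded)


# Hypothetical data: a straight line with a small alternating disturbance.
x = [float(i) for i in range(12)]
y = [2.0 * xi + 1.0 + 0.1 * (-1) ** i for i, xi in enumerate(x)]
print(round(split_validate(x, y), 4))
```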
The adjusted R² value measures the shrinkage in predictive power. Shrinkage
refers to the decrease in R² as it is measured in the sample with the equation derived from
it, versus what it would be in the population as a whole using the same equation.
Certainly the equation will not predict as well. The adjusted R² value estimates how well
a prediction equation derived from one sample would work on the population, i.e.
the theoretical sample consisting of all possible data points. It does not indicate how well
the derived equation will predict for other samples from the same population. The
adjusted R² value of the population is compared to the R² value of the sample, and the
percentage of decrease is noted.
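The text does not give the formula for the adjusted R². The sketch below assumes the form commonly reported by statistical packages, 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the number of observations and k the number of predictors. The function names and input values are hypothetical.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k predictors (assumed formula)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def shrinkage_percent(r2, n, k):
    """Percentage decrease from the sample R^2 to the adjusted R^2."""
    return 100.0 * (r2 - adjusted_r2(r2, n, k)) / r2


# Hypothetical case: R^2 = 0.50 obtained with 4 predictors and 50 observations.
print(round(adjusted_r2(0.50, n=50, k=4), 3))
print(round(shrinkage_percent(0.50, n=50, k=4), 1))
```

The shrinkage grows as the number of predictors approaches the number of observations, which is consistent with the n/k rule of thumb given in Section D.7.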
In many cases, there is not enough data to permit random splitting. One can still
obtain a good measure of the predictive power by the use of the PRESS statistic
(predicted residual sum of squares). In this approach, each observation in turn is
set aside and a predictive equation is derived with the remaining data. This is done for
each of the n observations, and as a result, n prediction equations are derived and n true
errors are determined. The PRESS statistic is simply the sum of the squares of these
errors. Compared with the SSE, the PRESS statistic is more representative of the true error
because the equation of the line was obtained without the observation under study, i.e. the
line was not fitted to this particular point prior to computing the error.
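The PRESS statistic can be sketched in Python for a model with a single predictor by refitting the line n times, each time leaving one observation out. The function names and data are invented for this illustration.

```python
def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def sse(x, y):
    """Error sum of squares of the line fitted to all observations."""
    b0, b1 = fit_line(x, y)
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


def press(x, y):
    """Predicted residual sum of squares by leave-one-out refitting."""
    total = 0.0
    for i in range(len(x)):
        # Fit the line without observation i, then predict observation i.
        b0, b1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        total += (y[i] - (b0 + b1 * x[i])) ** 2
    return total


# Hypothetical data, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
print(round(sse(x, y), 4), round(press(x, y), 4))
```

For a least-squares fit the PRESS value is never smaller than the SSE, since each point is predicted by a line that was not fitted to it.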
D.9 Power
Type I error, or the level of significance (α), is the probability of rejecting the null
hypothesis when it is true, i.e. finding a variable to be significant when in fact it is not [26,31].
The α level set by the experimenter is a subjective decision, but it is usually set at .05
or .01 to minimize the probability of making that kind of error. There is, however,
another type of error that can be made in conducting a statistical test: type II error,
denoted β, which is the probability of accepting the null hypothesis when it is false, i.e.
finding a variable to be insignificant when in fact it is significant. Not only can either of
these errors occur, but they are inversely related. An example of the two-group problem with
15 observations follows:
Table D.6 Relationship between α, β, and Power [26]
The entry in the last column, (1 - β), is called the power of the experiment, and it
is the probability of rejecting the null hypothesis when it is false, i.e. finding a variable to
be significant when it is. Depending on the circumstance, power analysis can be
undertaken before or after the data has been collected and analyzed. For
example, if a researcher is going to invest a lot of time and money in carrying out a study,
then he or she would certainly want to have a high power, i.e. a high probability of
finding what they are looking for if it is really there. Alternatively, if a researcher has
already completed a study and has found that a certain variable is insignificant, it is
important to know whether or not the power was high enough. If the power was low, the
chances of finding significance may have been too low, and as such, significance was not
found even though it may have been there. A low power may lead the researcher to draw
false conclusions about the significance of a variable [26].
The power of a statistical test depends on the following factors [31]:
• The α level set by the experimenter,
• The sample size n, and
• The effect size, i.e. to what extent the effect of the variable is observable.
Power is heavily dependent on the sample size. For example, for a medium effect
size and α = .05, the power of a test for different values of n is presented in Table D.7.
Table D.7 Relationship between n and Power [26]
As the above example suggests, when a sample size is large, power is rarely a
problem. It is only when small sample sizes are evaluated that power can influence the
results obtained.
The effect size is usually classified as small (f² = 0.02), medium (f² = 0.15), or
large (f² > 0.35) [31]. A large effect size is usually associated with a phenomenon which,
when present, is very easy to detect. In general, the effect sizes of phenomena are
considered medium. The equation relating the sample size n, the effect size f², and the
number of variables in the Ω-model, K, is [31]:
$$n = \frac{L}{f^2} + K + 1 \qquad \text{(D.13)}$$
where L is a parameter which depends on the α value chosen, the difference in the
number of variables between the Ω-model and the ω-model ($k_B$), and on the power of the
statistical test. L is obtained by consulting tables such as Table D.8 for α = .05 [31].
It is generally considered true that a sample size of 50 or larger is sufficient to
detect a medium effect, i.e. the power of the test would be approximately 0.70.
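The sample size relation can be sketched in Python as follows, reading Equation D.13 as n = L / f² + K + 1. The function name `required_n` is invented, and the value of L used in the example is a hypothetical placeholder; in practice it would be read from a table of L for the chosen α, power, and number of added variables.

```python
import math


def required_n(L, f2, K):
    """Sample size from n = L / f^2 + K + 1, rounded up to a whole number.

    L  : tabulated parameter (depends on alpha, power, and added variables)
    f2 : effect size f squared
    K  : number of variables in the model being tested
    """
    return math.ceil(L / f2 + K + 1)


# Hypothetical inputs: L = 7.85, a medium effect size, and three variables.
print(required_n(L=7.85, f2=0.15, K=3))
```

Smaller effect sizes drive the required sample size up quickly, because f² appears in the denominator.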
D.10 The SAS Statistical Package
The Statistical Analysis System (SAS) was selected for use in this project because [26,29,30]:
• It is very widely distributed,
• It is easy to use,
• It can be used for a very wide range of analyses, from very simple statistics to
complex multivariate analyses, and
• It is a well documented package, having been in development and use for over two
decades.
Essentially, the SAS program reads a file created by the analyst and performs the
various analyses requested. Structurally, a SAS program is composed of three
fundamental blocks: the statements setting up the data, the data lines, and a series of
procedure (PROC) statements which describe the statistical analyses to be performed on
the data entered [29,30].
For a list of the procedures and a complete description, it is suggested that the
reader refer to two volumes: the SAS/STAT USER'S GUIDE, VOLUME 1 and 2 [29,30].
The volume used most in this project is VOLUME 2, which contains the fundamental
regression procedures. However, it is suggested that both volumes be consulted to fully
understand the scope of this statistical package, and become familiar with all of the
possible techniques that may be used to analyze the data.
Table D.8 Values of L for α = .05 [31]