quality assurance & quality control. references primary references –michener and brunt (2000)...
TRANSCRIPT
![Page 1: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/1.jpg)
Quality Assurance & Quality Control
![Page 2: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/2.jpg)
References• Primary References
– Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell Science.
• Edwards (2000) Ch. 4 • Brunt (2000) Ch. 2 • Michener (2000) Ch. 7
– Dux, J.P. 1986. Handbook of Quality Assurance for the Analytical Chemistry Laboratory. Van Nostrand Reinhold Company
– Mullins, E. 1994. Introduction to Control Charts in the Analytical Laboratory: Tutorial Review. Analyst 119: 369 – 375.
– Grubbs, Frank (February 1969), Procedures for Detecting Outlying Observations in Samples, Technometrics, Vol. 11, No. 1, pp. 1-21.
![Page 3: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/3.jpg)
Outline:
• Define QA/QC• QC procedures
– Designing data sheets– Data entry using validation rules, filters,
lookup tables• QA procedures
– Graphics and Statistics – Outlier detection
• Samples• Simple linear regression
![Page 4: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/4.jpg)
QA/QC• “mechanisms [that] are designed to prevent
the introduction of errors into a data set, a process known as data contamination”
Brunt 2000
![Page 5: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/5.jpg)
Errors (2 types)• Commission: Incorrect or inaccurate
data in a dataset– Can be easy to find– Malfunctioning instrumentation
• Sensor drift• Low batteries• Damage• Animal mischief
– Data entry errors
• Omission– Difficult or impossible to find– Inadequate documentation of data values,
sampling methods, anomalies in field, human errors
![Page 6: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/6.jpg)
Quality Control• “mechanisms that are applied in advance,
with a priori knowledge to ‘control’ data quality during the data acquisition process”
Brunt 2000
![Page 7: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/7.jpg)
Quality Assurance• “mechanisms [that] can be applied after
the data have been collected, entered in a computer and analyzed to identify errors of omission and commission”– graphics– statistics
Brunt 2000
![Page 8: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/8.jpg)
QA/QC Activities• Defining and enforcing standards for
formats, codes, measurement units and metadata.
• Checking for unusual or unreasonable patterns in data.
• Checking for comparability of values between data sets.
Brunt 2000
![Page 9: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/9.jpg)
Outline:
• Define QA/QC• QC procedures
– Designing data sheets– Data entry using validation rules, filters,
lookup tables• QA procedures
– Graphics and Statistics – Outlier detection
• Samples• Simple linear regression
![Page 10: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/10.jpg)
Flowering Plant Phenology – Data Entry Form Design
• Four sites, each with 3 transects• Each species will have phenological
class recorded
![Page 11: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/11.jpg)
Plant Life Stage______________ _____________________________ _____________________________ _____________________________ _____________________________ _____________________________ _____________________________ _____________________________ _______________
What’s wrong with this data sheet?
Data Collection Form Development:
![Page 12: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/12.jpg)
Plant Life Stageardi P/G V B FL FR M S D NParpu P/G V B FL FR M S D NPatca P/G V B FL FR M S D NPbamu P/G V B FL FR M S D NPzigr P/G V B FL FR M S D NP
P/G V B FL FR M S D NPP/G V B FL FR M S D NP
PHENOLOGY DATA SHEETCollectors:_________________________________Date:___________________ Time:_________Location: black butte, deep well, five points, goat draw
Transect: 1 2 3
Notes: _________________________________________
P/G = perennating or germinating M = dispersingV = vegetating S = senescingB = budding D = deadFL = flowering NP = not presentFR = fruiting
![Page 13: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/13.jpg)
ardi P/G V B FL FR M S D NP
Y N Y N Y N Y NY NY NY N Y N Y N
arpu P/G V B FL FR M S D NP
Y N Y N Y N Y NY NY NY N Y N Y N
asbr P/G V B FL FR M S D NP
Y N Y N Y N Y NY NY NY N Y N Y N
deob P/G V B FL FR M S D NP
Y N Y N Y N Y NY NY NY N Y N Y N
PHENOLOGY DATA ENTRYCollectors Troy MadduxDate: 16 May 1991 Time: 13:12Location: Deep WellTransect: 1Notes: Cloudy day, 3 gopher burrows on transect
![Page 14: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/14.jpg)
Outline:
• Define QA/QC• QC procedures
– Designing data sheets– Data entry using validation rules, filters
and lookup tables• QA procedures
– Graphics and Statistics – Outlier detection
• Samples• Simple linear regression
![Page 15: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/15.jpg)
Validation Rules:• Control the values that a user can
enter into a field
![Page 16: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/16.jpg)
Validation Rule Examples:
• >= 10• Between 0 and 100• Between #1/1/70# and Date()
![Page 17: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/17.jpg)
Validation rules in MS Access: Enter in Table Design View
>=0 and <= 100
![Page 18: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/18.jpg)
Look-up Fields• Display a list of values from which
entry can be selected
![Page 19: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/19.jpg)
Look-up Tables in MS Access: Enter in Table Design View
![Page 20: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/20.jpg)
Macros• Validate data based on conditional
statements
![Page 21: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/21.jpg)
You want to make sure that a value for vegetation cover is entered in every record. To do this, create a macro called “NoData” that will examine the contents of the cover field whenever the field is exited.
![Page 22: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/22.jpg)
IsNull([Forms]![Vegetation_data]![Cover])
This is what the “NoData” Macro looks like.
![Page 23: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/23.jpg)
Attach the macro to the “On Exit Event” in the Text Box: Cover Properties Window
![Page 24: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/24.jpg)
When the user tabs out of the “cover” field without entering data, this message box flashes to the screen.
![Page 25: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/25.jpg)
Other methods for preventing data contamination
• Double-keying of data by independent data entry technicians followed by computer verification for agreement
• Use text-to-speech program to read data back
• Filters for “illegal” data– Computer/database programs
• Legal range of values• Sanity checks
Edwards 2000
![Page 26: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/26.jpg)
Table ofPossibleValues
and Ranges
Raw DataFile
Illegal DataFilter
Report ofProbable
Errors
Flow of Information in Filtering “Illegal” Data
Edwards 2000
![Page 27: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/27.jpg)
ID cover (%) 1998 (cm) 1999 (cm)a 43 3 2b 133 3 4c 16 5 4d 230 2 3
Data set ‘tree’
Data Checkum; Set tree;
Message=repeat(“ ”,39);
If cover<0 or cover>100 then do; message=“cover is not in the interval [0,100]”; output; end;
If dbh_1998>dbh_1999 then do; message=“dbh_1998 is larger than dbh_1999”; output; end;
If message NE repeat(“ ”, 39);
Keep ID message;
Proc Print Data=Checkum;
A filter written in SAS
Error reportObs ID message
1 a dbh_1998 is larger than dbh_1999
2 b cover is not in the interval [0,100]
3 c dbh_1998 is larger than dbh_1999
4 d cover is not in the interval [0,100]
![Page 28: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/28.jpg)
Outline:
• Define QA/QC• QC procedures
– Designing data sheets– Data entry using validation rules, filters,
lookup tables• QA procedures
– Graphics and Statistics – Outlier detection
• Samples• Simple linear regression
![Page 29: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/29.jpg)
Identifying Sensor Errors: Comparison of data from three Met stations, Sevilleta LTER
![Page 30: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/30.jpg)
Identification of Sensor Errors: Comparison of data from three Met stations, Sevilleta LTER
![Page 31: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/31.jpg)
QA/QC in the Lab: Using Control Charts
![Page 32: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/32.jpg)
Laboratory quality control using statistical process control charts:
Determine whether analytical system is “in control” by examining Mean Variability (range)
![Page 33: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/33.jpg)
All control charts have three basic components: – a centerline, usually the mathematical
average of all the samples plotted. – upper and lower statistical control
limits that define the constraints of common cause variations.
– performance data plotted over time.
![Page 34: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/34.jpg)
16
18
20
22
24
26
0 2 4 6 8 10 12 14 16 18 20
Time
Mean
LCL
UCL
X-Bar Control Chart
![Page 35: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/35.jpg)
Constructing an X-Bar control chart
• Each point represents a check standard run with each group of 20 samples, for example
• UCL = mean + 3*standard deviation
![Page 36: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/36.jpg)
Things to look for in a control chart:
The point of making control charts is to look at variation, seeking patterns or statistically unusual values. Look for:– 1 data point falling outside the control limits – 6 or more points in a row steadily increasing
or decreasing – 8 or more points in a row on one side of the
centerline – 14 or more points alternating up and down
![Page 37: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/37.jpg)
Linear trend
0
2
4
6
8
10
12
0 2 4 6 8 10 12
time
![Page 38: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/38.jpg)
Outline:
• Define QA/QC• QC procedures
– Designing data sheets– Data entry using validation rules, filters,
lookup tables• QA procedures
– Graphics and Statistics – Outlier detection
• Samples• Simple linear regression
![Page 39: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/39.jpg)
Outliers• An outlier is “an unusually extreme value
for a variable, given the statistical model in use”
• The goal of QA is NOT to eliminate outliers! Rather, we wish to detect unusually extreme values.
Edwards 2000
![Page 40: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/40.jpg)
Outlier Detection
• “the detection of outliers is an intermediate step in the elimination of [data] contamination”– Attempt to determine if contamination is
responsible and, if so, flag the contaminated value.
– If not, formally analyse with and without “outlier(s)” and see if results differ.
![Page 41: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/41.jpg)
Methods for Detecting Outliers
• Graphics– Scatter plots– Box plots– Histograms– Normal probability plots
• Formal statistical methods– Grubbs’ test
Edwards 2000
![Page 42: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/42.jpg)
X-Y scatter plots of gopher tortoise morphometrics
Michener 2000
![Page 43: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/43.jpg)
Example of exploratory data analysis using SAS Insight®Michener 2000
![Page 44: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/44.jpg)
Box Plot Interpretation
Upper quartileMedian
Lower quartile
Upper adjacent value
Pts. > upper adjacent value
Lower adjacent valuePts. < lower adjacent value
Inter-quartile range
![Page 45: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/45.jpg)
Box Plot Interpretation
Inter-quartile range
IQR = Q(75%) – Q(25%)
Upper adjacent value = largest observation < (Q(75%) + (1.5 X IQR))
Lower adjacent value = smallest observation > (Q(25%) - (1.5 X IQR))
Extreme outlier: > 3 X IQR beyond upper or lower adjacent values
![Page 46: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/46.jpg)
Box Plots Depicting Statistical Distribution of Soil Temperature
![Page 47: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/47.jpg)
Statistical tests for outliers assume that the data are normally distributed….
CHECK THIS ASSUMPTION!
![Page 48: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/48.jpg)
Y
-3 -2 -1 0 1 2 3
0.0
0.1
0.2
0.3
0.4
a. The Normal Frequency Curve
Y
-3 -2 -1 0 1 2 3
0.0
0.2
0.4
0.6
0.8
1.0
b. Cumulative Normal Curve Areas
Normal density and Cumulative Distribution Functions
Edwards 2000
![Page 49: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/49.jpg)
Quantiles of Standard Normal
Y
-2 -1 0 1 2
78
910
1112
Normal Plot of 30 Observations from a Normal Distribution
Edwards 2000
![Page 50: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/50.jpg)
Quantiles of Standard Normal
Y
-2 -1 0 1 2
01
23
4
a. a right skewed distribution
Quantiles of Standard Normal
Y
-2 -1 0 1 2
0.0
1.0
2.0
3.0
b. a left-truncated Normal distribution
Quantiles of Standard Normal
Y
-2 -1 0 1 2
-50
5
c. a heavy-tailed distribution
Quantiles of Standard Normal
Y
-2 -1 0 1 2
24
68
10
12
14
16
d. a contaminated Normal distribution
Normal Plots from Non-normally Distributed Data
Edwards 2000
![Page 51: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/51.jpg)
Grubbs’ test for outlier detection in a univariate data set:
Tn = (Yn – Ybar)/S
where Yn is the possible outlier,
Ybar is the mean of the sample, and
S is the standard deviation of the sample
Contamination exists if Tn is greater than T.01[n]
![Page 52: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/52.jpg)
Example of Grubbs’ test for outliers: rainfall in acre-feet
from seeded clouds (Simpson et al. 1975)
4.1 7.7 17.5 31.432.740.6 92.4 115.3 118.3119.0 129.6 198.6
200.7 242.5 255.0274.7 274.7 302.8334.1 430.0 489.1703.4 978.0 1656.0 1697.82745.6T26 = 3.539 > 3.029; Contaminated
Edwards 2000But Grubbs’ test is sensitive to non-normality……
![Page 53: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/53.jpg)
0 500 1500 2500
05
10
15
20
rainfall
a. rainfall
Quantiles of Standard Normal
rain
fall
-2 -1 0 1 2
0500
1500
2500
b. Normal plot
0.5 1.5 2.5 3.5
02
46
810
log10(rainfall)
c. log10(rainfall)
Quantiles of Standard Normal
log10(r
ain
fall)
-2 -1 0 1 2
0.5
1.5
2.5
3.5
d. Normal plot
Edwards 2000
Checking Assumptions on Rainfall Data
Contaminated
Normallydistributed
![Page 54: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/54.jpg)
Simple Linear Regression:check for model-based……
• Outliers• Influential (leverage) points
![Page 55: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/55.jpg)
Influential points in simple linear regression:
• A “leverage” point is a point with an unusual regressor value that has more weight in determining regression coefficients than the other data values.
• An “outlier” is an observation with a response value that does not fit the X-Y pattern found in the rest of the data.
Edmonds 2000
![Page 56: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/56.jpg)
x
y
0 10 20 30 40 50
50
100
150
outlier andleverage pt
leverage pt
outlier
Edwards 2000
Influential Data Points in a Simple Linear Regression
![Page 57: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/57.jpg)
x
y
0 10 20 30 40 50
50
100
150
outlier andleverage pt
leverage pt
outlier
Edwards 2000
Influential Data Points in a Simple Linear Regression
![Page 58: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/58.jpg)
x
y
0 10 20 30 40 50
50
100
150
outlier andleverage pt
leverage pt
outlier
Edwards 2000
Influential Data Points in a Simple Linear Regression
![Page 59: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/59.jpg)
x
y
0 10 20 30 40 50
50
100
150
outlier andleverage pt
leverage pt
outlier
Edwards 2000
Influential Data Points in a Simple Linear Regression
![Page 60: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/60.jpg)
body weight (kg)
bra
in w
eig
ht (g
)
0 2000 4000 6000
01000
2000
3000
4000
5000
African Elephant
Asian Elephant
Man
Edwards 2000
Brain weight vs. body weight, 63 species of terrestrial
mammals Leverage pts.
“Outliers”
![Page 61: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/61.jpg)
log10 body weight (kg)
log10 b
rain
weig
ht (g
)
-2 -1 0 1 2 3 4
-10
12
3
Elephants
Man
Chinchilla
mispunched point
Edwards 2000
Logged brain weight vs. logged body weight
“Outliers”
![Page 62: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/62.jpg)
Outliers in simple linear regression
0
50
100
150
200
250
0 50 100 150 200 250 300 350 400 450
Beta-Glucosidase
Ph
os
ph
ata
se
Observation 62
![Page 63: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/63.jpg)
Outliers: identify using studentized residuals
Contamination may exist if
|ri| > t /2, n-3
= 0.01
![Page 64: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/64.jpg)
Simple linear regression:Outlier identification
n = 86
t/2,83 = 1.98
![Page 65: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/65.jpg)
Simple linear regression:detecting leverage points
hi = (1/n) + (xi – x)2/(n-1)Sx2
A point is a leverage point if hi > 4/n, where n is the number of points used to fit the regression
![Page 66: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/66.jpg)
Regression with leverage point:Soil nitrate vs. soil moisture
![Page 67: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/67.jpg)
Regression without leverage point
Observation 46
![Page 68: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/68.jpg)
Output from SAS:Leverage points
n = 336
hi cutoff value: 4/336=0.012
![Page 69: Quality Assurance & Quality Control. References Primary References –Michener and Brunt (2000) Ecological Data: Design, Management and Processing. Blackwell](https://reader034.vdocuments.us/reader034/viewer/2022051517/5697c0241a28abf838cd44c9/html5/thumbnails/69.jpg)
Questions?