describing & examining scientific data science methods & practice bes 301 november 4 and 9,...
TRANSCRIPT
![Page 1: Describing & Examining Scientific Data Science Methods & Practice BES 301 November 4 and 9, 2009](https://reader036.vdocuments.us/reader036/viewer/2022062519/5697c00e1a28abf838cca06d/html5/thumbnails/1.jpg)
Describing & Examining Scientific
Data
Science Methods & Practice BES 301
November 4 and 9, 2009
![Page 2: Describing & Examining Scientific Data Science Methods & Practice BES 301 November 4 and 9, 2009](https://reader036.vdocuments.us/reader036/viewer/2022062519/5697c00e1a28abf838cca06d/html5/thumbnails/2.jpg)
Describing Scientific Data
32.6 cm 23.2
23.2 31.6
14.1 35.6
35.2 26.2
36.8 36.7
45.1 32.4
33.5 42.6
33.9 27.8
16.6 42.8
38.2 47.6
Length of coho salmon returning to North Creek October 8, 2003 *
* pretend data
These data need to be included in a report to Pacific Salmon Commission on salmon return in streams of the Lake Washington watershed
So, what now?
Do we just leave these data as they appear here?
![Page 3: Describing & Examining Scientific Data Science Methods & Practice BES 301 November 4 and 9, 2009](https://reader036.vdocuments.us/reader036/viewer/2022062519/5697c00e1a28abf838cca06d/html5/thumbnails/3.jpg)
Describing Scientific Data
Length of coho salmon returning to North Creek
October 8, 2003
We can create a
![Page 4: Describing & Examining Scientific Data Science Methods & Practice BES 301 November 4 and 9, 2009](https://reader036.vdocuments.us/reader036/viewer/2022062519/5697c00e1a28abf838cca06d/html5/thumbnails/4.jpg)
Describing Scientific Data
Length of coho salmon returning
to North Creek October 8, 2003
0
5
10
15
20
25
30
35
40
45
50
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Fish
Len
gth
(cm
)
A common mistake
This is NOT a frequency distribution
![Page 5: Describing & Examining Scientific Data Science Methods & Practice BES 301 November 4 and 9, 2009](https://reader036.vdocuments.us/reader036/viewer/2022062519/5697c00e1a28abf838cca06d/html5/thumbnails/5.jpg)
Describing Scientific Data
Length of coho salmon returning to North Creek October 8, 2003 *
* pretend data
These data need to be included in a report to Pacific Salmon Commission on salmon return in streams of the Lake Washington watershed
Next steps? What information does a frequency distribution reveal?
Frequency Distribution of Coho Lengths
0
2
4
6
10 20 30 40 50
Length Class (cm)
Fre
qu
ency
This is also known as a “Histogram”
![Page 6: Describing & Examining Scientific Data Science Methods & Practice BES 301 November 4 and 9, 2009](https://reader036.vdocuments.us/reader036/viewer/2022062519/5697c00e1a28abf838cca06d/html5/thumbnails/6.jpg)
Frequency Distribution of Coho Lengths
0
2
4
6
10 20 30 40 50
Length Class (cm)
Fre
qu
en
cy
Describing Scientific Data
Length of coho salmon returning to North Creek
October 8, 2003
![Page 7: Describing & Examining Scientific Data Science Methods & Practice BES 301 November 4 and 9, 2009](https://reader036.vdocuments.us/reader036/viewer/2022062519/5697c00e1a28abf838cca06d/html5/thumbnails/7.jpg)
Describing Scientific Data
Mean values can hide important information
Valiela (2001)
Populations with
• the same mean value but
• different frequency distributions
Non-normal distribution
Normal distribution of data
![Page 8: Describing & Examining Scientific Data Science Methods & Practice BES 301 November 4 and 9, 2009](https://reader036.vdocuments.us/reader036/viewer/2022062519/5697c00e1a28abf838cca06d/html5/thumbnails/8.jpg)
Describing Scientific Data
Normal distributions are desirable and required for many statistical methods.
Even simple comparisons of mean values is NOT good if distributions underlying those means are not “normal”.
Thus, data with non-normal distributions are usually mathematically transformed to create a normal distribution.
Problems with non-normal distributions
![Page 9: Describing & Examining Scientific Data Science Methods & Practice BES 301 November 4 and 9, 2009](https://reader036.vdocuments.us/reader036/viewer/2022062519/5697c00e1a28abf838cca06d/html5/thumbnails/9.jpg)
Describing Scientific Data
Mean values can hide important information
Populations with
• the same mean value but
• different frequency distributions
![Page 10: Describing & Examining Scientific Data Science Methods & Practice BES 301 November 4 and 9, 2009](https://reader036.vdocuments.us/reader036/viewer/2022062519/5697c00e1a28abf838cca06d/html5/thumbnails/10.jpg)
Describing Scientific Data
Expressing Variation in a Set of Numbers
32.6 cm 23.2
23.2 31.6
14.1 35.6
35.2 26.2
36.8 36.7
45.1 32.4
33.5 42.6
33.9 27.8
16.6 42.8
38.2 47.6
Range: difference between largest and smallest sample
Range = 33.5
Range = 14.1 – 47.6
(or sometimes expressed as both smallest and largest values)
![Page 11: Describing & Examining Scientific Data Science Methods & Practice BES 301 November 4 and 9, 2009](https://reader036.vdocuments.us/reader036/viewer/2022062519/5697c00e1a28abf838cca06d/html5/thumbnails/11.jpg)
Describing Scientific Data
Expressing Variation in a Set of Numbers
32.6 cm 23.2
23.2 31.6
14.1 35.6
35.2 26.2
36.8 36.7
45.1 32.4
33.5 42.6
33.9 27.8
16.6 42.8
38.2 47.6
Variance & Standard Deviation: single-number expressions of the degree of spread in the data
Variance
x – x
For each value, calculate its deviation from the mean
![Page 12: Describing & Examining Scientific Data Science Methods & Practice BES 301 November 4 and 9, 2009](https://reader036.vdocuments.us/reader036/viewer/2022062519/5697c00e1a28abf838cca06d/html5/thumbnails/12.jpg)
Describing Scientific Data
Expressing Variation in a Set of Numbers
32.6 cm 23.2
23.2 31.6
14.1 35.6
35.2 26.2
36.8 36.7
45.1 32.4
33.5 42.6
33.9 27.8
16.6 42.8
38.2 47.6
Variance & Standard Deviation: single-number expressions of the degree of spread in the data
Variance
( x – x )2
Square each deviation to get absolute value of each
deviation
![Page 13: Describing & Examining Scientific Data Science Methods & Practice BES 301 November 4 and 9, 2009](https://reader036.vdocuments.us/reader036/viewer/2022062519/5697c00e1a28abf838cca06d/html5/thumbnails/13.jpg)
Describing Scientific Data
Expressing Variation in a Set of Numbers
32.6 cm 23.2
23.2 31.6
14.1 35.6
35.2 26.2
36.8 36.7
45.1 32.4
33.5 42.6
33.9 27.8
16.6 42.8
38.2 47.6
Variance & Standard Deviation: single-number expressions of the degree of spread in the data
Variance
( x – x )2
n - 1
Σ
Add up all the deviations and divide by the number of values (to get average
deviation from the mean)
![Page 14: Describing & Examining Scientific Data Science Methods & Practice BES 301 November 4 and 9, 2009](https://reader036.vdocuments.us/reader036/viewer/2022062519/5697c00e1a28abf838cca06d/html5/thumbnails/14.jpg)
Describing Scientific Data
Expressing Variation in a Set of Numbers
Variance : An expression of the mean amount of deviation of the sample points from the mean value
Example of 2 data points: 20.0 & 28.6
1. Calculate the mean: (20.0 + 28.6)/2 = 24.3
2. Calculate the deviations from the mean: 20.0 – 24.3 = - 4.3 28.6 – 24.3 = 4.3
3. Square the deviations to remove the signs (negative): (-4.3)2 = 18.5(4.3)2 = 18.5
4. Sum the squared deviations: 18.5 + 18.5 = 37.0
5. Divide the sum of squares by the number of samples (minus one*) to standardize the variation per sample: 37.0 / (2-1) = 37.0
* Differs from textbook
Variance = 37.0
![Page 15: Describing & Examining Scientific Data Science Methods & Practice BES 301 November 4 and 9, 2009](https://reader036.vdocuments.us/reader036/viewer/2022062519/5697c00e1a28abf838cca06d/html5/thumbnails/15.jpg)
Describing Scientific Data
Expressing Variation in a Set of Numbers
Standard Deviation (SD): Also an expression of the mean amount of deviation of the sample points from the mean value
Using the square root of the variance places the expression of variation back into the same units as the original measured values
(37.0)0.5 = 6.1
SD = 6.1
SD = √variance
![Page 16: Describing & Examining Scientific Data Science Methods & Practice BES 301 November 4 and 9, 2009](https://reader036.vdocuments.us/reader036/viewer/2022062519/5697c00e1a28abf838cca06d/html5/thumbnails/16.jpg)
Frequency Distribution of Coho Lengths
0
2
4
6
10 20 30 40 50
Length Class (cm)
Fre
qu
en
cy
Describing Scientific Data
Length of coho salmon returning to North Creek October 8, 2003
SD captures the degree of spread in the data
32.8 ± 8.9Mean Standard
Deviation
Mean
Conventional expression
Mean ± 1 SD
![Page 17: Describing & Examining Scientific Data Science Methods & Practice BES 301 November 4 and 9, 2009](https://reader036.vdocuments.us/reader036/viewer/2022062519/5697c00e1a28abf838cca06d/html5/thumbnails/17.jpg)
CV = (variance / x) * 100
Describing Scientific Data
Expressing Variation in a Set of Numbers
Coefficient of Variation (CV): An expression of variation present relative to the size of the mean value
Particularly useful when comparing the variation in means that differ considerably in magnitude or comparing the degree of variation of measurements with different units
Sometimes presented as SD / x
![Page 18: Describing & Examining Scientific Data Science Methods & Practice BES 301 November 4 and 9, 2009](https://reader036.vdocuments.us/reader036/viewer/2022062519/5697c00e1a28abf838cca06d/html5/thumbnails/18.jpg)
An example of how the CV is useful
Describing Scientific Data
Coefficient of Variation (CV): An expression of variation present relative to the mean value
Soil moisture in two sites
Site Mean SD
1 4.8 2.05
2 14.9 3.6
CV
![Page 19: Describing & Examining Scientific Data Science Methods & Practice BES 301 November 4 and 9, 2009](https://reader036.vdocuments.us/reader036/viewer/2022062519/5697c00e1a28abf838cca06d/html5/thumbnails/19.jpg)
Describing Scientific DataThe Bottom Line on expressing variation
Expression
What it is When to use it
Range Spread of dataWhen extreme absolute values are of importance
Variance Average deviation of samples from the mean
Often an intermediate calculation – not usually presented
Standard Deviation
Average deviation of samples from the mean on scale of original values
Simple & standard presentation of variation; okay when mean values for comparison are similar in magnitude
Coefficient of Variation
Average deviation of samples from the mean on scale of original values relative to size of the mean
Use to compare amount of variation among samples whose mean values differ in magnitude
Bottom Line: use the most appropriate expression; not just what everyone else uses!
![Page 20: Describing & Examining Scientific Data Science Methods & Practice BES 301 November 4 and 9, 2009](https://reader036.vdocuments.us/reader036/viewer/2022062519/5697c00e1a28abf838cca06d/html5/thumbnails/20.jpg)
Does variation only obscure the true values we seek?
![Page 21: Describing & Examining Scientific Data Science Methods & Practice BES 301 November 4 and 9, 2009](https://reader036.vdocuments.us/reader036/viewer/2022062519/5697c00e1a28abf838cca06d/html5/thumbnails/21.jpg)
The importance of knowing the degree of variation
Why are measures of variation important?
![Page 22: Describing & Examining Scientific Data Science Methods & Practice BES 301 November 4 and 9, 2009](https://reader036.vdocuments.us/reader036/viewer/2022062519/5697c00e1a28abf838cca06d/html5/thumbnails/22.jpg)
The importance of knowing the degree of variation
Spawning site density (# / meter)
North Creek 0.40 ± 5.2 a
Bear Creek 0.42 ± 0.2 a
Density of sockeye salmon spawning sites along two area creeks (mean ± SD)
What can you conclude from the means?
What can you conclude from the SDs?
![Page 23: Describing & Examining Scientific Data Science Methods & Practice BES 301 November 4 and 9, 2009](https://reader036.vdocuments.us/reader036/viewer/2022062519/5697c00e1a28abf838cca06d/html5/thumbnails/23.jpg)
The importance of knowing the degree of variation
Annual temperature (°F)
Darrington (western WA)
49.0 ± 11.2 a
Leavenworth (eastern WA)
48.4 ± 24.7 a
Mean annual air temperature at sites in eastern and western WA at 1,000 feet elevation (mean ± SD)
What can you conclude from the means?
What can you conclude from the SDs?
![Page 24: Describing & Examining Scientific Data Science Methods & Practice BES 301 November 4 and 9, 2009](https://reader036.vdocuments.us/reader036/viewer/2022062519/5697c00e1a28abf838cca06d/html5/thumbnails/24.jpg)
Look before you leap:
The value of examining raw data
![Page 25: Describing & Examining Scientific Data Science Methods & Practice BES 301 November 4 and 9, 2009](https://reader036.vdocuments.us/reader036/viewer/2022062519/5697c00e1a28abf838cca06d/html5/thumbnails/25.jpg)
The value of examining raw data
Types of variables?
Wetland# Plant Species
Mean WLF (cm)
1 11 2
2 10 5
3 11 6
4 2 19
5 2 12
6 5 7
7 10 12
8 10 13
9 10 46
10 2 50
How might we describe these data?
Example Problem: Do fluctuating water levels affect the ecology of urban wetlands?
![Page 26: Describing & Examining Scientific Data Science Methods & Practice BES 301 November 4 and 9, 2009](https://reader036.vdocuments.us/reader036/viewer/2022062519/5697c00e1a28abf838cca06d/html5/thumbnails/26.jpg)
Puget Sound Wetlands
(Cooke & Azous 2001)
Water Level Fluctuation
# P
lan
t sp
ecie
s
Scatter plots help you to assess the
nature of a
relationshipbetween variables
The value of examining raw data
Example Problem: Do fluctuating water levels affect the ecology of urban wetlands?
![Page 27: Describing & Examining Scientific Data Science Methods & Practice BES 301 November 4 and 9, 2009](https://reader036.vdocuments.us/reader036/viewer/2022062519/5697c00e1a28abf838cca06d/html5/thumbnails/27.jpg)
Puget Sound Wetlands
Water Level Fluctuation
# P
lan
t sp
ecie
s
What might you do to describe this
relationship?
The value of examining raw data
Example Problem: Do fluctuating water levels affect the ecology of urban wetlands?
![Page 28: Describing & Examining Scientific Data Science Methods & Practice BES 301 November 4 and 9, 2009](https://reader036.vdocuments.us/reader036/viewer/2022062519/5697c00e1a28abf838cca06d/html5/thumbnails/28.jpg)
Puget Sound Wetlands
Water Level Fluctuation
# P
lan
t sp
ecie
s
So – does this capture the nature
of this relationship?
Group work time to devise strategy for
expressing relationship
The value of examining raw data
Example Problem: Do fluctuating water levels affect the ecology of urban wetlands?
![Page 29: Describing & Examining Scientific Data Science Methods & Practice BES 301 November 4 and 9, 2009](https://reader036.vdocuments.us/reader036/viewer/2022062519/5697c00e1a28abf838cca06d/html5/thumbnails/29.jpg)
Puget Sound Wetlands
Water Level Fluctuation
# P
lan
t sp
ecie
sDescribing Scientific Data
![Page 30: Describing & Examining Scientific Data Science Methods & Practice BES 301 November 4 and 9, 2009](https://reader036.vdocuments.us/reader036/viewer/2022062519/5697c00e1a28abf838cca06d/html5/thumbnails/30.jpg)
BOTTOM LINEPuget Sound Wetlands
Water Level Fluctuation
# P
lan
t sp
ecie
sDescribing Scientific Data
![Page 31: Describing & Examining Scientific Data Science Methods & Practice BES 301 November 4 and 9, 2009](https://reader036.vdocuments.us/reader036/viewer/2022062519/5697c00e1a28abf838cca06d/html5/thumbnails/31.jpg)
Sampling the World
![Page 32: Describing & Examining Scientific Data Science Methods & Practice BES 301 November 4 and 9, 2009](https://reader036.vdocuments.us/reader036/viewer/2022062519/5697c00e1a28abf838cca06d/html5/thumbnails/32.jpg)
Rarely can we measure everything –
Instead we usually SAMPLE the world
The “entire world” is called the
“population”
What we actually
measure is called the “sample”
![Page 33: Describing & Examining Scientific Data Science Methods & Practice BES 301 November 4 and 9, 2009](https://reader036.vdocuments.us/reader036/viewer/2022062519/5697c00e1a28abf838cca06d/html5/thumbnails/33.jpg)
QUESTION
Does Douglas-fir grow taller on the east slopes of the Cascades or on the west slopes of the Cascades?
Populations & Samples
How would you study this?
![Page 34: Describing & Examining Scientific Data Science Methods & Practice BES 301 November 4 and 9, 2009](https://reader036.vdocuments.us/reader036/viewer/2022062519/5697c00e1a28abf838cca06d/html5/thumbnails/34.jpg)
Does Douglas-fir grow taller on the east slopes of the Cascades or on the west slopes of the Cascades?
Populations & Samples
1.
![Page 35: Describing & Examining Scientific Data Science Methods & Practice BES 301 November 4 and 9, 2009](https://reader036.vdocuments.us/reader036/viewer/2022062519/5697c00e1a28abf838cca06d/html5/thumbnails/35.jpg)
Does Douglas-fir grow taller on the east slopes of the Cascades or on the west slopes of the Cascades?
Populations & Samples
2.
![Page 36: Describing & Examining Scientific Data Science Methods & Practice BES 301 November 4 and 9, 2009](https://reader036.vdocuments.us/reader036/viewer/2022062519/5697c00e1a28abf838cca06d/html5/thumbnails/36.jpg)
Does Douglas-fir grow taller on the east slopes of the Cascades or on the west slopes of the Cascades?
Populations & Samples
3. A major research question:
How many samples do you need to accurately describe (1) each population and (2) their possible difference?
• Depends on sample variability (standard error – see textbook pages 150-152) and degree of difference between populations
• Sample size calculations before study are important (statistics course for more detail)
![Page 37: Describing & Examining Scientific Data Science Methods & Practice BES 301 November 4 and 9, 2009](https://reader036.vdocuments.us/reader036/viewer/2022062519/5697c00e1a28abf838cca06d/html5/thumbnails/37.jpg)
Does Douglas-fir grow taller on the east slopes of the Cascades or on the west slopes of the Cascades?
Populations & Samples
4. Samples are taken that are representative of the population
![Page 38: Describing & Examining Scientific Data Science Methods & Practice BES 301 November 4 and 9, 2009](https://reader036.vdocuments.us/reader036/viewer/2022062519/5697c00e1a28abf838cca06d/html5/thumbnails/38.jpg)
Samples are taken representative of the population
What criteria are used to locate the samples?
Does Douglas-fir grow taller on the east slopes of the Cascades or on the west slopes of the Cascades?
Populations & Samples
![Page 39: Describing & Examining Scientific Data Science Methods & Practice BES 301 November 4 and 9, 2009](https://reader036.vdocuments.us/reader036/viewer/2022062519/5697c00e1a28abf838cca06d/html5/thumbnails/39.jpg)
Does Douglas-fir grow taller on the east slopes of the Cascades or on the west slopes of the Cascades?
Populations & Samples
Samples are taken representative of the population
What criteria are used to locate the samples?
![Page 40: Describing & Examining Scientific Data Science Methods & Practice BES 301 November 4 and 9, 2009](https://reader036.vdocuments.us/reader036/viewer/2022062519/5697c00e1a28abf838cca06d/html5/thumbnails/40.jpg)
Samples are taken representative of the population
What criteria are used to locate the samples?
• RandomizationCommonly, but not in all studies
(statistics course)
Bottom Line: sampling scheme needs to be developed INTENTIONALLY. It must match the question
& situation, not what is “usually done”.
• StratificationAccounts for uncontrolled variables
(e.g., elevation, aspect)
Does Douglas-fir grow taller on the east slopes of the Cascades or on the west slopes of the Cascades?
Populations & Samples
![Page 41: Describing & Examining Scientific Data Science Methods & Practice BES 301 November 4 and 9, 2009](https://reader036.vdocuments.us/reader036/viewer/2022062519/5697c00e1a28abf838cca06d/html5/thumbnails/41.jpg)
Describing & Taking Data
Conventional Wisdom Best Practices
Examine the data
Compare averages Look at raw data as well as summaries
Summarize the data
Take averages Frequency distributions, Means, etc.
Describe variation
Use Standard Deviation Use measure appropriate to need
Use variation data
As an adjunct to means (testing for differences)
Use for understanding system as well as for statistical tests
Describing relationships
Fit a line to data Use approach appropriate to need; examine raw data
Creating a sampling scheme
Take random samples Use approach appropriate to need
SOME BOTTOM LINES FOR THE LAST 2 DAYS