TRANSCRIPT
Statistical analysis
Why?? (besides making your life difficult …)
Scientists must collect data AND analyze it
Does your data support your hypothesis? Is it valid?
Statistics helps us find relationships between sets of data.
You are the scientist now, so you must be comfortable analyzing your own data
Let’s look at two sets of data
Sample 1: -10, 0, 10, 20, 30
Sample 2: 8, 9, 10, 11, 12
What can you tell me about this data???
Mean: the “average” of the data or the central tendency
Sample 1
-10, 0, 10, 20, 30
(-10 + 0 + 10 + 20 + 30) / 5
Mean = 10
Sample 2
8, 9, 10, 11, 12
(8 + 9 + 10 + 11 + 12) / 5
Mean = 10
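The same arithmetic can be checked in a few lines of Python. This is only a sketch; the variable names `sample_1` and `sample_2` are labels chosen here, not part of the slides.

```python
from statistics import mean

sample_1 = [-10, 0, 10, 20, 30]
sample_2 = [8, 9, 10, 11, 12]

# Mean = sum of the values divided by the number of values.
mean_1 = sum(sample_1) / len(sample_1)
mean_2 = sum(sample_2) / len(sample_2)

# The standard library's mean() does the same calculation.
assert mean_1 == mean(sample_1)
assert mean_2 == mean(sample_2)

print(mean_1, mean_2)  # 10.0 10.0
```

Both samples share the same mean even though they look very different, which is why the mean alone is not a complete analysis.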
Is this analysis complete??? NO!
Range: how far is the spread? Largest # - smallest #
Sample 1
-10, 0, 10, 20, 30
30 - (-10)
Range = 40
Sample 2
8, 9, 10, 11, 12
12 - 8
Range = 4
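As a quick check, the range can be computed with a small helper. `data_range` is a hypothetical name used only for this sketch.

```python
def data_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)

sample_1 = [-10, 0, 10, 20, 30]
sample_2 = [8, 9, 10, 11, 12]

print(data_range(sample_1))  # 40
print(data_range(sample_2))  # 4
```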
Does this data help?
Yes, Sample 1 is more dispersed
Obvious? Perhaps, but now shown mathematically
Still not enough
Something more … standard deviation
SD is a measure to show how individual data points are dispersed around the mean
Assuming normal data distribution (bell curve)
About 68% of all collected values lie within +/- 1 SD of the mean
About 95% of all collected values lie within +/- 2 SD of the mean
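The 68% and 95% figures come from the shape of the normal curve itself. A minimal sketch, assuming a perfectly normal distribution and Python 3.8 or later (for `statistics.NormalDist`); `share_within` is a name made up for this example.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, SD 1.
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Fraction of a normal distribution lying within +/- k SD of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

print(round(share_within(1), 3))  # 0.683
print(round(share_within(2), 3))  # 0.954
```

Real lab data are rarely perfectly normal, so treat these percentages as approximations.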
So what???
Standard deviation
A small SD indicates the data values are clustered around the mean
May also indicate few extreme data points
A large SD indicates the data values are spread out
May also indicate extreme data points
Outliers??
Standard deviation
SD = \sqrt{ \frac{\sum (x - \bar{x})^2}{n - 1} }
x = each data point
\bar{x} = the mean
n = the total number of data points
Σ = the sum of all the values
Let’s practice … Sample 1: -10, 0, 10, 20, 30
Remember the mean = 10
(-10 - 10)^2 + (0 - 10)^2 + (10 - 10)^2 + (20 - 10)^2 + (30 - 10)^2
= (-20)^2 + (-10)^2 + (0)^2 + (10)^2 + (20)^2
= 400 + 100 + 0 + 100 + 400 = 1000
Divide by n - 1 (5 - 1 = 4): 1000 / 4 = 250
Now take the square root: √250 ≈ 15.8
Let’s practice … Sample 2: 8, 9, 10, 11, 12
Remember the mean = 10
(8 - 10)^2 + (9 - 10)^2 + (10 - 10)^2 + (11 - 10)^2 + (12 - 10)^2
= (-2)^2 + (-1)^2 + (0)^2 + (1)^2 + (2)^2
= 4 + 1 + 0 + 1 + 4 = 10
Divide by n - 1 (5 - 1 = 4): 10 / 4 = 2.5
Now take the square root: √2.5 ≈ 1.58
Let’s compare …
Sample 1: SD = 15.8
Sample 2: SD = 1.58
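The hand calculation above follows the same steps every time, so it can be sketched as a short function. `sample_sd` is a hypothetical name; it divides by n - 1 as in the worked examples, and the standard library's `stdev` is used only as a cross-check.

```python
from math import sqrt
from statistics import stdev

def sample_sd(values):
    """Sample standard deviation: sqrt of summed squared deviations over n - 1."""
    n = len(values)
    m = sum(values) / n                              # step 1: the mean
    squared_deviations = [(x - m) ** 2 for x in values]  # step 2: (x - mean)^2
    return sqrt(sum(squared_deviations) / (n - 1))   # steps 3-4: divide, then root

sample_1 = [-10, 0, 10, 20, 30]
sample_2 = [8, 9, 10, 11, 12]

print(round(sample_sd(sample_1), 2))  # 15.81
print(round(sample_sd(sample_2), 2))  # 1.58
```

Sample 1 has an SD ten times larger than Sample 2, even though the two means are identical.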
How can I use this in my lab?
Error bars
Error bars represent the variability of your data
STANDARD DEVIATION
range
measurement uncertainties
Error bars
On a bar graph, the bar represents the mean of your data and the error bars represent +/- 1 SD
[Figure: bar graph with the mean and SD labeled]
Error bars
On a line graph, the point represents the mean of your data and the error bars represent +/- 1 SD
[Figure: line graph with the mean and SD labeled]
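To see where the ends of an error bar fall, the mean and SD can be turned into a lower and an upper limit. This is a sketch only; `error_bar_limits` is a made-up helper, and it uses Sample 1 from the earlier slides.

```python
from statistics import mean, stdev

def error_bar_limits(values):
    """Return (mean - 1 SD, mean + 1 SD), the two ends of the error bar."""
    m = mean(values)
    sd = stdev(values)
    return m - sd, m + sd

low, high = error_bar_limits([-10, 0, 10, 20, 30])
print(round(low, 1), round(high, 1))  # -5.8 25.8
```

Most graphing tools only need the SD itself. For example, matplotlib's `bar` and `errorbar` functions accept it through their `yerr` argument and draw the limits for you.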
t-test
The t-test determines statistical significance between 2 sample means
Is the difference significant?
Is the difference due to your variable?? Or is it random chance??
How valid is your data?
The t-test determines the probability that the difference is due to random chance
A p value (probability) of 0.05 (5%) shows a 5% chance of randomness, but 95% confidence …
Key word!!!!!
You want 95% or higher! Then your difference is very likely DUE TO YOUR VARIABLE
t-test
For tests, you do NOT need to calculate t-values, but you must be able to read a t-chart!!
For internal assessments, you may use calculators or Excel to calculate t-values
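For anyone curious what the calculator or Excel is doing, here is a minimal sketch of one common version of the t-value, the pooled (equal-variance) two-sample t-test. The slides do not say which version is used in class, so treat the formula choice and the name `pooled_t` as assumptions.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(sample_a, sample_b):
    """t-value for two independent samples, assuming equal variances."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Pooled variance: each sample variance weighted by its n - 1.
    pooled_var = ((n_a - 1) * variance(sample_a)
                  + (n_b - 1) * variance(sample_b)) / (n_a + n_b - 2)
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / standard_error

t = pooled_t([-10, 0, 10, 20, 30], [8, 9, 10, 11, 12])
print(t)  # 0.0, because both samples have the same mean
```

A t-value of 0 means there is no difference between the two means at all, so there is nothing for the t-chart to test.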
This is the range you are hoping for
The difference between your samples has a HIGH probability of being due to your variable (and not chance)
Need to be able to calculate degrees of freedom
Calculating degrees of freedom
df = (n1 + n2) - 2
n1 = size of sample 1
n2 = size of sample 2
2 = # of samples
Calculating degrees of freedom
df = (n1 + n2) - 2
Population 1: -10, 0, 10, 20, 30
n1 = 5
Population 2: 8, 9, 10, 11, 12
n2 = 5
df = (5 + 5) - 2
df = 8
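The df formula translates directly into code. `degrees_of_freedom` is a hypothetical helper name used for this sketch.

```python
def degrees_of_freedom(sample_a, sample_b):
    """df = (n1 + n2) - 2 for a t-test comparing two samples."""
    return len(sample_a) + len(sample_b) - 2

population_1 = [-10, 0, 10, 20, 30]
population_2 = [8, 9, 10, 11, 12]

print(degrees_of_freedom(population_1, population_2))  # 8
```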
Using the t-table
If df = 8 and t = 3.5, is this a significant difference?
Less than 1% probability that the difference in data is due to chance
Therefore, greater than 99% probability that the difference in data is due to our variable
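Reading the t-table is a lookup: find the row for your df, then see which critical values your t-value exceeds. The sketch below assumes a two-tailed table and hard-codes only the df = 8 row (critical values 2.306 at p = 0.05 and 3.355 at p = 0.01). The names `CRITICAL_T_DF8` and `smallest_p_level_passed` are made up for this example.

```python
# Two-tailed critical t-values for df = 8, copied from a standard t-table.
# Assumption: the class t-chart is a two-tailed table.
CRITICAL_T_DF8 = {0.05: 2.306, 0.01: 3.355}

def smallest_p_level_passed(t_value, critical_values):
    """Return the smallest p level whose critical value |t| exceeds, else None."""
    passed = [p for p, critical in critical_values.items()
              if abs(t_value) > critical]
    return min(passed) if passed else None

print(smallest_p_level_passed(3.5, CRITICAL_T_DF8))  # 0.01
```

Because 3.5 is larger than 3.355, the result sits beyond the p = 0.01 column, which matches the "less than 1%" reading above.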
Other options, less commonly used in our class
Median
The middle #, when arranged in numeric order
Sample 1: -10, 0, 10, 20, 30 Median = 10
Sample 2: 8, 9, 10, 11, 12 Median = 10
Mode
The # that occurs most often
Sample 1: -10, 0, 10, 20, 30 No mode
Sample 2: 8, 9, 10, 11, 12 No mode
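Median and mode can be checked the same way. In this sketch the helper `modes` (a hypothetical name) returns an empty list when no value repeats, which corresponds to "No mode" on the slide.

```python
from collections import Counter
from statistics import median

def modes(values):
    """Return the most frequent value(s), or [] when no value repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs once: no mode
    return [value for value, count in counts.items() if count == top]

sample_1 = [-10, 0, 10, 20, 30]
sample_2 = [8, 9, 10, 11, 12]

print(median(sample_1), modes(sample_1))  # 10 []
print(median(sample_2), modes(sample_2))  # 10 []
```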
Some practice: looking at plant height
Height in sun (cm)    Height in shade (cm)
124                   131
120                   60
153                   131
98                    160
124                   212
141                   117
156                   131
128                   95
139                   145
117                   118
Calculate the mean for both samples
Sun = 130 cm
Shade = 130 cm
Calculate the range for both samples
Sun = 58 cm
Shade = 152 cm
Calculate the median for both samples
Sun = 126 cm
Shade = 131 cm
If even # of samples, find the average of the two middle numbers
Calculate the mode for both samples
Sun = 124 cm
Shade = 131 cm
Calculate the sd for both samples
Sun = 17.56 cm
Shade = 39.85 cm
What does this mean?
Sun: sd = 17.56 cm
Low sd indicates even (close) distribution of data points
More valid
Shade: sd = 39.85 cm
High sd indicates wide spread of data points
MAY indicate a problem with your experimental design
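All of the plant-height answers above can be reproduced in one pass. This is a sketch using Python's `statistics` module; `summarize` is a hypothetical helper, and the lists are the sun and shade columns from the practice table.

```python
from statistics import mean, median, mode, stdev

sun = [124, 120, 153, 98, 124, 141, 156, 128, 139, 117]
shade = [131, 60, 131, 160, 212, 117, 131, 95, 145, 118]

def summarize(heights):
    """Return the summary statistics used in the practice slides (in cm)."""
    return {
        "mean": mean(heights),
        "range": max(heights) - min(heights),
        "median": median(heights),
        "mode": mode(heights),
        "sd": round(stdev(heights), 2),
    }

sun_stats = summarize(sun)      # mean 130, range 58, median 126, mode 124, sd 17.56
shade_stats = summarize(shade)  # mean 130, range 152, median 131, mode 131, sd 39.85
print(sun_stats)
print(shade_stats)
```

The means are identical, so only the range and SD reveal how much more scattered the shade plants are.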
If t = 1.5, is this a significant difference?
No (df = (10 + 10) - 2 = 18, and t = 1.5 is below the critical value for p = 0.05)
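The reasoning behind that "No" can be sketched in code. The critical value 2.101 is the two-tailed p = 0.05 entry for df = 18 in a standard t-table. The choice of a two-tailed table and the variable names are assumptions of this example, and t = 1.5 is the hypothetical value from the slide, not one calculated from the plant data.

```python
sun = [124, 120, 153, 98, 124, 141, 156, 128, 139, 117]
shade = [131, 60, 131, 160, 212, 117, 131, 95, 145, 118]

# Degrees of freedom for two samples: (n1 + n2) - 2.
df = len(sun) + len(shade) - 2

# Two-tailed critical t-value for df = 18 at p = 0.05 (standard t-table).
CRITICAL_T_P05_DF18 = 2.101

t_value = 1.5  # the hypothetical t-value given on the slide
significant = abs(t_value) > CRITICAL_T_P05_DF18

print(df, significant)  # 18 False
```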
Be careful: correlation vs. cause
Observations (and carefully chosen data) may imply a CORRELATION, but do NOT necessarily demonstrate a cause
The average global temperature has increased over the past 100 years.
The number of pirates in the world has decreased over the past 100 years.
Therefore, decreased number of pirates causes increased global temperatures
NO!
Be careful: correlation vs. cause
To discern a CAUSE, a valid EXPERIMENT must be done
Other scientists must also be able to repeat your experiment
Last word …
Remember, it is ALWAYS better to PROVE your experiment failed to support your hypothesis than to lie about it being a success!!!
Any questions?