Ex St 801 Statistical Methods
Introduction
Basic Definitions
STATISTICS: Area of science
concerned with extraction of
information from numerical data
and its use in making inferences
about a population from data that
are obtained from a sample.
Basic Definitions (cont.)
POPULATION: set representing all measurements of interest to the investigator.
PARAMETER: an unknown
population characteristic of
interest to the investigator.
Basic Definitions (cont.)
SAMPLE: subset of measurements
selected from the population of
interest.
STATISTIC: a sample characteristic
of interest to the investigator.
Some Frequently Used Statistics and Parameters
                     SAMPLE (statistic)   POPULATION (parameter)
MEAN                 ȳ                    μ
VARIANCE             s²                   σ²
STANDARD DEVIATION   s                    σ
PROPORTION           π̂                    π
Basic Definitions (cont.)
STATISTICAL INFERENCE :
making an "INFORMED GUESS" about
a parameter based on a statistic.
(This is the main objective of statistics.)
STATISTICAL INFERENCE
Diagram: data are gathered from the POPULATION to form a SAMPLE; the SAMPLE STATISTICS (ȳ, s², s, π̂, etc.) are then used to make inferences about the population PARAMETERS (μ, σ², σ, π, etc.).
More Basic Definitions
• A VARIABLE is a characteristic of an individual or object that may vary for different observations.
• A QUANTITATIVE VARIABLE is measured on a numerical scale.
• A QUALITATIVE VARIABLE classifies each observation into one of several categories.
RAISIN BRAN EXAMPLE
• A cereal company claims that the average amount of raisins in its boxes of raisin bran is two scoops.
• A random sample of five boxes was taken off the production line, and an analysis revealed an average of 1.9 scoops per box.
Components of the Problem
• Identify the population
• Identify the sample
• Identify the symbol for the parameter
• Identify the symbol for the statistic
• Is the variable quantitative or qualitative?
ASPIRIN AND HEART ATTACKS 1
• Twenty thousand doctors participated in a study to determine if taking an aspirin every other day would result in a reduction of heart attacks.
ASPIRIN AND HEART ATTACKS 2
• The physicians were randomly divided into two groups. The first group (called the treatment group) received an aspirin every other day, while the other group (called the control group) received a placebo.
ASPIRIN AND HEART ATTACKS 3
• At the end of the study, there had been 104 heart attacks in the treatment group and 189 heart attacks in the control group.
Identifying Components of the Problem
• Identify the population
• Identify the sample
• Identify the symbol for the parameter
• Identify the symbol for the statistic
• Is the variable quantitative or qualitative?
Five Steps in a Statistical Study:
1. Stating the problem
2. Gathering the data
3. Summarizing the data
4. Analyzing the data
5. Reporting the results
Stating the Problem
• Specifically identifying the population to be sampled
• Identifying the parameter(s) being studied
Stating the Problem Example
• A researcher wanted to determine if a vitamin supplement would reduce the rate of certain cancers.
• A large study was conducted in China and the results indicated that people who had the vitamin supplement had a significantly lower cancer rate.
• Do the results of this study apply to Americans? Why or why not?
Gathering the Data
• SURVEYS
–Random Sampling
–Stratified Sampling
–Cluster Sampling
–Systematic Sampling
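The four survey designs above can be sketched with Python's standard `random` module. This is a minimal illustration under stated assumptions, not a prescribed procedure: the sampling frame of 100 unit IDs, the two strata, and the ten clusters are all hypothetical, and every function name is made up for this example.

```python
import random

def simple_random_sample(frame, n, rng):
    # Every subset of n units is equally likely to be chosen.
    return rng.sample(frame, n)

def stratified_sample(strata, n_per_stratum, rng):
    # Draw a separate simple random sample within each stratum.
    sample = []
    for units in strata.values():
        sample.extend(rng.sample(units, n_per_stratum))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    # Randomly choose whole clusters, then keep every unit in the chosen clusters.
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

def systematic_sample(frame, k, rng):
    # Random start among the first k units, then every kth unit after it.
    start = rng.randrange(k)
    return frame[start::k]

rng = random.Random(801)                       # fixed seed so the sketch is reproducible
frame = list(range(1, 101))                    # hypothetical frame of 100 unit IDs
strata = {"A": frame[:50], "B": frame[50:]}    # hypothetical strata
clusters = {c: frame[10 * c:10 * c + 10] for c in range(10)}  # 10 hypothetical clusters

print(simple_random_sample(frame, 5, rng))
print(stratified_sample(strata, 3, rng))
print(cluster_sample(clusters, 2, rng))
print(systematic_sample(frame, 20, rng))
```

The fixed seed only makes the output repeatable; a real survey would draw from its actual sampling frame.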
Gathering the Data
• EXPERIMENTS
–Completely Randomized Design
–Randomized Block Design
–Factorial Design
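A completely randomized design assigns each experimental unit to a treatment purely at random. The sketch below, modeled loosely on the aspirin study, shuffles a hypothetical list of 20,000 participant IDs and deals them into two groups; the IDs, the equal group sizes, and the function name are assumptions, and the randomized block and factorial designs are not shown.

```python
import random

def completely_randomized_design(units, treatments, rng):
    """Randomly assign units to treatments with group sizes as equal as possible."""
    shuffled = list(units)
    rng.shuffle(shuffled)  # every ordering of the units is equally likely
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        # Deal the shuffled units out to the treatments in turn, like cards.
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

rng = random.Random(801)
physicians = [f"MD{i:05d}" for i in range(20000)]  # hypothetical participant IDs
groups = completely_randomized_design(physicians, ["aspirin", "placebo"], rng)
print({t: len(g) for t, g in groups.items()})  # {'aspirin': 10000, 'placebo': 10000}
```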
More Definitions
DESCRIPTIVE STATISTICS:
Organizing and describing sample
information.
(Descriptive Statistics describe how things are.)
Graphical Displays for Qualitative Data
• PIE CHART
• BAR CHART
Pie chart: Major Volcanoes in the World (percentage of major volcanoes on each continent: Africa, Antarctica, Asia, Europe, North America, South America)
Bar chart: Major Volcanoes in the World (number of major volcanoes on each continent: Africa, Antarctica, Asia, Europe, North America, South America)
Graphical Displays for Quantitative Data
• HISTOGRAM
• STEM AND LEAF DISPLAY
Histogram of Major Volcanoes in the World (x-axis: Elevation, 2,500 to 20,000; y-axis: Frequency)
Life Expectancies in 33 Developed Nations
Country          Life Expectancy    Country           Life Expectancy
Australia        76.3               Italy             75.5
Austria          75.1               Japan             79.1
Belgium          74.3               Luxembourg        74.1
Britain          75.3               Malta             74.8
Bulgaria         71.5               The Netherlands   76.5
Canada           76.5               New Zealand       74.2
Czechoslovakia   71.0               Norway            76.3
Denmark          74.9               Poland            71.0
East Germany     73.2               Portugal          74.1
West Germany     75.8               Rumania           69.9
Finland          74.8               Soviet Union      69.8
France           75.9               Spain             76.6
Greece           76.5               Sweden            77.1
Hungary          69.7               Switzerland       77.6
Iceland          77.4               United States     75.0
Ireland          73.5               Yugoslavia        71.0
Israel           75.2
Histogram of Life Expectancies in 33 Developed Nations (x-axis: Life Expectancy, 71.2 to 79.2; y-axis: Frequency)
Stem-Leaf Display for Elevation
KEY: UNIT = 1000; 1 | 2 represents 12,000
STEM | LEAF
0 | 001111
0 | 222333
0 | 444444444455555555
0 | 6666667777777
0 | 8888888999999999999
1 | 0000000000000111111
1 | 22222222333333
1 | 44555
1 | 67777
1 | 8889999
Construction of a Stem-Leaf Display
• List the stem values, in order, in a vertical column
• Draw a vertical line to the right of the stem values
• For each observation, record the leaf portion of the observation in the row corresponding to the appropriate stem
• Reorder the leaves from the lowest to highest within each stem row
Construction of a Stem-Leaf Display (cont.)
• If the number of leaves appearing in each stem is too large, divide the stems into two groups, the first corresponding to leaves 0 through 4 and the second corresponding to leaves 5 through 9. (This subdivision can be increased to five groups if necessary.)
• Provide a key to your stem and leaf coding, so the reader can reconstruct the actual measurements.
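The construction steps above can be sketched as a small Python function. Assumptions: the data are non-negative, each value is truncated (not rounded) to a whole number of leaf units, and only the two-way stem split (leaves 0 through 4, then 5 through 9) is implemented, not the five-way split. The function name `stem_and_leaf` is hypothetical.

```python
def stem_and_leaf(values, leaf_unit=1, split=False):
    """Return the rows of a stem-and-leaf display as strings such as '4 | 12344'.

    Each value is truncated to a whole number of leaf units; the last digit is
    the leaf and the remaining digits are the stem (non-negative data assumed).
    With split=True every stem gets two rows: leaves 0-4, then leaves 5-9.
    """
    digits = sorted(int(v // leaf_unit) for v in values)  # order the observations
    rows = {}
    for d in digits:
        stem, leaf = divmod(d, 10)
        half = leaf // 5 if split else 0  # which row of a split stem
        rows.setdefault((stem, half), []).append(str(leaf))
    halves = (0, 1) if split else (0,)
    first, last = min(rows), max(rows)
    lines = []
    for stem in range(first[0], last[0] + 1):  # list every stem, even empty ones
        for half in halves:
            if first <= (stem, half) <= last:
                leaves = "".join(rows.get((stem, half), []))
                lines.append(f"{stem} | {leaves}".rstrip())
    return lines

sample = [4, 18, 36, 39, 41, 42, 43, 44, 44, 45,
          46, 47, 48, 49, 49, 50, 51, 53, 54, 60]
for row in stem_and_leaf(sample, split=True):
    print(row)

# Elevations with the key "UNIT = 1000": 1 | 2 represents 12,000
africa = [1650, 5981, 7745, 9281, 10023, 11400, 12198, 13451, 19340]
print(stem_and_leaf(africa, leaf_unit=1000))  # ['0 | 1579', '1 | 01239']
```

With `split=True` and `leaf_unit=1`, the function reproduces the split-stem display of the 20-observation sample used later in this section.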
Numerical Measures for Summarizing Data
TYPES:
1. Measures of CENTRAL TENDENCY
2. Measures of VARIABILITY
3. Measures of RELATIVE LOCATION
The Arithmetic Mean
The ARITHMETIC MEAN of a set of n
measurements (y1, y2, ..., yn ) is equal to
the sum of the measurements divided by
n.
The mathematical notation for the ARITHMETIC MEAN is:
$$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$$
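The formula above translates directly into code. This is a minimal sketch with a hypothetical function name; Python's standard `statistics.mean` computes the same quantity.

```python
def arithmetic_mean(ys):
    """Sample mean: the sum of the n measurements divided by n."""
    if not ys:
        raise ValueError("need at least one measurement")
    return sum(ys) / len(ys)

print(arithmetic_mean([7, 1, 10, 8, 4, 12]))  # 7.0
```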
The Median
The MEDIAN of a set of n
measurements (y1, y2, ..., yn ) is the
value that falls in the middle position
when the measurements are ordered
from the smallest to the largest.
RULE FOR CALCULATING THE MEDIAN
1 Order the measurements from the
smallest to the largest.
2 A) If the sample size is odd, the
median is the middle
measurement.
B) If the sample size is even, the
median is the average of the two
middle measurements.
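The two-step rule above can be sketched as follows (hypothetical function name; Python's `statistics.median` applies the same rule). The usage lines apply it to the six values in the example that follows and to the first five of those values.

```python
def median(ys):
    """Median by the two-step rule: order the data, then take the middle value
    (odd n) or the average of the two middle values (even n)."""
    ordered = sorted(ys)  # step 1: smallest to largest
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one measurement")
    mid = n // 2
    if n % 2 == 1:  # step 2A: odd sample size
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # step 2B: even sample size

print(median([7, 1, 10, 8, 4, 12]))  # 7.5
print(median([7, 1, 10, 8, 4]))      # 7
```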
Example
A random sample of six values was taken from a population. These values were:
y1=7, y2=1, y3=10, y4=8, y5=4, and y6=12.
What are the sample mean and sample median for these data?
Sample Mean
$$\bar{y} = \frac{y_1 + y_2 + y_3 + y_4 + y_5 + y_6}{n} = \frac{7 + 1 + 10 + 8 + 4 + 12}{6} = \frac{42}{6} = 7$$
CALCULATIONS FOR THE SAMPLE MEDIAN
(Ordered Sample) y2=1, y5=4, y1=7, y4=8, y3=10, y6=12
MEDIAN = (7 + 8) / 2 = 7.5
Consider the following sample: 4 18 36 39 41 42 43 44 44 45 46 47 48 49 49 50 51 53 54 60
Which measure of central tendency best describes the central location of the data:
THE SAMPLE MEAN OR SAMPLE MEDIAN?
KEY: 4 | 1 represents 41
STEM | LEAF
0 | 4
0 |
1 |
1 | 8
2 |
2 |
3 |
3 | 69
4 | 12344
4 | 567899
5 | 0134
5 |
6 | 0
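Python's standard `statistics` module makes the comparison quick. For this sample the mean is 43.15 while the median is 45.5: the two small observations, 4 and 18, pull the mean below most of the data, but they do not move the median.

```python
from statistics import mean, median

sample = [4, 18, 36, 39, 41, 42, 43, 44, 44, 45,
          46, 47, 48, 49, 49, 50, 51, 53, 54, 60]

print(mean(sample))    # 43.15
print(median(sample))  # 45.5
```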
MEASURES OF VARIABILITY
• RANGE
• VARIANCE
• STANDARD DEVIATION
Deviation
The DEVIATION of an observation yi from the sample mean is equal to:
$$(y_i - \bar{y})$$
Deviations to the left of the sample mean are negative and deviations to the right of the sample mean are positive.
Also, notice that the larger the squared deviation, the farther the observation is from the mean.
Formula for the Sample Variance
$$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1} = \frac{\sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}}{n - 1}$$
Obs.    Y    (Y − ȳ)   (Y − ȳ)²
1       7     0          0
2       1    −6         36
3      10     3          9
4       8     1          1
5       4    −3          9
6      12     5         25
Total                   80

Obs.    Y    Y²
1       7     49
2       1      1
3      10    100
4       8     64
5       4     16
6      12    144
Total  42    374

ȳ = 7
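Calculation of Sample Variance
$$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1} = \frac{80}{6 - 1} = \frac{80}{5} = 16$$
$$s^2 = \frac{\sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}}{n - 1} = \frac{374 - \frac{42^2}{6}}{6 - 1} = \frac{374 - 294}{5} = 16$$
Both formulas give s² = 16, so the sample standard deviation is s = √16 = 4.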
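Both variance formulas above can be checked numerically on the six-value example. This is a sketch with hypothetical function names; Python's `statistics.variance` also uses the n − 1 divisor.

```python
import math

def sample_variance(ys):
    """Definitional formula: sum of squared deviations from the mean, over n - 1."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two measurements")
    ybar = sum(ys) / n
    return sum((y - ybar) ** 2 for y in ys) / (n - 1)

def sample_variance_shortcut(ys):
    """Computational formula: (sum of y^2 - (sum of y)^2 / n) / (n - 1)."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two measurements")
    return (sum(y * y for y in ys) - sum(ys) ** 2 / n) / (n - 1)

ys = [7, 1, 10, 8, 4, 12]
print(sample_variance(ys))             # 16.0
print(sample_variance_shortcut(ys))    # 16.0
print(math.sqrt(sample_variance(ys)))  # 4.0
```

The two functions agree here; with very large values the shortcut formula can lose precision, because it subtracts two large, nearly equal quantities.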
THE EMPIRICAL RULE
Given a large set of measurements possessing a mound-shaped histogram, then
• the interval ȳ ± s contains approximately 68% of the measurements.
• the interval ȳ ± 2s contains approximately 95% of the measurements.
• the interval ȳ ± 3s contains approximately 99.7% of the measurements.
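The empirical rule can be checked on a data set by counting how many measurements fall within ȳ ± ks. This sketch assumes s is the sample standard deviation (`statistics.stdev`, divisor n − 1); the function name is hypothetical, and the life-expectancy values are those listed earlier in this section. The rule's percentages are only approximations for large, mound-shaped data, so small or skewed data sets may not match them.

```python
from statistics import mean, stdev

def empirical_rule_check(ys):
    """Percentage of measurements inside ybar ± k*s for k = 1, 2, 3."""
    ybar, s = mean(ys), stdev(ys)  # stdev uses the n - 1 divisor
    result = {}
    for k in (1, 2, 3):
        lo, hi = ybar - k * s, ybar + k * s
        inside = sum(lo <= y <= hi for y in ys)  # each True counts as 1
        result[k] = 100 * inside / len(ys)
    return result

life_expectancy = [
    76.3, 75.1, 74.3, 75.3, 71.5, 76.5, 71.0, 74.9, 73.2, 75.8, 74.8,
    75.9, 76.5, 69.7, 77.4, 73.5, 75.2, 75.5, 79.1, 74.1, 74.8, 76.5,
    74.2, 76.3, 71.0, 74.1, 69.9, 69.8, 76.6, 77.1, 77.6, 75.0, 71.0,
]
for k, pct in empirical_rule_check(life_expectancy).items():
    print(f"ybar ± {k}s contains {pct:.1f}% of the measurements")
```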
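Figure: Percent of Observations Included between Certain Values of the Standard Deviation (mound-shaped curve with axis marks from −4s to 4s; about 68% of observations lie within ±1s, 95% within ±2s, and 99.7% within ±3s of the mean)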
Major Volcanoes in the World
Empirical Rule     Percentage of Observations         Actual Percentage of Observations
Interval           Expected to Fall within Interval   Found within the Interval
4912 to 14058      68%                                66.6%
339 to 18630       95%                                95.7%
-4232 to 23202     99.7%                              100%
TWO MEASURES OF RELATIVE STANDING
• Percentile
• Quartile
The pth percentile is the value Xp such that p% of the measurements fall below that value and (100 − p)% of the measurements fall above it.
Quartiles divide the measurements into four parts such that 25% of the measurements are contained in each part. The first quartile (Lower Quartile) is denoted by Q1, the second by Q2, and the third (Upper Quartile) by Q3.
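Percentiles and quartiles can be computed with Python's `statistics.quantiles`, shown here on the life-expectancy data listed earlier. Several interpolation conventions for percentiles exist; this sketch assumes the function's default `'exclusive'` method, which may not exactly match a hand calculation in a given textbook. The helper name `percentile` is hypothetical.

```python
from statistics import quantiles

life_expectancy = [
    76.3, 75.1, 74.3, 75.3, 71.5, 76.5, 71.0, 74.9, 73.2, 75.8, 74.8,
    75.9, 76.5, 69.7, 77.4, 73.5, 75.2, 75.5, 79.1, 74.1, 74.8, 76.5,
    74.2, 76.3, 71.0, 74.1, 69.9, 69.8, 76.6, 77.1, 77.6, 75.0, 71.0,
]

def percentile(ys, p):
    """Approximate pth percentile (integer 1 <= p <= 99), 'exclusive' interpolation."""
    return quantiles(ys, n=100)[p - 1]  # the 99 cut points split the data into 100 parts

q1, q2, q3 = quantiles(life_expectancy, n=4)  # Q1, Q2 (the median), Q3
print(round(q1, 2), round(q2, 2), round(q3, 2))  # 73.35 75.0 76.4
print(round(percentile(life_expectancy, 90), 2))  # 77.28
```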
Box and Whisker Plot: Life Expectancies in 33 Developed Nations (vertical axis: Life Expectancy, 68 to 80)
Calculating Fence Values (IQR = Q3 − Q1, the interquartile range)
Lower Inner Fence: Q1 - 1.5 (IQR)
Upper Inner Fence: Q3 + 1.5 (IQR)
Lower Outer Fence: Q1 - 3 (IQR)
Upper Outer Fence: Q3 + 3 (IQR)
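The fence formulas above can be sketched in a few lines and applied to the Africa elevations in the example that follows. Quartiles again come from `statistics.quantiles` with its default `'exclusive'` method; this is an assumption, and a different quartile convention can shift Q1, Q3, and therefore every fence. The function name `fences` is hypothetical.

```python
from statistics import quantiles

def fences(ys):
    """Quartiles, IQR, and inner/outer fences for a box-and-whisker plot."""
    q1, q2, q3 = quantiles(ys, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    return {
        "Q1": q1, "median": q2, "Q3": q3, "IQR": iqr,
        "lower inner": q1 - 1.5 * iqr, "upper inner": q3 + 1.5 * iqr,
        "lower outer": q1 - 3 * iqr, "upper outer": q3 + 3 * iqr,
    }

africa = [1650, 5981, 7745, 9281, 10023, 11400, 12198, 13451, 19340]
for name, value in fences(africa).items():
    print(f"{name}: {value}")
```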
EXAMPLE: Construct a Box-and-Whisker Plot for the elevations of volcanoes in Africa
1,650 5,981 7,745 9,281 10,023 11,400 12,198 13,451 19,340
Median =      Q1 =      Q3 =      IQR =
Lower Inner Fence =
Upper Inner Fence =
Lower Outer Fence =
Upper Outer Fence =
Box and Whisker Plot: Major Volcanoes in Africa (axis: Elevation, 0 to 20,000)