arm lecture 7
TRANSCRIPT
-
8/14/2019 ARM Lecture 7
Data Analysis Part-A
Lecture # 7
-
Joke
Three statisticians go hunting. When they see a rabbit, the first one shoots, missing it on the left. The second one shoots and misses it on the right. The third one shouts: "We've hit it!"
-
Joke
Two statisticians were travelling in an airplane from LA to New York. About an hour into the flight, the pilot announced that they had lost an engine, but don't worry, there are three left. However, instead of 5 hours it would take 7 hours to get to New York. A little later, he announced that a second engine failed, and they still had two left, but it would take 10 hours to get to New York. Somewhat later, the pilot again came on the intercom and announced that a third engine had died...
Never fear, he announced, because the plane could fly on a single engine. However, it would now take 18 hours to get to New York. At this point, one statistician turned to the other and said, "Gee, I hope we don't lose that last engine, or we'll be up here forever!"
-
Contents
1. What is data analysis?
2. Descriptive Statistics
-
Steps for data analysis
(1) preparing the data for analysis,
(2) analyzing the data, and
(3) interpreting the data (i.e., testing the research hypotheses and drawing valid inferences)
-
DATA PREPARATION
Data represent the fruit of researchers' labor because they provide the information that will ultimately allow them to:
describe phenomena,
predict events,
identify and quantify differences between conditions, and
establish the effectiveness of interventions.
Because of their critical nature, data should be treated with the utmost respect and care.
-
Researchers should carefully plan how the data will be:
logged,
entered,
transformed (as necessary), and
organized into a database that will facilitate accurate and efficient statistical analysis.
-
Logging and Tracking Data
Computer applications that facilitate the process: Microsoft Access, Microsoft Excel, Claris FileMaker, SPSS, SAS.
The recruitment log is a comprehensive record of all individuals approached about participation in a study.
The log can also serve to record the dates and times that potential participants were approached, whether they met eligibility criteria, and whether they agreed and provided informed consent to participate in the study.
-
Data Screening
Prior to data entry, the researcher should carefully screen all data for accuracy.
You may need to recontact study participants to address any omissions, errors, or inaccuracies.
Screening should ensure that:
(1) responses are legible and understandable,
(2) responses are within an acceptable range,
(3) responses are complete, and
(4) all of the necessary information has been included.
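The range and completeness checks can be sketched in Python. This is a minimal illustration, not part of the lecture: the field names (`id`, `age`, `score`), the acceptable ranges, and the helper `screen_record` are all hypothetical.

```python
def screen_record(record, required, ranges):
    """Return a list of problems found in one response record."""
    problems = []
    # Completeness: every required field must have a value.
    for field in required:
        if record.get(field) in (None, ""):
            problems.append(f"{field}: missing")
    # Acceptable range: present values must lie between the stated limits.
    for field, (low, high) in ranges.items():
        value = record.get(field)
        if value not in (None, "") and not (low <= value <= high):
            problems.append(f"{field}: {value} outside {low}-{high}")
    return problems


required = ["id", "age", "score"]               # hypothetical variables
ranges = {"age": (18, 99), "score": (0, 100)}   # hypothetical limits
records = [
    {"id": 1, "age": 34, "score": 88},
    {"id": 2, "age": 17, "score": None},
]
issues = {r["id"]: screen_record(r, required, ranges) for r in records}
print(issues)
```

Records with a non-empty problem list are the ones that may require recontacting the participant before data entry.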
-
Constructing a Database
Data should be entered into a well-structured database.
The researcher should carefully consider the structure of the database and how it will be used.
While designing the general structure of the database, the researcher must carefully consider all of the variables that will need to be entered.
-
The Data Codebook
A data codebook is a written or computerized list that provides a clear and comprehensive description of the variables that will be included in the database.
A detailed codebook is essential when the researcher begins to analyze the data.
-
At a bare minimum, a data codebook should contain the following elements for each variable:
Variable name
Variable description
Variable format (number, date, text)
Instrument or method of collection
Date collected
Respondent or group
Variable location (in database)
Notes
-
Types of Statistics
Descriptive statistics allow the researcher to describe the data and examine relationships between variables.
Inferential statistics allow the researcher to draw conclusions about a population from sample data, for example by testing hypotheses about relationships between variables; causal conclusions also depend on the research design.
-
Descriptive Statistics
A frequency distribution is simply a complete list of all possible values or scores for a particular variable, along with the number of times (frequency) that each value or score appears in the data set.
This information can be delineated in what is known as a frequency table.
-
[Figure: example of a frequency distribution table]
-
Histogram
Still another way that this distribution may be depicted is in what is known as a histogram.
-
Central Tendency
The central tendency of a distribution is a number that represents the typical or most representative value in the distribution. The most widely used measures of central tendency are the mean, median, and mode.
-
Mean, Median, Mode & Range
-
Vocabulary Review
Sum: the answer to an addition problem.
Addend: the numbers you added together to get the sum.
6 + 9 = 15 (6 and 9 are the addends, 15 is the sum)
-
Definition
Mean means average.
-
Definition
Mean: the average of a group of numbers.
2, 5, 2, 1, 5
Mean = (2 + 5 + 2 + 1 + 5) / 5 = 15 / 5 = 3
-
The mean is found by evening out the numbers.
2, 5, 2, 1, 5
mean = 3
-
How to Find the Mean of a Group of Numbers
Step 1: Add all the numbers.
8, 10, 12, 18, 22, 26
8 + 10 + 12 + 18 + 22 + 26 = 96
Step 2: Divide the sum by the number of addends.
How many addends are there? 6
96 / 6 = 16
The mean, or average, of these numbers is 16.
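The two steps translate directly into code. A minimal Python sketch, where the function name `mean` is only an illustrative choice:

```python
def mean(values):
    """Step 1: add all the numbers. Step 2: divide by the number of addends."""
    total = sum(values)
    return total / len(values)


print(mean([8, 10, 12, 18, 22, 26]))  # 96 / 6
```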
-
What is the mean of these numbers?
7, 10, 16
Mean = 33 / 3 = 11
-
What is the mean of these numbers?
2, 9, 14, 27
Mean = 52 / 4 = 13
-
What is the mean of these numbers?
1, 2, 7, 11, 19
Mean = 40 / 5 = 8
-
What is the mean of these numbers?
26, 33, 41, 52
Mean = 152 / 4 = 38
-
Definition
Median is in the middle.
-
Definition
Median: the middle number in a set of ordered numbers.
1, 3, 7, 10, 13
Median = 7
-
How to Find the Median in a Group of Numbers
Step 1: Arrange the numbers in order from least to greatest.
21, 18, 24, 19, 27
18, 19, 21, 24, 27
Step 2: Find the middle number.
18, 19, 21, 24, 27
The middle number, 21, is your median.
Step 3: If there are two middle numbers, find the mean of these two numbers.
18, 19, 21, 25, 27, 28
21 + 25 = 46
46 / 2 = 23, the median
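The three steps can be sketched as a small Python function. The name `median` is an illustrative choice; Python's standard library offers `statistics.median` for the same purpose.

```python
def median(values):
    ordered = sorted(values)        # Step 1: least to greatest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                  # Step 2: a single middle number
        return ordered[mid]
    # Step 3: two middle numbers, so take their mean
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([21, 18, 24, 19, 27]))
print(median([18, 19, 21, 25, 27, 28]))
```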
-
What is the median of these numbers?
16, 10, 7
Ordered: 7, 10, 16
Median = 10
-
What is the median of these numbers?
29, 8, 4, 11, 19
Ordered: 4, 8, 11, 19, 29
Median = 11
-
What is the median of these numbers?
31, 7, 2, 12, 14, 19
Ordered: 2, 7, 12, 14, 19, 31
12 + 14 = 26
26 / 2 = 13
Median = 13
-
What is the median of these numbers?
53, 5, 81, 67, 25, 78
Ordered: 5, 25, 53, 67, 78, 81
53 + 67 = 120
120 / 2 = 60
Median = 60
-
Definition
Mode: the most popular, or that which is in fashion.
Baseball caps are a mode today.
-
Definition
Mode: the number that appears most frequently in a set of numbers.
1, 1, 3, 7, 10, 13
Mode = 1
-
How to Find the Mode in a Group of Numbers
Step 1: Arrange the numbers in order from least to greatest.
21, 18, 24, 19, 18
18, 18, 19, 21, 24
Step 2: Find the number that is repeated the most.
18, 18, 19, 21, 24
The mode is 18.
-
Which number is the mode?
29, 8, 4, 8, 19
Ordered: 4, 8, 8, 19, 29
Mode = 8
-
Which number is the mode?
1, 2, 2, 9, 9, 4, 9, 10
Ordered: 1, 2, 2, 4, 9, 9, 9, 10
Mode = 9
-
Which number is the mode?
22, 21, 27, 31, 21, 32
Ordered: 21, 21, 22, 27, 31, 32
Mode = 21
-
Calculation of Mode
Data set (mg/L): 30.0, 32.0, 31.5, 33.5, 32.0, 33.0, 29.0, 29.5, 31.0, 32.5, 34.5, 33.5, 31.5, 30.5, 30.0, 34.0, 32.0, 32.0, 35.0, 32.5
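Counting how often each value occurs gives the mode of this data set. A minimal sketch in Python; the helper `modes` and the variable name `urea` are illustrative choices, and the function returns a list so that ties are reported rather than hidden.

```python
from collections import Counter

urea = [30.0, 32.0, 31.5, 33.5, 32.0, 33.0, 29.0, 29.5, 31.0, 32.5,
        34.5, 33.5, 31.5, 30.5, 30.0, 34.0, 32.0, 32.0, 35.0, 32.5]


def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


print(modes(urea))  # 32.0 occurs four times, more than any other value
```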
-
Joke
Three professors (a physicist, a chemist, and a statistician) are called in to see their dean. Just as they arrive the dean is called out of his office, leaving the three professors there. The professors see with alarm that there is a fire in the wastebasket.
The physicist says, "I know what to do! We must cool down the materials until their temperature is lower than the ignition temperature and then the fire will go out."
The chemist says, "No! No! I know what to do! We must cut off the supply of oxygen so that the fire will go out due to lack of one of the reactants."
While the physicist and chemist debate what course to take, they both are alarmed to see the statistician running around the room starting other fires. They both scream, "What are you doing?"
To which the statistician replies, "Trying to get an adequate sample size."
-
Normal Distribution
All values are symmetrically distributed around the mean.
A normal distribution is a distribution of the values of a variable that, when plotted, produces a symmetrical, bell-shaped curve that rises smoothly from a small number of cases at each extreme to a large number of cases in the middle.
Characteristic bell-shaped curve
Assumed for all quality control statistics
-
Normal Distribution
[Figure: histogram of the blood urea values. x-axis: Value, 29 to 35 in steps of 0.5; y-axis: Frequency, 0 to 5.]
-
Accuracy and Precision
Precision is the closeness of repeated measurements to each other.
Accuracy is the closeness of measurements to the true value.
Quality control monitors both the precision and the accuracy of the assay in order to provide reliable results.
-
[Figure: precise and inaccurate]
-
[Figure: imprecise and inaccurate]
-
[Figure: precise and accurate]
-
Measures of Dispersion or Variability
There are several terms that describe the dispersion or variability of the data around the mean:
Range
Variance
Standard Deviation
Coefficient of Variation
-
Range
Range is the difference or spread between the highest and lowest observations.
It is the simplest measure of dispersion.
It makes no assumption about the central tendency of the data.
-
Definition
Range is the distance between.
-
Definition
Range: the difference between the greatest and the least value in a set of numbers.
1, 1, 3, 7, 10, 13
Range = 13 - 1 = 12
-
How to Find the Range in a Group of Numbers
Step 1: Arrange the numbers in order from least to greatest.
21, 18, 24, 19, 27
18, 19, 21, 24, 27
Step 2: Find the lowest and highest numbers.
Lowest = 18, highest = 27
Step 3: Find the difference between these 2 numbers.
27 - 18 = 9
The range is 9.
-
What is the range?
29, 8, 4, 8, 19
Ordered: 4, 8, 8, 19, 29
29 - 4 = 25
-
What is the range?
22, 21, 27, 31, 21, 32
Ordered: 21, 21, 22, 27, 31, 32
32 - 21 = 11
-
What is the range?
31, 8, 3, 11, 19
Ordered: 3, 8, 11, 19, 31
31 - 3 = 28
-
What is the range?
23, 7, 9, 41, 19
Ordered: 7, 9, 19, 23, 41
41 - 7 = 34
-
Calculation of Variance
Variance is the measure of variability about the mean.
It is calculated as the average squared deviation from the mean: the sum of the squared deviations from the mean, divided by the number of observations corrected for degrees of freedom (n - 1).
-
Calculation of Variance ($S^2$)
$$S^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}$$
-
Calculation of Variance
$$S^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1} = \frac{52.25}{19} = 2.75$$
-
Calculation of Standard Deviation
The standard deviation (SD) is the square root of the variance.
- SD is the square root of the average squared deviation from the mean.
- SD is commonly used because it has the same units as the mean and the original observations.
- SD is the principal calculation used to measure dispersion of results around a mean.
-
Calculation of Standard Deviation
$$s = \sqrt{\text{variance}} = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}$$
-
Calculation of 1, 2 & 3 Standard Deviations
$1s = \sqrt{2.75} = 1.66$
$2s = 1.66 \times 2 = 3.32$
$3s = 1.66 \times 3 = 4.98$
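The variance and standard deviation quoted on the slides can be checked against the 20 blood urea values listed under "Calculation of Mode". A minimal sketch using Python's standard `statistics` module; the variable name `urea` is an illustrative choice. The slides appear to round the mean to 32.0 before summing the squared deviations (giving 52.25), so the unrounded results differ slightly in later decimals but agree to two places.

```python
import statistics

urea = [30.0, 32.0, 31.5, 33.5, 32.0, 33.0, 29.0, 29.5, 31.0, 32.5,
        34.5, 33.5, 31.5, 30.5, 30.0, 34.0, 32.0, 32.0, 35.0, 32.5]

mean = statistics.mean(urea)          # about 31.975, shown as 32.0 on the slides
variance = statistics.variance(urea)  # sample variance: divides by n - 1
sd = statistics.stdev(urea)           # square root of the sample variance

print(round(mean, 1), round(variance, 2), round(sd, 2))
```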
-
Standard Deviation and Probability
[Figure: normal curve (x-axis: X, y-axis: Frequency) marked at -3s, -2s, -1s, Mean, +1s, +2s, +3s; 68.2% of values lie within +/- 1s, 95.5% within +/- 2s, and 99.7% within +/- 3s.]
-
Standard Deviation and Probability
For a normally distributed data set, a value will fall within a range of:
+/- 1 SD 68.2% of the time
+/- 2 SD 95.5% of the time
+/- 3 SD 99.7% of the time
-
Calculation of Range
68.2% confidence limit (1 SD):
Mean + s = 32.0 + 1.66 = 33.66
Mean - s = 32.0 - 1.66 = 30.34
Range: 30.34 to 33.66
-
Calculation of Range
95.5% confidence limit (2 SD):
Mean + 2s = 32.0 + 3.32 = 35.32
Mean - 2s = 32.0 - 3.32 = 28.68
Range: 28.68 to 35.32
-
Calculation of Range
99.7% confidence limit (3 SD):
Mean + 3s = 32.0 + 4.98 = 36.98
Mean - 3s = 32.0 - 4.98 = 27.02
Range: 27.02 to 36.98
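The three ranges follow one pattern, mean +/- k standard deviations. A minimal Python sketch using the rounded mean and SD from the slides; the helper names `limits` and `within_limits` are hypothetical.

```python
mean, s = 32.0, 1.66  # rounded mean and SD taken from the slides


def limits(k):
    """Lower and upper limit for mean +/- k standard deviations."""
    return (round(mean - k * s, 2), round(mean + k * s, 2))


def within_limits(value, k=2):
    """True if a measurement falls inside the +/- k SD range."""
    low, high = limits(k)
    return low <= value <= high


for k in (1, 2, 3):
    print(k, limits(k))
```

With the default k = 2, `within_limits` applies the commonly used +/- 2 SD acceptance criterion.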
-
Standard Deviation and Probability
In general, people use the +/- 2 SD criterion for the limits of the acceptable range for a test.
When a measurement falls within that range, it is accepted as consistent with normal variation: 95.5% of values from a normally distributed process fall inside it.
Only 4.5% of the time will a value fall outside of that range due to chance; more likely it will be due to error.
-
Example
Consider the following three datasets:
(1) 5, 25, 25, 25, 25, 25, 45
(2) 5, 15, 20, 25, 30, 35, 45
(3) 5, 5, 5, 25, 45, 45, 45
-
Solution
Case   Standard Deviation
1      11.55
2      13.23
3      20.00
The standard deviations for the datasets are 11.55, 13.23, and 20.00 (sample standard deviations, dividing by n - 1). Larger standard deviations indicate greater variability in the data, and smaller standard deviations indicate less variability.
-
Example 2
Canal 1: Average width = 3 ft (max 4 ft and min 2 ft)
Canal 2: Average width = 3 ft (max 6 ft and min 1 ft)
-
Example 3
Class 1 scores: 25, 36, 45, 53, 69, 89
Class 2 scores: 45, 46, 47, 48, 50, 52
-
Example 4
Each of the three populations {0, 0, 14, 14}, {0, 6, 8, 14} and {6, 6, 8, 8} has a mean of 7.
-
Solution
Their standard deviations are 7, 5, and 1, respectively (population standard deviations, dividing by n).
The third population has a much smaller standard deviation than the other two because its values are all close to 7.
In a loose sense, the standard deviation tells us how far from the mean the data points tend to be.
It will have the same units as the data points themselves. If, for instance, the data set {0, 6, 8, 14} represents the ages of a population of 4 cows, the standard deviation is 5 years.
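The two worked examples use different divisors: the three seven-value datasets match the sample formula (n - 1), while the three four-value populations match the population formula (n). A minimal sketch with Python's standard `statistics` module makes the distinction explicit; the variable names are illustrative.

```python
import statistics

# Seven-value datasets from the first example (treated as samples).
samples = [
    [5, 25, 25, 25, 25, 25, 45],
    [5, 15, 20, 25, 30, 35, 45],
    [5, 5, 5, 25, 45, 45, 45],
]
sample_sds = [round(statistics.stdev(d), 2) for d in samples]       # divides by n - 1

# Four-value populations from Example 4.
populations = [[0, 0, 14, 14], [0, 6, 8, 14], [6, 6, 8, 8]]
population_sds = [round(statistics.pstdev(d), 2) for d in populations]  # divides by n

print(sample_sds)
print(population_sds)
```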
-
Example 5
Consider average temperatures for cities. While two cities may each have an average temperature of 15 °C, it's helpful to understand that the range for cities near the coast is smaller than for cities inland, which clarifies that, while the average is similar, the chance for variation is greater inland than near the coast.
So, an average of 15 °C occurs for one city with highs of 25 °C and lows of 5 °C, and also occurs for another city with highs of 18 °C and lows of 12 °C. The standard deviation allows us to recognize that the average for the city with the wider variation, and thus a higher standard deviation, will not offer as reliable a prediction of temperature as the city with the smaller variation and lower standard deviation.
-
Sigma
z        percentage within    percentage outside    ratio outside
1        68.2689492%          31.7310508%           1 / 3.1514871
1.645    90%                  10%                   1 / 10
1.960    95%                  5%                    1 / 20
2        95.4499736%          4.5500264%            1 / 21.977894
2.576    99%                  1%                    1 / 100
3        99.7300204%          0.2699796%            1 / 370.398
3.2906   99.9%                0.1%                  1 / 1000
4        99.993666%           0.006334%             1 / 15,788
5        99.9999426697%       0.0000573303%         1 / 1,744,278
6        99.9999998027%       0.0000001973%         1 / 506,800,000
7        99.9999999997440%    0.0000000002560%      1 / 390,600,000,000
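The percentages in this table can be reproduced from the error function: for a normal distribution, the fraction of values within z standard deviations of the mean is erf(z / sqrt(2)). A minimal Python sketch; the function name `fraction_within` is an illustrative choice.

```python
import math


def fraction_within(z):
    """Fraction of a normal distribution within z standard deviations of the mean."""
    return math.erf(z / math.sqrt(2))


for z in (1, 1.645, 1.960, 2, 2.576, 3):
    inside = fraction_within(z)
    print(f"{z:<6} within {inside:.7%}   outside {1 - inside:.7%}")
```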
-
Coefficient of Variation
The Coefficient of Variation (CV) is the standard deviation (SD) expressed as a percentage of the mean.
Also known as the relative standard deviation (RSD).
CV % = (SD / mean) x 100
-
Estimation Process
[Figure: a random sample with sample mean X = 50 is drawn from a population whose mean, μ, is unknown; the conclusion drawn from the sample is "I am 95% confident that μ is between 40 & 60."]
-
Conclusion
SD is a measure of dispersion around the mean. In a normal distribution, about 68% of cases fall within one standard deviation of the mean and about 95% of cases fall within two standard deviations.
-
Standard Error of the Mean
A measure of how much the value of the mean may vary from sample to sample taken from the same distribution.
It can be used to roughly compare the observed mean to a hypothesized value (that is, you can conclude the two values are different if the ratio of the difference to the standard error is less than -2 or greater than +2).
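The slide gives no formula; the standard one is SE = SD / sqrt(n). A minimal Python sketch applying it, together with the rough +/- 2 rule, to the blood urea data from the mode example. The hypothesized value of 33.0 is an assumption chosen only for illustration.

```python
import math
import statistics

urea = [30.0, 32.0, 31.5, 33.5, 32.0, 33.0, 29.0, 29.5, 31.0, 32.5,
        34.5, 33.5, 31.5, 30.5, 30.0, 34.0, 32.0, 32.0, 35.0, 32.5]

mean = statistics.mean(urea)
sem = statistics.stdev(urea) / math.sqrt(len(urea))  # standard error of the mean

hypothesized = 33.0                    # hypothetical comparison value
ratio = (mean - hypothesized) / sem    # difference expressed in standard errors
differs = abs(ratio) > 2               # rough rule of thumb from the slide

print(round(sem, 2), round(ratio, 1), differs)
```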
-
Type I and II Errors
Analyzing variables that are not normally distributed can lead to:
serious overestimation (Type I error) or
underestimation (Type II error).
So you must examine each variable's:
skewness, which measures the overall lack of symmetry of the distribution, and whether it looks the same to the left and right of the center point; and
kurtosis, which measures whether the data are peaked or flat relative to a normal distribution.
-
Kurtosis
The kurtosis value tells whether the distribution is peaked, flat, or normal.
If the kurtosis value is zero, the distribution is normal; if it is positive, the distribution is more peaked than normal; and if it is negative, the distribution is flatter than normal.
Kurtosis values ranging from -1 to +1 are considered excellent (George & Mallery, 2006, p. 98).
-
For a normal distribution, the value of the kurtosis statistic is zero.
Bell-shaped curves can be described in terms of their kurtosis (curvature):
1. Leptokurtic = thin distribution (concentrated at midpoint) (+)
2. Mesokurtic = normal distribution (0)
3. Platykurtic = flat distribution (-)
-
A large positive kurtosis tells you that the distribution of data is more peaked and has heavier tails than the normal distribution.
-
Skewness
The skewness value tells whether the distribution is symmetrical or asymmetrical.
If the skewness value is zero, the distribution is symmetrical; if it is positive, smaller values are greater in number in the distribution; and if it is negative, larger values are greater in number in the distribution.
Skewness values ranging from -2 to +2 are acceptable.
-
Non-symmetrical
1. Positive skew = high number of low scores
2. Negative skew = high number of high scores
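Skewness and kurtosis can be computed from the central moments of the data. The sketch below uses the simple moment-based definitions (with 3 subtracted so that a normal distribution has kurtosis 0). Statistical packages such as SPSS apply small-sample corrections, so their output will differ somewhat. The function names and example data are illustrative.

```python
def _central_moment(values, k):
    """Average of (x - mean)**k over the data."""
    m = sum(values) / len(values)
    return sum((x - m) ** k for x in values) / len(values)


def skewness(values):
    """Zero for symmetric data, positive for a long right tail."""
    return _central_moment(values, 3) / _central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Zero for a normal distribution, negative for flatter data."""
    return _central_moment(values, 4) / _central_moment(values, 2) ** 2 - 3


symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 1, 2, 10]   # many low scores, one distant high score
print(skewness(symmetric), skewness(right_tailed), excess_kurtosis(symmetric))
```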
-
Examples of Skewness & Kurtosis
[Figure: example distribution with skewness value = 0.]
-
A large positive skewness shows that the variable sale has a long right tail.
That is, the distribution is asymmetric, with some distant values in a positive direction from the center of the distribution.
-
Month-wise average temperature
Month       Karachi   Peshawar
January     30        -1
February    31        4
March       32        25
April       33        35
May         34        40
June        35        48
July        35        50
August      34        45
September   33        38
October     32        35
November    31        25
December    30        4
Calculate the CoV for each city and see whether a meaningful conclusion can be drawn.
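The exercise can be worked through with a few lines of Python, using CV % = (SD / mean) x 100 with the sample standard deviation. The function name `cv_percent` is an illustrative choice. One caveat, stated as an assumption: if the values are temperatures on a scale such as Celsius, the zero point is arbitrary, so the CoV changes when the scale changes, and the negative January value for Peshawar hints at that problem.

```python
import statistics

karachi = [30, 31, 32, 33, 34, 35, 35, 34, 33, 32, 31, 30]
peshawar = [-1, 4, 25, 35, 40, 48, 50, 45, 38, 35, 25, 4]


def cv_percent(values):
    """Coefficient of variation: sample SD as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100


print(round(cv_percent(karachi), 1), round(cv_percent(peshawar), 1))
```

The much larger value for Peshawar reflects its wider month-to-month spread around a similar mean.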
-
Example - Grapefruit Juice Study
Descriptive Statistics
Variable   N   Minimum   Maximum   Sum   Mean    Std. Error of Mean   Std. Deviation   Variance
CRCL       8   38        120       621   77.63   8.63                 24.401           595.411
Valid N (listwise) = 8
-
Example - Smoking Status
SMKSTTS
                           Frequency   Percent   Valid Percent   Cumulative Percent
Never Smoked               1990        37.9      37.9            37.9
Quit > 10 Years Ago        1063        20.3      20.3            58.2
Quit < 10 Years Ago        609         11.6      11.6            69.8
Current Cigarette Smoker   1332        25.4      25.4            95.2
Other Tobacco User         253         4.8       4.8             100.0
Total                      5247        100.0     100.0
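The Percent and Cumulative Percent columns follow directly from the frequencies. A minimal Python sketch that rebuilds them; the variable names are illustrative.

```python
frequencies = {
    "Never Smoked": 1990,
    "Quit > 10 Years Ago": 1063,
    "Quit < 10 Years Ago": 609,
    "Current Cigarette Smoker": 1332,
    "Other Tobacco User": 253,
}

total = sum(frequencies.values())
rows = []
running = 0
for label, count in frequencies.items():
    running += count  # cumulative frequency so far
    rows.append((label, count,
                 round(100 * count / total, 1),     # percent
                 round(100 * running / total, 1)))  # cumulative percent

for row in rows:
    print(row)
```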
-
How to improve normality?
Researchers often rely on one of several transformations to potentially improve the normality of certain variables.
The most frequently used transformations are the square root transformation, the log transformation, and the inverse transformation.
-
Square root transformation:
Described simply, this type of transformation involves taking the square root of each value within a certain variable.
The one caveat is that you cannot take the square root of a negative number.
Fortunately, this can be easily remedied by adding a constant, such as 1, to each item before computing the square root; the constant must be large enough to make every value non-negative.
-
Log transformation:
There is a wide variety of log transformations.
In general, however, a logarithm is the power (also known as the exponent) to which a base number has to be raised to get the original number.
As with the square root transformation, if a variable contains values less than 1, a constant must be added to move the minimum value of the distribution.
-
Inverse transformation:
This type of transformation involves taking the inverse of each value by dividing it into 1.
For example, the inverse of 3 would be computed as 1/3.
Essentially, this procedure makes very small values very large, and very large values very small, and it has the effect of reversing the order of a variable's scores.
Therefore, researchers using this transformation procedure should be careful not to misinterpret the scores following their analysis.
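The three transformations can be sketched in Python. The function names are illustrative, and the shifting constants are one reasonable choice, stated here as an assumption: the square root version shifts the minimum to 0, and the log version shifts the minimum to 1 so that every logarithm is defined and non-negative.

```python
import math


def sqrt_transform(values):
    shift = max(0, -min(values))      # constant that removes negative values
    return [math.sqrt(x + shift) for x in values]


def log_transform(values):
    shift = max(0, 1 - min(values))   # constant that moves the minimum to 1
    return [math.log10(x + shift) for x in values]


def inverse_transform(values):
    return [1 / x for x in values]    # undefined if any value is 0


print(sqrt_transform([-1, 0, 8]))
print(log_transform([0, 9, 99]))
print(inverse_transform([1, 2, 4]))   # note the reversed order of the scores
```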
-
Box Plot (Box and Whiskers Plot)
-
Box and Whisker Diagrams
Anatomy of a Box and Whisker Diagram
[Figure: box and whisker diagram on a scale from 4 to 12, labelled with the lowest value, lower quartile, median, upper quartile and highest value; the box spans the two quartiles and a whisker extends to each extreme.]
Box plots are useful for comparing two or more sets of data, such as the heights of boys and girls in a class.
[Figure: box plots of the heights of boys and girls on a scale from 130 to 190 cm.]
-
If one whisker is longer, the distribution of the data is skewed in the direction of the longer whisker.
-
Drawing a Box Plot
Example 1: Draw a box plot for the data below.
4, 4, 5, 6, 8, 8, 8, 9, 9, 9, 10, 12
Lower quartile (Q1) = 5.5 (midway between 5 and 6)
Median (Q2) = 8
Upper quartile (Q3) = 9
[Figure: box plot on a scale from 4 to 12.]
-
Drawing a Box Plot
Example 2: Draw a box plot for the data below.
3, 4, 4, 6, 8, 8, 8, 9, 10, 10, 15
Lower quartile (Q1) = 4
Median (Q2) = 8
Upper quartile (Q3) = 10
[Figure: box plot on a scale from 3 to 15.]
-
Drawing a Box Plot
Question: Stuart recorded the heights in cm of boys in his class as shown below. Draw a box plot for this data.
137, 148, 155, 158, 165, 166, 166, 171, 171, 173, 175, 180, 184, 186, 186
Lower quartile (QL) = 158
Median (Q2) = 171
Upper quartile (QU) = 180
[Figure: box plot on a scale from 130 to 190 cm.]
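The five values needed to draw each box plot can be computed with the "median of each half" method, which appears to be the one these examples use (an assumption; other quartile conventions, including the default of Python's `statistics.quantiles`, can give slightly different quartiles). The function names are illustrative.

```python
def median(values):
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(values):
    """Lowest value, lower quartile, median, upper quartile, highest value."""
    ordered = sorted(values)
    n = len(ordered)
    lower_half = ordered[: n // 2]         # values below the median position
    upper_half = ordered[(n + 1) // 2:]    # values above the median position
    return (ordered[0], median(lower_half), median(ordered),
            median(upper_half), ordered[-1])


example1 = [4, 4, 5, 6, 8, 8, 8, 9, 9, 9, 10, 12]
example2 = [3, 4, 4, 6, 8, 8, 8, 9, 10, 10, 15]
boys = [137, 148, 155, 158, 165, 166, 166, 171, 171, 173, 175, 180, 184, 186, 186]
for data in (example1, example2, boys):
    print(five_number_summary(data))
```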
-
Drawing a Box Plot
Question: Gemma recorded the heights in cm of girls in the same class and constructed a box plot from the data. The box plots for both boys and girls are shown below. Use the box plots to choose some correct statements comparing heights of boys and girls in the class. Justify your answers.
[Figure: box plots of the heights of boys and girls on a scale from 130 to 190 cm.]
1. The girls are taller on average.
2. The boys are taller on average.
3. The girls show less variability in height.
4. The boys show less variability in height.
5. The smallest person is a girl.
6. The tallest person is a boy.
-
Measures of Association
-
Correlations
Correlations are perhaps the most basic and most useful measure of association between two or more variables.
Expressed in a single number called a correlation coefficient (r), correlations provide information about the direction of the relationship (either positive or negative) and the intensity of the relationship (-1.0 to +1.0).
-
In addition to the direction and strength of a correlation, the coefficient can be used to determine the proportion of variance accounted for by the association. This is known as the coefficient of determination ($r^2$).
$r^2$ is calculated quite easily by squaring the correlation coefficient, in the following manner:
.70 x .70 = .49
A correlation of .70 therefore explains approximately 49% of the variance.
-
The Pearson r
The Pearson r is used to examine associations between two variables that are measured on either ratio or interval scales.
For example, the Pearson r could be used to examine the correlation between days of exercise and pounds of weight loss.
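A minimal Python sketch of the Pearson r and the coefficient of determination. The exercise and weight-loss figures are hypothetical values invented for illustration, and `pearson_r` is an illustrative name (Python 3.10+ also provides `statistics.correlation`).

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: co-deviation divided by the product of the spreads."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


days_of_exercise = [1, 2, 3, 4, 5]   # hypothetical data
pounds_lost = [1, 3, 2, 5, 4]        # hypothetical data

r = pearson_r(days_of_exercise, pounds_lost)
r_squared = r ** 2                   # proportion of variance accounted for
print(r, r_squared)
```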
-
Point-biserial (rpbi): This is used to examine the relationship between a variable measured on a naturally occurring dichotomous nominal scale and a variable measured on an interval (or ratio) scale,
e.g., a correlation between gender [dichotomous] and SAT scores [interval].
-
Spearman rank-order (rs):
This is used to examine the relationship between two variables measured on ordinal scales,
e.g., a correlation of class rank [ordinal] and socioeconomic status [ordinal].
-
Gamma (γ):
This is used to examine the relationship between one nominal variable and one variable measured on an ordinal scale,
e.g., a correlation of ethnicity [nominal] and socioeconomic status [ordinal].
-
Testing of Hypothesis
-
Hypothesis Testing
Goal: Make statement(s) regarding unknown population parameter values based on sample data.
Elements of a hypothesis test:
Null hypothesis - Statement regarding the value(s) of unknown parameter(s). Typically will imply no association between explanatory and response variables in our applications (will always contain an equality).
Alternative hypothesis - Statement contradictory to the null hypothesis (will always contain an inequality).
-
Common Statistical Tests
Large sample tests (z test)
Small sample tests (Student's t test)
Paired t test
Chi-square test
-
Determine the Hypothesis: Whether There Is an Association or Not
Ho: The two variables are independent
Ha: The two variables are associated
-
Out of 25 men who had cancer, 20 claimed to have used estrogens. Out of 30 men without cancer, 5 claimed to have used estrogens.
Exposure   Outcome: Yes   Outcome: No   Total
Yes        20             5             25
No         5              25            30
Total      25             30            55
-
4. Calculating Test Statistics
$$\chi^2 = \sum \frac{(F_o - F_e)^2}{F_e}$$
where $F_o$ are the observed frequencies and $F_e$ is the expected frequency.
-
5. Determine Degrees of Freedom
df = (R-1)(C-1)
where R is the number of levels in the row variable and C is the number of levels in the column variable.
-
Compare computed test statistic against a tabled/critical value
The computed value of the Pearson chi-square statistic is compared with the critical value to determine if the computed value is improbable.
The critical tabled values are based on sampling distributions of the Pearson chi-square statistic.
If the calculated $\chi^2$ is greater than the $\chi^2$ table value, reject Ho.
-
Example
Suppose a researcher is interested in voting preferences on the NRO issue.
A questionnaire was developed and sent to a random sample of 90 voters.
The researcher also collects information about the political party membership of the sample of 90 respondents.
-
Bivariate Frequency Table or Contingency Table
           Favor   Neutral   Oppose   f row
PML        10      10        30       50
PPP        15      15        10       40
f column   25      25        40       n = 90
The cell counts are the observed frequencies; "f row" and "f column" are the row and column frequencies.
-
Calculating Test Statistics
Expected frequency for each cell: fe = (row frequency x column frequency) / n
           Favor                Neutral              Oppose               f row
PML        fo = 10, fe = 13.9   fo = 10, fe = 13.9   fo = 30, fe = 22.2   50
PPP        fo = 15, fe = 11.1   fo = 15, fe = 11.1   fo = 10, fe = 17.8   40
f column   25                   25                   40                   n = 90
For example, fe for PML/Favor = 50 x 25 / 90 = 13.9, and fe for PPP/Favor = 40 x 25 / 90 = 11.1.
-
Calculating Test Statistics
$$\chi^2 = \sum \frac{(F_o - F_e)^2}{F_e}$$
where $F_o$ are the observed frequencies and $F_e$ is the expected frequency.
-
Determine Degrees of Freedom
df = (R-1)(C-1) = (2-1)(3-1) = 2
-
Compare computed test statistic against a tabled/critical value
α = 0.05
df = 2
Critical tabled value = 5.991
Test statistic, 11.03, exceeds critical value
Null hypothesis is rejected
PML & PPP differ significantly in their opinions on the NRO issue
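The whole chi-square calculation for the party-by-opinion table can be reproduced in a few lines of Python. This is a sketch using only the numbers given on the slides; the critical value 5.991 is copied from the slide rather than computed, and the variable names are illustrative.

```python
observed = {
    "PML": [10, 10, 30],   # Favor, Neutral, Oppose
    "PPP": [15, 15, 10],
}

rows = list(observed.values())
row_totals = [sum(r) for r in rows]
col_totals = [sum(c) for c in zip(*rows)]
n = sum(row_totals)

# Expected frequency for each cell: row total x column total / n
expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]

# Pearson chi-square: sum of (observed - expected)^2 / expected over all cells
chi2 = sum((o - e) ** 2 / e
           for o_row, e_row in zip(rows, expected)
           for o, e in zip(o_row, e_row))

df = (len(rows) - 1) * (len(col_totals) - 1)
critical = 5.991           # tabled value for alpha = 0.05, df = 2 (from the slide)
reject_null = chi2 > critical

print(chi2, df, reject_null)   # chi2 is 11.025, reported as 11.03 on the slide
```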