Statistics for College Students - Part 2
TRANSCRIPT
-
8/6/2019 Statistics for College Students-part 2
OVERVIEW
-
STATISTICS
Modules 5-8
Prepared by:
Mrs. Cristina H. Price
-
Module 5 - The Normal Curve
The normal curve, or normal frequency distribution, is a hypothetical distribution of scores that is widely used in statistical analysis. Since many psychological and physical measurements are normally distributed, the concept of the normal curve can be used with many scores. The characteristics of the normal curve make it useful in education and in the physical and social sciences.
Characteristics of the Normal Curve
Some of the important characteristics of the normal curve are:
1. The normal curve is a symmetrical distribution of scores with an equal number of scores above and below the midpoint of the abscissa (horizontal axis of the curve).
2. The tails of the curve are asymptotic to the horizontal axis.
3. Since the distribution of scores is symmetrical, the mean, median, and mode are all at the same point on the abscissa. In other words, the mean = the median = the mode.
4. If we divide the distribution up into standard deviation units, a known proportion of scores lies within each portion of the curve. The total area under the curve is equal to 1.
5. Tables exist so that we can find the proportion of scores above and below any part of the curve, expressed in standard deviation units. Scores expressed in standard deviation units are referred to as Z-scores.
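The "known proportion of scores" in item 4 can be checked numerically instead of read from a table. The sketch below is only an illustration: it assumes Python's standard-library `statistics.NormalDist`, which the slides do not use, and the helper name `proportion_within` is made up for this example.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0, sigma=1)

def proportion_within(k):
    """Area under the curve between -k and +k standard deviations."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {proportion_within(k):.4f}")

# Symmetry: half of the area lies below the mean.
print("area below the mean:", std_normal.cdf(0))
```

The three printed proportions are the familiar 68-95-99.7 figures that z-tables encode.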
-
Standard score
It is the distance of an observed value (x) from the
mean in terms of the standard deviation. It tells
how many standard deviations the observed value
lies above or below the mean of its distribution.
$z = \frac{x - \bar{x}}{s}$ OR $z = \frac{x - \mu}{\sigma}$
Where: x = observed value or raw score
$\bar{x}$ = sample mean
s = sample standard deviation
$\mu$ = population mean
$\sigma$ = population standard deviation
-
Using MS Excel's Statistical Functions
Conversion of raw score to standard score
=STANDARDIZE(x, x̄, s)
Finding the area/probability value given the value of z
=NORMSDIST(z)
Finding the value of z given the probability value
=NORMSINV(p-value)
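For readers without Excel, the three functions above have close counterparts in Python's standard library. This is a rough sketch, not part of the original slides: the Python function names simply mirror the Excel ones, and the input values (x = 80, mean = 74, s = 10) are chosen only for illustration.

```python
from statistics import NormalDist

def standardize(x, mean, sd):
    """Counterpart of =STANDARDIZE(x, mean, s): raw score to z-score."""
    return (x - mean) / sd

def normsdist(z):
    """Counterpart of =NORMSDIST(z): area to the left of z."""
    return NormalDist().cdf(z)

def normsinv(p):
    """Counterpart of =NORMSINV(p): z-score with area p to its left."""
    return NormalDist().inv_cdf(p)

z = standardize(80, 74, 10)   # illustrative values only
print("z =", z)
print("area below z =", round(normsdist(z), 4))
print("z for a left area of 0.975 =", round(normsinv(0.975), 2))
```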
-
Sample problem
The average daily income of 2000 workers is P362.00 with a standard deviation of P15.00. Assuming that the daily incomes are normally distributed,
a) what percent of the workers earn at least P380.00 per day?
b) what percent of the workers earn below P350.00 per day?
c) determine the number of workers who earn from P350.00 to P375.00 per day.
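One possible way to work this sample problem is sketched below in Python. It is an illustration only: it uses the standard-library `statistics.NormalDist` in place of a z-table, and the variable names are invented for the example.

```python
from statistics import NormalDist

# Daily income model taken from the problem statement.
income = NormalDist(mu=362, sigma=15)
n_workers = 2000

# a) percent earning at least P380.00 (area to the right of 380)
pct_at_least_380 = (1 - income.cdf(380)) * 100

# b) percent earning below P350.00 (area to the left of 350)
pct_below_350 = income.cdf(350) * 100

# c) expected number of workers earning from P350.00 to P375.00
count_350_to_375 = (income.cdf(375) - income.cdf(350)) * n_workers

print(f"a) {pct_at_least_380:.2f}% earn at least P380.00")
print(f"b) {pct_below_350:.2f}% earn below P350.00")
print(f"c) about {round(count_350_to_375)} workers earn P350.00 to P375.00")
```

The corresponding z-scores are 1.2, -0.8, and about 0.87, so the same answers can be read from a z-table, up to table rounding.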
-
Exercises
1. In a departmental examination in statistics, the mean grade was 74 and the standard deviation was 10. If the grades are approximately normally distributed and 40 students got grades between 70 and 80, how many students took the examination?
2. The experience of a certain hospital showed that the distribution of length of stay of its patients is normal with a mean of 11.5 days and a standard deviation of 2 days.
a) What percent of the patients stayed 9 days or less?
b) If a new method in nursing care is to be administered to the middle 95% of the group, how long should a patient stay to be included in the study?
3. A study finds that the time spent on advertisement per hour on a certain TV station is approximately normally distributed with mean equal to 12.8 minutes and standard deviation equal to 2.2 minutes. During a randomly selected hour, what is the probability that between 14 and 16 minutes were devoted to advertisements?
-
Module 6 - Hypothesis Testing
Hypothesis: a statement that is formulated but cannot be accepted as true unless it is proven
Assumption: a statement that is formulated and accepted to be true without the necessity of a proof. It serves as the springboard of the study
Types of hypothesis: null and alternative
-
Other key concepts
Types of test: one-tailed and two-tailed
Level of significance: alpha (0.01, 0.05, 0.1)
Observed value: the computed value obtained from the data gathered
Critical value: the value obtained from the table; the value that divides the distribution of the test statistic into the rejection region and the acceptance region
-
Critical Values of z
Test Type     Level of significance
              0.05     0.01
One-tailed    1.645    2.33
Two-tailed    1.96     2.575
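The tabled critical values can be reproduced from the standard normal curve. The sketch below assumes Python's standard-library `statistics.NormalDist`; the helper `critical_z` is a hypothetical name introduced here, not something from the slides.

```python
from statistics import NormalDist

def critical_z(alpha, tails):
    """Positive critical z for a one-tailed (tails=1) or two-tailed (tails=2) test."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # A two-tailed test splits alpha equally between the two tails.
    return NormalDist().inv_cdf(1 - alpha / tails)

for tails, label in ((1, "One-tailed"), (2, "Two-tailed")):
    values = [round(critical_z(alpha, tails), 3) for alpha in (0.05, 0.01)]
    print(label, values)
```

The computed two-tailed value at 0.01 is about 2.576; the 2.575 in the table is a common textbook rounding of the same quantity.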
-
Steps in performing hypothesis testing
1. Write the original claim and identify whether it is the null hypothesis or the alternative hypothesis.
2. Write the null and alternative hypotheses. Use the alternative hypothesis to identify the type of test.
3. Write down all information from the problem.
4. Determine the appropriate test statistic. Find the critical value using the tables.
5. Compute the test statistic.
6. Make a decision to reject or fail to reject the null hypothesis. A picture showing the critical value and test statistic may be useful.
7. Write the conclusion.
-
Bivariate Distribution
Involving two variables
- Significant difference (t-test, z-test, ANOVA)
- Significant relationship (Pearson r, Spearman's rho, Chi-square and other correlational techniques)
-
Testing significant difference using
parametric test (two groups)
t test
- distribution is normal
- homogeneous variance
- sample std. deviation is known
- n < 30
z test
- distribution is normal
- homogeneous variance
- population std. deviation is known
- n ≥ 30
-
t-test & z test
(sample vs population)
$t = \frac{(\bar{x} - \mu)\sqrt{n}}{s}$; df = n - 1
$z = \frac{(\bar{x} - \mu)\sqrt{n}}{\sigma}$
-
Decision rules
Observed value < Critical value
OR
p-value > level of significance (α)
Accept the null hypothesis (There's not enough evidence to reject the null hypothesis)
Observed value ≥ Critical value
OR
p-value < level of significance (α)
Reject the null hypothesis
-
Sample Problems
1. A certain rice miller claims that the average weight of a cavan of rice is 50 kilograms with a standard deviation of 5 kilograms. A retailer sampled 20 cavans of this rice and got an average weight of 46.6 kilograms. Is the claim of the rice miller valid using 5% level of significance?
2. A standardized test was administered to thousands of pupils with a mean score of 85 and a standard deviation of 8. A random sample of 50 pupils was given the same test and showed an average score of 83.20. Is there evidence to show that this group has a lower performance than the ones in general at 0.05 level of significance?
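As an illustration of the hypothesis-testing steps, sample problem 2 can be worked as a left-tailed, one-sample z test (population standard deviation known, n = 50). The sketch below is one possible way to do it in Python. The function name `one_sample_z` is invented for this example, and the critical value is computed with the standard-library `statistics.NormalDist` rather than read from the table.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z(sample_mean, pop_mean, pop_sd, n):
    """z = (sample mean - population mean) * sqrt(n) / population sd."""
    return (sample_mean - pop_mean) * sqrt(n) / pop_sd

# H0: mu = 85   versus   H1: mu < 85 (left-tailed test)
alpha = 0.05
z = one_sample_z(83.20, 85, 8, 50)

critical = NormalDist().inv_cdf(alpha)   # left-tail critical value, about -1.645
p_value = NormalDist().cdf(z)            # area to the left of the observed z

# Reject H0 only if the observed z falls in the left rejection region.
reject_null = z <= critical

print(f"z = {z:.3f}, critical value = {critical:.3f}, p-value = {p_value:.4f}")
print("Reject H0" if reject_null else "Fail to reject H0")
```

Here the observed z does not reach the rejection region and the p-value is slightly above 0.05, so under these assumptions the data do not show a significantly lower performance.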
-
t-test & z test
(two-sample groups)
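$z = \frac{(\bar{x} - \bar{y}) - (\mu_x - \mu_y)}{\sqrt{\frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}}}$
$t = \frac{(\bar{x} - \bar{y}) - (\mu_x - \mu_y)}{\sqrt{\left[\frac{(n_x - 1)S_x^2 + (n_y - 1)S_y^2}{n_x + n_y - 2}\right]\left[\frac{1}{n_x} + \frac{1}{n_y}\right]}}$; df = n_x + n_y - 2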
-
Sample Problems
1. A random sample of 20 newly-born baby boys showed an average weight of 7.4 pounds while a sample of 25 newly-born baby girls showed a mean weight of 6.5 pounds. If the variance of all newly-born babies is 1.25 pounds, can we say that newly-born baby boys are heavier than newly-born baby girls?
2. Two hamburger stores were compared in terms of the number of orders of hamburger per day. The results of the ten-day observation were as follows:
Day     1    2    3    4    5    6    7    8    9    10
Nutri   148  126  103  169  135  152  144  124  132  128
Deli    150  127  125  152  129  146  153  118  126  119
Using the 0.05 level of significance, test if there is a significant difference in the number of orders of hamburger from the two stores.
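Both sample problems can be computed with a few lines of Python. The sketch below is illustrative only: the function names are made up, the hypothesized difference between the population means is assumed to be zero, and problem 2 is treated as two independent samples with a pooled variance because it appears under the two-sample heading.

```python
from math import sqrt
from statistics import mean, variance

def two_sample_z(mean_x, mean_y, var_x, var_y, n_x, n_y):
    """z statistic for two means with known population variances."""
    return (mean_x - mean_y) / sqrt(var_x / n_x + var_y / n_y)

def pooled_t(xs, ys):
    """Pooled-variance t statistic and its degrees of freedom."""
    n_x, n_y = len(xs), len(ys)
    # statistics.variance is the sample variance (divides by n - 1).
    pooled_var = ((n_x - 1) * variance(xs) + (n_y - 1) * variance(ys)) / (n_x + n_y - 2)
    t = (mean(xs) - mean(ys)) / sqrt(pooled_var * (1 / n_x + 1 / n_y))
    return t, n_x + n_y - 2

# Problem 1: baby boys versus baby girls, common variance 1.25
z_babies = two_sample_z(7.4, 6.5, 1.25, 1.25, 20, 25)

# Problem 2: daily hamburger orders at the two stores
nutri = [148, 126, 103, 169, 135, 152, 144, 124, 132, 128]
deli = [150, 127, 125, 152, 129, 146, 153, 118, 126, 119]
t_stores, df_stores = pooled_t(nutri, deli)

print(f"Problem 1: z = {z_babies:.3f}")
print(f"Problem 2: t = {t_stores:.3f}, df = {df_stores}")
```

Problem 1 does not state a level of significance; if 0.05 is assumed, the one-tailed critical z is 1.645, which the computed z exceeds. For problem 2, a standard t-table gives 2.101 for a two-tailed test at 0.05 with 18 degrees of freedom, far above the computed t, so no significant difference is indicated.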
-
t-test for paired observations
(dependent groups)
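$t = \frac{\bar{d}}{\sqrt{\frac{\sum d^2 - \frac{(\sum d)^2}{n}}{n(n - 1)}}}$; df = n - 1
Where: d = difference between the paired observations, $\bar{d}$ = mean of the differences, n = no. of pairs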
-
Sample Problem
A certain diet pill was developed by a pharmaceutical company. To test the efficacy of the said pill, 10 individuals were randomly selected. The results of the study are presented in the following table:
SUBJECTS  WEIGHT BEFORE  WEIGHT AFTER
1         148            150
2         142            139
3         131            130
4         128            128
5         121            123
6         118            115
7         120            119
8         152            151
9         112            110
10        110            105
Use hypothesis testing to determine whether the diet pill is effective or not.
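The paired t statistic for the diet-pill data can be computed as sketched below. This is an illustration, not the author's worked solution: the differences are taken as before minus after (so positive values mean weight loss), and the variable names are invented.

```python
from math import sqrt

before = [148, 142, 131, 128, 121, 118, 120, 152, 112, 110]
after = [150, 139, 130, 128, 123, 115, 119, 151, 110, 105]

# d = before - after, so a positive d is a weight loss.
d = [b - a for b, a in zip(before, after)]
n = len(d)
sum_d = sum(d)
sum_d2 = sum(x * x for x in d)
mean_d = sum_d / n

# t = mean difference / standard error of the mean difference
t = mean_d / sqrt((sum_d2 - sum_d ** 2 / n) / (n * (n - 1)))
df = n - 1

print(f"sum d = {sum_d}, sum d^2 = {sum_d2}")
print(f"t = {t:.3f}, df = {df}")
```

The problem does not give a level of significance. If a one-tailed test at 0.05 is assumed, a standard t-table gives a critical value of 1.833 for 9 degrees of freedom, so the computed t would fall just short of the rejection region.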
-
Module 7 - Correlation and Simple Linear Regression
1. Pearson's product-moment correlation coefficient (Pearson r)
2. Spearman's rank-order correlation coefficient (Spearman's ρ)
-
Pearson's product-moment correlation coefficient
$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$
-
Sample Problem
Determine if there is a relationship between the number of years of service
and the employees' monthly salary based on the data gathered from a
certain company.
No. of yrs. of service Monthly salary (in T)
5 25
7 28
8 29
10 32
12 34
2 18
11 32
15 35
20 40
25 50
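Pearson r for the table above can be computed directly from the raw-score formula. The sketch below is one way to do it in plain Python; the variable names are chosen for this example only.

```python
from math import sqrt

years = [5, 7, 8, 10, 12, 2, 11, 15, 20, 25]
salary = [25, 28, 29, 32, 34, 18, 32, 35, 40, 50]

n = len(years)
sum_x = sum(years)
sum_y = sum(salary)
sum_xy = sum(x * y for x, y in zip(years, salary))
sum_x2 = sum(x * x for x in years)
sum_y2 = sum(y * y for y in salary)

# Raw-score formula for Pearson's product-moment correlation coefficient
numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(f"sum x = {sum_x}, sum y = {sum_y}, sum xy = {sum_xy}")
print(f"r = {r:.4f}")
```

The result is close to +1, indicating a very strong positive linear relationship between years of service and monthly salary in this sample.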
-
Legend for Pearson's r and Spearman's rho
0.00 to 0.30  Little or no positive correlation
0.31 to 0.50  Low positive correlation
0.51 to 0.70  Moderately positive correlation
0.71 to 0.90  High positive correlation
0.91 to 1.00  Very high positive correlation
-
Testing the significance of the
relationship
$t = r\sqrt{\frac{n - 2}{1 - r^2}}$, df = n - 2
Where:
r = the correlation coefficient
n = no. of pairs
df = degrees of freedom
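A small sketch of this test statistic follows. The function name `t_for_r` and the input values (r = 0.60 from n = 10 pairs) are hypothetical and serve only to show the arithmetic.

```python
from math import sqrt

def t_for_r(r, n):
    """t statistic and degrees of freedom for testing a correlation coefficient."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    t = r * sqrt((n - 2) / (1 - r ** 2))
    return t, n - 2

# Hypothetical example: r = 0.60 computed from 10 pairs
t, df = t_for_r(0.60, 10)
print(f"t = {t:.4f}, df = {df}")
```

The computed t is then compared with the critical t for n - 2 degrees of freedom, following the decision rules given earlier.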
-
Key concepts to remember
- Correlation simply describes a relationship between two variables. It does not explain why the two variables are related. Specifically, a correlation should not and cannot be interpreted as proof of a cause-and-effect relationship between the two variables.
- The value of a correlation can be affected greatly by the range of scores represented in the data.
-
Key concepts to remember
- One or two extreme data points, often called outliers, can have a dramatic effect on the value of a correlation.
- A correlation measures the degree of relationship between two variables. The values of r range from -1.00 to +1.00.
- The value r² is called the coefficient of determination because it measures the proportion of variability in one variable that can be determined from the relationship with the other variable.
-
Spearman's rank-order correlation coefficient (Spearman's ρ)
$\rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$
Where:
n = no. of pairs
d = difference between the ranks of each pair
-
Statistic to test the significance of ρ
$z = \rho\sqrt{n - 1}$
-
Sample Problem
Seven instructors are rated by freshmen and sophomore students on clarity of presentation and the results are tabulated. What is the Spearman rho for the following?
Instructor Freshmen Sophomore
1 44 58
2 39 42
3 36 18
4 35 22
5 33 31
6 29 38
7 22 38
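The ranking and the rho formula can be sketched in Python as follows. This is an illustration under stated assumptions: ratings are ranked from highest (rank 1) to lowest, tied ratings share the average of their ranks, and the helper `average_ranks` is a made-up name. With ties present, the simple formula is only an approximation.

```python
from math import sqrt

freshmen = [44, 39, 36, 35, 33, 29, 22]
sophomore = [58, 42, 18, 22, 31, 38, 38]

def average_ranks(values):
    """Rank from highest (1) to lowest; tied values share their average rank."""
    ordered = sorted(values, reverse=True)
    rank_of = {}
    for v in set(values):
        first_position = ordered.index(v) + 1
        ties = ordered.count(v)
        rank_of[v] = first_position + (ties - 1) / 2
    return [rank_of[v] for v in values]

rank_f = average_ranks(freshmen)
rank_s = average_ranks(sophomore)

# d = difference between the ranks of each pair
sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_f, rank_s))
n = len(freshmen)

rho = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
z = rho * sqrt(n - 1)   # statistic for testing the significance of rho

print("sophomore ranks:", rank_s)
print(f"sum of d^2 = {sum_d2}, rho = {rho:.4f}, z = {z:.4f}")
```

On the legend given earlier, the resulting rho falls in the low positive range.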
-
Regression Analysis
Regression Analysis is a statistical technique used to describe relationships among variables. This relationship is expressed in the form of a mathematical equation. The simplest case of such a relationship is when there is a single independent variable (X) explaining the dependent variable (Y) in a linear fashion.
$y = a + bx$
Where a is the intercept and b is the slope or the incremental change in Y when X changes by one unit.
-
Key concepts to remember
Regression analysis is the most widely used technique of
Multivariate Analysis with applications across all types of
problems and all disciplines.
It is a statistical technique that is concerned with describing and
evaluating the relationship between a metric variable called
dependent variable and one or more metric or non-metric
variables called independent variables or regressors.
It attempts to predict the change in the dependent variable as a
result of changes in the independent variables. In addition, the
analysis of the independent variables allows assessment of their
respective explanatory impact on the dependent variable.
NSAT   Achievement Grade
78     82
79     83
80     82
92     91
93     94
86     85
88     87
86     86

SUMMARY OUTPUT
Regression Statistics
Multiple R           0.95
R Square             0.90
Adjusted R Square    0.88666
Standard Error       1.45918
Observations         8

ANOVA
             df   SS         MS        F         Significance F
Regression   1    118.7248   118.725   55.7605   0.0003
Residual     6    12.77516   2.1292
Total        7    131.5

                Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
Intercept (a)   25.46          8.16             3.12     0.02      5.50        45.42
NSAT (b)        0.71           0.10             7.47     0.00      0.48        0.95
-
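Using the formula:
Predicted Achievement grade = 25.46 + 0.71 * 70
= 75.38 (computed with the unrounded coefficients 25.4615 and 0.7131; the rounded coefficients give 75.16)
The value of r-squared indicates the proportion of variability in the achievement grade that can be determined from its relationship with the NSAT scores. Thus, about 90.29% of the variability in achievement grade is accounted for by the NSAT scores.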
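The coefficients in the summary output can be reproduced with ordinary least squares. The sketch below uses the eight NSAT and achievement-grade pairs from the data table and plain Python; it is not a reproduction of how the output was originally generated, and the variable names are invented.

```python
nsat = [78, 79, 80, 92, 93, 86, 88, 86]
grade = [82, 83, 82, 91, 94, 85, 87, 86]

n = len(nsat)
mean_x = sum(nsat) / n
mean_y = sum(grade) / n

# Sums of squares and cross-products about the means
sxx = sum((x - mean_x) ** 2 for x in nsat)
syy = sum((y - mean_y) ** 2 for y in grade)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(nsat, grade))

b = sxy / sxx                      # slope
a = mean_y - b * mean_x            # intercept
r_squared = sxy ** 2 / (sxx * syy)

predicted = a + b * 70             # predicted achievement grade for NSAT = 70

print(f"a = {a:.4f}, b = {b:.4f}, r^2 = {r_squared:.4f}")
print(f"predicted grade at NSAT 70 = {predicted:.2f}")
```

Note that an NSAT score of 70 lies below the smallest observed score (78), so this prediction extrapolates beyond the data.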
-
Module 8 - Selected Nonparametric Statistics
Chi-square test (χ²)
Mann-Whitney U test
Kruskal-Wallis H test
-
Chi-square Test
Significant relationship
Test of goodness-of-fit
Test of independence
$\chi^2 = \sum \frac{(OF - EF)^2}{EF}$; df = (r - 1)(c - 1)
Where:
OF = observed frequency
EF = expected frequency
r = no. of rows, c = no. of columns
-
Sample problem:
Suppose we want to find out if there is a relationship between the students' color preference and personality. The data may be illustrated in the contingency table below:
Observed frequencies:
                     Red   Yellow   Green   Blue   Row totals (fr)
Introvert            10    3        15      22     50
Extrovert            90    17       25      18     150
Column totals (fc)   100   20       40      40     200 (grand total, n)
-
To determine the expected frequencies for each cell, we
use the formula below:
$ef = \frac{(f_r)(f_c)}{n}$
Where:
ef = expected frequency
fr = total frequencies of the corresponding row
fc = total frequencies of the corresponding column
n = grand total
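The expected frequencies and the chi-square statistic for the color-preference table can be computed as sketched below. This is an illustrative Python sketch with invented variable names; it applies the expected-frequency formula to every cell and then sums (OF - EF)² / EF.

```python
# Observed frequencies; columns are Red, Yellow, Green, Blue
observed = {
    "Introvert": [10, 3, 15, 22],
    "Extrovert": [90, 17, 25, 18],
}

rows = list(observed.values())
row_totals = [sum(row) for row in rows]           # fr
col_totals = [sum(col) for col in zip(*rows)]     # fc
n = sum(row_totals)                               # grand total

chi2 = 0.0
expected = []
for row, fr in zip(rows, row_totals):
    expected_row = []
    for of, fc in zip(row, col_totals):
        ef = fr * fc / n                          # expected frequency of the cell
        expected_row.append(ef)
        chi2 += (of - ef) ** 2 / ef
    expected.append(expected_row)

df = (len(rows) - 1) * (len(col_totals) - 1)

print("expected frequencies:", expected)
print(f"chi-square = {chi2:.2f}, df = {df}")
```

The problem does not state a level of significance. If 0.05 is assumed, a standard chi-square table gives a critical value of 7.815 for 3 degrees of freedom, which the computed statistic exceeds by a wide margin.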
-
Testing significant difference using
nonparametric test (two groups)
Mann-Whitney U test
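$U_1 = N_1 N_2 + \frac{N_1(N_1 + 1)}{2} - \sum R_1$
$U_2 = N_1 N_2 + \frac{N_2(N_2 + 1)}{2} - \sum R_2$
Where:
$\sum R_1$ and $\sum R_2$ are the sums of the ranks in groups 1 and 2.
U is the lower value between U1 and U2.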
-
Example:
Treatment Control
4 20
7 17
1 3
12 15
2 7
2 12
9 18
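A sketch of the U computation for this example follows. It is illustrative only: the helper `average_ranks` is a made-up name, the combined scores are ranked from lowest (rank 1) to highest, and tied scores receive the average of their ranks.

```python
treatment = [4, 7, 1, 12, 2, 2, 9]
control = [20, 17, 3, 15, 7, 12, 18]

def average_ranks(values):
    """Map each value to its rank (lowest = 1), averaging the ranks of ties."""
    ordered = sorted(values)
    return {v: ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in set(values)}

# Rank all scores together, then sum the ranks within each group.
rank_of = average_ranks(treatment + control)
r1 = sum(rank_of[v] for v in treatment)
r2 = sum(rank_of[v] for v in control)

n1, n2 = len(treatment), len(control)
u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
u = min(u1, u2)   # U is the lower of the two values

print(f"R1 = {r1}, R2 = {r2}")
print(f"U1 = {u1}, U2 = {u2}, U = {u}")
```

As a check, U1 + U2 should equal N1 * N2. The resulting U is then compared with the tabled critical value for the two group sizes.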
-
Testing significant difference
(3 or more groups)
Parametric test (distribution is normal)
ANOVA (Analysis of variance)
Nonparametric test
Kruskal-Wallis
$H = \frac{12}{n(n + 1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(n + 1)$, df = k - 1
-
To compare four bowling balls, a professional bowler bowls five
games with each ball and gets the following scores:
Bowling ball A 221 232 207 198 212
Bowling ball B 202 225 252 218 226
Bowling ball C 210 205 189 196 216
Bowling ball D 229 192 247 220 208
Use the H test at 0.05 level of significance to test the null hypothesis that on the average the bowler performs equally well with the four bowling balls.
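The H statistic for the bowling data can be computed as sketched below. This is an illustration in plain Python with invented names: all 20 scores are ranked together from lowest (rank 1) to highest, and the rank sums R_i are taken per ball. The helper averages tied ranks for generality, although these data contain no ties.

```python
balls = {
    "A": [221, 232, 207, 198, 212],
    "B": [202, 225, 252, 218, 226],
    "C": [210, 205, 189, 196, 216],
    "D": [229, 192, 247, 220, 208],
}

# Pool and sort all scores so each score can be given an overall rank.
all_scores = sorted(s for scores in balls.values() for s in scores)

def rank(score):
    """Overall rank of a score (lowest = 1), averaging ranks of ties."""
    return all_scores.index(score) + 1 + (all_scores.count(score) - 1) / 2

rank_sums = {name: sum(rank(s) for s in scores) for name, scores in balls.items()}

n = len(all_scores)   # total number of observations
k = len(balls)        # number of groups

h = 12 / (n * (n + 1)) * sum(
    rank_sums[name] ** 2 / len(scores) for name, scores in balls.items()
) - 3 * (n + 1)
df = k - 1

print("rank sums:", rank_sums)
print(f"H = {h:.4f}, df = {df}")
```

At the 0.05 level with 3 degrees of freedom, a standard chi-square table gives a critical value of 7.815. The computed H is below it, so under these assumptions the null hypothesis of equal performance would not be rejected.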