Copyright © by Houghton Mifflin Company, Inc. All rights reserved First Edition
Chapter 11
Sampling Foundations
Chapter Objectives
• Define and distinguish between sampling
and census studies
• Discuss when to use a probability versus a
nonprobability sampling method and
implement the different methods
• Explain sampling error and sampling
distribution
Chapter Objectives (Cont’d)
• Construct confidence intervals for
population means and proportions
• List the factors to consider in determining
sample size, and compute the required
sample size to achieve a specific degree of
precision at a desired confidence level
National Poll – Sample Size
• Harris Poll
– A weekly study that monitors the reactions of
the American public to a variety of economic,
political, and social issues
• Sample Size
– Based on a nationally representative telephone
survey of 1,000 adults age 18 or over
AC Nielsen SCANTRACK Index
• Offers valuable scanner-based sales and brand
share data on a regular basis to manufacturers of a
wide variety of consumer products such as food,
drugs, cosmetics
• Sample Size
– Sales and brand share estimates are gathered weekly
from a representative sample of more than 4,800 stores
representing over 800 retailers in 50 major markets
Sampling vs. Census Studies
• A census study draws inferences from the
entire body of units of interest
• A sample study draws inferences about the
population from a sample drawn from it
Advantages of Sampling
• Low Cost
• Reduced time
Sampling and
Nonsampling Errors
• Sampling error: The difference between a statistic
value that is generated through a sampling
procedure and the parameter value, which can be
determined only through a census study
• Nonsampling error: Any error in a research study
other than sampling error (which arises purely
because a sample, rather than the entire
population, is studied)
Minimizing Sampling Errors
• Increase the sample size
• Use a statistically efficient sampling plan
• Make the sample as representative of the
population as possible
Types of Nonsampling Errors
• Nonsampling Error
– Any error other than sampling error
• Sampling Frame Error
– Sampling frame not being representative of ideal population
• Nonresponse Error
– Final sample not representative of planned sample
• Data Error
– Distortions in collected data and mistakes in data coding, analysis, or interpretation
Potential Causes of Sampling
Frame Errors
• Incomplete sampling frame overrepresents
some population segments and
underrepresents others
• Sampling frame contains irrelevant units
Minimizing Sampling Frame
Errors
• Start with a complete sampling frame
• Modify the sampling frame to make it
representative of the ideal population, for
example by using plus-one dialing in
telephone surveys
Potential Causes of Nonresponse
Errors
• Mail surveys/Internet Surveys
– Certain types of sample units being more likely
to respond than others
• Telephone and personal interview surveys
– Person not-at-home problem and respondent
refusal problem
Minimizing Nonresponse Errors
• Mail surveys: increase response rates through the use of incentives, follow-up mailings, etc.
– Caution: an increase in response rate per se may not reduce nonresponse error
• Telephone and personal interview surveys: make call-backs and spread out the time blocks during which interviews are conducted
Potential Causes of Data Errors
• Respondents’ reluctance/inability to give
accurate answers
• Ill-trained interviewers
• Unscrupulous interviewers
• Poorly designed questionnaire
• Mistakes in coding data
• Erroneous analysis
• Incorrect/inappropriate interpretation of results
Exhibit 11.1 Types and Potential Causes of Nonsampling Errors
[Figure: nested portions of the population, shown for a telephone survey and for an online survey]
• Total population of interest
• Portion of population that has access to the medium (telephone, online)
• Portion of population that has access and volunteers (does not refuse, opts in)
• Portion of population that has access, volunteers, and completes (responds, does not opt out)
Source: Adapted from Thomas W. Miller, “Can We Trust the Data of Online Research,” Marketing Research (Summer 2001), Vol. 13, No. 2, p. 31.
When Census Studies Are
Appropriate
• The feasibility condition
– Whenever a population is relatively small or
can be accessed easily
• The necessity condition
– When the population units are extremely varied
and each population unit is likely to be very
different from all the other units
Probability and Nonprobability
Sampling
• Probability sampling is an objective
procedure in which the probability of
selection is known in advance for each
population unit
• Nonprobability sampling is a subjective
procedure in which the probability of
selection for each population unit is
unknown beforehand
Exhibit 11.3 Classification of Sampling Methods
Sampling methods
• Probability sampling
– Simple random sampling
– Stratified sampling
• Proportionate stratified random sampling
• Disproportionate stratified random sampling
– Cluster sampling
• Simple cluster sampling
• Systematic sampling
• Nonprobability sampling
– Convenience sampling
– Judgment sampling
– Quota sampling
Probability Sampling Methods
• Simple Random Sampling
• Stratified Random Sampling
• Cluster Sampling
Gallup Poll: USA
• Identify and describe the population that a given
poll is attempting to represent
• Choose or design a method that will enable Gallup
to sample the target population randomly
• Random Digit Dialing (RDD): a procedure that
creates a list of all possible household phone
numbers in America and then selects a sub-set of
numbers from that list for Gallup to call
Simple Random Sampling
• Every possible sample of a certain size
within a population has a known and equal
probability of being chosen as the study
sample
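The definition above can be illustrated with a minimal sketch. It assumes a complete sampling frame is available as a list; the frame of 10,000 unit IDs, the function name, and the seed are hypothetical and not taken from the slides.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units so that every possible sample of size n is equally likely."""
    rng = random.Random(seed)          # a seeded generator keeps the draw reproducible
    return rng.sample(list(frame), n)  # sampling without replacement


# Hypothetical frame: unit IDs 1..10,000
frame = range(1, 10_001)
sample = simple_random_sample(frame, 500, seed=42)
```

Because `random.sample` draws without replacement, no unit can appear twice in the study sample.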
Stratified Random Sampling
• Two Types of Stratified Random Sampling:
– Proportionate Stratified Random Sampling
– Disproportionate Stratified Random Sampling
Proportionate Stratified Random
Sampling
• Sample consists of units selected from each
population stratum in proportion to the total
number of units in the stratum
Kirkwood University-
Proportionate Stratified Random
Sampling
• Administrators of Kirkwood University wanted to
determine the attitudes of their students toward
various aspects of the university
• They selected a proportionate stratified random
sample of 500 students for conducting the attitude
survey
Table 11.2 Proportionate Allocation of Total Sample of Kirkwood University Students
Population Strata   Number of Population Units   Number of Sample Units Allocated
Freshman            3,000                        150
Sophomores          3,000                        150
Juniors             2,000                        100
Seniors             2,000                        100
Total               10,000                       500
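The allocation in Table 11.2 can be reproduced with a short sketch. The function name is hypothetical, and the simple rounding used here is an assumption that works for the Kirkwood figures; strata whose shares do not divide evenly would need a rounding rule that keeps the total fixed.

```python
def proportionate_allocation(strata_sizes, total_sample):
    """Allocate the sample to strata in proportion to each stratum's population size."""
    population = sum(strata_sizes.values())
    return {
        stratum: round(total_sample * size / population)
        for stratum, size in strata_sizes.items()
    }


kirkwood = {"Freshman": 3000, "Sophomores": 3000, "Juniors": 2000, "Seniors": 2000}
allocation = proportionate_allocation(kirkwood, 500)
```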
Gallup Poll on Sampling: China
• 12,500 counties, cities, and urban districts were
divided into 50 strata based on their geographic
location, degree of economic development, and
proportion of non-agricultural population
• One primary sampling unit (PSU), consisting of
either a county or a city, was selected from each
stratum based on probability proportional to
population size
Gallup Poll on Sampling: China
(Cont’d)
• Within each PSU, the populations of all
neighborhoods and villages were compiled.
From this listing, four neighborhoods or
villages were selected proportional to size.
• From each of these four neighborhoods or
villages, five households were selected at
random
Gallup Poll on Sampling: China
(Cont’d)
• One respondent was selected from each of
the selected households, ensuring proper
representation in the sample of all age
groups by both genders
• The respondent to be interviewed is then
selected according to a prescribed
systematic procedure
Gallup Poll on Sampling: China (Cont’d)
• If the designated respondent was not at home, or could not be reached, a second or, if needed, a third adult family member was selected systematically from among the household members remaining on the list
• If contact with the designated respondent could not be made after a total of three separate visits to the household, an interview with a respondent in a substitute household in the same locality was permitted
• Two substitute households were kept in reserve for each five assigned households in the interviewing area
Gallup: India
• Design of the sample:
– GALLUP INDIA PVT. LTD. interviewed a
total of 5,122 Indian adults age 18 years and
over (one per household) in late March and
early April 1996
– Nationwide survey involved in-person
interviews in 144 villages and 84 towns and
cities across India
Gallup: India (Cont’d)
• Urban Sample (design = 1,600 interviews)
– Three hundred eighty districts in India (excluding those in Jammu-Kashmir, the northeastern states, and other difficult-to-access areas such as the Andaman and Nicobar Islands) were classified into 20 strata based on their geographical (zonal) location and urban population
– Across these 20 strata, 40 districts were chosen
• In each selected district, two towns were picked on the basis of probability proportional to size
– From each selected town, 2 colonies were selected randomly, and 10 households were selected from each colony
• From each household, one respondent was chosen, i.e., either male or female above 18 years of age
Gallup: India (Cont’d)
• Rural Sample (design = 1,440 interviews)
– After 40 districts were chosen for the urban sample, the
remaining 340 districts were divided into 12 strata
based on their geographical (zonal) location and rural
population
• On average, two districts were selected from each stratum.
– From each household, one respondent was chosen on
the same criterion of demographics, i.e., either male or
female above 18 years of age
Gallup: India (Cont’d)
• Urban Oversample (design = 2,000 interviews, 400 per metro)
– The urban oversample represented five of the country’s major metropolitan areas: Bombay, Delhi, Calcutta, Madras, and Bangalore
– Within each metropolitan area, an average of 13 electoral wards were chosen on a probability proportional to size basis
– Within each electoral ward, four colonies were randomly selected
• In each colony, eight households were randomly selected
– One respondent was interviewed per household
Gallup: India (Cont’d)
• Results are projectable to within ±3 percent
for India as a whole, ±2 percent for urban
India in general, and ±7 percent for each of
India’s five largest cities
• Urban and rural India were considered as
separate domains for purposes of sampling
Disproportionate Stratified
Random Sampling
• Sample consists of units selected from each
population stratum according to how varied
the units are within the stratum
Exhibit 11.4 Disproportionate Stratified Random Sampling Used by A.C. Nielsen Company
                                           Percent of stores
Store Type                                 In Universe   In NFL Sample   Average Store Size   Take Ratio
Chain (Includes Convenience Chains)        25.2%         47.9%           $2,445,000           1 out of every 39
Large Independent (Over $500,000)          12.8%         24.9%           $1,700,000           1 out of every 69
Medium Independent ($100,000 - $500,000)   32.6%         17.6%           $234,000             1 out of every 248
Small Independent (Under $100,000)         29.4%         9.6%            $55,000              1 out of every 360
Cluster Sampling
• Clusters of population units are selected at
random and then all or some units in the
chosen clusters are studied
Systematic Sampling Steps
• An organized procedure, selecting a sample from a list containing all the population units
• Steps:
1) Determine the sampling interval, k:
$$k = \frac{\text{number of units in the population}}{\text{number of units desired in the sample}}$$
Systematic Sampling Steps
(Cont’d)
• Steps (cont’d):
• 2) Choose randomly one unit between the
first and kth units in the population list
• 3) The randomly chosen unit and every kth
unit thereafter are designated as part of the
sample
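The three steps can be sketched as follows. This is a minimal illustration that assumes the population list is in memory and that the desired sample size does not exceed the list length; the function name, the list of 1,000 IDs, and the seed are hypothetical.

```python
import random


def systematic_sample(units, n, seed=None):
    """Select a random start among the first k units, then every kth unit after it."""
    k = len(units) // n           # step 1: sampling interval (integer part)
    rng = random.Random(seed)
    start = rng.randrange(k)      # step 2: zero-based index of a unit among the first k
    return units[start::k][:n]    # step 3: that unit and every kth unit thereafter


# Hypothetical population list of 1,000 unit IDs; 50 units desired, so k = 20
population_list = list(range(1, 1001))
chosen = systematic_sample(population_list, 50, seed=7)
```

When the population size is not an exact multiple of the sample size, the final slice trims any extra unit so that exactly n units are returned.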
Practical Considerations:
Probability Sampling Methods
• Probability sampling techniques are
generally used by large commercial
marketing research firms that maintain
national samples or panels that can be
readily accessed for conducting periodic
research surveys
Nonprobability Sampling
Methods
• Convenience Sampling
• Judgment Sampling
• Quota sampling
Convenience Sampling
• Researcher's convenience forms the basis for selecting a sample of units
– The administrators of a college have announced a sharp increase in tuition fees for the next year.
– A TV reporter covering this news item is shown standing on campus talking to several students, one at a time, about their reactions to the proposed tuition fee increase.
– TV Reporter says: “While some of the students feel that the 10 percent fee hike is justified, most of them consider it to be unfair.”
Judgment Sampling
• A procedure in which a researcher exerts some effort in selecting a sample that he or she believes is most appropriate for a study
• Example:
– The administrators of a college have announced a sharp increase in tuition fees for the next year
– A judgment sample of student officers may be more representative than a convenience sample of students
– The researcher should be knowledgeable about the ideal population for a study
Quota Sampling
• Involves sampling a quota of units to be selected from each population cell based on the judgment of the researchers and/or decision makers
• Steps:
– 1) Divide the population into segments (referred to as
cells) based on certain control characteristics
– 2) Determine the quota of units for each cell (quotas
are determined by the researchers and/or decision
makers)
– 3) Instruct the interviewers to fill the quotas assigned
to the cells
Quota Sampling Plan for the
Newspaper Subscriber Survey
                     Gender
Geographic Segment   Male   Female
I                    30     30
II                   30     30
III                  30     30
IV                   30     30
V                    30     30
Total sample size = 300
Quota Sampling Plan for a Survey of Attitudes
Toward Social Welfare Programs
          Highest Education Level
Age       Less than High School   High School Diploma   Some College   College Degree
18-30     100                     100                   100            100
31-45     100                     100                   100            100
46-60     100                     100                   100            100
Over 60   100                     100                   100            100
Total sample size = 1600
Parameter & Statistic
• Parameter
– The actual, or true, population mean value or
population proportion for any variable
• income, product ownership
• Statistic
– An estimate of a parameter from sample data
Sampling Error
• Sampling Error = Parameter Value -
Statistic Value
• Difference between a statistic value that is
generated through a sampling procedure and
the parameter value, which can be
determined only through a census study
Sampling Distribution
• Representation of the sample statistic values
obtained from every conceivable sample of
a certain size chosen from a population by
using a specified sampling procedure along
with the relative frequency of occurrence of
those statistic values
Sampling Distribution
[Figure: sampling distribution curve; surviving labels: µX, SX, C]
Table 11.4 Expenditures for Eating Out for a Hypothetical Population
Family Number   Annual expenditure for eating out ($)
1               50
2               100
3               150
4               200
5               250
6               300
7               350
8               400
9               450
10              500
Table 11.5 Partial List of Possible Samples and Sample Means
Samples of Two Families         Sample Mean Values ($)
1,2                             75
1,6; 2,5; 3,4                   175
1,10; 2,9; 3,8; 4,7; 5,6        275
5,10; 6,9; 7,8                  375
9,10                            475
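The full sampling distribution behind Table 11.5 can be enumerated directly. The sketch below assumes the ten-family population of Table 11.4 and simple random samples of two families drawn without replacement; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Table 11.4: family i spends 50 * i dollars per year on eating out
expenditure = {family: 50 * family for family in range(1, 11)}

# Every possible sample of two distinct families
samples = list(combinations(expenditure, 2))

# Sample mean for each possible sample
means = [(expenditure[a] + expenditure[b]) / 2 for a, b in samples]

# Relative frequency of each sample mean value: the sampling distribution
distribution = {
    mean: Fraction(count, len(samples))
    for mean, count in sorted(Counter(means).items())
}

population_mean = sum(expenditure.values()) / len(expenditure)
mean_of_sample_means = sum(means) / len(means)
```

There are 45 possible samples, and the mean of all 45 sample means equals the population mean of $275.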
Exhibit 11.5 Sampling Distribution (Bar Chart) for Simple Random Samples of Two Units
[Bar chart: x-axis, sample mean values ($) from 75 to 475 in steps of 25; y-axis, relative frequency from 0 to 6/45]
Exhibit 11.6 Sampling Distribution Shown as a Histogram
[Histogram: x-axis, sample mean values (100.0 to 500.0); y-axis, frequency of occurrence; the population mean value is marked and a normal probability distribution is shown]
Central Limit Theorem
Distribution   Mean   Standard Deviation
Population     µ      σ
Sample         x̄      s
Sampling       µx̄     sx̄
Confidence Estimation for
Interval Data
n = number of units in the sample
x̄ = sample mean value
s = sample standard deviation
$s_{\bar{x}} = s / \sqrt{n}$ = estimated standard error of the mean
Confidence Estimation for
Interval Data (Cont’d)
• Given n = 100, x̄ = 1,278 units, and s = 399 units
• To construct a 95 percent confidence interval:
$$s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{399}{\sqrt{100}} = 39.9 \text{ units}$$
• The 95 percent confidence interval is
$$\bar{x} \pm 1.96\, s_{\bar{x}} = 1{,}278 \pm (1.96)(39.9) = 1{,}278 \pm 78.204 \approx 1{,}278 \pm 78$$
Confidence Estimation for
Interval Data (Cont’d)
• Interpretation
– From the sample data, we can be 95 percent
confident that the average annual sales of men's
suits, across all men's clothing stores in the
population, are between 1,200 and 1,356 units
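The men's suits interval can be checked with a small sketch. The function name is hypothetical, and the z value of 1.96 assumes a 95 percent confidence level with a sample large enough for the normal approximation.

```python
import math


def mean_confidence_interval(sample_mean, s, n, z=1.96):
    """Return (lower, upper) limits of the confidence interval for a population mean."""
    standard_error = s / math.sqrt(n)   # estimated standard error of the mean
    half_width = z * standard_error
    return sample_mean - half_width, sample_mean + half_width


# Values from the slide: n = 100, sample mean = 1,278 units, s = 399 units
lower, upper = mean_confidence_interval(1278, 399, 100)
```

Rounding the two limits to whole units gives the 1,200 to 1,356 range quoted in the interpretation.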
Finding Confidence Intervals for Population Proportions
• π = true population proportion (i.e., the parameter value)
• p = proportion obtained from a single sample (i.e., the statistic value)
$$p = \frac{\text{number of sample units having a certain feature}}{\text{total number of sample units (i.e., } n)}$$
• s_p = estimate of the standard error of the sample proportion
$$s_p = \sqrt{\frac{p(1 - p)}{n}}$$
• Confidence interval for the population proportion:
$$p - 1.96\, s_p \le \pi \le p + 1.96\, s_p$$
Finding Confidence Intervals for
Population Proportions (Cont’d)
Given n = 100 and p = .64. To construct a 95 percent
confidence interval for the population proportion:
$$s_p = \sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{(.64)(.36)}{100}} = .048$$
The 95 percent confidence interval is
$$p \pm 1.96\, s_p = .64 \pm (1.96)(.048) = .64 \pm .09408 \approx .64 \pm .09$$
Finding Confidence Intervals for
Population Proportions (Cont’d)
• Interpretation
– This confidence interval can also be expressed
in percentage terms: 64% ± 9%
– In other words, we can be 95 percent confident
that between 55 and 73 percent of all grocery
stores in the city carry potted plants
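The grocery store interval can be verified the same way. This is a sketch under the normal approximation for a sample proportion; the function name is hypothetical.

```python
import math


def proportion_confidence_interval(p, n, z=1.96):
    """Return (lower, upper) limits of the confidence interval for a population proportion."""
    standard_error = math.sqrt(p * (1 - p) / n)   # estimated standard error of p
    half_width = z * standard_error
    return p - half_width, p + half_width


# Values from the slide: n = 100 stores, p = .64 carry potted plants
lower_p, upper_p = proportion_confidence_interval(0.64, 100)
```

The limits are about .546 and .734, which the slide rounds to 55 and 73 percent.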
Factors Influencing Sample Size
• Desired precision level
• Desired confidence level
• Degree of variability
• Resources available
Methods for Determining Sample
Size
• The desired precision level
• The desired confidence level
• An estimate of the degree of variability in
the population, expressed in the form of a
standard deviation
Sample Size Estimation
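• H → desired precision level (half-width of the confidence interval)
• q → desired confidence level (z_q is the corresponding standard normal value)
• s → estimate of the standard deviation
• n → required sample size
$$H = \frac{z_q s}{\sqrt{n}} \quad\Longrightarrow\quad n = \frac{z_q^2 s^2}{H^2}$$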
Sample Size Estimation (Cont’d)
• A marketing manager of a frozen-foods firm
wants to estimate within ±$10 the average annual
amount that families in a certain city spend on
frozen foods per year and have 99 percent
confidence in the estimate
• He estimates that the standard deviation of annual
family expenditures on frozen foods is about $100
• How many families must be chosen for this
study?
Sample Size Estimation (Cont’d)
H = $10, s = $100, and z_q = 2.575
(corresponding to a confidence level of 99 percent)
$$n = \frac{(2.575)^2 (100)^2}{(10)^2} \approx 663 \text{ families}$$
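The frozen-foods calculation can be reproduced with a short sketch. The function name is hypothetical; note that the exact result is slightly above 663, so a strict reading of the precision requirement (an assumption not made on the slide) would round up to 664.

```python
import math


def sample_size_for_mean(precision, s, z):
    """Sample size needed to estimate a mean within +/- precision at the confidence level implied by z."""
    return (z * s / precision) ** 2


# Values from the slide: H = $10, s = $100, z = 2.575 (99 percent confidence)
n_exact = sample_size_for_mean(10, 100, 2.575)   # about 663.06
n_rounded = round(n_exact)                        # the slide's approximate answer
n_conservative = math.ceil(n_exact)               # rounding up meets the precision target
```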
Determining Sample Size
• A sporting goods marketer wants to estimate the proportion of tennis players among high school students in the United States
• The marketer wants the estimate to be accurate within ±.02 and wants to have 95 percent confidence in the interval estimate
• A pilot telephone survey of 50 high school students showed that 20 of them played tennis. Estimate the required sample size for the final study from the given data
• What should the sample size be if the desired precision and confidence levels are to be guaranteed?
Determining Sample Size (Cont’d)
H = .02 and z_q = 1.96. p = 20/50 = 0.4
$$s = \sqrt{p(1 - p)} = \sqrt{(.4)(.6)} = \sqrt{.24}$$
$$n = \frac{z_q^2 s^2}{H^2} = \frac{(1.96)^2 (.24)}{(.02)^2} \approx 2{,}305 \text{ students}$$
The maximum sample size (when p = .5, so that s^2 = .25) is
$$n_{max} = \frac{.25\, z_q^2}{H^2} = 2{,}401 \text{ students}$$
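Both tennis-player sample sizes follow from one formula, as the sketch below shows. The function name is hypothetical; the default of p = 0.5 reflects the fact that p(1 - p) is largest at .5, which is what guarantees the desired precision whatever the true proportion turns out to be.

```python
def sample_size_for_proportion(precision, z, p=0.5):
    """Sample size needed to estimate a proportion within +/- precision."""
    return z ** 2 * p * (1 - p) / precision ** 2


# Pilot survey: 20 of 50 students played tennis, so p = 0.4
n_pilot = sample_size_for_proportion(0.02, 1.96, p=20 / 50)

# Worst case: p = 0.5 maximizes p(1 - p) at .25
n_max = sample_size_for_proportion(0.02, 1.96)
```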
Chapter 12
Quality Control and Initial Analysis of Data
Chapter Objectives
• Define editing and distinguish between a field edit
and an office edit
• Define coding and outline the steps it involves
• Compute measures of central tendency and
dispersion of the data for each variable in a data
set
• State the potential uses of frequency distribution
or one-way tables
Data Analysis at Rockbridge Associates: Data Integrity
• Data integrity is the foundation for successful marketing research
• Rockbridge ensures integrity in the collection and processing of the data by a number of quality control checks for
– mail surveys
– telephone surveys
– web surveys
• Rockbridge ensures data integrity in how the results are interpreted and explained to management
Editing
• Editing is the process of examining
completed data collection forms and taking
whatever corrective action is needed to
ensure the data are of high quality
– Preliminary or field edit
– Final or office edit
Field Edit
• A field edit, or preliminary edit, is a quick
examination of completed data collection forms,
usually on the same day they are filled out
• Objectives
– Ensure that proper procedures are being followed in
selecting respondents, interviewing them, and recording
their responses
– Fix fieldwork deficiencies before they turn into major
problems
Office Edit
• A final, or office edit, verifies response
consistency and accuracy
– Makes necessary corrections
– Determines whether some or all parts of a data
collection form should be discarded
What Is Wrong With this
Response…
• A respondent said he was 18 years old but
indicated that he had a Ph.D. when asked
for his highest level of education.
Editing Can Help Uncover
• Improper field procedures
• Incomplete interviews
• Improperly conducted interviews
• Technical problems with the questionnaire or interview
• Respondent rapport problems
• Consistency problems that can be isolated and reconciled
Improper Field Procedures
• Wrong questionnaire form used
• Interview inadvertently not taken
Incomplete Interviews
• Questions not asked
• Directions not followed (proper segments of
the questionnaire were not administered)
Improperly Conducted Interviews
• The wrong respondent interviewed (e.g., son instead of father)
• Questions misinterpreted by interviewer or respondent
• Evidence of bias or influencing of answers.
• Failure to probe for adequate answers or the use of poor probes
• Interviewer's illegible writing and/or style.
• Interviewer recorded information which identified a respondent whose anonymity should have been protected
Improperly Conducted Interviews (Cont’d)
• Interviewer apparently does not understand what type of responses constitute an answer to the actual question asked
• Interviewer does not understand what the objective of the question is and thus accepts an improper frame of reference for the respondent's answer
• Other evidence of need for training or instructions to be given to interviewer
– failure to write down probes, wrong abbreviations, failure to follow directions
Technical Problems With the Questionnaire or Interview
• Space was not provided for needed information
• The presence of unanticipated or unusually frequent extreme responses to questions, indicating a possible need for rewording of certain questions
• Inappropriate or unworkable interviewer instructions not detected in the pretest
• The order in which questions were asked introduces confusion, resentment, or bias into the respondent's answers
Respondent Rapport Problems
• Frequent refusal to answer certain questions.
• Reports of abnormal termination of the interview
(or presence of hostility) due to sensitive questions
• Evidence that respondent and interviewer are
playing the "game" of "What answer do you want
me to give?"
• Evidence that the presence of other people in the
interview situation is causing problems
Consistency Problems That Can Be Isolated and Reconciled
• Contradictory answers
– reports no savings in one section of the interview but reports interest from bank accounts in another section
• Misclassification
– mortgage debt improperly reported as installment debt
• Impossible answers
– reports paying $600 for a new Edsel in 1970 (the car should have been recorded as a "used" car), or reports weekly income on the income-per-month line
Consistency Problems That Can Be
Isolated and Reconciled (Cont’d)
• Unreasonable (and probably erroneous) responses
– Respondent reports borrowing $2000 for two years to
buy a car but reported monthly payments multiplied by
24 months are less than $2000
– Respondent reports that the house value is $90,000
while income is $2000 per year and the respondent
claims less than a high school education
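Consistency checks of this kind can be partly automated during the office edit. The sketch below is illustrative only: the field names, the two rules, and the example records are assumptions modeled on the Ph.D. and car-loan examples in the slides, not a procedure the source prescribes.

```python
def consistency_flags(record):
    """Return a list of office-edit warnings for one coded questionnaire."""
    flags = []
    # An 18-year-old reporting a Ph.D. is almost certainly a recording error
    if record["age"] <= 18 and record["education"] == "Ph.D.":
        flags.append("age inconsistent with education")
    # Total repayments should not be smaller than the amount borrowed
    if record["monthly_payment"] * record["loan_months"] < record["loan_amount"]:
        flags.append("repayments less than amount borrowed")
    return flags


# Hypothetical records
suspect = {"age": 18, "education": "Ph.D.",
           "loan_amount": 2000, "loan_months": 24, "monthly_payment": 75}
plausible = {"age": 45, "education": "College Degree",
             "loan_amount": 2000, "loan_months": 24, "monthly_payment": 95}

suspect_flags = consistency_flags(suspect)
plausible_flags = consistency_flags(plausible)
```

A flagged record still needs an editor's judgment: the flag only isolates the problem so that it can be reconciled or the response discarded.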
Preventing Errors
• Careful planning before fieldwork begins
• Automating data entry
Coding
• Coding broadly refers to the set of all tasks
associated with transforming edited responses into
a form that is ready for analysis
• Steps
– Transforming responses to each question into a set of
meaningful categories
– Assigning numerical codes to the categories
– Creating a data set suitable for computer analysis
Transforming Responses into
Meaningful Categories
• A structured question is pre-categorized
• Responses to a nonstructured or open-ended
question need to be grouped into a meaningful
and manageable set of categories
The Best Way to Treat "Don't
Know" Responses
• Infer an actual response (dubious validity)
• Classify the "don't know's" as a separate
response category for each question
Missing-Value Category
• A missing value can stem from
– A respondent's refusal to answer a question
– An interviewer's failure to ask a question or
record an answer or a "don't know" that does
not seem legitimate
• Best way to treat missing value responses
– Sound questionnaire design
– Tight control over fieldwork
Assigning Numerical Codes
• Assign appropriate numerical codes to
responses that are not already in quantified
form
• Assign the numerical codes in a way that
facilitates computer manipulation and
analysis of the responses
Coding Multiple Response
• Which of the following countries have you visited during the past 12 months?
________Canada
________England
________France
________Germany
________Japan
________Mexico
• Need six variables, each relating to a specific country and having two possible values --for example, 1= “No” and 2 = “Yes”
• Six columns must be set aside in the data spreadsheet to record responses to this question
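The coding scheme on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `code_multiple_response` and the sample respondent are assumptions, while the six countries and the 1 = "No" / 2 = "Yes" codes come from the slide.

```python
# The six answer options from the slide, in spreadsheet column order.
COUNTRIES = ["Canada", "England", "France", "Germany", "Japan", "Mexico"]
NO, YES = 1, 2  # coding used on the slide: 1 = "No", 2 = "Yes"


def code_multiple_response(checked):
    """Return one coded value per country (hypothetical helper)."""
    checked = set(checked)
    return [YES if country in checked else NO for country in COUNTRIES]


# Hypothetical respondent who ticked Canada and Japan.
row = code_multiple_response(["Canada", "Japan"])
print(dict(zip(COUNTRIES, row)))
```

Each respondent therefore contributes six values to the data set, one per column, regardless of how many countries were ticked.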
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 12-93
Multiple Response Question –
Rank Order Question
• Please rank the following fast-food restaurants by placing a 1 beside the restaurant you think is best overall, a 2 beside the restaurant you think is second best, and so on.
__________Burger King
__________McDonald's
__________Wendy's
__________Whataburger
• This question requires as many variables (and columns) as there are objects to be ranked
• 4 separate variables are needed
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 12-94
Creating a Data Set
• Organized collection of data records
• Each sample unit within the data set is called a Case or Observation
• Structure of a Data Set
– The number of observations is n
– The total number of variables embedded in the questionnaire is m
– The data set is therefore an n x m matrix of numbers
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 12-95
Table 12.3 Structure of a Data Sheet
                    Variables
Observation   1      2      ……   j      ……   m
1             x11    x12    ……   x1j    ……   x1m
2             x21    x22    ……   x2j    ……   x2m
…
i             xi1    xi2    ……   xij    ……   xim
…
n             xn1    xn2    ……   xnj    ……   xnm
Note: x11 is respondent 1’s response to variable 1.
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 12-96
Preliminary Data Analysis:
Basic Descriptive Statistics
• Preliminary data analysis examines the
central tendency and the dispersion of the
data on each variable in the data set
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 12-97
Measures of Central Tendency and
Dispersion for Different Types of Variables
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 12-98
Measurement Level of Data
Pertaining to Variable–Nominal
• Measures of Central Tendency
– Mode: Most frequently occurring response
• Measures of Dispersion
– Strictly speaking, the concept of dispersion is
not meaningful for nominal data
– An idea about the distribution of responses can
be obtained by examining their relative
frequencies of occurrence
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 12-99
Measurement Level of Data
Pertaining to Variable –Ordinal
• Measures of Central Tendency
– Median: 50th percentile response
• Measures of Dispersion
– Range: Defined by the highest and lowest
response values
– Interquartile range: Difference between the
75th and 25th percentile responses
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 12-100
Measurement Level of Data
Pertaining to Variable– Interval
• Measures of Central Tendency
– Mean: Arithmetic average of response values
• Measures of Dispersion
– Standard deviation: As defined in Chapter 9
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 12-101
Measurement Level of Data
Pertaining to Variable– Ratio
• Measures of Central Tendency
– Mean: Arithmetic average of response values
• Measures of Dispersion
– Standard deviation: As defined in Chapter 9
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 12-102
Mode
• The value that occurs most frequently
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 12-103
Table 12.5 How Long Have You Been
Using the Services of National? –
Computing Mode
Length of Service (USE)    Assigned Value    Count/Frequency
Less than 1 year           1                 36
1 to less than 2 years     2                 16
2 to less than 5 years     3                 26
5 years or more            4                 193
Total                                        271
Mode = 4 (the most frequently occurring value)
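As a quick check outside SPSS, the mode can be read straight off the frequency table. The sketch below uses the counts from Table 12.5; the variable names are illustrative assumptions.

```python
# Frequency table from Table 12.5: assigned value -> count.
frequencies = {1: 36, 2: 16, 3: 26, 4: 193}

total = sum(frequencies.values())

# The mode is the value with the largest count.
mode_value = max(frequencies, key=frequencies.get)

# Relative frequencies give an idea of how responses are distributed.
relative = {value: count / total for value, count in frequencies.items()}

print(f"n = {total}, mode = {mode_value}")
for value, share in relative.items():
    print(f"value {value}: {share:.1%}")
```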
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 12-104
Table 12.5 How Long Have You Been
Using the Services of National? –
Computing Mode (Cont’d)
In SPSS: 1. Select ANALYZE;
2. Click DESCRIPTIVE STATISTICS,
3. Select FREQUENCIES,
4. Move the variable “USE” to the Variable(s) box,
5. Click STATISTICS box,
6. Select MODE,
7. Click CONTINUE, and
8. Click OK.
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 12-105
Table 12.5 How Long Have You Been Using the
Services of National? –Computing Mode (Cont’d)
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 12-106
Table 12.5 How Long Have You Been
Using the Services of National? –
Computing Mode (Cont’d)
1= Less than a year
2 = 1 to less than 2 years
3 = 2 to less than 5 years
4 = 5 years or more
most frequently occurring value
= mode = 4
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 12-107
Median
• The observation below which 50 percent of
the observations fall
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 12-108
Table 12.6 Length of Time Service Used –
Responses from 20 Customers
How long have you been using the services of National?
4 3 4 1 4 4 4 4 4 4 3
4 4 3 4 4 4 3 1 1
1= Less than a year; 2 = 1 to less than 2 years; 3 = 2 to less than 5 years;
4 = 5 years or more
Arranging the 20 values in ascending order:
1 1 1 3 3 3 3 4 4 4 4
4 4 4 4 4 4 4 4 4
Because the sample size = 20, there are two middle values: 4 and 4. The
median is, therefore, the average of the two middle values = 4.
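The same calculation can be reproduced with Python's standard library. The 20 responses are those listed in Table 12.6; the variable names are illustrative assumptions.

```python
import statistics

# The 20 coded responses from Table 12.6, in the order listed.
responses = [4, 3, 4, 1, 4, 4, 4, 4, 4, 4, 3,
             4, 4, 3, 4, 4, 4, 3, 1, 1]

ordered = sorted(responses)

# With an even sample size there are two middle values (10th and 11th).
middle_pair = ordered[len(ordered) // 2 - 1: len(ordered) // 2 + 1]

# statistics.median averages the two middle values for even n.
median_value = statistics.median(responses)

print("ordered:", ordered)
print("middle values:", middle_pair, "median:", median_value)
```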
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 12-109
Table 12.7 Computing Median
for Length of Time Service Used
In SPSS:
1. Select ANALYZE;
2. Click DESCRIPTIVE STATISTICS,
3. Select FREQUENCIES,
4. Move the variable “USE” to the Variable(s) box,
5. Click STATISTICS box,
6. Select MEDIAN,
7. Click CONTINUE, and
8. Click OK.
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 12-110
Table 12.7 Computing Median for
Length of Time Service Used (Cont’d)
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 12-112
Mean
n = number of units in the sample
\( x_i \) = data obtained from sample unit i
\( \bar{x} \) = sample mean value, given by
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \]
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 12-113
Table 12.8 Overall Quality of Services Provided
by National– Computing Mean
On a scale of 1 to 10, how would you rate the overall quality of service
provided by National?
Extremely Poor 1 2 3 4 5 6 7 8 9 10 Extremely Good
In SPSS
1. Select ANALYZE
2. Click DESCRIPTIVE STATISTICS
3. Select FREQUENCIES
4. Move the variable “OQ- Labeled as OVERALL SERVICE
QUALITY” to the Variable(s) box
5. Click STATISTICS box
6. Select MEAN, MEDIAN, AND MODE
7. Click CONTINUE
8. Click OK
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 12-114
Table 12.8 Overall Quality of Services Provided by
National– Computing Mean (Cont’d)
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 12-115
Table 12.8 Overall Quality of Services Provided
by National– Computing Mean (Cont’d)
Since the level of measurement is an interval
scale, we can compute the mean, median, and
mode. Since the distribution is skewed to
the left, the mean is pulled toward the small
values more than the median is. Therefore the
mean is smaller than the median, and the median
is smaller than the mode.
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 12-116
Measures of Dispersion
• Range
• Variance
• Standard Deviation
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 12-117
Range
• Range is the difference between the largest
and smallest value
• The simplest measure of dispersion
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 12-118
Variance
• Variance of a set of data is a measure of
deviation of the data around the arithmetic
mean
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} \]
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 12-119
Standard Deviation
• Standard deviation is the square root of the
variance
\[ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}} \]
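A minimal sketch of the three dispersion measures follows. The ratings are hypothetical (they are not the National survey data); the formulas are the n - 1 definitions given on the slides, cross-checked against Python's `statistics` module.

```python
import math
import statistics

# Hypothetical ratings on a 1-to-10 scale (NOT the National survey data).
ratings = [9, 7, 10, 4, 8, 6, 10, 1, 9, 8]

n = len(ratings)
mean = sum(ratings) / n

# Sample variance: squared deviations around the mean, divided by n - 1.
variance = sum((x - mean) ** 2 for x in ratings) / (n - 1)

# Standard deviation: square root of the variance.
std_dev = math.sqrt(variance)

# Range: largest value minus smallest value.
value_range = max(ratings) - min(ratings)

# The standard library uses the same n - 1 definitions.
assert math.isclose(variance, statistics.variance(ratings))
assert math.isclose(std_dev, statistics.stdev(ratings))

print(f"range={value_range}, variance={variance:.2f}, std dev={std_dev:.2f}")
```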
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 12-120
Table 12.9 Overall Quality of Services Provided by
National: Computing Range, Variance, and Standard
Deviation
On a scale of 1 to 10, how would you rate the overall quality of service
provided by National?
Extremely Poor 1 2 3 4 5 6 7 8 9 10 Extremely Good
In SPSS
1. Select ANALYZE
2. Click DESCRIPTIVE STATISTICS
3. Select FREQUENCIES
4. Move the variable “OQ- Labeled as OVERALL SERVICE QUALITY”
to the Variable(s) box
5. Click STATISTICS box
6. Select STANDARD DEVIATION, VARIANCE, and RANGE
7. Click CONTINUE
8. Click OK
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 12-121
Table 12.9 Overall Quality of Services Provided
by National: Computing Range, Variance, and
Standard Deviation (Cont’d)
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 12-122
Table 12.9 Overall Quality of Services Provided
by National: Computing Range, Variance, and
Standard Deviation (Cont’d)
Range = highest value - lowest value = 10 - 1 = 9
Variance = 5.43
Standard deviation = square root of variance = 2.33
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 12-123
Frequency Distribution: One-
Way Tabulation
• One-way tabulation is a table showing the
distribution of data pertaining to categories
of a single variable
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 12-124
Table 12.10 Age and Length of
Time Service Used
• In SPSS:
1. Select ANALYZE
2. Click DESCRIPTIVE STATISTICS
3. Select FREQUENCIES
4. Move the variable “AGE” to the Variable(s) box
5. Click CHARTS box
6. Select BAR CHARTS
7. Click on CONTINUE
8. Click OK
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 12-125
Table 12.10 Age and Length of Time
Service Used (Cont’d)
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 12-131
Why Averages May be
Misleading
• Researchers tested a new sauce product and
found
– Mean rating of the taste test was close to the
middle of the scale, which had "very mild" and
"very hot" as its bipolar adjectives
• Researchers’ conclusion
– Consumers want neither a really hot nor a
really mild sauce
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 12-132
Why Averages May be
Misleading (Cont’d)
• Deeper examination revealed
– The existence of a large proportion of consumers who wanted the sauce to be mild and an equally large proportion who wanted it to be hot
• Moral of the story:
– A clear understanding of the distribution of responses can help a researcher avoid erroneous inferences
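The sauce story can be illustrated with made-up numbers. The sketch below assumes a 7-point scale (1 = "very mild", 7 = "very hot") and invented counts; neither the scale length nor the counts come from the slides.

```python
import statistics
from collections import Counter

# Hypothetical taste-test ratings on an assumed 7-point scale:
# 45 respondents want it very mild, 45 very hot, only 10 in the middle.
ratings = [1] * 45 + [4] * 10 + [7] * 45

mean_rating = statistics.mean(ratings)
distribution = Counter(ratings)

print("mean rating:", mean_rating)  # sits at the scale midpoint
for value in sorted(distribution):
    print(f"rating {value}: {'#' * distribution[value]}")
```

The mean lands on the midpoint even though very few respondents chose it, which is why the one-way tabulation should be examined alongside the average.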
Copyright © by Houghton Mifflin Company, Inc. All rights reserved First Edition
Chapter 13
Hypothesis
Testing
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-134
Chapter Objectives
• Distinguish between descriptive analysis and
inferential analysis.
• State the null and alternative hypotheses
pertaining to a variety of decision situations
requiring formal hypothesis testing.
• Define Type I and Type II errors and state the
relationship between them.
• Define significance level and power of a
hypothesis test.
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-135
Chapter Objectives (Cont’d)
• Lay out the steps involved in conducting a
hypothesis test
• Interpret two-way tabulation and a chi-square
contingency test
• Use the appropriate test pertaining to hypotheses
involving a single mean, a single proportion, two
means (when the two samples are independent
and when they are dependent), and two
proportions
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-136
Hypothesis Testing: Key to Actionable
Strategies By Dave Moxley,
President…
• We start all research projects with in-depth
interviews of the business heads, generating
hypotheses or hunches about the topic being
researched.
• Involve the business leaders in the early hypothesis generation
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-137
Client-Researcher Involvement –
Dave Moxley
• Ensure that the necessary questions are asked to collect the
data for testing relevant assumptions
• Increase business buy-in to the process as a full
project partner, thereby dramatically increasing the
likelihood of subsequent market action
• Improve the image of the research function as an
integrated and valued contributor to the strategic
direction and tactical program implementation of
the business
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-138
Hypotheses Testing-Dave Moxley
• Oversimplified or incorrect assumptions must be
subjected to more formal hypothesis testing
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-139
Interesting Hypotheses – Dave Moxley
• Bankers assumed high-income earners are more profitable than low-income earners
• Clients who carefully balance their checkbooks every month and minimize fees due to overdrafts are unprofitable checking account customers
• Older clients were more likely to diminish CD balances by large amounts compared to younger clients
– This was nonintuitive because conventional wisdom suggested that older clients have a larger portfolio of assets and seek less risky investments
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-140
Data Analysis
• Descriptive
– Computing measures of central tendency and
dispersion, as well as constructing one-way tables
• Inferential
– Data analysis aimed at testing specific hypotheses is
usually called inferential analysis
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-141
Null and Alternative Hypotheses
H0 -> Null Hypothesis
Ha -> Alternative Hypothesis
• Hypotheses always pertain to population
parameters or characteristics rather than to sample
characteristics. It is the population, not the sample,
that we want to make an inference about from
limited data
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-142
Steps in Conducting a Hypothesis Test
• Step 1. Set up H0 and Ha.
• Step 2. Identify the nature of the sampling
distribution curve and specify the appropriate test
statistic.
• Step 3. Determine whether the hypothesis test is
one-tailed or two-tailed.
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-143
Steps in Conducting a Hypothesis Test
• Step 4. Taking into account the specified significance level, determine the critical value (two critical values for a two-tailed test) for the test statistic from the appropriate statistical table.
• Step 5. State the decision rule for rejecting H0.
• Step 6. Compute the value for the test statistic from the sample data.
• Step 7. Using the decision rule specified in step 5, either reject H0 or do not reject H0.
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-144
Launching a Product Line Into a New
Market Area
• Karen, product manager for a line of apparel, is considering whether to
introduce the product line into a new market area
• Survey of a random sample of 400 households in
that market showed a mean income per household
of $30,000. Karen strongly believes the product
line will be adequately profitable only in markets
where the mean household income is greater than
$29,000. Should Karen introduce the product line
into the new market?
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-145
Karen’s Criterion for Decision
Making
• To reach a final decision, Karen has to make a
general inference (about the population) from the
sample data
• Criterion: mean income across all
households in the market area under consideration
• If the mean population household income is
greater than $29,000, Karen should introduce the
product line into the new market
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-146
Karen’s Hypothesis
• Karen’s decision making is equivalent to either
accepting or rejecting the hypothesis:
– The population mean household income in the new
market area is greater than $29,000
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-147
One-Tailed Hypothesis Test
• The term one-tailed signifies that all \( \bar{x} \)- or z-values
that would cause Karen to reject H0 are in just one
tail of the sampling distribution
\( \mu \) -> Population Mean
\[ H_0: \mu \le \$29{,}000 \]
\[ H_a: \mu > \$29{,}000 \]
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-148
Type I and Type II Errors
• Type I error occurs if the null hypothesis is
rejected when it is true
• Type II error occurs if the null hypothesis is not rejected when it is false
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-149
Significance Level
• \( \alpha \) -> Significance level: the upper-bound
probability of a Type I error
• \( 1 - \alpha \) -> Confidence level: the complement of the significance level
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-150
Summary of Errors Involved in
Hypothesis Testing
Inference Based on      Real State of Affairs
Sample Data             H0 is True                                H0 is False
H0 is True              Correct decision                          Type II error
                        Confidence level = \( 1 - \alpha \)       P(Type II error) = \( \beta \)
H0 is False             Type I error                              Correct decision
                        Significance level = \( \alpha \)*        Power = \( 1 - \beta \)
*Term represents the maximum probability of committing a Type I error
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-151
Level of Risk
• Two firms considering introducing a new product
that radically differs from their current product
line
– Firm ABC
• Well-established customer base, distinct reputation for its
existing product line
– Firm XYZ
• No loyal clientele, no distinct image for its present
products
Which of these two firms should be more cautious
in making a decision to introduce the new
product?
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-152
Scenario - Firms ABC & XYZ
• Firm ABC
– ABC should be more cautious
• Firm XYZ
– XYZ should be less cautious
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-153
Identifying the Critical Sample Mean Value--
Sampling Distribution
Sample mean (\( \bar{x} \)) values greater than $29,000--that is, \( \bar{x} \)-values on the right-hand side
of the sampling distribution centered on \( \mu \) = $29,000--suggest that H0 may be false.
More important, the farther to the right \( \bar{x} \) is, the stronger is the evidence against H0
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-154
Karen’s Decision Rule for Rejecting
the Null Hypothesis
• Reject H0 if the sample mean exceeds the critical value \( \bar{x}_c \)
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-155
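Criterion Value
Every sample mean \( \bar{x} \) has a corresponding equivalent
standard normal deviate z:
\[ z = \frac{\bar{x} - \mu}{s_{\bar{x}}} \]
Solving for \( \bar{x} \):
\[ \bar{x} = \mu + z s_{\bar{x}} \]
Substituting \( \bar{x}_c \) for \( \bar{x} \) and \( z_c \) for z:
\[ \bar{x}_c = \mu + z_c s_{\bar{x}} \]
where \( z_c \) is the standard normal deviate
corresponding to the critical sample mean, \( \bar{x}_c \).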
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-156
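Computing the Criterion Value
The standard deviation for the sample of 400 households is
$8,000. The standard error of the mean (\( s_{\bar{x}} \)) is given by
\[ s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{8{,}000}{\sqrt{400}} = \$400 \]
The critical mean household income \( \bar{x}_c \) is found through the
following two steps:
1. Determine the critical z-value, \( z_c \). For \( \alpha = .05 \), from
Appendix 1, \( z_c = 1.645 \).
2. Substitute the values of \( z_c \), \( s_{\bar{x}} \), and \( \mu \) (under the assumption
that H0 is "just" true):
\[ \bar{x}_c = \mu + z_c s_{\bar{x}} = 29{,}000 + 1.645(400) = \$29{,}658 \]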
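The two steps can be checked numerically. This sketch uses the figures from Karen's example and Python's `statistics.NormalDist` in place of the printed normal table; the variable names are illustrative assumptions.

```python
import math
from statistics import NormalDist

# Figures from Karen's example.
mu_0 = 29_000         # population mean when H0 is "just" true
s = 8_000             # sample standard deviation
n = 400               # sample size
alpha = 0.05          # significance level
sample_mean = 30_000  # observed sample mean

# Standard error of the mean: s / sqrt(n).
standard_error = s / math.sqrt(n)

# Step 1: critical z for an upper-tail test (normal table gives 1.645).
z_c = NormalDist().inv_cdf(1 - alpha)

# Step 2: critical sample mean, x_c = mu + z_c * standard error.
x_c = mu_0 + z_c * standard_error

# Decision rule: reject H0 if the sample mean exceeds x_c.
reject_h0 = sample_mean > x_c

print(f"standard error = {standard_error:.0f}, z_c = {z_c:.3f}, x_c = {x_c:.0f}")
print("reject H0:", reject_h0)
```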
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-157
Karen’s Decision Rule
• If the sample mean household income is greater
than $29,658, reject the null hypothesis and
introduce the product line into the new market
area.
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-158
Test Statistic
The value of the test statistic is simply the z-value
corresponding to \( \bar{x} = \$30{,}000 \).
\[ z = \frac{\bar{x} - \mu}{s_{\bar{x}}} = \frac{30{,}000 - 29{,}000}{400} = 2.5 \]
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-159
Critical Value for Rejecting the Null
Hypothesis
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-160
P - Value – Actual Significance Level
• The probability of obtaining an \( \bar{x} \)-value as high
as $30,000 or more when \( \mu \) is only $29,000 is
.0062.
• This value is sometimes called the actual
significance level, or the p-value
• The actual significance level of .0062 in this case
means the odds are less than 62 out of 10,000 that
the sample mean income of $30,000 would have
occurred entirely due to chance (when the
population mean income is $29,000 or less).
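The p-value quoted above is the upper-tail area of the standard normal curve beyond z = 2.5. A minimal check, using `statistics.NormalDist` instead of the printed table (variable names are assumptions):

```python
from statistics import NormalDist

# Test statistic from Karen's example: (30,000 - 29,000) / 400.
z = (30_000 - 29_000) / 400

# Upper-tail probability: P(Z >= z) under the standard normal curve.
p_value = 1 - NormalDist().cdf(z)

print(f"z = {z}, p-value = {p_value:.4f}")
```

Because the p-value is well below the significance level of .05, it leads to the same decision as comparing the sample mean with the critical value.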
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-161
T-test
Conduct a t-test when the sample is small.
Let the sample size, n = 25
\( \bar{x} = \$30{,}000 \), s = $8,000
From the t-table in Appendix 3, \( t_c = 1.71 \) for \( \alpha = .05 \) and
d.f. = 24.
Decision rule: “Reject H0 if \( t \ge 1.71 \).”
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-162
T-test (Cont’d)
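The value of t from the sample data:
\[ s_{\bar{x}} = \frac{8{,}000}{\sqrt{25}} = \$1{,}600 \]
\[ t = \frac{\bar{x} - \mu}{s_{\bar{x}}} = \frac{30{,}000 - 29{,}000}{1{,}600} = 0.625 \]
Because the computed value of t is less than 1.71, H0 cannot
be rejected.
Karen should not introduce the product line into the
new market area.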
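The small-sample calculation can be sketched as follows. The critical value 1.71 is taken from the slide's t-table (the standard library has no t-distribution), and the variable names are illustrative assumptions.

```python
import math

# Figures from the small-sample version of Karen's example.
mu_0 = 29_000
sample_mean = 30_000
s = 8_000
n = 25

# Critical t for alpha = .05 and 24 d.f., as read from the slide's table.
T_CRITICAL = 1.71

# Standard error of the mean with the smaller sample.
standard_error = s / math.sqrt(n)

# One-sample t statistic.
t_stat = (sample_mean - mu_0) / standard_error

reject_h0 = t_stat >= T_CRITICAL

print(f"standard error = {standard_error:.0f}, t = {t_stat:.3f}")
print("reject H0:", reject_h0)
```

The same $1,000 difference that was significant with n = 400 is not significant with n = 25, because the standard error is four times larger.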
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-163
Two-Tailed Hypothesis Test
• A two-tailed test is one in which values of the test
statistic leading to rejection of the null hypothesis
fall in both tails of the sampling distribution curve
\[ H_0: \mu = \$29{,}000 \]
\[ H_a: \mu \ne \$29{,}000 \]
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-164
Test of Two Means
• A health service agency has designed a public
service campaign to promote physical fitness and
the importance of regular exercise. Since the
campaign is a major one, the agency wants to
make sure of its potential effectiveness before
running it on a national scale.
– To conduct a controlled test of the campaign’s
effectiveness, the agency needs two similar cities.
– The agency identified two similar cities:
• city 1 will serve as the test city
• city 2 will serve as a control city
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-165
Test of Two Means
• A random survey of 300 adults in city 1 and 200 adults
in city 2 was conducted to measure the average
time per day a typical adult in each city spent on
some form of exercise.
– Results of the survey : average was 30 minutes per day
(with a standard deviation of 22 minutes) in city 1 and
35 minutes per day (with a standard deviation of 25
minutes) in city 2.
• Question:
– From these results, can the agency conclude confidently
that the two cities are well matched for the controlled
test?
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-166
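Basic Statistics and Hypotheses
City 1: \( n_1 = 300 \), \( \bar{x}_1 = 30 \), \( s_1 = 22 \)
City 2: \( n_2 = 200 \), \( \bar{x}_2 = 35 \), \( s_2 = 25 \)
The hypotheses are
\[ H_0: \mu_1 = \mu_2 \quad \text{or} \quad \mu_1 - \mu_2 = 0 \]
\[ H_a: \mu_1 \ne \mu_2 \quad \text{or} \quad \mu_1 - \mu_2 \ne 0 \]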
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-167
Test Statistic
The test statistic is the z-statistic, given by
\[ z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} \]
Both \( n_1 \) and \( n_2 \) are greater than 30.
The z-statistic can therefore be used as the test statistic.
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-168
Decision – Two-Tailed Test
• For Two-Tailed tests
– Identify two critical values of z, one for each tail of the
sampling distribution.
– The probability corresponding to each tail is .025, since
\( \alpha = .05 \).
– From the Normal Table, the z-value for \( \alpha/2 = .025 \) is
1.96.
• Decision rule: “Reject H0 if \( z \le -1.96 \) or if \( z \ge 1.96 \).”
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-169
Computing Z-value – Two-Tailed Test
Computing the value of z from the survey results
and under the customary assumption that the null
hypothesis is true (i.e., \( \mu_1 - \mu_2 = 0 \)):
\[ z = \frac{(30 - 35) - 0}{\sqrt{(22)^2/300 + (25)^2/200}} = -2.29 \]
Since \( z < -1.96 \), we should reject H0.
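The two-city comparison can be reproduced with a short script. It uses the survey figures from the slides and `statistics.NormalDist` for the critical value; variable names are assumptions. Unrounded, z comes out at about -2.297, which the slide reports as -2.29.

```python
import math
from statistics import NormalDist

# Survey results for the two cities.
n1, mean1, s1 = 300, 30, 22   # city 1 (test city)
n2, mean2, s2 = 200, 35, 25   # city 2 (control city)
alpha = 0.05

# Standard error of the difference between two independent means.
standard_error = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

# z statistic under H0: mu1 - mu2 = 0.
z = ((mean1 - mean2) - 0) / standard_error

# Two-tailed test: alpha is split equally between the tails.
z_c = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96
reject_h0 = abs(z) >= z_c

# Two-tailed p-value.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.3f}, critical value = +/-{z_c:.2f}, p-value = {p_value:.4f}")
print("reject H0:", reject_h0)
```

Rejecting H0 here means the evidence suggests the two cities are not well matched on average exercise time.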
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-170
Hypothesis Test Related to Mean Exercising in
Two Cities
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-171
T- Test for Independent Samples
Test statistic
\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s^* \sqrt{1/n_1 + 1/n_2}} \]
with d.f. = \( n_1 + n_2 - 2 \). In this expression, \( s^* \) is the pooled
standard deviation, given by
\[ s^* = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} \]
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-172
T- Test for Independent Samples- Two
Cities
\( n_1 = 20 \), \( \bar{x}_1 = 30 \), \( s_1 = 22 \)
\( n_2 = 10 \), \( \bar{x}_2 = 35 \), \( s_2 = 25 \)
The degrees of freedom for the t-statistic are
d.f. = 20 + 10 - 2 = 28
The critical value of t with 28 d.f. for a tail probability
of .025 is 2.05.
Decision rule: “Reject H0 if \( t \le -2.05 \) or if \( t \ge 2.05 \).”
The pooled standard deviation is
\[ s^* = \sqrt{529} \text{ (approximately)} = 23 \]
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-173
T- Test for Independent Samples
The test statistic is
\[ t = \frac{(30 - 35) - 0}{23\sqrt{1/20 + 1/10}} = -.56 \]
Since t is neither less than -2.05 nor greater than 2.05,
we cannot reject H0.
The sample evidence is not strong enough to conclude
that the two cities differ in terms of levels of
exercising activity of their residents.
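The pooled-variance t-test for the small-sample version of the two-city example can be sketched as below. The critical value 2.05 is taken from the slide's t-table; variable names are illustrative assumptions.

```python
import math

# Small-sample version of the two-city example.
n1, mean1, s1 = 20, 30, 22
n2, mean2, s2 = 10, 35, 25

# Critical t for 28 d.f. and a tail probability of .025 (from the slide).
T_CRITICAL = 2.05

degrees_of_freedom = n1 + n2 - 2

# Pooled standard deviation s*.
pooled_variance = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / degrees_of_freedom
pooled_sd = math.sqrt(pooled_variance)

# t statistic under H0: mu1 - mu2 = 0.
t_stat = ((mean1 - mean2) - 0) / (pooled_sd * math.sqrt(1 / n1 + 1 / n2))

# Two-tailed decision rule.
reject_h0 = abs(t_stat) >= T_CRITICAL

print(f"d.f. = {degrees_of_freedom}, s* = {pooled_sd:.2f}, t = {t_stat:.2f}")
print("reject H0:", reject_h0)
```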
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-174
National Insurance Company Study –
Perceived Service Quality Differences
Between Males and Females
• Test of Two Means Using the SPSS T-TEST
Program
– On the 10-point scale, males gave a mean rating of
approximately 7.87, while females gave a mean rating
of approximately 7.83.
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-175
National Insurance Company Study –
Perceived Service Quality Differences
Between Males and Females
• In SPSS,
1. Select ANALYZE from the menu,
2. Click COMPARE MEANS
3. Select INDEPENDENT-SAMPLES T-TEST
4. Move “OQ – Overall Service Quality” to the “TEST
VARIABLE(S)” box
5. Move “gender” to “GROUPING VARIABLE” box
6. DEFINE GROUPS (SEX = 1 for male and 2 for
female)
7. Click OK.
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-176
OQ – Overall Perceived Service Quality
Gender – Sex = 1 for male
Sex = 2 for female
National Insurance Company Study –
Perceived Service Quality Differences
Between Males and Females
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-177
National Insurance Company Study –
Perceived Service Quality Differences
Between Males and Females
Group Statistics
      gender    N      Mean    Std. Deviation    Std. Error Mean
OQ    male      137    7.87    2.26              .19
      female    126    7.83    2.31              .21
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-178
National Insurance Company Study –
Perceived Service Quality Differences
Between Males and Females
F-test of whether the variances of the 2 groups can be
assumed equal: p-value = .210, so the null
hypothesis of equal variances cannot be rejected at \( \alpha = 0.05 \)
P-value > \( \alpha = 0.05 \) -- do not reject;
the equal-variances assumption is appropriate
Use the "Equal variances not assumed" row of the output only
when the null hypothesis of equality of variance is
rejected
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-179
National Insurance Company Study –
Perceived Service Quality Differences
Between Males and Females
P-value = .88 is greater than
\( \alpha = 0.05 \).
Do not reject H0.
The p-value implies that the odds are 88 in 100 that a difference of
magnitude .04 (i.e., 7.87 - 7.83) could have occurred by chance.
The null hypothesis cannot be rejected at the customary
significance level of .05.
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-180
Test of Two Means When Samples
Are Dependent
• Used when there is a need to check for significant
differences between two mean values and the samples are
not independent
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-181
Test of Two Means When Samples
Are Dependent
• A retail chain ran a special promotion in a
representative sample of 10 of its stores to boost
sales.
• Weekly sales per store before and after the
introduction of the special promotion are shown
• Did the special promotion lead to a significant
increase in sales ?
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-182
Sales Per Store Before and After a
Promotional Campaign
Sales per Store (In Thousands)
Store Number (i)   Before Promotion (x_bi)   After Promotion (x_ai)   Change in Sales (In Thousands) x_di = x_ai - x_bi
1                  250                       260                      10
2                  235                       240                      5
3                  150                       151                      1
4                  145                       140                      -5
5                  120                       124                      4
6                  98                        100                      2
7                  75                        70                       -5
8                  85                        95                       10
9                  180                       200                      20
10                 212                       220                      8
Total                                                                 50
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-183
Test of Two Means When Samples
Are Dependent
One-Tailed Hypothesis Test:
\[ H_0: \mu_d \le 0; \quad H_a: \mu_d > 0 \]
The sample estimate of \( \mu_d \) is \( \bar{x}_d \), given by
\[ \bar{x}_d = \frac{\sum_{i=1}^{n} x_{di}}{n} \]
where n is the sample size.
\[ \bar{x}_d = 50/10 = 5 \]
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-184
Test of Two Means When Samples
Are Dependent
Test statistic is
\[ t = \frac{\bar{x}_d - \mu_d}{s/\sqrt{n}} = \frac{5 - 0}{7.53/\sqrt{10}} = 2.10 \]
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-185
Test of Two Means When Samples
Are Dependent
Standard deviation (s) = 7.53, \( \alpha = 0.05 \),
\( t_c \) for 9 d.f. = 1.83 from Appendix 3
Decision rule: “Reject H0 if \( t \ge 1.83 \).”
Since the test statistic t = 2.10 exceeds 1.83, we reject H0 and conclude that
the mean change in sales per store was significantly
greater than zero.
The special promotion was indeed effective.
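The dependent-samples test can be reproduced from the store-level sales figures in the table. The critical value 1.83 is taken from the slide's t-table; variable names are illustrative assumptions.

```python
import math
import statistics

# Weekly sales per store (in thousands), stores 1 through 10.
before = [250, 235, 150, 145, 120, 98, 75, 85, 180, 212]
after = [260, 240, 151, 140, 124, 100, 70, 95, 200, 220]

# Critical t for alpha = .05 and 9 d.f., as read from the slide's table.
T_CRITICAL = 1.83

# Work with the per-store differences, not the two columns separately.
differences = [a - b for a, b in zip(after, before)]
n = len(differences)

mean_diff = statistics.mean(differences)
sd_diff = statistics.stdev(differences)  # n - 1 in the denominator

# t statistic under H0: mean change = 0.
t_stat = (mean_diff - 0) / (sd_diff / math.sqrt(n))

reject_h0 = t_stat >= T_CRITICAL

print(f"mean change = {mean_diff}, s = {sd_diff:.2f}, t = {t_stat:.2f}")
print("reject H0:", reject_h0)
```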
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-186
Hypothesis Test Related to Change in
Weekly Sales Per Store
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-187
Test for a Single Proportion
• Ms. Jones wants to substantially increase the firm's
advertising budget. The firm sells a variety of
personal computer accessories
• Random sample: 20 out of 100 respondents know the brand name
• Claim to be tested: the true awareness rate for the brand name across all
personal computer owners is less than .3
• Should Ms. Jones increase the advertising budget
on the basis of survey results?
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-188
Test for a Single Proportion
• Need to test the population proportion (\( \pi \) is the
symbol for population proportion) of personal
computer owners who are aware of the brand:
\[ H_0: \pi \ge .3 \]
\[ H_a: \pi < .3 \]
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-189
Test for a Single Proportion
The test statistic:
\[ z = \frac{p - \pi}{\sqrt{\pi(1 - \pi)/n}} \]
where p is the sample proportion.
From the Normal Table, \( z_c = -1.645 \) for \( \alpha = .05 \).
Decision rule here is: “Reject H0 if \( z \le -1.645 \).”
With p = .2, \( \pi = .3 \), and n = 100, z = -2.174
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-190
Test for a Single Proportion
Since -2.174 < -1.645, we reject H0;
The sample awareness rate of .2 is too low to support
the hypothesis that the population awareness rate is .3 or
more.
The actual significance level (p-value) corresponding to
z = -2.174 is approximately .015 (from Appendix 1).
This level of significance implies that the odds are lower
than 15 in 1,000 that the sample awareness rate of .2
would have occurred entirely by chance (that is, when
the population awareness rate is .3 or higher).
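A minimal sketch of the single-proportion test follows, using `statistics.NormalDist` instead of the printed table; variable names are assumptions. Without intermediate rounding, z is about -2.18; the slide's -2.174 corresponds to rounding the standard error to .046 first. The decision is the same either way.

```python
import math
from statistics import NormalDist

# Figures from the brand-awareness example.
p_sample = 20 / 100   # sample proportion aware of the brand
pi_0 = 0.3            # hypothesized population proportion under H0
n = 100
alpha = 0.05

# Standard error of the proportion under H0.
standard_error = math.sqrt(pi_0 * (1 - pi_0) / n)

# z statistic.
z = (p_sample - pi_0) / standard_error

# Lower-tail test: critical value and p-value come from the left tail.
z_c = NormalDist().inv_cdf(alpha)   # about -1.645
p_value = NormalDist().cdf(z)
reject_h0 = z <= z_c

print(f"z = {z:.3f}, critical value = {z_c:.3f}, p-value = {p_value:.3f}")
print("reject H0:", reject_h0)
```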
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-191
Hypothesis Test Related to Proportion
of Personal Computer Owners
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-192
Test of Two Proportions: Choosing
Between Commercial X & Commercial Y
For a New Product
Tom, advertising manager for a frozen-foods company, is
in the process of deciding between two TV commercials, X
and Y, for a new frozen food to be introduced
– Commercial X
• Runs for 20 seconds
• Random sample: 20 % awareness out of 200 respondents
– Commercial Y
• Runs for 30 seconds
• Random sample:25 % awareness out of 200 respondents
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-193
Test of Two Proportions (Cont’d)
• Question:
– Can Tom conclude that commercial Y will be more
effective in the total market for the new product?
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-194
Criterion for Decision Making
• To reach a final decision, Tom has to make a
general inference (about the population) from the
sample data
• Criterion-- relative degrees of awareness likely to
be created by the 2 commercials in the population
of all adult consumers
• Tom should conclude that commercial Y is more
effective than commercial X only if the anticipated
population awareness rate for commercial Y is
greater than that for X.
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-195
Hypothesis
• Tom’s decision making is equivalent to either
accepting or rejecting the hypothesis:
– The potential awareness rate that commercial Y can
generate among the population of consumers is greater
than that which commercial X can generate
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-196
                       Commercial Y     Commercial X
Sample sizes:          n1 = 200         n2 = 200
Sample proportions:    p1 = .25         p2 = .20
The hypotheses are
H0: π1 ≤ π2 or π1 - π2 ≤ 0
Ha: π1 > π2 or π1 - π2 > 0
Null and Alternative Hypotheses
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-197
$$ z = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{\sigma_{p_1 - p_2}} $$
σ(p1 - p2) is estimated by the sample
standard error formula
Sample Standard Error
$$ s_{p_1 - p_2} = \sqrt{PQ\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} $$
$$ P = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} $$
$$ Q = 1 - P $$
Test of Two Proportions-- Sample
Standard Error
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-198
For α = .05, the critical value of z (from Appendix 1)
is 1.645.
Decision rule: “Reject H0 if z > 1.645.”
First compute P and Q, then s(p1 - p2) and z:
$$ P = \frac{200(.25) + 200(.2)}{200 + 200} = .225 $$
$$ Q = 1 - .225 = .775 $$
Test of Two Proportions
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-199
$$ s_{p_1 - p_2} = \sqrt{(.225)(.775)\left(\frac{1}{200} + \frac{1}{200}\right)} = 0.042 $$
$$ z = \frac{(.25 - .20) - 0}{.042} = 1.19 $$
Since z < 1.645, we cannot reject H0.
The sample evidence is not strong enough to suggest that
commercial Y will be more effective than commercial X.
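As a rough check on the arithmetic, the pooled two-proportion test can be written as a short Python sketch. The function name is hypothetical. Sample 1 is taken to be the commercial with 25% awareness and sample 2 the one with 20%, matching p1 = .25 and p2 = .20 in the computation. Because the standard error is not rounded to .042 here, z comes out a little above 1.19, still well below 1.645.

```python
import math

def two_proportion_z(p1, n1, p2, n2):
    """Pooled z test of H0: pi1 <= pi2 versus Ha: pi1 > pi2."""
    pooled = (n1 * p1 + n2 * p2) / (n1 + n2)     # P
    q = 1 - pooled                               # Q
    std_error = math.sqrt(pooled * q * (1 / n1 + 1 / n2))
    z = (p1 - p2) / std_error                    # hypothesized difference is 0
    return z, pooled, std_error

z, pooled, std_error = two_proportion_z(p1=0.25, n1=200, p2=0.20, n2=200)
reject_h0 = z > 1.645    # upper-tail critical value for alpha = .05

print(round(pooled, 3), round(std_error, 3), round(z, 2), reject_h0)
```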
Test of Two Proportions
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-200
Hypothesis Test Related to Awareness
Generated by Two Commercials
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-201
Cross-Tabulations: Chi-square
Contingency Test
• Technique used for determining whether there is a
statistically significant relationship between two
categorical (nominal or ordinal) variables
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-202
Telecommunications Company
• Marketing manager of a telecommunications
company is reviewing the results of a study of
potential users of a new cell phone
– Random sample of 200 respondents
• A cross-tabulation of data on whether target consumers
would buy the phone (Yes or No) and whether the cell
phone had access to the Internet (Yes or No)
• Question:
– Can the marketing manager infer that an association
exists between Internet access and buying the cell
phone?
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-203
Two-Way Tabulation of Internet Access
and Whether They Would Buy the
Cellular Phone
                                 Internet Access
Would Buy the Cellular Phone     Yes           No            Total
Yes                              80 (80%)      20 (20%)      100
No                               20 (20%)      80 (80%)      100
Total                            100 (100%)    100 (100%)    200
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-204
H0: There is no association between Internet access and
buying the cell phone (the two variables are
independent of each other).
Ha: There is some association between Internet access
and buying the cell phone (the two variables are not
independent of each other).
Cross Tabulations - Hypotheses
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-205
Conducting the Test
• The test involves comparing the actual, or observed,
cell frequencies in the cross-tabulation with a
corresponding set of expected cell frequencies (Eij)
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-206
Expected Values
$$ E_{ij} = \frac{n_i n_j}{n} $$
where ni and nj are the marginal frequencies, that
is, the total number of sample units in category i
of the row variable and category j of the column
variable, respectively
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-207
Computing Expected Values
The expected frequency for the first-row, first-
column cell is given by
$$ E_{11} = \frac{100 \times 100}{200} = 50 $$
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-208
Observed and Expected Cell Frequencies
                                 Internet Access
Would Buy the Cellular Phone     Yes        No         Total
Yes                              80 (50)    20 (50)    100
No                               20 (50)    80 (50)    100
Total                            100        100        200
Note: In each cell ij the number without parentheses is the
observed cell frequency (Oij) and the number in parentheses is
the expected cell frequency (Eij).
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-209
$$ \chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = 72.00 $$
where r and c are the number of rows and columns, respectively,
in the contingency table. The number of degrees of freedom
associated with this chi-square statistic is given by the product
(r - 1)(c - 1).
Chi-square Test Statistic
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-210
For d.f. = 1, assuming α = .05, from Appendix 2, the
critical chi-square value (χ²c) = 3.84.
Decision rule is: “Reject H0 if χ² > 3.84.”
Computed χ² = 72.00
Since the computed chi-square value is greater than
the critical value of 3.84, reject H0.
The apparent relationship between "Internet access" and
"would buy the cellular phone" revealed by the sample
data is unlikely to have occurred because of chance.
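The expected frequencies and the chi-square statistic for the cell phone table can be reproduced with a small Python sketch. The function name is hypothetical, and the p-value shortcut via the complementary error function is an added convenience that holds only for 1 degree of freedom; for larger tables a chi-square table or a statistics library would be needed.

```python
import math

# Observed counts: rows = would buy (Yes, No), columns = Internet access (Yes, No)
observed = [[80, 20],
            [20, 80]]

def chi_square(table):
    """Return the chi-square statistic, degrees of freedom, and expected counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    expected = []
    for i, row in enumerate(table):
        expected_row = []
        for j, o_ij in enumerate(row):
            e_ij = row_totals[i] * col_totals[j] / n   # Eij = ni * nj / n
            expected_row.append(e_ij)
            stat += (o_ij - e_ij) ** 2 / e_ij
        expected.append(expected_row)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, expected

stat, df, expected = chi_square(observed)
p_value = math.erfc(math.sqrt(stat / 2))   # upper-tail probability, df = 1 only
reject_h0 = stat > 3.84                    # critical value for alpha = .05, df = 1

print(stat, df, expected, reject_h0)
```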
Chi-square Test Statistic in a
Contingency Test
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-211
Interpretation
• The actual significance level associated with a chi-
square value of 72 is less than .001 (from
Appendix 2). Thus, the chances of getting a chi-
square value as high as 72 when there is no
relationship between Internet access and purchase
of cell phones are less than 1 in 1,000.
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-212
Cross-Tabulation Using SPSS for
National Insurance Company
• One crucial issue in the customer survey of
National Insurance Company was how a
customer's education was associated with whether
or not she or he would recommend National to a
friend.
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-213
Need to Conduct Chi-square Test to
Reach a Conclusion
• The hypotheses are:
– H0:There is no association between educational level
and willingness to recommend National to a friend (the
two variables are independent of each other).
– Ha:There is some association between educational level
and willingness to recommend National to a friend (the
two variables are not independent of each other).
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-214
Association Between Education and
Customer’s Willingness to Recommend
National to a Friend
For two-way tabulation:
1. Select ANALYZE on the SPSS menu,
2. Click on DESCRIPTIVE STATISTICS,
3. Select CROSS-TABS.
4. Move the “highest level of schooling” variable to the ROW(S) box,
5. Move the “rec” variable to the COLUMN(S) box.
6. Click on CELLS,
7. Select OBSERVED, and ROW PERCENTAGES.
8. Click CONTINUE and
9. Click OK.
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-215
Association Between Education and Customer’s
Willingness to recommend National to a Friend
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-216
COUNT represents the actual number of customers in each
cell. The percentages are based on the corresponding
row totals.
Association Between Education and Customer’s
Willingness to recommend National to a Friend
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-217
Association Between Education and Customer’s
Willingness to recommend National to a Friend
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-218
For Chi-Square Assessment:
1. Select ANALYZE
2. Click on DESCRIPTIVE STATISTICS
3. Select CROSS-TABS
4. Move the variable “highest level of schooling” to
ROW(s) box
5. Move “rec” to COLUMN(s) box;
6. Click on “STATISTICS”
7. Select CHI-SQUARE, CONTINGENCY
COEFFICIENT, and CRAMER’S V
8. Click on CELLS,
9. Select OBSERVED and EXPECTED FREQUENCIES
10.Click CONTINUE
11.Click OK.
National Insurance Company Study -
Chi-Square Test
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-219
National Insurance Company Study -
Chi-Square Test
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-220
Interpret
the Table
National Insurance Company Study--
Expected Frequency Table
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-221
Computed Chi-
square value
P-value
National Insurance Company Study
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-222
National Insurance Company Study --
P-Value Significance
• The actual significance level (p-value) = 0.019
• The chances of getting a chi-square value as high
as 10.007 when there is no relationship between
education and recommendation are less than 19 in
1,000.
• The apparent relationship between education and
recommendation revealed by the sample data is
unlikely to have occurred because of chance.
• Jill and Tom can safely reject the null hypothesis.
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-223
Precautions in Interpreting Cross
Tabulation Results
• Two-way tables cannot show conclusive evidence
of a causal relationship
• Watch out for small cell sizes
• The risk of drawing erroneous inferences increases
when more than two variables are involved
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 13-224
Two-way Table Based on a Survey of
200 Hospital Patients:
                                  Patients who jog    Patients who do not jog
Patients with heart disease       20                  40
Patients without heart disease    80                  60
Total                             100                 100
Is there a causal relationship between jogging and
heart disease?
Copyright © by Houghton Mifflin Company, Inc. All rights reserved First Edition
Chapter 14
Examining
Associations:
Correlation
and Regression
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 14-226
Chapter Objectives
• Compute the Spearman correlation coefficient
between ordinal scaled variables and determine
whether or not it is statistically significant
• Compute the Pearson correlation coefficient
between two variables and assess its statistical
significance
• Explain simple regression analysis and state the
distinction between a dependent variable and an
independent variable
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 14-227
Chapter Objectives (Cont’d)
• Describe common indicators for checking
the usefulness of a regression equation
• Discuss practical applications of regression
analysis
• Interpret the results of a multiple regression
analysis
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 14-228
Did You Know That Experienced Women
in High-tech Jobs Earn More Than Men?
• General Belief: Men on average earn more than women in similar occupations
– IEEE-USA: Survey results showed that in the electrotechnology and information-technology fields professional women with 20+ years of experience earned significantly more than men with similar experience
– Regression analysis revealed that gender and experience, along with ethnic background, were significantly related to income levels in the high-tech sector
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 14-229
Did You Know That Parents’ Education
May Have a Bearing on Children’s GPA’s?
• A study of high schools in Alberta, Canada,
showed a statistically significant, positive
association between parents’ education
levels and children’s grades
• Regression analysis revealed that 11
percent of the variation in student’s grades
could be attributed to differences in parents’
education levels
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 14-230
Did You Know That University Students’
Gender and Age May Be Unrelated To Their
Grades In An Introductory Marketing Course?
• The most important predictors of grades in an introductory
marketing course were
– Overall GPA
– Whether the student transferred to the university from a
community college
– Number of hours the student worked per week
• Regression analysis revealed that the predictor variables,
such as gender, age, and participation in extracurricular
activities showed no significant relationship to course
grades
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 14-231
Overview of Techniques for
Examining Associations
• Spearman Correlation Coefficient Technique
• The technique is appropriate when
– The degree of association between two sets of ranks (pertaining to two variables) is to be examined
• Illustrative Research Question(s) This Technique Can Answer:
– Is there a significant relationship between motivation levels of salespeople and the quality of their performance?
• Assume that the data on motivation and quality of performance are in the form of ranks, say, 1 through 20, for 20 salespeople who were evaluated subjectively by their supervisor on each variable
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 14-232
Overview of Techniques for
Examining Associations (Cont’d)
• Pearson Correlation Coefficient Technique
• This technique is appropriate when
– The degree of association between two metric-scaled
(interval or ratio) variables is to be examined
• Illustrative Research Question(s) This Technique
Can Answer:
– Is there a significant relationship between customers'
age (measured in actual years) and their perceptions of
our company's image (measured on a scale of 1 to 7)?
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 14-233
Overview of Techniques for
Examining Associations (Cont’d)
• Simple Regression Analysis Technique
• This technique is appropriate when
– A mathematical function or equation linking
two metric-scaled (interval or ratio) variables is
to be constructed, under the assumption that
values of one of the two variables are dependent
on the values of the other
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 14-234
Overview of Techniques for
Examining Associations–Simple
Regression Analysis (Cont’d)
• Illustrative Research Question(s) this Technique Can Answer:
– Are sales (measured in dollars) significantly affected by advertising expenditures (measured in dollars)?
– What proportion of the variation in sales is accounted for by variation in advertising expenditures? How sensitive are sales to changes in advertising expenditures?
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 14-235
Overview of Techniques for
Examining Associations (Cont’d)
• Multiple Regression Analysis Technique
• This technique is appropriate under the same
conditions as simple regression analysis, except
that more than two variables are involved, one
of which is assumed to be dependent on the others
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 14-236
Overview of Techniques for
Examining Associations (Cont’d)
• Illustrative Research Question(s) this Technique Can Answer:
– Are sales significantly affected by advertising expenditures and price (where all three variables are measured in dollars)?
– What proportion of the variation in sales is accounted for by advertising and price? How sensitive are sales to changes in advertising and price?
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 14-237
Spearman Correlation Coefficient
A Spearman correlation coefficient is a measure of
association between two sets of ranks
$$ r_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)} $$
di = the difference between the ith sample unit's ranks on the
two variables
n = the total sample size
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 14-238
Scenario: Industrial Marketing Firm
• An industrial marketing firm has been hiring all its salespeople from among the graduates of 10 business schools in the vicinity of its headquarters
• The firm developed a subjective ranking of the perceived prestige levels of the 10 schools and the performance levels of the groups of graduates recruited from these schools
• Question:
– What is the degree of association between the prestige levels of the schools and the sales performance levels of their graduates hired by this company?
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 14-239
Table 14.2 Association Between School
Prestige and Performance of Graduates
Business   Ranking of School's   Ranking of Performance    Difference Between   Squared
School     Prestige              of School's Graduates     Ranks                Difference
(i)        (SPi)                 (GPi)                     (di = SPi - GPi)     (di²)
1          10                    8                          2                   4
2          7                     3                          4                   16
3          9                     7                          2                   4
4          1                     2                         -1                   1
5          6                     9                         -3                   9
6          2                     4                         -2                   4
7          3                     5                         -2                   4
8          8                     10                        -2                   4
9          5                     6                         -1                   1
10         4                     1                          3                   9
                                                            Σ di² = 56
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 14-240
$$ r_s = 1 - \frac{(6)(56)}{10(100 - 1)} = .661 $$
Hypotheses
H0: ρs = 0
Ha: ρs ≠ 0
Spearman Correlation Co-efficient
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 14-241
$$ t = r_s \sqrt{\frac{n - 2}{1 - r_s^2}} = 2.49 $$
t - Distribution
• For α = .05, t for 8 degrees of freedom (d.f. = n - 2
= 10 - 2 = 8): tc = +2.31 and -2.31
• Decision Rule:
– “Reject H0 if t > 2.31 or if t < -2.31.”
– Since t = 2.49 > 2.31, we reject H0 and conclude that there is
a true association between the prestige of business
schools and the job performance of their graduates. In
other words, the sample correlation of .661 is unlikely
to have occurred because of chance.
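The Spearman coefficient and its t statistic for the ten business schools can be recomputed from the ranks in Table 14.2. The sketch below is illustrative only; the function name is hypothetical, and the shortcut formula it uses assumes there are no tied ranks, which holds for this table.

```python
import math

# Ranks from Table 14.2, in school order 1..10
prestige    = [10, 7, 9, 1, 6, 2, 3, 8, 5, 4]    # SPi
performance = [8, 3, 7, 2, 9, 4, 5, 10, 6, 1]    # GPi

def spearman_rs(rank_x, rank_y):
    """Spearman correlation from two rank lists (no ties assumed)."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    rs = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
    return rs, sum_d2

rs, sum_d2 = spearman_rs(prestige, performance)
n = len(prestige)
t = rs * math.sqrt((n - 2) / (1 - rs ** 2))   # t statistic with n - 2 d.f.
reject_h0 = abs(t) > 2.31                     # two-tailed, alpha = .05, 8 d.f.

print(sum_d2, round(rs, 3), round(t, 2), reject_h0)
```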
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 14-242
The Pearson correlation coefficient is the degree of association
between variables that are interval- or ratio-scaled.
For two variables X and Y, the Pearson correlation coefficient (rxy) between them is given by
$$ r_{xy} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{(n - 1)\, s_x s_y} $$
n = sample size (total number of data points)
X̄ and Ȳ = means
Xi and Yi = values for any sample unit i
sx and sy = standard deviations
Pearson Correlation Coefficient
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 14-243
Market   Dollar Sales of Bright   Advertising Expenditure   Number of Competing
Area     (in Thousands)           for Bright ($ in 100)     Detergents
1        5                        5                         15
2        10                       13                        8
3        6                        5                         14
4        20                       15                        5
5        15                       10                        9
6        9                        9                         10
7        11                       5                         12
8        18                       13                        4
9        22                       17                        6
10       7                        6                         13
11       24                       19                        2
12       14                       12                        8
13       16                       15                        6
14       17                       14                        7
15       23                       18                        1
16       8                        7                         11
17       12                       10                        10
18       13                       12                        7
19       21                       16                        7
20       19                       16                        3
Bright Detergent Data
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 14-244
Scatter Diagram
• Plot in a two-dimensional graph
• Indicates how closely and in what fashion
the variables are associated
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 14-245
Exhibit 14.1 Scatter Diagram of Sales and
Advertising Data
[Scatter diagram: x-axis, Advertising Expenditures for Bright ($), 400 to 2,000; y-axis, Dollar Sales of Bright (Thousands), 0 to 30]
What is the relationship between dollar sales and
advertising expenditure?
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 14-246
Exhibit 14.2 Scatter Diagram of Sales and
Number of Competing Brands
[Scatter diagram: x-axis, Number of Competing Detergents, 0 to 16; y-axis, Dollar Sales of Bright (Thousands), 0 to 30]
What is the relationship between dollar sales and number of
competing detergents?
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 14-247
Pearson Correlation
• Correlation between sales and advertising is
.927
• Correlation between sales and number of
competing brands is -.909
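Both correlations can be recomputed from the Bright detergent data with a short Python sketch. The function name is hypothetical. The sales figure for market area 20 is entered as 19 (thousand); that value is an assumption, chosen because it reproduces the correlations and sums of squares reported in the SPSS output. The sums-of-products form used here is algebraically the same as dividing by (n - 1) sx sy.

```python
import math

# Bright detergent data, market areas 1..20 (area 20 sales assumed to be 19)
sales = [5, 10, 6, 20, 15, 9, 11, 18, 22, 7,
         24, 14, 16, 17, 23, 8, 12, 13, 21, 19]        # $ thousands
advertising = [5, 13, 5, 15, 10, 9, 5, 13, 17, 6,
               19, 12, 15, 14, 18, 7, 10, 12, 16, 16]  # $ hundreds
competitors = [15, 8, 14, 5, 9, 10, 12, 4, 6, 13,
               2, 8, 6, 7, 1, 11, 10, 7, 7, 3]

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

r_sales_adv = pearson_r(sales, advertising)
r_sales_comp = pearson_r(sales, competitors)

print(round(r_sales_adv, 3), round(r_sales_comp, 3))
```

The negative sign on the second coefficient reflects that sales tend to fall as the number of competing detergents rises.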
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 14-248
Two-Tailed Hypothesis Test For
Correlations
• H0: ρ = 0;
• Ha: ρ ≠ 0,
• For α = .05, 19 degrees of freedom (d.f. = n -
1 = 19), rc = +.433 and rc = -.433
• Decision rule is: “Reject H0 if r > .433 or if
r < -.433.”
• Reject H0 in both cases
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 14-249
Exhibit 14.3 Scatter Diagram Showing a
Nonlinear Association Between Variables
[Scatter diagram: x-axis, X, 0 to 14; y-axis, Y, 20 to 70]
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 14-250
National Insurance Company– Computing
Pearson Correlation Among Service Quality
Constructs
• National Insurance Company was interested in the
correlations between respondents’ overall service-
quality perceptions (on the 10-point scale) and
their average ratings along each of the five
dimensions of Service Quality
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 14-251
National Insurance Company– Computing
Pearson Correlation Among Service Quality
Constructs (Cont’d)
1. Click ANALYZE
2. Select CORRELATE
3. Select BIVARIATE
4. Move “oq, reliable, empathy, tangible,
response, and assure” to VARIABLES box
5. Click OK
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 14-252
National Insurance Company– Computing
Pearson Correlation Among Service Quality
Constructs (Cont’d)
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 14-253
National Insurance Company– Computing
Pearson Correlation Among Service Quality
Constructs Using SPSS
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 14-254
Interpreting Pearson Correlation
Coefficients
• Each of the five service-quality measures (reliability, empathy, tangibles, responsiveness, and assurance) is significantly related to the overall quality (OQ) at the .001 level of significance
• Responsiveness has the strongest correlation (.8625)
• Tangibles have the weakest correlation (.5038)
• All the correlations are strong enough to be meaningful
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 14-255
Simple Regression Analysis
• Generates a mathematical relationship
(called the regression equation) between
one variable designated as the dependent
variable (Y) and another designated as the
independent variable (X)
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 14-256
Independent Variable Vs.
Dependent Variable
• Independent variable
– Explanatory or predictor variable
– Often presumed to be a cause of the other
• Dependent variable
– Criterion Variable
– Influenced by the independent variable
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 14-257
Scenario: Curtis Construction
Industry Lobbyist
• Curtis, a construction industry lobbyist, is in an area of the country that has a high unemployment rate and a number of economically depressed construction projects
• His current charge is to convince local government officials to vote in favor of several tax concessions for the construction industry
• He is wondering whether he can generate any concrete evidence to show that increased construction activity (presumably spurred by the proposed tax concessions) would greatly benefit the state
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 14-258
Scenario: Curtis Construction
Industry Lobbyist (Cont’d)
• Possible Dependent Variable
– Number of people unemployed or the unemployment rate
– Data on this variable may be gathered from a sample of areas from around the country
• Possible Independent Variable
– Number of construction permits issued or number of ongoing construction projects
– Data on this variable should be gathered from the same sample
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 14-259
Scenario: Carol, Chief Librarian
• Carol, chief librarian in a major university,
is eager to increase the number of students
borrowing books from the library as well as
the number of books borrowed per student
• She needs some persuasive evidence to
show how increased borrowing of books
might benefit students
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 14-260
Scenario: Carol, Chief Librarian
(Cont’d)
• Possible Dependent Variable
– Cumulative grade point ratio
– Data on this variable should be gathered for a sample of students who have borrowed books in the past
• Possible Independent Variable
– Number of books borrowed
– Assuming that the library has records of the books borrowed by students, data on this variable can be obtained from those records for the same sample of students
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 14-261
Scenario: Jack, Trade Show Officer
• Jack, an officer in an association in charge
of putting together and promoting industrial
trade shows, is wondering about the impact
of the number of exhibitors in a trade show
on trade show attendance
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 14-262
Scenario: Jack, Trade Show
Officer (Cont’d)
• Possible Dependent Variable
– Number of people visiting a trade show
– Data on this variable can be obtained for a representative sample of trade shows from the association’s past records
• Possible Independent Variable
– Number of exhibitors in a trade show
– Necessary data can be obtained from the past records
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 14-263
Deriving a Regression Equation
• Y = a + bX, where a and b are constants
• Y -> Dependent Variable
• X -> Independent Variable
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 14-264
Market   Dollar Sales of Bright   Advertising Expenditure   Number of Competing
Area     (in Thousands)           for Bright ($ in 100)     Detergents
1        5                        5                         15
2        10                       13                        8
3        6                        5                         14
4        20                       15                        5
5        15                       10                        9
6        9                        9                         10
7        11                       5                         12
8        18                       13                        4
9        22                       17                        6
10       7                        6                         13
11       24                       19                        2
12       14                       12                        8
13       16                       15                        6
14       17                       14                        7
15       23                       18                        1
16       8                        7                         11
17       12                       10                        10
18       13                       12                        7
19       21                       16                        7
20       19                       16                        3
Bright Detergent Data
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 14-265
Exhibit 14.4 Several Subjectively
Constructed Regression Lines
[Scatter diagram with several candidate lines: x-axis, Advertising Expenditures for Bright ($), 400 to 2,000; y-axis, Dollar Sales of Bright (Thousands), 0 to 30]
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 14-266
Regression Using SPSS--Sales
and Advertising Data
1. Click ANALYZE
2. Select REGRESSION
3. Click LINEAR
4. Move “Dollar Sales for Bright” to DEPENDENT
Box
5. Move “advertising expenditures for Bright” to
INDEPENDENT(S) box
6. Click OK
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 14-267
Exhibit 14.5 SPSS Computer Output for
Simple Regression Analysis of Sales and
Advertising Data
Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .927(a)   .860       .852                2.28
a. Predictors: (Constant), Advertising Expenditures for Bright ($)
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 14-268
Exhibit 14.5 SPSS Computer Output for
Simple Regression Analysis of Sales and
Advertising Data (Cont’d)
ANOVA(b)
Model 1      Sum of Squares   df   Mean Square   F         Sig.
Regression   571.646          1    571.646       110.221   .000(a)
Residual     93.354           18   5.186
Total        665.000          19
a. Predictors: (Constant), Advertising Expenditures for Bright ($)
b. Dependent Variable: Dollar Sales of Bright (Thousands)
F is greater than the critical value.
Because the p-value is ≤ 0.05, we can infer that the R² value of .860 is
statistically significant; it is unlikely to have occurred by chance
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 14-269
Exhibit 14.5 SPSS Computer Output for
Simple Regression Analysis of Sales and
Advertising Data (Cont’d)
Coefficients(a)
                                                  Unstandardized          Standardized
                                                  Coefficients            Coefficients
Model 1                                           B       Std. Error      Beta            t        Sig.
(Constant)                                        .163    1.457                           .112     .912
Advertising Expenditures for Bright ($ in 100)    1.210   .115            .927            10.499   .000
a. Dependent Variable: Dollar Sales of Bright ($ in Thousands)
For the advertising coefficient, the t value (10.499) exceeds 2.10 and the p-value is ≤ 0.05, so we
reject the null hypothesis; that is, the coefficient is statistically significant
a = .163
b = 1.210
The regression equation is
Ŷi = .163 + 1.210 Xi
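The intercept, slope, R-square, and standard error of the estimate in Exhibit 14.5 can be reproduced by ordinary least squares. The sketch below uses only the Python standard library; the function name is hypothetical, and the sales figure for market area 20 is entered as 19 (thousand), an assumption chosen because it matches the total sum of squares of 665 in the ANOVA table.

```python
import math

# Bright detergent data, market areas 1..20 (area 20 sales assumed to be 19)
sales = [5, 10, 6, 20, 15, 9, 11, 18, 22, 7,
         24, 14, 16, 17, 23, 8, 12, 13, 21, 19]        # Y, $ thousands
advertising = [5, 13, 5, 15, 10, 9, 5, 13, 17, 6,
               19, 12, 15, 14, 18, 7, 10, 12, 16, 16]  # X, $ hundreds

def simple_regression(x, y):
    """Least-squares fit of y = a + b*x with basic fit statistics."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx                      # slope
    a = mean_y - b * mean_x            # intercept
    sst = sum((yi - mean_y) ** 2 for yi in y)
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    r_square = 1 - sse / sst
    std_error = math.sqrt(sse / (n - 2))   # n - k - 1 with k = 1 predictor
    return a, b, r_square, std_error

a, b, r_square, std_error = simple_regression(advertising, sales)
print(round(a, 3), round(b, 3), round(r_square, 3), round(std_error, 3))

# Example forecast: predicted sales (in thousands) at X = 10, i.e. $1,000
predicted_sales = a + b * 10
```

The forecast line uses an advertising level inside the observed range of 5 to 19.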
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 14-270
Standard Error
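$$ s_{y/x} = \sqrt{\frac{SSE}{n - k - 1}} $$
where SSE is the error (residual) sum of squares and k is the
number of independent variables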
• The value of the standard error (sy/x) is
shown in the computer output as 2.277,
which is the square root of the error mean
square value of 5.186
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 14-271
Practical Applications of
Regression Equations
• The regression coefficient, or slope, can
indicate how sensitive the dependent
variable is to changes in the independent
variable
• The regression equation is a forecasting tool
for predicting the value of the dependent
variable for a given value of the
independent variable
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 14-272
Precautions In Using Regression
Analysis
• Only capable of capturing linear associations
between dependent and independent variables
• A significant R2-value does not necessarily imply a cause-and-effect association between the independent and dependent variables
• A regression equation may not yield a trustworthy prediction of the dependent variable when the value of the independent variable at which the prediction is desired is outside the range of values used in constructing the equation
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 14-273
Precautions In Using Regression
Analysis (Cont’d)
• A regression equation based on relatively
few data points cannot be trusted
• The ranges of data on the dependent and
independent variables can affect the
meaningfulness of a regression equation
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 14-274
Multiple Regression Analysis
• Ŷi = a + b1X1i + b2X2i + … + bkXki
• Ŷi is the predicted value of the dependent variable
for some unit i;
• X1i, X2i, …, Xki are values on the independent
variables for unit i;
• b1, b2, . . . , bk are the regression coefficients;
• a is the Y-intercept representing the prediction for
Y when all independent variables are set to zero
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 14-275
National Insurance Company–
Multiple Regression Using SPSS
• Jill and Tom were interested in conducting a
multiple regression analysis wherein overall
service quality perception is the dependent
variable and the average ratings along the
five dimensions are the independent variables
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 14-276
National Insurance Company– Multiple
Regression Using SPSS (Cont’d)
1. Click ANALYZE
2. Select REGRESSION
3. Click LINEAR
4. Move “OQ” to DEPENDENT
Box
5. Move “reliable, empathy,
tangible, response, and assure”
to INDEPENDENT(S) box
6. Click OK
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 14-277
National Insurance Company– Multiple
Regression Using SPSS (Cont’d)
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 14-278
The R-square of .810 indicates a strong relationship
between these variables and overall quality.
National Insurance Company– Multiple
Regression Using SPSS (Cont’d)
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 14-279
National Insurance Company– Multiple
Regression Using SPSS (Cont’d)
All variables except empathy are significantly
related to overall service quality
(as indicated by the t-test of significance in the
far right column)
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 14-280
Bright Detergent Case – Multiple
Regression Using SPSS
1. Click ANALYZE
2. Select REGRESSION
3. Click LINEAR
4. Move “Dollar Sales for Bright” to DEPENDENT Box
5. Move “advertising expenditures for Bright and Number of
competing Brands” to INDEPENDENT(S) box
6. Click OK.
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 14-281
Bright Detergent Case – Multiple
Regression Using SPSS (Cont’d)
Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .934a   .873       .858                2.23
a. Predictors: (Constant), Number of Competing Detergents, Advertising Expenditures for Bright ($ in 100)

ANOVAb
Model 1      Sum of Squares   df   Mean Square   F        Sig.
Regression   580.373          2    290.187       58.293   .000a
Residual     84.627           17   4.978
Total        665.000          19
a. Predictors: (Constant), Number of Competing Detergents, Advertising Expenditures for Bright ($ in 100)
b. Dependent Variable: Dollar Sales of Bright ($ in Thousands)
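The summary figures in this output follow from the sums of squares in the ANOVA table. The sketch below is only an arithmetic check on the numbers shown on the slide, not SPSS's own procedure; the variable names are illustrative.

```python
import math

# Sums of squares and degrees of freedom copied from the ANOVA table.
ss_regression, df_regression = 580.373, 2
ss_residual, df_residual = 84.627, 17
ss_total, df_total = 665.000, 19

# R Square: share of total variation explained by the regression.
r_square = ss_regression / ss_total

# Adjusted R Square: penalizes R Square for the number of predictors.
adj_r_square = 1 - (ss_residual / df_residual) / (ss_total / df_total)

# F is the ratio of the two mean squares.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_value = ms_regression / ms_residual

# Std. Error of the Estimate is the square root of the residual mean square.
std_error = math.sqrt(ms_residual)

print(round(r_square, 3), round(adj_r_square, 3),
      round(f_value, 3), round(std_error, 2))
```

The printed values should agree with the Model Summary and ANOVA tables up to rounding.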
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 14-282
Bright Detergent Case – Multiple
Regression Using SPSS (Cont’d)
Coefficientsa
Model 1                                          Unstd. B   Std. Error   Std. Beta   t        Sig.
(Constant)                                       8.854      6.717                    1.318    .205
Advertising Expenditures for Bright ($ in 100)   .808       .324         .619        2.496    .023
Number of Competing Detergents                   -.498      .376         -.328       -1.324   .203
a. Dependent Variable: Dollar Sales of Bright ($ in Thousands)
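As a rough check on the Coefficients table, each t value is the unstandardized coefficient divided by its standard error, and the B column gives the fitted equation. The sketch below assumes that reading of the table; the function name and the input values used in the example are hypothetical.

```python
# (B, Std. Error) pairs copied from the Coefficients table.
coefficients = {
    "constant": (8.854, 6.717),
    "advertising": (0.808, 0.324),   # advertising expenditures, $ in 100
    "competitors": (-0.498, 0.376),  # number of competing detergents
}

# t = B / Std. Error for each term (matches the table up to rounding).
t_values = {name: b / se for name, (b, se) in coefficients.items()}

def predict_sales(advertising, competitors):
    """Predicted dollar sales of Bright ($ in thousands) from the fitted equation."""
    return (coefficients["constant"][0]
            + coefficients["advertising"][0] * advertising
            + coefficients["competitors"][0] * competitors)

for name, t in t_values.items():
    print(name, round(t, 2))

# Hypothetical inputs: advertising = 20 (i.e. $2,000), 10 competing brands.
print(predict_sales(20, 10))
```

In the table, only advertising expenditures has a Sig. value below .05.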
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 14-283
Multicollinearity
• Multicollinearity exists when independent
variables in a multiple regression equation
are highly correlated among themselves
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 14-284
Bright Detergent Case– Multicollinearity
Correlations
Variables: (1) Dollar Sales of Bright ($ in Thousands); (2) Advertising Expenditures for Bright ($ in 100); (3) Number of Competing Detergents
                          (1)       (2)       (3)
(1) Pearson Correlation   1.000     .927**    -.909**
    Sig. (2-tailed)       .         .000      .000
    N                     20        20        20
(2) Pearson Correlation   .927**    1.000     -.937**
    Sig. (2-tailed)       .000      .         .000
    N                     20        20        20
(3) Pearson Correlation   -.909**   -.937**   1.000
    Sig. (2-tailed)       .000      .000      .
    N                     20        20        20
**. Correlation is significant at the 0.01 level (2-tailed).
Very high correlation between the independent variables: presence of multicollinearity
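One common way to quantify this is the variance inflation factor (VIF). The slides do not use VIF, so the sketch below is an added illustration: with only two predictors, the R-square from regressing one predictor on the other equals their squared correlation.

```python
# Correlation between the two independent variables, from the Correlations table.
r_predictors = -0.937

# With two predictors, R-square of one regressed on the other is r squared.
r_squared = r_predictors ** 2

# Tolerance is the share of a predictor's variance NOT explained by the other.
tolerance = 1 - r_squared

# VIF is the reciprocal of tolerance; larger values mean stronger multicollinearity.
vif = 1 / tolerance

print(round(tolerance, 3), round(vif, 2))
```

A tolerance this small means most of each predictor's variation is shared with the other, which is consistent with the slide's conclusion.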
Copyright © by Houghton Mifflin Company, Inc. All rights reserved First Edition
Chapter 15
Overview of
Other
Multivariate
Techniques
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 15-286
Chapter Objectives
• Distinguish between dependence and interdependence techniques
• Interpret interaction effect in a factorial ANOVA
• Identify two key purposes of discriminant analysis
• Discuss factor analysis and interpret a factor-loading matrix
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 15-287
Chapter Objectives (Cont’d)
• Distinguish between cluster analysis and
discriminant analysis
• Describe the potential uses of
multidimensional scaling and point out its
key limitations
• State the purpose of conjoint analysis and
use the results from such an analysis
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 15-288
Dependence and Interdependence
Techniques
• Dependence technique
– One variable is designated as the dependent variable and the rest are treated as independent variables
• Interdependence technique
– There are no dependent and independent variable designations, all variables are treated equally in a search for underlying patterns of relationships
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 15-289
Dependence Technique–
Regression Analysis
• Input Data
– Dependent variable(s) - metric
– Independent variable(s)- metric
• Primary Purpose of the Technique
– Ascertain the relative importance of independent variable(s) in explaining variation in the dependent variable
– Predict dependent variable values for given values of the independent variable(s)
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 15-290
Overview of Multivariate
Techniques
• Analysis of Variance (ANOVA) Technique
• Usual Form of the Input Data
– Dependent variable: metric; independent
variable(s): nonmetric
• Primary Purpose of the Technique
– See whether different levels (treatments) of
independent variable(s) have significantly
different impacts on the dependent variable
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 15-291
Overview of Multivariate
Techniques (Cont’d)
• Discriminant Analysis Technique
• Usual Form of the Input Data
– Dependent variable: nonmetric; independent variable(s):
metric
• Primary Purpose of the Technique
– To identify independent variables that are critical in
distinguishing between subsamples defined by the
dependent-variable categories; also to aid in classifying
new units into one of the subsample categories
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 15-292
Overview of Multivariate
Techniques (Cont’d)
• Factor Analysis Technique
• Usual Form of the Input Data
– Metric
• Primary Purpose of the Technique
– To reduce data on a large number of variables into a
relatively small set of factors
– To identify key constructs underlying the original set of
measured variables
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 15-293
Overview of Multivariate
Techniques (Cont’d)
• Cluster Analysis Technique
• Usual Form of the Input Data
– Metric
• Primary Purpose of the Technique
– To identify natural clusters of objects on the
basis of similarities of the objects on a variety
of characteristics
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 15-294
Overview of Multivariate
Techniques (Cont’d)
• Multidimensional Scaling Technique
• Usual Form of the Input Data
– Nonmetric (similarity ranks based on comparison of
actual objects)
• Primary Purpose of the Technique
– To identify key dimensions underlying respondent
evaluations of products, brands, stores, etc.
– To determine the relative positions of the objects in
multidimensional space
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 15-295
Overview of Multivariate
Techniques (Cont’d)
• Conjoint Analysis Technique
• Usual Form of the Input Data
– Nonmetric
• Primary Purpose of the Technique
– To derive utility values that respondents implicitly assign to various levels of key attributes used in evaluating objects
• The utility values themselves aid in ascertaining the relative importance of the attributes as well as the potential attractiveness of descriptive profiles defined by different combinations of attribute levels
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 15-296
Analysis of Variance
• ANOVA is appropriate in situations where
the independent variable is set at certain
specific levels (called treatments in an
ANOVA context) and metric measurements
of the dependent variable are obtained at
each of those levels
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 15-297
Example
• 24 stores chosen randomly for the study
• 8 stores randomly chosen for each treatment
– Treatment 1: store brand sold at the regular price
– Treatment 2: store brand sold at 50¢ off the regular price
– Treatment 3: store brand sold at 75¢ off the regular price
• Monitor sales of the store brand for a week in each store
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 15-298
Table 15.2 Unit Sales Data Under Three
Pricing Treatments
Treatment                  Regular Price   50¢ off   75¢ off
Unit sales in each store   37              46        46
                           38              43        49
                           40              43        48
                           40              45        48
                           38              45        47
                           38              43        48
                           40              44        49
                           39              44        49
Number of stores           8               8         8
Mean sales                 38.75           44.13     48.00
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 15-299
EG1(R) X1 O1
EG2(R) X2 O2
EG3(R) X3 O3
EG1 -- Experiment Group 1, X1-- Regular Price
EG2 -- Experiment Group 2, X2-- 50c off
EG3 -- Experiment Group 3, X3-- 75c off
O1 -- Observation (monitoring unit sales data in each store)
O2 -- Observation (monitoring unit sales data in each store)
O3 -- Observation (monitoring unit sales data in each store)
After Only Design
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 15-300
ANOVA –Grocery Store
Hypothesis
• Grocery Store Example
– H0: μ1 = μ2 = μ3
– Ha: At least one is different from one or more of
the others
• Hypotheses for k treatment groups or samples
– H0: μ1 = μ2 = … = μk
– Ha: At least one is different from one or more of
the others
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 15-301
Exhibit 15.1 SPSS Computer
Output for ANOVA Analysis
Between-Subjects Factors
Treatment group   Value Label     N
1                 Regular price   8
2                 50 cents off    8
3                 75 cents off    8
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 15-302
Exhibit 15.1 SPSS Computer Output
for ANOVA Analysis (Cont’d)
Tests of Between-Subjects Effects
Dependent Variable: SALES
Source            Type III Sum of Squares   df   Mean Square   F           Sig.
Corrected Model   345.250a                  2    172.625       137.445     .000
Intercept         45675.375                 1    45675.375     36367.123   .000
TREAT             345.250                   2    172.625       137.445     .000
Error             26.375                    21   1.256
Total             46047.000                 24
Corrected Total   371.625                   23
a. R Squared = .929 (Adjusted R Squared = .922)
There is less than a .001 probability of obtaining an F-value as high as 137.445 if the treatment means were truly equal
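The TREAT and Error rows of this output can be reproduced by hand from the unit sales in Table 15.2. The sketch below is a plain one-way ANOVA computation on those 24 observations; the dictionary keys and variable names are illustrative, and it does not compute the Sig. value.

```python
# Unit sales per store, copied column by column from Table 15.2.
sales = {
    "regular": [37, 38, 40, 40, 38, 38, 40, 39],
    "50_off": [46, 43, 43, 45, 45, 43, 44, 44],
    "75_off": [46, 49, 48, 48, 47, 48, 49, 49],
}

all_values = [x for group in sales.values() for x in group]
grand_mean = sum(all_values) / len(all_values)
group_means = {name: sum(v) / len(v) for name, v in sales.items()}

# Between-treatments sum of squares (the TREAT row).
ss_between = sum(len(v) * (group_means[name] - grand_mean) ** 2
                 for name, v in sales.items())

# Within-treatments sum of squares (the Error row).
ss_within = sum((x - group_means[name]) ** 2
                for name, v in sales.items() for x in v)

df_between = len(sales) - 1                 # k - 1 treatments
df_within = len(all_values) - len(sales)    # n - k stores

f_value = (ss_between / df_between) / (ss_within / df_within)

print(ss_between, ss_within, df_between, df_within, round(f_value, 3))
```

The sums of squares, degrees of freedom, and F ratio should match the SPSS output up to rounding.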
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 15-303
Bank Customer Perceptions Study
• Bank customers grouped by gender: male, female
• Within each gender, three age groups: < 35 years, 35-64 years, > 64 years
• Measure overall perceptions
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 15-304
Bank Customer Perceptions Study (Cont’d)
Tests of Between-Subjects Effects
Dependent Variable: Overall Quality of the Company’s Services
Source            Type III Sum of Squares   df    Mean Square   F           Sig.
Corrected Model   2156.112a                 5     431.222       438.891     .000
Intercept         20665.912                 1     20665.912     21033.424   .000
Gender            382.436                   1     382.436       389.237     .000
Age               1311.623                  2     655.811       667.474     .000
Gender * Age      260.433                   2     130.216       132.532     .000
Error             459.823                   468   .983
Total             24341.000                 474
Corrected Total   2615.935                  473
a. R Squared = .824 (Adjusted R Squared = .822)
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 15-305
Bank Customer Perceptions Study (Cont’d)
Descriptive Statistics
Dependent Variable: Overall Quality of the Company's Services
Gender   Age     Mean   Std. Deviation   N
Male     <35     2.54   1.31             79
         35-64   6.72   1.17             88
         >64     8.08   .82              85
         Total   5.87   2.57             252
Female   <35     6.49   1.39             55
         35-64   6.95   .58              79
         >64     9.36   .48              88
         Total   7.79   1.53             222
Total    <35     4.16   2.36             134
         35-64   6.83   .94              167
         >64     8.73   .93              173
         Total   6.77   2.35             474
Male and female customers differed in their overall perceptions
Customers' perceptions differed according to their ages
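The cell means in the Descriptive Statistics output already hint at both the main effects and the interaction. The sketch below is an informal look at those means, not a significance test; the names are illustrative. It rebuilds the gender totals as weighted means and compares the female-minus-male gap within each age group.

```python
# (mean, n) per cell, copied from the Descriptive Statistics output.
cells = {
    ("Male", "<35"): (2.54, 79),
    ("Male", "35-64"): (6.72, 88),
    ("Male", ">64"): (8.08, 85),
    ("Female", "<35"): (6.49, 55),
    ("Female", "35-64"): (6.95, 79),
    ("Female", ">64"): (9.36, 88),
}
ages = ["<35", "35-64", ">64"]

def gender_mean(gender):
    """Weighted mean across age groups, matching the Total row for a gender."""
    rows = [cells[(gender, age)] for age in ages]
    return sum(m * n for m, n in rows) / sum(n for _, n in rows)

# Female-minus-male gap within each age group.
# If gender and age did not interact, these gaps would be roughly equal.
gaps = {age: cells[("Female", age)][0] - cells[("Male", age)][0] for age in ages}

print({g: round(gender_mean(g), 2) for g in ("Male", "Female")})
print({age: round(gap, 2) for age, gap in gaps.items()})
```

The gap is much larger for customers under 35 than for the other two age groups, which is the pattern an interaction effect describes.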
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 15-306
Bank Customer Perceptions Study (Cont’d)
[Figure: Estimated Marginal Means of Overall Quality of the Company's Services. Horizontal axis: Age (<35, 35-64, >64). Vertical axis: Estimated Marginal Means (2 to 10). Separate lines for Gender: Male and Female.]
Sex and age interacted in influencing perceptions
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 15-307
Factorial ANOVA
• The factorial ANOVA is used to analyze
data from a factorial design experiment
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 15-308
Exhibit 15.2 Illustrations of Main and
Interaction Effects
Grocery Store Experiment
[Figure: two line charts of unit sales (vertical axis) against price (horizontal axis: regular price, 50¢ off, 75¢ off), each with one line for display present and one for display absent. Panel (a): main and interaction effects present. Panel (b): only main effects present.]
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 15-309
Discriminant Analysis
• Identifies the distinguishing features of
prespecified subgroups of units that are formed on
the basis of some dependent variable
• Examples of Subgroups
– Heavy, moderate, and light users of a product
– Homeowners and renters
– Viewers and nonviewers of a television program
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 15-310
Discriminant Analysis (Cont’d)
• Dependent Variable
– Categorical: as many categories as there are subgroups
• Heavy, moderate, and light users: 3 categories
• Independent Variable
– Metric-scaled
• Purpose of discriminant analysis is to classify new
units into one of the subgroups given the new
units’ values of the independent variable
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 15-311
Example: Computer Manufacturer
• Measured characteristics: household income, number of years of formal education
• Groups: PC ownership, not owning a PC
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 15-312
Exhibit 15.3 Scatter Plot of Income and
Education Data for Personal Computer
Owners and Nonowners
[Figure: scatter plot with points marked as owners and nonowners; one axis labeled Income ($)]
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 15-313
Using the Discriminant Function
• Y = v1X1 + v2X2
– Discriminant weights v1 and v2 can be interpreted as signifying the relative importance of X1 and X2 in being able to discriminate between the two groups
• Ynew = v1X1,new + v2X2, new
– The program assigns the new unit either to the owner group or to the non-owner group based on the criterion value
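A minimal sketch of how such a function could be applied to a new household. The weights, the criterion (cutoff) value, and the rule that higher scores indicate owners are all hypothetical assumptions for illustration; in practice the discriminant analysis program estimates them from the data.

```python
def discriminant_score(x1, x2, v1, v2):
    """Y = v1*X1 + v2*X2, with X1 = income and X2 = years of education."""
    return v1 * x1 + v2 * x2

def classify(score, cutoff):
    """Assign a unit to a group by comparing its score with the criterion value.

    Assumes (hypothetically) that scores at or above the cutoff indicate owners.
    """
    return "owner" if score >= cutoff else "non-owner"

# Hypothetical weights and cutoff, NOT taken from the slides.
V1, V2, CUTOFF = 0.00005, 0.2, 5.0

# Two hypothetical new households: (income in $, years of formal education).
for income, education in [(60000, 16), (20000, 10)]:
    y_new = discriminant_score(income, education, V1, V2)
    print(income, education, classify(y_new, CUTOFF))
```

Because income and education are on very different scales, raw weights like these are not directly comparable; relative importance is usually judged from standardized weights.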
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 15-314
Evaluating a Discriminant
Function
• Confusion Matrix
– Indicates the degree of correspondence, or lack
thereof, between the actual groupings of the
sample units and the predicted groupings
obtained by classifying the same units through
the discriminant function
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 15-315
Table 15.3 Confusion Matrix
                                       Predicted Groupings
Actual Groupings                       Households with       Households without
                                       Personal Computers    Personal Computers
Households with personal computers     17                    3
Households without personal computers  4                     16
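A single summary of this matrix is the proportion of households classified correctly (often called the hit rate). The sketch below computes it from the four cells of Table 15.3; the term "hit rate" and the names used are additions for illustration.

```python
# Counts from Table 15.3, keyed by (actual group, predicted group).
confusion = {
    ("owner", "owner"): 17,
    ("owner", "non-owner"): 3,
    ("non-owner", "owner"): 4,
    ("non-owner", "non-owner"): 16,
}

# Correct classifications lie on the diagonal (actual == predicted).
correct = sum(n for (actual, predicted), n in confusion.items()
              if actual == predicted)
total = sum(confusion.values())
hit_rate = correct / total

print(correct, total, hit_rate)
```

Here 33 of the 40 households are classified correctly. Because the same sample was used to estimate the function and to classify, this figure is likely to be somewhat optimistic for new units.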
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 15-316
Usefulness of Discriminant
Analysis
• Discriminant analysis is very useful for
– Defining customer segments
– Identifying critical characteristics capable of
distinguishing among them
– Classifying prospective customers into
appropriate segments
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 15-317
Factor Analysis
• A data and variable reduction technique that
attempts to partition a given set of variables
into groups of maximally correlated
variables
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 15-318
Intuitive Explanation
• Consider two statements from the Star
Brand Inc.(SBI) survey
• S1. “I have been satisfied with the Star
products I have purchased”
• S2. “When I have to purchase a home
appliance in the future, it will likely be a
Star product”
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 15-319
Exhibit 15.6 S1 and S2 Highly Correlated:
Factor Analysis Will Be Beneficial
S1 and S2 can be
combined into one
factor.
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 15-320
Exhibit 15.7 Situation Where Factor Analysis
Will Not Be Beneficial: S1 and S2 Poorly
Correlated
S1 and S2 cannot
be combined
into one factor.
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 15-321
Factor Analysis Output and Its
Interpretation
• Primary output of factor analysis is a factor-
loading matrix
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 15-322
Table 15.4 Factor-Loading Matrix Based on Data from
Study of Star Customers
                                                                              Factor Loadings   Achieved
Variable                                                                      F1       F2       Communalities
X4: My friends are very impressed with the Star VCR                           0.96     0.06     .926
X6: No other brand of VCR even comes close to matching the Star VCR           0.92     0.17     .875
X1: I did not mind paying the high price for my Star VCR                      0.89     0.15     .815
X3: I hardly ever worry about anything going wrong with my Star VCR           0.18     0.94     .916
X5: The Star VCR has the latest technology built into it                      0.09     0.88     .782
X2: I am pleased with the variety of things that a Star VCR can do            0.16     0.86     .766
Eigenvalues: standardized variance explained by each factor                   2.626    2.454
Proportion of the total variance explained by each factor                     0.438    0.409
3 variables (X4, X6, X1) load high on factor 1
3 variables (X3, X5, X2) load high on factor 2
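The summary rows of Table 15.4 can be recovered from the loadings themselves. The sketch below assumes the usual relationships: a variable's communality is the sum of its squared loadings across factors, and a factor's eigenvalue is the sum of squared loadings down its column. Because the published loadings are rounded to two decimals, the results agree with the table only approximately.

```python
# (F1 loading, F2 loading) per variable, copied from Table 15.4.
loadings = {
    "X4": (0.96, 0.06),
    "X6": (0.92, 0.17),
    "X1": (0.89, 0.15),
    "X3": (0.18, 0.94),
    "X5": (0.09, 0.88),
    "X2": (0.16, 0.86),
}

# Communality: variance of each variable explained by the two factors together.
communalities = {var: f1 ** 2 + f2 ** 2 for var, (f1, f2) in loadings.items()}

# Eigenvalue: variance explained by each factor across all six variables.
eigenvalues = [sum(row[i] ** 2 for row in loadings.values()) for i in range(2)]

# Each standardized variable contributes 1 unit of variance, so total = 6.
proportions = [e / len(loadings) for e in eigenvalues]

print({var: round(c, 3) for var, c in communalities.items()})
print([round(e, 3) for e in eigenvalues], [round(p, 3) for p in proportions])
```

Together the two factors account for roughly 85 percent of the total variance in the six variables.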
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 15-323
Reducing Star Data
• X1, X4, and X6 can be combined into one
factor
• X2, X3, and X5 can be combined into a second factor
• 6 variables can be reduced to two factors
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 15-324
Potential Applications of Factor
Analysis
• Used to
– Develop concise but comprehensive, multiple-item scales for measuring various marketing constructs
– Illuminate the nature of distinct dimensions underlying an existing data set
– Convert a large volume of data into a set of factor scores on a limited number of uncorrelated factors
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 15-325
Cluster Analysis
• Segment objects into groups so that
members within each group are similar to
one another in a variety of ways
• Useful for segmenting customers, market
areas, and products
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 15-326
Use of Cluster Analysis
• Firm offering recreational services wanted to enter a new region of the country
• They gathered data on more than 100 characteristics including
– Demographics
– Expenditures on recreation
– Leisure time activities
– Interests of household members
• The firm identified one or several household segments that are likely to be most responsive to its advertising and to its services
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 15-327
How Does Cluster Analysis
Work?
• Cluster analysis measures the similarity
between objects on the basis of their values
on the various characteristics
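One way to make "similarity" concrete is a distance measure: the smaller the distance between two objects' characteristic values, the more similar they are. The sketch below uses Euclidean distance, which is only one of several possible measures and is an assumption here; the households and their scores are hypothetical.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two objects' characteristic values."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical households scored on two characteristics (0 to 10 scales).
households = {
    "A": (2.0, 8.0),
    "B": (3.0, 7.0),
    "C": (9.0, 1.0),
}

d_ab = euclidean_distance(households["A"], households["B"])
d_ac = euclidean_distance(households["A"], households["C"])

# A smaller distance means greater similarity, so A and B would tend
# to fall in the same cluster while C would fall in a different one.
print(round(d_ab, 3), round(d_ac, 3))
```

When characteristics are measured in different units, they are usually standardized first so that no single characteristic dominates the distance.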
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 15-328
Exhibit 15.8 Clusters Formed by
Using Data on Two Characteristics
High
High
Low
Low Extent of participation in outdoor sporting events
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 15-329
Multidimensional Scaling
• Uncovers key dimensions underlying
customers' evaluations from a series of
similarity and/or preference judgments
provided by customers about products or
brands within a given set
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 15-330
Multidimensional Scaling on
SUVs
• A customer is asked to compare pairs of
SUVs and rank the pairs from most similar
to least similar
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 15-331
Table 15.5 Similarity Rankings
of Six 2001 SUVs
           LX 470   Lrover   MBenz   Acura   Infiniti   BMW
LX 470              15       14      12      11         13
Lrover                       1       4       7          2
MBenz                                5       8          3
Acura                                        10         6
Infiniti                                                9
Note: Numbers are ranks indicating perceived similarities between pairs of SUVs; the smaller the number, the more
similar the pair of SUVs is.
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 15-332
Exhibit 15.9 Multidimensional Map of 2001
SUVs Based on Similarity Rankings
[Figure: two-dimensional map of the six SUVs; one axis tentatively labeled "Maybe Value" and the other "Maybe Quality"]
What do these dimensions stand for?
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 15-333
Conjoint Analysis
• Technique for deriving the utility values
that customers presumably attach to
different levels of an object's attributes
• Requires respondents to compare
hypothetical products, brands
• The hypothetical stimuli are descriptive profiles
formed by systematically combining varying levels
of certain key attributes
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 15-334
Personal Computer Study
• To assess the role played by attributes in
customer evaluations of personal computers
– Price: 3 levels - $839, $1,039, $1,259
– Processor speed: 2 levels - 800 MHz, 1.1 GHz
– Hard drive capacity: 4 levels - 10 GB, 14 GB, 18 GB, 20 GB
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 15-335
Personal Computer Study
(Cont’d)
• 3 levels of price × 2 levels of processor
speed × 4 levels of hard drive capacity =
24 different descriptive profiles of personal
computers are possible
• Data Collection in Conjoint Analysis
– Two-Factors-at-a-Time Approach
– Full-Profile Approach
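The 24 profiles used in the full-profile approach are simply every combination of the attribute levels. A short sketch that enumerates them; the variable names are illustrative.

```python
from itertools import product

# Attribute levels from the Personal Computer Study.
prices = [839, 1039, 1259]              # dollars
processor_speeds = ["800 MHz", "1.1 GHz"]
hard_drives = [10, 14, 18, 20]          # GB

# Every combination of one level per attribute is one descriptive profile.
profiles = list(product(prices, processor_speeds, hard_drives))

print(len(profiles))   # 3 x 2 x 4 combinations
print(profiles[0])     # first profile: lowest price, 800 MHz, 10 GB
```

In the two-factors-at-a-time approach, respondents instead see one pair of attributes at a time, for example the 3 × 2 = 6 combinations of price and processor speed.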
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 15-336
Personal Computer Study: Two-Factors-
At-a-Time Approach
                   Price
Processing Speed   $839   $1,039   $1,259
800 MHz
1.1 GHz
Note: Customers are asked to rank the six possible combinations
of levels according to their preferences, Most Preferred = 1 and
Least Preferred = 6
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 15-337
Personal Computer Study: Full-
Profile Approach
Sample profile cards, each headed PERSONAL COMPUTER - DESKTOP:
Price   Speed     Hard Drive
$839    800 MHz   10 GB
$839    800 MHz   14 GB
$839    800 MHz   18 GB
$839    800 MHz   20 GB
Note: Customers are asked to rank order their preferences for the
24 different profiles representing all possible combinations of the
three attributes
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 15-338
Exhibit 15.10 Utility Values for Three
Personal-Computer Attributes
[Figure: utility values for the price levels $839, $1,039, $1,259]
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 15-339
Exhibit 15.10 Utility Values for Three
Personal-Computer Attributes (Cont’d)
[Figure: utility values for the processor speed levels 800 MHz, 1.1 GHz]
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 15-340
Exhibit 15.10 Utility Values for Three
Personal-Computer Attributes (Cont’d)
[Figure: utility values for the hard drive capacity levels 10 GB, 14 GB, 18 GB, 20 GB]
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 15-341
Relative Importance of the 3
Attributes
• Range for price = 0.8 - 0.3 = 0.5
– Price is the most critical
• Range for hard drive capacity = 0.8 - 0.4 =
0.4
– Hard drive capacity is the next most critical
• Range for processor speed = 0.9 - 0.6 = 0.3
– Processor speed is the least critical
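The ranking above comes from comparing each attribute's utility range (highest minus lowest utility across its levels). The sketch below repeats that comparison and, as an added assumption not shown on the slide, also expresses each range as a share of the total, a common way of reporting relative importance.

```python
# Utility range per attribute: highest minus lowest utility across its levels.
ranges = {
    "price": 0.8 - 0.3,
    "hard drive capacity": 0.8 - 0.4,
    "processor speed": 0.9 - 0.6,
}

# Attributes ordered from most to least critical (largest range first).
ranked = sorted(ranges, key=ranges.get, reverse=True)

# Assumed normalization: each range as a share of the sum of all ranges.
total_range = sum(ranges.values())
importance = {name: r / total_range for name, r in ranges.items()}

print(ranked)
print({name: round(share, 3) for name, share in importance.items()})
```

Under this normalization price accounts for a bit over 40 percent of the total range, hard drive capacity for about a third, and processor speed for a quarter.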
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 15-342
Potential Attractiveness of Different
Personal Computer Configurations
• PC Configuration A
– 800 MHz, 14 GB, $1,039
– Total utility for the personal computer =
0.6 + 0.7 + 0.4 = 1.7
• PC Configuration B
– 1.1 GHz, 18 GB, $1,259
– Total utility for the personal computer =
0.9 + 0.8 + 0.3 = 2.0
• Personal Computer B is more attractive
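The comparison above simply adds the utility of each attribute level in a configuration. A small sketch of that additive calculation, using the utility values quoted on this slide; the dictionary layout is illustrative.

```python
# Utility of each attribute level in the two configurations,
# in the order used on the slide: processor speed, hard drive, price.
configurations = {
    "A": {"processor speed": 0.6, "hard drive capacity": 0.7, "price": 0.4},
    "B": {"processor speed": 0.9, "hard drive capacity": 0.8, "price": 0.3},
}

# Total utility is the sum of the part-worth utilities of the chosen levels.
total_utility = {name: round(sum(parts.values()), 1)
                 for name, parts in configurations.items()}

# The configuration with the higher total utility is the more attractive one.
most_attractive = max(total_utility, key=total_utility.get)

print(total_utility, most_attractive)
```

Configuration B comes out ahead even though it carries the highest price, because its gains on processor speed and hard drive capacity outweigh the lower utility of that price level.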
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 15-343
Online (Virtual) Conjoint
Analysis Experiments at MIT
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 15-344
Virtual Consumer Initiative: mitsloan.mit.edu
Copyright © by Houghton Mifflin Company, Inc. All rights reserved 15-345
Virtual Consumer Initiative:
Ski Resort