

Copyright © by Houghton Mifflin Company, Inc. All rights reserved First Edition

Chapter 11

Sampling

Foundations


Chapter Objectives

• Define and distinguish between sampling

and census studies

• Discuss when to use a probability versus a

nonprobability sampling method and

implement the different methods

• Explain sampling error and sampling

distribution


Chapter Objectives (Cont’d)

• Construct confidence intervals for

population means and proportions

• List the factors to consider in determining

sample size, and compute the required

sample size to achieve a specific degree of

precision at a desired confidence level


National Poll – Sample Size

• Harris Poll

– A weekly study that monitors the reactions of

the American public to a variety of economic,

political, and social issues

• Sample Size

– Based on a nationally representative telephone

survey of 1,000 adults age 18 or over


AC Nielsen SCANTRACK Index

• Offers valuable scanner-based sales and brand

share data on a regular basis to manufacturers of a

wide variety of consumer products such as food,

drugs, cosmetics

• Sample Size

– Sales and brand share estimates are gathered weekly

from a representative sample of more than 4,800 stores

representing over 800 retailers in 50 major markets


Sampling vs. Census Studies

• A census study draws inferences from the

entire body of units of interest

• A sample study draws inferences from a

sample drawn from the population



Advantages of Sampling

• Low Cost

• Reduced time


Sampling and

Nonsampling Errors

• Sampling error: The difference between a statistic

value that is generated through a sampling

procedure and the parameter value, which can be

determined only through a census study

• Nonsampling error: Any error in a research study

other than sampling error (which arises purely

because a sample, rather than the entire

population, is studied)


Minimizing Sampling Errors

• Increase the sample size

• Use a statistically efficient sampling plan

• Make the sample as representative of the

population as possible


Types of Nonsampling Errors

• Nonsampling Error

– Any error other than sampling error

• Sampling Frame Error

– Sampling frame not being representative of ideal population

• Nonresponse Error

– Final sample not representative of planned sample

• Data Error

– Distortions in collected data and mistakes in data coding, analysis, or interpretation


Potential Causes of Sampling

Frame Errors

• Incomplete sampling frame over-represents

some population segments and

underrepresents others

• Sampling frame contains irrelevant units


Minimizing Sampling Frame

Errors

• Start with a complete sampling frame

• Modify the sampling frame to make it

representative of the ideal population, for example by using

plus-one dialing in telephone surveys



Potential Causes of Nonresponse

Errors

• Mail surveys/Internet Surveys

– Certain types of sample units being more likely

to respond than others

• Telephone and personal interview surveys

– Person not-at-home problem and respondent

refusal problem


Minimizing Nonresponse Errors

• Mail surveys: increase response rates through the use of incentives, follow-up mailings, etc.

– Caution: an increase in response rate per se may not reduce nonresponse error

• Telephone and personal interview surveys: make call-backs and spread out the time blocks during which interviews are conducted


Potential Causes of Data Errors

• Respondents’ reluctance/ inability to give

accurate answers

• Ill-trained interviewers

• Unscrupulous interviewers

• Poorly designed questionnaire

• Mistakes in coding data

• Erroneous analysis

• Incorrect/ inappropriate interpretation of results


Exhibit 11.1 Types and Potential Causes of Nonsampling Errors (Telephone Survey, Online Survey)

• Total population of interest

• Portion of population that has access to the medium (telephone, online)

• Portion of population that has access and volunteers (does not refuse, opts in)

• Portion of population that has access, volunteers, and completes (responds, does not opt out)

Source: Adapted from Thomas W. Miller, “Can We Trust the Data of Online Research,” Marketing Research (Summer 2001), Vol. 13, No. 2, p. 31.


When Census Studies Are

Appropriate

• The feasibility condition

– Whenever a population is relatively small or

can be accessed easily

• The necessity condition

– When the population units are extremely varied

and each population unit is likely to be very

different from all the other units


Probability and Nonprobability

Sampling

• Probability sampling is an objective

procedure in which the probability of

selection is known in advance for each

population unit

• Nonprobability sampling is a subjective

procedure in which the probability of

selection for each population unit is

unknown beforehand



Exhibit 11.3 Classification of Sampling Methods

Sampling methods

• Probability sampling

– Simple random sampling

– Stratified sampling: proportionate stratified random sampling, disproportionate stratified random sampling

– Cluster sampling: simple cluster sampling, systematic sampling

• Nonprobability sampling

– Convenience sampling

– Judgment sampling

– Quota sampling


Probability Sampling Methods

• Simple Random Sampling

• Stratified Random Sampling

• Cluster Sampling


Gallup Poll: USA

• Identify and describe the population that a given

poll is attempting to represent

• Choose or design a method that will enable Gallup

to sample the target population randomly

• Random Digit Dialing (RDD): a procedure that

creates a list of all possible household phone

numbers in America and then selects a sub-set of

numbers from that list for Gallup to call


Simple Random Sampling

• Every possible sample of a certain size

within a population has a known and equal

probability of being chosen as the study

sample
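A minimal sketch of drawing a simple random sample, assuming the sampling frame is available as a numbered list. The frame size (10,000), the sample size (500), and the seed are illustrative assumptions, not values from the slide.

```python
import random

# Hypothetical sampling frame: 10,000 numbered population units.
population = list(range(1, 10_001))

# A fixed seed is used only so that the draw can be repeated.
rng = random.Random(42)

# random.sample draws without replacement, so every possible subset
# of 500 units has the same chance of being the study sample.
sample = rng.sample(population, 500)

print(len(sample), sample[:5])
```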


Stratified Random Sampling

• Two Types of Stratified Random Sampling:

– Proportionate Stratified Random Sampling

– Disproportionate Stratified Random Sampling


Proportionate Stratified Random

Sampling

• Sample consists of units selected from each

population stratum in proportion to the total

number of units in the stratum



Kirkwood University-

Proportionate Stratified Random

Sampling

• Administrators of Kirkwood University wanted to

determine the attitudes of their students toward

various aspects of the university

• They selected a proportionate stratified random

sample of 500 students for conducting the attitude

survey


Table 11.2 Proportionate Allocation of Total Sample of Kirkwood University Students

Population Strata    Number of Population Units    Number of Sample Units Allocated

Freshmen             3,000                         150

Sophomores           3,000                         150

Juniors              2,000                         100

Seniors              2,000                         100

Total                10,000                        500
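The allocation in Table 11.2 can be reproduced with a short sketch: each stratum receives a share of the 500 sample units equal to its share of the 10,000 students. The variable names are illustrative; integer division is exact for these figures, but other stratum sizes would need a rounding rule.

```python
# Stratum sizes from Table 11.2 (Kirkwood University).
strata = {"Freshmen": 3000, "Sophomores": 3000, "Juniors": 2000, "Seniors": 2000}
total_sample = 500

population_size = sum(strata.values())

# Proportionate allocation: sample units in proportion to stratum size.
allocation = {
    name: total_sample * size // population_size
    for name, size in strata.items()
}

for name, n_units in allocation.items():
    print(f"{name}: {n_units}")
```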


Gallup Poll on Sampling: China

• 12,500 counties, cities, and urban districts were

divided into 50 strata based on their geographic

location, degree of economic development, and

proportion of non-agricultural population

• One primary sampling unit (PSU), consisting of

either a county or a city, was selected from each

stratum based on probability proportional to

population size



Gallup Poll on Sampling: China (Cont’d)

• Within each PSU, the populations of all

neighborhoods and villages were compiled.

From this listing, four neighborhoods or

villages were selected proportional to size.

• From each of these four neighborhoods or

villages, five households were selected at

random



Gallup Poll on Sampling: China (Cont’d)

• One respondent was selected from each of

the selected households, ensuring proper

representation in the sample of all age

groups by both genders

• The respondent to be interviewed is then

selected according to a prescribed

systematic procedure


Gallup Poll on Sampling: China (Cont’d)

• If the designated respondent was not at home, or could not be reached, a second or, if needed, a third adult family member was selected systematically from among the household members remaining on the list

• If contact with the designated respondent could not be made after a total of three separate visits to the household, an interview with a respondent in a substitute household in the same locality was permitted

• Two substitute households were kept in reserve for each five assigned households in the interviewing area



Gallup: India

• Design of the sample:

– GALLUP INDIA PVT. LTD. interviewed a

total of 5,122 Indian adults age 18 years and

over (one per household) in late March and

early April 1996

– Nationwide survey involved in-person

interviews in 144 villages and 84 towns and

cities across India


Gallup: India (Cont’d)

• Urban Sample (design = 1,600 interviews)

– Three hundred eighty districts in India (excluding those in Jammu-Kashmir, the northeastern states, and other difficult-to-access areas such as the Andaman and Nicobar Islands) were classified into 20 strata based on their geographical (zonal) location and urban population

– Across these 20 strata, 40 districts were chosen

• In each selected district, two towns were picked on the basis of probability proportional to size

– From the selected towns, 2 colonies were selected randomly, and 10 households were selected from each colony

• From each household, one respondent was chosen, i.e., either male or female above 18 years of age



Gallup: India (Cont’d)

• Rural Sample (design = 1,440 interviews)

– After 40 districts were chosen for the urban sample, the

remaining 340 districts were divided into 12 strata

based on their geographical (zonal) location and rural

population

• On average, two districts were selected from each stratum.

– From each household, one respondent was chosen on

the same criterion of demographics, i.e., either male or

female above 18 years of age



Gallup: India (Cont’d)

• Urban Oversample (design = 2,000 interviews, 400 per metro)

– The urban oversample represented five of the country’s major metropolitan areas: Bombay, Delhi, Calcutta, Madras, and Bangalore

– Within each metropolitan area, an average of 13 electoral wards were chosen on a probability proportional to size basis

– Within each electoral ward, four colonies were randomly selected

• In each colony, eight households were randomly selected

– One respondent was interviewed per household



Gallup: India (Cont’d)

• Results are projectable to within ±3 percent

for India as a whole, ±2 percent for urban

India in general, and ±7 percent for each of

India’s five largest cities

• Urban and rural India were considered as

separate domains for purposes of sampling


Disproportionate Stratified

Random Sampling

• Sample consists of units selected from each

population stratum according to how varied

the units are within the stratum



Exhibit 11.4 Disproportionate Stratified Random Sampling Used by A.C. Nielsen Company

Store Type                                  Percent of Stores    Percent of Stores    Average       Take Ratio
                                            in Universe          in NFL Sample        Store Size

Chain (Includes Convenience Chains)         25.2%                47.9%                $2,445,000    1 out of every 39

Large Independent (Over $500,000)           12.8%                24.9%                $1,700,000    1 out of every 69

Medium Independent ($100,000 - $500,000)    32.6%                17.6%                $234,000      1 out of every 248

Small Independent (Under $100,000)          29.4%                9.6%                 $55,000       1 out of every 360


Cluster Sampling

• Clusters of population units are selected at

random and then all or some units in the

chosen clusters are studied


Systematic Sampling Steps

• An organized procedure, selecting a sample from a list containing all the population units

• Steps:

1) Determine the sampling interval, k:

number of units in the population

k = ------------------------------------------

number of units desired in the sample


Systematic Sampling Steps

(Cont’d)

• Steps (cont’d):

• 2) Choose randomly one unit between the

first and kth units in the population list

• 3) The randomly chosen unit and every kth

unit thereafter are designated as part of the

sample
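The three steps can be sketched as follows. This is a minimal illustration that assumes the population list is held in memory and that the sampling interval is rounded down to a whole number; the function name and the 100-unit population are hypothetical.

```python
import random

def systematic_sample(units, sample_size, rng=None):
    """Return a systematic sample of `sample_size` units from the list `units`."""
    rng = rng or random.Random()
    # Step 1: sampling interval k = population size / desired sample size
    # (rounded down here, which is an assumption of this sketch).
    k = len(units) // sample_size
    # Step 2: choose one position at random among the first k units.
    start = rng.randrange(k)
    # Step 3: take that unit and every kth unit thereafter.
    return units[start::k][:sample_size]

population = list(range(1, 101))  # hypothetical list of 100 numbered units
sample = systematic_sample(population, 10, random.Random(0))
print(sample)
```

With 100 units and a desired sample of 10, the interval is k = 10, so the sample consists of one randomly chosen unit among the first ten and every tenth unit after it.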


Practical Considerations:

Probability Sampling Methods

• Probability sampling techniques are

generally used by large commercial

marketing research firms that maintain

national samples or panels that can be

readily accessed for conducting periodic

research surveys


Nonprobability Sampling

Methods

• Convenience Sampling

• Judgment Sampling

• Quota sampling



Convenience Sampling

• Researcher's convenience forms the basis for selecting a sample of units

– The administrators of a college have announced a sharp increase in tuition fees for the next year.

– A TV reporter covering this news item is shown standing on campus talking to several students, one at a time, about their reactions to the proposed tuition fee increase.

– TV Reporter says: “While some of the students feel that the 10 percent fee hike is justified, most of them consider it to be unfair.”


Judgment Sampling

• A procedure in which a researcher exerts some effort in selecting a sample that he or she believes is most appropriate for a study

• Example:

– The administrators of a college have announced a sharp increase in tuition fees for the next year

– A judgment sample of student officers may be more representative than a convenience sample of students

– The researcher should be knowledgeable about the ideal population for a study


Quota Sampling

• Involves sampling a quota of units to be selected from each population cell based on the judgment of the researchers and/or decision makers

• Steps:

– 1) Divide the population into segments (referred to as

cells) based on certain control characteristics

– 2) Determine the quota of units for each cell (quotas

are determined by the researchers and/or decision

makers)

– 3) Instruct the interviewers to fill the quotas assigned

to the cells


Quota Sampling Plan for the Newspaper Subscriber Survey

Geographic          Gender
Segment        Male      Female

I              30        30

II             30        30

III            30        30

IV             30        30

V              30        30

Total sample size = 300


Quota Sampling Plan for a Survey of Attitudes Toward Social Welfare Programs

                         Highest Education Level
Age         Less than      High School    Some       College
            High School    Diploma        College    Degree

18-30       100            100            100        100

31-45       100            100            100        100

46-60       100            100            100        100

Over 60     100            100            100        100

Total sample size = 1600


Parameter & Statistic

• Parameter

– The actual, or true, population mean value or

population proportion for any variable

• income, product ownership

• Statistic

– An estimate of a parameter from sample data



Sampling Error

• Sampling Error = Parameter Value -

Statistic Value

• Difference between a statistic value that is

generated through a sampling procedure and

the parameter value, which can be

determined only through a census study


Sampling Distribution

• Representation of the sample statistic values

obtained from every conceivable sample of

a certain size chosen from a population by

using a specified sampling procedure along

with the relative frequency of occurrence of

those statistic values




Table 11.4 Expenditures for Eating Out for a Hypothetical Population

Family Number    Annual expenditure for eating out ($)

1                50

2                100

3                150

4                200

5                250

6                300

7                350

8                400

9                450

10               500


Table 11.5 Partial List of Possible Samples and Sample Means

Samples of Two Families          Sample Mean Values ($)

1,2                              75

1,6; 2,5; 3,4                    175

1,10; 2,9; 3,8; 4,7; 5,6         275

5,10; 6,9; 7,8                   375

9,10                             475
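The full sampling distribution behind Tables 11.4 and 11.5 can be enumerated with a short sketch. It lists every possible sample of two families from the ten-family population, computes each sample mean, and tallies the relative frequencies. Variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Table 11.4: family i spends 50 * i dollars per year on eating out.
expenditure = {family: 50 * family for family in range(1, 11)}

# Every possible simple random sample of two families (45 in total).
samples = list(combinations(expenditure, 2))

# Count how often each sample mean value occurs.
means = Counter((expenditure[a] + expenditure[b]) / 2 for a, b in samples)

# Relative frequency of each sample mean: the sampling distribution.
distribution = {m: Fraction(c, len(samples)) for m, c in sorted(means.items())}

population_mean = sum(expenditure.values()) / len(expenditure)
mean_of_means = sum(m * c for m, c in means.items()) / len(samples)

for mean_value, rel_freq in distribution.items():
    print(f"{mean_value:6.1f}  {rel_freq}")
print(population_mean, mean_of_means)
```

The mean of all 45 sample means equals the population mean, which is the property the sampling distribution slides build on.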


Exhibit 11.5 Sampling Distribution (Bar Chart) for Simple Random Samples of Two Units

[Bar chart: sample mean values ($) from 75 to 475 in steps of 25, plotted against frequency of occurrence on an axis running from 0 to 6/45]


Exhibit 11.6 Sampling Distribution Shown as a Histogram

[Histogram: sample mean values (100.0 to 500.0) on the horizontal axis and frequency of occurrence on the vertical axis, with the population mean value marked and a normal probability distribution overlaid]


Central Limit Theorem

Distribution    Mean               Standard Deviation

Population      $\mu$              $\sigma$

Sample          $\bar{x}$          $s$

Sampling        $\mu_{\bar{x}}$    $s_{\bar{x}}$


Confidence Estimation for Interval Data

n = number of units in the sample

$\bar{x}$ = sample mean value

s = standard deviation

$s_{\bar{x}} = s / \sqrt{n}$



Confidence Estimation for Interval Data (Cont’d)

• Given n = 100, $\bar{x}$ = 1,278 units, and s = 399 units

• To construct a 95 percent confidence interval:

$$s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{399}{\sqrt{100}} = 39.9 \text{ units}$$

• The 95 percent confidence interval is

$$\bar{x} \pm 1.96\,s_{\bar{x}} = 1{,}278 \pm (1.96)(39.9) = 1{,}278 \pm 78.204 \approx 1{,}278 \pm 78$$



Confidence Estimation for Interval Data (Cont’d)

• Interpretation

– From the sample data, we can be 95 percent

confident that the average annual sales of men's

suits, across all men's clothing stores in the

population, are between 1,200 and 1,356 units
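The interval above can be checked with a small sketch. The function name is illustrative; the inputs (n = 100, sample mean 1,278, s = 399, z = 1.96) come from the slide.

```python
import math

def mean_confidence_interval(x_bar, s, n, z=1.96):
    """Return (lower, upper, standard_error) for a confidence interval on a mean."""
    standard_error = s / math.sqrt(n)   # s_xbar = s / sqrt(n)
    margin = z * standard_error         # half-width of the interval
    return x_bar - margin, x_bar + margin, standard_error

low, high, se = mean_confidence_interval(x_bar=1278, s=399, n=100)
print(f"standard error = {se:.1f}")
print(f"95% interval   = {low:.0f} to {high:.0f}")
```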


Finding Confidence Intervals for Population Proportions

$\pi$ = true population proportion (i.e., the parameter value)

p = proportion obtained from a single sample (i.e., the statistic value)

$$p = \frac{\text{number of sample units having a certain feature}}{\text{total number of sample units (i.e., } n\text{)}}$$

$s_p$ = estimate of the standard error of the sample proportion

$$s_p = \sqrt{\frac{p(1-p)}{n}}$$

Confidence interval for the population proportion:

$$p - 1.96\,s_p \le \pi \le p + 1.96\,s_p$$




Finding Confidence Intervals for Population Proportions (Cont’d)

Given n = 100 and p = .64. To construct a 95 percent confidence interval for the population proportion:

$$s_p = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{(.64)(.36)}{100}} = .048$$

The 95 percent confidence interval is

$$p \pm 1.96\,s_p = .64 \pm (1.96)(.048) = .64 \pm .09408 \approx .64 \pm .09$$



Finding Confidence Intervals for Population Proportions (Cont’d)

• Interpretation

– This confidence interval can also be expressed

in percentage terms: 64% ± 9%

– In other words, we can be 95 percent confident

that between 55 and 73 percent of all grocery

stores in the city carry potted plants
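The same calculation for a proportion, as a minimal sketch. The function name is illustrative; n = 100, p = .64, and z = 1.96 come from the slide.

```python
import math

def proportion_confidence_interval(p, n, z=1.96):
    """Return (lower, upper, standard_error) for a confidence interval on a proportion."""
    standard_error = math.sqrt(p * (1 - p) / n)   # s_p = sqrt(p(1 - p) / n)
    margin = z * standard_error
    return p - margin, p + margin, standard_error

low, high, s_p = proportion_confidence_interval(p=0.64, n=100)
print(f"s_p = {s_p:.3f}")
print(f"95% interval = {low:.2f} to {high:.2f}")
```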


Factors Influencing Sample Size

• Desired precision level

• Desired confidence level

• Degree of variability

• Resources available


Methods for Determining Sample

Size

• The desired precision level

• The desired confidence level

• An estimate of the degree of variability in

the population, expressed in the form of a

standard deviation


Sample Size Estimation

• H → Desired precision level

• q → Desired confidence level

• $z_q$ → z value corresponding to the confidence level q

• s → Standard deviation

• n → Required sample size

$$H = \frac{z_q s}{\sqrt{n}}$$

$$n = \frac{z_q^2 s^2}{H^2}$$


Sample Size Estimation (Cont’d)

• A marketing manager of a frozen-foods firm

wants to estimate within ±$10 the average

amount that families in a certain city spend on

frozen foods per year and have 99 percent

confidence in the estimate

• He estimates that the standard deviation of annual

family expenditures on frozen foods is about $100

• How many families must be chosen for this

study?



Sample Size Estimation (Cont’d)

H = $10, s = $100, and $z_q$ = 2.575 (corresponding to a confidence level of 99 percent)

$$n = \frac{(2.575)^2 (100)^2}{(10)^2} = 663 \text{ families, approximately}$$
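A minimal sketch of the sample size formula for a mean, using the frozen-foods figures. The function name is illustrative. The function returns the unrounded value; the slide reports it as approximately 663, and a stricter convention would round up to the next whole family.

```python
def sample_size_for_mean(precision, std_dev, z):
    """n = z^2 * s^2 / H^2, returned unrounded."""
    return (z * std_dev / precision) ** 2

n = sample_size_for_mean(precision=10, std_dev=100, z=2.575)
print(f"n = {n:.2f}")
```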


Determining Sample Size

• A sporting goods marketer wants to estimate the proportion of tennis players among high school students in the United States

• The marketer wants the estimate to be accurate within ±.02 and wants to have 95 percent confidence in the interval estimate

• A pilot telephone survey of 50 high school students showed that 20 of them played tennis. Estimate the required sample size for the final study from the given data

• What should the sample size be if the desired precision and confidence levels are to be guaranteed?


Determining Sample Size (Cont’d)

H = .02 and $z_q$ = 1.96; p = 20/50 = 0.4

$$s^2 = p(1-p) = (20/50)(1 - 20/50) = (.4)(.6) = .24$$

$$n = \frac{z_q^2 s^2}{H^2} = \frac{(1.96)^2 (.24)}{(.02)^2} = 2{,}305 \text{ students, approximately}$$

The maximum sample size, reached when p = .5 so that p(1 - p) = .25, is

$$n_{max} = \frac{.25\,z_q^2}{H^2} = 2{,}401 \text{ students}$$
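The two sample sizes for the tennis example can be reproduced with a short sketch. The function name is illustrative; for a proportion the variance term is p(1 - p), which takes its largest value, .25, at p = .5.

```python
def sample_size_for_proportion(precision, p, z):
    """n = z^2 * p(1 - p) / H^2, returned unrounded."""
    return z ** 2 * p * (1 - p) / precision ** 2

# Pilot survey: 20 of 50 students played tennis, so p = 0.4.
n_pilot = sample_size_for_proportion(precision=0.02, p=20 / 50, z=1.96)

# Worst case p = 0.5 guarantees the desired precision and confidence.
n_max = sample_size_for_proportion(precision=0.02, p=0.5, z=1.96)

print(f"from pilot estimate: {n_pilot:.2f}")
print(f"maximum:             {n_max:.2f}")
```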


Chapter 12

Quality

Control and

Initial Analysis

of Data


Chapter Objectives

• Define editing and distinguish between a field edit

and an office edit

• Define coding and outline the steps it involves

• Compute measures of central tendency and

dispersion of the data for each variable in a data

set

• State the potential uses of frequency distribution

or one- way tables


Data Analysis at Rockbridge Associates: Data Integrity

• Data integrity is the foundation for successful marketing research

• Rockbridge ensures integrity in the collection and processing of the data by a number of quality control checks for

– mail surveys

– telephone surveys

– web surveys

• Rockbridge ensures data integrity in how the results are interpreted and explained to management



Editing

• Editing is the process of examining

completed data collection forms and taking

whatever corrective action is needed to

ensure the data are of high quality

– Preliminary or field edit

– Final or office edit


Field Edit

• A field edit, or preliminary edit, is a quick

examination of completed data collection forms,

usually on the same day they are filled out

• Objectives

– Ensure that proper procedures are being followed in

selecting respondents, interviewing them, and recording

their responses

– Fix fieldwork deficiencies before they turn into major

problems


Office Edit

• A final, or office edit, verifies response

consistency and accuracy

– Makes necessary corrections

– Determines whether some or all parts of a data

collection form should be discarded


What Is Wrong With this

Response…

• A respondent said he was 18 years old but

indicated that he had a Ph.D. when asked

for his highest level of education.


Editing Can Help Uncover

• Improper field procedures

• Incomplete interviews

• Improperly conducted interviews

• Technical problems with the questionnaire or interview

• Respondent rapport problems

• Consistency problems that can be isolated and reconciled


Improper Field Procedures

• Wrong questionnaire form used

• Interview inadvertently not taken



Incomplete Interviews

• Questions not asked

• Directions not followed (proper segments of

the questionnaire were not administered)


Improperly Conducted Interviews

• The wrong respondent interviewed (e.g., son instead of father)

• Questions misinterpreted by interviewer or respondent

• Evidence of bias or influencing of answers.

• Failure to probe for adequate answers or the use of poor probes

• Interviewer's illegible writing and/or style.

• Interviewer recorded information which identified a respondent whose anonymity should have been protected


Improperly Conducted Interviews (Cont’d)

• Interviewer apparently does not understand what type of responses constitute an answer to the actual question asked

• Interviewer does not understand what the objective of the question is and thus accepts an improper frame of reference for the respondent's answer

• Other evidence of need for training or instructions to be given to interviewer

– failure to write down probes, wrong abbreviations, failure to follow directions


Technical Problems With the Questionnaire or Interview

• Space was not provided for needed information

• The presence of unanticipated or unusually frequent extreme responses to questions, indicating a possible need for rewording of certain questions

• Inappropriate or unworkable interviewer instructions not detected in the pretest

• The order in which questions were asked introduces confusion, resentment, or bias into the respondent's answers


Respondent Rapport Problems

• Frequent refusal to answer certain questions.

• Reports of abnormal termination of the interview

(or presence of hostility) due to sensitive questions

• Evidence that respondent and interviewer are

playing the "game" of "What answer do you want

me to give?"

• Evidence that the presence of other people in the

interview situation is causing problems


Consistency Problems That Can Be Isolated and Reconciled

• Contradictory answers

– reports no savings in one section of the interview but reports interest from bank accounts in another section

• Misclassification

– mortgage debt improperly reported as installment debt

• Impossible answers

– reports paying $600 for a new Edsel in 1970--the car should have been recorded as a "used" car; or weekly income reported on the income-per-month line



Consistency Problems That Can Be

Isolated and Reconciled (Cont’d)

• Unreasonable (and probably erroneous) responses

– Respondent reports borrowing $2000 for two years to

buy a car but reported monthly payments multiplied by

24 months are less than $2000

– Respondent reports that the house value is $90,000

while income is $2000 per year and the respondent

claims less than a high school education
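Checks like the loan example can be automated during the office edit. The sketch below is only an illustration: the function name and argument names are hypothetical, and a flagged response still needs a human editor to decide how to reconcile it.

```python
def loan_payments_inconsistent(amount_borrowed, monthly_payment, months):
    """Flag a response whose total reported payments fall short of the amount borrowed."""
    return monthly_payment * months < amount_borrowed

# Respondent borrowed $2000 for two years but reports $70 per month:
# 70 * 24 = 1680, which is less than 2000, so the response is flagged.
print(loan_payments_inconsistent(2000, 70, 24))

# $90 per month gives 2160, which covers the loan, so no flag is raised.
print(loan_payments_inconsistent(2000, 90, 24))
```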


Preventing Errors

• Careful planning before fieldwork begins

• Automating data entry


Coding

• Coding broadly refers to the set of all tasks

associated with transforming edited responses into

a form that is ready for analysis

• Steps

– Transforming responses to each question into a set of

meaningful categories

– Assigning numerical codes to the categories

– Creating a data set suitable for computer analysis


Transforming Responses into

Meaningful Categories

• A structured question is pre-categorized

• Responses to a nonstructured or open-ended

question to be grouped into a meaningful

and manageable set of categories


The Best Way to Treat "Don't

Know" Responses

• Infer an actual response –dubious validity

• Classify the "don't know's" as a separate

response category for each question


Missing-Value Category

• A missing value can stem from

– A respondent's refusal to answer a question

– An interviewer's failure to ask a question or

record an answer or a "don't know" that does

not seem legitimate

• Best way to treat missing value responses

– Sound questionnaire design

– Tight control over fieldwork



Assigning Numerical Codes

• Assign appropriate numerical codes to

responses that are not already in quantified

form

• Numerical codes should be assigned in a way

that facilitates computer manipulation and

analysis of the responses


Coding Multiple Response

• Which of the following countries have you visited during the past 12 months?

________Canada

________England

________France

________Germany

________Japan

________Mexico

• Need six variables, each relating to a specific country and having two possible values --for example, 1= “No” and 2 = “Yes”

• Six columns must be set aside in the data spreadsheet to record responses to this question
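A minimal sketch of this coding scheme, using the slide's convention of 1 = "No" and 2 = "Yes". The function name and the sample respondent are illustrative.

```python
COUNTRIES = ["Canada", "England", "France", "Germany", "Japan", "Mexico"]

def code_multiple_response(checked):
    """Return six codes, one per country: 2 if the country was checked, 1 otherwise."""
    return [2 if country in checked else 1 for country in COUNTRIES]

# Hypothetical respondent who checked Canada and Japan.
row = code_multiple_response({"Canada", "Japan"})
print(dict(zip(COUNTRIES, row)))
```

Each respondent contributes one such row, so the question occupies six columns of the data set.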


Multiple Response Question – Rank Order Question

• Please rank the following fast-food restaurants by placing a 1 beside the restaurant you think is best overall, a 2 beside the restaurant you think is second best, and so on.

__________Burger King

__________McDonald's

__________Wendy's

__________Whataburger

• This question requires as many variables (and columns) as there are objects to be ranked

• 4 separate variables are needed


Creating a Data Set

• Organized collection of data records

• Each sample unit within the data set is called a Case or Observation

• Structure of a Data Set

– The number of observations = n

– The total number of variables embedded in the questionnaire is m, then

• Data set = n x m matrix of numbers


Table 12.3 Structure of a Data Sheet

                          Variables
Observation    1       2       …    j       …    m

1              x11     x12     …    x1j     …    x1m

2              x21     x22     …    x2j     …    x2m

…

i              xi1     xi2     …    xij     …    xim

…

n              xn1     xn2     …    xnj     …    xnm

x11 = Respondent 1’s response to variable 1.


Preliminary Data Analysis:

Basic Descriptive Statistics

• Preliminary data analysis examines the

central tendency and the dispersion of the

data on each variable in the data set



Measures of Central Tendency and

Dispersion for Different Types of Variables


Measurement Level of Data

Pertaining to Variable–Nominal

• Measures of Central Tendency

– Mode: Most frequently occurring response

• Measures of Dispersion

– Strictly speaking, the concept of dispersion is

not meaningful for nominal data

– An idea about the distribution of responses can

be obtained by examining their relative

frequencies of occurrence



Measurement Level of Data

Pertaining to Variable – Ordinal

• Measures of Central Tendency

– Median: 50th percentile response

• Measures of Dispersion

– Range: Defined by the highest and lowest

response values

– Interquartile range: Difference between the

75th and 25th percentile responses



Measurement Level of Data

Pertaining to Variable – Interval

• Measures of Central Tendency

– Mean: Arithmetic average of response values

• Measures of Dispersion

– Standard deviation: As defined in Chapter 9



Measurement Level of Data

Pertaining to Variable – Ratio

• Measures of Central Tendency

– Mean: Arithmetic average of response values

• Measures of Dispersion

– Standard deviation: As defined in Chapter 9


Mode

• The value that occurs most frequently



Table 12.5 How Long Have You Been Using the Services of National? – Computing Mode

Length of Service (USE)     Assigned Value    Count/Frequency

Less than 1 year            1                 36

1 to less than 2 years      2                 16

2 to less than 5 years      3                 26

5 years or more             4                 193

Total                                         271

Mode = 4 (the most frequently occurring value)




Computing Mode (Cont’d)

In SPSS: 1. Select ANALYZE;

2. Click DESCRIPTIVE STATISTICS,

3. Select FREQUENCIES,

4. Move the variable “USE” to the Variable(s) box,

5. Click STATISTICS box,

6. Select MODE,

7. Click CONTINUE, and

8. Click OK.






Computing Mode (Cont’d)

1= Less than a year

2 = 1 to less than 2 years

3 = 2 to less than 5 years

4 = 5 years or more

most frequently occurring value

= mode = 4


Median

• The observation below which 50 percent of

the observations fall


Table 12.6 Length of Time Service Used – Responses from 20 Customers

How long have you been using the services of National?

4 3 4 1 4 4 4 4 4 4 3 4 4 3 4 4 4 3 1 1

1 = Less than a year; 2 = 1 to less than 2 years; 3 = 2 to less than 5 years; 4 = 5 years or more

Arranging the 20 values in ascending order:

1 1 1 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4

Because the sample size = 20, there are two middle values: 4 and 4. The median is, therefore, the average of the two middle values = 4.
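Outside SPSS, the same median can be checked with Python's standard `statistics` module. This is only a cross-check of Table 12.6, not part of the original procedure.

```python
import statistics

# The 20 coded responses from Table 12.6.
responses = [4, 3, 4, 1, 4, 4, 4, 4, 4, 4, 3, 4, 4, 3, 4, 4, 4, 3, 1, 1]

# With an even number of values, the median is the average of the two middle ones.
median = statistics.median(responses)

# The mode is the most frequently occurring value.
mode = statistics.mode(responses)

print(sorted(responses))
print(median, mode)
```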



Table 12.7 Computing Median

for Length of Time Service Used

In SPSS:

1. Select ANALYZE;

2. Click DESCRIPTIVE STATISTICS,

3. Select FREQUENCIES,

4. Move the variable “USE” to the Variable(s) box,

5. Click STATISTICS box,

6. Select MEDIAN,

7. Click CONTINUE, and

8. Click OK.




Mean

n = number of units in the sample

$x_i$ = data obtained from each sample unit i

$\bar{x}$ = sample mean value, given by

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$


Table 12.8 Overall Quality of Services Provided

by National– Computing Mean

On a scale of 1 to 10, how would you rate the overall quality of service

provided by National?

Extremely Poor   1 2 3 4 5 6 7 8 9 10   Extremely Good

In SPSS

1. Select ANALYZE

2. Click DESCRIPTIVE STATISTICS

3. Select FREQUENCIES

4. Move the variable “OQ- Labeled as OVERALL SERVICE

QUALITY” to the Variable(s) box

5. Click STATISTICS box

6. Select MEAN, MEDIAN, AND MODE

7. Click CONTINUE

8. Click OK


Table 12.8 Overall Quality of Services Provided by National – Computing Mean (Cont’d)

Since the level of measurement is interval

scale, we can compute the mean, median, and

mode. Since the distribution is skewed to

the left, the mean is pulled toward the smaller

values more than the median is. Therefore the mean

is smaller than the median, and the median is

smaller than the mode.


Measures of Dispersion

• Range

• Variance

• Standard Deviation


Range

• Range is the difference between the largest

and smallest value

• The simplest measure of dispersion


Variance

• Variance of a set of data is a measure of

deviation of the data around the arithmetic

mean

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$$


Standard Deviation

• Standard deviation is the square root of the

variance

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$$
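A minimal sketch of the range, variance, and standard deviation formulas. The eight ratings are made up for illustration and are not the National survey data; the hand-computed values are compared with Python's `statistics` module, which also divides by n - 1.

```python
import math
import statistics

# Hypothetical ratings on a 1-to-10 scale (illustrative only).
ratings = [8, 9, 10, 7, 3, 9, 10, 6]

n = len(ratings)
mean = sum(ratings) / n

# Range: largest value minus smallest value.
value_range = max(ratings) - min(ratings)

# Sample variance: sum of squared deviations from the mean, divided by n - 1.
variance = sum((x - mean) ** 2 for x in ratings) / (n - 1)

# Standard deviation: square root of the variance.
std_dev = math.sqrt(variance)

print(value_range, round(variance, 2), round(std_dev, 2))
print(statistics.variance(ratings), statistics.stdev(ratings))
```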


Table 12.9 Overall Quality of Services Provided by

National: Computing Range, Variance, and Standard

Deviation On a scale of 1 to 10, how would you rate the overall quality of service

provided by National?

Extremely Extremely

Poor Good

1 2 3 4 5 6 7 8 9 10

In SPSS

1. Select ANALYZE

2. Click DESCRIPTIVE STATISTICS

3. Select FREQUENCIES

4. Move the variable “OQ- Labeled as OVERALL SERVICE QUALITY” to the Variable(s) box

5. Click STATISTICS box

6. Select STANDARD DEVIATION, VARIANCE, and RANGE

7. Click CONTINUE

8. Click OK

Table 12.9 Overall Quality of Services Provided by National: Computing Range, Variance, and Standard Deviation (Cont’d)


Standard deviation is square root of variance =

2.33

Variance =5.43

Range = highest value-lowest value = 10-1 = 9
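As a rough sketch, the same three dispersion measures can be computed with Python's standard `statistics` module. The `ratings` list below is hypothetical illustration data, not the National survey responses, so its variance and standard deviation will differ from the 5.43 and 2.33 reported above.

```python
import statistics

# Hypothetical 1-to-10 ratings (illustration only, not the survey data)
ratings = [1, 4, 6, 7, 8, 8, 9, 9, 10, 10]

value_range = max(ratings) - min(ratings)   # largest value minus smallest value
variance = statistics.variance(ratings)     # sample variance, divisor n - 1
std_dev = statistics.stdev(ratings)         # square root of the sample variance

print(value_range, round(variance, 2), round(std_dev, 2))
```

Both `statistics.variance` and `statistics.stdev` use the n - 1 divisor, matching the formulas on the Variance and Standard Deviation slides.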




Frequency Distribution: One-

Way Tabulation

• One-way tabulation is a table showing the

distribution of data pertaining to categories

of a single variable


Table 12.10 Age and Length of

Time Service Used

• In SPSS:

1. Select ANALYZE

2. Click DESCRIPTIVE STATISTICS

3. Select FREQUENCIES

4. Move the variable “AGE” to the Variable(s) box

5. Click CHARTS box

6. Select BAR CHARTS

7. Click on CONTINUE

8. Click OK


Table 12.10 Age and Length of Time

Service Used (Cont’d)


















Why Averages May be

Misleading

• Researchers tested a new sauce product and

found

– Mean rating of the taste test was close to the

middle of the scale, which had "very mild" and

"very hot" as its bipolar adjectives

• Researcher’s conclusion

– Consumers need neither really hot nor really mild sauce


Why Averages May be

Misleading (Cont’d)

• Deeper examination revealed

– The existence of a large proportion of consumers who wanted the sauce to be mild and an equally large proportion who wanted it to be hot

• Moral of the story:

– A clear understanding of the distribution of responses can help a researcher avoid erroneous inferences
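The sauce example can be sketched in a few lines of Python. The ratings below are hypothetical (a 1 = "very mild" to 7 = "very hot" scale is assumed for illustration); they show how a mean at the scale midpoint can hide two opposed groups that a one-way tabulation reveals immediately.

```python
import statistics
from collections import Counter

# Hypothetical ratings: 1 = "very mild", 7 = "very hot" (illustration only)
ratings = [1] * 45 + [4] * 10 + [7] * 45

mean_rating = statistics.mean(ratings)   # lands on the scale midpoint
distribution = Counter(ratings)          # one-way tabulation of the responses

print("mean:", mean_rating)
for value in sorted(distribution):
    print(value, distribution[value])
```

The mean alone suggests a preference for medium sauce, while the tabulation shows that most respondents sit at one of the two extremes.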



Chapter 13

Hypothesis

Testing


Chapter Objectives

• Distinguish between descriptive analysis and

inferential analysis.

• State the null and alternative hypotheses

pertaining to a variety of decision situations

requiring formal hypothesis testing.

• Define Type I and Type II errors and state the

relationship between them.

• Define significance level and power of a

hypothesis test.


Chapter Objectives (Cont’d)

• Lay out the steps involved in conducting a

hypothesis test

• Interpret two-way tabulation and a chi-square

contingency test

• Use the appropriate test pertaining to hypotheses

involving a single mean, a single proportion, two

means (when the two samples are independent

and when they are dependent), and two

proportions


Hypothesis Testing: Key to Actionable Strategies

By Dave Moxley, President…

• We start all research projects with in-depth interviews of the business heads, generating hypotheses or hunches about the topic being researched.

• Involve the business leaders in the early hypothesis generation


Client-Researcher Involvement –

Dave Moxley

• Be sure to ask the questions necessary to collect the data for testing relevant assumptions

• Increase business buy-in to the process as a full project partner, thereby dramatically increasing the likelihood of subsequent market action

• Improve the image of the research function as an

integrated and valued contributor to the strategic

direction and tactical program implementation of

the business


Hypotheses Testing-Dave Moxley

• Oversimplified or incorrect assumptions must be

subjected to more formal hypothesis testing



Interesting Hypotheses – Dave Moxley

• Bankers assumed high-income earners are more profitable than low-income earners

• Clients who carefully balance their checkbooks every month and minimize fees due to overdrafts are unprofitable checking account customers

• Older clients were more likely to diminish CD balances by large amounts compared to younger clients

– This was nonintuitive because conventional wisdom suggested that older clients have a larger portfolio of assets and seek less risky investments


Data Analysis

• Descriptive

– Computing measures of central tendency and

dispersion, as well as constructing one-way tables

• Inferential

– Data analysis aimed at testing specific hypotheses is

usually called inferential analysis


Null and Alternative Hypotheses

H0 -> Null Hypothesis

Ha -> Alternative Hypothesis

• Hypotheses always pertain to population

parameters or characteristics rather than to sample

characteristics. It is the population, not the sample,

that we want to make an inference about from

limited data


Steps in Conducting a Hypothesis Test

• Step 1. Set up H0 and Ha.

• Step 2. Identify the nature of the sampling

distribution curve and specify the appropriate test

statistic.

• Step 3. Determine whether the hypothesis test is

one-tailed or two-tailed.


Steps in Conducting a Hypothesis Test

• Step 4. Taking into account the specified significance level, determine the critical value (two critical values for a two-tailed test) for the test statistic from the appropriate statistical table.

• Step 5. State the decision rule for rejecting H0.

• Step 6. Compute the value for the test statistic from the sample data.

• Step 7. Using the decision rule specified in step 5, either reject H0 or do not reject H0.


Launching a Product Line Into a New

Market Area

• Karen, product manager for a line of apparel, is considering whether to

introduce the product line into a new market area

• Survey of a random sample of 400 households in

that market showed a mean income per household

of $30,000. Karen strongly believes the product

line will be adequately profitable only in markets

where the mean household income is greater than

$29,000. Should Karen introduce the product line

into the new market?



Karen’s Criterion for Decision

Making

• To reach a final decision, Karen has to make a

general inference (about the population) from the

sample data

• Criterion-- mean income across all

households in the market area under consideration

• If the mean population household income is

greater than $29,000, Karen should introduce the

product line into the new market


Karen’s Hypothesis

• Karen’s decision making is equivalent to either

accepting or rejecting the hypothesis:

– The population mean household income in the new

market area is greater than $29,000


One-Tailed Hypothesis Test

• The term one-tailed signifies that all x̄- or z-values that would cause Karen to reject H0 are in just one tail of the sampling distribution

μ -> Population Mean

H0: μ ≤ $29,000

Ha: μ > $29,000


Type I and Type II Errors

• Type I error occurs if the null hypothesis is

rejected when it is true

• Type II error occurs if the null hypothesis is not rejected when it is false


Significance Level

• α -> Significance level -- The upper-bound probability of a Type I error

• 1 - α -> Confidence level -- the complement of significance level


Summary of Errors Involved in Hypothesis Testing

| Inference Based on Sample Data | Real State of Affairs: H0 is True | Real State of Affairs: H0 is False |
| H0 is True | Correct decision; Confidence level = 1 - α | Type II error; P(Type II error) = β |
| H0 is False | Type I error; Significance level = α* | Correct decision; Power = 1 - β |

*Term α represents the maximum probability of committing a Type I error


Level of Risk

• Two firms considering introducing a new product

that radically differs from their current product

line

– Firm ABC

• Well-established customer base, distinct reputation for its

existing product line

– Firm XYZ

• No loyal clientele, no distinct image for its present

products

Which of these two firms should be more cautious

in making a decision to introduce the new

product?

Scenario - Firms ABC & XYZ

• Firm ABC

– ABC should be more cautious

• Firm XYZ

– XYZ should be less cautious


Identifying the Critical Sample Mean Value-- Sampling Distribution

Sample mean (x̄) values greater than $29,000--that is, x̄-values on the right-hand side of the sampling distribution centered on µ = $29,000--suggest that H0 may be false.

More important, the farther to the right x̄ is, the stronger is the evidence against H0


Karen’s Decision Rule for Rejecting

the Null Hypothesis

• Reject H0 if the sample mean exceeds xc


Criterion Value

Every sample mean x̄ has a corresponding equivalent standard normal deviate, z:

$$z = \frac{\bar{x} - \mu}{s_{\bar{x}}}$$

Solving for x̄:

$$\bar{x} = \mu + z s_{\bar{x}}$$

Substituting x̄c for x̄ and zc for z:

$$\bar{x}_c = \mu + z_c s_{\bar{x}}$$

where zc is the standard normal deviate corresponding to the critical sample mean, x̄c.


Computing the Criterion Value

Standard deviation for the sample of 400 households is $8,000. The standard error of the mean (s_x̄) is given by

$$s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{8{,}000}{\sqrt{400}} = 400$$

that is, $400.

The critical mean household income x̄c is obtained through the following two steps:

1. Determine the critical z-value, zc. For α = .05, from Appendix 1, zc = 1.645.

2. Substitute the values of zc, s_x̄, and μ (under the assumption that H0 is "just" true, that is, μ = $29,000): x̄c = μ + zc s_x̄ = $29,658.


Karen’s Decision Rule

• If the sample mean household income is greater

than $29,658, reject the null hypothesis and

introduce the product line into the new market

area.


Test Statistic

The value of the test statistic is simply the z-value corresponding to x̄ = $30,000.

$$z = \frac{\bar{x} - \mu}{s_{\bar{x}}} = \frac{30{,}000 - 29{,}000}{400} = 2.5$$


Critical Value for Rejecting the Null

Hypothesis


P - Value – Actual Significance Level

• The probability of obtaining an x̄-value as high as $30,000 or more when μ is only $29,000 = .0062.

• This value is sometimes called the actual

significance level, or the p-value

• The actual significance level of .0062 in this case

means the odds are less than 62 out of 10,000 that

the sample mean income of $30,000 would have

occurred entirely due to chance (when the

population mean income is $29,000 or less).
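Karen's one-tailed test can be reproduced with a short Python sketch. It uses `statistics.NormalDist` from the standard library in place of the Appendix 1 table; the variable names are illustrative choices, not notation from the text.

```python
from math import sqrt
from statistics import NormalDist

mu_0 = 29_000     # population mean when H0 is "just" true
x_bar = 30_000    # sample mean household income
s = 8_000         # sample standard deviation
n = 400           # sample size
alpha = 0.05      # significance level

std_error = s / sqrt(n)                   # standard error of the mean
z_c = NormalDist().inv_cdf(1 - alpha)     # critical z for the right tail
x_c = mu_0 + z_c * std_error              # critical sample mean
z = (x_bar - mu_0) / std_error            # test statistic
p_value = 1 - NormalDist().cdf(z)         # actual significance level

reject_h0 = z > z_c
print(round(x_c), z, round(p_value, 4), reject_h0)
```

Comparing `z` with `z_c` and comparing `x_bar` with `x_c` are equivalent forms of the same decision rule.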


T-test

Conduct a t-test when the sample is small.

Let the sample size, n = 25

x̄ = $30,000, s = $8,000

From the t-table in Appendix 3, tc = 1.71 for α = .05 and d.f. = 24.

Decision rule: “Reject H0 if t > 1.71.”


T-test (Cont’d)

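The value of t from the sample data:

$$s_{\bar{x}} = \frac{8{,}000}{\sqrt{25}} = 1{,}600$$

$$t = \frac{\bar{x} - \mu}{s_{\bar{x}}} = \frac{30{,}000 - 29{,}000}{1{,}600} = 0.625$$

Since the computed value of t is less than 1.71, H0 cannot be rejected.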

Karen should not introduce the product line into the

new market area.
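The small-sample version of the test can be sketched the same way. Python's standard library has no t-distribution, so the critical value below is simply the table value quoted above (1.71 for α = .05 and 24 d.f.); the variable names are illustrative.

```python
from math import sqrt

mu_0 = 29_000    # population mean when H0 is "just" true
x_bar = 30_000   # sample mean
s = 8_000        # sample standard deviation
n = 25           # small sample
t_c = 1.71       # critical t from the t-table (alpha = .05, d.f. = 24)

std_error = s / sqrt(n)            # standard error of the mean
t = (x_bar - mu_0) / std_error     # test statistic
reject_h0 = t > t_c

print(std_error, t, reject_h0)
```

The same sample mean that was significant with n = 400 is not significant with n = 25, because the standard error is four times larger.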



Two-Tailed Hypothesis Test

• A two-tailed test is one in which values of the test statistic leading to rejection of the null hypothesis fall in both tails of the sampling distribution curve

H0: μ = $29,000

Ha: μ ≠ $29,000


Test of Two Means

• A health service agency has designed a public

service campaign to promote physical fitness and

the importance of regular exercise. Since the

campaign is a major one, the agency wants to

make sure of its potential effectiveness before

running it on a national scale.

– To conduct a controlled test of the campaign’s

effectiveness, the agency needs two similar cities.

– The agency identified two similar cities:

• city 1 will serve as the test city

• city 2 will serve as a control city


Test of Two Means

• Random survey of 300 adults in city 1 and 200 adults

in city 2 was conducted to measure the average

time per day a typical adult in each city spent on

some form of exercise.

– Results of the survey : average was 30 minutes per day

(with a standard deviation of 22 minutes) in city 1 and

35 minutes per day (with a standard deviation of 25

minutes) in city 2.

• Question:

– From these results, can the agency conclude confidently

that the two cities are well matched for the controlled

test?


Basic Statistics and Hypotheses

City 1: n1 = 300 x̄1 = 30 s1 = 22

City 2: n2 = 200 x̄2 = 35 s2 = 25

The hypotheses are

H0: μ1 = μ2 or μ1 - μ2 = 0

Ha: μ1 ≠ μ2 or μ1 - μ2 ≠ 0


Test Statistic

Test statistic is the z-statistic, given by

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$$

n1 and n2 are greater than 30.

The z-statistic can therefore be used as the test statistic.


Decision – Two-Tailed Test

• For Two-Tailed tests

– Identify two critical values of z, one for each tail of the

sampling distribution.

– The probability corresponding to each tail is .025, since α = .05.

– From the Normal Table, the z-value for α/2 = .025 is 1.96.

• Decision rule: “Reject H0 if z < -1.96 or if z > 1.96.”


Computing Z-value – Two-Tailed Test

Computing the value of z from the survey results and under the customary assumption that the null hypothesis is true (i.e., μ1 - μ2 = 0):

$$z = \frac{(30 - 35) - (0)}{\sqrt{(22)^2/300 + (25)^2/200}} = -2.29$$

Since z < -1.96, we should reject H0.
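A short Python sketch of the two-city z-test, assuming the summary statistics given above. `statistics.NormalDist` stands in for the Normal Table, and the variable names are illustrative. Carried to three decimals the statistic is about -2.297, which the slide reports as -2.29.

```python
from math import sqrt
from statistics import NormalDist

n1, x1, s1 = 300, 30, 22   # city 1: sample size, mean, standard deviation
n2, x2, s2 = 200, 35, 25   # city 2: sample size, mean, standard deviation
alpha = 0.05

std_error = sqrt(s1**2 / n1 + s2**2 / n2)    # standard error of the difference
z = ((x1 - x2) - 0) / std_error              # H0 assumes mu1 - mu2 = 0
z_c = NormalDist().inv_cdf(1 - alpha / 2)    # two-tailed critical value

reject_h0 = abs(z) > z_c
print(round(z, 3), round(z_c, 2), reject_h0)
```

Testing `abs(z) > z_c` covers both tails of the decision rule in a single comparison.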


Hypothesis Test Related to Mean Exercising in

Two Cities


T-Test for Independent Samples

Test statistic

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s^* \sqrt{1/n_1 + 1/n_2}}$$

with d.f. = n1 + n2 - 2. In this expression, s* is the pooled standard deviation, given by

$$s^* = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$


T-Test for Independent Samples - Two Cities

n1 = 20 x̄1 = 30 s1 = 22

n2 = 10 x̄2 = 35 s2 = 25

The degrees of freedom for the t-statistic are d.f. = n1 + n2 - 2 = 28

Critical value of t with 28 d.f. for a tail probability of .025 is 2.05.

Decision rule: “Reject H0 if t < -2.05 or if t > 2.05.”

The pooled standard deviation is

$$s^* = \sqrt{529} \text{ (approximately)} = 23$$


The test statistic is

t = -.56

Since t is neither less than -2.05 nor greater than 2.05,

we cannot reject H0

The sample evidence is not strong enough to conclude

that the two cities differ in terms of levels of

exercising activity of their residents.

T- Test for Independent Samples
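The pooled-variance t-test for the two small samples can be sketched as follows. The critical value 2.05 is the table value quoted above, since the standard library provides no t-distribution; the variable names are illustrative.

```python
from math import sqrt

n1, x1, s1 = 20, 30, 22   # city 1: sample size, mean, standard deviation
n2, x2, s2 = 10, 35, 25   # city 2: sample size, mean, standard deviation
t_c = 2.05                # critical t from the t-table (tail probability .025, 28 d.f.)

df = n1 + n2 - 2
pooled_sd = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
t = ((x1 - x2) - 0) / (pooled_sd * sqrt(1 / n1 + 1 / n2))   # H0: mu1 - mu2 = 0

reject_h0 = abs(t) > t_c
print(df, round(pooled_sd, 1), round(t, 2), reject_h0)
```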


National Insurance Company Study –

Perceived Service Quality Differences

Between Males and Females

• Test of Two Means Using the SPSS T-TEST

Program

– On the 10-point scale, males gave a mean rating of

approximately 7.87, while females gave a mean rating

of approximately 7.83.

• In SPSS,

1. Select ANALYZE from the menu,

2. Click COMPARE MEANS

3. Select INDEPENDENT-SAMPLES T-TEST

4. Move “OQ – Overall Service Quality” to the “TEST VARIABLE(S)” box

5. Move “gender” to “GROUPING VARIABLE” box

6. DEFINE GROUPS (SEX = 1 for male and 2 for

female)

7. Click OK.


OQ – Overall Perceived Service Quality

Gender – Sex = 1 for male, Sex = 2 for female

Group Statistics (OQ)

| gender | N | Mean | Std. Deviation | Std. Error Mean |
| male | 137 | 7.87 | 2.26 | .19 |
| female | 126 | 7.83 | 2.31 | .21 |





F-test, to see whether the variances of the 2 groups can be assumed equal: p-value = .210, so the null hypothesis of equal variances cannot be rejected at α = 0.05

P-value > α = 0.05 -- Do not reject; the equal-variance assumption is appropriate

Use this row when the null hypothesis of equality of variance is rejected





P-value = .88 is greater than α = 0.05.

Do not reject H0.

The p-value implies that the odds are 88 in 100 that a difference of magnitude .04 (i.e., 7.87 - 7.83) could have occurred from chance.

The null hypothesis cannot be rejected at the customary

significance level of .05.





Test of Two Means When Samples

Are Dependent

• The need to check for significant differences

between two mean values when the samples are

not independent


• A retail chain ran a special promotion in a

representative sample of 10 of its stores to boost

sales.

• Weekly sales per store before and after the

introduction of the special promotion are shown

• Did the special promotion lead to a significant

increase in sales ?


Sales Per Store Before and After a Promotional Campaign

Sales per Store (In Thousands)

| Store Number (i) | Before Promotion (xbi) | After Promotion (xai) | Change in Sales (In Thousands) xdi = xai - xbi |
| 1 | 250 | 260 | 10 |
| 2 | 235 | 240 | 5 |
| 3 | 150 | 151 | 1 |
| 4 | 145 | 140 | -5 |
| 5 | 120 | 124 | 4 |
| 6 | 98 | 100 | 2 |
| 7 | 75 | 70 | -5 |
| 8 | 85 | 95 | 10 |
| 9 | 180 | 200 | 20 |
| 10 | 212 | 220 | 8 |
| Total | | | 50 |


One-Tailed Hypothesis Test:

H0: μd ≤ 0; Ha: μd > 0.

The sample estimate of μd is x̄d, given by

$$\bar{x}_d = \frac{\sum_{i=1}^{n} x_{di}}{n}$$

where n is the sample size.

x̄d = 50/10 = 5


Test statistic is

$$t = \frac{\bar{x}_d - \mu_d}{s/\sqrt{n}} = \frac{5 - 0}{7.53/\sqrt{10}} = 2.10$$


Standard deviation (s) = 7.53, α = 0.05,

tc for 9 d.f. = 1.83 from Appendix 3

Decision rule: “Reject H0 if t > 1.83.”

Since the test statistic t > 1.83, we reject H0 and conclude that the mean change in sales per store was significantly greater than zero.

The special promotion was indeed effective.
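The dependent-samples test can be reproduced from the store-level data in the table. This is a sketch using the standard `statistics` module; the critical value 1.83 is the table value quoted above, and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Weekly sales per store, in thousands, from the table above
before = [250, 235, 150, 145, 120, 98, 75, 85, 180, 212]
after = [260, 240, 151, 140, 124, 100, 70, 95, 200, 220]
t_c = 1.83   # critical t from the t-table (alpha = .05, 9 d.f.)

diffs = [a - b for a, b in zip(after, before)]   # change in sales per store
n = len(diffs)
mean_diff = mean(diffs)                          # sample estimate of the mean change
sd_diff = stdev(diffs)                           # standard deviation of the changes
t = (mean_diff - 0) / (sd_diff / sqrt(n))        # H0 boundary: mean change = 0

reject_h0 = t > t_c
print(mean_diff, round(sd_diff, 2), round(t, 2), reject_h0)
```

Working with the per-store differences turns the two dependent samples into a single sample, so the test has n - 1 = 9 degrees of freedom.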




Hypothesis Test Related to Change in

Weekly Sales Per Store



Test for a Single Proportion

• Ms. Jones wants to substantially increase the firm's advertising budget. The firm sells a variety of personal computer accessories

• Random sample: 20 of 100 personal computer owners know the brand name

• Criterion: the true awareness rate for the brand name across all personal computer owners is less than .3

• Should Ms. Jones increase the advertising budget

on the basis of survey results?



• Need to test the population proportion (π is the symbol for population proportion) of personal computer owners who are aware of the brand:

H0: π ≥ .3

Ha: π < .3


The test statistic:

$$z = \frac{p - \pi}{\sqrt{\pi(1 - \pi)/n}}$$

where p is the sample proportion.

From the Normal Table, zc = -1.645 for α = .05.

Decision rule here is: “Reject H0 if z < -1.645.”

p = .2, π = .3, and n = 100, z = -2.174

Since -2.174 < -1.645, we reject H0;

The sample awareness rate of .2 is too low to support

the hypothesis that the population awareness rate is .3 or

more.

The actual significance level (p-value) corresponding to

z = -2.174 is approximately .015 (from Appendix 1).

This significance level implies that the odds are lower than 15 in 1,000 that the sample awareness rate of .2 would have occurred entirely by chance (that is, when the population awareness rate is .3 or higher).
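A sketch of the single-proportion test in Python, with `statistics.NormalDist` standing in for the Normal Table and illustrative variable names. Without intermediate rounding the statistic comes out near -2.18; the -2.174 shown above corresponds to rounding the standard error to .046 before dividing.

```python
from math import sqrt
from statistics import NormalDist

p = 0.2        # sample proportion aware of the brand
pi_0 = 0.3     # population proportion when H0 is "just" true
n = 100        # sample size
alpha = 0.05

std_error = sqrt(pi_0 * (1 - pi_0) / n)   # standard error under H0
z = (p - pi_0) / std_error                # test statistic
z_c = NormalDist().inv_cdf(alpha)         # left-tail critical value
p_value = NormalDist().cdf(z)             # actual significance level

reject_h0 = z < z_c
print(round(z, 3), round(z_c, 3), round(p_value, 3), reject_h0)
```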



Hypothesis Test Related to Proportion

of Personal Computer Owners


Test of Two Proportions: Choosing

Between Commercial X & Commercial Y

For a New Product

Tom, advertising manager for a frozen-foods company, is

in the process of deciding between two TV commercials, X

and Y for a new frozen food to be introduced

– Commercial X

• Runs for 20 seconds

• Random sample: 20% awareness out of 200 respondents

– Commercial Y

• Runs for 30 seconds

• Random sample: 25% awareness out of 200 respondents


Test of Two Proportions (Cont’d)

• Question:

– Can Tom conclude that commercial Y will be more

effective in the total market for the new product?


Criterion for Decision Making

• To reach a final decision, Tom has to make a

general inference (about the population) from the

sample data

• Criterion-- relative degrees of awareness likely to

be created by the 2 commercials in the population

of all adult consumers

• Tom should conclude that commercial Y is more

effective than commercial X only if the anticipated

population awareness rate for commercial Y is

greater than that for X.


Hypothesis

• Tom’s decision making is equivalent to either

accepting or rejecting the hypothesis:

– The potential awareness rate that commercial Y can

generate among the population of consumers is greater

than that which commercial X can generate


Null and Alternative Hypotheses

| | Commercial Y | Commercial X |
| Sample sizes | n1 = 200 | n2 = 200 |
| Sample proportions | p1 = .25 | p2 = .20 |

The hypotheses are

H0: π1 ≤ π2 or π1 - π2 ≤ 0

Ha: π1 > π2 or π1 - π2 > 0


Test of Two Proportions-- Sample Standard Error

$$z = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{\sigma_{p_1 - p_2}}$$

σ(p1 - p2) is estimated by the sample standard error formula

$$s_{p_1 - p_2} = \sqrt{PQ\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

$$P = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$$

$$Q = 1 - P$$


Test of Two Proportions

For α = .05, the critical value of z (from Appendix 1) is 1.645.

Decision rule: “Reject H0 if z > 1.645.”

First compute P and Q, then s(p1 - p2) and z:

$$P = \frac{200(.25) + 200(.2)}{200 + 200} = .225$$

$$Q = 1 - .225 = .775$$


$$s_{p_1 - p_2} = \sqrt{(.225)(.775)\left(\frac{1}{200} + \frac{1}{200}\right)} = 0.042$$

$$z = \frac{(.25 - .20) - (0)}{.042} = 1.19$$

Since z < 1.645, we cannot reject H0.

The sample evidence is not strong enough to suggest that

commercial Y will be more effective than commercial X.

Test of Two Proportions
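The two-proportion computation can be sketched as below, following the slides' numbering in which sample 1 is the one with .25 awareness and sample 2 the one with .20. The critical value is the table value quoted above; the variable names are illustrative.

```python
from math import sqrt

n1, p1 = 200, 0.25   # sample 1: size and awareness proportion
n2, p2 = 200, 0.20   # sample 2: size and awareness proportion
z_c = 1.645          # critical z for alpha = .05, one-tailed

P = (n1 * p1 + n2 * p2) / (n1 + n2)           # pooled sample proportion
Q = 1 - P
std_error = sqrt(P * Q * (1 / n1 + 1 / n2))   # sample standard error
z = ((p1 - p2) - 0) / std_error               # H0 boundary: pi1 - pi2 = 0

reject_h0 = z > z_c
print(round(P, 3), round(std_error, 3), round(z, 2), reject_h0)
```

Unrounded, the statistic is just under 1.20; the slide's 1.19 comes from dividing by the rounded standard error .042. Either way it falls short of 1.645.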


Hypothesis Test Related to Awareness

Generated by Two Commercials


Cross-Tabulations: Chi-square

Contingency Test

• Technique used for determining whether there is a

statistically significant relationship between two

categorical (nominal or ordinal) variables


Telecommunications Company

• Marketing manager of a telecommunications

company is reviewing the results of a study of

potential users of a new cell phone

– Random sample of 200 respondents

• A cross-tabulation of data on whether target consumers

would buy the phone (Yes or No) and whether the cell

phone had access to the Internet (Yes or No)

• Question:

– Can the marketing manager infer that an association

exists between Internet access and buying the cell

phone?


Two-Way Tabulation of Internet Access and Whether They Would Buy the Cellular Phone

| Internet Access | Would Buy the Cellular Phone: Yes | Would Buy the Cellular Phone: No | Total |
| Yes | 80 (80%) | 20 (20%) | 100 |
| No | 20 (20%) | 80 (80%) | 100 |
| Total | 100 (100%) | 100 (100%) | 200 |


H0: There is no association between Internet access and

buying the cell phone (the two variables are

independent of each other).

Ha: There is some association between Internet access

and buying the cell phone (the two variables are not

independent of each other).

Cross Tabulations - Hypotheses



Conducting the Test

• Test involves comparing the actual, or observed,

cell frequencies in the cross-tabulation with a

corresponding set of expected cell frequencies (Eij)


Expected Values

$$E_{ij} = \frac{n_i n_j}{n}$$

where ni and nj are the marginal frequencies, that

is, the total number of sample units in category i

of the row variable and category j of the column

variable, respectively


Computing Expected Values

The expected frequency for the first-row, first-

column cell is given by

$$E_{11} = \frac{100 \times 100}{200} = 50$$


Observed and Expected Cell Frequencies

| Internet Access | Would Buy the Cellular Phone: Yes | Would Buy the Cellular Phone: No | Total |
| Yes | 80 (50) | 20 (50) | 100 |
| No | 20 (50) | 80 (50) | 100 |
| Total | 100 | 100 | 200 |

Note: In each cell ij the number without parentheses is the observed cell frequency (Oij) and the number in parentheses is the expected cell frequency (Eij).


Chi-square Test Statistic

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = 72.00$$

where r and c are the number of rows and columns, respectively, in the contingency table. The number of degrees of freedom associated with this chi-square statistic is given by the product (r - 1)(c - 1).


Chi-square Test Statistic in a Contingency Test

For d.f. = 1, assuming α = .05, from Appendix 2, the critical chi-square value (χ²c) = 3.84.

Decision rule is: “Reject H0 if χ² > 3.84.”

Computed χ² = 72.00

Since the computed chi-square value is greater than the critical value of 3.84, reject H0.

The apparent relationship between "Internet access" and "would buy the cellular phone" revealed by the sample data is unlikely to have occurred because of chance


Interpretation

• The actual significance level associated with a chi-

square value of 72 is less than .001 (from

Appendix 2). Thus, the chances of getting a chi-

square value as high as 72 when there is no

relationship between Internet access and purchase

of cell phones are less than 1 in 1,000.
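The contingency test can be sketched in plain Python from the observed table. The p-value line relies on a shortcut that holds only for 1 degree of freedom, where the chi-square tail probability equals `erfc(sqrt(x / 2))`; for larger tables a chi-square table or a statistics package would be needed. The variable names are illustrative.

```python
from math import erfc, sqrt

# Observed frequencies: rows = Internet access (Yes, No), columns = would buy (Yes, No)
observed = [[80, 20],
            [20, 80]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected frequency for cell ij = (row i total * column j total) / n
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi_square = sum(
    (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(len(row_totals))
    for j in range(len(col_totals))
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

# Valid for d.f. = 1 only: P(chi-square > x) = erfc(sqrt(x / 2))
p_value = erfc(sqrt(chi_square / 2))

reject_h0 = chi_square > 3.84   # critical value from the chi-square table
print(expected[0][0], chi_square, df, reject_h0)
```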


Cross-Tabulation Using SPSS for

National Insurance Company

• One crucial issue in the customer survey of

National Insurance Company was how a

customer's education was associated with whether

or not she or he would recommend National to a

friend.


Need to Conduct Chi-square Test to

Reach a Conclusion

• The hypotheses are:

– H0:There is no association between educational level

and willingness to recommend National to a friend (the

two variables are independent of each other).

– Ha:There is some association between educational level

and willingness to recommend National to a friend (the

two variables are not independent of each other).


Association Between Education and Customer’s Willingness to Recommend National to a Friend

For two-way tabulation:

1. Select ANALYZE on the SPSS menu,

2. Click on DESCRIPTIVE STATISTICS,

3. Select CROSS-TABS.

4. Move the “highest level of schooling” to ROW(S) box,

5. Move “rec” variable to COLUMN(S) box.

6. Click on CELLS,

7. Select OBSERVED, and ROW PERCENTAGES.

8. Click CONTINUE and

9. Click OK.


Association Between Education and Customer’s

Willingness to recommend National to a Friend


COUNT

represents the

actual number of

customers in each

cell. The

percentages are

based on the

corresponding








For Chi-Square Assessment:

1. Select ANALYZE

2. Click on DESCRIPTIVE STATISTICS

3. Select CROSS-TABS

4. Move the variable “highest level of schooling” to

ROW(s) box

5. Move “rec” to COLUMN(s) box;

6. Click on “STATISTICS”

7. Select CHI-SQUARE, CONTINGENCY

COEFFICIENT, and CRAMER’S V

8. Click on CELLS,

9. Select OBSERVED and EXPECTED FREQUENCIES

10.Click CONTINUE

11.Click OK.

National Insurance Company Study -

Chi-Square Test


National Insurance Company Study -

Chi-Square Test


Interpret

the Table

National Insurance Company Study--

Expected Frequency Table


Computed Chi-

square value

P-value

National Insurance Company Study


National Insurance Company Study --

P-Value Significance

• The actual significance level (p-value) = 0.019

• the chances of getting a chi-square value as high

as 10.007 when there is no relationship between

education and recommendation are less than 19 in

1000.

• The apparent relationship between education and

recommendation revealed by the sample data is

unlikely to have occurred because of chance.

• Jill and Tom can safely reject the null hypothesis.


Precautions in Interpreting Cross

Tabulation Results

• Two-way tables cannot show conclusive evidence

of a causal relationship

• Watch out for small cell sizes

• The risk of drawing erroneous inferences increases when more than two variables are involved


Two-way Table Based on a Survey of 200 Hospital Patients:

| | Patients who jog | Patients who do not jog |
| Patients with heart disease | 20 | 40 |
| Patients without heart disease | 80 | 60 |
| Total | 100 | 100 |

Is there a causal relationship between jogging and heart disease?


Chapter 14

Examining

Associations:

Correlation

and Regression


Chapter Objectives

• Compute the Spearman correlation coefficient

between ordinal scaled variables and determine

whether or not it is statistically significant

• Compute the Pearson correlation coefficient

between two variables and assess its statistical

significance

• Explain simple regression analysis and state the

distinction between a dependent variable and an

independent variable



• Describe common indicators for checking

the usefulness of a regression equation

• Discuss practical applications of regression

analysis

• Interpret the results of a multiple regression

analysis


Did You Know That Experienced Women

in High-tech Jobs Earn More Than Men?

• General Belief: Men on an average earn more than women in similar occupations

– IEEE-USA: Survey results showed that in the electrotechnology and information-technology fields professional women with 20+ years of experience earned significantly more than men with similar experience

– Regression analysis revealed that gender and experience, along with ethnic background, were significantly related to income levels in the high-tech sector


Did You Know That Parents’ Education

May Have a Bearing on Children’s GPA’s?

• A study of high schools in Alberta, Canada,

showed a statistically significant, positive

association between parents’ education

levels and children’s grades

• Regression analysis revealed that 11

percent of the variation in student’s grades

could be attributed to differences in parents’

education levels


Did You Know That University Students’

Gender and Age May Be Unrelated To Their

Grades In An Introductory Marketing Course?

• The most important predictors of grades in an introductory

marketing course were

– Overall GPA

– Whether the student transferred to the university from a

community college

– Number of hours the student worked per week

• Regression analysis revealed that the predictor variables,

such as gender, age, and participation in extracurricular

activities showed no significant relationship to course

grades


Overview of Techniques for

Examining Associations

• Spearman Correlation Coefficient Technique

• The technique is appropriate when

– The degree of association between two sets of ranks (pertaining to two variables) is to be examined

• Illustrative Research Question(s) This Technique Can Answer:

– Is there a significant relationship between motivation levels of salespeople and the quality of their performance?

• Assume that the data on motivation and quality of performance are in the form of ranks, say, 1 through 20, for 20 salespeople who were evaluated subjectively by their supervisor on each variable



Examining Associations (Cont’d)

• Pearson Correlation Coefficient Technique

• This technique is appropriate when

– The degree of association between two metric-scaled

(interval or ratio) variables is to be examined

• Illustrative Research Question(s) This Technique

Can Answer:

– Is there a significant relationship between customers'

age (measured in actual years) and their perceptions of

our company's image (measured on a scale of 1 to 7)?




• Simple Regression Analysis Technique

• This technique is appropriate when

– A mathematical function or equation linking two metric-scaled (interval or ratio) variables is to be constructed, under the assumption that values of one of the two variables are dependent on the values of the other



Examining Associations–Simple

Regression Analysis (Cont’d)

• Illustrative Research Question(s) this Technique Can Answer:

– Are sales (measured in dollars) significantly affected by advertising expenditures (measured in dollars)?

– What proportion of the variation in sales is accounted for by variation in advertising expenditures? How sensitive are sales to changes in advertising expenditures?





• Multiple Regression Analysis Technique

• This technique is appropriate

– Under the same conditions as simple regression analysis, except that more than two variables are involved, wherein one variable is assumed to be dependent on the others




• Illustrative Research Question(s) this Technique Can Answer:

– Are sales significantly affected by advertising expenditures and price (where all three variables are measured in dollars)?

– What proportion of the variation in sales is accounted for by advertising and price? How sensitive are sales to changes in advertising and price?


Spearman Correlation Coefficient

A Spearman correlation coefficient is a measure of

association between two sets of ranks

di = the difference between the ith sample unit's ranks on the

two variables

n = the total sample size

$$r_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$$


Scenario: Industrial Marketing Firm

• An industrial marketing firm has been hiring all its salespeople from among the graduates of 10 business schools in the vicinity of its headquarters

• The firm developed a subjective ranking of the perceived prestige levels of the 10 schools and the performance levels of the groups of graduates recruited from these schools

• Question:

– What is the degree of association between the prestige levels of the schools and the sales performance levels of their graduates hired by this company?


Table 14.2 Association Between School Prestige and Performance of Graduates

| Business School (i) | Ranking of School's Prestige (SPi) | Ranking of Performance of School's Graduates (GPi) | Difference Between Ranks (di = SPi - GPi) | Squared Difference (di²) |
| 1 | 10 | 8 | 2 | 4 |
| 2 | 7 | 3 | 4 | 16 |
| 3 | 9 | 7 | 2 | 4 |
| 4 | 1 | 2 | -1 | 1 |
| 5 | 6 | 9 | -3 | 9 |
| 6 | 2 | 4 | -2 | 4 |
| 7 | 3 | 5 | -2 | 4 |
| 8 | 8 | 10 | -2 | 4 |
| 9 | 5 | 6 | -1 | 1 |
| 10 | 4 | 1 | 3 | 9 |
| | | | | Σ di² = 56 |


Spearman Correlation Coefficient

$$r_s = 1 - \frac{(6)(56)}{10(100 - 1)} = .661$$

Hypotheses

H0: ρs = 0

Ha: ρs ≠ 0


n – 2

t = rs ---------- = 2.49

1 - rs2

t-Distribution

• For α = .05 and 8 degrees of freedom (d.f. = n - 2 = 10 - 2 = 8), the critical values are tc = +2.31 and -2.31

• Decision Rule:

– “Reject H0 if t ≥ 2.31 or if t ≤ -2.31.”

– Since t = 2.49 > 2.31, we reject H0 and conclude that there is a true association between the prestige of business schools and the job performance of their graduates. In other words, the sample correlation of .661 is unlikely to have occurred because of chance.
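The Spearman calculation and its t-test can be checked with a few lines of code. The following is a minimal sketch in Python, using the ranks from Table 14.2; the function names are illustrative and not part of the original material, and the shortcut formula assumes there are no tied ranks.

```python
import math

# (prestige rank SPi, performance rank GPi) for schools 1..10, from Table 14.2
ranks = [(10, 8), (7, 3), (9, 7), (1, 2), (6, 9),
         (2, 4), (3, 5), (8, 10), (5, 6), (4, 1)]


def spearman_rs(pairs):
    """Spearman coefficient from paired ranks (shortcut formula, no ties)."""
    n = len(pairs)
    sum_d2 = sum((sp - gp) ** 2 for sp, gp in pairs)
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


def t_statistic(r, n):
    """t-value for testing H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


n = len(ranks)
rs = spearman_rs(ranks)
t = t_statistic(rs, n)

T_CRITICAL = 2.31  # two-tailed, alpha = .05, 8 d.f. (value taken from the slide)
reject_h0 = abs(t) >= T_CRITICAL

print(f"rs = {rs:.3f}, t = {t:.2f}, reject H0: {reject_h0}")
```

The sketch reproduces the values on the slides (rs of about .661 and t of about 2.49), so the decision to reject H0 follows directly from comparing t with the critical value.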


Pearson Correlation Coefficient

The Pearson correlation coefficient is the degree of association between variables that are interval- or ratio-scaled.

For two variables X and Y, the Pearson correlation coefficient (rxy) between them is given by

$$ r_{xy} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{(n - 1)\, s_x s_y} $$

n = sample size (total number of data points)

\bar{X} and \bar{Y} = means

X_i and Y_i = values for any sample unit i

s_x and s_y = standard deviations


Bright Detergent Data

Market   Dollar Sales of Bright   Advertising Expenditure      Number of Competing
Area     (in Thousands)           for Bright ($ in 100)        Detergents
1        5                        5                            15
2        10                       13                           8
3        6                        5                            14
4        20                       15                           5
5        15                       10                           9
6        9                        9                            10
7        11                       5                            12
8        18                       13                           4
9        22                       17                           6
10       7                        6                            13
11       24                       19                           2
12       14                       12                           8
13       16                       15                           6
14       17                       14                           7
15       23                       18                           1
16       8                        7                            11
17       12                       10                           10
18       13                       12                           7
19       21                       16                           7
20       19                       16                           3


Scatter Diagram

• Plot in a two-dimensional graph

• Indicates how closely and in what fashion

the variables are associated


Exhibit 14.1 Scatter Diagram of Sales and Advertising Data

[Scatter diagram. Horizontal axis: Advertising Expenditures for Bright ($), 400 to 2,000. Vertical axis: Dollar Sales of Bright (Thousands), 0 to 30.]

What is the relationship between dollar sales and advertising expenditure?

Exhibit 14.2 Scatter Diagram of Sales and Number of Competing Brands

[Scatter diagram. Horizontal axis: Number of Competing Detergents, 0 to 16. Vertical axis: Dollar Sales of Bright (Thousands), 0 to 30.]

What is the relationship between dollar sales and number of competing detergents?


Pearson Correlation

• Correlation between sales and advertising is

.927

• Correlation between sales and number of competing brands is -.909 (a strong negative association)


Two-Tailed Hypothesis Test for Correlations

• H0: ρ = 0

• Ha: ρ ≠ 0

• For α = .05 and 19 degrees of freedom (d.f. = n - 1 = 19), rc = +.433 and rc = -.433

• Decision rule is: “Reject H0 if r ≥ .433 or if r ≤ -.433.”

• Reject H0 in both cases
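The two Pearson correlations can be recomputed directly from the Bright detergent data. The sketch below is illustrative only: the function name is an assumption, and the sales figure for market area 20 is taken as 19, which is the value consistent with the total sum of squares (665.000) reported in the SPSS output.

```python
import math

# Bright detergent data for the 20 market areas
sales = [5, 10, 6, 20, 15, 9, 11, 18, 22, 7,
         24, 14, 16, 17, 23, 8, 12, 13, 21, 19]        # $ thousands
advertising = [5, 13, 5, 15, 10, 9, 5, 13, 17, 6,
               19, 12, 15, 14, 18, 7, 10, 12, 16, 16]  # $ in 100
competitors = [15, 8, 14, 5, 9, 10, 12, 4, 6, 13,
               2, 8, 6, 7, 1, 11, 10, 7, 7, 3]         # competing detergents


def pearson_r(x, y):
    """Pearson correlation: sum of cross-deviations over (n - 1) * sx * sy."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_x = math.sqrt(sum((v - mean_x) ** 2 for v in x) / (n - 1))
    s_y = math.sqrt(sum((v - mean_y) ** 2 for v in y) / (n - 1))
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return cross / ((n - 1) * s_x * s_y)


r_adv = pearson_r(sales, advertising)
r_comp = pearson_r(sales, competitors)

R_CRITICAL = 0.433  # critical value quoted on the slide for alpha = .05
for label, r in [("advertising", r_adv), ("competitors", r_comp)]:
    print(f"sales vs {label}: r = {r:.3f}, reject H0: {abs(r) >= R_CRITICAL}")
```

Both correlations exceed the critical value in absolute terms, which is why H0 is rejected in both cases; note that the association with the number of competitors is negative.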


Exhibit 14.3 Scatter Diagram Showing a Nonlinear Association Between Variables

[Scatter diagram. Horizontal axis: X, 0 to 14. Vertical axis: Y, 20 to 70.]


National Insurance Company– Computing

Pearson Correlation Among Service Quality

Constructs

• National Insurance Company was interested in the

correlations between respondents’ overall service-

quality perceptions (on the 10-point scale) and

their average ratings along each of the five

dimensions of Service Quality




National Insurance Company– Computing Pearson Correlation Among Service Quality Constructs (Cont’d)

1. Click ANALYZE

2. Select CORRELATE

3. Select BIVARIATE

4. Move “oq, reliable, empathy, tangible,

response, and assure” to VARIABLES box

5. Click OK






Interpreting Pearson Correlation

Coefficients

• Each of the five service-quality measures

(reliability, empathy, tangibles, responsiveness, and assurance) is significantly related to the overall quality (OQ) at the .001 level of significance

• Responsiveness has the strongest correlation (.8625)

• Tangibles have the weakest correlation (.5038)

• All the correlations are strong enough to be meaningful


Simple Regression Analysis

• Generates a mathematical relationship

(called the regression equation) between

one variable designated as the dependent

variable (Y) and another designated as the

independent variable (X)


Independent Variable Vs.

Dependent Variable

• Independent variable

– Explanatory or predictor variable

– Often presumed to be a cause of the other

• Dependent variable

– Criterion Variable

– Influenced by the independent variable


Scenario: Curtis Construction

Industry Lobbyist

• Curtis, a construction industry lobbyist, is in an area of the country that has a high unemployment rate and a number of economically depressed construction projects

• His current charge is to convince local government officials to vote in favor of several tax concessions for the construction industry

• He is wondering whether he can generate any concrete evidence to show that increased construction activity (presumably spurred by the proposed tax concessions) would greatly benefit the state


Scenario: Curtis Construction

Industry Lobbyist (Cont’d)

• Possible Dependent Variable

– Number of people unemployed or the unemployment rate

– Data on this variable may be gathered from a sample of areas from around the country

• Possible Independent Variable

– Number of construction permits issued or number of ongoing construction projects

– Data on this variable should be gathered from the same sample



Scenario: Carol, Chief Librarian

• Carol, chief librarian in a major university,

is eager to increase the number of students

borrowing books from the library as well as

the number of books borrowed per student

• She needs some persuasive evidence to

show how increased borrowing of books

might benefit students


Scenario: Carol, Chief Librarian

(Cont’d)

• Possible Dependent Variable

– Cumulative grade point ratio

– Data on this variable should be gathered for a sample of students who have borrowed books in the past

• Possible Independent Variable

– Number of books borrowed

– Assuming that the library has records of the books borrowed by students, data on this variable can be obtained from those records for the same sample of students


Scenario: Jack, Trade Show Officer

• Jack, an officer in an association in charge

of putting together and promoting industrial

trade shows, is wondering about the impact

of the number of exhibitors in a trade show

on trade show attendance


Scenario: Jack, Trade Show

Officer (Cont’d)

• Possible Dependent Variable

– Number of people visiting a trade show

– Data on this variable can be obtained for a representative sample of trade shows from the association’s past records


• Possible Independent Variable

– Number of exhibitors in a trade show

– Necessary data can be obtained from the past records


Deriving a Regression Equation

• Y = a + bX, where a and b are constants

• Y = dependent variable

• X = independent variable




Exhibit 14.4 Several Subjectively Constructed Regression Lines

[Scatter diagram with several candidate lines drawn through the points. Horizontal axis: Advertising Expenditures for Bright ($), 400 to 2,000. Vertical axis: Dollar Sales of Bright (Thousands), 0 to 30.]


Regression Using SPSS--Sales

and Advertising Data

1. Click ANALYZE

2. Select REGRESSION

3. Click LINEAR

4. Move “Dollar Sales for Bright” to DEPENDENT

Box

5. Move “advertising expenditures for Bright” to

INDEPENDENT(S) box

6. Click OK


Exhibit 14.5 SPSS Computer Output for Simple Regression Analysis of Sales and Advertising Data

Model Summary

Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .927(a)   .860       .852                2.28

a. Predictors: (Constant), Advertising Expenditures for Bright ($)




Exhibit 14.5 SPSS Computer Output for Simple Regression Analysis of Sales and Advertising Data (Cont’d)

ANOVA(b)

Model 1      Sum of Squares   df   Mean Square   F         Sig.
Regression   571.646          1    571.646       110.221   .000(a)
Residual     93.354           18   5.186
Total        665.000          19

a. Predictors: (Constant), Advertising Expenditures for Bright ($)

b. Dependent Variable: Dollar Sales of Bright (Thousands)

• F is greater than the critical value

• Since the p-value ≤ 0.05, we can infer that the R2-value of .860 is statistically significant; it is unlikely to have occurred by chance




Exhibit 14.5 SPSS Computer Output for Simple Regression Analysis of Sales and Advertising Data (Cont’d)

Coefficients(a)

                                                  Unstandardized           Standardized
                                                  Coefficients             Coefficients
Model 1                                           B        Std. Error      Beta           t        Sig.
(Constant)                                        .163     1.457                          .112     .912
Advertising Expenditures for Bright ($ in 100)    1.210    .115            .927           10.499   .000

a. Dependent Variable: Dollar Sales of Bright ($ in Thousands)

• For advertising expenditures, t value > 2.10 and p-value ≤ 0.05: reject the null hypothesis; that is, the coefficient is statistically significant

a = .163

b = 1.210

The regression equation is

Yi = .163 + 1.210 Xi


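Standard Error

$$ s_{y/x} = \sqrt{\frac{SSE}{n - k - 1}} $$

where SSE is the error (residual) sum of squares, n is the sample size, and k is the number of independent variables

• The value of the standard error (sy/x) is shown in the computer output as 2.277, which is the square root of the error mean square value of 5.186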
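The regression coefficients, the error sum of squares, and the standard error can be reproduced by ordinary least squares on the Bright data. This is a minimal sketch with illustrative function names; as an assumption, the sales value for market area 20 is taken as 19, the value consistent with the reported total sum of squares of 665.000.

```python
import math

sales = [5, 10, 6, 20, 15, 9, 11, 18, 22, 7,
         24, 14, 16, 17, 23, 8, 12, 13, 21, 19]        # Y, $ thousands
advertising = [5, 13, 5, 15, 10, 9, 5, 13, 17, 6,
               19, 12, 15, 14, 18, 7, 10, 12, 16, 16]  # X, $ in 100


def fit_simple_regression(x, y):
    """Least-squares intercept a and slope b for Y = a + bX."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    b = s_xy / s_xx
    a = mean_y - b * mean_x
    return a, b


a, b = fit_simple_regression(advertising, sales)

n, k = len(sales), 1  # k = number of independent variables
mean_y = sum(sales) / n
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(advertising, sales))
sst = sum((y - mean_y) ** 2 for y in sales)

std_error = math.sqrt(sse / (n - k - 1))  # s_y/x
r_squared = 1 - sse / sst

print(f"a = {a:.3f}, b = {b:.3f}")
print(f"SSE = {sse:.3f}, s_y/x = {std_error:.3f}, R2 = {r_squared:.3f}")
```

The results agree with the SPSS output to rounding: the slope of about 1.21 means that each additional $100 of advertising is associated with roughly $1,210 of additional sales within the observed range.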



Practical Applications of

Regression Equations

• The regression coefficient, or slope, can

indicate how sensitive the dependent

variable is to changes in the independent

variable

• The regression equation is a forecasting tool

for predicting the value of the dependent

variable for a given value of the

independent variable


Precautions In Using Regression

Analysis

• Only capable of capturing linear associations

between dependent and independent variables

• A significant R2-value does not necessarily imply a cause-and-effect association between the independent and dependent variables

• A regression equation may not yield a trustworthy prediction of the dependent variable when the value of the independent variable at which the prediction is desired is outside the range of values used in constructing the equation


Precautions In Using Regression

Analysis (Cont’d)

• A regression equation based on relatively

few data points cannot be trusted

• The ranges of data on the dependent and

independent variables can affect the

meaningfulness of a regression equation


Multiple Regression Analysis

• Yi = a + b1X1i + b2X2i + … + bkXki

• Yi is the predicted value of the dependent variable

for some unit i;

• X1i, X2i, …, Xki are values on the independent

variables for unit i;

• b1, b2, . . . , bk are the regression coefficients;

• a is the Y-intercept representing the prediction for

Y when all independent variables are set to zero


National Insurance Company–

Multiple Regression Using SPSS

• Jill and Tom were interested in conducting a multiple regression analysis wherein overall service quality perceptions is the dependent variable and the average ratings along the five dimensions are the independent variables


National Insurance Company– Multiple

Regression Using SPSS (Cont’d)

1. Click ANALYZE

2. Select REGRESSION

3. Click LINEAR

4. Move “OQ” to DEPENDENT

Box

5. Move “reliable, empathy,

tangible, response, and assure”

to INDEPENDENT(S) box

6. Click OK






The R-square

of .810

indicates a

strong

relationship

between these

variables and

overall

quality.






All variables except empathy are significantly

related to overall service quality

(as indicated by the t-test of significance in the

far right column)


Bright Detergent Case – Multiple

Regression Using SPSS

1. Click ANALYZE

2. Select REGRESSION

3. Click LINEAR

4. Move “Dollar Sales for Bright” to DEPENDENT Box

5. Move “advertising expenditures for Bright and Number of

competing Brands” to INDEPENDENT(S) box

6. Click OK.




Model Summary

Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .934(a)   .873       .858                2.23

a. Predictors: (Constant), Number of Competing Detergents, Advertising Expenditures for Bright ($ in 100)

ANOVA(b)

Model 1      Sum of Squares   df   Mean Square   F        Sig.
Regression   580.373          2    290.187       58.293   .000(a)
Residual     84.627           17   4.978
Total        665.000          19

a. Predictors: (Constant), Number of Competing Detergents, Advertising Expenditures for Bright ($ in 100)

b. Dependent Variable: Dollar Sales of Bright ($ in Thousands)


Coefficients(a)

                                                  Unstandardized           Standardized
                                                  Coefficients             Coefficients
Model 1                                           B        Std. Error      Beta           t         Sig.
(Constant)                                        8.854    6.717                          1.318     .205
Advertising Expenditures for Bright ($ in 100)    .808     .324            .619           2.496     .023
Number of Competing Detergents                    -.498    .376            -.328          -1.324    .203

a. Dependent Variable: Dollar Sales of Bright ($ in Thousands)
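The two-predictor regression coefficients can be obtained by solving the least-squares normal equations for the centered data. The sketch below does this in closed form for exactly two independent variables; the function names are illustrative, and the sales value for market area 20 is taken as 19 (an assumption based on the reported total sum of squares of 665.000).

```python
sales = [5, 10, 6, 20, 15, 9, 11, 18, 22, 7,
         24, 14, 16, 17, 23, 8, 12, 13, 21, 19]        # Y
advertising = [5, 13, 5, 15, 10, 9, 5, 13, 17, 6,
               19, 12, 15, 14, 18, 7, 10, 12, 16, 16]  # X1
competitors = [15, 8, 14, 5, 9, 10, 12, 4, 6, 13,
               2, 8, 6, 7, 1, 11, 10, 7, 7, 3]         # X2


def mean(values):
    return sum(values) / len(values)


def cross_dev(u, v):
    """Sum of cross-products of deviations from the means."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def fit_two_predictor_regression(y, x1, x2):
    """Solve the normal equations for Y = a + b1*X1 + b2*X2 (Cramer's rule)."""
    s11, s22, s12 = cross_dev(x1, x1), cross_dev(x2, x2), cross_dev(x1, x2)
    s1y, s2y = cross_dev(x1, y), cross_dev(x2, y)
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    a = mean(y) - b1 * mean(x1) - b2 * mean(x2)
    return a, b1, b2


a, b1, b2 = fit_two_predictor_regression(sales, advertising, competitors)
print(f"Y = {a:.3f} + {b1:.3f} X1 + ({b2:.3f}) X2")
```

The fitted values match the unstandardized coefficients in the output above. The determinant in the denominator becomes small when the two predictors are highly correlated, which is one way to see why coefficient estimates become unstable in that situation.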





Multicollinearity

• Multicollinearity exists when independent

variables in a multiple regression equation

are highly correlated among themselves


Bright Detergent Case– Multicollinearity

Correlations

Variable                                          Statistic             Sales      Advertising   Competing
Dollar Sales of Bright ($ in Thousands)           Pearson Correlation   1.000      .927**        -.909**
                                                  Sig. (2-tailed)       .          .000          .000
                                                  N                     20         20            20
Advertising Expenditures for Bright ($ in 100)    Pearson Correlation   .927**     1.000         -.937**
                                                  Sig. (2-tailed)       .000       .             .000
                                                  N                     20         20            20
Number of Competing Detergents                    Pearson Correlation   -.909**    -.937**       1.000
                                                  Sig. (2-tailed)       .000       .000          .
                                                  N                     20         20            20

**. Correlation is significant at the 0.01 level (2-tailed).

Columns: Sales = Dollar Sales of Bright ($ in Thousands); Advertising = Advertising Expenditures for Bright ($ in 100); Competing = Number of Competing Detergents

The very high correlation between the independent variables (-.937) indicates the presence of multicollinearity


Chapter 15

Overview of

Other

Multivariate

Techniques


Chapter Objectives

• Distinguish between dependence and interdependence techniques

• Interpret interaction effect in a factorial ANOVA

• Identify two key purposes of discriminant analysis

• Discuss factor analysis and interpret a factor-loading matrix



• Distinguish between cluster analysis and

discriminant analysis

• Describe the potential uses of

multidimensional scaling and point out its

key limitations

• State the purpose of conjoint analysis and

use the results from such an analysis


Dependence and Interdependence

Techniques

• Dependence technique

– One variable is designated as the dependent variable and the rest are treated as independent variables

• Interdependence technique

– There are no dependent and independent variable designations, all variables are treated equally in a search for underlying patterns of relationships



Dependence Technique–

Regression Analysis

• Input Data

– Dependent variable(s) - metric

– Independent variable(s)- metric

• Primary Purpose of the Technique

– Ascertain the relative importance of independent variable(s) in explaining variation in the dependent variable

– Predict dependent-variable values for given values of the independent variable(s)


Overview of Multivariate

Techniques

• Analysis of Variance (ANOVA) Technique

• Usual Form of the Input Data

– Dependent variable: metric

– Independent variable(s): nonmetric

• Primary Purpose of the Technique

– See whether different levels (treatments) of independent variable(s) have significantly different impacts on the dependent variable



Overview of Multivariate Techniques (Cont’d)

• Discriminant Analysis Technique

• Usual Form of the Input Data

– Dependent variable: nonmetric

– Independent variable(s): metric

• Primary Purpose of the Technique

– To identify independent variables that are critical in distinguishing between subsamples defined by the dependent-variable categories; also to aid in classifying new units into one of the subsample categories




• Factor Analysis Technique

• Usual Form of the Input Data

– Metric

• Primary Purpose of the Technique

– To reduce data on a large number of variables into a relatively small set of factors

– To identify key constructs underlying the original set of measured variables




• Cluster Analysis Technique

• Usual Form of the Input Data

– Metric

• Primary Purpose of the Technique

– To identify natural clusters of objects on the basis of similarities of the objects on a variety of characteristics




• Multidimensional Scaling Technique

• Usual Form of the Input Data

– Nonmetric (similarity ranks based on comparison of actual objects)

• Primary Purpose of the Technique

– To identify key dimensions underlying respondent evaluations of products, brands, stores, etc.

– To determine the relative positions of the objects in multidimensional space



Overview of Multivariate Techniques (Cont’d)

• Conjoint Analysis Technique

• Usual Form of the Input Data

– Nonmetric

• Primary Purpose of the Technique

– To derive utility values that respondents implicitly assign to various levels of key attributes used in evaluating objects

– The utility values themselves aid in ascertaining the relative importance of the attributes as well as the potential attractiveness of descriptive profiles defined by different combinations of attributes


Analysis of Variance

• ANOVA is appropriate in situations where

the independent variable is set at certain

specific levels (called treatments in an

ANOVA context) and metric measurements

of the dependent variable are obtained at

each of those levels


Example

• 24 stores chosen randomly for the study

• 8 stores randomly chosen for each treatment

– Treatment 1: Store brand sold at the regular price

– Treatment 2: Store brand sold at 50¢ off the regular price

– Treatment 3: Store brand sold at 75¢ off the regular price

• Monitor sales of the store brand for a week in each store


Table 15.2 Unit Sales Data Under Three Pricing Treatments

Treatment                  Regular Price   50¢ off   75¢ off
Unit sales in each store   37              46        46
                           38              43        49
                           40              43        48
                           40              45        48
                           38              45        47
                           38              43        48
                           40              44        49
                           39              44        49
Number of stores           8               8         8
Mean sales                 38.75           44.13     48.00


EG1(R) X1 O1

EG2(R) X2 O2

EG3(R) X3 O3

EG1 -- Experiment Group 1, X1-- Regular Price

EG2 -- Experiment Group 2, X2-- 50c off

EG3 -- Experiment Group 3, X3-- 75c off

O1 -- Observation (monitoring unit sales data in each store)



After Only Design


ANOVA – Grocery Store Hypothesis

• Grocery Store Example

– H0: μ1 = μ2 = μ3

– Ha: At least one μ is different from one or more of the others

• Hypotheses for k Treatment Groups or Samples

– H0: μ1 = μ2 = … = μk

– Ha: At least one μ is different from one or more of the others


Exhibit 15.1 SPSS Computer Output for ANOVA Analysis

Between-Subjects Factors

Treatment group   Value Label     N
1                 Regular price   8
2                 50 cents off    8
3                 75 cents off    8


Exhibit 15.1 SPSS Computer Output for ANOVA Analysis (Cont’d)

Tests of Between-Subjects Effects

Dependent Variable: SALES

Source            Type III Sum of Squares   df   Mean Square   F           Sig.
Corrected Model   345.250(a)                2    172.625       137.445     .000
Intercept         45675.375                 1    45675.375     36367.123   .000
TREAT             345.250                   2    172.625       137.445     .000
Error             26.375                    21   1.256
Total             46047.000                 24
Corrected Total   371.625                   23

a. R Squared = .929 (Adjusted R Squared = .922)

There is less than a .001 probability of obtaining an F-value as high as 137.445
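The F-value in the output can be traced back to the raw data in Table 15.2. The following is a minimal sketch of a one-way ANOVA computed by hand; the function and variable names are illustrative and not taken from the original material.

```python
# Unit sales per store under each pricing treatment (Table 15.2)
treatments = {
    "regular price": [37, 38, 40, 40, 38, 38, 40, 39],
    "50 cents off": [46, 43, 43, 45, 45, 43, 44, 44],
    "75 cents off": [46, 49, 48, 48, 47, 48, 49, 49],
}


def one_way_anova(samples):
    """Return between/within sums of squares, their d.f., and the F-value."""
    all_values = [v for s in samples for v in s]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(s) / len(s) for s in samples]

    # Variation of treatment means around the grand mean
    ss_between = sum(len(s) * (m - grand_mean) ** 2
                     for s, m in zip(samples, group_means))
    # Variation of individual stores around their own treatment mean
    ss_within = sum((v - m) ** 2
                    for s, m in zip(samples, group_means) for v in s)

    df_between = len(samples) - 1
    df_within = len(all_values) - len(samples)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_value


ss_b, ss_w, df_b, df_w, f_value = one_way_anova(list(treatments.values()))
print(f"SS between = {ss_b:.3f} (df {df_b}), SS within = {ss_w:.3f} (df {df_w})")
print(f"F = {f_value:.3f}")
```

The between-treatment and error sums of squares correspond to the TREAT and Error rows of the SPSS table, and their ratio of mean squares gives the reported F-value.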


Bank Customer Perceptions Study

• Bank customers are grouped by gender (male, female) and, within each gender, by age (< 35 years, 35-64 years, > 64 years)

• Overall perceptions are measured for each of the six groups


Bank Customer Perceptions Study (Cont’d)

Tests of Between-Subjects Effects

Dependent Variable: Overall Quality of the Company’s Services

Source            Type III Sum of Squares   df    Mean Square   F           Sig.
Corrected Model   2156.112(a)               5     431.222       438.891     .000
Intercept         20665.912                 1     20665.912     21033.424   .000
Gender            382.436                   1     382.436       389.237     .000
Age               1311.623                  2     655.811       667.474     .000
Gender * Age      260.433                   2     130.216       132.532     .000
Error             459.823                   468   .983
Total             24341.000                 474
Corrected Total   2615.935                  473

a. R Squared = .824 (Adjusted R Squared = .822)



Descriptive Statistics

Dependent Variable: Overall Quality of the company's services

Gender   Age     Mean   Std. Deviation   N
Male     <35     2.54   1.31             79
         35-64   6.72   1.17             88
         >64     8.08   .82              85
         Total   5.87   2.57             252
Female   <35     6.49   1.39             55
         35-64   6.95   .58              79
         >64     9.36   .48              88
         Total   7.79   1.53             222
Total    <35     4.16   2.36             134
         35-64   6.83   .94              167
         >64     8.73   .93              173
         Total   6.77   2.35             474

• Male and female customers differed in their overall perceptions

• Customers' perceptions differed according to their ages


Estimated Marginal Means of Overall Quality of the company's services

[Line chart. Horizontal axis: Age (<35, 35-64, >64). Vertical axis: Estimated Marginal Means, 2 to 10. Separate lines for Male and Female.]

Sex and age interacted in influencing perceptions


Factorial Anova

• Factorial ANOVA is used to analyze data from a factorial design experiment, that is, an experiment involving more than one independent variable


Exhibit 15.2 Illustrations of Main and Interaction Effects

Grocery Store Experiment

[Two line charts of Unit Sales against Price (Regular Price, 50¢ off, 75¢ off), each with one line for Display Present and one for Display Absent. Panel (a): Main and Interaction Effects Present. Panel (b): Only Main Effects Present.]


Discriminant Analysis

• Identifies the distinguishing features of

prespecified subgroups of units that are formed on

the basis of some dependent variable

• Examples of Subgroups

– Heavy, moderate, and light users of a product

– Homeowners and renters

– Viewers and nonviewers of a television program


Discriminant Analysis (Cont’d)

• Dependent Variable

– Categorical: as many categories as there are subgroups

• Heavy, moderate, and light users: 3 categories

• Independent Variable

– Metric-scaled

• Purpose of discriminant analysis is to classify new

units into one of the subgroups given the new

units’ values of the independent variable


Example

Computer Manufacturer

Household

income

Number of years of

formal education

PC Ownership Not Owning A PC


Exhibit 15.3 Scatter Plot of Income and Education Data for Personal Computer Owners and Nonowners

[Scatter plot with separate markers for Owners and Nonowners. Labeled axis: Income ($).]


Using the Discriminant Function

• Y = v1X1 + v2X2

– Discriminant weights v1 and v2 can be interpreted as signifying the relative importance of X1 and X2 in being able to discriminate between the two groups

• Ynew = v1X1,new + v2X2,new

– The program assigns the new unit either to the owner group or to the non-owner group based on the criterion value
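To make the classification step concrete, the sketch below scores a new household with a two-variable discriminant function and assigns it to a group. Everything numeric here is hypothetical: the weights, the cutoff, and the rule that scores at or above the cutoff indicate ownership are illustrative assumptions, not values from the study.

```python
# Hypothetical discriminant weights and cutoff (NOT from the source study)
V1_INCOME = 0.00004    # weight on household income in dollars
V2_EDUCATION = 0.15    # weight on years of formal education
CUTOFF_SCORE = 4.0     # assumed criterion value separating the two groups


def discriminant_score(income, education_years,
                       v1=V1_INCOME, v2=V2_EDUCATION):
    """Y = v1*X1 + v2*X2 for one household."""
    return v1 * income + v2 * education_years


def classify(income, education_years, cutoff=CUTOFF_SCORE):
    """Assign a new unit to a group by comparing its score with the cutoff."""
    score = discriminant_score(income, education_years)
    return "owner" if score >= cutoff else "non-owner"


# Two hypothetical new households
print(classify(income=60_000, education_years=16))  # higher score
print(classify(income=20_000, education_years=10))  # lower score
```

In practice the weights and the cutoff would come from the discriminant analysis output rather than being chosen by hand.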


Evaluating a Discriminant

Function

• Confusion Matrix

– Indicates the degree of correspondence, or lack

thereof, between the actual groupings of the

sample units and the predicted groupings

obtained by classifying the same units through

the discriminant function


Table 15.3 Confusion Matrix

                                         Predicted Groupings
Actual Groupings                         Households with       Households without
                                         Personal Computers    Personal Computers
Households with Personal Computers       17                    3
Households without Personal Computers    4                     16
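One simple summary of a confusion matrix is the proportion of units that the discriminant function classifies correctly. The sketch below computes it from the counts in Table 15.3; the dictionary layout and the name `hit_rate` are illustrative choices rather than terminology from the slides.

```python
# (actual group, predicted group) -> number of households, from Table 15.3
confusion = {
    ("owner", "owner"): 17,
    ("owner", "non-owner"): 3,
    ("non-owner", "owner"): 4,
    ("non-owner", "non-owner"): 16,
}

# Correct classifications lie on the diagonal (actual == predicted)
correct = sum(count for (actual, predicted), count in confusion.items()
              if actual == predicted)
total = sum(confusion.values())
hit_rate = correct / total

print(f"{correct} of {total} households correctly classified ({hit_rate:.1%})")
```

Here 33 of the 40 households fall on the diagonal, so the function reproduces the actual groupings for 82.5% of the sample.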


Usefulness of Discriminant

Analysis

• Discriminant analysis is very useful for

– Defining customer segments

– Identifying critical characteristics capable of

distinguishing among them

– Classifying prospective customers into

appropriate segments


Factor Analysis

• A data and variable reduction technique that

attempts to partition a given set of variables

into groups of maximally correlated

variables


Intuitive Explanation

• Consider two statements from the Star

Brand Inc.(SBI) survey

• S1. “I have been satisfied with the Star

products I have purchased”

• S2. “When I have to purchase a home

appliance in the future, it will likely be a

Star product”



Exhibit 15.6 S1 and S2 Highly Correlated:

Factor Analysis Will Be Beneficial

S1 and S2 can be

combined into one

factor.


Exhibit 15.7 Situation Where Factor Analysis

Will Not Be Beneficial: S1 and S2 Poorly

Correlated

S1 and S2 cannot

be combined

into one factor.


Factor Analysis Output and Its

Interpretation

• Primary output of factor analysis is a factor-

loading matrix


Table 15.4 Factor-Loading Matrix Based on Data from Study of Star Customers

                                                                   Factor Loadings    Achieved
Variable                                                           F1       F2        Communalities
X4: My friends are very impressed with the Star VCR                0.96     0.06      .926
X6: No other brand of VCR even comes close to matching the Star    0.92     0.17      .875
X1: I did not mind paying the high price for my Star VCR           0.89     0.15      .815
X3: I hardly ever worry about anything going wrong with my
    Star VCR                                                       0.18     0.94      .916
X5: The Star VCR has the latest technology built into it           0.09     0.88      .782
X2: I am pleased with the variety of things that a Star VCR
    can do                                                         0.16     0.86      .766

Eigenvalues: standardized variance explained by each factor        2.626    2.454
Proportion of the total variance explained by each factor          0.438    0.409

• 3 variables (X4, X6, X1) load high on factor 1

• 3 variables (X3, X5, X2) load high on factor 2
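The summary rows of a factor-loading matrix follow from the loadings themselves: a variable's communality is the sum of its squared loadings across factors, and a factor's eigenvalue is the sum of squared loadings down its column. The sketch below applies this to Table 15.4; because the published loadings are rounded to two decimals, the results can differ from the table in the third decimal place.

```python
# Factor loadings (F1, F2) from Table 15.4
loadings = {
    "X4": (0.96, 0.06),
    "X6": (0.92, 0.17),
    "X1": (0.89, 0.15),
    "X3": (0.18, 0.94),
    "X5": (0.09, 0.88),
    "X2": (0.16, 0.86),
}

# Communality: share of a variable's variance captured by the two factors
communalities = {name: f1 ** 2 + f2 ** 2 for name, (f1, f2) in loadings.items()}

# Eigenvalue: variance explained by each factor across all variables
eigenvalues = [sum(row[j] ** 2 for row in loadings.values()) for j in range(2)]

# Each standardized variable contributes one unit of variance
proportions = [e / len(loadings) for e in eigenvalues]

for name, h2 in communalities.items():
    print(f"{name}: communality = {h2:.3f}")
print("eigenvalues:", [round(e, 3) for e in eigenvalues])
print("proportion of variance:", [round(p, 3) for p in proportions])
```

Together the two factors account for roughly 85% of the variance in the six statements, which is what justifies replacing six variables with two factors.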


Reducing Star Data

• X1, X4, and X6 can be combined into one

factor

• X2, X3, and X5 can be combined into a second factor

• 6 variables can be reduced to two factors


Potential Applications of Factor

Analysis

• Used to

– Develop concise but comprehensive, multiple-item scales for measuring various marketing constructs

– Illuminate the nature of distinct dimensions underlying an existing data set

– Convert a large volume of data into a set of factor scores on a limited number of uncorrelated factors



Cluster Analysis

• Segment objects into groups so that

members within each group are similar to

one another in a variety of ways

• Useful for segmenting customers, market

areas, and products


Use of Cluster Analysis

• Firm offering recreational services wanted to enter a new region of the country

• They gathered data on more than 100 characteristics including

– Demographics

– Expenditures on recreation

– Leisure time activities

– Interests of household members

• The firm identified one or several household segments that are likely to be most responsive to its advertising and to its services


How Does Cluster Analysis

Work?

• Cluster analysis measures the similarity

between objects on the basis of their values

on the various characteristics


Exhibit 15.8 Clusters Formed by Using Data on Two Characteristics

[Scatter diagram of clusters on two characteristics, each running from Low to High. Labeled axis: Extent of participation in outdoor sporting events.]


Multidimensional Scaling

• Uncovers key dimensions underlying

customers' evaluations from a series of

similarity and/or preference judgments

provided by customers about products or

brands within a given set


Multi-Dimensional Scaling on

SUV’s

• A customer is asked to compare pairs of

SUVs and rank the pairs from most similar

to least similar



Table 15.5 Similarity Rankings of Six 2001 SUVs

           LX 470   Lrover   MBenz   Acura   Infiniti   BMW
LX 470              15       14      12      11         13
Lrover                       1       4       7          2
MBenz                                5       8          3
Acura                                        10         6
Infiniti                                                9

Note: Numbers are ranks indicating perceived similarities between pairs of SUVs; the smaller the number, the more similar the pair of SUVs is.


Exhibit 15.9 Multidimensional Map of 2001 SUVs Based on Similarity Rankings

[Two-dimensional map of the six SUVs. Horizontal axis: maybe Value. Vertical axis: maybe Quality.]

What do these dimensions stand for?


Conjoint Analysis

• Technique for deriving the utility values

that customers presumably attach to

different levels of an object's attributes

• Requires respondents to compare

hypothetical products, brands

• The hypothetical stimuli are descriptive profiles

formed by systematically combining varying levels

of certain key attributes


Personal Computer Study

• To assess the role played by attributes in customer evaluations of personal computers

– Price: 3 levels - $839, $1,039, $1,259

– Processor speed: 2 levels - 800 MHz, 1.1 GHz

– Hard drive capacity: 4 levels - 10 GB, 14 GB, 18 GB, 20 GB


Personal Computer Study

(Cont’d)

• 3 levels of price × 2 levels of processor speed × 4 levels of hard drive capacity = 24 different descriptive profiles of personal computers are possible

• Data Collection in Conjoint Analysis

– Two-Factors-at-a-Time Approach

– Full-Profile Approach
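The count of 24 descriptive profiles is simply the number of ways to combine one level from each attribute. As a brief illustration, the sketch below enumerates them with Python's standard `itertools.product`; the variable names are illustrative.

```python
from itertools import product

# Attribute levels from the personal computer study
prices = [839, 1039, 1259]              # dollars
processor_speeds = ["800 MHz", "1.1 GHz"]
hard_drives_gb = [10, 14, 18, 20]

# Every combination of one price, one speed, and one hard drive capacity
profiles = list(product(prices, processor_speeds, hard_drives_gb))

print(len(profiles), "profiles")
for price, speed, capacity in profiles[:4]:
    print(f"${price}, {speed}, {capacity} GB")
```

In the full-profile approach respondents evaluate complete combinations like these, whereas the two-factors-at-a-time approach shows only pairs of attributes, such as the 3 × 2 = 6 price and speed combinations.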


Personal Computer Study: Two-Factors-At-a-Time Approach

                     Price
Processing Speed     $839     $1,039     $1,259
800 MHz              ____     ____       ____
1.1 GHz              ____     ____       ____

Note: Customers are asked to rank the six possible combinations of levels according to their preferences; Most Preferred = 1 and Least Preferred = 6


Personal Computer Study: Full-Profile Approach

Four of the profile cards (each headed PERSONAL COMPUTER – DESKTOP):

Card   Price   Speed     Hard Drive
1      $839    800 MHz   10 GB
2      $839    800 MHz   14 GB
3      $839    800 MHz   18 GB
4      $839    800 MHz   20 GB

Note: Customers are asked to rank order their preferences for the 24 different profiles representing all possible combinations of the three attributes


Exhibit 15.10 Utility Values for Three Personal-Computer Attributes

[Three utility charts, one per attribute. Price levels: $839, $1,039, $1,259. Processor speed levels: 800 MHz, 1.1 GHz. Hard drive capacity levels: 10 GB, 14 GB, 18 GB, 20 GB.]


Relative Importance of the 3 Attributes

• Range for price = 0.8 - 0.3 = 0.5

– Price is the most critical

• Range for hard drive capacity = 0.8 - 0.4 = 0.4

– Hard drive capacity is the next most critical

• Range for processor speed = 0.9 - 0.6 = 0.3

– Processor speed is the least critical


Potential Attractiveness of Different

Personal Computer Configurations

• PC Configuration A

– 800 MHz, 14 GB, $1,039

– Total utility for the personal computer =

0.6 + 0.7 + 0.4 = 1.7

• PC Configuration B

– 1.1 GHz, 18 GB, $1,259

– Total utility for the personal computer =

0.9 + 0.8 + 0.3 = 2.0

• Personal Computer B is more attractive
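Both conjoint calculations, attribute importance from utility ranges and total utility of a configuration, are simple sums and differences. The sketch below uses only the utility values quoted on the slides; the dictionary names are illustrative, and no utilities beyond those stated in the text are assumed.

```python
# (highest utility, lowest utility) across the levels of each attribute
utility_ranges = {
    "price": (0.8, 0.3),
    "hard drive capacity": (0.8, 0.4),
    "processor speed": (0.9, 0.6),
}

# Importance of an attribute = range of utilities across its levels
importance = {attr: high - low for attr, (high, low) in utility_ranges.items()}
ranked = sorted(importance, key=importance.get, reverse=True)

# Utilities of the levels that make up each configuration (from the slides)
config_a = {"processor speed": 0.6, "hard drive capacity": 0.7, "price": 0.4}
config_b = {"processor speed": 0.9, "hard drive capacity": 0.8, "price": 0.3}

# Total utility = sum of the utilities of the chosen levels
total_a = sum(config_a.values())
total_b = sum(config_b.values())

print("importance ranking:", ranked)
print(f"configuration A: {total_a:.1f}, configuration B: {total_b:.1f}")
```

Configuration B comes out ahead even though it carries the least attractive price, because its gains on processor speed and hard drive capacity more than offset the lower price utility.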



Online (Virtual) Conjoint

Analysis Experiments at MIT


Virtual Consumer Initiative: mitsloan.mit.edu


Virtual Consumer Initiative:

Ski Resort


