
Measure and Construct Validity Studies

GILBERT A. CHURCHILL, JR.

A critical element in the evolution of a fundamental body of knowledge in marketing, as well as for improved marketing practice, is the development of better measures of the variables with which marketers work. In this article an approach is outlined by which this goal can be achieved, and portions of the approach are illustrated in terms of a job satisfaction measure.

A Paradigm for Developing Better Measures of Marketing Constructs

In an article in the April 1978 issue of the Journal of Marketing, Jacoby placed much of the blame for the poor quality of some of the marketing literature on the measures marketers use to assess their variables of interest (p. 91):

More stupefying than the sheer number of our measures is the ease with which they are proposed and the uncritical manner in which they are accepted. In point of fact, most of our measures are only measures because someone says that they are, not because they have been shown to satisfy standard measurement criteria (validity, reliability, and sensitivity). Stated somewhat differently, most of our measures are no more sophisticated than first asserting that the number of pebbles a person can count in a ten-minute period is a measure of that person's intelligence; next, conducting a study and finding that people who can count many pebbles in ten minutes also tend to eat more; and, finally, concluding from this: people with high intelligence tend to eat more.

Gilbert A. Churchill, Jr., is Professor of Marketing, University of Wisconsin-Madison. The significant contributions of Michael Houston, Shelby Hunt, John Nevin, and Michael Rothschild through their comments on a draft of this article are gratefully acknowledged, as are the many helpful comments of the anonymous reviewers. The AMA publications policy states: "No article(s) will be published in the Journal of Marketing Research written by the Editor or the Vice President of Publications." The inclusion of this article was approved by the Board of Directors because: (1) the article was submitted before the author took over as Editor, (2) the author played no part in its review, and (3) Michael Ray, who supervised the reviewing process for the special issue, formally requested he be allowed to publish it.

Burleigh Gardner, President of Social Research, Inc., makes a similar point with respect to attitude measurement in a recent issue of Marketing News (May 5, 1978, p. 1):

Today the social scientists are enamored of numbers and counting. . . . Rarely do they stop and ask, "What lies behind the numbers?" When we talk about attitudes we are talking about constructs of the mind as they are expressed in response to our questions. But usually all we really know are the questions we ask and the answers we get.

Marketers, indeed, seem to be choking on their measures, as other articles in this issue attest. They seem to spend much effort and time operating under the routine which computer technicians refer to as GIGO: garbage in, garbage out. As Jacoby so succinctly puts it, "What does it mean if a finding is significant or that the ultimate in statistical analysis techniques have been applied, if the data collection instrument generated invalid data at the outset?" (1978, p. 90).

What accounts for this gap between the obvious need for better measures and the lack of such measures? The basic thesis of this article is that although the desire may be there, the know-how is not. The situation in marketing seems to parallel the dilemma which psychologists faced more than 20 years ago when Tryon (1957, p. 229) wrote:

If an investigator should invent a new psychological test and then turn to any recent scholarly work for



guidance on how to determine its reliability, he would confront such an array of different formulations that he would be unsure about how to proceed. After fifty years of psychological testing, the problem of discovering the degree to which an objective measure of behavior reliably differentiates individuals is still confused.

Psychology has made progress since that time; the treatment of reliability has been systematized and now includes more direct assessments of validity as well. What has been lacking in marketing is a framework which the marketer can embrace in developing better measures. This article is an attempt to provide such a framework. The assessment of reliability itself is treated in detail by Peter's article in this issue. Finally, portions of the approach are illustrated in terms of a job satisfaction measure.

THE PROBLEM AND APPROACH

Technically, the process of measurement or operationalization involves rules for assigning numbers to objects to represent quantities of attributes (Nunnally, 1967).

Consider some arbitrary construct, C, such as customer satisfaction, and assume that at a given point in time every customer has a true level of satisfaction, X_T. Hopefully, each measurement secured, X_O, will equal the object's true score, X_T. If the researcher could obtain X_O scores equal to the X_T scores, differences in observed scores would reflect only true differences in satisfaction, that is, X_O = X_T. Rarely is the researcher so fortunate.

Much more typical is the measurement in which the X_O score differences also reflect (Selltiz et al., 1976, pp. 164-8):

1. True differences in other relatively stable characteristics which affect the score, e.g., a person's willingness to express his or her true feelings.
2. Differences due to transient personal factors, e.g., a person's mood or state of fatigue.
3. Differences due to situational factors, e.g., whether the interview is conducted in the home or at a central facility.
4. Differences due to variations in administration, e.g., interviewers who probe differently.
5. Differences due to sampling of items, e.g., the specific items used on the questionnaire; if the items or the wording of those items were changed, the observed scores would also change.
6. Differences due to lack of clarity of measuring instruments, e.g., vague or ambiguous questions which are interpreted differently by those responding.
7. Differences due to mechanical factors, e.g., a check mark in the wrong box or a response which is coded incorrectly.

Not all of these factors will be present in every measurement, nor are they limited to information collected by questionnaire in personal or telephone interviews. They arise also in studies in which self-administered questionnaires or observational techniques are used. Although the impact of each factor on the X_O score varies with the approach, their impact is predictable: they distort the observed scores away from the true scores. Functionally, the relationship can be expressed as:

    X_O = X_T + X_S + X_R,

where:

    X_O = the observed score or measurement,
    X_T = the true score of the characteristic,
    X_S = systematic sources of error such as stable characteristics of the object which affect its score, and
    X_R = random sources of error such as transient personal factors which affect the object's score.

A measure is valid when the differences in observed scores reflect true differences on the characteristic one is attempting to measure and nothing else, that is, X_O = X_T. A measure is reliable to the extent that independent but comparable measures of the same trait or construct of a given object agree. Reliability depends on how much of the variation in scores is attributable to random or chance errors. If a measure is perfectly reliable, X_R = 0. Note that if a measure is valid, it is reliable, but the converse is not necessarily true, because when X_R = 0 the observed score could still equal X_T + X_S. Thus it is often said that reliability is a necessary but not a sufficient condition for validity; reliability only provides negative evidence of the validity of the measure. However, the ease with which it can be computed helps explain


its popularity. Reliability is much more routinely reported than is evidence which is much more difficult to secure but which relates more directly to the validity of the measure.

The fundamental objective in measurement is to produce X_O scores which approximate X_T scores as closely as possible. Unfortunately, the researcher never knows for sure what the X_T scores are. Rather, the measures are always inferences. The quality of these inferences depends directly on the procedures that are used to develop the measures and the evidence supporting their "goodness." This evidence typically takes the form of some reliability or validity index, of which there are a great many, perhaps too many.
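To make the error decomposition concrete, a minimal simulation (hypothetical numbers, not from the article) shows a measure that is reliable yet not valid: two administrations agree almost perfectly because random error is small, but a constant systematic error biases every observed score away from the true score.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 1000

    x_t = rng.normal(50, 10, n)   # true scores on the construct
    x_s = 5.0                     # systematic error, e.g., a stable response bias
    x_r = rng.normal(0, 1, n)     # random error, e.g., transient personal factors

    x_o = x_t + x_s + x_r         # observed scores: X_O = X_T + X_S + X_R
    x_o2 = x_t + x_s + rng.normal(0, 1, n)   # a second, independent administration

    print(np.corrcoef(x_o, x_o2)[0, 1])   # near 1: the measure is reliable
    print((x_o - x_t).mean())             # near 5: observed scores are biased, not valid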

The analyst working to develop a measure must contend with such notions as split-half, test-retest, and alternate forms reliability, as well as with face, content, predictive, concurrent, pragmatic, construct, convergent, and discriminant validity. Because some of these terms are used interchangeably and others are often used loosely, the analyst wishing to develop a measure of some variable of interest in marketing faces difficult decisions about how to proceed and what reliability and validity indices to calculate.

Figure 1 is a diagram of the sequence of steps that can be followed, and a list of some calculations that should be performed, in developing measures of marketing constructs.

Figure 1
SUGGESTED PROCEDURE FOR DEVELOPING BETTER MEASURES

Step (recommended coefficients or techniques):

1. Specify domain of construct (literature search)
2. Generate sample of items (literature search, experience survey, insight-stimulating examples, critical incidents, focus groups)
3. Collect data
4. Purify measure (coefficient alpha, factor analysis)
5. Collect data
6. Assess reliability (coefficient alpha, split-half reliability)
7. Assess validity (multitrait-multimethod matrix, criterion validity)
8. Develop norms (averages and other statistics summarizing the distribution of scores)

The suggested sequence has worked well in several instances in producing measures with desirable psychometric properties (see Churchill et al., 1974, for one example). Some readers will undoubtedly disagree with the suggested process or with the omission of their favorite reliability or validity coefficient. The following discussion, which develops both the steps and their rationale, shows that some of these measures should indeed be set aside because there are better alternatives or, if they are used, that they should at least be interpreted with the proper awareness of their shortcomings.

The process suggested is only applicable to multi-item measures. This deficiency is not as serious as it might appear. Multi-item measures have much to recommend them. First, individual items usually have considerable uniqueness or specificity in that each item tends to have only a low correlation with the attribute being measured and tends to relate to other attributes as well. Second, single items tend to

categorize people into a relatively small number of groups. For example, a seven-step rating scale can at most distinguish between seven levels of an attribute. Third, individual items typically have considerable measurement error; they produce unreliable responses in the sense that the same scale position is unlikely to be checked in successive administrations of an instrument.

All three of these measurement difficulties can be diminished with multi-item measures: (1) the specificity of items can be averaged out when they are combined, (2) by combining items, one can make relatively fine distinctions among people, and (3) the

reliability tends to increase and measurement error decreases as the number of items in a combination increases, as the sketch following the quotation below illustrates.

The folly of using single-item measures is illustrated by a question posed by Jacoby (1978, p. 93):

How comfortable would we feel having our intelligence assessed on the basis of our response to a single question? Yet that's exactly what we do in consumer research. . . . The literature reveals hundreds of instances in which responses to a single question suffice to establish the person's level on the variable of interest and then serve as the basis for extensive analyses and entire articles. . . . Given the complexity of our subject matter, what makes us think we can use responses to single items (or even to two or three items) as measures of these concepts, then relate these scores to a host of other variables, arrive at conclusions based on such an investigation, and get away calling what we have done quality research?
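The third advantage above, reliability rising with the number of items, can be made concrete with the generalized Spearman-Brown formula, which under the domain sampling assumptions gives the reliability of a k-item composite from the average interitem correlation; the r̄ = .30 below is an assumed illustrative value, not a figure from the article.

    def composite_reliability(k: int, r_bar: float) -> float:
        """Reliability of a k-item composite given the average interitem correlation."""
        return k * r_bar / (1 + (k - 1) * r_bar)

    for k in (1, 3, 5, 10, 20):
        print(k, round(composite_reliability(k, 0.30), 3))
    # 1 -> 0.3, 3 -> 0.563, 5 -> 0.682, 10 -> 0.811, 20 -> 0.896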

In sum, marketers are much better served by multi-item than single-item measures of their constructs, and they should take the time to develop them. This conclusion is particularly true for those investigating behavioral relationships from a fundamental


as well as an applied perspective, although it applies also to marketing practitioners.

SPECIFY DOMAIN OF THE CONSTRUCT

The first step in the suggested procedure for developing better measures involves specifying the domain of the construct. The researcher must be exacting in delineating what is included in the definition and what is excluded. Consider, for example, the construct customer satisfaction, which lies at the heart of the marketing concept. Though it is a central notion in modern marketing thought, it is also a construct which marketers have not measured in exacting fashion. Howard and Sheth (1969, p. 145), for example, define customer satisfaction as

. . . the buyer's cognitive state of being adequately or inadequately rewarded in a buying situation for the sacrifice he has undergone. The adequacy is a consequence of matching actual past purchase and consumption experience with the reward that was expected from the brand in terms of its anticipated potential to satisfy the motives served by the particular product class. It includes not only reward from consumption of the brand but any other reward received in the purchasing and consuming process.

Thus, satisfaction by their definition seems to be an attitude. Further, in order to measure satisfaction, it seems necessary to measure both expectations at the time of purchase and reactions at some time after purchase. If actual consequences equal or exceed expected consequences, the consumer is satisfied, but if actual consequences fall short of expected consequences, the consumer is dissatisfied.

But what expectations and consequences should the marketer attempt to assess? Certainly one would want to be reasonably exhaustive in the list of product features to be included, incorporating such facets as cost, durability, quality, operating performance, and aesthetic features (Czepiel et al., 1974). But what about purchasers' reactions to the sales assistance they received or subsequent service by independent dealers, as would be needed, for example, after the purchase of many small appliances? What about customer reaction to subsequent advertising or the expansion of the channels of distribution in which the product is available? What about the subsequent availability of competitors' alternatives which serve the same

needs, or the publishing of information about the environmental effects of using the product? To detail which of these factors would be included or how customer satisfaction should be operationalized is beyond the scope of this article; rather, the example emphasizes that the researcher must be exacting in the conceptual specification of the construct and what is and what is not included in the domain.

It is imperative, though, that researchers consult the literature when conceptualizing constructs and specifying domains. Perhaps if only a few more had

done so, one of the main problems cited by Kollat, Engel, and Blackwell as impairing progress in consumer research, namely, the use of widely varying definitions, could have been at least diminished (Kollat et al., 1970, pp. 328-9).

Certainly definitions of constructs are means rather than ends in themselves. Yet the use of different definitions makes it difficult to compare and accumulate findings and thereby develop syntheses of what is known. Researchers should have good reasons for proposing additional new measures given the many available for most marketing constructs of interest, and those publishing should be required to supply their rationale. Perhaps the older measures are inadequate. The researcher should make sure this is the case by conducting a thorough review of literature in which the variable is used and should present a detailed statement of the reasons and evidence as to why the new measure is better.

GENERATE SAMPLE OF ITEMS

The second step in the procedure for developing better measures is to generate items which capture the domain as specified. Those techniques that are typically productive in exploratory research, including literature searches, experience surveys, and insight-stimulating examples, are generally productive here (Selltiz et al., 1976). The literature should indicate how the variable has been defined previously and how many dimensions or components it has. The search for ways to measure customer satisfaction would include product brochures, articles in trade magazines and newspapers, or results of product tests such as those published by Consumer Reports. The experience survey is not a probability sample but a judgment sample of persons who can offer some ideas and insights into the phenomenon. In measuring consumer satisfaction, it could include discussions with (1) appropriate people in the product group responsible for the product, (2) sales representatives, (3) dealers, (4) consumers, and (5) persons in marketing research or advertising, as well as (6) outsiders who have a special expertise such as university or government personnel. The insight-stimulating examples could involve a comparison of competitors' products or a detailed examination of some particularly vehement complaints in unsolicited letters about performance of the product. Examples which indicate sharp contrasts or have striking features would be most productive.

Critical incidents and focus groups also can be used to advantage at the item-generation stage. To use the critical incidents technique, a large number of scenarios describing specific situations could be made up and a sample of experienced consumers would be asked what specific behaviors (e.g., product changes, warranty handling) would create customer satisfaction or dissatisfaction (Flanagan, 1954; Kerlinger, 1973, p.


536). The scenarios might be presented to the respondents individually, or 8 to 10 of them might be brought together in a focus group where the scenarios could be used to trigger open discussion among participants, although other devices might also be employed to promote discourse (Calder, 1977).

The emphasis at the early stages of item generation would be to develop a set of items which tap each of the dimensions of the construct at issue. Further, the researcher probably would want to include items with slightly different shades of meaning because the original list will be refined to produce the final measure. Experienced researchers can attest that seemingly identical statements produce widely different answers. By incorporating slightly different nuances of meaning in statements in the item pool, the researcher provides a better foundation for the eventual measure.

Near the end of the statement development stage the focus would shift to item editing. Each statement would be reviewed so that its wording would be as precise as possible. Double-barreled statements would be split into two single-idea statements, and if that proved impossible the statement would be eliminated altogether. Some of the statements would be recast to be positively stated and others to be negatively stated to reduce "yea-" or "nay-saying" tendencies. The analyst's attention would also be directed at refining those questions which contain an obvious socially acceptable response.

After the item pool is carefully edited, further refinement would await actual data. The type of data collected would depend on the type of scale used to measure the construct.
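One scoring consequence of mixing positively and negatively stated items, worth noting before data are collected: the negatively stated items must be reverse-coded before item scores are combined. A minimal sketch, assuming 1-to-7 ratings; the item positions marked as negatively stated are hypothetical.

    import numpy as np

    responses = np.array([[7, 1, 6],
                          [5, 3, 5]], dtype=float)   # hypothetical 1-7 ratings
    negatively_stated = [1]                           # hypothetical: second item is reversed

    # On a 1-to-7 scale a rating x becomes (1 + 7) - x when reverse-coded
    responses[:, negatively_stated] = 8 - responses[:, negatively_stated]
    totals = responses.sum(axis=1)                    # now all items point the same way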

PURIFY THE MEASURE

The calculations one performs in purifying a measure depend somewhat on the measurement model one embraces. The most logically defensible model is the domain sampling model, which holds that the purpose of any particular measurement is to estimate the score that would be obtained if all the items in the domain were used (Nunnally, 1967, pp. 175-81). The score that any subject would obtain over the whole sample domain is the person's true score, X_T.

In practice, though, one does not use all of the items that could be used, but only a sample of them. To the extent that the sample of items correlates with true scores, it is good. According to the domain sampling model, then, a primary source of measurement error is the inadequate sampling of the domain of relevant items.

Basic to the domain sampling model is the concept of an infinitely large correlation matrix showing all correlations among the items in the domain. No single item is likely to provide a perfect representation of the concept, just as no single word can be used to test for differences in subjects' spelling abilities and no single question can measure a person's intelligence.

Rather, each item can be expected to have a certain amount of distinctiveness or specificity even though it relates to the concept.

The average correlation in this infinitely large matrix, r̄, indicates the extent to which some common core is present in the items. The dispersion of correlations about the average indicates the extent to which items vary in sharing the common core. The key assumption in the domain sampling model is that all items, if they belong to the domain of the concept, have an equal amount of common core. This statement implies that the average correlation in each column of the hypothetical matrix is the same and in turn equals the average correlation in the whole matrix (Ley, 1972, p. 111; Nunnally, 1967, pp. 175-6). Thus, if all the items in a measure are drawn from the domain of a single construct, responses to those items should be highly intercorrelated. Low interitem correlations, in contrast, indicate that some items are not drawn from the appropriate domain and are producing error and unreliability.

Coefficient Alpha

The recommended measure of the internal consistency of a set of items is provided by coefficient alpha, which results directly from the assumptions of the domain sampling model. See Peter's article in this issue for the calculation of coefficient alpha. Coefficient alpha absolutely should be the first measure one calculates to assess the quality of the instrument. It is pregnant with meaning because the square root of coefficient alpha is the estimated correlation of the k-item test with errorless true scores (Nunnally, 1967, pp. 191-6). Thus, a low coefficient alpha indicates the sample of items performs poorly in capturing the construct which motivated the measure. Conversely, a large alpha indicates that the k-item test correlates well with true scores.

If alpha is low, what should the analyst do?¹ If the item pool is sufficiently large, this outcome suggests that some items do not share equally in the common core and should be eliminated. The easiest way to find them is to calculate the correlation of each item with the total score and to plot these correlations in decreasing order of magnitude. Items with correlations near zero would be eliminated. Further, items which produce a substantial or sudden drop in the item-to-total correlations would also be deleted.

¹What is "low" for alpha depends on the purpose of the research. For early stages of basic research, Nunnally (1967) suggests reliabilities of .50 to .60 suffice and that increasing reliabilities beyond .80 is probably wasteful. In many applied settings, however, where important decisions are made with respect to specific test scores, "a reliability of .90 is the minimum that should be tolerated, and a reliability of .95 should be considered the desirable standard" (p. 226).
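As a concrete illustration of this purification step, the sketch below computes coefficient alpha and corrected item-to-total correlations for an n-respondents by k-items score matrix, then iteratively drops the weakest item. The .30 cutoff is an assumed illustrative threshold; the article itself recommends inspecting the plotted correlations for near-zero values and sudden drops.

    import numpy as np

    def cronbach_alpha(items: np.ndarray) -> float:
        """Coefficient alpha for an (n_respondents, k_items) score matrix."""
        k = items.shape[1]
        item_vars = items.var(axis=0, ddof=1)
        total_var = items.sum(axis=1).var(ddof=1)
        return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

    def item_total_correlations(items: np.ndarray) -> np.ndarray:
        """Correlation of each item with the total of the remaining items."""
        total = items.sum(axis=1)
        return np.array([np.corrcoef(items[:, j], total - items[:, j])[0, 1]
                         for j in range(items.shape[1])])

    def purify(items: np.ndarray, min_r: float = 0.30) -> np.ndarray:
        """Drop the weakest item until all item-to-total correlations pass min_r."""
        while items.shape[1] > 2:
            r = item_total_correlations(items)
            if r.min() >= min_r:
                break
            items = np.delete(items, r.argmin(), axis=1)
        return items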


If the construct had, say, five identifiable dimensions, coefficient alpha would be calculated for each dimension.

Some analysts mistakenly calculate split-half reliability instead, correlating the total scores on one half of the items with the total scores on the other half across subjects. The problem with this approach is that the coefficient obtained depends on how the items are divided: with 10 items (a very small number for most measurement situations), there are 126 possible splits and thus 126 possible split-half coefficients.² Further, as the average of all of these coefficients is coefficient alpha, why not calculate coefficient alpha directly?

Factor Analysis

Some analysts like to perform a factor analysis on the data before doing anything else in the hope of identifying the dimensions underlying the construct, and the literature is replete with articles reporting such use. Much less prevalent is its use to confirm or refute components isolated by other means. For example, in discussing a test composed of items tapping two common factors, verbal fluency and number facility, Campbell (1976, p. 194) comments:

Recognizing multidimensionality when we see it is not always an easy task. For example, rules for when to stop extracting factors are always arbitrary in some sense. Perhaps the wisest course is to always make the comparison between the split half and internal consistency estimates after first splitting the components into two halves on a priori grounds. That is, every effort should be made to balance the factor

content of each half [part] before looking at component intercorrelations.

²The number of possible splits with 2n items is given by (2n)! / [2(n!)(n!)] (Bohrnstedt, 1970). For the example cited, 10! / [2(5!)(5!)] = 126.
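The count in the footnote can be verified directly (a quick sketch):

    from math import factorial

    def n_splits(n: int) -> int:
        """Number of distinct ways to split 2n items into two halves of n."""
        return factorial(2 * n) // (2 * factorial(n) ** 2)

    print(n_splits(5))   # 126 possible splits of a 10-item measure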

When factor analysis is done before the purification steps suggested heretofore, there seems to be a tendency to produce many more dimensions than can be conceptually identified. This effect is partly due

    to the "garbage item s" which do not bave the commoncore but which do produce additional dimensions inthe factor analysis. Though this application may besatisfactory during the early stages of research ona construct, the use of factor analysis in a confirmatoryfashion would seem better at later stages. Further,theoretical arguments support the iterative process ofthe calculation of coefficient alpha, the eliminationofitems, and the subsequent calculation of alpha untila satisfactory coefficient is achieved. Factor analysisthen can be used to confirm whether the number ofdimension s conceptualized can be verified empirically.Iteration

Iteration

The foregoing procedure can produce several outcomes. The most desirable outcome occurs when the measure produces a satisfactory coefficient alpha (or alphas if there are multiple dimensions) and the dimensions agree with those conceptualized. The measure is then ready for some additional testing, for which a new sample of data should be collected. Second, factor analysis sometimes suggests that dimensions which were conceptualized as independent clearly overlap. In this case, the items which have pure loadings on the new factor can be retained and a new alpha calculated. If this outcome is satisfactory, additional testing with new data is indicated.

The third and least desirable outcome occurs when the alpha coefficient(s) is too low and restructuring of the items forming each dimension is unproductive. In this case, the appropriate strategy is to loop back to steps 1 and 2 and repeat the process to ascertain what might have gone wrong. Perhaps the construct was not appropriately delineated. Perhaps the item pool did not sample all aspects of the domain. Perhaps the emphases within the measure were somehow distorted in editing. Perhaps the sample of subjects was biased, or the construct so ambiguous as to defy measurement. The last conclusion would suggest a fundamental change in strategy, starting with a rethinking of the basic relationships that motivated the investigation in the first place.

ASSESS RELIABILITY WITH NEW DATA

The major source of error within a test or measure is the sampling of items. If the sample is appropriate and the items "look right," the measure is said to have face or content validity. Adherence to the steps suggested will tend to produce content valid measures. But that is not the whole story. What about transient personal factors, or ambiguous questions which pro-


duce guessing, or any of the other extraneous influences, other than the sampling of items, which tend to produce error in the measure?

Interestingly, all of the errors that occur within a test can be easily encompassed by the domain sampling model. All the sources of error occurring within a measurement will tend to lower the average correlation among the items within the test, but the average correlation is all that is needed to estimate the reliability. Suppose, for example, that one of the items is vague and respondents have to guess its meaning. This guessing will tend to lower coefficient alpha, suggesting there is error in the measurement. Subsequent calculation of item-to-total correlations will then suggest this item for elimination.

Coefficient alpha is the basic statistic for determining the reliability of a measure based on internal consistency. Coefficient alpha does not adequately estimate, though, errors caused by factors external to the instrument, such as differences in testing situations and respondents over time. If the researcher wants a reliability coefficient which assesses the between-test error, additional data must be collected. It is also advisable to collect additional data to rule out the possibility that the previous findings are due to chance. If the construct is more than a measurement artifact, it should be reproduced when the purified sample of items is submitted to a new sample of subjects.

Because Peter's article treats the assessment of reliability, it is not examined here except to suggest that test-retest reliability should not be used. The basic problem with straight test-retest reliability is respondents' memories. They will tend to reply to an item the same way in a second administration as they did in the first. Thus, even if an analyst were to put together an instrument in which the items correlate poorly, suggesting there is no common core and thus no construct, it is possible and even probable that the responses to each item would correlate well across the two measurements. The high correlation of the total scores on the two tests would suggest the measure had small measurement error when in fact very little is demonstrated about validity by straight test-retest correlations.
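This failure mode is easy to demonstrate with simulated data (hypothetical numbers): ten items with no common core yield a coefficient alpha near zero, yet the total scores across two administrations correlate highly when respondents simply repeat their earlier answers.

    import numpy as np

    rng = np.random.default_rng(2)
    n, k = 300, 10
    first = rng.integers(1, 8, size=(n, k)).astype(float)   # items share no common core
    second = first + rng.normal(scale=0.3, size=(n, k))     # respondents recall answers

    # Coefficient alpha of the first administration: near zero, hence no construct
    alpha = (k / (k - 1)) * (1 - first.var(axis=0, ddof=1).sum()
                             / first.sum(axis=1).var(ddof=1))
    print(round(alpha, 2))

    # Yet the test-retest correlation of total scores is high
    print(round(np.corrcoef(first.sum(axis=1), second.sum(axis=1))[0, 1], 2))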

ASSESS CONSTRUCT VALIDITY

Specifying the domain of the construct, generating items that exhaust the domain, and subsequently purifying the resulting scale should produce a measure which is content or face valid and reliable. It may or may not produce a measure which has construct validity. Construct validity, which lies at the very heart of the scientific process, is most directly related to the question of what the instrument is in fact measuring: what construct, trait, or concept underlies a person's performance or score on a measure. The preceding steps should produce an internally

consistent or internally homogeneous set of items. Consistency is necessary but not sufficient for construct validity (Nunnally, 1967, p. 92). Rather, to establish the construct validity of a measure, the analyst also must determine (1) the extent to which the measure correlates with other measures designed to measure the same thing and (2) whether the measure behaves as expected.

Correlations With Other Measures

A fundamental principle in science is that any particular construct or trait should be measurable by at least two, and preferably more, different methods. Otherwise the researcher has no way of knowing whether the trait is anything but an artifact of the measurement procedure. All the measurements of the trait may not be equally good, but science continually emphasizes improvement of the measures of the variables with which it works. Evidence of the convergent validity of the measure is provided by the extent to which it correlates highly with other methods designed to measure the same construct.

The measures should have not only convergent validity, but also discriminant validity. Discriminant validity is the extent to which the measure is indeed novel and not simply a reflection of some other variable. As Campbell and Fiske (1959) persuasively argue, "Tests can be invalidated by too high correlations with other tests from which they were intended to differ" (p. 81). Quite simply, scales that correlate too highly may be measuring the same rather than different constructs. Discriminant validity is indicated by predictably low correlations between the measure of interest and other measures that are supposedly not measuring the same variable or concept (Heeler and Ray, 1972, p. 362).

A useful way of assessing the convergent and discriminant validity of a measure is through a multitrait-multimethod matrix, which is a matrix of zero order correlations between different traits when each of the traits is measured by different methods (Campbell and Fiske, 1959). Table 1, for example, is the matrix for a Likert type of measure designed to assess salesperson job satisfaction (Churchill et al., 1974). The four essential elements of a multitrait-multimethod matrix are identified by the numbers in the upper left corner of each partitioned segment.

Only the reliability diagonal (1) corresponding to the Likert measure is shown; data were not collected for the thermometer scale because it was not of interest itself. The entries reflect the reliability of alternate forms administered two weeks apart. If these are unavailable, coefficient alpha can be used. Evidence about the convergent validity of the measure is provided in the validity diagonal (3) by the extent to which the correlations are significantly different from zero and sufficiently large to encourage further examination of validity.


Table 1
MULTITRAIT-MULTIMETHOD MATRIX

[The matrix correlates three traits, job satisfaction, role conflict, and role ambiguity, each measured by two methods: a Likert scale (Method 1) and a thermometer scale (Method 2). The individual correlation entries are not legible in this copy.]

The validity coefficients in Table 1 of .450, .395, and .464 are all significant. Discriminant validity, however, suggests three comparisons:

1. Entries in the validity diagonal (3) should be higher than the correlations that occupy the same row and column in the heteromethod block (4). This is a minimum requirement, as it simply means that the correlation between two different measures of the same variable should be higher than the correlations between that variable and any other variable which has neither trait nor method in common (Campbell and Fiske, 1959, p. 82). The entries in Table 1 satisfy this condition.

2. The validity coefficients (3) should be higher than the correlations in the heterotrait-monomethod triangles (2), which suggests that the correlation within a trait measured by different methods must be higher than the correlations between traits which have method in common. This is a more stringent requirement than that involved in the heteromethod comparisons of step 1, as the off-diagonal elements in the monomethod blocks may be high because of method variance. The evidence in Table 1 is consistent with this requirement.

3. The pattern of correlations should be the same in all of the heterotrait triangles, e.g., both (2) and (4). This requirement is a check on the significance of the traits when compared to the methods and can be achieved by rank ordering the correlation coefficients in each heterotrait triangle; though a visual inspection often suffices, a rank order correlation coefficient such as the coefficient of concordance can be computed if there are a great many comparisons.

The last requirement is generally, though not completely, satisfied by the data in Table 1. Within each

heterotrait triangle, the pairwise correlations are consistent in sign. Further, when the correlations within each heterotrait triangle are ranked from largest positive to largest negative, the same order emerges except for the lower left triangle in the heteromethod block. Here the correlation between job satisfaction and role ambiguity is higher, i.e., less negative, than that between job satisfaction and role conflict, whereas the opposite was true in the other three heterotrait triangles (see Ford et al., 1975, p. 107, as to why this single violation of the desired pattern may not represent a serious distortion in the measure).

Ideally, the methods and traits generating the multitrait-multimethod matrix should be as independent as possible (Campbell and Fiske, 1959, p. 103). Sometimes the nature of the trait rules out the opportunity for measuring it by different methods, thus introducing the possibility of method variance. When this situation arises, the researcher's efforts should be directed to obtaining as much diversity as possible in terms of data sources and scoring procedures. If the traits are not independent, the monomethod correlations will be large and the heteromethod correlations between traits will also be substantial, and the evidence about discriminant validity will be ambiguous.
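To make the mechanics of these comparisons concrete, the sketch below assembles a multitrait-multimethod matrix for three traits measured by two methods from hypothetical data (not the Table 1 values) and applies the first comparison above: each validity coefficient should exceed the other correlations in its row and column of the heteromethod block.

    import numpy as np

    rng = np.random.default_rng(1)
    true_traits = rng.normal(size=(200, 3))                  # three independent traits
    method1 = true_traits + rng.normal(scale=0.5, size=(200, 3))
    method2 = true_traits + rng.normal(scale=0.5, size=(200, 3))
    scores = np.hstack([method1, method2])                   # columns: traits by method

    mtmm = np.corrcoef(scores, rowvar=False)                 # full 6 x 6 matrix
    heteromethod = mtmm[3:, :3]                              # heteromethod block (4)
    validity_diagonal = np.diag(heteromethod)                # same trait, different methods

    for i in range(3):
        others = np.concatenate([np.delete(heteromethod[i], i),
                                 np.delete(heteromethod[:, i], i)])
        print(i, validity_diagonal[i] > others.max())        # True: comparison 1 satisfied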


Torgerson (1958) suggests as much in discussing the ordering of the sciences along a theoretical-correlational continuum:

It is more than a mere coincidence that the sciences would order themselves in largely the same way if they were classified on the basis of the degree to which satisfactory measurement of their important variables has been achieved. The development of a theoretical science . . . would seem to be virtually impossible unless its variables can be measured adequately.

The question of whether marketing is a science has long been debated (Bartels, 1951; Buzzell, 1963; Converse, 1945; Hunt, 1976). Persons doing research of a fundamental nature are advised to execute the whole process suggested here, with its commitment to quality research. Those doing applied research perhaps cannot afford the complete sequence; even a partial application, though, requires less data and will at least indicate whether one or more of the measures can be expected ultimately to contribute to improved marketing practice.

REFERENCES

Bartels, Robert. "Can Marketing Be a Science?," Journal of Marketing, 15 (January 1951), 319-28.
Bohrnstedt, George W. "Reliability and Validity Assessment in Attitude Measurement," in Gene F. Summers, ed., Attitude Measurement. Chicago: Rand McNally and Company, 1970, 80-99.
Buzzell, Robert D. "Is Marketing a Science?," Harvard Business Review, 41 (January-February 1963), 32-48.
Calder, Bobby J. "Focus Groups and the Nature of Qualitative Marketing Research," Journal of Marketing Research, 14 (August 1977), 353-64.
Campbell, Donald T. and Donald W. Fiske. "Convergent and Discriminant Validation by the Multitrait-Multimethod Matrix," Psychological Bulletin, 56 (1959), 81-105.
Campbell, John P. "Psychometric Theory," in Marvin D. Dunnette, ed., Handbook of Industrial and Organizational Psychology. Chicago: Rand McNally, Inc., 1976, 185-222.
Churchill, Gilbert A., Jr., Neil M. Ford, and Orville C. Walker, Jr. "Measuring the Satisfaction of Industrial Salesmen," Journal of Marketing Research, 11 (August 1974), 254-60.
Converse, Paul D. "The Development of a Science in Marketing," Journal of Marketing, 10 (July 1945), 14-23.
Cronbach, L. J. "Coefficient Alpha and the Internal Structure of Tests," Psychometrika, 16 (1951), 297-334.
Czepiel, John A., Larry J. Rosenberg, and Adebayo Akerele. "Perspectives on Consumer Satisfaction," in Ronald C. Curhan, ed., 1974 Combined Proceedings. Chicago: American Marketing Association, 1974, 119-23.
Flanagan, J. "The Critical Incident Technique," Psychological Bulletin, 51 (1954), 327-58.
Ford, Neil M., Orville C. Walker, Jr., and Gilbert A. Churchill, Jr. "Expectation-Specific Measures of the Intersender Conflict and Role Ambiguity Experienced by Industrial Salesmen," Journal of Business Research, 3 (April 1975), 95-112.
Gardner, Burleigh B. "Attitude Research Lacks System to Help It Make Sense," Marketing News, 11 (May 5, 1978), 1+.
Ghiselli, Edwin E. Theory of Psychological Measurement. New York: McGraw-Hill Book Company, 1964.
Heeler, Roger M. and Michael L. Ray. "Measure Validation in Marketing," Journal of Marketing Research, 9 (November 1972), 361-70.
Howard, John A. and Jagdish N. Sheth. The Theory of Buyer Behavior. New York: John Wiley & Sons, Inc., 1969.
Hunt, Shelby D. "The Nature and Scope of Marketing," Journal of Marketing, 40 (July 1976), 17-28.
Jacoby, Jacob. "Consumer Research: A State of the Art Review," Journal of Marketing, 42 (April 1978), 87-96.
Kerlinger, Fred N. Foundations of Behavioral Research, 2nd ed. New York: Holt, Rinehart and Winston, Inc., 1973.
Kollat, David T., James F. Engel, and Roger D. Blackwell. "Current Problems in Consumer Behavior Research," Journal of Marketing Research, 7 (August 1970), 327-32.
Ley, Philip. Quantitative Aspects of Psychological Assessment. London: Gerald Duckworth and Company, Ltd., 1972.
Nunnally, Jum C. Psychometric Theory. New York: McGraw-Hill Book Company, 1967.
Schwab, D. P. and L. L. Cummings. "Theories of Performance and Satisfaction: A Review," Industrial Relations, 9 (1970), 408-30.
Selltiz, Claire, Lawrence S. Wrightsman, and Stuart W. Cook. Research Methods in Social Relations, 3rd ed. New York: Holt, Rinehart and Winston, 1976.
Torgerson, Warren S. Theory and Methods of Scaling. New York: John Wiley & Sons, Inc., 1958.
Tryon, Robert C. "Reliability and Behavior Domain Validity: Reformulation and Historical Critique," Psychological Bulletin, 54 (May 1957), 229-49.
