Regression Method

DESCRIPTION
Numerical regression method

TRANSCRIPT

Bivariate Analysis: Measures of Association
WHAT YOU WILL LEARN IN THIS CHAPTER:

To give examples of the types of business questions that may be answered by analyzing the association between two variables.
To list the common procedures for measuring association and to discuss how the measurement scale will influence the selection of statistical tests.
To discuss the concept of the simple correlation coefficient.
To calculate a simple correlation coefficient and a coefficient of determination.
To understand that correlation does not mean causation.
To interpret a correlation matrix.
To explain the concept of bivariate linear regression.
To identify the intercept and slope coefficients.
To discuss the least-squares method of regression analysis.
To draw a regression line.
To test the statistical significance of a least-squares regression.
To calculate the intercept and slope coefficients in a bivariate linear regression.
To interpret analysis of variance summary tables for linear regression.
CHAPTER 22  Bivariate Analysis: Measures of Association
EXHIBIT 22.2  Bivariate Analysis: Common Procedures for Testing Association

Measurement level(a)   Measure of association                 Sample question
Interval or ratio      Correlation coefficient (Pearson's r); Are dollar sales associated with
                       bivariate regression analysis          advertising dollar expenditures?
Ordinal                Chi-square; Spearman rank correlation; Is rank preference for shopping
                       Kendall's rank correlation             centers associated with Likert scale
                                                              ranking of convenience of locations?
Nominal                Chi-square; phi coefficient;           Is sex associated with brand
                       contingency coefficient                awareness (aware/not aware)?

(a) If at least one of the two variables has a given level of measurement, the appropriate procedure is the one with the fewest assumptions about the data.
simple correlation coefficient: A statistical measure of the covariation of, or association between, two variables.
SIMPLE CORRELATION COEFFICIENT
The most popular technique that indicates the relationship of one variable to another is simple correlation analysis. The simple correlation coefficient is a statistical measure of the covariation or association between two variables. The correlation coefficient (r) ranges from +1.0 to -1.0. If the value of r is 1.0, there is a perfect positive linear (straight-line) relationship. If the value of r is -1.0, a perfect negative linear relationship, or a perfect inverse relationship, is indicated. No correlation is indicated if r = 0. A correlation coefficient indicates both the magnitude of the linear relationship and the direction of the relationship. For example, if we find that the value of r = -.92, we know we have a relatively strong inverse relationship. That is, the greater the value measured by variable X, the less the value measured by variable Y.

The formula for calculating the correlation coefficient for two variables X and Y is:

r_xy = r_yx = Σ(X_i - X̄)(Y_i - Ȳ) / √[Σ(X_i - X̄)² Σ(Y_i - Ȳ)²]

where the symbols X̄ and Ȳ represent the sample means of X and Y, respectively.
PART VI  Data Analysis and Presentation
An alternative way of expressing the correlation formula is:

r_xy = r_yx = σ_xy / √(σ_x² σ_y²)

where
σ_x² = variance of X
σ_y² = variance of Y
σ_xy = covariance of X and Y

with

σ_xy = Σ(X_i - X̄)(Y_i - Ȳ) / n
If the associated values of X_i and Y_i differ from their means in the same direction, their covariance will be positive. Covariance will be negative if the values of X_i and Y_i tend to deviate in opposite directions.
EXHIBIT 22.3  Scatter Diagrams Illustrating Correlation Patterns
[Six scatter plots: r = .30 (low positive correlation), r = .80 (high positive correlation), r = +1.0 (perfect positive correlation), r = 0, r = -.60 (moderate negative correlation), and r = -1.0 (perfect negative correlation).]
In actuality, the simple correlation coefficient is a standardized measure of covariance. In the formula the numerator represents covariance and the denominator is the square root of the product of the sample variances. Researchers find the correlation coefficient useful because two correlations can be compared without regard to the amount of variation exhibited by each variable separately.

Exhibit 22.3 illustrates the correlation coefficients and scatter diagrams for several sets of data.
An Example

To illustrate the calculation of the correlation coefficient, an investigation is made to determine if the average number of hours worked in manufacturing industries is related to unemployment. A correlation analysis on the data in Table 22.1 is used to determine if the two variables are associated.

The correlation between the two variables is -.635, which indicates an inverse relationship. Thus when the number of hours worked is high, unemployment is low. This makes intuitive sense. If factories are increasing output, regular workers typically work more overtime and new employees are hired (reducing the unemployment rate). Both variables are probably related to overall economic conditions.
Correlation and Causation

It is important to remember that correlation does not mean causation. No matter how highly correlated the rooster's crow is to the rising of the sun, the rooster does not cause the sun to rise. It has been pointed out that there is a high correlation between teachers' salaries and the consumption of liquor over a period of years. The approximate correlation coefficient is r = .9. This high correlation does not indicate that teachers drink, nor does it indicate that the sale of liquor increases teachers' salaries. It is more likely that both teachers' salaries and liquor sales covary because they are both influenced by a third variable, such as long-run growth in national income and/or population.

In this example the relationship between the two variables is apparent but not real. Even though the variables are not causally related, they can be statistically related.

Researchers who examine statistical relationships must be aware that the variables may not be causally related.
TABLE 22.1  Correlation Analysis of Number of Hours Worked in Manufacturing Industries with Unemployment Rate

coefficient of determination (r²): A measure of that portion of the total variance of a variable that is accounted for by knowing the value of another variable.
Unemployment   Number of Hours
Rate (X_i)     Worked (Y_i)     X_i - X̄   (X_i - X̄)²   Y_i - Ȳ   (Y_i - Ȳ)²   (X_i - X̄)(Y_i - Ȳ)
5.5            39.6              0.51      0.2601       -0.71     0.5041       -0.3621
4.4            40.7             -0.59      0.3481        0.39     0.1521       -0.2301
4.1            40.4             -0.89      0.7921        0.09     0.0081       -0.0801
4.3            39.8             -0.69      0.4761       -0.51     0.2601        0.3519
6.8            39.2              1.81      3.2761       -1.11     1.2321       -2.0091
5.5            40.3              0.51      0.2601       -0.01     0.0001       -0.0051
5.5            39.7              0.51      0.2601       -0.61     0.3721       -0.3111
6.7            39.8              1.71      2.9241       -0.51     0.2601       -0.8721
5.5            40.4              0.51      0.2601        0.09     0.0081        0.0459
5.7            40.5              0.71      0.5041        0.19     0.0361        0.1349
5.2            40.7              0.21      0.0441        0.39     0.1521        0.0819
4.5            41.2             -0.49      0.2401        0.89     0.7921       -0.4361
3.8            41.3             -1.19      1.4161        0.99     0.9801       -1.1781
3.8            40.6             -1.19      1.4161        0.29     0.0841       -0.3451
3.6            40.7             -1.39      1.9321        0.39     0.1521       -0.5421
3.5            40.6             -1.49      2.2201        0.29     0.0841       -0.4321
4.9            39.8             -0.09      0.0081       -0.51     0.2601        0.0459
5.9            39.9              0.91      0.8281       -0.41     0.1681       -0.3731
5.6            40.6              0.61      0.3721        0.29     0.0841        0.1769

X̄ = 4.99    Ȳ = 40.31
Σ(X_i - X̄)² = 17.8379    Σ(Y_i - Ȳ)² = 5.5899    Σ(X_i - X̄)(Y_i - Ȳ) = -6.3389

r = Σ(X_i - X̄)(Y_i - Ȳ) / √[Σ(X_i - X̄)² Σ(Y_i - Ȳ)²]
  = -6.3389 / √99.712
  = -.635
This can occur because both are caused by a third (or more) factor(s). When this is so, the variables are said to be spuriously related.
Coefficient of Determination

If we wish to know the proportion of variance in Y explained by X (or vice versa), we can calculate the coefficient of determination by squaring the correlation coefficient (r²):

r² = Explained variance / Total variance
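For readers who want to check such hand computations by machine, the correlation coefficient and coefficient of determination can be sketched in a few lines of Python. This is only an illustrative sketch; the data are those of Table 22.1:

```python
import math

# Unemployment rate (X) and average hours worked (Y) from Table 22.1
x = [5.5, 4.4, 4.1, 4.3, 6.8, 5.5, 5.5, 6.7, 5.5, 5.7,
     5.2, 4.5, 3.8, 3.8, 3.6, 3.5, 4.9, 5.9, 5.6]
y = [39.6, 40.7, 40.4, 39.8, 39.2, 40.3, 39.7, 39.8, 40.4, 40.5,
     40.7, 41.2, 41.3, 40.6, 40.7, 40.6, 39.8, 39.9, 40.6]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squared deviations and cross products, as in the formula above
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

r = sxy / math.sqrt(sxx * syy)  # correlation coefficient
r_squared = r ** 2              # coefficient of determination

print(round(r, 3))          # -0.635
print(round(r_squared, 3))  # 0.403
```

The exact means (rather than the rounded 4.99 and 40.31 used in the table) give the same r = -.635; about 40 percent of the variance in hours worked is accounted for by knowing the unemployment rate.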
TABLE 22.2  Pearson Product-Moment Correlation Matrix for Sales Management Example(a)

Variables                       S     JS    GE    SE    OD    VI    JT    RA    TP    WL
S   Performance                1.00
JS  Job satisfaction            .45b 1.00
GE  Generalized self-esteem     .31b  .10  1.00
SE  Specific self-esteem        .61b  .28b  .36b 1.00
OD  Other-directedness          .05  -.03  -.44b -.24a 1.00
VI  Verbal intelligence        -.36b -.13  -.14  -.11  -.18d 1.00
JT  Job-related tension        -.48b -.56b -.32b -.34b  .26b -.02  1.00
RA  Role ambiguity             -.26a -.24a -.32b -.39b  .38b -.05   .44b 1.00
TP  Territory potential         .49b  .31b  .04   .29b  .09  -.09  -.38b -.26b 1.00
WL  Workload                    .45b  .11   .29a  .29a -.04  -.12  -.27a -.22d  .49b 1.00

(a) Numbers below the diagonal are for the sample. Those above the diagonal are omitted.
(b) p < .05.
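A correlation matrix such as Table 22.2 is simply the simple correlation coefficient computed for every pair of variables. A minimal sketch, using small hypothetical data invented for illustration (not the data behind Table 22.2):

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scores for three variables
data = {
    "performance":  [10, 12, 14, 15, 19, 22],
    "satisfaction": [4, 5, 5, 6, 8, 9],
    "tension":      [9, 8, 8, 6, 4, 3],
}

names = list(data)
# Print only the lower triangle, as in Table 22.2
for i, row in enumerate(names):
    cells = [f"{pearson_r(data[row], data[names[j]]):5.2f}" for j in range(i + 1)]
    print(f"{row:>12} " + " ".join(cells))
```

The diagonal entries are always 1.00 (each variable correlates perfectly with itself), which is why published matrices show only one triangle.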
bivariate linear regression: A measure of linear association that investigates a straight-line relationship of the type Y = α + βX, where Y is the dependent variable, X is the independent variable, and α and β are two constants to be estimated.

intercept: An intercepted segment of a line. The point at which a regression line intersects the Y-axis.

slope: The inclination of a regression line as compared to a base line. Rise (vertical distance) over run (horizontal distance).
REGRESSION ANALYSIS
Regression is another technique for measuring the linear association between a dependent and an independent variable. Although regression and correlation are mathematically related, regression assumes the dependent (or criterion) variable, Y, is predictively linked to the independent (or predictor) variable, X. Regression analysis attempts to predict the values of a continuous, interval-scaled dependent variable from the specific values of the independent variable. For example, the amount of external funds required (the dependent variable) might be predicted on the basis of sales growth rates (the independent variable). Although there are numerous applications of regression analysis, forecasting sales is by far the most common.
The discussion here concerns bivariate linear regression. This form of regression investigates a straight-line relationship of the type Y = α + βX, where Y is the dependent variable, X is the independent variable, and α and β are two constants to be estimated. The symbol α represents the Y intercept and β is the slope coefficient. The slope β is the change in Y due to a corresponding change of one unit in X. The slope may also be thought of as "rise over run" (the rise in units on the Y axis divided by the run in units along the X axis). (Δ is the notation for "a change in.")
Suppose a researcher is interested in forecasting sales for a construction distributor (wholesaler) in Florida. Further, the distributor believes a reasonable association exists between sales and building permits issued by counties. Using bivariate linear regression on the data in Table 22.3, the researcher will be able to estimate sales potential (Y) in various counties based on the number of building permits (X).
For a better understanding of the data in Table 22.3, the data can be plotted on a scatter diagram (Exhibit 22.4). In the diagram the vertical axis indicates the value of the dependent variable Y and the horizontal axis indicates the value of the independent variable X. Each point in the diagram represents an observation of X and Y at a given point in time, that is, the paired values of Y and X. The relationship
Regression: One Step Backward

The essence of a dictionary definition of the word "regression" is a going back or moving backward. This notion of regressing, that things "go back to previous conditions," was the source for the original concept of statistical regression. Galton, who first worked out the concept of correlation, got the idea from thinking about "regression toward mediocrity," a phenomenon observed in studies of inheritance. "Tall men will tend to have shorter sons, and short men taller sons. The sons' heights, then, tend to 'regress to,' or 'go back to,' the mean of the population. Statistically, if we want to predict Y from X and the correlation between X and Y is zero, then our best prediction is to the mean." (Incidentally, the symbol r, used for the coefficient of correlation, was originally chosen because it stood for "regression.")
between X and Y could be "eyeballed," that is, a straight line could be drawn through
the points in the figure. However, such a line would be subject to human error. Two
researchers might draw different lines to describe the same data.
Least-Squares Method of Regression Analysis

The task of the researcher is to find the best means for fitting a straight line to the data. The least-squares method is a relatively simple mathematical technique that ensures that the straight line will best represent the relationship between X and Y. The logic behind the least-squares technique goes as follows. No straight line can completely represent every dot in the scatter diagram. Unless there is a perfect
least-squares method: A mathematical technique ensuring that the regression line will best represent the linear relationship between X and Y.
TABLE 22.3  Relationship of Sales Potential to Building Permits Issued

Dealer   Dealer's Sales Volume Y (000)   Building Permits X
1         77                              86
2         79                              93
3         80                              95
4         83                             104
5        101                             139
6        117                             180
7        129                             165
8        120                             147
9         97                             119
10       106                             132
11        99                             126
12       121                             156
13       103                             129
14        86                              96
15        99                             108
EXHIBIT 22.4  Scatter Diagram and Eyeball Forecast
[Dealers' sales volume Y (80 to 165) is plotted against building permits X (85 to 195); two hand-drawn lines, labeled "My line" and "Your line," illustrate the eyeball approach.]
residual: The difference between the actual value of the dependent variable and the estimated value of the dependent variable in the regression equation.
correlation between two variables, there will be a discrepancy between most of the actual scores (each dot) and the predicted scores based on the regression line. Simply stated, any straight line that is drawn will generate errors. The method of least squares uses the criterion of attempting to make the least amount of total error in prediction of Y from X. More technically, the procedure used in the least-squares method generates a straight line that minimizes the sum of squared deviations of the actual values from this predicted regression line. Using the symbol e to represent the deviations of the dots from the line, the least-squares criterion is to minimize

Σ e_i²  (summed over i = 1 to n)

where
e_i = Y_i - Ŷ_i (the "residual")
Y_i = actual value of the dependent variable
Ŷ_i = estimated value of the dependent variable (Y hat)
n = number of observations
i = number of the observation
The general equation of a straight line is Y = α + βX, whereas a more appropriate equation includes an allowance for error:

Y = α + βX + e

The symbols â and β̂ are utilized when the equation is a regression estimate of the line. Thus, to compute the estimated values of α and β, we use the following formulas:

β̂ = [n(ΣXY) - (ΣX)(ΣY)] / [n(ΣX²) - (ΣX)²]

and

â = Ȳ - β̂X̄

where
β̂ = estimated slope of the line (the "regression coefficient")
â = estimated intercept of the Y axis
Y = dependent variable
Ȳ = mean of the dependent variable
X = independent variable
X̄ = mean of the independent variable
n = number of observations
TABLE 22.4  Least-Squares Computation

Dealer    Y      Y²        X      X²        XY
1          77    5,929      86    7,396     6,622
2          79    6,241      93    8,649     7,347
3          80    6,400      95    9,025     7,600
4          83    6,889     104   10,816     8,632
5         101   10,201     139   19,321    14,039
6         117   13,689     180   32,400    21,060
7         129   16,641     165   27,225    21,285
8         120   14,400     147   21,609    17,640
9          97    9,409     119   14,161    11,543
10        106   11,236     132   17,424    13,992
11         99    9,801     126   15,876    12,474
12        121   14,641     156   24,336    13,287 corrected: 18,876
13        103   10,609     129   16,641    13,287
14         86    7,396      96    9,216     8,256
15         99    9,801     108   11,664    10,692

Ȳ = 99.8              X̄ = 125
ΣY² = 153,283         ΣX = 1,875
                      ΣX² = 245,759
                      ΣXY = 193,345
These equations may be solved by simple arithmetic (see Table 22.4). To estimate the relationship between the distributor's sales to a dealer and the number of building permits, the following manipulations are performed:

β̂ = [n(ΣXY) - (ΣX)(ΣY)] / [n(ΣX²) - (ΣX)²]
  = [15(193,345) - (1,875)(1,497)] / [15(245,759) - (1,875)²]
  = (2,900,175 - 2,806,875) / (3,686,385 - 3,515,625)
  = 93,300 / 170,760
  = .54638

â = Ȳ - β̂X̄
  = 99.8 - .54638(125)
  = 99.8 - 68.3
  = 31.5
The formula Ŷ = 31.5 + .546X is the regression equation used for the prediction of the dependent variable. Suppose the wholesaler considers a new dealership in an area where the number of building permits equals 89. Sales may be forecast in this area as:

Ŷ = 31.5 + .546X
  = 31.5 + .546(89)
  = 31.5 + 48.6
  = 80.1
Thus our distributor may expect sales of 80.1 in this new area. Calculation of the correlation coefficient gives an indication of how accurate the predictions may be. In this example the correlation coefficient is r = .9356, and the coefficient of determination is r² = .8754.
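The computations above can be reproduced from the column totals of Table 22.4. A minimal Python sketch of the least-squares formulas given earlier, using the Table 22.3 data:

```python
import math

# Dealer sales volume Y (000) and building permits X from Table 22.3
y = [77, 79, 80, 83, 101, 117, 129, 120, 97, 106, 99, 121, 103, 86, 99]
x = [86, 93, 95, 104, 139, 180, 165, 147, 119, 132, 126, 156, 129, 96, 108]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)

# Least-squares slope and intercept
beta_hat = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
alpha_hat = sum_y / n - beta_hat * sum_x / n

# Forecast for an area with 89 building permits
forecast = alpha_hat + beta_hat * 89

# Correlation coefficient from the same raw sums
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))

print(round(beta_hat, 5))   # 0.54638
print(round(alpha_hat, 1))  # 31.5
print(round(forecast, 1))   # 80.1
print(round(r, 4))          # 0.9356
```

Note that the sketch works entirely from the five raw sums (ΣX, ΣY, ΣXY, ΣX², ΣY²), exactly as the hand computation does.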
Drawing a Regression Line

To draw a regression line on the scatter diagram, only two predicted values of Y need plotting. For example, if Dealer 7 and Dealer 3 are used, Ŷ₇ and Ŷ₃ will be calculated to be 121.6 and 83.4:

Dealer 7 (actual Y value = 129): Ŷ₇ = 31.5 + .546(165) = 121.6
Dealer 3 (actual Y value = 80):  Ŷ₃ = 31.5 + .546(95) = 83.4

Once the two Ŷ values have been predicted, a straight line connecting the points Ŷ₇ = 121.6, X₇ = 165, and Ŷ₃ = 83.4, X₃ = 95 can be drawn.
Exhibit 22.5 shows the regression line. If it is desirable to determine the error (residual) of any observation, the predicted value of Ŷ is first calculated. The predicted value is then subtracted from the actual value. For example, the actual observation
EXHIBIT 22.6  Scatter Diagram of Explained and Unexplained Variation
[Dealer 8's actual sales are plotted against the regression line Ŷ = â + β̂X; the vertical distance Ŷ_i - Ȳ is labeled "deviation explained by regression," with ΔY/ΔX marking the slope. Axes run from roughly 80 to 130 on Y and 100 to 170 on X.]
computed using Ŷ_i - Ȳ rather than Y_i - Ȳ. This is the "explained" deviation due to the regression. The smaller number, 8.2, is the deviation not explained by the regression.

Thus the total deviation can be partitioned into two parts:

(Y_i - Ȳ) = (Ŷ_i - Ȳ) + (Y_i - Ŷ_i)
Total deviation = Deviation explained by the regression + Deviation unexplained by the regression (residual error)

where
Ȳ = mean of the total group
Ŷ_i = value predicted with regression equation
Y_i = actual value
For Dealer 8 the total deviation is 120 - 99.8 = 20.2, the deviation explained by the regression is 111.8 - 99.8 = 12.0, and the deviation unexplained by the regression is 120 - 111.8 = 8.2. If these values are summed over all values of Y_i (i.e., all observations) and squared, these deviations provide an estimate of the variation of Y explained by the regression and unexplained by the regression:

Σ(Y_i - Ȳ)² = Σ(Ŷ_i - Ȳ)² + Σ(Y_i - Ŷ_i)²
Total variation = Explained variation + Unexplained variation (residual)

We have thus partitioned the total sum of squares, SSt, into two parts: the regression sum of squares, SSr, and the error sum of squares, SSe:

SSt = SSr + SSe
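The partition SSt = SSr + SSe can be verified numerically for the building-permit example. A minimal sketch that re-fits the least-squares line from the Table 22.3 data and then sums the squared deviations:

```python
# Partitioning the total sum of squares for the building-permit example
y = [77, 79, 80, 83, 101, 117, 129, 120, 97, 106, 99, 121, 103, 86, 99]
x = [86, 93, 95, 104, 139, 180, 165, 147, 119, 132, 126, 156, 129, 96, 108]
n = len(y)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Least-squares fit (the same line as the worked example, Y-hat = 31.5 + .546X)
beta_hat = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
            / sum((xi - x_bar) ** 2 for xi in x))
alpha_hat = y_bar - beta_hat * x_bar
y_hat = [alpha_hat + beta_hat * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained by regression
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained (residual)

print(round(sst, 2))  # 3882.4
print(round(ssr, 2))  # 3398.49
print(round(sse, 2))  # 483.91
```

The three sums match the entries of the analysis of variance summary table discussed below, and SSr + SSe reproduces SSt exactly, as the algebra requires.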
The Concept of Beta When Investing in Stocks

Suppose a regression was run with the historic realized rate of return on a particular stock (K_j) as the dependent variable and the historic realized rate of return on the stock market (K_M) as the independent variable. The tendency of a stock to move with the market is reflected in its beta coefficient, which is a measure of the stock's volatility relative to an average stock. Betas are discussed at an intuitive level in this section.

An average risk stock is defined as one which tends to move up and down in step with the general market as measured by some index such as the Dow Jones or the New York Stock Exchange Index. Such a stock will, by definition, have a beta (β) of 1.0, which indicates that, in general, if the market moves up by 10 percent, the stock will also move up by 10 percent, while if the market falls by 10 percent, the stock will likewise fall by 10 percent. A portfolio of such β = 1.0 stocks will move up and down with the broad market averages and will be just as risky as the averages. If β = 0.5, the stock is only half as volatile as the market (it will rise and fall only half as much), and a portfolio of such stocks is half as risky as a portfolio of β = 1.0 stocks. On the other hand, if β = 2.0, the stock is twice as volatile as an average stock, so a portfolio of such stocks will be twice as risky as an average portfolio.

Betas are calculated and published by Merrill Lynch, Value Line, and numerous other organizations. The beta coefficients of some well-known companies, as calculated by Merrill Lynch, are shown in the table below. Most stocks have betas in the range of 0.75 to 1.50; the average for all stocks is 1.0 by definition.

Stock                    Beta
Apple Computer           1.60
Union Pacific            1.43
Georgia-Pacific          1.36
Mattel                   1.15
General Electric         1.09
Bristol Myers            1.00
General Motors           0.94
McDonald's               0.93
Procter & Gamble         0.80
IBM                      0.70
Anheuser-Busch           0.58
Pacific Gas & Electric   0.47

If a high-beta stock (one whose beta is greater than 1.0) is added to an average risk (β = 1.0) portfolio, then the beta, and consequently the riskiness, of the portfolio will increase. Conversely, if a low-beta stock (one whose beta is less than 1.0) is added to an average risk portfolio, the portfolio's beta and risk will decline. Thus, because a stock's beta measures its contribution to the riskiness of the portfolio, beta is the appropriate measure of the stock's riskiness.
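Since the box describes beta as a regression slope, it can be estimated with the least-squares machinery of this chapter. The return series below are purely hypothetical, invented for illustration:

```python
# Estimating a stock's beta: the least-squares slope of stock returns
# regressed on market returns. The return figures below are hypothetical.
market = [0.04, -0.02, 0.06, 0.01, -0.03, 0.05]   # market returns (K_M)
stock  = [0.06, -0.04, 0.09, 0.02, -0.05, 0.07]   # stock returns (K_j)

n = len(market)
m_bar = sum(market) / n
s_bar = sum(stock) / n

# Beta is covariance(stock, market) divided by variance(market),
# which is algebraically the least-squares slope coefficient.
cov = sum((m - m_bar) * (s - s_bar) for m, s in zip(market, stock)) / n
var_m = sum((m - m_bar) ** 2 for m in market) / n

beta = cov / var_m
print(round(beta, 2))  # beta > 1: the stock is more volatile than the market
```

With these made-up returns the stock swings harder than the market in every period, so the estimated beta comes out above 1.0, the signature of a high-beta stock.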
F-test: A procedure used to determine if there is more variability in the scores of one sample than in the scores of another sample.
An F-test, or an analysis of variance applied to regression, can be used to test the relative magnitude of the SSr and SSe with their appropriate degrees of freedom. Table 22.5 indicates the technique for conducting the F-test.

TABLE 22.5  Analysis of Variance Table for Bivariate Regression

Source of Variation       Sum of Squares        Degrees of Freedom   Mean Square (Variance)
Explained by regression   SSr = Σ(Ŷ_i - Ȳ)²     k - 1                SSr / (k - 1)
Unexplained (error)       SSe = Σ(Y_i - Ŷ_i)²   n - k                SSe / (n - k)

where
k = number of estimated parameters (variables)
n = number of observations
analysis of variance summary table: A table that presents the results of a regression calculation.

TABLE 22.6  Analysis of Variance Summary Table for Regression of Sales on Building Permits

Source of Variation                 Sum of Squares   d.f.   Mean Square   F-Value
Explained by regression             3,398.49         1      3,398.49      91.30
Unexplained by regression (error)   483.91           13     37.22
Total                               3,882.40         14
For the example on sales forecasting, the analysis of variance summary table, comparing relative magnitudes of the mean squares, is presented in Table 22.6. From Table 6 in the Appendix we find that the F-value of 91.3, with 1 degree of freedom in the numerator and 13 degrees of freedom in the denominator, exceeds the probability level of .01. The coefficient of determination, r², reflects the proportion of variation explained by the regression line. To calculate r²:

r² = SSr / SSt = 1 - SSe / SSt

In our example, r² is calculated to be .875:

r² = 3,398.49 / 3,882.40 = .875

The coefficient of determination may be interpreted to mean that 87 percent of the variation in sales was explained by associating the variable with building permits.
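The arithmetic behind Table 22.6 can be sketched directly from the sums of squares (values taken from the worked example above):

```python
# ANOVA summary for the regression of sales on building permits (Table 22.6)
ssr, sse = 3398.49, 483.91   # sums of squares from the worked example
n, k = 15, 2                 # observations; estimated parameters (alpha, beta)

msr = ssr / (k - 1)          # mean square explained by regression
mse = sse / (n - k)          # mean square error
f_value = msr / mse          # F ratio with (k - 1, n - k) degrees of freedom

sst = ssr + sse
r_squared = ssr / sst        # equivalently 1 - sse / sst

print(round(mse, 2))        # 37.22
print(round(f_value, 1))    # 91.3
print(round(r_squared, 3))  # 0.875
```

A large F simply says the explained mean square dwarfs the error mean square, which is why the same sums of squares also yield a high r².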
SUMMARY

In many situations two variables are interrelated or associated. Many bivariate statistical techniques can be used to measure association. Researchers select the appropriate technique on the basis of each variable's scale of measurement.

The correlation coefficient (r), a statistical measure of association between two variables, ranges from r = +1.0 for a perfect positive correlation to r = -1.0 for a perfect negative correlation. No correlation is indicated for r = 0. Simple correlation is the measure of the relationship of one variable to another. The correlation coefficient indicates the strength of the association of two variables and the direction of that association. It must be remembered that correlation does not prove causation, as variables other than those being measured may be involved. The coefficient of determination (r²) measures the amount of the total variance in the dependent variable that is accounted for by knowing the value of the independent variable. The results of correlation computations are often presented in a correlation matrix.

Bivariate linear regression investigates a straight-line relationship between one dependent variable and one independent variable. The regression can be done intuitively by plotting a scatter diagram of the X and Y points and drawing a line to fit the observed relationship. The least-squares method mathematically determines the best-fitting regression line for the observed data. The line determined by this method may be used to forecast values of the dependent variable, given a value for the independent variable.
8. A football team's season ticket sales, percentage of games won, and number of active alumni are given below:
Year   Season Ticket Sales   Percentage of Games Won   Number of Active Alumni
1985    4,995                40                        NA
1986    8,599                54                        NA
1987    8,479                55                        NA
1988    8,419                58                        NA
1989   10,253                63                        NA
1990   12,457                75                        6,315
1991   13,285                36                        6,860
1992   14,177                27                        8,423
1993   15,730                63                        9,000

a. Interpret the correlation between each variable.
b. Calculate: Regression sales = Percentage of games won.
c. Calculate: Regression sales = Number of active alumni.
9. Are the different forms of consumer installment credit in the table below highly correlated? Explain.
Credit Card Debt Outstanding (Millions of Dollars)
Year   Gas Cards   Travel and Entertainment Cards   Bank Credit Cards   Retail Cards   Total Credit Cards   Total Installment Credit
1      $  939      $ 61                             $   828             $ 9,400        $11,229              $ 79,428
2       1,119        76                              1,312              10,200          12,707                87,745
3       1,298       110                              2,639              10,900          14,947                98,105
4       1,650       122                              3,792              11,500          17,064               102,064
5       1,804       132                              4,490              13,925          20,351               111,295
6       1,762       164                              5,408              14,763          22,097               127,332
7       1,832       191                              6,838              16,395          25,256               147,437
8       1,823       238                              9,281              17,933          28,275               156,124
9       1,993       273                              9,501              18,002          29,669               164,955
10      1,981       238                             11,351              19,052          32,622               185,489
11      2,074       284                             14,262              21,082          37,702               216,572
10. A manufacturer of disposable washcloths/wipes told a retailer that sales for this
product category closely correlated with the sales of disposable diapers. The retailer thought he would check this out for his own sales-forecasting purposes.
Where might a researcher find data to make this forecast?
11. The Springfield Electric Company manufactures electric pencil sharpeners. The company believes that sales are correlated with the number of workers employed in specific geographical areas. The following table presents Springfield's