Statistical Inference and Regression Analysis: Stat-GB.3302.30, Stat-UB.0015.01

Professor William Greene, Stern School of Business, IOMS Department and Department of Economics


Post on 25-Feb-2016



Statistics


Part 5: Hypothesis Testing

Objectives of Statistical Analysis
Estimation
- How long do hard drives last?
- What is the median income among the 99%ers?
Inference: hypothesis testing
- Did minorities pay higher mortgage rates during the housing boom?
- Is there a link between environmental factors and breast cancer on eastern Long Island?

General Frameworks
- Parametric tests: features of specific distributions, such as the mean of a Bernoulli or normal distribution.
- Specification tests (semiparametric): Do the data arrive from a Poisson process? Are the data normally distributed?
- Nonparametric tests: Are two discrete processes independent?

Hypotheses
- Labels: State 0 of nature is the null hypothesis; State 1 is the alternative hypothesis.
- Exclusive: \(\text{Prob}(H_0 \cap H_1) = 0\)
- Exhaustive: \(\text{Prob}(H_0) + \text{Prob}(H_1) = 1\)
- Symmetric: Neither is intrinsically preferred; the objective of the study is only to support one or the other. (Rare?)

Testing Strategy

Posterior (to the Evidence) Odds

Does the New Drug Work?
Hypotheses: \(H_0: \theta = .50\), \(H_1: \theta = .75\)
Priors: P0 = .40, P1 = .60
Clinical trial: N = 50; 31 patients respond, so p = .62.
Likelihoods:
\(L_0 = L(31 \mid \theta = .50) = \text{Binomial}(50, 31, .50) = .0270059\)
\(L_1 = L(31 \mid \theta = .75) = \text{Binomial}(50, 31, .75) = .0148156\)
Posterior odds in favor of H0 = (.4/.6)(.0270059/.0148156) = 1.2152 > 1
The priors favored H1 by 1.5 to 1, but the posterior odds favor H0 by 1.2152 to 1. The evidence discredits H1 even though the data seem more consistent with prior P1.
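A minimal Python sketch of the posterior-odds calculation above, using only the standard library. The trial numbers (N = 50, 31 responders, response rates .50 and .75, priors .40 and .60) come from the slide; the helper name `binomial_pmf` is our own illustration.

```python
from math import comb

def binomial_pmf(n: int, k: int, theta: float) -> float:
    """Probability of exactly k successes in n independent Bernoulli(theta) trials."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)

n, k = 50, 31                    # clinical trial: 50 patients, 31 respond
prior_h0, prior_h1 = 0.40, 0.60  # prior probabilities P0 and P1
theta_h0, theta_h1 = 0.50, 0.75  # response rate under H0 and under H1

like_h0 = binomial_pmf(n, k, theta_h0)  # L0
like_h1 = binomial_pmf(n, k, theta_h1)  # L1

# Posterior odds for H0 = prior odds for H0 times the likelihood ratio L0/L1
posterior_odds_h0 = (prior_h0 / prior_h1) * (like_h0 / like_h1)
print(f"L0 = {like_h0:.7f}  L1 = {like_h1:.7f}  posterior odds for H0 = {posterior_odds_h0:.4f}")
```

The prior odds (.4/.6) and the likelihood ratio enter multiplicatively, which is why a likelihood ratio of about 1.82 is enough to overturn priors that favored H1.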

Decision Strategy
- Prefer the hypothesis with the higher posterior odds.
- A gap in the theory: How does the investigator do the cost-benefit test?
  - Starting a new business venture or entering a new market: priors and market research
  - FDA approving a new drug or medical device: priors and clinical trials
- Statistical decision theory adds the costs and benefits of decisions and errors.

An Alternative Strategy
- Recognize the asymmetry of null and alternative hypotheses.
- Eliminate the prior odds (which are rarely formed or available).


http://query.nytimes.com/gst/fullpage.html?res=9C00E4DF113BF935A3575BC0A9649C8B63

Classical Hypothesis Testing
The scientific method applied to statistical hypothesis testing:
- Hypothesis: The world works according to my hypothesis.
- Testing or supporting the hypothesis:
  - Data gathering
  - Rejection of the hypothesis if the data are inconsistent with it
  - Retention and exposure to further investigation if the data are consistent with the hypothesis
- Failure to reject is not equivalent to acceptance.

Asymmetric Hypotheses
- Null hypothesis: The proposed state of nature.
- Alternative hypothesis: The state of nature that is believed to prevail if the null is rejected.

Hypothesis Testing Strategy
- Formulate the null hypothesis.
- Gather the evidence.
- Question: If my null hypothesis were true, how likely is it that I would have observed this evidence?
  - Very unlikely: Reject the hypothesis.
  - Not unlikely: Do not reject. (Retain the hypothesis for continued scrutiny.)

Some Terms of Art
- Type I error: Incorrectly rejecting a true null.
- Type II error: Failure to reject a false null.
- Power of a test: Probability that a test will correctly reject a false null.
- Alpha level: Probability that a test will incorrectly reject a true null. This is sometimes called the size of the test.
- Significance level: Probability that a test will retain a true null \(= 1 - \alpha\).
- Rejection region: Evidence that will lead to rejection of the null.
- Test statistic: Specific sample evidence used to test the hypothesis.
- Distribution of the test statistic under the null hypothesis: Probability model used to compute the probability of rejecting the null. (Crucial to the testing strategy: how does the analyst assess the evidence?)

Possible Errors in Testing
                                  Hypothesis is true    Hypothesis is false
I do not reject the hypothesis    Correct decision      Type II error
I reject the hypothesis           Type I error          Correct decision

A Legal Analogy: The Null Hypothesis Is INNOCENT
                              Null hypothesis:       Alternative hypothesis:
                              Not guilty             Guilty
Finding: Verdict not guilty   Correct decision       Type II error: guilty defendant goes free
Finding: Verdict guilty       Type I error:          Correct decision
                              innocent defendant
                              is convicted
The errors are not symmetric. Most thinkers consider Type I errors to be more serious than Type II errors in this setting.

(Jerzy) Neyman-(Egon) Pearson Methodology
Statistical testing methodology:
- Formulate the null hypothesis.
- Decide (in advance) what kinds of evidence (data) will lead to rejection of the null hypothesis, i.e., define the rejection region.
- Gather the data.
- Mechanically carry out the test.

Formulating the Null Hypothesis
- Stating the hypothesis: a belief about the state of nature
  - A parameter takes a particular value.
  - There is a relationship between variables.
  - And so on.
- The null vs. the alternative
  - By induction: If we wish to find evidence of something, first assume it is not true.
  - Look for evidence that leads to rejection of the assumed hypothesis.
  - Evidence that rejects the null hypothesis is significant.

Example: Credit Scoring Rule
Investigation: I believe that Fair Isaac relies on home ownership in deciding whether to accept an application.
- Null hypothesis: There is no relationship.
- Alternative hypothesis: They do use home ownership data.
What decision rule should I use?

Some Evidence
           Accepted   Rejected
Renters       5,469      1,845
Owners        5,030      1,100

Hypothesis Test
Acceptance rate for homeowners = 5030/(5030 + 1100) = .82055
Acceptance rate for renters = .74774
H0: The acceptance rate for renters is not less than for owners.
\(H_0: p(\text{renters}) \geq .82055\)
\(H_1: p(\text{renters}) < .82055\)

The Rejection Region
What is the rejection region? Data (evidence) that are inconsistent with my hypothesis.
Evidence is divided into two types:
- Data that are inconsistent with my hypothesis (the rejection region)
- Everything else

My Testing Procedure
I will reject H0 if p(renters) < .815 (chosen arbitrarily).
The rejection region is sample values of p(renters) < .815.

Distribution of the Test Statistic Under the Null Hypothesis
Test statistic: \(p(\text{renters}) = \frac{1}{N}\sum_i \text{Accept}_i\), where \(\text{Accept}_i = 1\) or 0.
Use the central limit theorem:
- Assumed mean = .82055
- Implied standard deviation \(= \sqrt{.82055 \times .17945 / 7413} = .00459\)
- By the CLT, p(renters) is approximately normally distributed (N is very large).
- Use \(z = (p(\text{renters}) - .82055)/.00459\).

Alpha Level and Rejection Region
\[ \text{Prob}(\text{Reject } H_0 \mid H_0 \text{ true}) = \text{Prob}(p < .815 \mid H_0 \text{ true}) = \text{Prob}\left[\frac{p - .82055}{.00459} < \frac{.815 - .82055}{.00459}\right] = \text{Prob}[z < -1.209] = .11333 \]
This is the probability of a Type I error: the alpha level for this test.
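The alpha-level calculation, and the test decision that follows from it, can be sketched in a few lines of Python with `statistics.NormalDist`. The null value, the .815 cutoff, and the standard deviation .00459 are taken as stated on the slides (the standard deviation is used as given rather than recomputed); variable names are illustrative.

```python
from statistics import NormalDist

p_null = 0.82055   # acceptance rate for homeowners, the assumed mean under H0
cutoff = 0.815     # reject H0 if the renters' acceptance rate is below this
se = 0.00459       # standard deviation of p(renters) as stated on the slide

# Alpha = Prob(p(renters) < cutoff | H0 true), using the CLT normal approximation
z_cut = (cutoff - p_null) / se
alpha = NormalDist().cdf(z_cut)

# Observed renters' acceptance rate and the resulting decision
p_renters = 5469 / (5469 + 1845)
reject_h0 = p_renters < cutoff
print(f"z = {z_cut:.3f}, alpha = {alpha:.5f}, p(renters) = {p_renters:.5f}, reject = {reject_h0}")
```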

Distribution of the Test Statistic and the Rejection Region
[Figure: density of the test statistic under H0; the shaded rejection region has area = .11333.]

The Test
The observed proportion is 5469/(5469 + 1845) = 5469/7314 = .74774.
Since .74774 < .815, the observation falls in the rejection region. The null hypothesis is rejected at the 11.333% significance level (by the design of the test).

Power of the Test

Power Function for the Test
(Power = size when the alternative = the null.)
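A rough sketch of the power function for the credit-scoring test. It assumes, for simplicity, that the standard deviation stays at the slide's .00459 for every alternative value (strictly, it changes with the true rate); the function name `power` is hypothetical. At the null value the power equals the size, about .1133.

```python
from statistics import NormalDist

P_NULL = 0.82055  # null value of the renters' acceptance rate
CUTOFF = 0.815    # reject H0 when p(renters) < CUTOFF
SE = 0.00459      # assumption: same standard deviation under every alternative

def power(p_true: float, se: float = SE) -> float:
    """Probability of rejecting H0 (p-hat < CUTOFF) when the true rate is p_true."""
    return NormalDist(mu=p_true, sigma=se).cdf(CUTOFF)

for p in (0.83, P_NULL, 0.815, 0.81, 0.80):
    print(f"true rate {p:.5f}: power = {power(p):.4f}")
```

Power rises toward 1 as the true renter acceptance rate moves further below the cutoff.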

Application: Breast Cancer on Long Island
Null hypothesis: There is no link between the high cancer rate on LI and the use of pesticides and toxic chemicals in dry cleaning, farming, etc.
Neyman-Pearson procedure:
- Examine the physical and statistical evidence.
- If there is convincing covariation, reject the null hypothesis.
- What is the rejection region?
The NCI study:
- Working null hypothesis: There is a link; we will find the evidence.
- How do you reject this hypothesis?

Formulating the Testing Procedure
Usually: What kind of data will lead me to reject the hypothesis?
Thinking scientifically: If you want to prove a hypothesis is true (or you want to support one), begin by assuming your hypothesis is not true, and look for evidence that contradicts the assumption.

Hypothesis About a Mean
I believe that the average income of individuals in a population is $30,000.
\(H_0: \mu = \$30{,}000\) (the null)
\(H_1: \mu \neq \$30{,}000\) (the alternative)
I will draw the sample and examine the data.
The rejection region is data for which the sample mean is far from $30,000.
How far is far? That is the test.

Application
The mean of a population takes a specific value:
- Null hypothesis: \(H_0: \mu = \$30{,}000\)
- Alternative: \(H_1: \mu \neq \$30{,}000\)
- Test: Is the sample mean close to the hypothesized population mean?
- Rejection region: Sample means that are far from $30,000

Deciding on the Rejection Region
If the sample mean is far from $30,000, reject the hypothesis. Choose the region, for example, sample means below 29,500 or above 30,500.
[Figure: number line marked 29,500, 30,000 and 30,500, with rejection regions below 29,500 and above 30,500.]
The probability that the mean falls in the rejection region even though the hypothesis is true (and should not be rejected) is the probability of a Type I error. Even if the true mean really is $30,000, the sample mean could fall in the rejection region.

Reduce the Probability of a Type I Error by Making the (Non)Rejection Region Wider
[Figure: number line marked 28,500, 29,500, 30,000, 30,500 and 31,500. The probability outside the interval 29,500 to 30,500 is large; the probability outside the interval 28,500 to 31,500 is much smaller.]
Reduce the probability of a Type I error by moving the boundaries of the rejection region farther out.
You can make a Type I error impossible by making the rejection region very far from the null. Then you would never make a Type I error, because you would never reject H0.

Setting the \(\alpha\) Level
\(\alpha\) is the probability of a Type I error.
Choose the width of the interval by choosing the desired probability of a Type I error, based on the t or normal distribution. (How confident do I want to be?)
Multiply the z or t value by the standard error of the mean.

Testing Procedure
The rejection region will be the range of values greater than \(\mu_0 + z\sigma/\sqrt{N}\) or less than \(\mu_0 - z\sigma/\sqrt{N}\).
- Use z = 1.96 for \(1 - \alpha = 95\%\).
- Use z = 2.576 for \(1 - \alpha = 99\%\).
- Use the t table if the sample is small, the variance is estimated, and sampling is from a normal distribution.
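The two-sided rule above, sketched in Python for a known \(\sigma\). The sample values (N = 400, \(\sigma\) = $5,000, sample mean $30,600) are hypothetical, chosen so that the same sample is rejected with z = 1.96 but not with z = 2.576.

```python
from math import sqrt

def two_sided_mean_test(xbar: float, mu0: float, sigma: float, n: int, z: float = 1.96):
    """Return (lower, upper, reject) for H0: mu = mu0 with known sigma."""
    half_width = z * sigma / sqrt(n)   # z times the standard error of the mean
    lower, upper = mu0 - half_width, mu0 + half_width
    reject = xbar < lower or xbar > upper
    return lower, upper, reject

# Hypothetical sample: N = 400 incomes, sigma = $5,000, sample mean = $30,600
lower, upper, reject = two_sided_mean_test(30_600, 30_000, 5_000, 400)
print(lower, upper, reject)    # boundaries of the rejection region with z = 1.96

_, _, reject_99 = two_sided_mean_test(30_600, 30_000, 5_000, 400, z=2.576)
print(reject_99)               # wider (non)rejection region with z = 2.576
```

The second call shows the trade-off described in the text: a smaller \(\alpha\) widens the non-rejection region, so the test rejects less often.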

Deciding on the Rejection Region
If the sample mean is far from $30,000, reject the hypothesis. Choose the region, say,
[Figure: rejection regions in both tails, outside an interval centered on $30,000.]
I am 95% certain that I will not commit a Type I error (reject the hypothesis in error). (I cannot be 100% certain.)

The Testing Procedure (For a Mean)

The Test Procedure
- Choosing z = 1.96 makes the probability of a Type I error 0.05.
- Choosing z = 2.576 would reduce the probability of a Type I error to 0.01.
- Reducing the probability of a Type I error reduces the power of the test, because it reduces the probability that the null hypothesis will be rejected.

P Value
Probability of observing sample evidence at least as extreme as the evidence actually observed, assuming the null hypothesis is true.
The null hypothesis is rejected if the P value < \(\alpha\).

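P Value < \(\alpha\)
\[ \text{Prob}[p(\text{renters}) < .74774] = \text{Prob}\left[z < \frac{.74774 - .82055}{.00459}\right] = \Phi(-15.86) = .59946942854362260 \times 10^{-56} \]
This is effectively impossible, and far smaller than \(\alpha = .11333\), so the null hypothesis is rejected.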

Confidence Intervals
For a two-sided test about a parameter, a confidence interval is the complement of the rejection region. (Proof in text, p. 338.)
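A small Python check of the duality just stated, assuming a known \(\sigma\) and a two-sided z test; the sample numbers are hypothetical. A hypothesized mean \(\mu_0\) lies inside the 95% interval exactly when the 5% test does not reject it.

```python
from math import sqrt

Z = 1.96  # two-sided 95% / 5% critical value

def confidence_interval(xbar: float, sigma: float, n: int, z: float = Z):
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

def rejects(mu0: float, xbar: float, sigma: float, n: int, z: float = Z) -> bool:
    # Two-sided test: reject when xbar is more than z standard errors from mu0
    return abs(xbar - mu0) > z * sigma / sqrt(n)

xbar, sigma, n = 30_600, 5_000, 400  # hypothetical sample
low, high = confidence_interval(xbar, sigma, n)
for mu0 in (29_000, 30_000, 30_200, 31_000, 31_200):
    inside = low <= mu0 <= high
    print(f"mu0 = {mu0}: in CI = {inside}, test rejects = {rejects(mu0, xbar, sigma, n)}")
```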

Confidence Interval
If the sample mean is far from $30,000, reject the hypothesis. Choose the region, say,
[Figure: rejection regions in both tails; the confidence interval is the region between them.]
I am 95% certain that the confidence interval contains the true mean of the distribution of incomes. (I cannot be 100% certain.)

One-Sided Tests
- \(H_0: \mu = \mu_0\), \(H_1: \mu \neq \mu_0\). The rejection region is sample means far from \(\mu_0\) in either direction.
- \(H_0: \mu = \mu_0\), \(H_1: \mu > \mu_0\). Sample means less than \(\mu_0\) cannot be in the rejection region; the entire rejection region is above \(\mu_0\). Reformulate: \(H_0: \mu \leq \mu_0\), \(H_1: \mu > \mu_0\).

Likelihood Ratio Tests

Carrying Out the LR Test
- In most cases, the exact distribution of the statistic is unknown.
- Use the large-sample approximation \(-2\log\lambda \sim \chi^2[1]\), where \(\lambda\) is the likelihood ratio.
- For a test about one parameter, the threshold value is 3.84 (5%) or 6.63 (1%).

Poisson Likelihood Ratio Test

Generalities About the LR Test

Gamma Application

Specification Tests
- Generally a test about a distribution, where the alternative is some other distribution.
- The test is generally based on a feature of the distribution that is true under the null but not true under the alternative.

Poisson Specification Tests
3,820 observations on doctor visits
Poisson distribution?

Deviance Test
Poisson distribution: \(p(x) = e^{-\lambda}\lambda^x / x!\)
- H0: Everyone has the same Poisson distribution.
- H1: Everyone has their own Poisson distribution.
- Under H0, observations will tend to be near the mean. Under H1, there will be much more variation.
- Likelihood ratio statistic (Text, p. 348)

Deviance Test
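One way to compute the likelihood ratio statistic for the setup above (H1: each observation has its own \(\lambda_i\), estimated by \(x_i\); H0: a common \(\lambda\), estimated by \(\bar{x}\)) is the standard Poisson deviance, \(G = 2\sum_i x_i \log(x_i/\bar{x})\). This is a sketch under that assumption with hypothetical counts; the text's exact version may differ in detail.

```python
from math import log
from statistics import fmean

def poisson_deviance(counts) -> float:
    """LR statistic for H0: one common lambda vs. H1: a separate lambda_i for each x_i."""
    xbar = fmean(counts)  # MLE of the common lambda under H0
    total = 0.0
    for x in counts:
        if x > 0:  # x * log(x / xbar) tends to 0 as x -> 0, so zeros contribute nothing
            total += x * log(x / xbar)
    # The -(x_i - xbar) terms of the log-likelihood difference sum to zero and are dropped.
    return 2 * total

visits = [0, 1, 2, 3, 4]  # hypothetical doctor-visit counts
print(round(poisson_deviance(visits), 4))
```

Identical counts give a deviance of zero; the more the counts spread around their mean, the larger the statistic.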

Dispersion Test
Poisson distribution: \(p(x) = e^{-\lambda}\lambda^x / x!\)
- H0: The distribution is Poisson.
- H1: The distribution is something else.
- Under H0, the mean will be (almost) the same as the variance.
- Approximate likelihood ratio statistic (Text, p. 348) = N × Variance / Mean
- For the doctor visit data, this is 22,348.6, compared with chi squared with 1 degree of freedom. H0 is rejected.
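A sketch of the N × Variance / Mean statistic with two hypothetical samples, one with variance equal to the mean and one overdispersed. Computing the variance with divisor N is our assumption. (For reference, the classical index-of-dispersion test refers this same quantity, \(\sum_i (x_i - \bar{x})^2/\bar{x}\), to a chi-squared distribution with N − 1 degrees of freedom.)

```python
from statistics import fmean, pvariance

def dispersion_statistic(counts) -> float:
    """N * variance / mean, with the variance computed using divisor N (an assumption)."""
    return len(counts) * pvariance(counts) / fmean(counts)

equi = [0, 1, 2, 3, 4]   # hypothetical: variance (2) equals the mean (2)
over = [0, 0, 0, 1, 9]   # hypothetical: variance far above the mean
print(dispersion_statistic(equi), dispersion_statistic(over))
```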


Specification Test: Normality
The normal distribution is symmetric and has kurtosis = 3.
Compare the observed 3rd and 4th moments to what would be expected from a normal distribution.

Symmetric and Skewed Distributions

Kurtosis: t[5] vs. Normal
Kurtosis of normal(0,1) = 3. Kurtosis of t[k] = 3 + 6/(k − 4) for k > 4; for t[5], this is 3 + 6/(5 − 4) = 9.

Bowman and Shenton Test for Normality
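The Bowman and Shenton statistic is commonly written as \(N\left[\text{skewness}^2/6 + (\text{kurtosis} - 3)^2/24\right]\) and compared with chi squared [2] (5% critical value 5.99). A minimal sketch using moment estimators with divisor N (an assumption; some versions apply small-sample corrections); the data are hypothetical, and `t_kurtosis` just encodes the formula from the slide.

```python
from statistics import fmean

def bowman_shenton(data):
    """Return (skewness, kurtosis, statistic) using central moments with divisor N."""
    n = len(data)
    m = fmean(data)
    m2 = sum((x - m) ** 2 for x in data) / n
    m3 = sum((x - m) ** 3 for x in data) / n
    m4 = sum((x - m) ** 4 for x in data) / n
    skew = m3 / m2 ** 1.5   # 0 for a symmetric distribution
    kurt = m4 / m2 ** 2     # 3 for a normal distribution
    stat = n * (skew ** 2 / 6 + (kurt - 3) ** 2 / 24)
    return skew, kurt, stat

def t_kurtosis(k: int) -> float:
    """Kurtosis of Student's t[k], defined only for k > 4."""
    if k <= 4:
        raise ValueError("kurtosis of t[k] requires k > 4")
    return 3 + 6 / (k - 4)

skew, kurt, stat = bowman_shenton([-2.0, -1.0, 0.0, 1.0, 2.0])  # hypothetical data
reject_normality = stat > 5.99  # 5% critical value of chi squared [2]
print(skew, kurt, stat, reject_normality, t_kurtosis(5))
```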

Testing for a Distribution
- H0: The distribution is as assumed.
- H1: The assumed distribution is incorrect.
- Strategy: Do the features of the sample resemble what we would observe if H0 were correct?
  - Continuous: The CDF of the data resembles the CDF of the assumed distribution.
  - Discrete: Sample cell probabilities resemble predictions from the assumed distribution.

Probability Plot for Normality

Normal (log) Income?

Random Sample from Normal

Normality Tests

Kolmogorov-Smirnov Test
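The Kolmogorov-Smirnov statistic is the largest vertical gap between the empirical CDF of the data and the CDF of the assumed distribution, which is the continuous-case strategy described earlier. A sketch assuming H0 fully specifies the distribution (here the standard normal; if the mean and variance are estimated from the data, the usual KS critical values no longer apply exactly). The sample and the rough 1.36/√N large-sample critical value are for illustration only.

```python
from statistics import NormalDist

def ks_statistic(data, cdf) -> float:
    """Largest vertical gap between the empirical CDF of data and the hypothesized cdf."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF steps from i/n up to (i + 1)/n at x; check both sides of the step.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

std_normal = NormalDist()  # H0: standard normal, parameters fully specified
sample = [-1.2, -0.4, 0.1, 0.3, 0.8, 1.5, 2.2, -0.9]  # hypothetical data
d = ks_statistic(sample, std_normal.cdf)
critical_5pct = 1.36 / len(sample) ** 0.5  # rough large-sample approximation
print(f"D = {d:.4f}, approximate 5% critical value = {critical_5pct:.4f}")
```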

Chi Squared Test for a Discrete Distribution
- Outcomes: \(A_1, A_2, \ldots, A_M\)
- Predicted probabilities based on a theoretical distribution: \(E_1(\theta), E_2(\theta), \ldots, E_M(\theta)\)
- Sample cell frequencies: \(O_1, \ldots, O_M\)

Test Statistics

V2 Rocket Hits
- 576 areas of 0.25 km² each in South London, arranged in a grid (24 by 24)
- 535 rockets were fired randomly into the grid: N = 535.
- P(a rocket hits a particular grid area) = 1/576 = 0.001736
- \(\lambda\) = expected number of rocket hits in a particular area = 535/576 = 0.92882
- How many rockets will hit any particular area? 0, 1, 2, ...; it could be anything up to 535.
- The 0.92882 is the \(\lambda\) for a Poisson distribution: \(P(X = x) = e^{-\lambda}\lambda^x / x!\).
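The Poisson probabilities implied by \(\lambda\) = 535/576, and the corresponding expected numbers of areas, can be tabulated with a short Python sketch. The rocket and area counts are from the slide; the helper `poisson_pmf` is our own.

```python
from math import exp, factorial

N_ROCKETS = 535            # rockets fired into the grid
N_AREAS = 576              # 24 x 24 grid of 0.25 km^2 areas
lam = N_ROCKETS / N_AREAS  # expected hits per area, about 0.92882

def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for a Poisson(lam) random variable."""
    return exp(-lam) * lam**x / factorial(x)

for x in range(6):
    p = poisson_pmf(x, lam)
    print(f"P(X={x}) = {p:.4f}; expected number of areas with {x} hits = {N_AREAS * p:.1f}")
```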

Adapted from Richard Isaac, The Pleasures of Probability, Springer Verlag, 1995, pp. 99-101.

[Figures (three slides): a 13 × 13 grid of 169 squares, with rows and columns numbered 1 to 13.]

Poisson Process
Probability that a hit lands in a particular square = 1/169
N = 144
\(\lambda = 144 \times 1/169 = 0.852\)
Probabilities:
P(X=0) = .4266
P(X=1) = .3634
P(X=2) = .1548
P(X=3) = .0437
P(X=4) = .0094
P(X>4) = .0021

Interpreting the Process
\(\lambda = 0.852\). Probabilities: P(X=0) = .4266, P(X=1) = .3634, P(X=2) = .1548, P(X=3) = .0437, P(X=4) = .0094, P(X>4) = .0021.
- There are 169 squares.
- There are 144 trials.
- Expect .4266 × 169 = 72.1 squares to have 0 hits.
- Expect .3634 × 169 = 61.4 squares to have 1 hit.
- Etc.
- Expect the average number of hits per square to be .852.

Does the Theory Work?
          Theoretical outcomes                      Sample outcomes
Outcome   Probability   Number of cells         Sample proportion   Number of cells
                        (169 × Prob(outcome))                       (observed frequencies)
0         .4266         72                      .4733               80
1         .3634         61                      .2899               49
2         .1548         26                      .1539               26
3         .0437          7                      .0769               13
4         .0094          2                      .0059                1
> 4       .0021          1                      .0000                0

Chi Squared for the Bombing Run
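A sketch of the Pearson chi-squared goodness-of-fit statistic \(\sum (O - E)^2/E\) for the bombing grid, using the observed cell counts from the table above and \(\lambda\) = 144/169 (unrounded). The degrees of freedom follow the usual rule of cells − 1 − number of estimated parameters; note that several expected counts are below 5, and the code does not pool cells, which analysts often do before comparing with the chi-squared table.

```python
from math import exp, factorial

# Observed numbers of squares with 0, 1, 2, 3, 4 and more than 4 hits (table above)
observed = [80, 49, 26, 13, 1, 0]
n_squares = sum(observed)   # 169 squares
n_hits = 144                # 144 hits in total
lam = n_hits / n_squares    # about 0.852 hits per square

probs = [exp(-lam) * lam**x / factorial(x) for x in range(5)]
probs.append(1 - sum(probs))               # P(X > 4)
expected = [n_squares * p for p in probs]  # 169 * Prob(outcome)

# Pearson chi squared: sum over cells of (O - E)^2 / E
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1 - 1  # cells - 1 - one estimated parameter (lambda)
print(f"chi squared = {chi_sq:.2f} with {df} degrees of freedom")
```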

Difference in Means of Two Populations
- Two independent normal populations
  - Common known variance
  - Common unknown variance
  - Different variances
  - One- and two-sided tests
- Paired samples
  - Means of paired observations
  - Treatments and controls
  - Diff-in-diff (SAT)
- Nonparametric: Mann-Whitney
- Two Bernoulli populations

Comparing Two Normal Populations

Unknown Common Variance

Household Incomes, Equal Variances
------------------------------------------------------
t test of equal means INCOME by MARRIED
------------------------------------------------------
MARRIED = 0    Nx = 817
MARRIED = 1    Ny = 3057
t [ 3872] = 3.7238    P value = .0002
------------------------------------------------------
INCOME           Mean     Std.Dev.   Std.Error
------------------------------------------------------
MARRIED = 0     .27982    .12939     .00453
MARRIED = 1     .30145    .15194     .00275
------------------------------------------------------

Unknown Different Variances
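A sketch of the two t statistics from summary statistics alone: the pooled-variance version (common unknown variance) and Welch's unpooled version (different variances). The inputs are the means, standard deviations, and sample sizes in the household-income output above; the pooled result should be close to the reported t = 3.7238. The function names are our own, and the Welch degrees-of-freedom correction is not computed here.

```python
from math import sqrt

def pooled_t(mx, sx, nx, my, sy, ny):
    """t statistic for equal means assuming a common unknown variance."""
    sp2 = ((nx - 1) * sx**2 + (ny - 1) * sy**2) / (nx + ny - 2)
    return (my - mx) / sqrt(sp2 * (1 / nx + 1 / ny))

def welch_t(mx, sx, nx, my, sy, ny):
    """t statistic for equal means allowing different variances (Welch)."""
    return (my - mx) / sqrt(sx**2 / nx + sy**2 / ny)

# (mean, std. dev., N) for MARRIED = 0, then MARRIED = 1
args = (0.27982, 0.12939, 817, 0.30145, 0.15194, 3057)
t_pooled = pooled_t(*args)
t_welch = welch_t(*args)
print(f"pooled t = {t_pooled:.4f}, Welch t = {t_welch:.4f}")
```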

Two Proportions
Two Bernoulli populations:
- \(X_i \sim\) Bernoulli with \(\text{Prob}(x_i = 1) = \theta_x\)
- \(Y_i \sim\) Bernoulli with \(\text{Prob}(y_i = 1) = \theta_y\)
- \(H_0: \theta_x = \theta_y\)
- The sample proportions are \(p_x = (1/N_x)\sum_i x_i\) and \(p_y = (1/N_y)\sum_i y_i\).
- The sample variances are \(p_x(1 - p_x)\) and \(p_y(1 - p_y)\).
- Use the central limit theorem to form the test statistic.

z Test for Equality of Proportions

Application: Take-up of public health insurance.
------------------------------------------------------
t test of equal means PUBLIC by FEMALE
------------------------------------------------------
FEMALE = 0    Nx = 1812
FEMALE = 1    Ny = 1565
t [ 3375] = 5.8627    P value = .0000
------------------------------------------------------
PUBLIC           Mean     Std.Dev.   Std.Error
------------------------------------------------------
FEMALE = 0      .84713    .35996     .00846
FEMALE = 1      .91310    .28178     .00712

Paired Sample t and z Test
- Observations are pairs \((X_i, Y_i)\), \(i = 1, \ldots, N\).
- Hypothesis: \(\mu_x = \mu_y\).
- Both normal distributions. May be correlated.
  - Medical trials: smoking vs. nonsmoking (separate individuals, probably independent)
  - SAT repeat tests, before and after (definitely correlated)
- The test is based on \(D_i = X_i - Y_i\). Same as earlier, with \(H_0: \mu_D = 0\).
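A minimal sketch of the paired test: form the differences \(D_i = X_i - Y_i\) and run a one-sample t test of \(\mu_D = 0\). The SAT scores are hypothetical.

```python
from math import sqrt
from statistics import fmean, stdev

def paired_t(x, y) -> float:
    """t statistic for H0: mu_D = 0, where D_i = X_i - Y_i."""
    d = [xi - yi for xi, yi in zip(x, y)]
    return fmean(d) / (stdev(d) / sqrt(len(d)))

first = [1100, 1200, 1050, 1300, 1250]   # hypothetical first SAT scores
second = [1150, 1230, 1080, 1310, 1300]  # hypothetical second scores, same students
t = paired_t(first, second)
print(f"t[{len(first) - 1}] = {t:.3f}")
```

Working with the differences removes the student-to-student variation that the two correlated scores share.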

Treatment Effects
SAT do-overs
- Experiment: \(X_1, X_2, \ldots, X_N\) = first SAT score; \(Y_1, Y_2, \ldots, Y_N\) = second.
- Treatment: \(T_1, \ldots, T_N\) = whether or not the student took a Kaplan (or similar) prep course.
- Hypothesis: \(\mu_y > \mu_x\).
Placebo: In medical trials, N1 subjects receive a drug (treatment) and N2 receive a placebo.
- Hypothesis: The effect is greater in the treatment group than in the control (placebo) group.

Measuring Treatment Effects

Treatment Effects in Clinical Trials
Does Phenogyrabluthefentanoel (Zorgrab) work?
Investigate: Carry out a clinical trial.
                   Placebo   Drug (treatment)
No effect          N00       N0T
Positive effect    N+0       N+T
- N+0 = the placebo effect
- N+T - N+0 = the treatment effect
- The hypothesis is that the difference in differences has mean zero.

A Test of Independence
In the credit card example, are Own/Rent and Accept/Reject independent?
- Hypothesis: Ownership and acceptance are independent.
- Formal hypothesis, based only on the laws of probability: Prob(Own, Accept) = Prob(Own) × Prob(Accept) (and likewise for the other three possibilities).
- Rejection region: Joint frequencies that do not look like the products of the marginal frequencies.

Contingency Table Analysis
The data: frequencies
           Reject     Accept      Total
Rent        1,845      5,469      7,314
Own         1,100      5,030      6,130
Total       2,945     10,499     13,444
Step 1: Convert to actual proportions.
           Reject     Accept      Total
Rent      0.13724    0.40680    0.54404
Own       0.08182    0.37414    0.45596
Total     0.21906    0.78094    1.00000

Independence Test
Step 2: Expected proportions assuming independence. If the factors are independent, then the joint proportions should equal the products of the marginal proportions.
[Rent, Reject]   0.54404 × 0.21906 = 0.11918
[Rent, Accept]   0.54404 × 0.78094 = 0.42486
[Own, Reject]    0.45596 × 0.21906 = 0.09988
[Own, Accept]    0.45596 × 0.78094 = 0.35606

Comparing Actual to Expected

When Is the Chi Squared Large?
Critical values from the chi squared table. Degrees of freedom = (R − 1)(C − 1).
Critical chi squared
D.F.      .05      .01
 1       3.84     6.63
 2       5.99     9.21
 3       7.81    11.34
 4       9.49    13.28
 5      11.07    15.09
 6      12.59    16.81
 7      14.07    18.48
 8      15.51    20.09
 9      16.92    21.67
10      18.31    23.21
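A sketch of the full independence test for the credit data: expected counts are row total × column total / N (the count version of Step 2), and the statistic \(\sum (O - E)^2/E\) is compared with the critical value for (R − 1)(C − 1) degrees of freedom from the table. The frequencies are from the contingency table; the function name is our own.

```python
def chi_squared_independence(table):
    """Pearson chi squared and degrees of freedom for an R x C table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # N * P(row) * P(column)
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Rows: Rent, Own; columns: Reject, Accept
credit = [[1845, 5469], [1100, 5030]]
stat, df = chi_squared_independence(credit)
print(f"chi squared = {stat:.2f} with {df} d.f.; reject at 5%: {stat > 3.84}")
```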

Analyzing Default
Do renters default more often (at a different rate) than owners?
To investigate, we study the cardholders (only).
                      DEFAULT
OWNRENT             0          1        All
0 (rent)        4,854        615      5,469
                46.23       5.86      52.09
1 (own)         4,649        381      5,030
                44.28       3.63      47.91
All             9,503        996     10,499
                90.51       9.49     100.00
(Second line of each row: percent of all 10,499 cardholders.)

Hypothesis Test
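One way to carry out this test is a pooled two-proportion z test of equal default rates for renters and owners. For a 2 × 2 table, the squared pooled z equals Pearson's chi squared, so the same conclusion follows from either form. A sketch using the default counts above; variable names are illustrative.

```python
from math import sqrt

# Cardholder defaults from the table: (defaults, total) by group
rent_defaults, rent_n = 615, 5469
own_defaults, own_n = 381, 5030

p_rent = rent_defaults / rent_n
p_own = own_defaults / own_n
p_pool = (rent_defaults + own_defaults) / (rent_n + own_n)  # rate under H0

# Pooled two-proportion z statistic
z = (p_rent - p_own) / sqrt(p_pool * (1 - p_pool) * (1 / rent_n + 1 / own_n))

# Pearson chi squared for the same 2 x 2 table, for comparison
table = [[4854, 615], [4649, 381]]
rows = [sum(r) for r in table]
cols = [sum(c) for c in zip(*table)]
n = sum(rows)
chi_sq = sum(
    (table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
    for i in range(2)
    for j in range(2)
)
print(f"default rates {p_rent:.4f} vs {p_own:.4f}; z = {z:.3f}, z^2 = {z*z:.2f}, chi squared = {chi_sq:.2f}")
```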

Multiple Choices: Travel Mode
- 210 travelers between Sydney and Melbourne
- 4 available modes: air, train, bus, car
- Among the observed variables is income.
- Does income help to explain mode choice?
- Hypothesis: Mode choice and income are independent.

Travel Mode Choices

Travel Mode Choices and Income
+----------------------------------------------------------+
| Travel MODE Data                                         |
+--------+-------------------------------------++----------+
|INCOME  |      AIR    TRAIN      BUS      CAR ||    Total |
+--------+-------------------------------------++----------+
|LOW     |       10       36        9        8 ||       63 |
|        |  0.04761  0.17143  0.04286  0.03810 ||  0.30000 |
+--------+-------------------------------------++----------+
|MEDIUM  |       19       20       13       24 ||       76 |
|        |  0.09048  0.09524  0.06190  0.11429 ||  0.36190 |
+--------+-------------------------------------++----------+
|HIGH    |       29        7        8       27 ||       71 |
|        |  0.13810  0.03333  0.03810  0.12857 ||  0.33810 |
+========+=====================================++==========+
|Total   |       58       63       30       59 ||      210 |
|        |  0.27619  0.30000  0.14286  0.28095 ||  1.00000 |
+--------+-------------------------------------++----------+

Contingency Table
+----------------------------------------------------------+
| Travel MODE Data                                         |
+--------+-------------------------------------++----------+
|INCOME  |      AIR    TRAIN      BUS      CAR ||    Total |
+--------+-------------------------------------++----------+
|LOW     |       10       36        9        8 ||       63 |
|        |  0.04761  0.17143  0.04286  0.03810 ||  0.30000 |
|        |  0.08286  0.09000  0.04286  0.08429 ||          |
+--------+-------------------------------------++----------+
|MEDIUM  |       19       20       13       24 ||       76 |
|        |  0.09048  0.09524  0.06190  0.11429 ||  0.36190 |
|        |  0.09995  0.10857  0.05170  0.10168 ||          |
+--------+-------------------------------------++----------+
|HIGH    |       29        7        8       27 ||       71 |
|        |  0.13810  0.03333  0.03810  0.12857 ||  0.33810 |
|        |  0.09338  0.10143  0.04830  0.09499 ||          |
+========+=====================================++==========+
|Total   |       58       63       30       59 ||      210 |
|        |  0.27619  0.30000  0.14286  0.28095 ||  1.00000 |
+--------+-------------------------------------++----------+
(For each income group: observed count, observed proportion, and expected proportion under independence.)
Assuming independence, P(Income, Mode) = P(Income) × P(Mode).

Computing Chi Squared

For our transport mode problem, R = 3 and C = 4, so DF = (3 − 1)(4 − 1) = 2 × 3 = 6. The 5% critical value is 12.59. The hypothesis of independence is rejected.
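A sketch of the chi-squared computation for the travel mode table, using the observed counts above. Expected counts are row total × column total / N; the function name is our own. The statistic can be compared with the 5% critical value 12.59 for 6 degrees of freedom from the table.

```python
def chi_squared_independence(table):
    """Pearson chi squared and degrees of freedom for an R x C table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Rows: LOW, MEDIUM, HIGH income; columns: AIR, TRAIN, BUS, CAR
travel = [
    [10, 36, 9, 8],
    [19, 20, 13, 24],
    [29, 7, 8, 27],
]
stat, df = chi_squared_independence(travel)
print(f"chi squared = {stat:.2f} with {df} d.f.; reject at 5%: {stat > 12.59}")
```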
