THE SCIENTIFIC METHOD
ENGLISH, MATH, STATISTICS, DATA, KNOWLEDGE
ENGLISH TO MATH
• HYPOTHESIS IN ENGLISH: Revenues are related to the economy
• HYPOTHESIS IN MATH: Revenues (R) are related to income (Y), interest rates (I), prices (P), and time (T):
• R = a + b*Y + c*I + d*P + e*T
• Assumptions on coefficients: e.g., b > 0
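The hypothesis in math can be written as a small function. This is only a sketch: the coefficient values in the example are hypothetical, and b > 0 is the only sign assumption taken from the slide.

```python
def predicted_revenue(coef, income, interest, price, time):
    """Evaluate R = a + b*Y + c*I + d*P + e*T for one observation."""
    a, b, c, d, e = coef
    return a + b * income + c * interest + d * price + e * time


def sign_assumptions_hold(coef):
    """Check the one assumption stated on the slide: b > 0."""
    return coef[1] > 0


# Hypothetical coefficients (a, b, c, d, e), chosen only for illustration.
example_coef = (1.0, 2.0, -3.0, -1.0, 0.5)
print(predicted_revenue(example_coef, income=10, interest=2, price=4, time=6))
print(sign_assumptions_hold(example_coef))
```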
CRITICAL ASSUMPTIONS
• REPRODUCIBILITY
• CORRECT SPECIFICATION
• ALL INFLUENCES THAT ARE NOT INCLUDED HAVE NO EFFECT
• ALL INFLUENCES THAT ARE INCLUDED HAVE A PRECISE, RIGID EFFECT
• CETERIS PARIBUS
[Figure: ADVERTISING AND CHANGE IN MARKET SHARE. Scatter plot of Change in Market Share (%) against Ad Spending ($ mil), with the estimated regression line.]
MATH TO STATISTICS
• NULL HYPOTHESES: State the opposite of what you wish to prove and find a counterexample.
• CRITICAL VALUES: You reject the null hypothesis when the test statistic jumps the hurdle (the critical value).
CRITICAL ASSUMPTIONS
• CORRECT STATISTICAL METHOD CHOSEN (e.g., Regression)
• STATIONARITY (NO TREND EFFECTS)
• LEAST SUM SQUARED ERROR IS THE APPROPRIATE CRITERION
• RANDOMNESS OF OUTSIDE INFLUENCES (No autocorrelation or heteroscedasticity)
• STATISTICAL DISCRIMINATION POSSIBLE (No Multicollinearity)
FITTING THE REGRESSION LINE
[Figure: scatter of M.Share against Advt. Spending with the fitted line; a is the intercept and b is the slope.]
M.Share = a + b * (Advt. Spending)
= .858 + .2246 * (Advt. Spending)
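A minimal sketch of how a and b can be estimated by least squares for a single independent variable. The ad spending values are hypothetical, and the market share values are generated to lie exactly on the slide's fitted line, so the fit should simply recover .858 and .2246.

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Hypothetical ad spending levels ($ mil); not data from the slides.
ad_spending = [5.0, 10.0, 20.0, 40.0]
# Points placed exactly on the slide's line, for illustration only.
market_share = [0.858 + 0.2246 * s for s in ad_spending]

a, b = fit_line(ad_spending, market_share)
print(round(a, 3), round(b, 4))
```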
[Figure: ADVERTISING AND MARKET SHARE: CIGARETTES. Market Share (%) against Ad Spending ($ mil), showing the mean, the regression line, and for one point the total error split into explained error and unexplained error.]
R-SQUARED
• R-SQUARED = EXPLAINED SUM SQUARED ERROR / TOTAL SUM SQUARED ERROR
• FOR EXAMPLE: An R-squared value of .90 means that ninety percent of the variation in your dependent variable is explained by the independent variables.
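The ratio can be computed directly from the observed and fitted values. This sketch assumes the fitted values come from a least-squares line with an intercept; the three data points are hypothetical.

```python
def r_squared(observed, fitted):
    """Explained sum of squared error divided by total sum of squared error."""
    mean_y = sum(observed) / len(observed)
    explained = sum((f - mean_y) ** 2 for f in fitted)
    total = sum((y - mean_y) ** 2 for y in observed)
    return explained / total


# Hypothetical observations and their least-squares fitted values
# (from x = 1, 2, 3 and the fitted line y = 2/3 + 0.5*x).
observed = [1.0, 2.0, 2.0]
fitted = [7 / 6, 5 / 3, 13 / 6]
print(r_squared(observed, fitted))
```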
F-statistic
• F = EXPLAINED MEAN SQUARE ERROR / UNEXPLAINED MEAN SQUARE ERROR
• Null Hypothesis: The dependent variable is not explained by a combination of all of the independent variables together.
• Go to F-tables (.05) to find the critical values for rejecting the null hypothesis
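A sketch of the F calculation. The degrees of freedom used here (number of independent variables for the explained part, observations minus variables minus one for the unexplained part) are the usual convention for a regression with an intercept and are an assumption, since the slide does not state them. The sums of squares in the example are hypothetical.

```python
def f_statistic(explained_ss, unexplained_ss, n_obs, n_vars):
    """Explained mean square error over unexplained mean square error."""
    explained_ms = explained_ss / n_vars
    unexplained_ms = unexplained_ss / (n_obs - n_vars - 1)
    return explained_ms / unexplained_ms


# Hypothetical sums of squares: 12 observations, 1 independent variable.
f_value = f_statistic(explained_ss=90.0, unexplained_ss=10.0, n_obs=12, n_vars=1)
print(f_value)
```

The result is then compared with the critical value read from the F-table for the same degrees of freedom.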
t-statistic
• The t-statistic tests the significance of each variable individually; you reject the null hypothesis on a variable when its t-statistic exceeds the critical value.
• Null Hypothesis: The dependent variable is not related to the independent variable.
• Go to t-tables (.05) to find the critical values for rejecting the null hypothesis in a two-tail test. Go to the .10 column for one-tail tests.
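For a single independent variable, the t-statistic on the slope can be sketched as the slope divided by its standard error. The formula for the standard error is the standard one for simple regression, not something stated on the slide, and the data are hypothetical. The critical value must still be looked up in a t-table.

```python
import math


def slope_t_statistic(x, y):
    """t-statistic for the slope b in y = a + b*x (simple regression)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    unexplained_ss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    error_variance = unexplained_ss / (n - 2)  # n - 2 degrees of freedom
    standard_error = math.sqrt(error_variance / sxx)
    return b / standard_error


def rejects_null(t_stat, critical_value):
    """Two-tail rule: reject when |t| exceeds the table value."""
    return abs(t_stat) > critical_value


# Hypothetical data with 3 points, so only 1 degree of freedom.
t_stat = slope_t_statistic([1.0, 2.0, 3.0], [1.0, 2.0, 2.0])
# 12.706 is the two-tail .05 t-table value for 1 degree of freedom.
print(t_stat, rejects_null(t_stat, critical_value=12.706))
```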
HETEROSCEDASTICITY
[Figure: scatter of points around a regression line, with large error at one end and small error elsewhere.]
HETEROSCEDASTIC PATTERNS OF ERROR
[Figure: three panels of error patterns: scattered at one end, scattered in the middle, scattered at both ends.]
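One rough way to look for the "scattered at one end" pattern is to compare the spread of the errors at low and high values of the independent variable. This is an informal check loosely in the spirit of the Goldfeld-Quandt test; it is not a procedure given in the slides, and the residuals below are hypothetical.

```python
def residual_spread_ratio(x, residuals):
    """Mean squared residual in the upper half of x over the lower half.

    A ratio far from 1 suggests the error spread changes with x.
    Assumes the lower-half residuals are not all exactly zero.
    """
    pairs = sorted(zip(x, residuals))  # order observations by x
    half = len(pairs) // 2
    low = [e for _, e in pairs[:half]]
    high = [e for _, e in pairs[-half:]]
    mean_sq_low = sum(e * e for e in low) / len(low)
    mean_sq_high = sum(e * e for e in high) / len(high)
    return mean_sq_high / mean_sq_low


# Hypothetical residuals that grow with x.
print(residual_spread_ratio([1, 2, 3, 4], [1.0, -1.0, 2.0, -2.0]))
```

This ratio cannot detect the "scattered in the middle" or "scattered at both ends" patterns, which need the data split into more than two groups.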
AUTOCORRELATION
• POSITIVE AUTOCORRELATION (e.g., curvilinear pattern or other nonlinear pattern)
• NEGATIVE AUTOCORRELATION (e.g., alternation above and below the regression line)
[Figure: two panels of error patterns, one for positive and one for negative autocorrelation.]
DURBIN-WATSON TEST FOR AUTOCORRELATION
[Figure: Durbin-Watson scale from 0 to 4.00, with positive autocorrelation at the low end and negative autocorrelation at the high end.]
• 0 TO .72: Reject the null hypothesis that there is no POSITIVE autocorrelation
• .72 TO 1.74: Uncertain region for POSITIVE autocorrelation
• 1.74 TO 2.00: No POSITIVE autocorrelation
• 2.00 TO 2.26: No NEGATIVE autocorrelation
• 2.26 TO 3.28: Uncertain region for NEGATIVE autocorrelation
• 3.28 TO 4.00: Reject the null hypothesis that there is no NEGATIVE autocorrelation
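A sketch of the Durbin-Watson statistic and the decision regions on the scale. The statistic is the standard sum of squared successive differences of the residuals divided by the sum of squared residuals. The bounds .72 and 1.74 are the ones shown on the slide; in practice they depend on the sample size and the number of independent variables, so they are defaults for illustration only. The residual series are hypothetical.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences over sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


def classify(d, lower=0.72, upper=1.74):
    """Map a Durbin-Watson value onto the regions of the 0-to-4 scale."""
    if d < lower:
        return "positive autocorrelation"
    if d <= upper:
        return "uncertain (positive)"
    if d < 4 - upper:
        return "no autocorrelation indicated"
    if d <= 4 - lower:
        return "uncertain (negative)"
    return "negative autocorrelation"


# Hypothetical residuals that alternate in sign.
d = durbin_watson([1.0, -1.0, 1.0, -1.0])
print(d, classify(d))
```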
STATISTICS TO DATA
• How is data defined and collected?
• Is the data consistently collected across all units?
• How should the data be transformed for your particular use?
DATA COLLECTION
• TIME SERIES: measures variation of a unit or variable over several time periods
• CROSS SECTION: measures variation during a given time period over several different units
• POOLED CROSS SECTION-TIME SERIES: measures variation of different units over different time periods.
TIME SERIES TRANSFORMATIONS
• SAMPLE SIZE
• AGGREGATION OF TIME (YEAR? DAY
• AGGREGATION OF UNIT (FIRM, MARKET, INDUSTRY)
• SPECIAL EVENT (DUMMY VARIABLE)
• MATH TRANSFORMATIONS
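A dummy variable for a special event can be sketched as a series that is 1 in the periods affected by the event and 0 otherwise. The years used here are hypothetical.

```python
def dummy_variable(periods, event_periods):
    """Return 1 for each period in which the special event applies, else 0."""
    events = set(event_periods)
    return [1 if period in events else 0 for period in periods]


# Hypothetical yearly series with a special event in 1991 and 1992.
print(dummy_variable([1990, 1991, 1992, 1993], [1991, 1992]))
```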
MATH TRANSFORMATIONS
• LOGARITHMS
• INVERSE
• PERCENTAGE CHANGES
• INFLATION, SEASONALITY
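The transformations listed above can be sketched as small functions on a series. The inflation adjustment assumes a price index with base 100, which is an assumption made for this example; seasonal adjustment is left out because it depends on the method chosen. All input values are hypothetical.

```python
import math


def log_transform(series):
    """Natural logarithm of each (positive) value."""
    return [math.log(v) for v in series]


def inverse_transform(series):
    """Reciprocal of each (nonzero) value."""
    return [1.0 / v for v in series]


def percentage_changes(series):
    """Percent change from each period to the next."""
    return [100.0 * (cur - prev) / prev for prev, cur in zip(series, series[1:])]


def deflate(nominal, price_index):
    """Convert nominal values to real values using an index with base 100."""
    return [100.0 * n / p for n, p in zip(nominal, price_index)]


print(percentage_changes([100.0, 110.0, 99.0]))
print(deflate([220.0], [110.0]))
```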
STATISTICAL PROCEDURES REQUIRED FOR DIFFERENT KINDS OF PROBLEM SOLVING
• ARE THERE MANY EQUATIONS? If yes: simultaneous equation estimation procedures should be used. If no: go to the next question.
• DO THEY INVOLVE LINEAR FUNCTIONS? If no: use nonlinear regression or other nonlinear estimation techniques. If yes: go to the next question.
• IS THERE MORE THAN ONE INDEPENDENT VARIABLE? If yes: apply Multiple Linear Regression. If no: use Simple Linear Regression.
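The decision chart can be sketched as a function of its three yes/no questions. The order of the questions and the mapping of answers to procedures is a reading of the chart; the function and parameter names are hypothetical.

```python
def choose_procedure(many_equations, linear_functions, several_independent_vars):
    """Walk the three yes/no questions and return the suggested procedure."""
    if many_equations:
        return "simultaneous equation estimation"
    if not linear_functions:
        return "nonlinear regression or other nonlinear estimation"
    if several_independent_vars:
        return "multiple linear regression"
    return "simple linear regression"


# One equation, linear, one independent variable (e.g., market share on ad spending).
print(choose_procedure(False, True, False))
```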