project of harish
8/2/2019 Project of Harish
Factors Affecting GDP, using 3 country data
A minor research project
(In partial fulfilment of the requirements of the course: MSTA 427)
Submitted by
HARISH KUMAR
Under the supervision of
Dr. R.N. Rattihali sir
Department of Statistics
Central University of Rajasthan
Kishangarh-305801
April 2012
CERTIFICATE
This is to certify that the minor research project report entitled
........... submitted by ..........., a student of M.A./M.Sc.
Statistics (Actuarial), IV semester, is a record of bona fide work carried out
by him under my guidance as part of the course MSTA 427.
To the best of my knowledge, the report presented in the project has not
been submitted earlier for the award of any degree/diploma.
Place: Kishangarh
Project supervisor
Date:
Name:
INDEX
1) Introduction and description of the problem
2) Objectives
3) Formulation: Model, Hypothesis, etc.
4) Definitions
5) Illustrations
6) Review of the work with references
7) Methodology (Statistical Tools, packages, graphs, etc. used)
8) Data: Main source (which has been analysed), other sources
9) Data Analysis
10) Conclusions
11) Major findings
12) References
Introduction to the problem: Gross domestic product (GDP) is
the market value of all officially recognized final goods and services produced within
a country in a given period. GDP is an economic indicator used to gauge the
economic health of a country. The objective of this project is to analyse
the factors affecting GDP in India.
Formulation and model: Although many factors affect the GDP
of any country, the factors used in this project are
Crop production
Percentage change in industrial production
Inflation rate
Interest rate
Taxes on goods and services
Methodology and statistical tools used:-
The statistical tools used were regression analysis,
hypothesis testing and some nonparametric tests.
A brief introduction to regression analysis: Regression analysis is one of the most
widely applied techniques of statistical analysis and modelling. It is used
for analysing multifactor data
and for modelling the relationship between a response variable (dependent
variable) and regressor variables (independent variables). In general, it is used to
model a response variable $Y$ as a linear function of one or more regressor
variables $X_1, X_2, \dots, X_p$. Here linearity means linearity in the parameters. If
there is only one regressor variable, the model is called simple linear regression;
otherwise it is called multiple linear regression. The linear regression
model is generally written as
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ (simple linear regression)
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_p X_{pi} + \varepsilon_i$ (multiple linear regression).
The $\varepsilon_i$ term in the model is a random error term, which may
arise from various causes and is assumed to follow some particular statistical
distribution.
Model Assumptions:-
We assume that the error terms $\varepsilon_i$ are independent of each other and follow a normal distribution with
mean zero and common variance $\sigma^2$.
Estimation of the parameters:-
In regression analysis our interest lies in estimating the best-fitting
model. Ordinarily the regression coefficients (the $\beta$'s) are unknown
and must be estimated from sample information. There are well-established
statistical/mathematical methods for determining these
estimates. The generally used methods are
1. Method of least squares
2. Method of maximum likelihood.
The resulting estimated model is
$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i} + \dots + \hat{\beta}_p X_{pi}$
The random error is then estimated by the residual
$e_i = Y_i - \hat{Y}_i$
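As an illustration of the estimation step, the following is a minimal sketch of least-squares fitting for the simple linear regression case, using the usual closed-form slope and intercept. The function name `fit_simple_regression` and the data values are made up for illustration; they are not the project data.

```python
def fit_simple_regression(x, y):
    """Least-squares estimates (b0, b1) for the model y = b0 + b1*x + error."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope is S_xy / S_xx; intercept follows from the means.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Made-up illustrative values, not the project data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_simple_regression(x, y)
fitted = [b0 + b1 * xi for xi in x]                   # estimated model
residuals = [yi - fi for yi, fi in zip(y, fitted)]    # estimated errors
print(f"intercept = {b0:.3f}, slope = {b1:.3f}")
```

With an intercept in the model, the least-squares residuals sum to zero, which is a quick sanity check on any fit.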
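Estimation of parameters in multiple linear regression:-
Method of least squares:- Let us assume that n observations are taken on p
regressor variables and that the errors are i.i.d. $N(0, \sigma^2)$. Then the model can be
expressed as
$$
\begin{aligned}
Y_1 &= \beta_0 + \beta_1 X_{11} + \beta_2 X_{21} + \dots + \beta_p X_{p1} + \varepsilon_1 \\
Y_2 &= \beta_0 + \beta_1 X_{12} + \beta_2 X_{22} + \dots + \beta_p X_{p2} + \varepsilon_2 \\
&\;\;\vdots \\
Y_i &= \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_p X_{pi} + \varepsilon_i \\
&\;\;\vdots \\
Y_n &= \beta_0 + \beta_1 X_{1n} + \beta_2 X_{2n} + \dots + \beta_p X_{pn} + \varepsilon_n
\end{aligned}
$$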
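The above model can be written in matrix form as
$Y = X\beta + \varepsilon$
where $Y$ is an $n \times 1$ vector of responses, $\beta$ is a $(p+1) \times 1$ vector of coefficients, $X$ is an $n \times (p+1)$ matrix whose first column consists of ones, and $\varepsilon$ is an $n \times 1$ vector of errors.
Then the least squares estimate of the regression coefficient vector is given by
$\hat{\beta} = (X'X)^{-1} X'Y$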
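The matrix form of the least-squares estimate can be sketched in plain Python by solving the normal equations (X'X) beta = X'Y, which gives the same result as applying the inverse of X'X to X'Y. The helper names `solve_linear_system` and `least_squares` and the data are hypothetical, chosen only for illustration. In practice a library routine such as `numpy.linalg.lstsq` would normally be preferred for numerical stability.

```python
def solve_linear_system(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (m[i][n] - s) / m[i][i]
    return z


def least_squares(rows, y):
    """Return beta_hat solving the normal equations (X'X) beta = X'Y."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column of ones
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Made-up data generated exactly from Y = 1 + 2*X1 - 3*X2 (no error term),
# so the estimates should recover these coefficients up to rounding.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5), (6, 2)]
y = [1 + 2 * x1 - 3 * x2 for x1, x2 in rows]
beta_hat = least_squares(rows, y)
print([round(b, 6) for b in beta_hat])
```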
Analysis of Variance for regression
The ANOVA is used to test whether there is a linear relationship between
the response variable $Y$ and the regressor variables $X_1, \dots, X_p$. The ANOVA table in multiple
linear regression is given below.
Source of variation   Sum of squares   Degrees of freedom   Mean square   F0
Regression            SSR              p                    MSR           MSR/MSRES
Residuals             SSRES            n-(p+1)              MSRES
Total                 SST              n-1
The notation above is explained as follows:
SSR = sum of squares due to regression
SSRES = sum of squares due to residuals
SST = total sum of squares
MSR = SSR/p = mean square due to regression
MSRES = SSRES/(n-(p+1)) = mean square due to residuals
Here
SSRES = $Y'Y - \hat{\beta}' X'Y$
SST = $Y'Y - n\bar{Y}^2$
SSR = SST - SSRES = $\hat{\beta}' X'Y - n\bar{Y}^2$
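The ANOVA quantities can be computed directly from the observed and fitted values. The following is a minimal sketch under stated assumptions: the function name `anova_table` and the toy data are hypothetical, the fitted values come from a simple regression (p = 1) on those toy data, and the critical value is the tabulated 5% point of the F distribution with 1 and 3 degrees of freedom.

```python
def anova_table(y, fitted, p):
    """Sums of squares, mean squares and F0 for a fit with p regressors."""
    n = len(y)
    y_bar = sum(y) / n
    ss_t = sum((yi - y_bar) ** 2 for yi in y)                  # total, equals Y'Y - n*ybar^2
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residual
    ss_r = ss_t - ss_res                                       # regression
    ms_r = ss_r / p
    ms_res = ss_res / (n - (p + 1))
    return {"SSR": ss_r, "SSRES": ss_res, "SST": ss_t,
            "MSR": ms_r, "MSRES": ms_res, "F0": ms_r / ms_res}


# Made-up illustrative values, not the project data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
# 0.05 and 1.99 are the least-squares intercept and slope for these toy data.
fitted = [0.05 + 1.99 * xi for xi in x]

table = anova_table(y, fitted, p=1)
f_critical = 10.13  # tabulated F(0.05; 1, 3)
significant = table["F0"] > f_critical
for name, value in table.items():
    print(f"{name:6s} {value:12.4f}")
print("reject H0 of no linear relationship:", significant)
```

The decomposition SST = SSR + SSRES holds only when the fitted values come from a least-squares fit that includes an intercept.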
Another measure of model adequacy is $R^2$, also known as
the coefficient of determination. The value of $R^2$ lies between 0 and 1; the higher the
value of $R^2$, the better the fitted model is considered to be. The value of $R^2$ is given by
$R^2 = \dfrac{SSR}{SST} = 1 - \dfrac{SSRES}{SST}$
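A short sketch of the coefficient of determination, computed as one minus the ratio of the residual to the total sum of squares. The function name `r_squared` and the numbers are illustrative assumptions; the fitted values are a least-squares fit to the toy data.

```python
def r_squared(y, fitted):
    """Coefficient of determination: 1 - SSRES / SST."""
    y_bar = sum(y) / len(y)
    ss_t = sum((yi - y_bar) ** 2 for yi in y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return 1.0 - ss_res / ss_t


# Made-up illustrative values, not the project data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
# 0.05 and 1.99 are the least-squares intercept and slope for these toy data.
fitted = [0.05 + 1.99 * xi for xi in x]

r2 = r_squared(y, fitted)
print(f"R^2 = {r2:.4f}")
```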
Are the residuals (or errors) approximately normally distributed? A variety of methods are available for checking this regression assumption:
Anderson-Darling test
Chi-square goodness-of-fit test
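As one possible illustration of a normality check, here is a sketch of the Anderson-Darling statistic for residuals, with the mean and standard deviation estimated from the sample. The function name `anderson_darling_normal` is hypothetical. The small-sample adjustment and the 5% critical value of 0.752 are the commonly quoted values for this case and are taken as assumptions here; the two samples are made up.

```python
import math
from statistics import NormalDist, mean, stdev


def anderson_darling_normal(sample):
    """Adjusted Anderson-Darling statistic for normality (parameters estimated)."""
    n = len(sample)
    m, s = mean(sample), stdev(sample)
    z = sorted((v - m) / s for v in sample)
    cdf = [NormalDist().cdf(v) for v in z]
    # A^2 = -n - (1/n) * sum (2i-1) [ln F(z_i) + ln(1 - F(z_{n+1-i}))], i = 1..n
    total = sum(
        (2 * i + 1) * (math.log(cdf[i]) + math.log(1.0 - cdf[n - 1 - i]))
        for i in range(n)
    )
    a2 = -n - total / n
    # Commonly used small-sample adjustment when mean and variance are estimated.
    return a2 * (1.0 + 0.75 / n + 2.25 / n**2)


CRITICAL_5_PERCENT = 0.752  # commonly quoted value for the adjusted statistic

# Made-up samples: evenly spaced normal quantiles, and a strongly skewed sample.
normal_like = [NormalDist().inv_cdf((i + 0.5) / 20) for i in range(20)]
skewed = [1.0] * 9 + [10.0]

for name, data in [("normal_like", normal_like), ("skewed", skewed)]:
    stat = anderson_darling_normal(data)
    print(name, round(stat, 3), "reject normality:", stat > CRITICAL_5_PERCENT)
```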
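The assumption that the errors are uncorrelated (independent) can be tested using the Durbin-Watson test.
The underlying hypotheses are
H0: $\rho = 0$
H1: $\rho > 0$
where $\rho$ is the first-order autocorrelation of the errors. The test statistic is
$d = \dfrac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}$
where $e_t$ are the residuals and $d_L$, $d_U$ are the tabulated lower and upper critical values.
If $d < d_L$, reject H0
If $d > d_U$, do not reject H0
If $d_L \le d \le d_U$, the test is inconclusive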
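The Durbin-Watson statistic and decision rule can be sketched as follows. The function names and residual values are made up for illustration, and the bounds passed to `dw_decision` are placeholders: real values of the lower and upper bounds must be read from Durbin-Watson tables for the given sample size and number of regressors.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences over sum of squares."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


def dw_decision(d, d_lower, d_upper):
    """Decision for H0: rho = 0 against H1: rho > 0."""
    if d < d_lower:
        return "reject H0"
    if d > d_upper:
        return "do not reject H0"
    return "inconclusive"


# Made-up residuals with a long run of equal signs (positive autocorrelation).
residuals = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]
d = durbin_watson(residuals)

# Placeholder bounds for illustration only; look up real values in tables.
print(round(d, 4), dw_decision(d, d_lower=1.0, d_upper=1.5))
```

Values of d near 2 suggest no first-order autocorrelation, while values well below 2 point to positive autocorrelation.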
The packages and statistical software used are
SPSS
MS Excel
The data collection: Secondary data for 10 years (1999-2009) were
collected from the following sources
www.data.wordbank.com
www.tradingeconomics.com
www.rbi.org.in
Objectives:
To fit a multiple linear regression model with GDP as the dependent variable.
To test the significance of the regression by constructing the ANOVA table, i.e. testing the hypothesis
H0: $\beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0$
H1: $\beta_j \neq 0$ for at least one j
To assess the adequacy of the model by computing $R^2$.
To construct 95% confidence intervals for the estimates.
To test the normality assumption for the errors.
DATA ANALYSIS
The 15-year data (1994-2009) of the three countries are as follows