project of harish

8/2/2019 Project of Harish


Factors Affecting GDP, using 3 country data

A minor research project

(In partial fulfilment of the requirements of the course: MSTA 427)

Submitted by

HARISH KUMAR

Under the supervision of

Dr. R.N. Rattihali sir

Department of Statistics

Central University of Rajasthan

Kishangarh-305801

April 2012



CERTIFICATE

This is to certify that the minor research project report entitled

.submitted by........... a student of M.A./M.Sc.-

Statistics (Actuarial) IV semester. This is a record of bona fide work carried out

by him, under my guidance as part of the course: MSTA 427.

To the best of my knowledge, the report presented in the project has not

been submitted earlier for the award of any degree/diploma.

Place: Kishangarh

Project supervisor

Date:

Name:



INDEX

1) Introduction and description of the problem

2) Objectives

3) Formulation: Model, Hypothesis, etc.

4) Definitions

5) Illustrations

6) Review of the work with references

7) Methodology (Statistical Tools, packages, graphs, etc. used)

8) Data: Main source (which has been analysed), other sources

9) Data Analysis

10) Conclusions

11) Major findings

12) References



Introduction to the problem: Gross domestic product (GDP) refers to the market value of all officially recognized final goods and services produced within a country in a given period. GDP is an economic indicator used to gauge the economic health of a country. The objective of this project is to analyse the factors affecting GDP in India.

Formulation and model: Although many factors affect the GDP of any country, the factors used in this project are

Crop production

Percentage change in industrial production

Inflation rate

Interest rate

Taxes on goods and services

Methodology and statistical tools used:-

The methodology and statistical tools used were regression analysis,

testing of hypothesis and some nonparametric tests.

A brief introduction to regression analysis: Regression analysis is one of the most often applied techniques of statistical analysis and modelling. It is widely used for analysing multifactor data and for modelling the relationship between a response variable (dependent variable) and one or more regressor variables (independent variables). In general, it is used to model a response variable (Y) as a linear function of one or more regressor variables ($X_1, X_2, \dots, X_p$). Here linearity means linearity in the parameters. If there is only one regressor variable the model is called simple linear regression; otherwise it is called multiple linear regression. The linear regression model is generally written as



$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ (simple linear regression)

$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_p X_{pi} + \varepsilon_i$ (multiple linear regression)

The $\varepsilon$ term in the model is referred to as the random error term. It may be due to various causes and may follow some particular statistical distribution.

Model Assumptions:-

We assume that the error terms follow a normal distribution with common mean zero and common variance $\sigma^2$, and that they are independent of each other.

Estimation of the parameters:-

In regression analysis our interest lies in estimating the best-fitting model. Ordinarily the regression coefficients (the $\beta$'s) are unknown and must be estimated from sample information. There are well-established statistical and mathematical methods for determining these estimates. The generally used methods are

1. Method of least squares

2. Method of maximum likelihood

The resulting estimated model is

$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i} + \dots + \hat{\beta}_p X_{pi}$

The random error is then estimated by the residual

$e_i = Y_i - \hat{Y}_i$
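As an illustration of least-squares estimation in the simplest case (one regressor), the following is a minimal sketch in Python. It uses the standard closed-form estimates for the slope and intercept. The function name `fit_simple_regression` and the numbers are made up for illustration and are not the data analysed in this project.

```python
def fit_simple_regression(x, y):
    """Return (b0, b1), the least-squares estimates for y = b0 + b1*x + error."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Corrected sum of cross products and corrected sum of squares of x
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Illustrative values only (assumption, not the project's data)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_simple_regression(x, y)
fitted = [b0 + b1 * xi for xi in x]                 # estimated model
residuals = [yi - fi for yi, fi in zip(y, fitted)]  # estimated errors

print("intercept:", b0, "slope:", b1)
print("residuals:", residuals)
```

With an intercept in the model, the residuals from a least-squares fit sum to zero (up to rounding), which is a quick check on the computation.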

Estimation of parameters in multiple linear regressions:-

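Method of least squares: Let us assume that n observations were taken on p regressor variables, and that the errors are i.i.d. $N(0, \sigma^2)$. Then the model can be expressed as

$Y_1 = \beta_0 + \beta_1 X_{11} + \beta_2 X_{21} + \dots + \beta_p X_{p1} + \varepsilon_1$

$Y_2 = \beta_0 + \beta_1 X_{12} + \beta_2 X_{22} + \dots + \beta_p X_{p2} + \varepsilon_2$

. . . . . . . . . . . .

$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_p X_{pi} + \varepsilon_i$

. . . . . . . . . . . .

$Y_n = \beta_0 + \beta_1 X_{1n} + \beta_2 X_{2n} + \dots + \beta_p X_{pn} + \varepsilon_n$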

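The above model can be written in matrix form as

$Y = X\beta + \varepsilon$

where Y is an n x 1 vector of responses, $\beta$ is a (p+1) x 1 vector of regression coefficients, X is an n x (p+1) matrix of regressor values (with a leading column of ones for the intercept) and $\varepsilon$ is an n x 1 vector of errors.

Then the least squares estimate of the regression coefficient vector is given by

$\hat{\beta} = (X'X)^{-1} X'Y$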
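The least-squares estimate can be computed by solving the normal equations $(X'X)\hat{\beta} = X'Y$. The following is a minimal sketch in plain Python, assuming a small, well-conditioned problem. The helper names `solve_linear_system` and `least_squares` are hypothetical, and the data are made up and noise-free, so the known coefficients (1, 2, 3) should be recovered. In practice a statistical package would be used.

```python
def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular: regressors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def least_squares(rows, y):
    """rows: one tuple of regressor values per observation.
    Returns [b0, b1, ..., bp] solving the normal equations."""
    X = [[1.0] + list(r) for r in rows]  # leading column of ones
    k = len(X[0])
    xtx = [[sum(xr[i] * xr[j] for xr in X) for j in range(k)] for i in range(k)]
    xty = [sum(xr[i] * yi for xr, yi in zip(X, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Made-up, noise-free data generated from y = 1 + 2*x1 + 3*x2 (assumption)
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]

beta_hat = least_squares(rows, y)
print(beta_hat)
```

Solving the normal equations directly is shown here because it mirrors the formula. Statistical packages usually use more numerically stable decompositions for the same estimate.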

Analysis of Variance for regression

The ANOVA is used to test whether there is a linear relationship between the response variable Y and the regressor variables $X_1, X_2, \dots, X_p$. The ANOVA table in multiple linear regression is as follows.

Source of      Sum of     Degrees of    Mean       F0
variation      squares    freedom       square

Regression     SSR        p             MSR        MSR/MSRES

Residuals      SSRES      n-(p+1)       MSRES

Total          SST        n-1

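where the notation is as follows:

SSR = sum of squares due to regression (the regressor variables)

SSRES = sum of squares due to residuals

SST = total sum of squares

MSR = SSR/p = mean square due to regression

MSRES = SSRES/(n-(p+1)) = mean square due to residuals

Here

SSRES = $Y'Y - \hat{\beta}' X'Y$

SST = $Y'Y - n\bar{Y}^2$

so that SSR = SST - SSRES.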
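The entries of the ANOVA table can be computed directly from the observed and fitted values. The following is a minimal sketch, assuming the fitted values come from a least-squares fit that includes an intercept (otherwise SST = SSR + SSRES need not hold). The function name `regression_anova` and the numbers are illustrative assumptions, not results from this project.

```python
def regression_anova(y, fitted, p):
    """ANOVA quantities for a fitted linear regression.

    y      -- observed responses
    fitted -- fitted values from a least-squares fit with an intercept
    p      -- number of regressor variables
    """
    n = len(y)
    y_bar = sum(y) / n
    ss_t = sum((yi - y_bar) ** 2 for yi in y)                   # SST, n-1 d.f.
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # SSRES, n-(p+1) d.f.
    ss_r = ss_t - ss_res                                        # SSR, p d.f.
    ms_r = ss_r / p
    ms_res = ss_res / (n - (p + 1))
    return {
        "SSR": ss_r, "SSRES": ss_res, "SST": ss_t,
        "df_regression": p, "df_residual": n - (p + 1), "df_total": n - 1,
        "MSR": ms_r, "MSRES": ms_res, "F0": ms_r / ms_res,
    }


# Illustrative values (assumption): responses and fitted values from a
# hypothetical one-regressor least-squares fit, y_hat = 0.05 + 1.99*x
y = [2.1, 3.9, 6.2, 7.8, 10.1]
fitted = [2.04, 4.03, 6.02, 8.01, 10.00]

table = regression_anova(y, fitted, p=1)
for name, value in table.items():
    print(name, value)
```

The computed F0 is compared with the tabulated critical value of the F distribution with p and n-(p+1) degrees of freedom; a larger F0 leads to rejecting the hypothesis that all regression coefficients are zero.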

Another measure of model adequacy is $R^2$, which is also known as the coefficient of determination. The value of $R^2$ lies between 0 and 1. The higher the value of $R^2$, the better the fitted model is considered to be. The value of $R^2$ is given as

$R^2 = \dfrac{SSR}{SST} = 1 - \dfrac{SSRES}{SST}$
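The coefficient of determination follows from the same sums of squares. A short sketch, assuming a least-squares fit with an intercept; the function name `r_squared` and the numbers are illustrative assumptions only.

```python
def r_squared(y, fitted):
    """Coefficient of determination, computed as 1 - SSRES/SST."""
    y_bar = sum(y) / len(y)
    ss_t = sum((yi - y_bar) ** 2 for yi in y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return 1.0 - ss_res / ss_t


# Illustrative values (assumption), not the project's data
y = [2.1, 3.9, 6.2, 7.8, 10.1]
fitted = [2.04, 4.03, 6.02, 8.01, 10.00]

r2 = r_squared(y, fitted)
print("R squared:", r2)
```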

Are the residuals (or errors) approximately normally distributed and uncorrelated? A variety of methods are available for checking these regression assumptions:

Durbin-Watson test (for autocorrelation)

Anderson-Darling test (for normality)

Chi-square goodness-of-fit test (for normality)

The assumption that the errors are uncorrelated or independent can be tested using the Durbin-Watson test.

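The underlying hypotheses are

$H_0: \rho = 0$

$H_1: \rho > 0$

where $\rho$ is the autocorrelation of the errors. The test statistic is

$d = \dfrac{\sum_{i=2}^{n} (e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2}$

where $e_i$ are the residuals and $d_L$, $d_U$ are the tabulated lower and upper critical values.

If $d < d_L$, reject $H_0$

If $d > d_U$, do not reject $H_0$

If $d_L \le d \le d_U$, the test is inconclusive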
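The Durbin-Watson statistic and the decision rule above can be sketched as follows. The critical values $d_L$ and $d_U$ must be read from Durbin-Watson tables for the given n and p; the bounds used below are placeholders chosen only for illustration, and the residuals are made up. The function names are hypothetical.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    of the residuals divided by the sum of squared residuals."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2
              for i in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


def durbin_watson_decision(d, d_lower, d_upper):
    """Decision for H0: rho = 0 against H1: rho > 0."""
    if d < d_lower:
        return "reject H0"
    if d > d_upper:
        return "do not reject H0"
    return "inconclusive"


# Made-up residuals with an obvious run pattern (positive autocorrelation)
residuals = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]
d = durbin_watson(residuals)

# Placeholder bounds (assumption): real values come from Durbin-Watson tables
D_LOWER, D_UPPER = 1.0, 1.5
decision = durbin_watson_decision(d, D_LOWER, D_UPPER)
print("d =", d, "->", decision)
```

Values of d near 2 are consistent with uncorrelated errors, while values well below 2 point towards positive autocorrelation.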

The packages and statistical software used are

SPSS

MS Excel



Data collection: Secondary data for the 10-year period (1999-2009) were collected from the following sources

www.data.wordbank.com

www.tradingeconomics.com

www.rbi.org.in

Objectives:

To fit a multiple linear regression model with GDP as the dependent variable.

To test the significance of the regression by constructing the ANOVA table, i.e. testing the hypothesis

$H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0$

$H_1: \beta_j \neq 0$ for at least one j

To assess the fit of the model by computing $R^2$.

To construct 95% confidence intervals for the estimates.

To test the normality assumption for the errors.

DATA ANALYSIS

The 15-year data (1994-2009) of the three countries are as follows

