Project of Harish

Uploaded by: ashish-sethi

Posted on 05-Apr-2018


  • 8/2/2019 Project of Harish


    Factors Affecting GDP, Using Data from Three Countries

    A minor research project

    (In partial fulfilment of the requirements of the course: MSTA 427)

    Submitted by

    HARISH KUMAR

    Under the supervision of

    Dr. R.N. Rattihali

    Department of Statistics

    Central University of Rajasthan

    Kishangarh-305801

    April 2012


    CERTIFICATE

    This is to certify that the minor research project report entitled ........ submitted by ..........., a student of M.A./M.Sc. Statistics (Actuarial), IV semester, is a record of bona fide work carried out by him under my guidance as part of the course MSTA 427.

    To the best of my knowledge, the report presented in the project has not

    been submitted earlier for the award of any degree/diploma.

    Place: Kishangarh

    Project supervisor

    Date:

    Name:


    INDEX

    1) Introduction and description of the problem

    2) Objectives

    3) Formulation: Model, Hypothesis, etc.

    4) Definitions

    5) Illustrations

    6) Review of the work with references

    7) Methodology (Statistical Tools, packages, graphs, etc. used)

    8) Data: Main source (which has been analysed), other sources

    9) Data Analysis

    10) Conclusions

    11) Major findings

    12) References


    Introduction to the problem: Gross domestic product (GDP) refers to the market value of all officially recognized final goods and services produced within a country in a given period. GDP is an economic indicator used to gauge the economic health of a country. The objective of this project is to analyse the factors affecting GDP in India.

    Formulation and model: Although many factors affect the GDP of a country, the factors used in this project are:

    Crop production

    Percentage change in industrial production

    Inflation rate

    Interest rate

    Taxes on goods and services
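Writing these five factors as regressors gives, as a sketch, the form of the multiple linear regression model with GDP as the response, consistent with the hypothesis H0: β1 = ... = β5 = 0 stated under the objectives. The labels X1 to X5, assigned in the order of the list above, are an assumed notation and are not taken from the original report:

```latex
\[
\mathrm{GDP}_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i}
               + \beta_4 X_{4i} + \beta_5 X_{5i} + \varepsilon_i,
\qquad i = 1, \dots, n
\]
% Assumed labelling: X_1 = crop production,
% X_2 = percentage change in industrial production, X_3 = inflation rate,
% X_4 = interest rate, X_5 = taxes on goods and services.
```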

    Methodology and statistical tools used:

    The methodology and statistical tools used were regression analysis, hypothesis testing and some nonparametric tests.

    A brief introduction to regression analysis: Regression analysis is one of the most widely applied techniques of statistical analysis and modelling. It is used for analysing multifactor data and for modelling the relationship between a response variable (dependent variable) and one or more regressor variables (independent variables). In general, it models a response variable $Y$ as a linear function of one or more regressor variables $X_1, X_2, \dots, X_p$. Here linearity means linearity in the parameters. If there is only one regressor variable, the model is called simple linear regression; otherwise it is called multiple linear regression. The linear regression model is generally written as


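    $$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \qquad \text{(simple linear regression)}$$

    $$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_p X_{pi} + \varepsilon_i \qquad \text{(multiple linear regression)}$$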

    The $\varepsilon$ term in the model is referred to as the random error term. It may arise from various causes and is assumed to follow some particular statistical distribution.

    Model assumptions:

    We assume that the error terms $\varepsilon_i$ are independent of each other and normally distributed with common mean zero and common variance $\sigma^2$, i.e. $\varepsilon_i \sim \text{i.i.d. } N(0, \sigma^2)$.

    Estimation of the parameters:

    In regression analysis our interest lies in estimating the best-fitting model. The regression coefficients (the $\beta$s) are ordinarily unknown and must be estimated from sample information. There are well-established statistical methods for determining these estimates; the most commonly used are

    1. Method of least squares

    2. Method of maximum likelihood

    The resulting estimated model is

    $$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$$

    The random error is then estimated by the residual

    $$e_i = Y_i - \hat{Y}_i$$
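For simple linear regression the least-squares estimates have the familiar closed form $\hat\beta_1 = S_{XY}/S_{XX}$ and $\hat\beta_0 = \bar{Y} - \hat\beta_1 \bar{X}$. The short sketch below computes them and the residuals $e_i$ in plain Python; the function names and the five data points are hypothetical illustrations, not the project's GDP data.

```python
# Least-squares fit of Y_i = b0 + b1*X_i + e_i (simple linear regression).
# Hypothetical sketch: function names and data are illustrative only.


def fit_simple_ols(x, y):
    """Return (b0, b1) minimising the sum of squared errors."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def residuals(x, y, b0, b1):
    """Estimated random errors e_i = Y_i - (b0 + b1*X_i)."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = fit_simple_ols(x, y)
e = residuals(x, y, b0, b1)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")  # b0 = 0.050, b1 = 1.990
```

Because the model includes an intercept, the least-squares residuals always sum to zero, which is a quick sanity check on any hand calculation.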

    Estimation of parameters in multiple linear regression:

    Method of least squares: Let us assume that $n$ observations were taken on $p$ regressor variables, with errors i.i.d. $N(0, \sigma^2)$. Then the model can be expressed as

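    $$\begin{aligned}
    Y_1 &= \beta_0 + \beta_1 X_{11} + \beta_2 X_{21} + \dots + \beta_p X_{p1} + \varepsilon_1\\
    Y_2 &= \beta_0 + \beta_1 X_{12} + \beta_2 X_{22} + \dots + \beta_p X_{p2} + \varepsilon_2\\
    &\;\;\vdots\\
    Y_i &= \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_p X_{pi} + \varepsilon_i\\
    &\;\;\vdots\\
    Y_n &= \beta_0 + \beta_1 X_{1n} + \beta_2 X_{2n} + \dots + \beta_p X_{pn} + \varepsilon_n
    \end{aligned}$$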

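    The above model can be written in matrix form as

    $$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$$

    where $\mathbf{Y}$ is an $n \times 1$ vector of responses, $\boldsymbol{\beta}$ is a $(p+1) \times 1$ vector of regression coefficients, $\mathbf{X}$ is an $n \times (p+1)$ matrix of regressor values whose first column is all ones, and $\boldsymbol{\varepsilon}$ is an $n \times 1$ vector of errors.

    Then, provided $\mathbf{X}'\mathbf{X}$ is invertible, the least squares estimate of the regression coefficient vector is given by

    $$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$$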
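The matrix estimate $\hat{\boldsymbol\beta} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$ can be sketched in standard-library Python by solving the normal equations $(\mathbf{X}'\mathbf{X})\boldsymbol\beta = \mathbf{X}'\mathbf{Y}$ rather than forming the inverse explicitly. This is an illustration of the formula, not the SPSS or Excel procedure used in the project; the helper names and the data (generated exactly from $Y = 1 + 2X_1 - 0.5X_2$) are hypothetical.

```python
# Least-squares estimate beta_hat = (X'X)^(-1) X'Y via the normal equations.
# Hypothetical sketch: helper names and data are illustrative only.


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Matrix product of a (r x k) and b (k x c), both stored as lists of rows."""
    b_cols = transpose(b)
    return [[sum(u * v for u, v in zip(row, col)) for col in b_cols] for row in a]


def solve(a, rhs):
    """Solve the square system a z = rhs by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs[i]] for i, row in enumerate(a)]  # augmented matrix
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(m[r][k]))
        m[k], m[pivot] = m[pivot], m[k]
        for r in range(k + 1, n):
            factor = m[r][k] / m[k][k]
            for c in range(k, n + 1):
                m[r][c] -= factor * m[k][c]
    z = [0.0] * n
    for k in range(n - 1, -1, -1):  # back substitution
        z[k] = (m[k][n] - sum(m[k][c] * z[c] for c in range(k + 1, n))) / m[k][k]
    return z


def fit_ols(x_rows, y):
    """x_rows: n rows of p regressor values. Returns [b0, b1, ..., bp]."""
    X = [[1.0] + list(row) for row in x_rows]  # n x (p+1), first column of ones
    Xt = transpose(X)
    XtX = matmul(Xt, X)  # (p+1) x (p+1)
    XtY = [sum(u * v for u, v in zip(col, y)) for col in Xt]  # (p+1) x 1
    return solve(XtX, XtY)


# Hypothetical data generated exactly from Y = 1 + 2*X1 - 0.5*X2 (no error term).
x_rows = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 5]]
y = [1 + 2 * x1 - 0.5 * x2 for x1, x2 in x_rows]
beta_hat = fit_ols(x_rows, y)
print([round(b, 6) for b in beta_hat])  # [1.0, 2.0, -0.5]
```

Solving the linear system is numerically preferable to computing $(\mathbf{X}'\mathbf{X})^{-1}$ and then multiplying, although both give the same estimate in exact arithmetic.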

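    Analysis of variance for regression

    The ANOVA is used to test whether there is a linear relationship between the response variable $Y$ and the regressor variables $X_1, \dots, X_p$. The ANOVA table for multiple linear regression is as follows.

    Source of variation   Sum of squares   Degrees of freedom   Mean square   F0
    Regression            SSR              p                    MSR           MSR/MSRES
    Residuals             SSRES            n-(p+1)              MSRES
    Total                 SST              n-1

    where the notation is as follows:

    SSR = sum of squares due to the regressors (regression sum of squares)

    SSRES = sum of squares due to the residuals

    SST = total sum of squares

    MSR = SSR/p = mean square due to regression

    MSRES = SSRES/(n-p-1) = residual mean square

    The hypothesis $H_0: \beta_1 = \dots = \beta_p = 0$ is rejected at level $\alpha$ if $F_0$ exceeds $F_{\alpha,\,p,\,n-p-1}$, the upper $\alpha$ point of the F distribution with $p$ and $n-p-1$ degrees of freedom.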

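    Here

    $$\mathrm{SSRES} = \mathbf{Y}'\mathbf{Y} - \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{Y}$$

    $$\mathrm{SST} = \mathbf{Y}'\mathbf{Y} - \frac{\left(\sum_{i=1}^{n} Y_i\right)^2}{n}$$

    and $\mathrm{SSR} = \mathrm{SST} - \mathrm{SSRES}$.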
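The entries of the ANOVA table can be computed directly from the observed and fitted values. The sketch below assumes the fitted values come from a least-squares fit that includes an intercept, so that SST = SSR + SSRES holds; the function name and the data (a one-regressor fit, so p = 1) are hypothetical.

```python
# ANOVA quantities for a least-squares regression with an intercept:
# SST, SSRES, SSR = SST - SSRES, the mean squares and F0 = MSR / MSRES.
# Hypothetical sketch: function name and data are illustrative only.


def regression_anova(y, y_hat, p):
    n = len(y)
    y_bar = sum(y) / n
    sst = sum((yi - y_bar) ** 2 for yi in y)                 # equals Y'Y - (sum Y)^2 / n
    ssres = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # equals Y'Y - b'X'Y for OLS fits
    ssr = sst - ssres                                        # valid for OLS with an intercept
    msr = ssr / p
    msres = ssres / (n - p - 1)
    return {"SSR": ssr, "SSRES": ssres, "SST": sst,
            "MSR": msr, "MSRES": msres, "F0": msr / msres}


y = [2.1, 3.9, 6.2, 7.8, 10.1]
y_hat = [2.04, 4.03, 6.02, 8.01, 10.0]  # fitted values of Y_hat = 0.05 + 1.99 X, X = 1..5
table = regression_anova(y, y_hat, p=1)
for key, value in table.items():
    print(f"{key:6s} {value:10.4f}")
```

The resulting F0 is then compared with the tabulated upper-α point of the F distribution with p and n-p-1 degrees of freedom.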

    Another measure of model adequacy is $R^2$, also known as the coefficient of determination. Its value lies between 0 and 1, and the higher the value of $R^2$, the better the fitted model is considered to be. $R^2$ is given by

    $$R^2 = \frac{\mathrm{SSR}}{\mathrm{SST}} = 1 - \frac{\mathrm{SSRES}}{\mathrm{SST}}$$
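As a small worked illustration of the formula $R^2 = 1 - \mathrm{SSRES}/\mathrm{SST}$, the sketch below computes it from observed and fitted values. The function name and the data are hypothetical, and the fitted values are assumed to come from a least-squares fit with an intercept (otherwise $R^2$ computed this way need not lie in [0, 1]).

```python
# Coefficient of determination R^2 = 1 - SSRES / SST.
# Hypothetical sketch: function name and data are illustrative only.


def r_squared(y, y_hat):
    n = len(y)
    y_bar = sum(y) / n
    sst = sum((yi - y_bar) ** 2 for yi in y)                 # total sum of squares
    ssres = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual sum of squares
    return 1.0 - ssres / sst


y = [2.1, 3.9, 6.2, 7.8, 10.1]
y_hat = [2.04, 4.03, 6.02, 8.01, 10.0]
r2 = r_squared(y, y_hat)
print(f"R^2 = {r2:.4f}")  # R^2 = 0.9973
```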

    Are the residuals (or errors) approximately normally distributed? A variety of methods are available for checking this regression assumption, for example:

    Anderson-Darling test

    Chi-square goodness-of-fit test
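The Anderson-Darling statistic for normality, with mean and standard deviation estimated from the sample, is $A^2 = -n - \frac{1}{n}\sum_{i=1}^{n}(2i-1)\left[\ln\Phi(z_{(i)}) + \ln\left(1-\Phi(z_{(n+1-i)})\right)\right]$, where $z_{(i)}$ are the sorted standardised values. The sketch below computes it in standard-library Python; the function names and both samples are hypothetical, and the statistic must still be compared with published critical values (which depend on the sample size and on which parameters were estimated).

```python
import math

# Anderson-Darling statistic A^2 for checking whether a sample (e.g. regression
# residuals) looks normal, with mean and standard deviation estimated from the data.
# Hypothetical sketch: function names and samples are illustrative only.


def normal_cdf(z):
    """Standard normal CDF, written with erfc for accuracy in the tails."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def anderson_darling(sample):
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in sample) / (n - 1))
    z = sorted((v - mean) / sd for v in sample)  # standardised order statistics
    total = 0.0
    for i in range(1, n + 1):
        # ln Phi(z_(i)) + ln(1 - Phi(z_(n+1-i))), using 1 - Phi(u) = Phi(-u)
        total += (2 * i - 1) * (math.log(normal_cdf(z[i - 1]))
                                + math.log(normal_cdf(-z[n - i])))
    return -n - total / n


near_normal = [-1.28, -0.52, 0.0, 0.52, 1.28]
skewed = [0.0, 0.0, 0.0, 0.0, 10.0]
a_near = anderson_darling(near_normal)
a_skew = anderson_darling(skewed)
print(f"A^2 near-normal = {a_near:.3f}, A^2 skewed = {a_skew:.3f}")
```

Larger values of $A^2$ indicate a worse fit to the normal distribution, so the strongly skewed sample gives a much larger statistic than the near-normal one.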

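    The assumption that the errors are uncorrelated (independent) can be tested using the Durbin-Watson test. The underlying hypotheses are

    H0: ρ = 0

    H1: ρ > 0

    where ρ is the first-order autocorrelation between successive errors. The test statistic is

    $$d = \frac{\sum_{i=2}^{n} (e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2}$$

    where $e_i$ are the residuals. With the lower and upper critical bounds dL and dU taken from Durbin-Watson tables:

    If d < dL, reject H0

    If d > dU, do not reject H0

    If dL ≤ d ≤ dU, the test is inconclusive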
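A minimal sketch of the Durbin-Watson statistic $d = \sum_{i=2}^{n}(e_i - e_{i-1})^2 / \sum_{i=1}^{n} e_i^2$ and the bounds-based decision rule for H0: ρ = 0 against H1: ρ > 0. The function names and residuals are hypothetical, and the bounds passed in are placeholders: the real dL and dU must be read from Durbin-Watson tables for the given n and number of regressors.

```python
# Durbin-Watson statistic and decision rule for positive first-order autocorrelation.
# Hypothetical sketch: names, residuals and the bounds used below are illustrative only.


def durbin_watson(e):
    num = sum((e[i] - e[i - 1]) ** 2 for i in range(1, len(e)))
    den = sum(v ** 2 for v in e)
    return num / den


def dw_decision(d, d_lower, d_upper):
    if d < d_lower:
        return "reject H0 (positive autocorrelation)"
    if d > d_upper:
        return "do not reject H0"
    return "inconclusive"


e = [0.06, -0.13, 0.18, -0.21, 0.10]  # hypothetical residuals
d = durbin_watson(e)
print(round(d, 3), dw_decision(d, d_lower=1.0, d_upper=1.5))  # placeholder bounds
```

Values of d near 2 suggest no first-order autocorrelation; to test against negative autocorrelation (ρ < 0), the same bounds are applied to 4 - d instead of d.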

    The packages and statistical software used are:

    SPSS

    MS Excel


    Data collection: Secondary data for 10 years (1999-2009) were collected from the following sources:

    www.data.wordbank.com

    www.tradingeconomics.com

    www.rbi.org.in

    Objectives:

    To fit a multiple linear regression model with GDP as the dependent variable

    To test the significance of the regression by constructing the ANOVA table, i.e. testing the hypotheses H0: β1 = β2 = β3 = β4 = β5 = 0 against H1: βj ≠ 0 for at least one j

    To assess the adequacy of the model by computing R²

    To construct 95% confidence intervals for the estimated coefficients

    To test the normality assumption for the errors
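A 95% confidence interval for a regression coefficient has the standard form $\hat\beta_j \pm t_{0.025,\,n-p-1}\,\mathrm{se}(\hat\beta_j)$, with $\mathrm{se}(\hat\beta_j) = \sqrt{\mathrm{MSRES}\,C_{jj}}$ and $C = (\mathbf{X}'\mathbf{X})^{-1}$. For the slope in simple linear regression this reduces to $\mathrm{se}(\hat\beta_1) = \sqrt{\mathrm{MSRES}/S_{XX}}$. The sketch below covers only that simple case; the function name and data are hypothetical, and the t quantile is passed in from tables ($t_{0.025,3} \approx 3.182$).

```python
import math

# 95% confidence interval for the slope b1 in simple linear regression:
#   b1 +/- t_(0.025, n-2) * sqrt(MSRES / Sxx)
# Hypothetical sketch: function name and data are illustrative only.


def slope_confidence_interval(x, y, t_crit):
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    ssres = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    msres = ssres / (n - 2)  # n - (p + 1) with p = 1
    half_width = t_crit * math.sqrt(msres / s_xx)
    return b1 - half_width, b1 + half_width


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
low, high = slope_confidence_interval(x, y, t_crit=3.182)  # t_(0.025, 3) from tables
print(f"95% CI for b1: ({low:.3f}, {high:.3f})")
```

For the multiple regression model, the same recipe applies with $C_{jj}$ taken from the diagonal of $(\mathbf{X}'\mathbf{X})^{-1}$ and $n-p-1$ degrees of freedom for the t quantile.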

    DATA ANALYSIS

    The 15-year data (1994-2009) for the three countries are as follows:
