
7/28/2019 Stats 2 Note 04

17.874 Lecture Notes
Part 2: Matrix Algebra


2. Matrix Algebra

2.1. Introduction: Design Matrices and Data Matrices

Matrices are arrays of numbers. We encounter them in statistics in at least three different ways. First, they are a convenient device for systematically organizing our study designs. Second, data sets typically take the form of rectangular arrays of numbers and characters. Third, algebraic and computational manipulation of data is facilitated by matrix algebra. This is particularly important for multivariate statistics.

Over the next 2 weeks or so we will learn the basic matrix algebra used in multivariate statistics. In particular, we will learn how to construct the basic statistical concepts (means, variances, and covariances) in matrix form, and to solve estimation problems, which involve solving systems of equations.

I begin this section of the course by discussing Design Matrices and Data Matrices.

Design Matrices are arrays that describe the structure of a study. Many studies involve little actual design; the design matrix is the data matrix. Experiments, quasi-experiments, and surveys often involve some degree of complexity and care in their design. For any study, a design matrix can be very useful at the start for thinking about exactly what data needs to be collected to answer a question.

There are several components to a design matrix. The rows correspond to units of analysis and unique factors in a study; the columns are variables that one would like to measure (even if you can't). At the very least one needs to set aside columns for the X variable and the Y variable. You might also consider an assignment probability (in an experiment or survey) or variable (this might be an instrumental variable in an observational study).

For example, in the early and mid-1990s I did a series of about two dozen experiments involving campaign advertising. Early on in those studies I wrote down a simple matrix for keeping the factors in the studies straight. Units were individuals recruited to participate


in the study. The factors were that some people saw an ad from a Democrat, some people saw an ad from a Republican, and some saw no ad. In addition, some ads were negative and some were positive. I wanted to know how these factors affected various outcomes, especially turnout and vote intention. Do candidates do better with positive or negative ads? Which candidates? Does turnout drop?

Simple Design Matrix

Source  Tone  Assignment  Turnout  Vote
D       +     .2          TD+      VD+
D       -     .2          TD-      VD-
R       +     .2          TR+      VR+
R       -     .2          TR-      VR-
0       0     .2          T0       V0

I found that writing down a matrix like this helped me to think about what the experiment itself could measure. I can measure the effect of negative versus positive ads on turnout by taking the following difference: [TD+ + TR+] - [TD- + TR-].

It also raised some interesting questions. Do I need the control group to measure all of the effects of interest?

Data Matrices implement our design matrices and organize our data. Data matrices begin with information about the units in our study (people, countries, years, etc.). They then contain information about the variables of interest. It really doesn't matter how you represent data in a database, except that most databases reserve blanks and "." for missing data.

Below are hypothetical data for 10 subjects for our experiments. The content of each variable is recorded so that it is most meaningful. Under the variable Source, "D" means Democratic candidate.


Simple Design Matrix

Subject  Source  Tone  Gender  Turnout  Vote
1        D       +     F       Yes      D
2        D       -     F       Yes      R
3        C       0     M       No       0
4        R       -     F       Yes      R
5        C       0     M       No       0
6        R       +     M       Yes      D
7        D       +     M       Yes      R
8        C       0     M       Yes      D
9        R       -     F       Yes      R
10       D       -     M       No       0

We can't easily analyze these data. We need numerical representations of the outcome variables Turnout and Vote and the control variables, if any (in this case Gender). For such categorical data we would need to assign indicator variables to identify each group. Exactly how we do that may depend on the sort of analysis we wish to perform. Source, for example, may lead to two indicator variables: Democratic Source and Republican Source. Each equals 1 if the statement is true and 0 otherwise. Tone might also lead to two different indicator variables.

Usually variables that are to be used in analysis are coded numerically. We might code D as 1, R as 2, and Control group as 0. Tone might be 1 for +, 0 for none, and -1 for negative. And so forth. This coding can be done during data analysis or during data entry. A word of caution: always err on the side of too rich a coding, rather than skimping on information available when you assemble your data.
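To make the indicator coding concrete, here is a small numpy sketch (the data values and variable names are invented for illustration; this is not part of the original notes):

```python
import numpy as np

# Hypothetical Source column for a few subjects (D = Democrat, R = Republican, C = control)
source = np.array(["D", "D", "C", "R", "C", "R"])

# Indicator (dummy) variables: each equals 1 if the statement is true, 0 otherwise
dem_source = (source == "D").astype(int)
rep_source = (source == "R").astype(int)

# Numeric coding of Tone: 1 for +, 0 for none, -1 for negative
tone = np.array(["+", "-", "0", "-", "0", "+"])
tone_num = np.where(tone == "+", 1, np.where(tone == "-", -1, 0))

print(dem_source)  # [1 1 0 0 0 0]
print(rep_source)  # [0 0 0 1 0 1]
print(tone_num)    # [ 1 -1  0 -1  0  1]
```

Note that the control group is identified implicitly: both indicators equal 0 for control subjects, which is the usual convention when a regression includes a constant.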

Matrices help us to analyze the information in a data matrix and help us think about the information contained (and not) in a study design. We will learn, for example, when the effects and parameters of interest can in fact be estimated from a given design.

2.2. Definitions

Vectors are the building blocks for data analysis and matrix algebra. A vector is a column of numbers. We usually denote a vector as a lowercase bold letter


(x, y, a) when we do not have a specific set of numbers at hand but are considering a vector in the abstract. For example,

x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_n \end{pmatrix}

In statistics, a vector is usually a variable. Geometrically, a vector corresponds to a point in space. Each element in the vector can be graphed in the "observation space." If data were, say, from a survey, we could graph how person 1 answered the survey, person 2, etc., as a vector.

We could take another sort of vector in data analysis: the row corresponding to the values of the variables for any unit or subject. That is, a row vector is a set of numbers arranged in a single row:

a = (a_1, a_2, a_3, ..., a_k)

The transpose of a vector is a rewriting of the vector from a column to a row or from a row to a column. We denote the transpose with an elongated apostrophe (′):

x′ = (x_1, x_2, x_3, ..., x_n)

A matrix is a rectangular array of numbers, or a collection of vectors. We write matrices with capital letters. The elements of a matrix are numbers.

X = \begin{pmatrix} x_{11} & x_{21} & \cdots & x_{k1} \\ x_{12} & x_{22} & \cdots & x_{k2} \\ x_{13} & x_{23} & \cdots & x_{k3} \\ \vdots & & & \\ x_{1n} & x_{2n} & \cdots & x_{kn} \end{pmatrix}

or X = (x_1, x_2, x_3, ..., x_k).


The dimensions of a matrix are the number of rows and columns. Above, X has n rows and k columns, so we say that it is an n x k dimension matrix, or just "n by k."

It is important to keep indexes straight because operations such as addition and multiplication work on individual elements.

The transpose of a matrix represents the reorganization of a matrix such that its rows become its columns.

X′ = \begin{pmatrix} x_1′ \\ x_2′ \\ x_3′ \\ \vdots \\ x_k′ \end{pmatrix} = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ x_{31} & x_{32} & \cdots & x_{3n} \\ \vdots & & & \\ x_{k1} & x_{k2} & \cdots & x_{kn} \end{pmatrix}

Note: The dimension changes from n by k to k by n.

A special type of matrix is symmetric: the numbers above the diagonal mirror the numbers below the diagonal. This means that X = X′.

As with simple algebra, we will want to have the "numbers" 0 and 1 so that we can define division, subtraction, and other operators. A 0 vector or matrix contains all zeros. The analogue of the number 1 is called the Identity matrix. It has 1's on the diagonal and 0's elsewhere.

I = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & & \\ 0 & 0 & 0 & \cdots & 1 \end{pmatrix}
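A quick numpy sketch (with made-up values, added here for illustration) of the transpose, identity, and symmetry facts above:

```python
import numpy as np

# A 3-by-2 matrix: 3 rows (observations), 2 columns (variables)
X = np.array([[1.0, 4.0],
              [2.0, 5.0],
              [3.0, 6.0]])

# Transpose: rows become columns, so the dimension flips from 3x2 to 2x3
assert X.T.shape == (2, 3)

# The identity matrix behaves like the number 1: IX = X
I = np.eye(3)
assert np.allclose(I @ X, X)

# A symmetric matrix equals its own transpose
S = np.array([[2.0, 1.0],
              [1.0, 3.0]])
assert np.array_equal(S, S.T)
```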

2.3. Addition and Multiplication


2.3.1. Addition

To add vectors and matrices we sum all elements with the same index.

x + y = \begin{pmatrix} x_1 + y_1 \\ x_2 + y_2 \\ x_3 + y_3 \\ \vdots \\ x_n + y_n \end{pmatrix}

Matrix addition is constructed from vector addition, because a matrix is a vector of vectors. For matrices A and B, A + B consists of first adding the vector of vectors that is A to the vector of vectors that is B and then performing vector addition for each of the vectors.

More simply, keep the indexes straight and add each element in A to the element with the same index in B. This will produce a new matrix C.

C = A + B = \begin{pmatrix} a_{11}+b_{11} & a_{21}+b_{21} & \cdots & a_{k1}+b_{k1} \\ a_{12}+b_{12} & a_{22}+b_{22} & \cdots & a_{k2}+b_{k2} \\ \vdots & & & \\ a_{1n}+b_{1n} & a_{2n}+b_{2n} & \cdots & a_{kn}+b_{kn} \end{pmatrix}

NOTE: A and B must have the same dimensions in order to add them together.
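The element-by-element rule, and the dimension requirement, can be checked in a short numpy sketch (values invented for illustration):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[10.0, 20.0],
              [30.0, 40.0]])

# Element-by-element addition: c_ij = a_ij + b_ij
C = A + B
print(C)  # [[11. 22.] [33. 44.]]

# Matrices must share dimensions; numpy raises an error otherwise
try:
    A + np.zeros((3, 2))
except ValueError:
    print("dimension mismatch")
```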

2.3.2. Multiplication

Multiplication takes three forms: scalar multiplication, inner product, and outer product. We will define the multiplication rules for vectors first, and then matrices.

A scalar product is a number times a vector or matrix. Let λ be a number:

λx = \begin{pmatrix} λx_1 \\ λx_2 \\ λx_3 \\ \vdots \\ λx_k \end{pmatrix}


The inner product of vectors is the sum of the products of elements of two vectors with the same index. That is, x′a = \sum_{i=1}^{n} x_i a_i. This operation is a row vector times a column vector. The first element of the row is multiplied times the first element of the column. The second element of the row is multiplied times the second element of the column, and that product is added to the product of the first elements. And so on. This operation returns a number. Note: the number of columns of the row vector must equal the number of rows of the column vector.

In statistics this operation is of particular importance. Inner products return sums of squares and sums of cross-products. These are used to calculate variances and covariances.

The outer product of two vectors is the multiplication of a column vector (dimension n) times a row vector (dimension k). The result is a matrix of dimension n by k. The element in the first row of the column vector is multiplied by the element in the first column of the row vector. This produces the element in row 1 and column 1 in the new matrix. Generally, the ith element of the column vector is multiplied times the jth element of the row vector to get the ijth element of the new matrix.

ab′ = \begin{pmatrix} a_1b_1 & a_1b_2 & \cdots & a_1b_k \\ a_2b_1 & a_2b_2 & \cdots & a_2b_k \\ \vdots & & & \\ a_nb_1 & a_nb_2 & \cdots & a_nb_k \end{pmatrix}

In statistics you will frequently encounter the outer product of a vector and itself when we consider the variance of a vector. For example, ee′ yields a matrix of the residuals squared and cross-products:

ee′ = \begin{pmatrix} e_1^2 & e_1e_2 & \cdots & e_1e_n \\ e_2e_1 & e_2^2 & \cdots & e_2e_n \\ \vdots & & & \\ e_ne_1 & e_ne_2 & \cdots & e_n^2 \end{pmatrix}

2.3.3. Determinants

The length of a vector x equals √(x′x). This follows immediately from the Pythagorean Theorem. Consider a vector of dimension 2. The square of the hypotenuse is the sum of the squares of the two sides. The square of the first side is the squared distance traveled on dimension 1 (x_1^2), and the square of the second side is the squared distance traveled on dimension 2 (x_2^2).

The magnitude of a matrix is the absolute value of the determinant of the matrix. The determinant is the (signed) area inscribed by the sum of the column vectors of the matrix. The determinant is only defined for a square matrix.

Consider a 2x2 matrix with elements on the diagonal only:

X = \begin{pmatrix} x_{11} & 0 \\ 0 & x_{22} \end{pmatrix}

The object defined by these vectors is a rectangle, whose area is the base times the height: x_{11}x_{22}.

For a 2x2 matrix the determinant is x_{11}x_{22} - x_{12}x_{21}. This can be derived as follows. The sum of the vectors (x_{11}, x_{12}) and (x_{21}, x_{22}) defines a parallelogram. And the determinant is the formula for the area of the parallelogram, signed by the orientation of the object. To calculate the area of the parallelogram, find the area of the rectangle with sides x_{11} + x_{21} and x_{12} + x_{22}. The parallelogram lies within this rectangle. The area inside the rectangle but not in the parallelogram can be further divided into two identical rectangles and two pairs of identical triangles. One pair of triangles is defined by the first vector and the second pair is defined by the second vector. The rectangles are defined by the first dimension of the second vector and the second dimension of the first vector. Subtracting off the areas of the triangles and rectangles leaves the formula for the determinant (up to the sign).

Note: If the first vector equalled a scalar times the second vector, the determinant would equal 0. This is the problem of multicollinearity: the two vectors (or variables) have the same values and they cannot be distinguished with the observed values at hand. Or, perhaps, they are really just the same variables.


Let us generalize this by considering, first, a "diagonal matrix":

X = \begin{pmatrix} x_{11} & 0 & 0 & \cdots & 0 \\ 0 & x_{22} & 0 & \cdots & 0 \\ 0 & 0 & x_{33} & \cdots & 0 \\ \vdots & & & & \\ 0 & 0 & 0 & \cdots & x_{nn} \end{pmatrix}

This is an n-dimensional box. Its area equals the product of its sides: x_{11}x_{22} \cdots x_{nn}.

For any matrix, the determinant can be computed by "expanding" the cofactors along a given row (or column), say i:

|X| = \sum_{j=1}^{n} x_{ij} (-1)^{i+j} |X_{(ij)}|,

where X_{(ij)}, called the ijth cofactor, is the matrix left upon deleting the ith row and jth column, and |X_{(ij)}| is the determinant of the ijth cofactor.

To verify that this works, consider the 2x2 case. Expand along the first row:

|X| = x_{11}(-1)^{1+1}|x_{22}| + x_{12}(-1)^{1+2}|x_{21}| = x_{11}x_{22} - x_{12}x_{21}
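The cofactor expansion can be implemented directly as a recursion. The sketch below is an illustration, not a practical algorithm (numpy's determinant routine is far more efficient); it expands along the first row and checks the answer against numpy:

```python
import numpy as np

def det_cofactor(X):
    """Determinant by cofactor expansion along the first row."""
    n = X.shape[0]
    if n == 1:
        return X[0, 0]
    total = 0.0
    for j in range(n):
        # X with row 0 and column j deleted (the cofactor matrix)
        minor = np.delete(np.delete(X, 0, axis=0), j, axis=1)
        total += X[0, j] * (-1) ** j * det_cofactor(minor)
    return total

X = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
assert np.isclose(det_cofactor(X), np.linalg.det(X))

# Collinear columns make the determinant 0 (multicollinearity)
Y = np.array([[1.0, 2.0],
              [2.0, 4.0]])
print(det_cofactor(Y))  # 0.0
```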

2.4. Equations and Functions

2.4.1. Functions.

It is important to keep in mind what the input of a function is and what it returns (e.g., a number, a vector, a matrix). Generally, we denote functions as taking a vector or matrix as input and returning a number or matrix: f(y). Linear and Quadratic Forms:

Linear form: Ax
Quadratic form: x′Ax

2.4.2. Equations in Matrix Form.

As in simple algebra, an equation is such that the left side of the equation equals the right side. In vectors or matrices, the equality must hold element by element, and thus the two sides of an equation must have the same dimension.


A simple linear equation is:

Ax = c

or, for matrices:

AX = C

This defines an n-dimensional plane. A quadratic form defines an ellipsoid:

x′Ax = c

Given a specific number c, the values of the x_i's that solve this equation map out an elliptical surface.

2.5. Inverses.

We use division to rescale variables, as an operation (e.g., when creating a new variable that is a ratio of variables), and to solve equations.

The inverse of a matrix is the division operation. The inverse is defined as a matrix, B, such that

AB = I.

The general formula for solving this equation is

b_{ij} = \frac{(-1)^{i+j} |A_{(ji)}|}{|A|},

where |A_{(ji)}| is the determinant of the jith cofactor of A.

Consider the 2x2 case:

\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}

This operation produces 4 equations in 4 unknowns (the b's):

a_{11}b_{11} + a_{12}b_{21} = 1
a_{21}b_{11} + a_{22}b_{21} = 0


a_{11}b_{12} + a_{12}b_{22} = 0
a_{21}b_{12} + a_{22}b_{22} = 1

Solving these equations:

b_{11} = \frac{a_{22}}{a_{11}a_{22} - a_{12}a_{21}}, b_{12} = \frac{-a_{12}}{a_{11}a_{22} - a_{12}a_{21}}, b_{21} = \frac{-a_{21}}{a_{11}a_{22} - a_{12}a_{21}}, b_{22} = \frac{a_{11}}{a_{11}a_{22} - a_{12}a_{21}}.

You can verify that the general formula above leads to the same solutions.
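The closed-form 2x2 solutions can be verified numerically (the values here are invented for the example):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [2.0, 4.0]])
a11, a12 = A[0]
a21, a22 = A[1]

det = a11 * a22 - a12 * a21  # 10.0

# The closed-form solutions for the b's derived above
B = np.array([[ a22, -a12],
              [-a21,  a11]]) / det

# B is the inverse: AB = I, and it matches numpy's inverse routine
assert np.allclose(A @ B, np.eye(2))
assert np.allclose(B, np.linalg.inv(A))
```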


2.6. Statistical Concepts in Matrix Form

2.6.1. Probability Concepts

We generalize the concept of a random variable to a random vector, x, each of whose elements is a random variable, x_i. The joint density function of x is f(x). The cumulative density function of x is the area under the density function up to a point a_1, a_2, ..., a_n. This is an n-fold integral:

F(a) = \int_{-\infty}^{a_n} \cdots \int_{-\infty}^{a_1} f(x) \, dx_1 \cdots dx_n

For example, suppose that x is uniform on the interval (0,1). This density function is: f(x) = 1 if 0 < x ≤ 1, and 0 otherwise.


The variance-covariance matrix of x is defined as the expected value of the outer product of the deviations of the vector from its mean:

E[(x-μ)(x-μ)′] = \begin{pmatrix} E[(x_1-μ_1)^2] & E[(x_1-μ_1)(x_2-μ_2)] & \cdots & E[(x_1-μ_1)(x_n-μ_n)] \\ E[(x_2-μ_2)(x_1-μ_1)] & E[(x_2-μ_2)^2] & \cdots & E[(x_2-μ_2)(x_n-μ_n)] \\ \vdots & & & \\ E[(x_n-μ_n)(x_1-μ_1)] & E[(x_n-μ_n)(x_2-μ_2)] & \cdots & E[(x_n-μ_n)^2] \end{pmatrix}

= \begin{pmatrix} σ_1^2 & σ_{12} & σ_{13} & \cdots & σ_{1n} \\ σ_{12} & σ_2^2 & σ_{23} & \cdots & σ_{2n} \\ \vdots & & & \\ σ_{1n} & σ_{2n} & σ_{3n} & \cdots & σ_n^2 \end{pmatrix} = Σ
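On sample data, averaging the outer products of the deviations from the mean produces the estimated variance-covariance matrix. A numpy sketch, using simulated data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
# Draw n observations of a 2-dimensional random vector
x = rng.normal(size=(n, 2))

mu_hat = x.mean(axis=0)

# Average of the outer products of the deviations: (1/n) sum (x - mu)(x - mu)'
dev = x - mu_hat
Sigma_hat = sum(np.outer(d, d) for d in dev) / n

# Same answer as the vectorized formula dev'dev / n ...
assert np.allclose(Sigma_hat, dev.T @ dev / n)
# ... and as numpy's covariance routine (bias=True divides by n rather than n-1)
assert np.allclose(Sigma_hat, np.cov(x.T, bias=True))
```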

2.6.1.2. The Normal and Related Distributions

The first two moments are sufficient to characterize a wide range of distribution functions. Most importantly, the mean vector and the variance-covariance matrix characterize the normal distribution.

f(x) = (2π)^{-n/2} |Σ|^{-1/2} e^{-\frac{1}{2}(x-μ)′Σ^{-1}(x-μ)}

If the x_i are independent (0 covariance) and identical (same variances and means), then Σ simplifies to σ^2 I, and the normal becomes:

f(x) = (2π)^{-n/2} σ^{-n} e^{-\frac{1}{2σ^2}(x-μ)′(x-μ)}

A linear transformation of a normal random variable is also normal. Let A be a matrix of constants, c be a vector of constants, and x be a random vector distributed N(μ, Σ); then

Ax + c ∼ N(Aμ + c, AΣA′)

We may define the standard normal as follows. Let A be a matrix such that AA′ = Σ. Let c = μ. Let z be a random vector such that x = Az + μ. Then z = A^{-1}(x - μ). Because the new variable is a linear transformation of x, we know that the result will be normal. What are the mean and variance?

E[z] = E[A^{-1}(x - μ)] = A^{-1}E(x - μ) = 0


V(z) = E[(A^{-1}(x - μ))(A^{-1}(x - μ))′] = E[A^{-1}(x - μ)(x - μ)′A′^{-1}] = A^{-1}E[(x - μ)(x - μ)′]A′^{-1} = A^{-1}ΣA′^{-1} = I.

The last equality holds because AA′ = Σ.
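The construction x = Az + μ with AA′ = Σ can be illustrated with a Cholesky factor, which is one matrix A satisfying AA′ = Σ. A numpy sketch on simulated data (the specific μ and Σ are invented):

```python
import numpy as np

rng = np.random.default_rng(1)

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

# The Cholesky factor satisfies A A' = Sigma, so x = Az + mu is N(mu, Sigma)
A = np.linalg.cholesky(Sigma)
assert np.allclose(A @ A.T, Sigma)

z = rng.normal(size=(2000, 2))   # standard normal draws (one per row)
x = z @ A.T + mu                 # correlated draws with mean mu, variance Sigma

# Standardize back: z = A^{-1}(x - mu) recovers the standard normal draws
z_back = (x - mu) @ np.linalg.inv(A).T
assert np.allclose(z_back.mean(axis=0), 0, atol=0.1)
assert np.allclose(np.cov(z_back.T, bias=True), np.eye(2), atol=0.1)
```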

From the normal we derive the other distributions of use in statistical inference. The two key distributions are the χ² and the F.

The quadratic form of a normally distributed random vector will follow the χ² distribution. Assume a standard normal vector z. Then

z′z ∼ χ²_n

because this is the sum of the squares of n normal random variables with mean 0 and variance 1. Alternatively, if x is a normal random vector with mean μ and variance Σ, then (x-μ)′Σ^{-1}(x-μ) follows a χ² with n degrees of freedom.

The ratio of two quadratic forms will follow the F distribution. Let A and B be two matrices such that AB = 0. Let A and B have r_a and r_b independent columns, respectively (i.e., ranks). Let x be a random vector with variance σ²I; then

\frac{(x′Ax/σ²)/r_a}{(x′Bx/σ²)/r_b} ∼ F[r_a, r_b]

In statistical inference about regressions we will treat the vector of coefficients as a random vector, b. We will construct hypotheses about sets of coefficients from a regression, and express those in the form of the matrices A and B. The χ²-statistic amounts to comparing squared deviations of the estimated parameters from the hypothesized parameters. The F-statistic compares two models, i.e., two different sets of restrictions.

2.6.2.3. Conditional Distributions.


For completeness, I present the conditional distributions. These are frequently used in Bayesian statistical analysis; they are also used as a theoretical framework within which to think about regression. They won't be used much in what we do in this course.

To define conditional distributions we will need to introduce an additional concept, partitioning. A matrix can be subdivided into smaller matrices. For example, an n x n matrix A may be written as, say, four submatrices A_{11}, A_{12}, A_{21}, A_{22}:

A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}

We can write the inverse of A as the inverse of the partitions as follows:

A^{-1} = \begin{pmatrix} A_{11}^{-1}(I + A_{12}FA_{21}A_{11}^{-1}) & -A_{11}^{-1}A_{12}F \\ -FA_{21}A_{11}^{-1} & F \end{pmatrix},

where F = (A_{22} - A_{21}A_{11}^{-1}A_{12})^{-1}. We can use the partitioning results to characterize the conditional distributions.

Consider two normally distributed random vectors x_1, x_2. The density function for x_1 is

f(x_1) ∼ N(μ_1, Σ_{11}).

The density function for x_2 is

f(x_2) ∼ N(μ_2, Σ_{22}).

Their joint distribution is

f(x_1, x_2) ∼ N(μ, Σ).

The variance matrix can be written as follows:

Σ = \begin{pmatrix} Σ_{11} & Σ_{12} \\ Σ_{21} & Σ_{22} \end{pmatrix}

where Σ_{jk} = E[(x_j - μ_j)(x_k - μ_k)′] and j = 1, 2 and k = 1, 2.

The conditional distribution is arrived at by dividing the joint density by the marginal density of x_1, yielding

x_2 | x_1 ∼ N(μ_{2.1}, Σ_{22.1})

where μ_{2.1} = μ_2 + Σ_{21}Σ_{11}^{-1}(x_1 - μ_1) and Σ_{22.1} = Σ_{22} - Σ_{21}Σ_{11}^{-1}Σ_{12}.


2.6.2. Regression model in matrix form.

1. Linear Model

The linear regression model consists of n equations. Each equation expresses how the value of y for a given observation depends on the values of the x's and ε_i:

y_1 = β_0 + β_1X_{11} + β_2X_{21} + ... + β_kX_{k1} + ε_1
y_2 = β_0 + β_1X_{12} + β_2X_{22} + ... + β_kX_{k2} + ε_2
...
y_i = β_0 + β_1X_{1i} + β_2X_{2i} + ... + β_kX_{ki} + ε_i
...
y_n = β_0 + β_1X_{1n} + β_2X_{2n} + ... + β_kX_{kn} + ε_n

This can be rewritten using matrices. Let y be the random vector of the dependent variable and ε be the random vector of the errors. The matrix X contains the values of the independent variables. The rows are the observations and the columns are the variables. To capture the intercept, the first column is a column of 1's.

X = \begin{pmatrix} 1 & x_{11} & \cdots & x_{k1} \\ 1 & x_{12} & \cdots & x_{k2} \\ 1 & x_{13} & \cdots & x_{k3} \\ \vdots & & & \\ 1 & x_{1n} & \cdots & x_{kn} \end{pmatrix}

Let β be a vector of k + 1 constants: β′ = (β_0, β_1, β_2, ..., β_k). Then Xβ produces a vector in which each entry is a weighted average of the values of X_{1i}, ..., X_{ki} for a given observation i, where the weights are the β's.

Xβ + ε = \begin{pmatrix} β_0 + β_1X_{11} + β_2X_{21} + \cdots + β_kX_{k1} + ε_1 \\ β_0 + β_1X_{12} + β_2X_{22} + \cdots + β_kX_{k2} + ε_2 \\ \vdots \\ β_0 + β_1X_{1i} + β_2X_{2i} + \cdots + β_kX_{ki} + ε_i \\ \vdots \\ β_0 + β_1X_{1n} + β_2X_{2n} + \cdots + β_kX_{kn} + ε_n \end{pmatrix}


With this notation, we can express the linear regression model more succinctly as:

y = Xβ + ε

2. Assumptions about Errors

A1. Errors have mean 0: E[ε] = 0.

A2. Spherical distribution of errors (no autocorrelation, homoskedastic): E[εε′] = σ²I.

A3. Independence of X and ε: E[X′ε] = 0.

3. Method of Moments Estimator

The empirical moments implied by Assumptions 1 and 3 are that

i′e = 0
X′e = 0

where e = y - Xb. We can use these to derive the method of moments estimator, b, i.e., the formula we use to make our best guess about the value of β based on the observed y and X.

Substitute the definition of e into the second equation. This yields

X′(y - Xb) = 0

Hence, X′y - X′Xb = 0, which we can rewrite as (X′X)b = X′y. This is a system of linear equations. X′X is a k x k matrix. We can invert it to solve the system:

b = (X′X)^{-1}X′y
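A minimal numpy sketch of the estimator (the data are invented for illustration; np.linalg.solve is used rather than an explicit inverse, which is numerically preferable):

```python
import numpy as np

# Tiny illustrative data set: n = 5 observations, 2 regressors plus a constant
X = np.array([[1.0,  2.0, 1.0],
              [1.0,  4.0, 3.0],
              [1.0,  6.0, 2.0],
              [1.0,  8.0, 5.0],
              [1.0, 10.0, 4.0]])
y = np.array([3.0, 7.0, 8.0, 14.0, 15.0])

# b solves the normal equations (X'X)b = X'y, i.e. b = (X'X)^{-1} X'y
b = np.linalg.solve(X.T @ X, X.T @ y)

# The moment conditions hold at b: X'e = 0 (the first row of X'e is i'e = 0)
e = y - X @ b
assert np.allclose(X.T @ e, 0)
```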


Conceptually, this is the covariance(s) between y and X divided by the variance of X, which is the same idea as the formula for bivariate regression. The exact formula for the coefficient on any given variable, say X_j, will not be the same as the formula for the coefficient from the bivariate regression of Y on X_j. We must adjust for the other factors affecting Y that are correlated with X_j (i.e., the other X variables).

Consider a case with 2 independent variables and a constant. For simplicity, let us deviate all variables from their means: this eliminates the constant term, but does not change the solution for the slope parameters. So the vector x_j is really x_j - x̄_j. Using the inner products of vectors to denote the sums in each element, we can express the relevant matrices as follows.

X′X = \begin{pmatrix} x_1′x_1 & x_1′x_2 \\ x_2′x_1 & x_2′x_2 \end{pmatrix}

(X′X)^{-1} = \frac{1}{(x_1′x_1)(x_2′x_2) - (x_1′x_2)^2} \begin{pmatrix} x_2′x_2 & -x_1′x_2 \\ -x_2′x_1 & x_1′x_1 \end{pmatrix}

X′y = \begin{pmatrix} x_1′y \\ x_2′y \end{pmatrix}

Calculation of the formula for b consists of multiplying the (X′X)^{-1} matrix times the vector X′y. This is analogous to dividing the covariance of X and Y by the variance of X.

b = \frac{1}{(x_1′x_1)(x_2′x_2) - (x_1′x_2)^2} \begin{pmatrix} x_2′x_2 & -x_1′x_2 \\ -x_2′x_1 & x_1′x_1 \end{pmatrix} \begin{pmatrix} x_1′y \\ x_2′y \end{pmatrix}

\begin{pmatrix} b_1 \\ b_2 \end{pmatrix} = \begin{pmatrix} \frac{(x_2′x_2)(x_1′y) - (x_1′x_2)(x_2′y)}{(x_1′x_1)(x_2′x_2) - (x_1′x_2)^2} \\ \frac{(x_1′x_1)(x_2′y) - (x_1′x_2)(x_1′y)}{(x_1′x_1)(x_2′x_2) - (x_1′x_2)^2} \end{pmatrix}


2.7. Differentiation and Optimization

We will encounter differentiation in statistics in estimation, where we wish to minimize squared errors or maximize likelihood, and in developing approximations, especially Taylor's Theorem.

The primary optimization problems take one of the following two forms.

First, least squares consists of choosing the values of b that minimize the sum of squared errors with respect to β:

S = ε′ε = (y - Xβ)′(y - Xβ)

This is a quadratic form in β.

Second, maximum likelihood estimation consists of choosing the values of the parameters that maximize the probability of observing the data. If the errors are assumed to come from the normal distribution, the likelihood of the data is expressed as the joint density of the errors:

L(β, σ²) = (2π)^{-n/2} σ^{-n} e^{-\frac{1}{2σ²} ε′ε} = (2π)^{-n/2} σ^{-n} e^{-\frac{1}{2σ²}(y - Xβ)′(y - Xβ)}

This function is very non-linear, but it can be expressed as a quadratic form after taking logarithms:

ln(L) = -\frac{n}{2} ln(2π) - n \, ln(σ) - \frac{1}{2σ²}(y - Xβ)′(y - Xβ)

The objective in both problems is to find a vector b that optimizes the objective functions with respect to β. In the least squares case we are minimizing and in the likelihood case we are maximizing. In both cases, the functions involved are continuous, so the first and second order conditions for a maximum will allow us to derive the optimal b.

Although the likelihood problem assumes normality, the approach can be adapted to many different probability densities, and provides a fairly flexible framework for deriving estimates.


Regardless of the problem, though, maximizing likelihood or minimizing error involves differentiation of the objective function. I will teach you the basic rules used in statistical applications. They are straightforward but involve some care in accounting for the indexes and dimensionality.

2.7.1. Partial Differentiation: Gradients and Hessians.

Let y = f(x). The outcome variable y changes along each of the x_i dimensions. We may express the set of first partial derivatives as a vector, called the gradient:

\frac{∂y}{∂x} = \begin{pmatrix} ∂f(x)/∂x_1 \\ ∂f(x)/∂x_2 \\ \vdots \\ ∂f(x)/∂x_n \end{pmatrix}

In statistics we will encounter linear forms and quadratic forms frequently. For example, the regression formula is a linear form and the sum of squared errors is expressed as a quadratic form. The gradients for the linear and quadratic forms are as follows:

\frac{∂Ax}{∂x} = A′

\frac{∂x′Ax}{∂x} = (A + A′)x

\frac{∂x′Ax}{∂A} = xx′

    @AConsiderthelinearform. WebuildupfromthespecialcasewhereAisavectora. Then,

    Pny=a x= i=1aixi. Thegradient returnsavectorofcoecients:P0 @ n aixi 1 0 1

    i=1 @x1 a1PB n C B CB CB @ i=1 aixi C Ba2CB CB @x2 C@a0x B C B : C

    =B : C=B C =aB CB C B : C@x B : C B CB C@ A @ : AP :n

    aixi an@ i=1 @xn

    Note: aistransposed inthelinearform,butnotinthegradient.20
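These gradient rules can be checked against finite differences. The sketch below (an illustration, not part of the original notes) compares the analytic gradients of the linear and quadratic forms to a numerical approximation:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(3, 3))
x = rng.normal(size=3)

def num_grad(f, x, h=1e-6):
    """Central finite-difference approximation of the gradient of f at x."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        step = np.zeros_like(x)
        step[i] = h
        g[i] = (f(x + step) - f(x - step)) / (2 * h)
    return g

# Quadratic form: d(x'Ax)/dx = (A + A')x
g_quad = num_grad(lambda v: v @ A @ v, x)
assert np.allclose(g_quad, (A + A.T) @ x, atol=1e-4)

# Linear form with a vector a: d(a'x)/dx = a
a = rng.normal(size=3)
g_lin = num_grad(lambda v: a @ v, x)
assert np.allclose(g_lin, a, atol=1e-4)
```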


More generally, the gradient of y = Ax with respect to x returns the transpose of the matrix A. Any element, y_i, equals a row vector of A, a_i, times the column vector x. Differentiation of the ith element of y returns a column vector that is the transpose of the ith row vector of A. The gradient of each element of y returns a new column vector. As a result, the rows of A become the columns of the matrix of gradients of the linear form.

The Hessian is the matrix of second partial derivatives:

H = \begin{pmatrix} \frac{∂²f(x)}{∂x_1∂x_1} & \frac{∂²f(x)}{∂x_1∂x_2} & \cdots & \frac{∂²f(x)}{∂x_1∂x_n} \\ \frac{∂²f(x)}{∂x_2∂x_1} & \frac{∂²f(x)}{∂x_2∂x_2} & \cdots & \frac{∂²f(x)}{∂x_2∂x_n} \\ \vdots & & & \\ \frac{∂²f(x)}{∂x_n∂x_1} & \frac{∂²f(x)}{∂x_n∂x_2} & \cdots & \frac{∂²f(x)}{∂x_n∂x_n} \end{pmatrix}

Note: This is a symmetric matrix. Each column is the derivative of the function f with respect to the transpose of x. So the Hessian is often written:

H = [f_{ij}] = \frac{∂²y}{∂x∂x′}.

Hessians are commonly used in Taylor Series approximations, such as in the Delta Technique to approximate variances. The second order term of the Taylor Series approximation can be expressed as follows:

\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}(x_i - x_{0i})(x_j - x_{0j}) f_{ij}(x_0) = \frac{1}{2}(x - x_0)′H(x_0)(x - x_0)

They are also important in maximum likelihood estimation, where the Hessian is used

to construct the variance of the estimator. For the case of a normally distributed random vector with mean μ and variance σ², the Hessian is

H = \begin{pmatrix} -\frac{n}{σ^2} & -\frac{1}{σ^4}\sum_{i=1}^{n}(x_i - μ) \\ -\frac{1}{σ^4}\sum_{i=1}^{n}(x_i - μ) & \frac{n}{2σ^4} - \frac{1}{σ^6}\sum_{i=1}^{n}(x_i - μ)^2 \end{pmatrix}

The expected value of the inverse of this matrix is the variance-covariance matrix of the estimated parameters.

2.7.2. Optimization.

At an optimum the gradient equals 0: in all directions the rate of change is 0. This condition is called the first order condition. To check whether a given


point is a maximum or a minimum requires checking second order conditions. I will not examine second order conditions, but you are welcome to read about and analyze them in the textbook.

The first order condition amounts to a system of equations. At an optimum, there is a vector that solves the minimization or maximization problem in question. That vector is arrived at by setting the gradient equal to zero and solving the system of equations.

Consider the least squares problem. Find the value of β = b that minimizes:

S = (y - Xβ)′(y - Xβ)

Using the rules of transposition, we can rewrite this as follows:

S = (y′ - β′X′)(y - Xβ) = y′y - β′X′y - y′Xβ + β′X′Xβ = y′y - 2y′Xβ + β′X′Xβ

The gradient of S with respect to β consists of derivatives of quadratic and linear forms. Using the rules above:

\frac{∂S}{∂β} = 2X′Xβ - 2X′y

At a minimum, the value b guarantees that the gradient equals 0:

\frac{∂S}{∂β} = 2X′Xb - 2X′y = 0

The solution in terms of b is

b = (X′X)^{-1}X′y

For the maximum likelihood problem, the first order conditions are:

\frac{∂ln(L)}{∂β} = -\frac{1}{2σ̂²}(2X′Xβ̂ - 2X′y) = 0

\frac{∂ln(L)}{∂σ} = -nσ̂^{-1} + σ̂^{-3}(y - Xβ̂)′(y - Xβ̂) = 0

The solution is

β̂ = (X′X)^{-1}X′y

σ̂² = \frac{1}{n}(y - Xβ̂)′(y - Xβ̂) = \frac{1}{n} e′e

where e is the vector of residuals.
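A numpy sketch verifying the first order conditions on simulated data (the data-generating values are invented for the example):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 200, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=1.5, size=n)

# beta-hat solves the same normal equations as least squares
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat

# sigma-hat^2 = e'e / n from the sigma first-order condition
sigma2_hat = (e @ e) / n

# The beta first-order condition: 2X'X beta-hat - 2X'y = 0
assert np.allclose(2 * X.T @ X @ beta_hat - 2 * X.T @ y, 0, atol=1e-6)

# The sigma first-order condition: -n/sigma-hat + e'e/sigma-hat^3 = 0
sigma_hat = np.sqrt(sigma2_hat)
assert np.isclose(-n / sigma_hat + (e @ e) / sigma_hat**3, 0)
```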
