missing data ppt
DESCRIPTION
This ppt is about missing data and how to handle this type of data with the appropriate handling techniques.
TRANSCRIPT
7/18/2019 Missing Data Ppt
http://slidepdf.com/reader/full/missing-data-ppt 1/35
Missing data & how to
handle it
Arooj Arshad
PhD Scholar
• Carol Dweck, based on research on belief systems and their role in motivation and achievement, has made a key contribution in originating and explaining implicit theories of intelligence/ability.
Missing data and how to handle it!
Goals
• Discuss ways to evaluate and understand missing data
• Discuss common missing data methods
• Know the advantages and disadvantages of common methods
• Treatment of the missing data
• Efficient ways of missing data handling
Reasons of Missing Data
Missing data can occur for many reasons:
• Participants can fail to respond to questions (legitimately or illegitimately; more on that later),
• Equipment and data collecting or recording mechanisms can malfunction,
• Subjects can withdraw from studies before they are completed,
• Data entry errors can occur.
Difference between missing and legitimate missing data
Missing Data
If any data on any variable from any participant is not present, the researcher is dealing with missing or incomplete data.
Example: A missing response on a particular item that assesses a particular construct.
Legitimate Missing Data
Legitimate missing data is an absence of data when it is appropriate for there to be an absence.
Example: A survey asks whether you are married and, if so, how long you have been married. If you say you are not married, it is legitimate for you to skip the follow-up question.
(Cole, 2008)
• Methods for analyzing missing data require assumptions about the nature of the data and about the reasons for the missing observations that are often not acknowledged.
• Reviewing the stages of data collection, data preparation, data analysis, and interpretation of results will highlight the issues that researchers must consider in making a decision about how to handle missing data in their work.
Key Elements of missingness
• The number of cases missing per variable.
• The number of variables missing per case.
• The pattern of correlation among variables.
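The first two elements listed above can be tallied directly. The following is a minimal sketch on hypothetical toy data (the variable names and values are assumptions, not from the slides). It also counts the distinct missingness patterns, which is one simple way to start looking at how missingness co-occurs across variables.

```python
from collections import Counter

# Hypothetical toy data: each row is a case, None marks a missing value.
rows = [
    {"age": 25, "income": 40000, "score": 3.5},
    {"age": 31, "income": None, "score": 4.0},
    {"age": None, "income": None, "score": 2.5},
    {"age": 47, "income": 52000, "score": None},
]
variables = ["age", "income", "score"]

# Number of cases missing per variable.
missing_per_variable = {v: sum(r[v] is None for r in rows) for v in variables}

# Number of variables missing per case.
missing_per_case = [sum(r[v] is None for v in variables) for r in rows]

# Missingness patterns: one 0/1 flag per variable, tallied over all cases.
patterns = Counter(tuple(int(r[v] is None) for v in variables) for r in rows)

print(missing_per_variable)
print(missing_per_case)
print(patterns)
```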
Point to be remembered
• All researchers should examine their data for missingness, and researchers wanting the best (i.e., the most replicable and generalizable) results from their research need to be prepared to deal with missing data in the most appropriate and desirable way possible.
• If the proportion of cases with missing data is small, say five percent or less, listwise deletion may be acceptable (Roth, 1994). If those 5% (or fewer) cases are not missing completely at random, however, inconsistent parameter estimates can result. Otherwise, missing data experts (Little & Rubin, 1987) recommend using an ML method for analysis, a method that makes use of all available data points.
Nature of Missingness
• Missing Completely at Random (MCAR)
• Probability of the missing data on Y is unrelated to Y and X. Missingness is random and does not depend on anything.
• Example: the reporting of income by the respondents.
• Checked with the help of Little's MCAR test. The test is based on mean differences across groups of subjects with the same missing data pattern. Readers interested in more detail should read this article (henoi et al., 2012).
• Missing at Random (MAR)
• Probability of missing data on Y is related to X, but not to Y itself once X is taken into account.
• Example: for really sick patients, clinicians may not draw blood for routine labs.
• Missing Not at Random (MNAR)
• Probability of missing data on Y is dependent on the value of Y.
• Example: Respondents with high income are less likely to report income.
Missing Data Consequences
Bias
• Estimate systematically deviates from the quantity of interest.
• No bias if the data are MCAR, but bias can occur when the data are not MCAR.
• Lost data decrease statistical power.
Variance
• Missing data can sometimes lead to wrong standard errors.
• Wrong study conclusions about the relationship of variables to outcomes.
(Roth, 2001)
Commonly Used Missing Data Handling Methods
Commonly Used Missing Data Methods
Deletion Methods
• Listwise/complete case deletion, pairwise deletion
Single Imputation Methods
• Mean/mode substitution, dummy variable method, single regression, Hot Deck Imputation
Model-Based Methods
• Maximum Likelihood, Multiple imputation
Deletion Method
Listwise Deletion (Complete Case Analysis)
• Analyze only cases with complete data, dropping every case that has a missing value on any variable in the analysis.
• When a researcher is estimating a model, such as a linear regression, most statistical packages use listwise deletion by default.
(Cole, 2008)
Listwise Deletion (Complete Case Analysis)
• Advantages
• Ease of implementation.
• Comparability across analyses.
• Disadvantages
• Reduces statistical power (because it lowers n, a researcher cannot anticipate whether an adequate amount of data will remain for the analysis).
• Doesn't use all information.
• Estimates may be biased if data aren't MCAR (complete case analysis assumes that the observed complete cases are a random sample of the originally targeted sample, or in Rubin's (1976) terminology, that the missing data are MCAR).
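As a small illustration of listwise deletion and its cost in sample size, here is a sketch on hypothetical data (the rows and variable names are assumptions). It keeps only the cases with no missing value and reports how much of the sample survives.

```python
# Hypothetical toy data: None marks a missing value.
rows = [
    {"age": 25, "income": 40000, "score": 3.5},
    {"age": 31, "income": None, "score": 4.0},
    {"age": None, "income": 38000, "score": 2.5},
    {"age": 47, "income": 52000, "score": None},
    {"age": 52, "income": 61000, "score": 4.5},
]

# Listwise deletion: keep a case only if every variable is observed.
complete_cases = [r for r in rows if None not in r.values()]

n_before = len(rows)
n_after = len(complete_cases)
fraction_retained = n_after / n_before

print(n_before, n_after, fraction_retained)
```

Each dropped case here is missing only one value, yet all of its observed values are discarded as well, which is the "doesn't use all information" disadvantage noted above.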
Pairwise deletion (Available Case Analysis)
• Analysis with all cases in which the variables of interest are present.
• Advantages:
• Keeps as many cases as possible for each analysis.
• Uses all information possible with each analysis.
• Disadvantage:
• Can't compare analyses because the sample is different each time.
(Cole, 2008)
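The changing sample size under pairwise deletion can be seen in a short sketch. The data and the hand-written `pearson` helper below are illustrative assumptions. Each correlation uses every case in which both variables of that pair are observed.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy data: None marks a missing value.
data = {
    "x": [1.0, 2.0, 3.0, 4.0, None, 6.0],
    "y": [2.0, None, 6.5, 8.0, 10.0, 12.5],
    "z": [5.0, 3.0, None, 2.0, 1.0, None],
}

def pearson(a, b):
    """Pearson correlation of two equally long sequences of numbers."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / sqrt(va * vb)

# For each pair of variables, use all cases where both are present.
pairwise = {}
for u, v in combinations(data, 2):
    pairs = [(p, q) for p, q in zip(data[u], data[v])
             if p is not None and q is not None]
    a, b = zip(*pairs)
    pairwise[(u, v)] = (len(pairs), pearson(a, b))

for pair, (n, r) in pairwise.items():
    print(pair, "n =", n, "r =", round(r, 3))
```

The n differs from pair to pair, which is why results from different analyses are hard to compare.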
Hot-Deck Imputation
• The researcher replaces a missing value with the actual score from a similar case in the current data set.
• The imputed score is termed "hot" because it comes from the data set currently being used by the computer.
Advantages
• Tends to increase accuracy because missing data values are replaced by realistic values.
• Particularly helpful when data are missing in certain patterns.
Disadvantages
• No. of classification variables may become unmanageable in large surveys.
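One possible way to pick the "similar case" is sketched below. It assumes a single classification variable (`group`) and a nearest-neighbour match on an auxiliary variable (`age`). Both choices, the `hot_deck` function name and the data are hypothetical; real hot-deck procedures vary in how donors are selected.

```python
# Hypothetical toy data: None marks a missing score.
cases = [
    {"id": 1, "group": "A", "age": 23, "score": 10},
    {"id": 2, "group": "A", "age": 25, "score": None},
    {"id": 3, "group": "A", "age": 40, "score": 18},
    {"id": 4, "group": "B", "age": 31, "score": 7},
    {"id": 5, "group": "B", "age": 33, "score": None},
]

def hot_deck(cases, target, class_var, match_var):
    """Fill missing `target` values with the observed value of the donor in
    the same class whose `match_var` is closest. Returns new dicts."""
    donors = [c for c in cases if c[target] is not None]
    imputed = []
    for c in cases:
        new = dict(c)  # copy so the original data stay untouched
        if c[target] is None:
            pool = [d for d in donors if d[class_var] == c[class_var]]
            if pool:  # leave the value missing if no donor exists
                donor = min(pool, key=lambda d: abs(d[match_var] - c[match_var]))
                new[target] = donor[target]
        imputed.append(new)
    return imputed

filled = hot_deck(cases, "score", "group", "age")
for row in filled:
    print(row)
```

With more classification variables the donor pools shrink quickly, which is the manageability problem mentioned above.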
Single Imputation Methods
Single Imputation Methods
• Mean/Mode substitution
• Dummy variable control
• Conditional mean substitution
Mean/Mode Substitution
• Replace missing value with sample mean or mode.
• Run analyses as if all cases were complete.
Advantages
• Can use complete case analysis methods.
Disadvantages
• Reduces variability (underestimates standard errors).
• Weakens covariance and correlation estimates in the data (because it ignores the relationship between variables).
• Computed variance estimates decrease as more means are added to the calculations.
• For example, a researcher might have 30 subjects, but 5 have missing data. Through mean substitution we add 5 means to the 25 scores. This would increase the N in the calculation of the variance but would not increase the deviations around the mean.
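The 30-subject example can be checked numerically. The sketch below uses 25 made-up scores (the values 1 to 25 are an assumption chosen for convenience) and appends 5 copies of their mean. The sum of squared deviations stays the same while the divisor grows, so the variance estimate shrinks.

```python
from statistics import mean, variance

# 25 hypothetical observed scores; 5 further subjects have missing data.
observed = [float(v) for v in range(1, 26)]
m = mean(observed)

# Mean substitution: the 5 missing scores are replaced by the sample mean.
filled = observed + [m] * 5

# Sum of squared deviations around the mean, before and after.
ss_observed = sum((v - m) ** 2 for v in observed)
ss_filled = sum((v - mean(filled)) ** 2 for v in filled)

# Sample variance divides by N - 1: 24 before, 29 after.
var_observed = variance(observed)
var_filled = variance(filled)

print(ss_observed, ss_filled)
print(var_observed, var_filled)
```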
• Mean substitution is worth considering when correlations between variables in the data are low and less than 10% of the data are missing (Donner, 1982).
Dummy Variable Adjustment
• Create an indicator for missing value (1 = value is missing for observation; 0 = value is observed for observation).
• Impute missing values to a constant (such as the mean).
Advantage
• Uses all available information about missing observation.
Disadvantages
• Results in biased estimates.
• Not theoretically driven.
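The two steps above amount to building one extra column and filling the gaps with a constant. Here is a minimal sketch with made-up income values (an assumption), using the observed mean as the constant.

```python
from statistics import mean

# Hypothetical predictor with missing values marked as None.
income = [30.0, None, 50.0, None, 40.0]

# Step 1: indicator, 1 = value is missing, 0 = value is observed.
missing_flag = [1 if v is None else 0 for v in income]

# Step 2: impute missing values to a constant (here, the observed mean).
fill_value = mean(v for v in income if v is not None)
income_filled = [fill_value if v is None else v for v in income]

print(missing_flag)
print(income_filled)
```

In a regression, both `income_filled` and `missing_flag` would then be entered as predictors.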
Regression Imputation
• Replaces missing values with the predicted score from a regression equation.
Advantage:
• Uses information from observed data.
Disadvantages:
• Overestimates model fit and correlation estimates.
• Weakens variance.
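A minimal sketch of regression imputation with one predictor follows. The data and the `pearson` helper are illustrative assumptions. The line is fitted by ordinary least squares on the complete cases and used to predict the missing y values. Because every imputed point sits exactly on the line, the correlation in the filled-in data cannot be lower than in the complete cases.

```python
from math import sqrt

# Hypothetical data: x is fully observed, y has missing values (None).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, None, 8.2, None, 11.8]

def pearson(a, b):
    """Pearson correlation of two equally long sequences of numbers."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    return cov / sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))

# Fit y = intercept + slope * x on the complete cases only.
pairs = [(a, b) for a, b in zip(x, y) if b is not None]
n = len(pairs)
mx = sum(a for a, _ in pairs) / n
my = sum(b for _, b in pairs) / n
slope = sum((a - mx) * (b - my) for a, b in pairs) / sum((a - mx) ** 2 for a, _ in pairs)
intercept = my - slope * mx

# Replace each missing y with its predicted score.
y_imputed = [intercept + slope * a if b is None else b for a, b in zip(x, y)]

r_complete = pearson([a for a, _ in pairs], [b for _, b in pairs])
r_filled = pearson(x, y_imputed)
print(y_imputed)
print(r_complete, r_filled)
```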
Model-Based Methods
Model-Based Methods
• Maximum Likelihood Using EM algorithm
• Multiple imputation
• These methods share two assumptions: that the joint distribution of the data is multivariate normal, and that the missing data mechanism is ignorable.
Maximum Likelihood Using EM algorithm
• Identifies the set of parameter values that produces the highest log-likelihood.
• ML estimate: value that is most likely to have resulted in the observed data.
• Conceptually, the process is the same with or without missing data.
Advantages:
• Uses full information (both complete cases and incomplete cases) to calculate the log likelihood.
• Unbiased parameter estimates with MCAR/MAR data.
Disadvantages
• SEs biased downward; can be adjusted by using the observed information matrix.
• We can base estimation on the likelihood of the observed data.
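To make the EM idea concrete, here is a minimal sketch for the simplest useful case: a bivariate normal model in which x is fully observed and only y has missing values. The data, the function name `em_bivariate` and the starting values are assumptions made for illustration. The E-step replaces the missing pieces of the sufficient statistics (y and y squared) by their expectations given x and the current parameters, and the M-step re-estimates the parameters from the completed statistics.

```python
def em_bivariate(x, y, n_iter=200):
    """EM estimates for a bivariate normal when x is complete and
    y contains None for missing values. Variances use divisor n (ML)."""
    n = len(x)
    y_obs = [b for b in y if b is not None]
    # x is complete, so its ML mean and variance never change.
    mu_x = sum(x) / n
    s_xx = sum((a - mu_x) ** 2 for a in x) / n
    # Starting values: observed-y moments and zero covariance.
    mu_y = sum(y_obs) / len(y_obs)
    s_yy = sum((b - mu_y) ** 2 for b in y_obs) / len(y_obs)
    s_xy = 0.0
    for _ in range(n_iter):
        # E-step: expected y and y^2 for the missing cases.
        beta = s_xy / s_xx
        resid_var = s_yy - beta * s_xy
        sum_y = sum_yy = sum_xy = 0.0
        for a, b in zip(x, y):
            if b is None:
                ey = mu_y + beta * (a - mu_x)
                eyy = ey ** 2 + resid_var
            else:
                ey, eyy = b, b ** 2
            sum_y += ey
            sum_yy += eyy
            sum_xy += a * ey
        # M-step: update the parameters from the expected statistics.
        mu_y = sum_y / n
        s_yy = sum_yy / n - mu_y ** 2
        s_xy = sum_xy / n - mu_x * mu_y
    return {"mu_x": mu_x, "mu_y": mu_y, "s_xx": s_xx, "s_xy": s_xy, "s_yy": s_yy}

# Hypothetical data with two missing y values.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.2, 1.9, 3.2, 3.8, None, 6.1, None, 8.3]
est = em_bivariate(x, y)
print(est)
```

Note that the E-step adds the residual variance to the expected y squared. Plugging in predicted values alone would understate the variance of y.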
Multiple Imputation
• Impute: Data are "filled in" with imputed values using a specified regression model.
• This step is repeated m times, resulting in a separate dataset each time.
• Analyze: Analyses are performed within each dataset.
• Pool: Results are pooled into one estimate.
• The variances are pooled with Donald Rubin's formula:
• V = W + (1 + 1/m) B.
• W and B are the within- and between-imputation variances.
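The pooling step can be written in a few lines. This sketch assumes each of the m analyses has produced a point estimate and its squared standard error. The numbers and the function name `pool_rubin` are made up for illustration.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool m estimates and their variances: V = W + (1 + 1/m) * B."""
    m = len(estimates)
    q_bar = mean(estimates)   # pooled point estimate
    w = mean(variances)       # within-imputation variance W
    b = variance(estimates)   # between-imputation variance B (divisor m - 1)
    v = w + (1 + 1 / m) * b   # total variance V
    return q_bar, w, b, v

# Hypothetical results from m = 5 imputed datasets.
estimates = [10.0, 12.0, 11.0, 9.0, 13.0]
variances = [4.0, 5.0, 4.5, 4.0, 5.5]

q_bar, w, b, v = pool_rubin(estimates, variances)
print(q_bar, w, b, v)
```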
Multiple Imputation
• Advantages:
• Variability more accurate with multiple imputations for each missing value.
• Considers variability due to sampling AND variability due to imputation.
• Disadvantages:
• Cumbersome coding.
• Room for error when specifying models.
Multiple Imputation
Using this likelihood function, the ML procedure provides parameter estimates based on all available data, including the incomplete cases. However, simulation studies show that ML is an inadequate estimation technique for some small sample problems and results in biased estimates (Little and Rubin, 1989). For large samples ML is a preferred method for dealing with missing data (Schafer and Graham, 2002).
Difference between EM algorithm and MI
• For the EM algorithm we substituted a predicted value on the basis of the variables that were available for each case. In multiple imputation we will do something similar, but will add error components to counteract the tendency of EM and Maximum Likelihood to underestimate standard errors.
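The "error component" can be illustrated with a simplified sketch: each missing y is replaced by its regression prediction plus a random draw from the residual distribution, and the whole process is repeated m times. The data, the function name `impute_with_noise` and the seed are assumptions. A full multiple imputation procedure would typically also draw the regression parameters themselves at random, which this sketch leaves out.

```python
import random
from math import sqrt

def impute_with_noise(x, y, m=5, seed=0):
    """Create m datasets; each missing y gets prediction + random residual."""
    rng = random.Random(seed)
    pairs = [(a, b) for a, b in zip(x, y) if b is not None]
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    slope = (sum((a - mx) * (b - my) for a, b in pairs)
             / sum((a - mx) ** 2 for a, _ in pairs))
    intercept = my - slope * mx
    # Residual standard deviation from the complete cases (n - 2 df).
    resid_sd = sqrt(sum((b - intercept - slope * a) ** 2 for a, b in pairs) / (n - 2))
    datasets = []
    for _ in range(m):
        datasets.append([
            intercept + slope * a + rng.gauss(0.0, resid_sd) if b is None else b
            for a, b in zip(x, y)
        ])
    return datasets

# Hypothetical data: x fully observed, y with two missing values.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, None, 8.2, None, 11.8]

datasets = impute_with_noise(x, y, m=5, seed=0)
for d in datasets:
    print([round(v, 2) for v in d])
```

Because the imputed values differ from one dataset to the next, the spread of the m analysis results reflects the uncertainty due to imputation.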
(Roth, 1994)
References
Allison, P. D. (2001). Missing Data. Sage University Papers Series on Quantitative Applications in the Social Sciences. Thousand Oaks: Sage.
Cole, J. C. (2008). How to deal with missing data. In J. W. Osborne (Ed.), Best practices in quantitative methods (pp. 214–238). Thousand Oaks, CA: Sage.
Enders, C. (2010). Applied Missing Data Analysis. Guilford Press: New York.
Little, R. J., & Rubin, D. B. (2002). Statistical Analysis with Missing Data. John Wiley & Sons, Inc: Hoboken.
Roth, P. (1994). Missing data: A conceptual review for applied psychologists. Personnel Psychology, 47, 537–560.
Schafer, J. L., & Graham, J. W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7, 147–177.