unido.org/statistics
International workshop on industrial statistics
8 – 10 July, Beijing
Non response in industrial surveys
Shyam Upadhyaya
What is non-response?
• Failure to obtain data from some units of the target population of a survey
• Unlike surveys of human populations, which are relatively homogeneous, industrial surveys cover establishments of very different sizes, so non-response may create serious problems
• Larger establishments account for a higher share of estimated totals as well as of the variance of key variables
• Some non-response is always to be expected. A plan for treating non-response should therefore be prepared a priori.
How non-response affects the survey: conceptual framework
Response rate
                    In the frame            Not in the frame          Total
Within scope        A1                      A2                        In scope = A1 + A2
Outside the scope   B1                      B2                        Outside the scope = B1 + B2
Total               In the frame = A1 + B1  Missing units = A2 + B2
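The response rate is the ratio of the number of statistical units actually observed to the number of units eligible for the survey (A1 + A2).
This ratio cannot be computed exactly when the frame is imperfect, because the number of eligible units missing from the frame (A2) is unknown.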
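A small numerical sketch, with made-up counts, of why the frame matters: a rate computed over frame units alone overstates the response rate when in-scope units are missing from the frame (A2). The function name and its arguments are hypothetical; out-of-scope units (B1, B2) are simply left out of both denominators.

```python
def response_rates(respondents: int, frame_nonrespondents: int, a2: int = 0) -> tuple[float, float]:
    """Return (rate over in-scope frame units, rate over all eligible units).

    respondents + frame_nonrespondents = A1 (in-scope units in the frame).
    a2 = in-scope units missing from the frame; treated as 0 (unknown) unless
    the frame has been updated from listing or administrative sources.
    Out-of-scope units (B1, B2) never enter either denominator.
    """
    a1 = respondents + frame_nonrespondents
    return respondents / a1, respondents / (a1 + a2)


# Illustrative counts only: 800 respondents, 200 in-scope non-respondents,
# and 250 in-scope units later found outside the frame.
print(response_rates(800, 200, a2=250))  # (0.8, 0.64)
```

Without information on A2, the two rates coincide and the 0.8 figure would be reported, although only 64% of eligible units actually responded.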
Total number of statistical units in the register
• Statistical units in the survey frame
    • Units within the scope of survey (A1)
        • Respondents
        • Non-respondents
            • Refusals
            • No contacts
            • Temporarily closed units
    • Units outside the scope of survey
        • Units identified as other activity
        • Unidentified units – non-existent
        • Permanently closed units
• Statistical units not in the frame
    • Units within the scope of survey (A2)
    • Units outside the scope of survey
The existing frame should be updated with additional information from listing operations or administrative sources.
Measurement of response rates
• Unit response rate (URR)
Particularly important for monitoring the progress of the survey
• Weighted response rate (WRR)
Share of respondents in the total value of a key variable of the survey (in a sample survey, w denotes the design weights)
For survey estimates, the WRR is more informative, as it reflects the actual coverage and hence the representativeness of the survey
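URR = \frac{M}{A_1 + A_2}, \qquad WRR = \frac{\sum_{i=1}^{M} w_i y_i}{Y}
where M is the number of responding units, y_i is the value of the key variable for respondent i, w_i is its design weight, and Y is the (estimated) total of the key variable for all eligible units.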
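The two rates can be sketched in Python as below. This is a minimal illustration, assuming every eligible unit carries a design weight w_i (1.0 in a census) and a value y_i of the key variable that is known for all units, for example a size measure taken from the frame, so that Y can be computed. The `Unit` record and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    weight: float     # design weight w_i (1.0 for every unit in a census)
    size: float       # key variable y_i, assumed known for all eligible units
    responded: bool   # True if the unit returned the questionnaire


def unit_response_rate(units: list[Unit]) -> float:
    """URR = M / (number of eligible units)."""
    respondents = sum(1 for u in units if u.responded)
    return respondents / len(units)


def weighted_response_rate(units: list[Unit]) -> float:
    """WRR = sum of w_i * y_i over respondents / Y (weighted total over all eligible units)."""
    total = sum(u.weight * u.size for u in units)
    covered = sum(u.weight * u.size for u in units if u.responded)
    return covered / total


# Illustrative data: one large and three small establishments.
units = [
    Unit(weight=1.0, size=900.0, responded=True),
    Unit(weight=1.0, size=50.0, responded=False),
    Unit(weight=1.0, size=30.0, responded=False),
    Unit(weight=1.0, size=20.0, responded=True),
]
print(unit_response_rate(units))      # 0.5
print(weighted_response_rate(units))  # 0.92
```

Half of the units responded, but because the largest one is among them, the respondents cover 92% of the key variable.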
Variation of URR and WRR by sub-population
URR and WRR are rarely equal, because establishments vary in size. If the response is better among larger establishments, the WRR is higher than the URR.
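To see how the two rates diverge across sub-populations, the sketch below computes URR and WRR per size class and overall. The records are invented, design weights are taken as 1 (as in a census) to keep it short, and the class labels are hypothetical.

```python
from collections import defaultdict

# Hypothetical records: (size_class, value of key variable y_i, responded?)
records = [
    ("large", 5000.0, True),
    ("large", 4000.0, True),
    ("large", 3000.0, False),
    ("small", 100.0, True),
    ("small", 80.0, False),
    ("small", 60.0, False),
    ("small", 40.0, False),
]


def rates(rows):
    """Return (URR, WRR) for a list of (class, value, responded) rows, with unit weights."""
    urr = sum(1 for _, _, r in rows if r) / len(rows)
    wrr = sum(y for _, y, r in rows if r) / sum(y for _, y, _ in rows)
    return urr, wrr


by_class = defaultdict(list)
for row in records:
    by_class[row[0]].append(row)

for size_class, rows in sorted(by_class.items()):
    urr, wrr = rates(rows)
    print(f"{size_class}: URR={urr:.2f} WRR={wrr:.2f}")

urr, wrr = rates(records)
print(f"all: URR={urr:.2f} WRR={wrr:.2f}")
```

Overall only 3 of 7 units responded (URR about 0.43), yet the WRR is about 0.74 because two of the three large establishments are among the respondents.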
Types of non-response
1. Unit non-response – when some statistical units did not respond at all
2. Item non-response – when some statistical units provided incomplete data (data missing for some variables within the unit)
3. Wave non-response – occurs in panel surveys, when some statistical units respond in one round but not in another
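The three types can be told apart mechanically from the returned data. The sketch below assumes a hypothetical panel layout: for each unit, one entry per wave, where `None` means the unit returned nothing (unit non-response) and a dictionary with some `None` values means item non-response. Function names and data are illustrative only.

```python
from typing import Optional

# One wave of one unit: None = no response at all; otherwise variable -> value,
# where a None value marks a missing item.
Record = Optional[dict[str, Optional[float]]]


def classify(waves: list[Record]) -> list[str]:
    """Label each wave of one unit as 'unit', 'item' or 'complete'."""
    labels = []
    for record in waves:
        if record is None:
            labels.append("unit")
        elif any(v is None for v in record.values()):
            labels.append("item")
        else:
            labels.append("complete")
    return labels


def has_wave_nonresponse(waves: list[Record]) -> bool:
    """True if the unit responded in at least one wave but not in another."""
    responded = [r is not None for r in waves]
    return any(responded) and not all(responded)


panel = {
    "A": [{"output": 10.0, "employees": 5.0}, None],
    "B": [{"output": None, "employees": 3.0}, {"output": 7.0, "employees": 3.0}],
}
for unit_id, waves in panel.items():
    print(unit_id, classify(waves), has_wave_nonresponse(waves))
# A ['complete', 'unit'] True
# B ['item', 'complete'] False
```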
How to handle non-response?
Treatment of non-response depends on the type of non-response as well as the type of survey
                 Unit non-response                               Item non-response
Sample survey    Weight adjustment to reflect the reduction      Imputation
                 of sample size
Census           No internal solution; external sources such     Imputation
                 as admin data or past survey data
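Unit non-response in sample surveys
In a sample survey, non-response is treated as a reduction of the sample size. The design weight is therefore inflated, assuming that non-response occurs at random.
Weight adjustment:
design weight: w = \frac{N}{n}
estimation weight: w' = \frac{1}{URR} \cdot \frac{N}{n} = \frac{w}{URR}
where N is the number of units in the population (or stratum), n is the sample size and URR is the unit response rate.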
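A minimal sketch of the weight adjustment, assuming simple random sampling within a stratum and non-response at random within that stratum; with m respondents, URR = m/n, so the adjusted weight reduces to N/m. The function names and the stratum figures are hypothetical. The last function shows how the adjusted weight would be used to estimate a stratum total, which is the kind of estimate industrial surveys are expected to produce.

```python
def design_weight(N: int, n: int) -> float:
    """Design weight w = N / n (simple random sampling within a stratum)."""
    return N / n


def estimation_weight(N: int, n: int, m: int) -> float:
    """Estimation weight w' = w / URR, with URR = m / n; algebraically equal to N / m."""
    urr = m / n
    return design_weight(N, n) / urr


def estimated_total(values: list[float], N: int, n: int) -> float:
    """Estimate the stratum total from the m respondents' values using w'."""
    return estimation_weight(N, n, len(values)) * sum(values)


# Stratum of N = 200 establishments, sample of n = 50, of which m = 40 responded.
print(design_weight(200, 50))          # 4.0
print(estimation_weight(200, 50, 40))  # 5.0
```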
In a census, there are no weights to adjust. Other ways to compensate for unit non-response are:
• administrative data, or
• earlier survey data adjusted with applicable growth rates
Imputation for non-response
• Imputation is a technique for assigning artificial values to replace data missing due to non-response
• The basic principle is that the imputed value is derived from the observed values of a statistical unit that is similar to the non-respondent
• Imputation is particularly effective for item non-response.
Many variables of industrial surveys are highly correlated; therefore, means and ratios computed from observed units may serve as predictors for non-respondents
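As an illustration of using a ratio from observed units, the sketch below performs ratio imputation: a missing sales value is estimated as R times the unit's number of employees, where R is total sales over total employees among units that reported sales. It assumes the auxiliary variable (employees) is known for every unit; the variable names and figures are hypothetical.

```python
def ratio_impute(rows: list[dict], target: str, auxiliary: str) -> list[dict]:
    """Fill missing `target` values as R * auxiliary, R = sum(target) / sum(auxiliary) over complete rows."""
    donors = [r for r in rows if r[target] is not None]
    ratio = sum(r[target] for r in donors) / sum(r[auxiliary] for r in donors)
    return [
        {**r, target: ratio * r[auxiliary]} if r[target] is None else dict(r)
        for r in rows
    ]


rows = [
    {"employees": 100, "sales": 20000.0},
    {"employees": 50, "sales": 10000.0},
    {"employees": 80, "sales": None},   # item non-response on sales
]
print(ratio_impute(rows, "sales", "employees")[2])  # {'employees': 80, 'sales': 16000.0}
```

In practice the ratio would be computed within a homogeneous group (for example, the same industry and size class) rather than over all units.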
Some imputation methods
• Imputation based on mean value
Missing data are estimated by the mean value of observed units
Effective for homogeneous statistical units, for example establishments of the same size class within a 4-digit ISIC class
• Hot-deck imputation
Missing data are replaced by values of observed units. For this purpose a pool of "donors" is created. Under the random hot-deck method, donors are selected at random.
Alternatively, the donor can be the nearest neighbour. This method is called the deterministic hot-deck method.
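Mean imputation and random hot-deck imputation within imputation classes can be sketched as follows. The class labels stand in for a combination such as ISIC class and size class and, like the data, are hypothetical. The sketch assumes every class with a missing value has at least one donor; otherwise `mean` or `choice` would raise an error.

```python
import random
from collections import defaultdict
from statistics import mean

# Hypothetical records: imputation class and sales (None = missing).
rows = [
    {"cls": "A/small", "sales": 120.0},
    {"cls": "A/small", "sales": 100.0},
    {"cls": "A/small", "sales": None},
    {"cls": "A/large", "sales": 900.0},
    {"cls": "A/large", "sales": None},
]


def donors_by_class(rows, var):
    """Pool of observed (donor) values for each imputation class."""
    pools = defaultdict(list)
    for r in rows:
        if r[var] is not None:
            pools[r["cls"]].append(r[var])
    return pools


def mean_impute(rows, var):
    """Replace each missing value with the mean of observed values in the same class."""
    pools = donors_by_class(rows, var)
    return [{**r, var: mean(pools[r["cls"]])} if r[var] is None else dict(r) for r in rows]


def random_hot_deck(rows, var, rng):
    """Replace each missing value with the value of a donor drawn at random from the same class."""
    pools = donors_by_class(rows, var)
    return [{**r, var: rng.choice(pools[r["cls"]])} if r[var] is None else dict(r) for r in rows]


print([r["sales"] for r in mean_impute(rows, "sales")])  # [120.0, 100.0, 110.0, 900.0, 900.0]
print([r["sales"] for r in random_hot_deck(rows, "sales", random.Random(1))])
```

Unlike the mean, the random hot deck always imputes a value that was actually observed, which preserves the spread of the data; passing a seeded `random.Random` makes the draw reproducible.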
Est_ID  Number of employees   Sales     Distance [abs]  Replacing value
4781    989                   144560
4782    895                   147675    109
4783    786                   …                         ← 128589
4784    771                   128589    15
4785    653                   101868
4786    554                   84762
4787    321                   68150
4788    205                   30135     7
4789    198                   …                         ← 30135
4790    106                   25946     92
Example: Imputation with nearest neighbour method
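The example can be reproduced with a few lines of Python. This sketch takes the data from the table above and, as the distances there indicate, measures closeness as the absolute difference in number of employees; ties, if any, would go to the first donor in the list. The function name is hypothetical.

```python
# Data from the example table: (Est_ID, number of employees, sales); None marks missing sales.
establishments = [
    (4781, 989, 144560),
    (4782, 895, 147675),
    (4783, 786, None),
    (4784, 771, 128589),
    (4785, 653, 101868),
    (4786, 554, 84762),
    (4787, 321, 68150),
    (4788, 205, 30135),
    (4789, 198, None),
    (4790, 106, 25946),
]


def nearest_neighbour_impute(units):
    """Deterministic hot deck: copy sales from the donor with the closest employment."""
    donors = [(emp, sales) for _, emp, sales in units if sales is not None]
    imputed = {}
    for est_id, emp, sales in units:
        if sales is None:
            # Donor minimising the absolute difference in number of employees.
            _, donor_sales = min(donors, key=lambda d: abs(d[0] - emp))
            imputed[est_id] = donor_sales
    return imputed


print(nearest_neighbour_impute(establishments))  # {4783: 128589, 4789: 30135}
```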
Imputation methods (continued)
• Cold-deck method
As opposed to the hot-deck method (the term refers to punch cards), the cold-deck method is based on past data
• Post-stratification
Statistical units are further stratified to create homogeneous groups, from which the mean, median or ratio is computed to replace the missing value
• Statistical modelling
Regression or similar models are fitted, and the estimated coefficients (or parameters) are used to predict the missing value
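For the statistical-modelling approach, the simplest case is a linear regression of the missing variable on one auxiliary variable (for example, sales on employees), fitted on complete records and used to predict the missing ones. The sketch below uses hand-written ordinary least squares to stay dependency-free; the data and function names are hypothetical, and a real application would usually involve several predictors and model checks.

```python
def fit_ols(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def regression_impute(rows):
    """rows: list of (x, y) with y None if missing; fit on complete rows, predict the rest."""
    complete = [(x, y) for x, y in rows if y is not None]
    a, b = fit_ols([x for x, _ in complete], [y for _, y in complete])
    return [(x, a + b * x if y is None else y) for x, y in rows]


# Hypothetical (employees, sales) pairs; the last unit did not report sales.
rows = [(10, 25.0), (20, 45.0), (30, 65.0), (40, 85.0), (25, None)]
print(regression_impute(rows))
# [(10, 25.0), (20, 45.0), (30, 65.0), (40, 85.0), (25, 55.0)]
```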
Imputation for unit non-response using external data sources
• Administrative data
In case of unit non-response, no information is available from the survey itself. Instead, data for some key variables may be obtained from administrative sources.
• Data from the previous survey
Often termed "carry forward": the main values are replaced by results from the earlier survey. This is effective for quarterly or monthly surveys.
For annual surveys and surveys with longer intervals, a growth adjustment is necessary.
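Carry forward with an optional growth adjustment can be sketched as follows. The growth factor used here, the ratio of current to previous totals over units that reported in both periods, is one common choice and an assumption of this sketch, not a prescribed method; an external growth rate could be substituted. Function name and data are hypothetical.

```python
def carry_forward(prev: dict, curr: dict, growth_adjust: bool = True) -> dict:
    """Impute missing current values from the previous survey.

    prev, curr: unit_id -> value; a current value of None means unit non-response.
    Growth factor (assumption): current total / previous total over units
    observed in both periods. With growth_adjust=False, values are copied as-is.
    """
    matched = [u for u, v in curr.items() if v is not None and u in prev]
    growth = 1.0
    if growth_adjust:
        growth = sum(curr[u] for u in matched) / sum(prev[u] for u in matched)
    return {u: (prev[u] * growth if v is None and u in prev else v) for u, v in curr.items()}


prev = {"A": 100.0, "B": 100.0, "C": 40.0}
curr = {"A": 125.0, "B": 125.0, "C": None}
print(carry_forward(prev, curr))                       # {'A': 125.0, 'B': 125.0, 'C': 50.0}
print(carry_forward(prev, curr, growth_adjust=False))  # {'A': 125.0, 'B': 125.0, 'C': 40.0}
```

The plain copy (`growth_adjust=False`) corresponds to the monthly or quarterly case; the adjusted version reflects the growth needed when the surveys are a year or more apart.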
Some other points on non-response
• Imputation does not necessarily reduce bias; in a sample survey it may even increase the standard error
• Unlike household surveys, where ratio and mean estimates are important, industrial surveys are expected to produce totals, such as industrial output and employment
• Imputation for missing data helps to improve the coverage of the survey estimates
• Imputation for large databases requires carefully developed software applications