Appendix III

Limitations of the Data

Introduction: The data presented in this Statistical Abstract came from many sources. The sources include not only federal statistical bureaus and other organizations that collect and issue statistics as their principal activity, but also governmental administrative and regulatory agencies, private research bodies, trade associations, insurance companies, health associations, and private organizations such as the National Education Association and philanthropic foundations. Consequently, the data vary considerably as to reference periods, definitions of terms and, for ongoing series, the number and frequency of time periods for which data are available.

The statistics presented were obtained and tabulated by various means. Some statistics are based on complete enumerations or censuses while others are based on samples. Some information is extracted from records kept for administrative or regulatory purposes (school enrollment, hospital records, securities registration, financial accounts, social security records, income tax returns, etc.), while other information is obtained explicitly for statistical purposes through interviews or by mail. The estimation procedures used vary from highly sophisticated scientific techniques to crude “informed guesses.”

Each set of data relates to a group of individuals or units of interest referred to as the target universe or target population, or simply as the universe or population. Prior to data collection the target universe should be clearly defined. For example, if data are to be collected for the universe of households in the United States, it is necessary to define a “household.” The target universe may not be completely tractable. Cost and other considerations may restrict data collection to a survey universe based on some available list, which may be inaccurate or out of date. This list is called a survey frame or sampling frame.

The data in many tables are based on data obtained for all population units, a census, or on data obtained for only a portion, or sample, of the population units. When the data presented are based on a sample, the sample is usually a scientifically selected probability sample. This is a sample selected from a list or sampling frame in such a way that every possible sample has a known chance of selection and usually each unit selected can be assigned a number, greater than zero and less than or equal to one, representing its likelihood or probability of selection.

For large-scale sample surveys, the probability sample of units is often selected as a multistage sample. The first stage of a multistage sample is the selection of a probability sample of large groups of population members, referred to as primary sampling units (PSUs). For example, in a national multistage household sample, PSUs are often counties or groups of counties. The second stage of a multistage sample is the selection, within each PSU selected at the first stage, of smaller groups of population units, referred to as secondary sampling units. In subsequent stages of selection, smaller and smaller nested groups are chosen until the ultimate sample of population units is obtained. To qualify a multistage sample as a probability sample, all stages of sampling must be carried out using probability sampling methods.
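The two-stage case described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that are not in the source: PSUs and units are both drawn by simple random sampling without replacement, and the frame, county names, and the function name two_stage_sample are hypothetical. Real surveys typically stratify and use unequal probabilities.

```python
import random

def two_stage_sample(frame, n_psus, n_units_per_psu, seed=0):
    """Draw a two-stage probability sample from a hypothetical frame.

    frame maps each PSU name to the list of units it contains.
    Returns (sample, probs): the units drawn in each selected PSU and the
    overall selection probability of a unit in that PSU.
    """
    rng = random.Random(seed)
    # Stage 1: simple random sample of PSUs (an assumption for this sketch).
    selected_psus = rng.sample(sorted(frame), n_psus)
    p_stage1 = n_psus / len(frame)
    sample, probs = {}, {}
    for psu in selected_psus:
        units = frame[psu]
        k = min(n_units_per_psu, len(units))
        # Stage 2: simple random sample of units within the selected PSU.
        sample[psu] = rng.sample(units, k)
        # Overall probability = P(PSU selected) * P(unit selected | PSU).
        probs[psu] = p_stage1 * k / len(units)
    return sample, probs

# Hypothetical frame: four "counties" containing household identifiers.
frame = {
    "County A": [f"A-{i}" for i in range(100)],
    "County B": [f"B-{i}" for i in range(50)],
    "County C": [f"C-{i}" for i in range(200)],
    "County D": [f"D-{i}" for i in range(80)],
}
sample, probs = two_stage_sample(frame, n_psus=2, n_units_per_psu=10)
for psu in sample:
    print(psu, len(sample[psu]), round(probs[psu], 4))
```

Because every stage uses a random draw with a known chance of selection, each sampled unit carries a probability greater than zero and at most one, which is what qualifies the result as a probability sample.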

Prior to selection at each stage of a multistage (or a single stage) sample, a list of the sampling units or sampling frame for that stage must be obtained. For example, for the first stage of selection of a national household sample, a list of the counties and county groups that form the PSUs must be obtained. For the final stage of selection, lists of households, and sometimes persons within the households, have to be compiled in the field. For surveys of economic entities and for the economic censuses the Bureau generally uses a frame constructed from the Bureau’s Business Register. The Business Register contains all establishments with payroll in the United States including small single establishment firms as well as large multi-establishment firms.

U.S. Census Bureau, Statistical Abstract of the United States: 2009

Wherever the quantities in a table refer to an entire universe, but are constructed from data collected in a sample survey, the table quantities are referred to as sample estimates. In constructing a sample estimate, an attempt is made to come as close as is feasible to the corresponding universe quantity that would be obtained from a complete census of the universe. Estimates based on a sample will, however, generally differ from the hypothetical census figures. Two classifications of errors are associated with estimates based on sample surveys: (1) sampling error, the error arising from the use of a sample, rather than a census, to estimate population quantities, and (2) nonsampling error, those errors arising from nonsampling sources. As discussed below, the magnitude of the sampling error for an estimate can usually be estimated from the sample data. However, the magnitude of the nonsampling error for an estimate can rarely be estimated. Consequently, actual error in an estimate exceeds the error that can be estimated.

The particular sample used in a survey is only one of a large number of possible samples of the same size which could have been selected using the same sampling procedure. Estimates derived from the different samples would, in general, differ from each other. The standard error (SE) is a measure of the variation among the estimates derived from all possible samples. The standard error is the most commonly used measure of the sampling error of an estimate. Valid estimates of the standard errors of survey estimates can usually be calculated from the data collected in a probability sample. For convenience, the standard error is sometimes expressed as a percent of the estimate and is called the relative standard error or coefficient of variation (CV). For example, an estimate of 200 units with an estimated standard error of 10 units has an estimated CV of 5 percent.
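As a small illustration of these two ideas, the Python sketch below reproduces the CV arithmetic from the example (200 units, standard error 10) and, for a toy population small enough to enumerate, computes the standard error directly as the spread of the sample mean over every possible sample. The function names and the toy population are assumptions made for illustration; in practice the standard error is estimated from the one sample actually drawn.

```python
from itertools import combinations
from statistics import mean, pstdev

def coefficient_of_variation(estimate, standard_error):
    """Relative standard error (CV), expressed as a percent of the estimate."""
    if estimate == 0:
        raise ValueError("CV is undefined for an estimate of zero")
    return 100.0 * standard_error / abs(estimate)

def standard_error_over_all_samples(population, n):
    """Standard deviation of the sample mean across all possible samples of size n."""
    estimates = [mean(s) for s in combinations(population, n)]
    return pstdev(estimates)

# Worked example from the text: estimate 200, standard error 10.
cv = coefficient_of_variation(200, 10)
print(f"CV = {cv:.1f} percent")  # CV = 5.0 percent

# Hypothetical five-unit population; all 10 samples of size 2 are enumerated.
toy_population = [2, 4, 6, 8, 10]
se = standard_error_over_all_samples(toy_population, n=2)
print(f"SE of the sample mean = {se:.3f}")
```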

A sample estimate and an estimate of its standard error or CV can be used to construct interval estimates that have a prescribed confidence that the interval includes the average of the estimates derived from all possible samples with a known probability. To illustrate, if all possible samples were selected under essentially the same general conditions, and using the same sample design, and if an estimate and its estimated standard error were calculated from each sample, then: (1) approximately 68 percent of the intervals from one standard error below the estimate to one standard error above the estimate would include the average estimate derived from all possible samples; (2) approximately 90 percent of the intervals from 1.6 standard errors below the estimate to 1.6 standard errors above the estimate would include the average estimate derived from all possible samples; and (3) approximately 95 percent of the intervals from two standard errors below the estimate to two standard errors above the estimate would include the average estimate derived from all possible samples.

Thus, for a particular sample, one can say with the appropriate level of confidence (e.g., 90 percent or 95 percent) that the average of all possible samples is included in the constructed interval. Example of a confidence interval: An estimate is 200 units with a standard error of 10 units. An approximately 90 percent confidence interval (plus or minus 1.6 standard errors) is from 184 to 216.
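The interval arithmetic in this example can be written as a short Python sketch. The multipliers 1.0, 1.6, and 2.0 are the approximate values given in the text for 68, 90, and 95 percent confidence; the function and dictionary names are illustrative assumptions.

```python
# Approximate multipliers quoted in the text for each confidence level (percent).
MULTIPLIERS = {68: 1.0, 90: 1.6, 95: 2.0}

def confidence_interval(estimate, standard_error, level=90):
    """Return (lower, upper) as estimate plus or minus multiplier * standard error."""
    half_width = MULTIPLIERS[level] * standard_error
    return estimate - half_width, estimate + half_width

# Worked example from the text: estimate 200, standard error 10, 90 percent level.
low, high = confidence_interval(200, 10, level=90)
print(f"{low:.0f} to {high:.0f}")  # 184 to 216
```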

All surveys and censuses are subject to nonsampling errors. Nonsampling errors are of two kinds: random and nonrandom. Random nonsampling errors arise because of the varying interpretation of questions (by respondents or interviewers) and varying actions of coders, keyers, and other processors. Some randomness is also introduced when respondents must estimate. Nonrandom nonsampling errors result from total nonresponse (no usable data obtained for a sampled unit), partial or item nonresponse (only a portion of a response may be usable), inability or unwillingness on the part of respondents to provide correct information, difficulty interpreting questions, mistakes in recording or keying data, errors of collection or processing, and coverage problems (overcoverage and undercoverage of the target universe). Random nonresponse errors usually, but not always, result in an understatement of sampling errors and thus an overstatement of the precision of survey estimates. Estimating the magnitude of nonsampling errors would require special experiments or access to independent data and, consequently, the magnitudes are seldom available.

Nearly all types of nonsampling errors that affect surveys also occur in complete censuses. Since surveys can be conducted on a smaller scale than censuses, nonsampling errors can presumably be controlled more tightly. Relatively more funds and effort can perhaps be expended toward eliciting responses, detecting and correcting response error, and reducing processing errors. As a result, survey results can sometimes be more accurate than census results.

To compensate for suspected nonrandom errors, adjustments of the sample estimates are often made. For example, adjustments are frequently made for nonresponse, both total and partial. Adjustments made for either type of nonresponse are often referred to as imputations. Imputation for total nonresponse is usually made by substituting for the questionnaire responses of the nonrespondents the “average” questionnaire responses of the respondents. These imputations usually are made separately within various groups of sample members, formed by attempting to place respondents and nonrespondents together that have “similar” design or ancillary characteristics. Imputation for item nonresponse is usually made by substituting for a missing item the response to that item of a respondent having characteristics that are “similar” to those of the nonrespondent.
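A minimal Python sketch of the first approach, substituting the respondents' average within groups of similar sample members, is shown below. The record layout, the grouping variable "area", and the function name are hypothetical; actual agencies define their own adjustment groups and may use donor-based methods instead.

```python
from collections import defaultdict

def impute_group_means(records, group_key, value_key):
    """Fill missing values (None) with the mean of respondents in the same group.

    Returns new records, each flagged with "imputed" so that substituted
    values remain distinguishable from reported ones.
    """
    totals = defaultdict(lambda: [0.0, 0])  # group -> [sum, respondent count]
    for rec in records:
        if rec[value_key] is not None:
            totals[rec[group_key]][0] += rec[value_key]
            totals[rec[group_key]][1] += 1

    result = []
    for rec in records:
        rec = dict(rec)  # copy so the input is left unchanged
        if rec[value_key] is None:
            total, count = totals[rec[group_key]]
            if count == 0:
                raise ValueError(f"no respondents in group {rec[group_key]!r}")
            rec[value_key] = total / count
            rec["imputed"] = True
        else:
            rec["imputed"] = False
        result.append(rec)
    return result

# Hypothetical sample: two nonrespondents (value None), grouped by area type.
records = [
    {"id": 1, "area": "urban", "value": 10.0},
    {"id": 2, "area": "urban", "value": 20.0},
    {"id": 3, "area": "urban", "value": None},
    {"id": 4, "area": "rural", "value": 4.0},
    {"id": 5, "area": "rural", "value": None},
]
completed = impute_group_means(records, "area", "value")
for rec in completed:
    print(rec)
```

Here the urban nonrespondent receives 15.0 and the rural nonrespondent receives 4.0, the averages of the respondents in their own groups.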

For an estimate calculated from a sample survey, the total error in the estimate is composed of the sampling error, which can usually be estimated from the sample, and the nonsampling error, which usually cannot be estimated from the sample. The total error present in a population quantity obtained from a complete census is composed of only nonsampling errors. Ideally, estimates of the total error associated with data given in the Statistical Abstract tables should be given. However, due to the unavailability of estimates of nonsampling errors, only estimates of the levels of sampling errors, in terms of estimated standard errors or coefficients of variation, are available. To obtain estimates of the estimated standard errors from the sample of interest, obtain a copy of the referenced report which appears at the end of each table.

Source of Additional Material: The Federal Committee on Statistical Methodology (FCSM) is an interagency committee dedicated to improving the quality of federal statistics <http://fcsm.ssd.census.gov>.

Principal data bases: Beginning below are brief descriptions of 35 of the sample surveys and censuses that provide a substantial portion of the data contained in this Abstract.

U.S. DEPARTMENT OF AGRICULTURE, National Agricultural Statistics Service

Basic Area Frame Sample

Universe, Frequency, and Types of Data: June agricultural survey collects data on planted acreage and livestock inventories on all land in the 48 contiguous states and Hawaii. The survey also serves to measure list incompleteness and is subsampled for multiple frame surveys.

Type of Data Collection Operation: Stratified probability sample of about 11,000 land area units of about 1 sq. mile (range from 0.1 sq. mile in cities to several sq. miles in open grazing areas). Sample includes 42,000 parcels of agricultural land. About 20 percent of the sample replaced annually.

Data Collection and Imputation Procedures: Data collection is by personal enumeration. Imputation is based on enumerator observation or data reported by respondents having similar agricultural characteristics.

Estimates of Sampling Error: Estimated CVs range from 1 percent to 2 percent for regional estimates to 3 percent to 6 percent for state estimates of major crop acres and livestock inventories.

Other (nonsampling) Errors: Minimized through rigid quality controls on the collection process and careful review of all reported data.

Sources of Additional Material: U.S. Department of Agriculture, National Agricultural Statistics Service, USDA’s National Agricultural Statistics Service: The Fact Finders of Agriculture, March 2007.

Multiple Frame Surveys

Universe, Frequency, and Types of Data: Surveys of U.S. farm operators to obtain data on major livestock inventories, selected crop acreage and production, grain stocks, farm labor characteristics, farm economic data, and chemical use data. Estimates are made quarterly, semi-annually, or annually depending on the data series.

Type of Data Collection Operation: Primary frame is obtained from general or special purpose lists, supplemented by a probability sample of land areas used to estimate for list incompleteness.

Data Collection and Imputation Procedures: Mail, telephone, or personal interviews used for initial data collection. Mail nonrespondent follow-up by phone and personal interviews. Imputation based on average of respondents.

Estimates of Sampling Error: Estimated CVs range from 1 percent to 2 percent at the U.S. level for crop and livestock data series and 3 to 5 percent for economic data. Regional CVs range from 3 to 6 percent, while state estimate CVs run 5 to 10 percent.

Other (nonsampling) Errors: In addition to above, replicated sampling procedures used to monitor effects of changes in survey procedures.

Sources of Additional Material: U.S. Department of Agriculture, National Agricultural Statistics Service, USDA’s National Agricultural Statistics Service: The Fact Finders of Agriculture, March 2007.

Objective Yield Surveys

Universe, Frequency, and Types of Data: Monthly surveys during the growing season of corn, cotton, potatoes, soybeans, and wheat fields in top producing states for forecasting and estimating yield per acre.

Type of Data Collection Operation: Random location of plots in probability sample. Corn, cotton, soybeans, spring wheat, and durum wheat selected in June from Basic Area Frame Sample (see above). Winter wheat and potatoes selected from March and June multiple frame surveys, respectively.

Data Collection and Imputation Procedures: Enumerators count and measure plant characteristics in sample fields. Production measured from plots at harvest. Harvest loss measured from post-harvest gleanings.

Estimates of Sampling Error: CVs for national estimates of production are about 2 to 3 percent.

Other (nonsampling) Errors: In addition to above, replicated sampling procedures used to monitor effects of changes in survey procedures.

Sources of Additional Material: U.S. Department of Agriculture, National Agricultural Statistics Service, USDA’s National Agricultural Statistics Service: The Fact Finders of Agriculture, March 2007.

U.S. BUREAU OF JUSTICE STATISTICS (BJS)

National Crime Victimization Survey

Universe, Frequency, and Types of Data: Monthly survey of individuals and households in the United States to obtain data on criminal victimization of those units for compilation of annual estimates.

Type of Data Collection Operation: National probability sample survey of about 42,000 interviewed households in 203 PSUs selected from a list of addresses from the 1990 census, supplemented by new construction permits and an area sample where permits are not required.

Data Collection and Imputation Procedures: Interviews are conducted every 6 months for 3 years for each household in the sample; 7,000 households are interviewed monthly. Personal interviews are used in the first interview; the intervening interviews are conducted by telephone whenever possible.

Estimates of Sampling Error: CVs for 2005 estimates are: 4.1 percent for personal crimes (includes all crimes of violence plus purse snatching crimes); 4.2 percent for crimes of violence; 16.2 percent for estimate of rape/sexual assault counts; 9.7 percent for robbery counts; 4.4 percent for assault counts; 15.1 percent for purse snatching/pocket picking; 2.2 percent for property crimes; 4.2 percent for burglary counts; 2.5 percent for theft (of property); and 6.6 percent for motor vehicle theft counts.

Other (nonsampling) Errors: Respondent recall errors which may include reporting incidents for other than the reference period; interviewer coding and processing errors; and possible mistaken reporting or classifying of events. Adjustment is made for a household noninterview rate of about 9 percent and for a within-household noninterview rate of 16 percent.

Sources of Additional Material: U.S. Bureau of Justice Statistics, Criminal Victimization in the United States, annual.

U.S. Bureau of Labor Statistics

Consumer Expenditure Survey (CE)

Universe, Frequency, and Types of Data: Consists of two continuous components: a quarterly interview survey and a weekly diary or recordkeeping survey. They are nationwide surveys that collect data on consumer expenditures, income, characteristics, and assets and liabilities. Samples are national probability samples of households that are representative of the civilian noninstitutional population. The surveys have been ongoing since 1980.

Type of Data Collection Operation: The Interview Survey is a panel rotation survey. Each panel is interviewed for five quarters and then dropped from the survey. About 7,000 consumer units are interviewed each quarter. The Diary Survey sample is new each year and consists of about 7,000 consumer units. Data are collected on an ongoing basis in 91 areas of the country.

Data Collection and Imputation Procedures: For the Interview Survey, data are collected by personal interview with each consumer unit interviewed once per quarter for five consecutive quarters. Designed to collect information that respondents can recall for 3 months or longer, such as large or recurring expenditures. For the Diary Survey, respondents record all their expenditures in a self-reporting diary for two consecutive one-week periods. Designed to pick up items difficult to recall over a long period, such as detailed food expenditures. Missing or invalid attributes, expenditures, or incomes are imputed. Assets and liabilities are not imputed. The U.S. Census Bureau collects the data for the Bureau of Labor Statistics.

Estimates of Sampling Error: Standard error tables are available since 2000.

Other (nonsampling) Errors: Includes incorrect information given by respondents, data processing errors, interviewer errors, and so on. They occur regardless of whether data are collected from a sample or from the entire population.

Sources of Additional Material: Bureau of Labor Statistics, see Internet site <http://www.bls.gov/cex>.

Consumer Price Index (CPI)

Universe, Frequency, and Types of Data: A monthly survey of price changes of all types of consumer goods and services purchased by urban wage earners and clerical workers prior to 1978, and urban consumers thereafter. Both indexes continue to be published.

Type of Data Collection Operation: Prior to 1978, and since 1998, sample of various consumer items in 87 urban areas; from 1978–1997, in 85 PSUs, except from January 1987 through March 1988, when 91 areas were sampled.

Data Collection and Imputation Procedures: Prices of consumer items are obtained each month from about 25,500 retail outlets and from about 4,000 housing units in 87 areas. Prices of food, fuel, and a few other items are obtained monthly; prices of most other commodities and services are collected every month in the three largest geographic areas and every other month in others.

Estimates of Sampling Error: Estimates of standard errors are available.

Other (nonsampling) Errors: Errors result from inaccurate reporting, difficulties in defining concepts and their operational implementation, and introduction of product quality changes and new products.

Sources of Additional Material: U.S. Bureau of Labor Statistics, Internet site <http://www.bls.gov/cpi/home.htm> and BLS Handbook of Methods, Chapter 17, see Internet site <http://www.bls.gov/opub/hom/pdf/homch17.pdf>.

Current Employment Statistics (CES) Program

Universe, Frequency, and Types of Data: Monthly survey drawn from a sampling frame of over 8 million unemployment insurance tax accounts in order to obtain data by industry on employment, hours, and earnings.

Type of Data Collection Operation: In 2006, the CES sample included about 160,000 businesses and government agencies, which represent approximately 400,000 individual worksites.

Data Collection and Imputation Procedures: Each month, the state agencies cooperating with BLS, as well as BLS Data Collection Centers, collect data through various automated collection modes and mail. BLS Washington staff prepares national estimates of employment, hours, and earnings while states use the data to develop state and area estimates.

Estimates of Sampling Errors: The relative standard error for total nonfarm employment is 0.1 percent. From April 2002 to March 2003, the cumulative net birth/death model added 469,000.

Other (nonsampling) Errors: Estimates of employment adjusted annually to reflect complete universe. Average adjustment is 0.2 percent over the last decade, with an absolute range from less than 0.05 percent to 0.6 percent.

Sources of Additional Material: U.S. Bureau of Labor Statistics, Employment and Earnings, monthly, Explanatory Notes and Estimates of Errors, Tables 2-A through 2-F. See <http://www.bls.gov/web/cestntab.htm>.

National Compensation Survey (NCS)

Universe, Frequency, and Types of Data: NCS collects data from establishments of all employment-size classes in private industries as well as state and local governments. The survey stratifies its data by geographic area and industry. NCS collects data on work schedules, wages, salaries, and employer costs for employee benefits. For approximately 80 metropolitan areas and the nation, NCS produces information on workers’ earnings and benefits in a variety of occupations at different work levels. NCS is also responsible for two quarterly releases: the Employment Cost Index (ECI), which measures percent changes in the cost of employment, and the Employer Costs for Employee Compensation (ECEC), which measures costs per hour worked for individual benefits. The survey provides data by industry sector, industry division, occupational group, bargaining status, metropolitan area status, census region, and census division. ECEC also provides data by establishment-size class.

Type of Data Collection Operation: Establishments are selected for the survey based on a probability-proportionate-to-employment technique. NCS replaces its sample on a continual basis. Private industry establishments are in the survey for approximately 5 years.

Data Collection and Imputation Procedures: A personal visit to the establishment is the initial source for collecting data. Communication via mail, fax, and telephone provides quarterly updates. Imputation is done for individual benefits.

Estimates of Sampling Error: NCS uses standard errors to evaluate published series. These standard errors are available at <http://www.bls.gov/ncs/ect/home.htm>.

Other (nonsampling) Errors: Nonsampling errors have a number of potential sources. The primary sources are (1) survey nonresponse and (2) data collection and processing errors. Nonsampling errors are not measured. The use of quality assurance programs reduces the potential for nonsampling errors. These programs include the use of reinterviews, interview observations, and the systematic professional review of reports. The programs also serve as a training device that provides feedback on errors for field economists (or data collectors). Quality assurance programs also provide information on sources of error. This information is used to improve procedures that result in fewer errors. NCS also conducts extensive training of field economists to maintain high standards in data collection.

Sources of Additional Material: Bureau of Labor Statistics, BLS Handbook of Methods, Chapter 8 <http://www.bls.gov/opub/hom/pdf/homch8.pdf>.
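The NCS entry above mentions selection by a probability-proportionate-to-employment technique. The source does not say which variant NCS uses, so the Python sketch below shows one common textbook form, systematic sampling with probability proportional to size, purely as an illustration. The establishment names, employment counts, and function names are hypothetical.

```python
import random

def pps_systematic(units, n, seed=0):
    """Systematic sample of n units with probability proportional to size.

    units is a list of (name, size) pairs; size plays the role of employment.
    Assumes no size exceeds total / n, so no unit can be selected twice.
    """
    rng = random.Random(seed)
    total = sum(size for _, size in units)
    interval = total / n
    start = rng.random() * interval  # random start in [0, interval)
    points = [start + i * interval for i in range(n)]

    selected, cumulative, next_point = [], 0.0, 0
    for name, size in units:
        cumulative += size
        # A unit is selected when a selection point falls inside its size range.
        while next_point < n and points[next_point] < cumulative:
            selected.append(name)
            next_point += 1
    return selected

def inclusion_probabilities(units, n):
    """Selection probability of each unit under the same assumption: n * size / total."""
    total = sum(size for _, size in units)
    return {name: n * size / total for name, size in units}

# Hypothetical establishments with employment counts (total 1,000).
establishments = [("A", 50), ("B", 200), ("C", 120), ("D", 30),
                  ("E", 400), ("F", 100), ("G", 60), ("H", 40)]
chosen = pps_systematic(establishments, n=2)
pi = inclusion_probabilities(establishments, n=2)
print(chosen, pi["E"])
```

Under this scheme an establishment with 400 of the 1,000 employees has a selection probability of 0.8 when two units are drawn, while one with 30 employees has 0.06, so larger employers are much more likely to enter the sample.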

Producer Price Index (PPI)

Universe, Frequency, and Types of Data: Monthly survey of producing companies to determine price changes of all commodities and services produced in the United States for sale in commercial transactions. Data on agriculture, forestry, fishing, manufacturing, mining, gas, electricity, construction, public utilities, wholesale trade, retail trade, transportation, healthcare, and other services.

Type of Data Collection Operation: Probability sample of approximately 30,000 establishments that result in about 100,000 price quotations per month.

Data Collection and Imputation Procedures: Data are collected by mail and facsimile. Missing prices are estimated from those received for similar products or services. Some prices are obtained from trade publications, organized exchanges, and government agencies. To calculate the index, price changes are multiplied by their relative weights taken from the Census Bureau’s 2002 shipment values from their Census of Industries.

Estimates of Sampling Error: Not applicable.

Other (nonsampling) Errors: Not available at present.

Sources of Additional Material: U.S. Bureau of Labor Statistics, BLS Handbook of Methods, Chapter 14, Bulletin 2490. U.S. Bureau of Labor Statistics Internet site <http://stats.bls.gov/ppi/>.
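The PPI entry above says that price changes are multiplied by their relative weights to calculate the index. A minimal Python sketch of that weighting step follows; it is a generic weighted average of price relatives, not the BLS production formula, and the commodity names, price relatives, and shipment values are made up for illustration.

```python
def weighted_index(price_relatives, shipment_values, base=100.0):
    """Weighted average of price relatives, scaled so that no change equals base.

    price_relatives: item -> current price divided by base-period price.
    shipment_values: item -> value used to form the relative weights.
    """
    total_value = sum(shipment_values[item] for item in price_relatives)
    return base * sum(
        price_relatives[item] * shipment_values[item] / total_value
        for item in price_relatives
    )

# Hypothetical data: three items, their price relatives, and shipment values.
relatives = {"steel": 1.10, "lumber": 0.95, "trucking": 1.02}
shipments = {"steel": 500.0, "lumber": 300.0, "trucking": 200.0}
index = weighted_index(relatives, shipments)
print(f"index = {index:.1f}")  # index = 103.9
```

Items with larger shipment values pull the index more strongly toward their own price change, which is the role the relative weights play.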

BOARD OF GOVERNORS OF THE FEDERAL RESERVE SYSTEM

Survey of Consumer Finances

Universe, Frequency, and Types of Data: Periodic sample survey of families. In this survey a given household is divided into a primary economic unit and other economic units. The primary economic unit, which may be a single individual, is generally chosen as the person or couple who either holds the title to the home or is listed on the lease, along with all other people in the household who are financially dependent on that person or couple. The primary economic unit is used as the reference family. The survey collects detailed data on the composition of family balance sheets, the terms of loans, and relationships with financial institutions. It also gathers information on the employment history and pension rights of the survey respondent and the spouse or partner of the respondent.

Type of Data Collection Operation: The survey employs a two-part strategy for sampling families. Some families are selected by standard multistage area probability sampling methods applied to all 50 states. The remaining families in the survey are selected using statistical records derived from tax returns, under the strict rules governing confidentiality and the rights of potential respondents to refuse participation.

Data Collection and Imputation Procedures: National Opinion Research Center (NORC) at the University of Chicago has collected data for the survey since 1992. Since 1995, the survey has used computer-assisted personal interviewing. Adjustments for nonresponse are made through multiple imputation of unanswered questions and through weighting adjustments based on data used in the sample design for families that refused participation.

Estimates of Sampling Error: Because of the complex design of the survey, the estimation of potential sampling errors is not straightforward. A replicate-based procedure is available.

U.S. Census Bureau, Statistical Abstract of the United States: 2009

Other (nonsampling) Errors: The survey aims to complete 4,500 interviews, with about two-thirds of that number deriving from the area-probability sample. The response rate is typically about 70 percent for the area-probability sample and about 35 percent over all strata in the tax-data sample. Proper training and monitoring of interviewers, careful design of questionnaires, and systematic editing of the resulting data were used to control inaccurate survey responses.

Sources of Additional Material: Board of Governors of the Federal Reserve System, "Recent Changes in U.S. Family Finances: Evidence from the 2001 and 2004 Survey of Consumer Finances," Federal Reserve Bulletin, 2006, <http://www.federalreserve.gov/Pubs/Bulletin>.

U.S. CENSUS BUREAU

2002 Economic Census (Industry Series, Geographic Area Series and Subject Series Reports) (for NAICS sectors 22, 42, 44-45, 48-49, and 51-81).

Universe, Frequency, and Types of Data: Conducted every 5 years to obtain data on number of establishments, number of employees, total payroll size, total sales/receipts/revenue, and other industry-specific statistics. In 2002, the universe was all employer and nonemployer establishments excluding agriculture, forestry, fishing and hunting, and government.

Type of Data Collection Operation: All large employer firms were surveyed (i.e., all employer firms above payroll-size cutoffs established to separate large from small employers) plus a 5 percent to 25 percent sample of the small employer firms. Firms with no employees were not sent a census return.

Data Collection and Imputation Procedures: Mail questionnaires were used with both mail and telephone follow-ups for nonrespondents. Businesses also had the option to respond electronically. Data for nonrespondents and for small employer firms not mailed a questionnaire were obtained from administrative records of other federal agencies or imputed. Nonemployer data were obtained exclusively from IRS 2002 income tax returns.

Estimates of Sampling Error: Not applicable for basic data such as sales, revenue, receipts, payroll, etc.

Other (nonsampling) Errors: Establishment response rates by NAICS sector in 2002 ranged from 80 percent to 89 percent. Item response rates generally ranged from 50 percent to 90 percent with lower rates for the more detailed questions. Nonsampling errors may occur during the collection, reporting, and keying of data, and due to industry misclassification.

Sources of Additional Material: U.S. Census Bureau, 2002 Economic Census: Industry Series, Geographic Area Series and Subject Series Reports (by NAICS sector), Appendix C and <http://www.census.gov/econ/census02/guide/index.html>.

American Community Survey (ACS)

Universe, Frequency, and Types of Data: Nationwide survey to obtain data about demographic, social, economic, and housing characteristics of people, households, and housing units. Covers the household population and, for the first time in 2006, also includes the population living in institutions, college dormitories, and other group quarters.

Type of Data Collection Operation: First-phase sampling is performed during both Main and Supplemental sampling for approximately 3,000,000 housing units in the U.S. and 36,000 in Puerto Rico (PR). First-stage sampling defines the universe for the second stage of sampling through two steps. First, all addresses that were in a first-stage sample within the past four years are excluded from eligibility. This ensures that no address is in sample more than once in any 5-year period. The second step is to select a 20 percent systematic sample of "new" units, i.e., those units that have never appeared on a previous Master Address File (MAF) extract. Each new address is systematically assigned to either the current year or to one of four back-samples. This procedure maintains five equal partitions of the universe. Sampling is performed separately on small (15 persons or fewer) and large (more than 15 persons) group quarters, but the target rate for both is a 2.5% sample of the group quarters population. Approximately 200,000 group quarters were selected in the U.S., and an additional 1,000 in Puerto Rico.
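The systematic assignment of new addresses to five equal partitions can be sketched as a round-robin deal over an ordered address list. This is a simplified illustration under stated assumptions: the address identifiers are invented, the list is taken to be already sorted, and treating partition 0 as the current year is a labeling choice made here, not a detail taken from the survey documentation.

```python
def assign_partitions(addresses, n_partitions=5):
    """Deal an ordered address list into equal partitions, one address at a time."""
    partitions = [[] for _ in range(n_partitions)]
    for position, address in enumerate(addresses):
        # Every fifth address lands in the same partition (systematic assignment).
        partitions[position % n_partitions].append(address)
    return partitions


# Hypothetical new addresses that have not appeared on a previous extract.
new_addresses = [f"addr-{i:04d}" for i in range(1000)]
partitions = assign_partitions(new_addresses)

# Assumption: partition 0 is the current year; the others are the four back-samples.
current_year = partitions[0]
share_current_year = len(current_year) / len(new_addresses)
```

With 1,000 addresses each partition holds 200, so the current-year partition is a 20 percent systematic sample of the new units.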

Data Collection and Imputation Procedures: The American Community Survey is conducted every month on independent samples. Each housing unit in the independent monthly samples is mailed a prenotice letter announcing the selection of the address to participate, a survey questionnaire package, and a reminder postcard. These sample units receive a second (replacement) questionnaire package if the initial questionnaire has not been returned by a scheduled date. In the mail-out/mail-back sites, sample units for which a questionnaire is not returned in the mail and for which a telephone number is available are defined as the telephone nonresponse follow-up universe. Interviewers attempt to contact and interview these mail nonresponse cases. Sample units from all sites that are still unresponsive two months after the mailing of the survey questionnaires and directly after the completion of the telephone follow-up operation are subsampled at rates between 1 in 2 and 1 in 3. The selected nonresponse units are assigned to Field Representatives (FRs), who visit the units, verify their existence or declare them nonexistent, determine their occupancy status, and conduct interviews. Collection of group quarters data is conducted by field representatives. Their methods include completing the questionnaire while speaking to the resident in person or over the telephone, conducting a personal interview with a proxy, such as a relative or guardian, or leaving paper questionnaires for residents to complete for themselves and picking them up later. This last option is used for data collection in federal prisons. After data collection is completed, any remaining incomplete or inconsistent information is imputed during the final automated edit of the collected data.

Estimates of Sampling Error: The data in the ACS products are estimates of the actual figures that would have been obtained by interviewing the entire population using the same methodology. The estimates from the chosen sample also differ from those that other samples of housing units, and of persons within those housing units, would have produced.

Other (nonsampling) Errors: In addition to sampling error, data users should realize that other types of errors may be introduced during any of the various complex operations used to collect and process survey data. An important goal of the ACS is to minimize the amount of nonsampling error introduced through nonresponse for sample housing units. One way of accomplishing this is by following up on mail nonrespondents.

Sources of Additional Material: U.S. Census Bureau, American Community Survey Web site available on the Internet, <http://www.census.gov/acs/www/index.html>; U.S. Census Bureau, American Community Survey Accuracy of the Data documents available on the Internet, <http://www.census.gov/acs/www/UseData/Accuracy/Accuracy1.htm>.

American Housing Survey

Universe, Frequency, and Types of Data: Conducted nationally in odd-numbered years to obtain data on the approximately 124 million occupied or vacant housing units in the United States (group quarters are excluded). Data include characteristics of occupied housing units, vacant units, new housing and mobile home units, financial characteristics, recent mover households, housing and neighborhood quality indicators, and energy characteristics.

Type of Data Collection Operation: The national sample was a multistage probability sample with about 57,000 units eligible for interview in 2005. Sample units, selected within 394 primary sampling units (PSUs), were surveyed over a 4-month period.

Data Collection and Imputation Procedures: For 2005, the survey was conducted by personal interviews. The interviewers obtained the information from the occupants or, if the unit was vacant, from informed persons such as landlords, rental agents, or knowledgeable neighbors.

Estimates of Sampling Error: For the national sample, illustrations of the Standard Error (SE) of the estimates are provided in Appendix D of the 2005 report.

Other (nonsampling) Errors: Response rate was about 90 percent. Nonsampling errors may result from incorrect or incomplete responses, errors in coding and recording, and processing errors. Appendix D of the 2005 report has a complete discussion of the errors.

Sources of Additional Material: U.S. Census Bureau, Current Housing Reports, Series H-150 and H-170, American Housing Survey; see <http://www.census.gov/hhes/www/ahs.html>.

Annual Survey of Government Employment and Payroll

Universe, Frequency, and Types of Data: The survey measures the number of state, local, and federal civilian government employees and their gross payrolls for the pay period including March 12, 2006. The survey is conducted annually except in years ending in ‘2’ and ‘7’ when a census of all state and local governments is done. The survey provides data on full-time and part-time employment, part-time hours worked, full-time equivalent employment, and payroll statistics by governmental function (elementary and secondary education, higher education, police protection, fire protection, financial administration, central staff services, judicial and legal, highways, public welfare, solid waste management, sewerage, parks and recreation, health, hospitals, water supply, electric power, gas supply, transit, natural resources, correction, libraries, air transportation, water transport and terminals, other education, state liquor stores, social insurance administration, and housing and community development).

Type of Data Collection Operations: The survey sample is taken from the 2002 Census of Governments and contains approximately 11,000 local government units. These units were sampled from a sampling frame that contained 83,767 local governments (county, city, township, special district, school districts) in addition to 50 state governments and the District of Columbia. This frame was slightly different from the Annual Finance Survey sampling frame. Forty-two of the state governments provided data from central payroll records for all or most of their agencies/institutions. Data for agencies and institutions for the remaining state governments were obtained by mail canvass questionnaires. Local governments were also canvassed using a mail questionnaire. However, elementary and secondary school system data in Florida, North Dakota, and Washington were supplied by special arrangements with the state education agency in each of these states. All respondents receiving the mail questionnaire had the option of responding using the employment Web site developed for reporting data. Approximately 25% of the state agency and local government respondents chose to respond on the Web.

Editing and Imputation Procedures: Editing: Editing is a process that ensures survey data are accurate, complete, and consistent. Efforts are made at all phases of collection, processing, and tabulation to minimize errors. Although some edits are built into the Internet data collection instrument and the data entry programs, the majority of the edits are performed after the case has been loaded into the Census Bureau’s database. Edits consist primarily of two types: consistency edits and ratio edits that compare the current year’s reported value with the prior year’s value. The consistency edits check the logical relationships of data items reported on the form. For example, if a value exists for employees for a function, then a value must exist for payroll also. If part-time employees and payroll are reported, then part-time hours must be reported, and vice versa. The current year/prior year edits compare data for the number of employees, the function reported for the employees, and the average salary between reporting years. If data fall outside acceptable tolerance levels, the item is flagged for review. Some additional checks are made comparing data from the Annual Finance Survey to data reported on the Annual Survey of Government Employment and Payroll to verify that employees reported on the Annual Survey of Government Employment and Payroll at a particular function have a corresponding expenditure on the Finance Survey. For both types of edits, the edit results are reviewed by analysts and adjusted when needed. When the analyst is unable to resolve or accept the edit failure, contact is made with the respondent to verify or correct the reported data.
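The two edit types described above can be sketched as simple checks on one reported record. This is an illustration under stated assumptions: the field names, the example record, and the tolerance band of 0.5 to 2.0 are hypothetical and are not the Census Bureau's actual edit specifications.

```python
def consistency_flags(record):
    """Return messages for failed logical relationships in one function's data."""
    flags = []
    # Employees reported for a function require a payroll value as well.
    if record.get("employees", 0) > 0 and record.get("payroll", 0) <= 0:
        flags.append("employees without payroll")
    # Part-time employees and payroll require part-time hours, and vice versa.
    part_time_reported = (
        record.get("pt_employees", 0) > 0 and record.get("pt_payroll", 0) > 0
    )
    hours_reported = record.get("pt_hours", 0) > 0
    if part_time_reported != hours_reported:
        flags.append("part-time data and hours disagree")
    return flags


def ratio_flag(current, prior, low=0.5, high=2.0):
    """Flag a current/prior ratio outside the (hypothetical) tolerance band."""
    if prior == 0:
        # No prior value to compare against: flag any nonzero current value.
        return current != 0
    ratio = current / prior
    return ratio < low or ratio > high


# Hypothetical record for one governmental function.
record = {"employees": 10, "payroll": 0,
          "pt_employees": 3, "pt_payroll": 900, "pt_hours": 0}
flags = consistency_flags(record)
needs_review = ratio_flag(current=300, prior=100)
```

Flagged items would then go to an analyst for review rather than being corrected automatically, in line with the process described in the text.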

Imputation: Not all respondents answer every item on the questionnaire. There are also questionnaires that are not returned despite efforts to gain a response. Imputation is the process of filling in missing or invalid data with reasonable values in order to have a complete data set. For general purpose governments and for schools, the imputations were based on recent historical data from either a prior year annual survey or the most recent Census of Governments, if it was available. These data were adjusted by a growth rate that was determined by the growth of units that were similar (in size, geography, and type of government) to the nonrespondent. If there were no recent historical data available, the imputations were based on the data from a randomly selected donor that was similar to the nonrespondent. This donor’s data were adjusted by dividing each data item by the population (or enrollment) of the donor and multiplying the result by the nonrespondent’s population (or enrollment). For special districts, if prior year data are available, the data are brought forward with a national level growth rate applied. Otherwise, the data are imputed to be zero. In cases where good secondary data sources exist, the data from those sources were used.
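The donor adjustment described above is a per-capita rescaling: each donor item is divided by the donor's population (or enrollment) and multiplied by the nonrespondent's. The sketch below shows only that arithmetic. The item names and figures are invented, and the selection of a similar donor is assumed to have happened already.

```python
def impute_from_donor(donor_items, donor_population, recipient_population):
    """Scale each donor data item by the ratio of recipient to donor population."""
    return {
        item: value / donor_population * recipient_population
        for item, value in donor_items.items()
    }


# Hypothetical donor government with 10,000 residents.
donor_items = {"employees": 200, "payroll": 500000.0}
imputed = impute_from_donor(donor_items, donor_population=10000,
                            recipient_population=2500)
```

For a nonrespondent one quarter the size of the donor, every imputed item is one quarter of the donor's value.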

Estimates of Sampling Error: Estimated relative standard errors for all variables are given in tabulations on the Web site. For U.S. and state-and-local government-level estimates of total full-time equivalents and total payroll, most relative standard errors are less than one percent, but they vary considerably for detailed characteristics.

Other (nonsampling) Errors: Although every effort is made in all phases of collection, processing, and tabulation to minimize errors, the sample data are subject to nonsampling errors such as inability to obtain data for every variable from all units in the sample, inaccuracies in classification, response errors, misinterpretation of questions, mistakes in keying and coding, and coverage errors. These same errors may be evident in census collections and may affect the Census of Governments data used to adjust the sample during the estimation phase and used in the imputation process.

Sources of Additional Material: <http://www.census.gov/govs/www/apes.html> and <http://www.census.gov/govs/www/apesstl06.html>.

Annual Survey of Government Finances

Universe, Frequency, and Types of Data: The United States Census Bureau conducts an Annual Survey of Government Finances, as authorized by law under Title 13, United States Code, Section 182. Alternatively, every five years, in years ending in a ‘2’ or ‘7,’ a Census of Governments, including a Finance portion, is conducted under Title 13, Section 161. The survey coverage includes all state and local governments in the United States. For both the Census and annual survey, the finance detail data are equivalent, encompassing the entire range of government finance activities: revenue, expenditure, debt, and assets.

Type of Data Collection Operations: The data collection phase for the annual survey made use of three methods to obtain data: mail canvass, internet collection, and central collection from state sources. In 28 states, all or part of the general purpose finance data for local governments was obtained from cooperative arrangements between the Census Bureau and a state government agency. These usually involved a data collection effort carried out to meet the needs of both agencies: the state agency for purposes of audit, oversight, or information, and the Census Bureau for statistical purposes. Data for the balance of local governments in this annual survey were obtained via mail questionnaires sent directly to county, municipal, township, special district, and school district governments. School district data were collected via cooperative arrangements with state education agencies. Data for state governments were compiled by analysts of the Census Bureau, usually with the cooperation and assistance of state officials. The data were compiled from state government audits, budgets, and other financial reports, either in printed or electronic format. The compilation generally involved recasting the state financial records into the classification categories used for reporting by the Census Bureau.

Data Collection and Imputation Procedures: Survey is conducted by mail with mail follow-ups of nonrespondents. Imputation for all nonresponse items is based on previous year reports or, for new governments, on data from similar donors.

Estimates of Sampling Error: The local government statistics in this survey are developed from a sample survey. Therefore, the local totals, as well as aggregates of state and local government data, are considered estimated amounts subject to sampling error. State government finance data are not subject to sampling. Consequently, state-local aggregates shown here are more reliable (on a relative standard error basis) than the local government estimates they include. Estimates of major United States totals for local governments are subject to a computed sampling variability of less than one-half of 1 percent. State and local government totals are generally subject to sampling variability of less than 3 percent.

Other (nonsampling) Errors: The estimates are also subject to inaccuracies in classification, response, and processing. Efforts were made at all phases of collection, processing, and tabulation to minimize errors. However, the data are still subject to errors from estimating for missing data, errors from misreported data, errors from miscoding, and difficulties in identifying every unit that should be included in the report. Every effort was made to keep such errors to a minimum through care in examining, editing, and tabulating the data reported by government officials.

Sources of Additional Material: <http://www.census.gov/govs/www/financegen.html> and <http://www.census.gov/govs/www/05censustechdoc.html>.

Annual Survey of Manufactures (ASM)

Universe, Frequency, and Types of Data: The Annual Survey of Manufactures is conducted annually, except for years ending in ‘2’ and ‘7,’ for all manufacturing establishments having one or more paid employees. The purpose of the ASM is to provide key intercensal measures of manufacturing activity, products, and location for the public and private sectors. The ASM provides statistics on employment, payroll, worker hours, payroll supplements, cost of materials, value added by manufacturing, capital expenditures, inventories, and energy consumption. It also provides estimates of value of shipments for 1,800 classes of manufactured products.

Type of Data Collection Operation: The ASM includes approximately 50,000 establishments selected from the census universe of 346,000 manufacturing establishments. Approximately 24,000 large establishments are selected with certainty, and the remaining 26,000 other establishments are selected with probability proportional to a composite measure of establishment size. The survey is updated from two sources: Internal Revenue Service administrative records are used to include new single-unit manufacturers, and the Company Organization Survey identifies new establishments of multiunit firms.
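Selection with certainty combined with probability proportional to size can be sketched by computing each unit's inclusion probability and capping it at 1. This is a generic textbook construction shown under stated assumptions: the establishment labels and size measures are invented, and the ASM's actual certainty cutoffs and composite size measure are not reproduced here.

```python
def pps_inclusion_probabilities(sizes, n_sample):
    """Inclusion probabilities proportional to size, capped at 1.

    Units whose proportional share would reach 1 or more become certainty
    units; the rest share the remaining sample in proportion to size.
    Assumes n_sample is smaller than the number of units.
    """
    certainty = set()
    while True:
        remaining = {u: s for u, s in sizes.items() if u not in certainty}
        n_left = n_sample - len(certainty)
        total = sum(remaining.values())
        newly_certain = {
            u for u, s in remaining.items() if n_left * s / total >= 1.0
        }
        if not newly_certain:
            break
        certainty |= newly_certain
    return {
        u: 1.0 if u in certainty else n_left * s / total
        for u, s in sizes.items()
    }


# Hypothetical size measures for five establishments; sample of three.
sizes = {"A": 500, "B": 300, "C": 100, "D": 60, "E": 40}
probabilities = pps_inclusion_probabilities(sizes, n_sample=3)
# Sampling weight for a noncertainty unit is the inverse of its probability.
weight_c = 1.0 / probabilities["C"]
```

In this toy example the two largest establishments are taken with certainty, and the inclusion probabilities sum to the sample size of three.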

Data Collection and Imputation Procedures: Survey is conducted by mail with phone and mail follow-ups of nonrespondents. Imputation (for all nonresponse items) is based on previous year reports or, for new establishments in survey, on industry averages.

Estimates of Sampling Error: Estimated relative standard errors for number of employees, new expenditures, and value added totals are given in annual publications. For U.S.-level industry statistics, most estimated relative standard errors are 2 percent or less, but they vary considerably for detailed characteristics.

Other (nonsampling) Errors: The unit response rate is about 85 percent. Nonsampling errors include those due to collection, reporting, and transcription errors, many of which are corrected through computer and clerical checks.

Sources of Additional Material: U.S. Census Bureau, Annual Survey of Manufactures, and Technical Paper 24.

Census of Population

Universe, Frequency, and Types of Data: Complete count of U.S. population conducted every 10 years since 1790. Data obtained on number and characteristics of people in the U.S.

Type of Data Collection Operation: In the 1990 and 2000 censuses the 100 percent items included: age, date of birth, sex, race, Hispanic origin, and relationship to householder. In 1980, approximately 19 percent of the housing units were included in the sample; in 1990 and 2000, approximately 17 percent.

Data Collection and Imputation Procedures: In 1980, 1990, and 2000, mail questionnaires were used extensively with personal interviews in the remainder. Extensive telephone and personal follow-up for nonrespondents was done in the censuses. Imputations were made for missing characteristics.

Estimates of Sampling Error: Sampling errors for data are estimated for all items collected by sample and vary by characteristic and geographic area. The coefficients of variation (CVs) for national and state estimates are generally very small.

Other (nonsampling) Errors: Since 1950, evaluation programs have been conducted to provide information on the magnitude of some sources of nonsampling errors such as response bias and undercoverage in each census. Results from the evaluation program for the 1990 census indicated that the estimated net undercoverage amounted to about 1.5 percent of the total resident population. For Census 2000, the evaluation program indicated a net overcount of 0.5 percent of the resident population.

Sources of Additional Material: U.S. Census Bureau, The Coverage of Population in the 1980 Census, PHC80-E4; Content Reinterview Study: Accuracy of Data for Selected Population and Housing Characteristics as Measured by Reinterview, PHC80-E2; 1980 Census of Population, Vol. 1, (PC80-1), Appendixes B, C, and D. Content Reinterview Survey: Accuracy of Data for Selected Population and Housing Characteristics as Measured by Reinterview, 1990, CPH-E-1; Effectiveness of Quality Assurance, CPH-E-2; Programs to Improve Coverage in the 1990 Census, 1990, CPH-E-3. For Census 2000 evaluations, see <http://www.census.gov/pred/www>.

County Business Patterns

Universe, Frequency, and Types of Data: County Business Patterns is an annual tabulation of basic data items extracted from the Business Register, a file of all known single- and multilocation employer companies maintained and updated by the U.S. Census Bureau. Data include number of establishments, number of employees, first quarter and annual payrolls, and number of establishments by employment size class. Data are excluded for self-employed individuals, private households, railroad employees, agricultural production workers, and most government employees.

Type of Data Collection Operation: The annual Company Organization Survey provides individual establishment data for multilocation companies. Data for single establishment companies are obtained from various Census Bureau programs, such as the Annual Survey of Manufactures and Current Business Surveys, as well as from administrative records of the Internal Revenue Service, the Social Security Administration, and the Bureau of Labor Statistics.

Estimates of Sampling Error: Not applicable.

Other (nonsampling) Errors: The data are subject to nonsampling errors, such as inability to identify all cases in the universe; definition and classification difficulties; differences in interpretation of questions; errors in recording or coding the data obtained; and estimation of employers who reported too late to be included in the tabulations and for records with missing or misreported data.

Sources of Additional Materials: U.S. Census Bureau, County Business Patterns

Current Population Survey (CPS)

Universe, Frequency, and Types of Data: Nationwide monthly sample designed primarily to produce national and state estimates of labor force characteristics of the civilian noninstitutionalized population 16 years of age and older.

Type of Data Collection Operation: Multistage probability sample that currently includes 72,000 households from 824 sample areas. Sample size increased in some states to improve data reliability for those areas on an annual average basis. A continual sample rotation system is used. Households are in sample 4 months, out for 8 months, and in for 4 more. Month-to-month overlap is 75 percent; year-to-year overlap is 50 percent.
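The stated overlap figures follow directly from the 4-8-4 rotation pattern, which the sketch below checks by enumeration. It assumes that one new rotation group enters the sample every month and that all groups are the same size. The function names and the use of integer month indexes are illustrative choices.

```python
def in_sample(entry_month, month):
    """True if a group that entered in `entry_month` is interviewed in `month`."""
    offset = month - entry_month
    # In for 4 months (offsets 0-3), out for 8, then in for 4 more (offsets 12-15).
    return 0 <= offset <= 3 or 12 <= offset <= 15


def groups_in_sample(month):
    """Entry months of all rotation groups interviewed in `month`."""
    return {e for e in range(month - 15, month + 1) if in_sample(e, month)}


def overlap(month_a, month_b):
    """Share of month_a's rotation groups that are also in sample in month_b."""
    groups_a = groups_in_sample(month_a)
    return len(groups_a & groups_in_sample(month_b)) / len(groups_a)


month_to_month = overlap(100, 101)
year_to_year = overlap(100, 112)
```

Eight rotation groups are in sample in any month; six of them return the following month and four return twelve months later, which gives the 75 percent and 50 percent overlaps.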

Data Collection and Imputation Procedures: For the first and fifth months that a household is in sample, data are collected by personal interview; in other months, approximately 85 percent of the data are collected by phone. Imputation is done for item nonresponse. Adjustment for total nonresponse is done by a predefined cluster of units, by MSA size and residence; for item nonresponse, imputation varies by subject matter.

Estimates of Sampling Error: The national total estimates of the civilian labor force and of employment have monthly CVs of about 0.2 percent and annual average CVs of about 0.125 percent. Unemployment is a much smaller characteristic and consequently has substantially larger CVs than the civilian labor force or employment. The national unemployment rate, the most important CPS statistic, has a monthly CV of about 2 percent and an annual average CV of about 1 percent. The CVs for states vary since more populous states have larger samples (and smaller CVs) than states with smaller populations. Assuming a 6 percent unemployment rate, the smallest states have monthly CVs of about 17 percent and annual average CVs of about 8 percent. The estimated CVs for family income and poverty rate for all persons in 2005 are 0.4 percent and 1.2 percent, respectively. CVs for subnational areas, such as states, tend to be larger and vary by area.
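A coefficient of variation is the standard error expressed as a percentage of the estimate, so the CVs quoted above convert directly into standard errors and approximate confidence intervals. The sketch below applies the monthly CVs to a 6 percent unemployment rate. The function names and the 90 percent confidence multiplier of 1.645 are assumptions made for illustration.

```python
def standard_error(estimate, cv_percent):
    """Standard error implied by an estimate and its CV (CV given in percent)."""
    return estimate * cv_percent / 100.0


def confidence_interval(estimate, cv_percent, z=1.645):
    """Approximate interval; z=1.645 (90 percent, normal) is an assumption."""
    se = standard_error(estimate, cv_percent)
    return (estimate - z * se, estimate + z * se)


# A 6 percent unemployment rate with the monthly CVs quoted in the text.
national_se = standard_error(6.0, 2.0)       # national CV of about 2 percent
small_state_se = standard_error(6.0, 17.0)   # smallest states, about 17 percent
low, high = confidence_interval(6.0, 17.0)
```

Under these assumptions the national monthly rate has a standard error of about 0.12 percentage point, while the smallest states have a standard error of about 1 percentage point, giving an interval of roughly 4.3 to 7.7 percent.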

Other (nonsampling) Errors: Estimates of response bias on unemployment are available. Estimates of unemployment rate from reinterviews range from −2.4 percent to 1.0 percent of the basic CPS unemployment rate (over a 30-month span from January 2004 through June 2006). Eligible CPS households are approximately 82 percent of the assigned households, with a corresponding response rate of 91 percent.

Sources of Additional Material: U.S. Census Bureau and Bureau of Labor Statistics, Current Population Survey: Design and Methodology, (Technical Paper 66), available on the Internet <http://www.census.gov/prod/2006pubs/tp-66.pdf> and the Bureau of Labor Statistics, <http://www.bls.gov/cps/> and the BLS Handbook of Methods, Chapter 1, available on the Internet at <http://www.bls.gov/opub/hom/homch1a.htm>.

Foreign Trade - Export Statistics

Universe, Frequency, and Types of Data: The export declarations collected by the U.S. Bureau of Customs and Border Protection are processed each month to obtain data on the movement of U.S. merchandise exports to foreign countries. Data obtained include value, quantity, and shipping weight of exports by commodity, country of destination, district of exportation, and mode of transportation.

Type of Data Collection Operation: Shipper’s Export Declarations (paper and electronic) are generally required to be filed for the exportation of merchandise valued over $2,500. U.S. Bureau of Customs and Border Protection officials collect and transmit the documents to the Census Bureau on a flow basis for data compilation. Data for shipments valued under $2,501 are estimated, based on established percentages of individual country totals.

Data Collection and Imputation Procedures: Statistical copies of Shipper’s Export Declarations are received on a daily basis from ports throughout the country and subjected to a monthly processing cycle. They are fully processed to the extent they reflect items valued over $2,500. Estimates for shipments valued at $2,500 or less are made, based on established percentages of individual country totals.

Estimates of Sampling Error: Not applicable.

Other (nonsampling) Errors: The goods data are a complete enumeration of documents collected by the U.S. Bureau of Customs and Border Protection and are not subject to sampling errors. Quality assurance procedures are performed at every stage of collection, processing, and tabulation; however, the data are still subject to several types of nonsampling errors. The most significant of these include reporting errors, undocumented shipments, timeliness, data capture errors, and errors in the estimation of low-valued transactions. Additional information on errors affecting export data can be found at <http://www.census.gov/foreign-trade/Press-Release/currentpressrelease/explain.pdf>.

Sources of Additional Material: U.S. Census Bureau, FT 900 U.S. International Trade in Goods and Services, FT 895 U.S. Trade with Puerto Rico and U.S. Possessions, FT 920 U.S. Merchandise Trade: Selected Highlights, and Information Section on Goods and Services at <http://www.census.gov/ft900>.

Foreign Trade - Import Statistics

Universe, Frequency, and Types of Data: The import entry documents collected by the U.S. Bureau of Customs and Border Protection are processed each month to obtain data on the movement of merchandise imported into the United States. Data obtained include value, quantity, and shipping weight by commodity, country of origin, district of entry, and mode of transportation.

Type of Data Collection Operation: Import entry documents, either paper or electronic, are required to be filed for the importation of goods into the United States valued over $2,000 or for articles which must be reported on formal entries. U.S. Bureau of Customs and Border Protection officials collect and transmit statistical copies of the documents to the Census Bureau on a flow basis for data compilation. Estimates for shipments valued under $2,001 and not reported on formal entries are based on established percentages of individual country totals.

Data Collection and Imputation Procedures: Statistical copies of import entry documents, received on a daily basis from ports of entry throughout the country, are subjected to a monthly processing cycle. They are fully processed to the extent they reflect items valued at $2,001 and over or items which must be reported on formal entries.

Estimates of Sampling Error: Not applicable.

Other (nonsampling) Errors: The goods data are a complete enumeration of documents collected by the U.S. Bureau of Customs and Border Protection and are not subject to sampling errors. Quality assurance procedures are performed at every stage of collection, processing, and tabulation; however, the data are still subject to several types of nonsampling errors. The most significant of these include reporting errors, undocumented shipments, timeliness, data capture errors, and errors in the estimation of low-valued transactions. Additional information on errors affecting import data can be found at <http://www.census.gov/foreign-trade/Press-Release/currentpressrelease/explain.pdf>.

Sources of Additional Material: U.S. Census Bureau, FT 900 U.S. International Trade in Goods and Services, FT 895 U.S. Trade with Puerto Rico and U.S. Possessions, FT 920 U.S. Merchandise Trade: Selected Highlights, and Information Section on Goods and Services at <http://www.census.gov/ft900>.

Monthly Retail Trade and Food Service Survey

Universe, Frequency, and Types of Data: Provides monthly estimates of retail and food service sales by kind of business and end of month inventories of retail stores.

Type of Data Collection Operation: Probability sample of all firms from a list frame. The list frame is the Bureau’s Business Register updated quarterly for recent birth Employer Identification (EI) Numbers issued by the Internal Revenue Service and assigned a kind of business code by the Social Security Administration. The largest firms are included monthly; a sample of others is included every month also.

Data Collection and Imputation Procedures: Data are collected by mail questionnaire with telephone follow-ups and fax reminders for nonrespondents. Imputation is made for each nonresponse item and each item failing edit checks.

Estimates of Sampling Error: For the 2006 monthly surveys, CVs are about 0.4 percent for estimated total retail sales and 0.7 percent for estimated total retail inventories. Sampling errors are shown in monthly publications.

Other (nonsampling) Errors: Imputation rates are about 22 percent for monthly retail and food service sales, and 29 percent for monthly retail inventories.

Sources of Additional Material: U.S. Census Bureau, Current Business Reports, Annual Revision of Monthly Retail and Food Services: Sales and Inventories.

Monthly Survey of Construction

Universe, Frequency, and Types of Data: Survey conducted monthly of newly constructed housing units (excluding mobile homes). Data are collected on the start, completion, and sale of housing. (Annual figures are aggregates of monthly estimates.)

Type of Data Collection Operation: A multistage probability sample of approximately 900 of the 20,000 permit-issuing jurisdictions in the U.S. was selected. Each month in each of these permit offices, field representatives list and select a sample of permits for which to collect data. To obtain data in areas where building permits are not required, a multistage probability sample of 70 land areas (census tracts or subsections of census tracts) was selected. All roads in these areas are canvassed and data are collected on all new residential construction found. Sampled buildings are followed up until they are completed (and sold, if for sale).

Data Collection and Imputation Procedures: Data are obtained by telephone inquiry and/or field visit. Nonresponse/undercoverage adjustment factors are used to account for late reported data.

Estimates of Sampling Error: Estimated CV of 3 percent to 4 percent for estimates of national totals of units started, but may be higher than 20 percent for estimated totals of more detailed characteristics, such as housing units in multiunit structures.

Other (nonsampling) Errors: Response rate is over 90 percent for most items. Nonsampling errors are attributed to definitional problems, differences in interpretation of questions, incorrect reporting, inability to obtain information about all cases in the sample, and processing errors.

Sources of Additional Material: All data are available on the Internet at <http://www.census.gov/const/www/newressalesindex.html> or <http://www.census.gov/const/www/newsresconstindex.html>. Further documentation of the survey is also available at those sites.

Nonemployer Statistics

Universe, Frequency, and Types of Data: Nonemployer statistics are an annual tabulation of economic data by industry for active businesses without paid employees that are subject to federal income tax. Data showing the number of firms and receipts by industry are available for the U.S., states, counties, and metropolitan areas. Most types of businesses covered by the Census Bureau’s economic statistics programs are included in the nonemployer statistics. Tax-exempt and agricultural-production businesses are excluded from nonemployer statistics.

Type of Data Collection Operation: The universe of nonemployer firms is created annually as a byproduct of the Census Bureau’s Business Register processing for employer establishments. If a business is active but without paid employees, then it becomes part of the potential nonemployer universe. Industry classification and receipts are available for each potential nonemployer business. These data are obtained primarily from the annual business income tax returns of the Internal Revenue Service (IRS). The potential nonemployer universe undergoes a series of complex processing, editing, and analytical review procedures at the Census Bureau to distinguish nonemployers from employers, and to correct and complete data items used in creating the data tables.

Estimates of Sampling Error: Not applicable.

Other (nonsampling) Errors: The data are subject to nonsampling errors, such as industry misclassification as well as errors of response, keying, nonreporting, and coverage.

Sources of Additional Material: U.S. Census Bureau, Nonemployer Statistics

Service Annual Survey

Universe, Frequency, and Types of Data: The U.S. Census Bureau conducts the Service Annual Survey to provide nationwide estimates of revenues and expenses for selected service industries. Estimates are summarized by industry classification based on the 2002 North American Industry Classification System (NAICS). Selected service industries covered by the Service Annual Survey include all or part of the following NAICS sectors: Transportation and Warehousing (NAICS 48−49); Information (NAICS 51); Finance and Insurance (NAICS 52); Real Estate and Rental and Leasing (NAICS 53); Professional, Scientific, and Technical Services (NAICS 54); Administrative and Support and Waste Management and Remediation Services (NAICS 56); Health Care and Social Assistance (NAICS 62); Arts, Entertainment, and Recreation (NAICS 71); and Other Services, except Public Administration (NAICS 81). Data collected include total revenue, total expenses, detailed expenses, revenue from e-commerce transactions; and for selected industries, revenue from detailed service products, revenue from exported services, and inventories. For industries with a significant nonprofit component, separate estimates are developed for taxable firms and firms and organizations exempt from federal income taxes. Questionnaires are mailed in January and request annual data for the prior year. Estimates are published approximately 12 months after the initial survey mailing.

Type of Data Collection Operation: The Service Annual Survey estimates are developed from a probability sample of employer firms and administrative records for nonemployers. Service Annual Survey questionnaires are mailed to a probability sample that is periodically reselected from a universe of firms located in the United States and having paid employees. The sample includes firms of all sizes and covers both taxable firms and firms exempt from federal income taxes. Updates to the sample are made on a quarterly basis to account for new businesses. Firms without paid employees, or nonemployers, are included in the estimates through imputation and/or administrative records data provided by other federal agencies. Links to additional information about confidentiality protection, sampling error, nonsampling error, sample design, definitions, and copies of the questionnaires may be found on the Internet at <http://www.census.gov/econ/www/servmenu.html>.

Estimates of Sampling Error: Coefficients of variation (CVs) for the 2005 Service Annual Survey estimates range from 0.4 percent to 1.8 percent for total revenue estimates computed at the NAICS sector (2-digit NAICS code) level. The full 2005 Service Annual Survey results, including coefficients of variation (CVs), can be found at <http://www.census.gov/econ/www/servmenu.html>. Links to additional information regarding sampling error may be found at: <http://www.census.gov/svsd/www/cv.html>.

Other (nonsampling) Errors: Data are imputed for unit nonresponse, item nonresponse, and for reported data that fail edits. The percent of imputed data for total revenue for the 2005 Service Annual Survey is approximately 9 percent.

Sources of Additional Material: U.S. Census Bureau, Current Business Reports, Service Annual Survey, Census Bureau Web site: <http://www.census.gov/econ/www/servmenu.html>.


Survey of Business Owners (SBO)

Universe, Frequency, and Types of Data: The Survey of Business Owners (SBO), formerly known as the Surveys of Minority- and Women-Owned Business Enterprises (SMOBE/SWOBE), provides statistics that describe the composition of U.S. businesses by gender, Hispanic or Latino origin, and race. Data are presented for businesses owned by American Indians and Alaska Natives, Asians, Blacks, Hispanics, Native Hawaiians and Other Pacific Islanders, and Women. All U.S. firms operating during 2002 with receipts of $1,000 or more, which are classified by the North American Industry Classification System (NAICS) codes 11 through 99, are represented, except for the following: NAICS 111, 112, 4811 (part), 482, 491, 525 (part), 813, 814, and 92. The lists of all firms (or sample frames) are compiled from a combination of business tax returns and data collected on other economic census reports. The published data include the number of firms, gross receipts, number of paid employees, and annual payroll. Data are presented by industry classifications and/or geographic area (states, metropolitan and micropolitan statistical areas, counties, and corporate municipalities (places) including cities, towns, townships, villages, and boroughs), and size of firm (employment and receipts).

Type of Data Collection Operation: The survey is based on a stratified probability sample of approximately 2.3 million firms from a universe of approximately 23 million firms. There were 5.5 million firms with paid employees and 17.4 million firms with no paid employees. The data are based on the entire firm rather than on individual locations of a firm.

Data Collection and Imputation Procedures: Data were collected through a mailout/mailback operation. Compensation for missing data is addressed through reweighting, edit correction, and standard statistical imputation methods.

Estimates of Sampling Error: Sampling error is present in these estimates because they are based on the results of a sample survey and not on an enumeration of the entire universe. Since these estimates are based on a probability sample, it is possible to estimate the sampling variability of the survey estimates. The standard error (SE) provides a measure of the variation. The relative standard error or coefficient of variation (CV) provides a measure of the magnitude of the variation relative to the estimate and is calculated as 100 multiplied by the ratio of the SE to the estimate. The CVs for number of firms and receipts at the national level typically range from 0 to 4 percent.
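As a rough illustration of the relative standard error, the Python sketch below computes a CV under the conventional definition, 100 times the standard error divided by the estimate. The function name and the sample figures are hypothetical and are not published SBO values.

```python
def coefficient_of_variation(estimate: float, standard_error: float) -> float:
    """Return the CV in percent: 100 * SE / |estimate|."""
    if estimate == 0:
        raise ValueError("CV is undefined when the estimate is zero")
    return 100.0 * standard_error / abs(estimate)


# Hypothetical figures for illustration only (not published SBO values).
firms_estimate = 250_000  # estimated number of firms
firms_se = 5_000          # estimated standard error of that count
print(coefficient_of_variation(firms_estimate, firms_se))  # 2.0
```

A CV of 2 percent in this sketch would fall inside the 0 to 4 percent range quoted for national-level estimates.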

Other (nonsampling) Errors: Nonsampling errors are attributed to many sources: inability to obtain information for all cases in the universe, adjustments to the weights of respondents to compensate for nonrespondents, imputation for missing data, data errors and biases, mistakes in recording or keying data, errors in collection or processing, and coverage problems. Explicit measures of the effects of these nonsampling errors are not available. However, it is believed that most of the important operational and data errors were detected and corrected through an automated data edit designed to review the data for reasonableness and consistency. Quality control techniques were used to verify that operating procedures were carried out as specified.

Sources of Additional Material: U.S. Census Bureau, Guide to the 2002 Economic Census and Related Statistics; <http://www.census.gov/econ/census02/guide/index.html>.

U.S. DEPARTMENT OF EDUCATION, National Center for Education Statistics

Integrated Postsecondary Education Data Survey (IPEDS), Completions

Universe, Frequency, and Types of Data: Annual survey of all institutions and branches listed in the Education Directory, Colleges and Universities to obtain data on earned degrees and other formal awards, conferred by field of study, level of degree, sex, and by racial/ethnic characteristics (every other year prior to 1989, then annually).

Type of Data Collection Operation: Complete census.


Data Collection and Imputation Procedures: Data are collected through a Web-based survey in the fall of every year. Missing data are imputed by using data of similar institutions.

Estimates of Sampling Error: Not applicable.

Other (nonsampling) Errors: For 2005−06, the response rate for degree-granting institutions was 100.0 percent.

Sources of Additional Material: U.S. Department of Education, National Center for Education Statistics, Postsecondary Institutions in the United States: Fall 2006 and Degrees and Other Awards Conferred: 2005−06. See <http://www.nces.ed.gov/ipeds/>.

National Household Education Surveys (NHES) Program

Universe, Frequency, and Types of Data: The National Household Education Surveys Program is a system of telephone surveys of the noninstitutionalized civilian population of the United States. Surveys in NHES have varying universes of interest depending on the particular survey. Specific topics covered by each survey are at the NHES Web site <http://nces.ed.gov/nhes>. A list of the surveys fielded as part of NHES, each universe, and the years they were fielded is provided below.
1) Adult Education: Interviews were conducted with a representative sample of civilian, noninstitutionalized persons age 16 and older who were not enrolled in grade 12 or below (1991, 1995, 1999, 2001, 2003, 2005).
2) After-School Programs and Activities: Interviews were conducted with parents of a representative sample of students in grades K through 8 (1999, 2001, 2005).
3) Civic Involvement: Interviews were conducted with representative samples of parents, youth, and adults (1996, 1999).
4) Early Childhood Program Participation: Interviews were conducted with parents of a representative sample of children from birth through grade 3, with the specific age groups varying by survey year (1991, 1995, 1999, 2001, 2005).
5) Parent and Family Involvement in Education: Interviews were conducted with parents of a representative sample of children age 3 through grade 12 or in grades K through 12 depending on the survey year (1996, 1999, 2003, and 2007 forthcoming).
6) School Readiness: Interviews were conducted with parents of a representative sample of 3-to-7-year-old children (1993 and 1999) and of 3-to-5-year-old children not yet in kindergarten (2007 forthcoming).

Type of Data Collection Operation: NHES uses telephone interviews to collect data.

Data Collection and Imputation Procedures: Telephone numbers are selected using random digit dialing (RDD) techniques. Approximately 45,000 to 64,000 households are contacted in order to identify persons eligible for the surveys. Data are collected using computer-assisted telephone interviewing (CATI) procedures. Missing data are imputed using hot-deck imputation procedures.
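The text does not describe how the NHES hot deck selects donors, so the following Python sketch shows only the general idea of one common variant, a sequential hot deck within imputation classes: a missing value is replaced by the most recent reported value from an earlier record in the same class. The function name, the class variable, and the records are all hypothetical.

```python
from typing import Hashable, Optional, Sequence


def sequential_hot_deck(
    values: Sequence[Optional[float]],
    classes: Sequence[Hashable],
) -> list:
    """Fill each missing value (None) with the most recent reported value
    from an earlier record in the same imputation class."""
    last_donor = {}
    imputed = []
    for value, cls in zip(values, classes):
        if value is None:
            # Borrow from the latest donor in this class; stays None if
            # no donor has been seen yet.
            value = last_donor.get(cls)
        else:
            # A reported value becomes the current donor for its class.
            last_donor[cls] = value
        imputed.append(value)
    return imputed


# Hypothetical records: reported hours and an imputation class for each.
hours = [10.0, None, 4.0, None, None]
region = ["urban", "urban", "rural", "rural", "urban"]
print(sequential_hot_deck(hours, region))  # [10.0, 10.0, 4.0, 4.0, 10.0]
```

In this sketch a record with no earlier donor in its class is left missing; a production procedure would need a fallback rule for that case.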

Estimates of Sampling Error: Unweighted sample sizes range between 2,250 and 55,708. The average root design effects of the surveys in NHES range from 1.1 to 1.5.
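A root design effect is usually read as the factor by which a standard error computed under simple random sampling should be inflated. The Python sketch below applies that reading, assuming an estimated proportion; the function names and the proportion of 0.5 are hypothetical, and pairing the smallest sample size with the largest root design effect is done only to show a worst case.

```python
import math


def srs_standard_error(p: float, n: int) -> float:
    """Standard error of a proportion under simple random sampling."""
    return math.sqrt(p * (1.0 - p) / n)


def design_adjusted_se(p: float, n: int, root_design_effect: float) -> float:
    """Scale the simple-random-sampling SE by the root design effect."""
    return root_design_effect * srs_standard_error(p, n)


def effective_sample_size(n: int, root_design_effect: float) -> float:
    """Size of a simple random sample that would give the same precision."""
    return n / root_design_effect ** 2


# n = 2,250 and a root design effect of 1.5 are the extremes quoted above;
# the proportion 0.5 is a hypothetical worst case.
print(effective_sample_size(2250, 1.5))              # 1000.0
print(round(design_adjusted_se(0.5, 2250, 1.5), 4))  # 0.0158
```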

Other (nonsampling) Errors: Because of unit nonresponse and because the samples are drawn from households with telephones instead of all households, nonresponse and/or coverage bias may exist for some estimates. However, both sources of potential bias are adjusted for in the weighting process. Both potential sources of bias in the NHES collections have been analyzed, and no significant bias has been detected.

Sources of Additional Material: Please see the NHES Web site at <http://nces.ed.gov/nhes>.

Schools and Staffing Survey (SASS)

Universe, Frequency, and Types of Data: NCES designed the SASS survey system to emphasize teacher demand and shortage, teacher and administrator characteristics, school programs, and general conditions in schools. SASS also collects data on many other topics, including principals’ and teachers’ perceptions of school climate and problems in their schools; teacher compensation; district hiring practices; and basic characteristics of the student population. The SASS has had four core components: the School Questionnaire, the Teacher Questionnaire, the Principal Questionnaire, and the School District Questionnaire. For the 2003−04 SASS, a sample of public charter schools is included in the sample as part of the public school questionnaire. Since 1987−88, the SASS has been the largest, most extensive survey of K through 12 school districts, schools, teachers, and administrators in the U.S. Surveys have been conducted every 3 to 4 years depending on budgetary constraints. The SASS includes data from public, private, and Bureau of Indian Affairs school sectors. Therefore, the SASS provides a multitude of opportunities for analysis and reporting on elementary and secondary educational issues.

Type of Data Collection Operation: The U.S. Census Bureau performs the data collection and begins by sending advance letters to the sampled Local Education Agencies (LEAs) and schools in August and September of collection years. Beginning in October, questionnaires are delivered by U.S. Census Bureau field representatives. The sampling frame for the public school sample is the most recent Common Core of Data (CCD) school file. CCD is a universe file that includes all elementary and secondary schools in the United States. Schools operated by the Department of Defense or those that offered only kindergarten or prekindergarten or adult education were excluded from the SASS sample. The list frame used for the private school sample is the most recent Private School Universe Survey (PSS) list, updated with association lists. An area frame supplement is based on the canvassing of private schools within specific geographical areas. A separate universe of schools funded by the Bureau of Indian Affairs (BIA) is drawn from the Program Education Directory maintained by the BIA. To avoid duplicates in the BIA files, BIA schools in the CCD school file are treated as public schools.

Estimates of Sampling Error: Sampling errors can be calculated using replicate weights and Balanced Repeated Replication complex survey design methodology. Errors depend on cell sizes and range from less than 1 percent to over 5 percent (for reasonable cell sizes).
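To show how replicate weights lead to a sampling error, the Python sketch below applies the basic Balanced Repeated Replication formula: the estimate is recomputed once with each set of replicate weights, and the variance is the mean squared deviation of those replicate estimates from the full-sample estimate. The function name and all figures are hypothetical. The number of replicates used in SASS, and whether a modified form such as Fay's method applies, are not stated here, so this is only the textbook version.

```python
from typing import Sequence


def brr_variance(full_estimate: float, replicate_estimates: Sequence[float]) -> float:
    """Basic BRR variance: mean squared deviation of the replicate
    estimates from the full-sample estimate."""
    r = len(replicate_estimates)
    if r == 0:
        raise ValueError("at least one replicate estimate is required")
    return sum((est - full_estimate) ** 2 for est in replicate_estimates) / r


# Hypothetical full-sample estimate and four replicate estimates (percent).
full = 50.0
replicates = [49.0, 51.0, 50.5, 49.5]
variance = brr_variance(full, replicates)
print(variance)         # 0.625
print(variance ** 0.5)  # the corresponding standard error
```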

Other (nonsampling) Errors: Because of unit nonresponse, bias may exist in some sample cells. However, bias has been adjusted for in the weighting process. Potential bias has been analyzed, and no significant bias has been detected.

Sources of Additional Material: Please see the SASS Web site at <http://nces.ed.gov/surveys/sass/>.

U.S. DEPARTMENT OF JUSTICE, FEDERAL BUREAU OF INVESTIGATION

Uniform Crime Reporting (UCR) Program

Universe, Frequency, and Types of Data: Monthly reports on the number of criminal offenses that become known to law enforcement agencies. Data are also collected on crimes cleared by arrest or exceptional means; on the age, sex, and race of arrestees and of victims and offenders in homicides; on the number of law enforcement employees; on fatal and nonfatal assaults against law enforcement officers; and on hate crimes reported.

Type of Data Collection Operation: Crime statistics are based on reports of crime data submitted either directly to the FBI by contributing law enforcement agencies or through cooperating state UCR Programs.

Data Collection and Imputation Procedures: States with UCR programs collect data directly from individual law enforcement agencies and forward reports, prepared in accordance with UCR standards, to the FBI. Accuracy and consistency edits are performed by the FBI.

Estimates of Sampling Error: Not applicable.

Other (nonsampling) Errors: During 2006, law enforcement agencies active in the UCR Program represented 94.2 percent of the total population. The coverage amounted to 95.6 percent of the United States population in Metropolitan Statistical Areas, 86.3 percent of the population in cities outside metropolitan areas, and 88.1 percent in nonmetropolitan counties.


Sources of Additional Material: U.S. Department of Justice, Federal Bureau of Investigation, Crime in the United States, annual, Hate Crime Statistics, annual, Law Enforcement Officers Killed and Assaulted, annual, <http://www.fbi.gov/ucr/ucr.htm>.

U.S. INTERNAL REVENUE SERVICE

Corporation Income Tax Returns

Universe, Frequency, and Types of Data: Annual study of unaudited corporation income tax returns, Forms 1120, 1120-A, 1120-F, 1120-L, 1120-PC, 1120-REIT, 1120-RIC, and 1120S, filed by corporations or businesses legally defined as corporations. Data provided on various financial characteristics by industry and size of total assets, and business receipts.

Type of Data Collection Operation: Stratified probability sample of approximately 110,000 returns for Tax Year 2005, allocated to sample classes which are based on type of return, size of total assets, size of net income or deficit, and selected business activity. Sampling rates for sample classes varied from 0.25 percent to 100 percent.

Data Collection and Imputation Procedures: Computer selection of sample of tax return records. Data adjusted during editing for incorrect, missing, or inconsistent entries to ensure consistency with other entries on return and to comply with statistical definitions.

Estimates of Sampling Error: Estimated CVs for Tax Year 2005: Returns with assets over $10 million are self-representing. Coefficients of variation are published in the 2005 Statistics of Income Corporation Income Tax Returns, Table 1, by industry group.

Other (nonsampling) Errors: Nonsampling errors include coverage errors, processing errors, and response errors.

Sources of Additional Material: U.S. Internal Revenue Service, Statistics of Income, Corporation Income Tax Returns, annual.

Individual Income Tax Returns

Universe, Frequency, and Types of Data: Annual study of unaudited individual income tax returns, Forms 1040, 1040A, and 1040EZ, filed by U.S. citizens and residents. Data provided on various financial characteristics by size of adjusted gross income, marital status, and by taxable and nontaxable returns. Data by state, based on the population of returns filed, also include returns from 1040NR, filed by nonresident aliens plus certain self-employment tax returns.

Type of Data Collection Operation: Stratified probability sample of 292,966 returns for tax year 2005. The sample is classified into sample strata based on the larger of total income or total loss amounts, the size of business plus farm receipts, and other criteria such as the potential usefulness of the return for tax policy modeling. Sampling rates for sample strata varied from 0.01 percent to 100 percent.
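The spread of sampling rates matters because each sampled return stands in for a different number of returns in the population. The Python sketch below shows the usual inverse-rate base weight and a weighted total. The function names and the dollar amounts are hypothetical, and the weights actually used for the published statistics may include further adjustments not described here.

```python
def sampling_weight(rate_percent: float) -> float:
    """Base weight: how many population returns one sampled return represents."""
    if not 0.0 < rate_percent <= 100.0:
        raise ValueError("sampling rate must be in (0, 100] percent")
    return 100.0 / rate_percent


def weighted_total(sample):
    """Sum of amount * weight over (amount, rate_percent) pairs."""
    return sum(amount * sampling_weight(rate) for amount, rate in sample)


# Hypothetical sampled returns: (reported amount in dollars, stratum rate in percent).
sample = [
    (40_000.0, 0.01),      # lowest rate quoted above: stands for about 10,000 returns
    (5_000_000.0, 100.0),  # certainty stratum: the return stands only for itself
]
print(round(sampling_weight(0.01)))   # 10000
print(round(weighted_total(sample)))  # 405000000
```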

Data Collection and Imputation Procedures: Computer selection of sample of tax return records. Data adjusted during editing for incorrect, missing, or inconsistent entries to ensure consistency with other entries on return.

Estimates of Sampling Error: Estimated CVs for tax year 2005: Adjusted gross income less deficit 0.08 percent; salaries and wages 0.16 percent; and tax-exempt interest received 1.45 percent. (State data not subject to sampling error.)

Other (nonsampling) Errors: Processing errors and errors arising from the use of tolerance checks for the data.

Sources of Additional Material: U.S. Internal Revenue Service, Statistics of Income, Individual Income Tax Returns, annual (Publication 1304).

Partnership Income Tax Returns

Universe, Frequency, and Types of Data: Annual study of preaudited income tax returns of partnerships related to financial and tax-related activity during calendar years 2003 to 2006 and reported on Forms 1065 and 1065B to the IRS in calendar year 2007. Data are provided by industry, based on the NAICS industry coding used by IRS.

Type of Data Collection Operation: Stratified probability sample of approximately 45,000 partnership returns from a population of 3.2 million filed during calendar year 2007. The sample is classified based on combinations of industry code, gross receipts, net income or loss, and total assets. Sampling rates vary from 0.07 percent to 100 percent.

Data Collection and Imputation Procedures: The sample of tax return records is selected via computer after data are transcribed by IRS and placed on an administrative file. Data are manually adjusted during editing for incorrect, missing, or inconsistent entries to ensure consistency with other entries on return. Data not available due to regulations are handled with weighting adjustments.

Estimates of Sampling Error: Some of the estimated Coefficients of Variation (the estimated standard error of the total divided by the estimated total) for tax year 2006: For number of partnerships, 0.36 percent; business receipts, 0.18 percent; net income, 0.73 percent; and ordinary business income, 0.53 percent.

Other (nonsampling) Errors: The potential exists for coverage error due to unavailable returns; processing errors; and taxpayer reporting errors, since data are preaudit.

Sources of Additional Material: U.S. Internal Revenue Service, Statistics of Income, Partnership Returns and Statistics of Income Bulletin, Vol. 27, No. 2 (Fall 2007).

Sole Proprietorship Income Tax Returns

Universe, Frequency, and Types of Data: Annual study of unaudited income tax returns of nonfarm sole proprietorships, Form 1040 with business schedules. Data provided on various financial characteristics by industry.

Type of Data Collection Operation: Stratified probability sample of 82,689 sole proprietorships for tax year 2005. The sample is classified based on presence or absence of certain business schedules; the larger of total income or loss; size of business plus farm receipts; and other criteria such as the potential usefulness of the return for tax policy modeling. Sampling rates vary from 0.1 percent to 100 percent.

Data Collection and Imputation Procedures: Computer selection of sample of tax return records. Data adjusted during editing for incorrect, missing, or inconsistent entries to ensure consistency with other entries on return.

Estimates of Sampling Error: Estimated CVs for tax year 2005 are available. For sole proprietorships, business receipts, 0.56 percent; depreciation, 1.26 percent.

Other (nonsampling) Errors: Processing errors and errors arising from the use of tolerance checks for the data.

Sources of Additional Material: U.S. Internal Revenue Service, Statistics of Income, Sole Proprietorship Returns (for years 1980 through 1983) and Statistics of Income Bulletin, Vol. 27, No. 1 (Summer 2007), as well as bulletins for earlier years.

U.S. NATIONAL CENTER FOR HEALTH STATISTICS (NCHS)

National Health Interview Survey (NHIS)

Universe, Frequency, and Types of Data: Continuous data collection covering the civilian noninstitutional population to obtain information on demographic characteristics, conditions, injuries, impairments, use of health services, health behaviors, and other health topics.

Type of Data Collection Operation: Multistage probability sample of 49,000 households (in 198 primary sampling units, or PSUs) from 1985 to 1994; 36,000−40,000 households (358 design PSUs or 449 effective PSUs when divided by state boundaries) from 1995 to 2005; an estimated 35,000 completed households (428 effective PSUs) beginning in 2006.

Data Collection and Imputation Procedures: Some missing data items (e.g., race, ethnicity) are imputed using a hot deck imputation value. Sequential regression models are used to create multiple imputation files for family income. Unit nonresponse is compensated for by an adjustment to the survey weights.

Estimates of Sampling Error: For 2006, rates of medically attended injury episodes in the past 12 months caused by falling were 49.93 (4.44) for females and 38.83 (4.59) for males, per 1,000 population; the 2006 rate of injury episodes during the past 12 months inside the home was 34.89 (2.81) per 1,000 population.
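The figures in parentheses are not labeled in the text. If they are read as standard errors, which is an assumption, an approximate 95 percent confidence interval follows from the normal approximation, as in this Python sketch. The function name is hypothetical.

```python
def confidence_interval(estimate: float, standard_error: float, z: float = 1.96):
    """Normal-approximation interval: estimate +/- z * SE."""
    margin = z * standard_error
    return estimate - margin, estimate + margin


# Rate for females quoted above, with the parenthesized value treated as the
# standard error (an assumption; the source does not label it).
low, high = confidence_interval(49.93, 4.44)
print(round(low, 2), round(high, 2))  # 41.23 58.63
```

Under that reading, the female rate of injury episodes from falling would plausibly lie between about 41 and 59 per 1,000 population.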

Other (nonsampling) Errors: The response rate was 93.8 percent in 1996; in 2006, the total household response rate was 87.3 percent, with the final family response rate of 87.0 percent, and the final sample adult response rate of 70.8 percent. (Note: the NHIS questionnaire was redesigned in 1997, and a new sample design was instituted in 2006.)

Sources of Additional Material: National Center for Health Statistics, Summary Health Statistics for the U.S. Population: National Health Interview Survey, 2006, Vital and Health Statistics, Series 10 #236; National Center for Health Statistics, Summary Health Statistics for U.S. Children: National Health Interview Survey, 2006, Vital and Health Statistics, Series 10 #234; National Center for Health Statistics, Summary Health Statistics for U.S. Adults: National Health Interview Survey, 2006, Vital and Health Statistics, Series 10 #235; U.S. National Center for Health Statistics, Design and Estimation for the National Health Interview Survey, 1995-2004, Vital and Health Statistics, Series 2 #130.

National Survey of Family Growth (NSFG)

Universe, Frequency, and Types of Data: Periodic survey of men and women 15−44 years of age in the household population of the United States. Interviews were conducted in 2002 in person by trained female interviewers. Interview topics covered include births and pregnancies, marriage, divorce, and cohabitation, sexual activity, contraceptive use, and medical care. For men, data on father involvement with children were collected. The most sensitive data (on sexual behavior related to HIV and Sexually Transmitted Disease risk) were collected in a self-administered form in which the data are entered into a computer.

Type of Data Collection Operation: In the 2002 (Cycle 6) NSFG, the sample was a multistage area probability sample of men and women 15−44 years of age in the household population of the United States. Only one person 15−44 was selected from households with one or more persons 15−44. Data were collected and entered into laptop (notebook) computers. In the self-administered portion, the respondent entered his or her own answers into the computer. Sample included 12,571 interviews. The response rate was 79 percent. Hispanic and Black persons, as well as those 15−19 years of age, were sampled at higher rates than White adults. All percentages and other statistics shown for the NSFG are weighted to make national estimates. The weights adjust for the different rates of sampling for each group, and for nonresponse.
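A weight that adjusts for both unequal sampling rates and nonresponse is often built as the inverse of the selection probability multiplied by the inverse of the response rate in the respondent's adjustment class. The Python sketch below shows that construction only in outline. The function name and the selection probabilities are hypothetical, and the actual NSFG weighting is more elaborate than this.

```python
def final_weight(selection_probability: float, class_response_rate: float) -> float:
    """Base weight (1 / selection probability) inflated by 1 / response rate."""
    if not 0.0 < selection_probability <= 1.0:
        raise ValueError("selection probability must be in (0, 1]")
    if not 0.0 < class_response_rate <= 1.0:
        raise ValueError("response rate must be in (0, 1]")
    return 1.0 / (selection_probability * class_response_rate)


# Hypothetical selection probabilities: one group is sampled at twice the rate
# of the other. The 0.79 response rate is the overall figure quoted above,
# applied to both groups purely for illustration.
oversampled = final_weight(0.0004, 0.79)
other = final_weight(0.0002, 0.79)
print(round(other / oversampled, 6))  # 2.0
```

A respondent from the group sampled at the higher rate carries half the weight of a respondent from the other group, which keeps the oversampled group from dominating national estimates.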

Data Collection and Imputation Procedures: When interviews are received, they are reviewed for consistency and quality, and analysis variables (recodes) are created. Missing data on these recodes were imputed using multiple regression techniques and checked again for consistency. Variables indicating whether a value has been imputed (“imputation flags”) are included on the data file.

Estimates of Sampling Error: Sampling error codes are included on the data file so that users can estimate sampling errors for their own analyses. Sampling error estimates for nine illustrative analyses are shown on the NSFG Web site at <http://www.cdc.gov/nchs/nsfg.htm>. Sampling error estimates are also shown in most NCHS reports.

Other (nonsampling) Errors: In any survey, errors can occur because the respondent (the person being interviewed) does not recall the specific fact or event being asked about. The NSFG questionnaire in 2002 was programmed to check the consistency of many variables during the interview, so that the interviewer and respondent had a chance to correct any inconsistent information. Further checking occurred after the interview and during recoding and imputation. Typically, less than 1 percent of cases need imputation because of missing data.

Sources of Additional Material: The following references can be found at <http://www.cdc.gov/nchs/nsfg.htm>. "National Survey of Family Growth, Cycle 6: Sample Design, Weighting, and Variance Estimation." Vital and Health Statistics, Series 2, No. 142, July 2006. "Plan and Operation of Cycle 6 of the National Survey of Family Growth." Vital and Health Statistics, Series 1, No. 42, August 2005. "Sexual Behavior and Selected Health Measures: Men and Women 15–44 Years of Age, United States, 2002." Advance Data from Vital and Health Statistics, No. 362, Sept. 15, 2005.

National Vital Statistics System

Universe, Frequency, and Types of Data: Annual data on births and deaths in the United States.

Type of Data Collection Operation: Mortality data based on complete file of death records, except 1972, based on 50 percent sample. Natality statistics 1951–1971, based on 50 percent sample of birth certificates, except a 20 percent to 50 percent sample in 1967, received by NCHS.

Data Collection and Imputation Procedures: Reports based on records from registration offices of all states, District of Columbia, New York City, Puerto Rico, Virgin Islands, Guam, American Samoa, and Northern Marianas.

Estimates of Sampling Error: For recent years, there is no sampling for these files; the files are based on 100 percent of events registered.

Other (nonsampling) Errors: It is believed that more than 99 percent of the births and deaths occurring in this country are registered.

Sources of Additional Material: U.S. National Center for Health Statistics, Vital Statistics of the United States, Vol. I and Vol. II, annual, and the National Vital Statistics Reports. See the NCHS Web site at <http://www.cdc.gov/nchs/nvss.htm>.

National Highway Traffic Safety Administration (NHTSA)

Fatality Analysis Reporting System (FARS)

Universe, Frequency, and Types of Data: FARS is a census of all fatal motor vehicle traffic crashes that occur throughout the United States including the District of Columbia and Puerto Rico on roadways customarily open to the public. The crash must be reported to the state/jurisdiction and at least one directly related fatality must occur within thirty days of the crash.
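The inclusion criteria in this paragraph can be written as a simple predicate. The sketch below is illustrative only: the `Crash` record and its field names are hypothetical and are not FARS data elements, and "within thirty days" is interpreted here as a difference of at most 30 calendar days, which is an assumption.

```python
# Illustrative sketch: the record layout and field names are hypothetical.
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Crash:
    crash_date: date
    on_public_roadway: bool          # roadway customarily open to the public
    reported_to_jurisdiction: bool   # reported to the state/jurisdiction
    related_death_dates: List[date]  # dates of directly related fatalities


def meets_criteria(crash, window_days=30):
    """True if the crash satisfies all three criteria stated in the text."""
    if not (crash.on_public_roadway and crash.reported_to_jurisdiction):
        return False
    return any(
        0 <= (death - crash.crash_date).days <= window_days
        for death in crash.related_death_dates
    )


included = meets_criteria(
    Crash(date(2007, 3, 1), True, True, [date(2007, 3, 20)])
)
too_late = meets_criteria(
    Crash(date(2007, 3, 1), True, True, [date(2007, 5, 1)])
)
```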

Type of Data Collection Operation: One or more analysts, in each state, extract data from the official documents and enter the data into a standardized electronic database.

Data Collection and Imputation Procedures: Detailed data describing the characteristics of the fatal crash, the vehicles and persons involved are obtained from police crash reports, driver and vehicle registration records, autopsy reports, highway department, etc. Computerized edit checks monitor the accuracy and completeness of the data. The FARS incorporates a sophisticated mathematical multiple imputation procedure to develop a probability distribution of missing blood alcohol concentration (BAC) levels in the database for drivers, pedestrians, and cyclists.
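To show how multiply imputed values can feed an estimate, the sketch below treats each person with a missing BAC as carrying several imputed draws and estimates the share of persons at or above a threshold. It does not reproduce the FARS imputation model itself; the number of draws, the threshold, and all data values are hypothetical.

```python
# Sketch only: draws, threshold, and data are invented; this is not the FARS model.

def probability_at_or_above(case, threshold):
    """Known BAC gives 0 or 1; a missing BAC gives the fraction of draws at or above."""
    if case["bac"] is not None:
        return 1.0 if case["bac"] >= threshold else 0.0
    draws = case["imputed"]
    return sum(1 for d in draws if d >= threshold) / len(draws)


def estimated_share(cases, threshold):
    """Average the per-person probabilities over all persons."""
    return sum(probability_at_or_above(c, threshold) for c in cases) / len(cases)


cases = [
    {"bac": 0.00, "imputed": []},
    {"bac": 0.12, "imputed": []},
    {"bac": None, "imputed": [0.00, 0.00, 0.09, 0.15]},
    {"bac": None, "imputed": [0.00, 0.00, 0.00, 0.02]},
]
share = estimated_share(cases, threshold=0.08)
```

Using several draws per missing value, rather than a single filled-in number, lets the estimate reflect the uncertainty about what the unmeasured BAC actually was.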

Estimates of Sampling Error: Since this is census data, there are no sampling errors.

Other (nonsampling) Errors: FARS represents a census of all police-reported crashes and captures all data reported at the state level. FARS data undergo a rigorous quality control process to prevent inaccurate reporting. However, these data are highly dependent on the accuracy of the police accident reports. Errors or omissions within police accident reports may not be detected.

Sources of Additional Material: The FARS Coding and Validation Manual, ANSI D16.1 Manual on Classification of Motor Vehicle Traffic Accidents (Sixth Edition).
