
DOCUMENT RESUME

ED 416 231 TM 028 111

AUTHOR          Kasprzyk, Dan, Ed.
TITLE           Selected Papers on Education Surveys: Papers Presented at the
                1996 Meeting of the American Statistical Association. Working
                Paper Series.
INSTITUTION     National Center for Education Statistics (ED), Washington, DC.
REPORT NO       NCES-WP-97-01
PUB DATE        1997-02-00
NOTE            80p.
AVAILABLE FROM  U.S. Department of Education, Office of Educational Research
                and Improvement, National Center for Education Statistics,
                555 New Jersey Avenue, N.W., Room 400, Washington, DC
                20208-5654.
PUB TYPE        Numerical/Quantitative Data (110); Reports - Evaluative (142)
EDRS PRICE      MF01/PC04 Plus Postage.
DESCRIPTORS     Data Collection; *Elementary Secondary Education; *Equal
                Education; Estimation (Mathematics); Interviews; Models;
                Private Schools; Public Schools; Qualitative Research;
                Questionnaires; *Research Methodology; Sampling; *Statistical
                Bias; *Surveys; Tables (Data); Test Construction; Validity
IDENTIFIERS     *American Statistical Association; *National Center for
                Education Statistics; Schools and Staffing Survey (NCES)

ABSTRACT
The 11 papers in this volume were presented at the 1996 American Statistical Association (ASA) meeting in Chicago (Illinois), August 4 through 8. This is the fourth collection of ASA papers of particular interest to users of National Center for Education Statistics (NCES) survey data published in the "Working Papers" series. The following are included: (1) "Teacher Quality and Educational Inequality" (Richard M. Ingersoll); (2) "Using Qualitative Methods To Validate Quantitative Survey Instruments" (John E. Mullens and Daniel Kasprzyk); (3) "Revisiting the NCES Private School Survey: A Method To Design a Systematic Classification of Private Schools in the United States" (Sylvia Kay Fisher and Daniel Kasprzyk); (4) "An Analysis of Response Rates of SASS (Schools and Staffing Survey) 1993-94" (Sameena M. Salvucci, Fan Zhang, Mingxiu Hu, and David Monaco); (5) "An Overview of NCES Surveys Reinterview Programs" (Valerie Conley, Steven Fink, and Mehrdad Saba); (6) "Estimating Response Bias in an Adult Education Survey" (J. Michael Brick and David Morganstein); (7) "Optimal Periodicity of a Survey: Extensions of Probable-Error Models" (Wray Smith, Dhiren Ghosh, and Michael Chang); (8) "Estimating the Variance in the Presence of Imputation Using a Residual" (Steven Kaufman); (9) "Where Will It All End? Some Alternative SASS Estimation Research Opportunities" (Steven Kaufman and Fritz Scheuren); (10) "Estimating State Totals from the Private School Universe Survey" (Easley Hoy, Beverley Causey, Leroy Bailey, and Steven Kaufman); and (11) "Effect of High School Programs on Out-Migration of Rural Graduates" (Gary Huang, Michael P. Cohen, Stanley Weng, and Fan Zhang). Each chapter contains references. (Contains 3 figures and 22 tables.) (SLD)

NATIONAL CENTER FOR EDUCATION STATISTICS

Working Paper Series

Selected Papers on Education Surveys:

Papers Presented at the 1996 Meeting of the American Statistical Association

Working Paper No. 97-01 February 1997

U.S. Department of Education
Office of Educational Research and Improvement


Selected Papers on Education Surveys:

11 Papers Presented at the 1996 Meeting of the American Statistical Association

Working Paper No. 97-01 February 1997

Contact: Dan Kasprzyk
Surveys and Cooperative Systems Group
(202) 219-1588

e-mail: daniel_kasprzyk@ed.gov

U.S. Department of Education
Richard W. Riley
Secretary

Office of Educational Research and Improvement
Marshall S. Smith
Acting Assistant Secretary

National Center for Education Statistics
Pascal D. Forgione, Jr.
Commissioner

Surveys and Cooperative Systems Group
Paul D. Planchon
Associate Commissioner

The National Center for Education Statistics (NCES) is the primary federal entity for collecting, analyzing, and reporting data related to education in the United States and other nations. It fulfills a congressional mandate to collect, collate, analyze, and report full and complete statistics on the condition of education in the United States; conduct and publish reports and specialized analyses of the meaning and significance of such statistics; assist state and local education agencies in improving their statistical systems; and review and report on education activities in foreign countries.

NCES activities are designed to address high priority education data needs; provide consistent, reliable, complete, and accurate indicators of education status and trends; and report timely, useful, and high quality data to the U.S. Department of Education, the Congress, the states, other education policymakers, practitioners, data users, and the general public.

We strive to make our products available in a variety of formats and in language that is appropriate to a variety of audiences. You, as our customer, are the best judge of our success in communicating information effectively. If you have any comments or suggestions about this or any other NCES product or report, we would like to hear from you. Please direct your comments to:

National Center for Education Statistics
Office of Educational Research and Improvement
U.S. Department of Education
555 New Jersey Avenue, NW
Washington, DC 20208

Suggested Citation

U.S. Department of Education, National Center for Education Statistics. Selected Papers on Education Surveys: Papers Presented at the 1996 Meeting of the American Statistical Association, Working Paper No. 97-01. Project Officer, Dan Kasprzyk. Washington, D.C.: 1997.

February 1997


Foreword

Each year a large number of written documents are generated by NCES staff and individuals commissioned by NCES which provide preliminary analyses of survey results and address technical, methodological, and evaluation issues. Even though they are not formally published, these documents reflect a tremendous amount of unique expertise, knowledge, and experience.

The Working Paper Series was created in order to preserve the information contained in these documents and to promote the sharing of valuable work experience and knowledge. However, these documents were prepared under different formats and did not undergo rigorous NCES publication review and editing prior to their inclusion in the series. Consequently, we encourage users of the series to consult the individual authors for citations.

To receive information about submitting manuscripts or obtaining copies of the series, please contact Ruth R. Harris at (202) 219-1831 or U.S. Department of Education, Office of Educational Research and Improvement, National Center for Education Statistics, 555 New Jersey Ave., N.W., Room 400, Washington, D.C. 20208-5654.

Susan Ahmed
Chief Mathematical Statistician
Statistical Standards and Services Group

Samuel S. Peng
Director
Methodology, Training, and Customer Service Program


Table of Contents

Foreword

Preface

Developing Questionnaires for Education Surveys

Chair: Theresa J. DeMaio
Bureau of the Census

"Teacher Quality and Educational Inequality" 1

Richard M. Ingersoll, University of Georgia

"Using Qualitative Methods to Validate Quantitative Survey Instruments" 7John E. Mullens, Policy Studies AssociatesDaniel Kasprzyk, National Center for Education Statistics

"Revising the NCES Private School Survey: A Method to Design a Systematic Classification ofPrivate Schools in the United States" 13

Sylvia Kay Fisher, Bureau of Labor StatisticsDaniel Kasprzyk, National Center for Education Statistics

Data Quality in Education Surveys

Chair: Paul D. Planchon
National Center for Education Statistics

"An Analysis of Response Rates of SASS 1993-94"
Sameena M. Salvucci, Fan Zhang, Mingxiu Hu, and David Monaco, Synectics for Management Decisions, Inc.
Kerry Gruber, National Center for Education Statistics

"An Overview of NCES Surveys Reinterview Programs"
Valerie Conley, Steven Fink, and Mehrdad Saba, Synectics for Management Decisions, Inc.
Steven Kaufman, National Center for Education Statistics

"Estimating Response Bias in an Adult Education Survey"
J. Michael Brick and David Morganstein, Westat, Inc.

Design and Estimation in School-Based Surveys

Chair: Marilyn M. McMillen
National Center for Education Statistics

"Optimal Periodicity of a Survey: Extensions of Probable-Error Models"
Wray Smith, Dhiren Ghosh, and Michael Chang, Synectics for Management Decisions, Inc.

"Estimating the Variance in the Presence of Imputation Using a Residual"
Steven Kaufman, National Center for Education Statistics

"Where Will It All End? Some Alternative SASS Estimation Research Opportunities"
Steven Kaufman, National Center for Education Statistics
Fritz Scheuren, The George Washington University

"Estimating State Totals from the Private School Universe Survey"
Easley Hoy, Beverley Causey, and Leroy Bailey, Bureau of the Census
Steven Kaufman, National Center for Education Statistics

Policy Analysis with Education and Defense Manpower Survey Data

Chair: John L. Czajka
Mathematica Policy Research, Inc.

"Effect of High School Programs on Out-Migration of Rural Graduates"
Gary Huang, Synectics for Management Decisions, Inc.
Michael P. Cohen, National Center for Education Statistics
Stanley Weng and Fan Zhang, Synectics for Management Decisions, Inc.


Preface

The 11 papers contained in this volume were presented at the 1996 American Statistical Association (ASA) meeting in Chicago, Illinois (August 4-8). This is the fourth collection of ASA papers of particular interest to users of NCES survey data published in the Working Paper Series. The earlier collections were Working Paper 94-01, which included papers presented at ASA meetings in August 1992 and August 1993 and the ASA Conference on Establishment Surveys in June 1993; Working Paper 95-01, which included papers from the 1994 ASA meeting; and Working Paper 96-02, which included papers from the 1995 ASA meeting.

The papers presented at the ASA meetings usually summarize ongoing research. In some cases, NCES publishes more complete reports on the topics. Readers who wish to learn more about the topics discussed in this working paper may wish to examine these technical reports:

NCES Reports


An Analysis of Response Rates in the 1993-94 Schools and Staffing Survey (forthcoming)
An Exploratory Analysis of Response Rates in the 1990-91 Schools and Staffing Survey (NCES 96-089)
Design Effects and Generalized Variance Functions for the 1990-91 Schools and Staffing Surveys (SASS) Volume I--User's Manual (NCES 95-342-I)
Design Effects and Generalized Variance Functions for the 1990-91 Schools and Staffing Surveys (SASS) Volume II--Technical Report (NCES 95-342-II)
Quality Profile for SASS: Aspects of the Quality of Data in the Schools and Staffing Surveys (NCES 94-340)
Adjusting for Coverage Bias Using Telephone Service Interruption Data (NCES 97-336)
Use of Cognitive Laboratories and Recorded Interviews in the National Household Education Survey (NCES 96-332)
NCES Measurement Error Programs (NCES 97-464; forthcoming)

Michael Garet's ASA paper, "Developing a National Survey of Private School Expenditures," could not be included in this collection, but for those interested in this topic, Strategies for Collecting Finance Data from Private Schools (Working Paper 96-16) is available. A complete list of Working Papers to date can be found at the end of this publication.

TEACHER QUALITY AND EDUCATIONAL INEQUALITY

Richard M. Ingersoll, University of Georgia
Department of Sociology, UGA, Athens, GA 30602

KEY WORDS: NCES data; teacher quality; educational inequality

Introduction

Do all secondary students in the U.S. have equal access to qualified teachers? Are students in low-income schools more likely to be taught by teachers without basic qualifications in their assigned teaching fields than those in more affluent schools? Do schools serving predominantly minority student populations tend to have less qualified faculties? Moreover, are there differences in access to qualified teachers across different types of students and different types of classrooms within schools?

Over the past several decades, equality has been one of the most fundamental concerns of education policy and research in the U.S. The focus of a vast amount of research and reform in education has been to uncover and address disparities in the resources and opportunities in education provided to students from different socio-economic backgrounds (e.g., Coleman et al. 1966). Among the most important of these educational resources is the teaching force. The largest single component of the cost of education in any country is teacher compensation. Moreover, teachers are, of course, a highly important part of the actual educational process, and student educational outcomes ultimately depend on the work of teachers. Indeed, it is precisely because the teaching force is a significant resource that equal access to qualified teachers and quality teaching has been a source of contention in the national debate over equality of educational opportunity.

Among those concerned with issues of educational equality, it is widely believed that students from disadvantaged backgrounds do not have equal access to qualified teachers. A number of critics have argued that, indeed, the most needy students in the U.S. -- those from poor, minority, and disadvantaged communities -- are taught by the least qualified teachers (e.g., Darling-Hammond 1987; Kozol 1991; Oakes 1990). These critics argue that low-income and high-minority schools are unable to offer competitive salaries, benefits, or resources and, hence, simply cannot compete for the available supply of trained teachers. In this view, unequal access to qualified teachers and, hence, to quality teaching, is one of the key reasons for unequal results in student educational outcomes.

These critics argue, moreover, that patterns of unequal access to quality teachers also appear within schools. Not only do students in low-income and predominantly minority schools have less access to qualified staff, but, the critics add, low-income and minority students, when in affluent schools, also have less access to the best teachers. The latter is due to the practice of separating students and teachers by purported ability -- the system of tracking. In this view, minority and poor students are disproportionately placed in lower track and lower achievement courses, which, these critics further claim, are taught by the least qualified teachers.

Despite the importance of this debate on educational equality and the widespread belief that schools, programs, and classes serving low-income and minority student populations have less access to quality teaching, there has actually not been much empirical research done on this issue, especially at the national level. One of the reasons for this dearth of research is the difficulty involved in obtaining data on the underlying issue of importance - the degree of actual exposure to quality teachers and quality teaching provided to students in classrooms. Assessing the caliber of teachers' classroom performance and the degree to which students have access to quality teaching in classrooms is a difficult empirical task because there is little consensus concerning both how to define and how to best measure quality teachers and teaching (e.g., Haney et al. 1987; Ingersoll 1996a). As a result, researchers typically turn to what is more easily assessed and more readily available - measures of teacher qualifications.

Although the qualifications of teachers - such as their education, training, and preparation - are only indirect measures of the quality of teaching that students receive, they provide useful information on this important educational resource.

Education and training are essential ingredients of quality teachers and quality teaching. There is almost universal agreement that one of the most important characteristics of a quality teacher is preparation in the subject or field in which the teacher is teaching. Research has shown moderate but consistent support for the reasonable proposition that subject knowledge (knowing what to teach) and teaching skills (knowing how to teach) are important predictors of both teaching quality and student learning (for reviews of this research see: Shavelson et al. 1989; Darling-Hammond and Hudson 1990; Murnane and Raizen 1988). Knowledge of subject matter and of pedagogical methods do not, of course, guarantee quality teachers nor quality teaching, but they are necessary prerequisites.

The argument for the necessity of education in subject knowledge is especially clear for the secondary school level. First, at the secondary level, teachers are divided by fields into departments; faculties are thus more specialized than in elementary schools, and therefore the differences between fields are more distinct and, perhaps, greater. Moreover, the level of mastery needed to teach different subjects is higher at the secondary school level, and therefore a clear case can be made that such teachers ought to have adequate substantive background in the subjects they teach.

In order to address fully the issue of access to qualified teachers, however, it is necessary to distinguish between teacher training and teaching assignment. These represent two distinct elements. Teacher training refers to the quantity and quality of teacher education and preparation. Assessments of training levels typically examine whether teachers have a basic college education, licensing, and expertise in a specialty field. On the other hand, assessments of teacher qualifications also need to examine whether the fields of training and preparation of teachers match their teaching assignments. That is, such assessments need to assess the extent of out-of-field teaching - the phenomenon of trained teachers teaching subjects for which they have little training. It is important to distinguish between these two elements in assessments of teachers' qualifications because they have very different implications for policy. If underqualified teaching is due to inadequacies in the quantity or quality of teacher education and preparation, it is probable that the source of the problem lies with teacher education programs and standards. On the other hand, if underqualified teaching is due to high levels of mismatch between teachers' fields of training and their teaching assignments, then it is probable that the source of the problem lies with the supply of teachers or the management of schools.

The problem for research on teacher qualifications has been that there have not been the necessary data, especially at the national level, to adequately assess the extent to which teachers are assigned to teach out of their fields. Moreover, there has been little data on the numbers of students actually taught by out-of-field teachers -- information crucial to understanding disparities in student access to qualified teaching.

In order to address these and other data needs concerned with the staffing, occupational, and organizational aspects of schools, in the late 1980s the National Center for Education Statistics (NCES) designed and conducted the Schools and Staffing Survey (SASS), a major new survey of teachers and schools. NCES has since sponsored several projects designed to define and assess both the qualifications of the nation's teaching force and the extent of out-of-field teaching in the U.S. (McMillen and Bobbitt 1993; Bobbitt and McMillen 1995; Ingersoll 1995a, 1996b). These previous analyses have shown that, in fact, out-of-field teaching is extensive in U.S. schools. Moreover, these analyses have documented that this underqualified teaching was not due to a lack of basic education or training on the part of teachers. The source of out-of-field teaching lay in the lack of fit between teachers' fields of training and their teaching assignments. Most teachers have training, such as a college major, in their main field of assignment. But many teachers, especially at the secondary level, are also assigned to teach additional courses in fields for which they have little or no formal background preparation.

This article expands on this earlier work by analyzing national data from the 1990-91 SASS to examine the issue of disparities in student access to qualified teachers. Rather than enter the debate as to what constitutes a qualified teacher, quality teaching, or quality teacher training, this analysis adopts a minimal definition of adequate qualifications. The premise underlying this analysis is that adequately qualified staffing requires teachers, especially at the secondary school level and especially in the core academic fields, to hold, as a minimum prerequisite, at least a college minor in the fields taught. The analysis focuses on how many secondary level students enrolled in the core academic subjects (mathematics, English, social studies, science) are taught by teachers without at least a college minor in the field. In this view, even a moderate number of teachers lacking such minimal training prerequisites is a strong indication of inadequacies in the staffing of schools.

The analysis examines whether access to qualified teachers is equally distributed across different student populations. It begins by focusing on differences between high-poverty and low-poverty schools, and also between high-minority and low-minority schools. Many researchers assume that the high-poverty and minority populations are one and the same. It is important, however, to examine the data on out-of-field teaching by these two characteristics separately, because previous research has suggested that differences in the levels of teacher qualifications are not always the same across them (Pascal 1987).

The analysis also examines within-school differences in teacher qualifications across classes of different student ability groupings and of different student races and ethnicities. Again, it is also important to examine out-of-field teaching separately by these sets of characteristics because it cannot be assumed that their relationships to teacher qualifications are the same.

Finally, this analysis examines within-school differences in teacher qualifications across different secondary school grade levels - specifically, grades 7 through 12. Although many may agree that basic education is an essential prerequisite of qualified teachers, there is probably less agreement whether out-of-field teaching has as serious consequences at the junior high level as it has for the senior high grades. Hence, it is important to distinguish among grades at the secondary level and to determine whether there are, in fact, differences in out-of-field teaching levels across these different grade levels.

Data and Methods

The data source for this study is the nationally representative 1990-91 Schools and Staffing Survey. The U.S. Census Bureau collected these data for NCES in early 1991 from a random sample stratified by state, sector, and school level. SASS is particularly useful for addressing questions concerned with teachers' qualifications. It is the largest and most comprehensive dataset available on teachers and school staffing characteristics in the U.S. Indeed, as indicated earlier, this survey was conducted for the reason that there has been a paucity of nationally representative data on such issues. SASS, for example, includes a wide range of information on the training, education, qualifications, and teaching assignments of teachers that can be disaggregated by field and also disaggregated by the characteristics of schools, students, and classrooms (for more information on SASS, see Choy et al. 1993).

The sample utilized in the analysis consists of 25,427 public school teachers, including those employed both full-time and part-time. This analysis focuses solely on those teaching at the secondary-school level (grades 7 through 12), regardless of whether the school was actually a middle school, junior high school, senior high school, secondary school, or combined school. Furthermore, it solely focuses on those who taught departmentalized courses in any of the core academic fields (English, mathematics, science, social studies). For example, secondary level teachers teaching multiple subjects in self-contained classes were excluded from the analysis. Likewise, the non-7-12th grade portions of the schedules of teachers in combined schools or middle schools were excluded.

For each class period in the school day of each of the sampled teachers, data were collected on the subject taught, grade level, class type or track, student achievement level, student race/ethnicity, and the number of students enrolled. In addition, teachers reported their certification status and the major and minor fields of study for each of their degrees earned, at both the undergraduate and graduate levels. We have used these data in a series of projects to develop and compare a range of different measures of out-of-field teaching (see McMillen and Bobbitt 1993; Bobbitt and McMillen 1995; Ingersoll 1995a, 1996b). This analysis focuses on one measure drawn from this earlier research: the percentage of public secondary school students enrolled in classes taught by teachers without at least a minor in the field.

Fields are defined broadly in this analysis. The core academic subjects and college majors/minors are broadly categorized into four fields parallel to conventional core academic departments in secondary schools: mathematics, science, social studies, and English. To be defined as in-field, English teachers must hold at least a minor in either English, English education, language arts, literature, reading, communication, or journalism. Mathematics teachers must hold at least a minor in either mathematics, engineering, or mathematics education. Science teachers must hold a minor in any of the sciences. Social studies teachers (i.e., teachers of history, economics, civics, or world civilization) must hold at least a minor in one of the social sciences, in history, or in social studies education.
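
To make the classification and the measure concrete, the short Python sketch below shows one way the logic could be coded. It is not the authors' program: the record layout, function names, and the particular science and social science minors listed are illustrative assumptions, and the sampling weights used in actual SASS estimates are omitted for simplicity.

    # A minimal sketch, not the authors' code: classify each class as in-field or
    # out-of-field under the broad field definitions above, then compute the
    # student-weighted percentage taught out of field.
    IN_FIELD_MINORS = {
        "English": {"english", "english education", "language arts", "literature",
                    "reading", "communication", "journalism"},
        "Mathematics": {"mathematics", "engineering", "mathematics education"},
        # "Any of the sciences" is approximated here with a few example minors.
        "Science": {"biology", "chemistry", "physics", "earth science",
                    "general science", "science education"},
        "Social studies": {"history", "economics", "political science", "sociology",
                           "psychology", "geography", "social studies education"},
    }

    def taught_in_field(class_field, teacher_minors):
        """True if the teacher holds at least a minor that qualifies for the field."""
        qualifying = IN_FIELD_MINORS.get(class_field, set())
        return any(m.lower() in qualifying for m in teacher_minors)

    def pct_students_out_of_field(class_records, field):
        """Student-weighted percent of enrollment in `field` taught out of field.

        Each record is assumed to look like:
        {"field": "Mathematics", "enrollment": 27, "teacher_minors": {"mathematics"}}
        Survey weights, which a real SASS estimate would apply, are omitted.
        """
        in_scope = [r for r in class_records if r["field"] == field]
        total = sum(r["enrollment"] for r in in_scope)
        out = sum(r["enrollment"] for r in in_scope
                  if not taught_in_field(r["field"], r["teacher_minors"]))
        return 100.0 * out / total if total else 0.0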

The objective of this analysis is to examine differences in the levels of out-of-field teaching among different types of schools, based on the poverty level and race/ethnicity of the students enrolled, and across different kinds of classrooms within schools. These measures are:

Poverty enrollment of school - percentage of students in each school receiving federal free or reduced-price lunch.
  Low-poverty: less than 15%
  Medium-poverty: 15% to 50%
  High-poverty: 50% or more

Minority enrollment of classroom or school - percentage of non-white students.
  Low-minority: less than 15%
  Medium-minority: 15% to 50%
  High-minority: 50% or more

Type or track of class
  Low-track: general, remedial, vocational, special education
  Medium-track: academic/college preparatory
  High-track: honors, advanced placement, gifted

Grade level of class - grades 7 through 12
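
The three-way groupings above follow simple percentage cutoffs. A small sketch of how those cutoffs could be applied follows; the function name and labels are illustrative, and assigning exactly 50 percent to the high category is one reasonable reading of "50% or more."

    # A minimal sketch of the low/medium/high groupings defined above
    # (under 15 percent, 15 to 50 percent, 50 percent or more).
    def enrollment_category(percent, label):
        """Map a percent-enrollment figure to a low/medium/high category."""
        if percent < 15:
            return "Low-" + label
        elif percent < 50:
            return "Medium-" + label
        return "High-" + label

    # Example: a school where 62 percent of students receive free or reduced-price
    # lunch and 18 percent of students are non-white.
    print(enrollment_category(62, "poverty"))   # -> High-poverty
    print(enrollment_category(18, "minority"))  # -> Medium-minority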

Results

What proportion of the nation's public secondary students are taught core academic subjects by out-of-field teachers?

Overall, substantial proportions of students in public secondary schools in the U.S. were taught academic subjects by teachers without basic qualifications in those subjects. The proportions of public secondary school students taught each of the core academic fields by teachers without at least a minor in the field are presented in table 1.

For example, about one fifth of all public school students enrolled in English classes in grades 7-12, or about 4,310,000 of 20,700,000 students, were taught by teachers who did not have at least a minor in English, literature, communications, speech, journalism, English education, or reading education. In addition, over one quarter of all public school students enrolled in mathematics classes in grades 7-12, or about 4,124,000 of 15,510,000 students, were taught by teachers without at least a minor in mathematics or in mathematics education. In science, 17 percent of all public school students enrolled in science classes in grades 7-12 were taught by teachers without at least a minor in any of the sciences or in science education. Overall, a relatively low proportion of students were taught social studies out of field; thirteen percent of students enrolled in social studies were taught by teachers without at least a minor in any of the social sciences, in public affairs, in social studies education, or in history.

Table 1 - Percentage of public secondary school students enrolled in classes taught by teachers without at least a minor in the field, by field and selected school characteristics

                                  English   Math   Science   Social Studies

Total                               20.8    26.6     16.5        13.4

Minority Enrollment of School:
  Low-minority                      20.0    24.3     13.9        11.6
  Medium-minority                   19.1    23.1     16.6        15.6
  High-minority                     24.4    33.6     17.8        14.4

Poverty Enrollment of School:
  Low-poverty                       15.6    20.6     11.9        11.6
  Medium-poverty                    21.7    30.1     16.0        14.5
  High-poverty                      33.2    32.6     29.3        15.3

Are students in schools serving predominantly poverty-level or minority student populations more likely to be taught by out-of-field teachers than students in schools serving predominantly non-poor or white students?

There were also differences in the amount of out-of-field teaching across different types of schools, but this depended on the type of schools compared and the fields examined. Notably, although in some fields there appear to have been slight differences in levels of out-of-field teaching between high- and low-minority schools, in no fields were these differences statistically significant.

In contrast, school poverty levels were clearly related to the amount of out-of-field teaching, and the differences were in the direction predicted by the literature on educational inequality. That is, in no fields did high-poverty schools have less out-of-field teaching than did low-poverty schools, while in several fields, students in high-poverty schools received distinctly more out-of-field teaching than in low-poverty schools. For example, a third of English students in high-poverty schools, as opposed to 16 percent in low-poverty schools, were taught by teachers who did not have at least a minor in English, English education, language arts, literature, reading, communication, or journalism. There was, however, little difference in out-of-field teaching in social studies between schools of different poverty levels. Regardless of the school poverty level, all had relatively low levels of out-of-field teaching in social studies.

Are students in low-track or lower-grade level classes, or classes predominantly comprised of minority students, more likely to be taught by out-of-field teachers than students in high-track, higher-grade level, or predominantly white classes?

The amount of out-of-field teaching was not equally distributed across different types of classes and groups in schools. These data are displayed in table 2.

Table 2 - Percentage of public secondary school students enrolled in classes taught by teachers without at least a minor in the field, by field and selected classroom characteristics

                                  English   Math   Science   Social Studies

Total                               20.8    26.6     16.5        13.4

Type or Track of Class:
  Low-track                         24.7    33.5     20.4        14.3
  Medium-track                      11.8    15.7      9.2         8.9
  High-track                        11.2    20.4      7.2        11.2

Minority Enrollment of Class:
  Low-minority                      19.2    22.7     14.6        12.3
  Medium-minority                   19.9    24.2     17.7        15.0
  High-minority                     25.2    36.1     19.6        14.3

Grade Level of Class:
  7th grade                         32.2    48.8     31.8        23.9
  8th grade                         32.9    37.1     23.8        19.7
  9th grade                         15.7    18.1     10.7         8.7
  10th grade                        11.1    16.8      8.9         8.8
  11th grade                        11.2    15.9      6.4         6.8
  12th grade                        13.9    24.2     13.1        11.3

In several fields, students in high-track classes had less out-of-field teaching than did those in the low-track classes. For instance, about one tenth of students in high-track English classes were taught by out-of-field teachers. But about one quarter of those in low-track English classes received out-of-field teaching. There was, however, little difference in levels of out-of-field teaching between the two higher tracks - the honors/gifted/AP track and the college preparatory track.

In contrast to tracks, there was little difference in out-of-field teaching between predominantly white and predominantly minority classes. In none of the fields was there a statistically significant difference in out-of-field teaching between high-minority classes and low-minority classes.

There were, however, some distinct differences between the junior high school grade levels and the senior high school grade levels. Students in grade 7 were more likely to have received out-of-field teaching than were 12th grade students in all fields, with the exception of math. For example, about one third of science students in 7th grade were taught by teachers without at least a minor in any of the sciences or science education, while this was true for only about a tenth of the science students in 12th grade. In some fields, students in grade 8 were also more likely to have received out-of-field teaching than were 12th grade students. There were not, however, distinct differences among the senior high grade levels. Ninth grade students, for example, were not necessarily more likely to have been taught by an out-of-field teacher than were 12th grade students.

Discussion

The data clearly show that many students in public schools in grades 7-12, regardless of the type of school, were taught core academic subjects by teachers without at least a college minor in the field taught. They also show that there were some distinct inequities in the distribution of out-of-field teaching across schools and classrooms. This article does not, however, address the question of the reasons, causes, or sources of out-of-field teaching, nor why some schools or classrooms have more of it than others. Other analyses using SASS data to examine out-of-field teaching offer some insights, and I will review these below.

Many have argued that out-of-field teaching is a problem of poorly trained teachers. As indicated earlier, this view is incorrect. Out-of-field teaching is not due to a lack of education on the part of teachers but is due to a lack of match between teachers' fields of training and their fields of assignment.

Other educational analysts have argued that out-of-field teaching is due to teacher shortages. There is some truth to this view. Some schools do report having difficulties finding qualified candidates for teaching job openings, and school administrators commonly turn to the use of substitute teachers, in-school reassignments, and hiring of the underqualified as coping strategies. Out-of-field teaching is the inevitable result of these kinds of coping strategies. But, contrary to conventional wisdom, neither out-of-field assignments nor teacher shortages are primarily due to increases in either student enrollments or teacher retirements.

The demand for new teachers is driven primarily by teacher turnover, not increases in student enrollments. Moreover, poor working conditions, not teacher retirements, create most turnover. Hence, shortages result most often from poor working conditions. Low teacher salaries, little faculty input into school policies, and rampant student discipline problems all contribute to teacher turnover. Improving these conditions would decrease turnover, which would quickly eliminate shortages. It would also remove much of the need for out-of-field assignments in the first place.

This points to an alternative explanation of out-of-field teaching - the low status of the occupation. Unlike in many of the other developed nations, teachers in the U.S. are largely treated as low- and semi-skilled workers. The data suggest that out-of-field teaching is not an emergency condition, but a normal and ongoing practice in many schools. This prevalence attests to how widely accepted is the idea that teaching does not require any special expertise and that teachers are like interchangeable blocks that can be placed in any empty slot regardless of their type of training. Clearly, if teaching were treated as a highly valued profession and provided with commensurate rewards, respect, and working conditions, there would be no problem attracting and retaining more than enough qualified teachers, and out-of-field teaching would neither be needed nor permitted.

Related to the question of the causes of out-of-field teaching is a second question - why do some schools have more of it than others? In particular, why do low-income schools have higher levels of out-of-field teaching?

As mentioned earlier, one view, widely held among critics of educational inequality, is that low-income schools are not able to attract, or to retain, adequately trained teachers because they are unable to match the salaries, benefits, and resources offered by more affluent schools. As a result, these critics hold, such schools have difficulties hiring adequately trained teacher candidates and suffer from high levels of teacher turnover (e.g., Kozol 1991; Oakes 1990).

There has, however, been little empirical verification of this view, and, moreover, data from SASS suggest that this explanation may not be entirely correct. The data show, for example, that starting-level and advanced-level salaries in high-poverty schools are not appreciably lower than in other schools. In addition, teacher turnover rates are also not appreciably higher in low-income schools (Ingersoll 1995a). Low-income schools do appear to have slightly more difficulty in filling teaching openings. But these differences appear to account for some, but not all, of the high levels of misqualified teachers in such schools. SASS data show, for example, that there are several factors besides the overall poverty or affluence of the student population that are related to the degree of out-of-field teaching in schools. For instance, school size and sector are both strongly related to out-of-field levels; small schools and private schools both have distinctly higher proportions of out-of-field teaching (Ingersoll 1995a, 1996c). These issues warrant further research.

An additional important issue concerns equality in access to qualified teachers according to the race/ethnicity of students. As noted above, it is commonly believed among education analysts that both poor and minority students do not have equal access to qualified teachers (e.g., Kozol 1991; Oakes 1990). In contrast, this analysis finds few distinct differences in levels of out-of-field teaching according to the proportion of minority students in classrooms or in schools. This does not mean, of course, that there are no inequalities in access to quality teaching and quality teachers according to the race/ethnicity of students. There may be other kinds of differences in access that are not revealed by the data and measures used in this analysis. Moreover, this analysis does not separately examine different minority groups, and, hence, there may be differences in access between different minority groups not revealed here. What this analysis simply shows is that minority students, as a whole, were not more likely to have been taught by out-of-field teachers. Moreover, it also corroborates the importance of distinguishing between race/ethnicity and income/poverty characteristics of student populations.

References

Bobbitt, S. & McMillen, M. (1995). Qualifications of the public school teacher workforce: 1988-1991. (NCES Report No. 95-665). Washington, DC: U.S. Department of Education, National Center for Education Statistics.

Choy, S., Henke, R., Alt, M., Medrich, E., & Bobbitt, S. (1993). Schools and staffing in the US: A statistical profile, 1990-91. (NCES Report No. 93-146). Washington, DC: U.S. Department of Education, National Center for Education Statistics.

Coleman, J., Campbell, E., Hobson, C., McPartland, J., Mood, A., Weinfeld, F., & York, R. (1966). Equality of Educational Opportunity. Washington, DC: U.S. Government Printing Office.

Darling-Hammond, L. (1987). "Teacher Quality and Equality." In P. Keating and J.I. Goodlad (Eds.), Access to Knowledge. New York: College Entrance Examination Board.

Darling-Hammond, L. and Hudson, L. (1990). "Pre-college Science and Mathematics Teachers: Supply, Demand and Quality." Review of Research in Education. Washington, DC: American Educational Research Association.

Haggstrom, G. W., Darling-Hammond, L., & Grissmer, D. (1988). Assessing teacher supply and demand. Santa Monica, CA: Rand Corporation.

Haney, W., Madaus, G., & Kreitzer, A. (1987). Charms talismanic: Testing teachers for the improvement of American education. Review of Research in Education, 13: 169-238. Washington, DC: American Educational Research Association.

Ingersoll, R. (1995a). Teacher Supply, Teacher Quality and Teacher Turnover: 1990-91. (NCES Report No. 95-744). Washington, DC: National Center for Education Statistics.

Ingersoll, R. (1995b). "Teacher Supply and Demand in the U.S." In The Proceedings of the American Statistical Association: 1995. Alexandria, VA: American Statistical Association.

Ingersoll, R. (1996a). National Assessments of Teacher Quality. Washington, DC: U.S. Department of Education, National Center for Education Statistics.

Ingersoll, R. (1996b). Out-of-Field Teaching and Educational Equality. (NCES Report No. 96-040). Washington, DC: National Center for Education Statistics.

Ingersoll, R. (1996c). The Problem of Out-of-Field Teaching in the U.S. Paper presented at the annual meeting of the American Sociological Association.

Kozol, J. (1991). Savage Inequalities. New York: Harper-Collins.

McMillen, M. & Bobbitt, S. (1993). Teacher certification, training and work assignments in public schools. Paper presented at the annual meeting of the American Education Research Association.

Murnane, R. and Raizen, S. (Eds.) (1988). Improving indicators of the quality of science and mathematics education in grades K-12. Washington, DC: National Academy Press.

Oakes, J. (1990). Multiplying inequalities: The effects of race, social class, and tracking on opportunities to learn mathematics and science. Santa Monica, CA: The RAND Corporation.

Pascal, A. (1987). The qualifications of teachers in American high schools. Santa Monica, CA: The RAND Corporation.

Shavelson, R., McDonnell, L. and Oakes, J. (1989). Indicators for monitoring mathematics and science education. Santa Monica, CA: Rand Corporation.


USING QUALITATIVE METHODS TO VALIDATE QUANTITATIVE SURVEY INSTRUMENTS

John E. Mullens, Policy Studies Associates
Daniel Kasprzyk, National Center for Education Statistics

John E. Mullens, PSA, 1718 Connecticut Avenue, N.W., #400, Washington, DC 20009

Key Words: validation, case study, focus group

Surveys are among the most cost-effective and least burdensome methods of collecting data on schools, classrooms, and teachers, but both researchers and respondents know that brief, self-report strategies may not portray instruction in sufficient detail to confidently assess instructional effectiveness. For a project at the National Center for Education Statistics (NCES) designed to investigate techniques and instruments to measure and understand the instructional processes used by eighth to tenth grade mathematics teachers, we used a multi-step pilot-study process to construct, refine, and validate survey instruments. In the process, we honed our knowledge about instruments and methods for collecting accurate, valid, and meaningful information that can be incorporated into future national data collection schemes.

In this paper, we outline the project scope, describe the data collection methods used, and assess their role in evaluating survey responses and improving instruments to provide portraits of relevant classroom processes. A complete description and assessment of this project is reported in Mullens and Leighton (1996).

The Classroom Instructional Processes Study

The NCES project, "Understanding ClassroomInstructional Processes," was designed to (1) develop,pilot, and evaluate methods for collecting data onclassroom instructional practices; (2) explore thecombined use of questionnaires and related teacher logforms to portray classroom instructional processes; and(3) determine the feasibility of incorporating suchmethods into future NCES surveys or other datacollection efforts. The project piloted focus groups andcase studies (using classroom log forms, observations,and artifact collection) to assess the completeness andaccuracy of data obtained from questionnaire responses.Through this process, we used qualitative methods tovalidate responses on quantitative survey instruments.

The results were intended to help NCES make decisions about data collection methods and instruments with which to develop an accurate portrait of eighth to tenth grade mathematics instruction. Having such data would expand NCES's ability to respond to Congress, other offices in the Department of Education, other federal agencies, state departments of education, associations concerned with elementary and secondary education, and education research organizations. Data from previous surveys on similar topics have been used by all of these sectors, and in recent years there has been interest in expanding the scope of these data.

Context

Increased use of high stakes student testing as a measure of educational productivity has led to increased interest in determining the precise contribution of schooling to achievement, distinct from, for example, the contributions of prior learning or socioeconomic status. Experts in identifying the correlates of student achievement, such as Porter (1991) and Schmidt (1995), argue that many factors are at work: the content must be presented cogently, using subject-specific instructional techniques appropriate to both the material and the students' prior knowledge, and with emphasis matched to the topic's relative importance among desired outcomes. Valid and reliable assessments of instructional content and practices can contribute to descriptions of educational experiences, help explain achievement outcomes, and inform educational policy development at the local, state, and national levels (Burstein, Oakes, & Guiton, 1992; Smith, 1988; Murnane, 1987). Despite this, minimal data on instructional practices are available from a nationally representative sample of U.S. classrooms.

This study builds on a prior review of existing measurement approaches (Mullens, 1995), and focuses on four major dimensions of classroom instruction: the conditions and context that direct or influence a teacher's selection of content and instructional methods; the course content and the emphasis on those topics; patterns of classroom pedagogy and how teachers approach the process of teaching; and the resources available and used in the classroom.

Research Question

The study goal was to produce and evaluate instruments and methods that would provide data on how the instructional processes and content of eighth to tenth grade mathematics classes vary across the country. Within this overall goal, we also expected to advance our understanding about instruments and methods for capturing accurate and meaningful information about classroom instructional processes: information that could be incorporated into national data collection schemes. Data from the field tests could provide evidence with which we could understand more about the items and instruments themselves. The full study explored three measurement questions, one of which is the focus of this paper:

Do qualitative data collection instruments and techniques provide validating information with which we can better construct questionnaires?

Study Design

We selected mathematics as the focus of this study because it is a core subject of great interest to policymakers and thus one in which early exploratory work already provided a sound research foundation for further study. Within this content area, the study concentrated on eighth to tenth grade mathematics courses, covering topics in pre-algebra, algebra, and geometry: the math courses designed to serve as a "bridge" to more advanced math courses and to offer students a conceptual understanding of mathematics with broad applications in life.

From preliminary research, we decided to validate the teacher questionnaire through focus groups and case studies. Focus groups are roundtable open discussions with small numbers of respondents who had already completed the questionnaire, held to examine each item for unfamiliar or inexact terminology and to learn how well the items and their responses represented the teachers' own practice.

Case studies of classroom teachers included observing their teaching, having them maintain daily logs, and collecting artifacts of their instruction. We observed classroom instruction to evaluate the completeness of the instrument, looking especially for conceptual gaps in our understanding of how instruction occurred and in how it was represented on the instruments. Daily logs, or diaries, were records documenting learning objectives, teachers' actions, students' activities, and the materials used during a single class. Four weeks of data enabled us to evaluate the consistency between a teacher's questionnaire responses and her daily recordings of activities. Examining the instructional materials or artifacts used by teachers during that same period was intended to provide information on the same events from a different slant.


Elements of these processes to validate questionnaire items have been explored and improved in several recent studies in this field, including Reform Up Close (Porter, Kirst, Osthoff, Smithson, & Schneider, 1993), the Third International Mathematics and Science Study (1991), and Validating National Curriculum Indicators (Burstein, McDonnell, Van Winkle, Ormseth, Mirocha, & Guiton, 1995).

We piloted our instruments and process in two school districts, revised them, and obtained OMB clearance. We field-tested the instruments and process in three school districts: one was a large, independent, urban district on the West Coast; the second was a large city/county urban district in the Southeast; and the third was a smaller, suburban/rural, county district in the Mid-Atlantic region. In all, 111 teachers completed questionnaires, and seven teachers, one or more from every field test site, participated in the case study.

Data Collection Methods

We used focus groups and case studies to validate responses on the teacher questionnaire.

Focus Groups

At each of the three sites, all teachers of eighth, ninth, and tenth grade mathematics received a letter of invitation that explained the study and requested their participation. Attendance at the seven voluntary focus group meetings ranged from one to 12 teachers and totaled 38. Teachers commented on their understanding of each item's intent and the appropriateness of the response format.

The greatest teacher concern across all sites was not (as we suspected) that the teacher questionnaire would not adequately portray their teaching, but that the particular class they were asked to describe (the first instructional period of the day) was not representative of their whole teaching load. Specific characteristics of the students in that class, according to most teachers, caused them to teach in some manner they felt was not representative of their overall efforts. Despite this concern, most focus groups came to the conclusion that while there was no single period that would catch each of them at their most representative, the combined results of all sampled teachers would indeed represent the overall collection of the activities of all teachers throughout the day.


Case Studies

Volunteer case-study teachers were observed by a project researcher, kept a daily log of classroom instructional activities and those of the students in their designated class during a four-week period, and collected instructional artifacts. The project had seven case studies.

Classroom logs. Information from classroom logs was used to assess the consistency of teachers' daily recordings of classroom practice with their one-time account of practice from the teacher questionnaire. The picture of classroom practice obtained from multiple weeks of log form data is a finer grained view of the enacted pedagogy than that provided by a teacher's questionnaire responses summarizing a semester of practice. For both practical and perceptive reasons, completing logs daily (or at most, weekly) is likely to result in more accurate data than a one-time retrospective survey. Because logs rely on teachers' short-term memory rather than their long-term memory and the summative ability needed for the questionnaire, the resulting data can be presumed to be more accurate. Furthermore, teachers may be more inclined toward honest accounts on a daily rendering, since a single daily log, unlike teacher questionnaire responses, becomes one of many depictions of their practice. Classroom logs were also intended to be used by researchers to record events and activities as they observed in classrooms.

A weakness of prior research had been the inability to use data from classroom logs to estimate the reliability of questionnaire items (Porter, 1993). This difficulty stemmed from using different items on the log than were included on the questionnaire. We designed the log to be completed by case-study teachers as a record of the classroom instruction occurring during a single class period, so that the data from four weeks of logs could be used to evaluate the validity of the teacher's responses on the questionnaire. To make this direct link possible and to build on the knowledge gained from prior research, we constructed the log by directly copying specific items and activities from those on the questionnaire; frequency response options covering a semester were replaced with time-per-use response options covering a single period. Sharing identical items between the two instruments was intended to facilitate the later comparison of the teacher's daily logs with her responses on the survey.
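
The direct copying of items, with only the response scale changed, is what makes the later log-to-questionnaire comparison possible. The sketch below illustrates that design in Python; the item identifiers, wording, and both response scales are illustrative assumptions rather than the instruments' actual content.

    # A minimal sketch of the shared-item design described above: each log item
    # duplicates a questionnaire item's id and wording but swaps the semester
    # frequency scale for a time-per-use scale covering a single class period.
    SEMESTER_FREQUENCY_SCALE = ["never", "less than once a month",
                                "one to three times a month",
                                "once or twice a week", "almost every day"]
    MINUTES_PER_PERIOD_SCALE = ["not used today", "1-10 minutes",
                                "11-25 minutes", "more than 25 minutes"]

    SHARED_ITEMS = {  # hypothetical ids and abbreviated wording
        "13a": "Provide individual or small group tutoring during seatwork",
        "13b": "Lecture, perhaps using the board or overhead projector",
        "13c": "Administer a test or a quiz",
    }

    def questionnaire_item(item_id):
        return {"id": item_id, "text": SHARED_ITEMS[item_id],
                "scale": SEMESTER_FREQUENCY_SCALE}

    def log_item(item_id):
        # Same id and wording, so each daily log entry can later be matched
        # directly to the teacher's one-time questionnaire response.
        return {"id": item_id, "text": SHARED_ITEMS[item_id],
                "scale": MINUTES_PER_PERIOD_SCALE}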

Classroom observations. Researchers observed case-study volunteers to help them understand the function of the classroom log and the process of using it. After observing the teacher instructing the targeted class, both the researcher and the case-study teacher completed a log form. Teacher and researcher then compared observations, discussing differences in coding. For all but two teachers, those differences were slight. Because these teachers had participated in the focus group discussions of the items, most already understood nuances of meanings that might make a difference in how they recorded their instruction. Researchers had enough concerns about the coding patterns of one teacher, however, to repeat the calibration process a second time.

Artifact collection. To provide further detail about their lessons (and reduce the need for written explanations), case-study teachers were asked to submit certain instructional items figuring prominently in lessons for the designated class. Such instructional items included copies of homework and in-class assignments; directions for papers, reports, or projects; copies of tests and quizzes; and any other written assignments. These artifacts were intended to provide another avenue through which researchers could interpret the teacher log data for each lesson.

Assessing the Methods

While each qualitative method helped validate the quantitative data obtained from the teacher survey, some contributed more information than others to our analysis.

Focus Groups

The purpose of focus groups was to provide respondent feedback on the survey instrument. They served that purpose well and, unexpectedly, proved to be a major source of case-study volunteers. Especially at the beginning, focus groups allowed researchers to directly hear respondents' comments and probe their exact meanings. Such exchanges allowed both researchers and respondents to raise and explore many issues usefully, to validate the relevance of certain items across sites, to hone wording, and to generate additional ideas of emerging instructional practices. For example, an earlier version of the questionnaire included "calculator" on the list of instructional materials. Teachers were asked to indicate if calculators were available for use by students. In one focus group, teachers suggested that was not the issue. They had plenty of calculators available and even sufficient batteries. But the calculators were not sophisticated enough to do the kinds of operations the teachers wanted to teach their students. Because of that discussion, the item was changed to "appropriate calculator."

Teachers at another site complained that the questionnaire afforded them no opportunity to indicate that they structured their classes around cooperative learning strategies. They explained that cooperative learning was a major effort in the school district's instructional program, yet there was no place on the questionnaire to indicate the use of that strategy in the classroom. When that same comment was heard in another location, cooperative learning was added to the list of teacher activities and student activities.

As the project continued and researchers held additional focus groups in new field-test sites, however, the amount of information new to researchers decreased substantially. We obtained little new information from the later focus groups.

Case Studies

The case studies provided substantial information

with which to assess the construction of the teacherquestionnaire.

Classroom logs. The project was not funded todesign a process to validate the reliability of

questionnaire items, but to fully understand the benefitsand limitations of the information that we could obtainfrom logkeeping, we measured the consistency betweenteachers' daily reports of instructional activities andtheir semester report of the same activities. Assumingthat the daily log reports had a higher level of teacherreporting accuracy than the questionnaire responses forreasons stated above, we used log responses to assessthe reliability of the questionnaire responses. For eachcase-study teacher, we compared the sum of log-reported activities across a representative four weekperiod with that teacher's questionnaire-reportedactivities over the semester. For example, if thequestionnaire responses indicated that the teacherstimulated student discussions of multiple approachesmore than once a week, we expected to see confirmingentries of such discussions on the teacher's daily log.This provided a measure of the reporting reliability ofindividual questionnaire items.
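To make the comparison concrete, the following Python sketch shows one way such a log-versus-questionnaire check could be coded. The frequency categories, cut points, and the rule for mapping four weeks of log entries to a semester category are our own illustrative assumptions, not the scale used on the actual instruments.

# Illustrative sketch: compare a teacher's summed log entries with her
# questionnaire response for one activity. The frequency categories and
# cut points below are hypothetical, not the actual survey scale.

CATEGORIES = ["never", "less than once a week", "about once a week",
              "more than once a week", "almost every day"]

def category_from_log(times_used_in_4_weeks):
    """Map a four-week count of log entries to a frequency category
    (assumed cut points for roughly 20 class periods per month)."""
    per_week = times_used_in_4_weeks / 4.0
    if per_week == 0:
        return "never"
    elif per_week < 1:
        return "less than once a week"
    elif per_week < 1.5:
        return "about once a week"
    elif per_week < 4:
        return "more than once a week"
    return "almost every day"

def agreement(log_count, survey_category):
    """Return (direct agreement, agreement within one response category)."""
    log_cat = category_from_log(log_count)
    diff = abs(CATEGORIES.index(log_cat) - CATEGORIES.index(survey_category))
    return diff == 0, diff <= 1

# Example: six log entries in four weeks vs. a survey report of
# "more than once a week" agrees directly.
print(agreement(6, "more than once a week"))   # (True, True)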

Our sample of seven case-study teachers waspurposive and too small to generalize to the largersample of all survey respondents; nonetheless, Table 1illustrates the type of information we might obtain usingthis process with a larger and appropriately randomsubsample of case-study teachers.

Table 1: Examples of consistency between (a) teachers' survey responses describing a semester of classes and (b) their class log entries over a four-week period, on the same teacher activity (nonrandom sample, N = 7).

Teacher Activity                                                  Percent direct   Percent agreement within one
                                                                  agreement        survey response value category

Provide individual or small group tutoring as needed during
  individual seatwork or small group activities involving
  everyone                                                             100                   NA
Lecture, perhaps occasionally using the board or overhead
  projector to highlight a key term or present an outline              71                    86
Demonstrate a concept, using two-dimensional graphics such as
  drawings on the board, overhead projector, or computer               71                    86
Provide supplemental remedial or enriching instruction to a
  pull-out group while the rest of the class works on
  assignments                                                          71                    86
Administer a test or a quiz                                            57                    86
Demonstrate a concept, using three-dimensional tools such as
  manipulatives, models, or other objects                              57                    71
Lead students in discussion, recitation, drills, or
  question-and-answer sessions                                         43                   100
Observe or monitor student-led discussions                             43                    57
Work on administrative tasks while students work on assignments       29                    57

Table reads: In a nonrandom sample of seven teachers over four weeks, teachers' responses on a survey item about tutoring were consistent in 100 percent of cases with their responses on the log item on the same topic. Teachers' responses on the survey item about lecturing were consistent with their log responses in 71 percent of cases and within one response value in 86 percent of cases.


These data, reporting consistencies between theteacher questionnaire and the log form for teacheractivities from item 13 of the questionnaire, suggest thatteachers' recollection of their instructional behaviorvaries according to the activity being reported.Examining the extreme cases, for example, there wasone hundred percent consistency between teachers'reports on the questionnaire and on their daily logsabout the frequency with which they provided individualor small group tutoring. Teachers apparently rememberthat type of instruction well. There was far lessagreement (29 percent) between questionnaires and logson how often the teacher works on administrative taskswhile students work on assignments. Temporarilyrelaxing stringency to assess agreement within onesurvey response category increases the level ofconsistency between logs and questionnaires on thissame item but only to 57 percent.

For four items, the rate of agreement betweenresponses on the two instruments is above 70 percent,while in five it is 57 percent or less. To betterunderstand these differences, we examined the directionof the mismatch of the five items with low agreementfor possible evidence of socially desirable questionnaireresponses. For three such items, data from the dailylogs reported that the following activities actuallyoccurred more frequently than teachers' questionnaireresponses would indicate:

administrative tasks
drill and recitation
student-led discussions

In one light, these responses may show evidence ofsocial desirability, since the first two activities could beconsidered old fashioned or less than desirable in aclimate of reform. Such an argument would suggestthat subtle pressures may have influenced teachers'questionnaire responses. That student-led discussionsappear to have actually happened more often thanteachers indicated they did does not seem to follow thatsame explanation.

For two other items, data from the daily logssuggested that the following activities occurred lessfrequently than teachers' questionnaire responses wouldindicate:

demonstrating a concept with three-dimensional tools
administering a test or quiz


These differences suggest that teachers like to think theyuse three-dimensional manipulatives more than theyactually do and that teachers administer fewer tests andquizzes than they might think.

Recalling again that this is only a demonstrationanalysis based on seven nonrandom sets of logs, wesuggest no generalization of results beyond these seventeachers. The process, however, seems to show promise.Specific results from a larger and representativevalidation study might be different, but would likely beno less interesting.

Classroom observations. We designed the classroom observations of case-study teachers and the later discussion about completed log forms to provide those teachers with an experience-based understanding of the meanings of the log form terms, and with practice in completing the form. Discussing specific events occurring within a particular class and how they translated into log form responses established common understandings of log form terms more directly than would have resulted from an abstract discussion only. Conducting multiple classroom observations across different sites, and the resulting observation data, also provided researchers with evidence with which to assess (1) the ability of the survey instrument to portray classroom processes accurately and (2) the match between actual classroom practice and survey scope, individual items, and response formats. In addition, the nonjudgmental research approach to discussing the observed instructional activities proved to be an unanticipated and effective method for cementing teacher cooperation and building confidence in the process.

During the fieldtests, researchers used their copy ofthe observation form to record classroom activities asthey occurred, creating a real-time log of theinstructional processes occurring during a single classsession. Having the researcher use a more structuredclassroom observation instrument with which to initiallyrecord teacher and student activities and elapsed timemay improve researchers' understanding of how teachersrecord their instructional processes on the log form, andmay result in a more accurate recording of the durationof instructional elements occurring during instructionand the order in which they occurred.

Artifacts. We collected artifacts from case-studyteachers to investigate the potential of such documentsto more completely describe or illuminate classroominstructional processes. Although this process was


inexpensive, it afforded little analytical benefit to this study. The artifacts collected were primarily assignment sheets and examples of student work. We know in some cases, and suspect in others, that participating teachers sent incomplete records of the mathematics textbooks they used. Textbook pages or items used during instruction were the most notable void. Those artifacts we did have were difficult to assess. We can identify, for example, what was used during class (e.g., a practice sheet) and evaluate some elements of its content (e.g., estimation) but can tell little from the artifact itself about the instructional objective being addressed, how the artifact was used, or the amount of emphasis given to each element of the artifact. In further experiments with artifacts, we would investigate developing (1) a teacher checklist of contextual data surrounding the artifact's use within the lesson; and (2) a specific protocol for assessing important features of the artifact (such as instructional objective) and its use. Such protocols may be time consuming (and therefore expensive) to implement, effectively negating the original low cost of collecting the artifacts. So although artifact analysis may have great potential to add substance to self-reports, the process needs further attention.

Summary

With this task, we evaluated the usefulness ofsupplementary data collection in the form of focusgroups and case studies in contributing to anunderstanding of how well our instrument measured thedomains of interest, and how well the survey responsesrepresented what teachers actually do. The focus groupdiscussions provided excellent feedback on the survey,but are limited in the amount of new informationprovided by multiple focus groups. The case-studyprocess, and classroom logs in particular, provide avaluable estimation of the consistency betweenresponses on teacher questionnaires and on class logs.Classroom observation is beneficial to the researcher'sunderstanding of the phenomenon under investigationand to the process of gaining trust for later segments ofdata collection, but we were disappointed in the resultsof our attempts to use artifacts to expand ourunderstanding of classroom instructional processes, andsee further experimentation as the key to greaterbenefits.

Based on the results of the research described above,we think certain qualitative methods can expand theways in which we validate quantitative surveyinstruments. We are currently embarking on a projectto survey a sample of 400 teachers of eighth to twelfth

grade mathematics, engaging a subset of 60 in casestudies. We will use the results reported here to expandour use of classroom observations and logs to validatethe quantitative survey instruments.

References

Burstein, L., Oakes, J., & Guiton, G. (1992).Education indicators. In M.C. Alkin (Ed.),Encyclopedia of educational research (5th ed., pp.409-418). New York: MacMillan.

Burstein, L., McDonnell, L., Van Winkle, J., Ormseth,T., Mirocha, J., & Guiton, G. (1995). Validatingnational curriculum indicators. Santa Monica, CA:RAND.

Mullens, J. (1995, April). Classroom instructionalprocesses: A review of existing measurementapproaches and their applicability for the TeacherFollow-up Survey (Working Paper No. 95-15).Washington, DC: National Center for EducationStatistics.

Mullens, J., & Leighton, M. (1996). Understandingclassroom instructional processes (draft).Washington, DC: Policy Studies Associates.

Murnane, R. (1987). Improving education indicatorsand economic indicators. Educational Evaluationand Policy Analysis, 9(2), 101-116.

Porter, A. (1991). Creating a system of school process indicators. Educational Evaluation and Policy Analysis, 13(1), 13-29.

Porter, A. (1993). Defining and measuring opportunity to learn. The debate on opportunity-to-learn standards: Supporting works. Washington, DC: National Governors' Association.

Porter, A., Kirst, M., Osthoff, E., Smithson, J., &Schneider, S. (1993). Reform up close: Ananalysis of high school mathematics and scienceclassrooms. Madison, WI: Wisconsin Center forEducation Research.

Schmidt, W. (1995, June). Presentation made at theCouncil of Chief State School Officers NationalConference on Large Scale Assessment, Phoenix,AZ.

Smith, M. (1988, March). Educational indicators. Phi Delta Kappan, 487-491.

Third International Mathematics and Science Study.(1991). Project overview. East Lansing, MI:Author.


REVISING THE NCES PRIVATE SCHOOL SURVEY:A METHOD TO DESIGN A SYSTEMATIC CLASSIFICATION

OF PRIVATE SCHOOLS IN THE UNITED STATES

Sylvia Kay Fisher, Bureau of Labor StatisticsDaniel Kasprzyk, National Center for Education Statistics

Sylvia Kay Fisher, BLS, Room 4915, 2 Massachusetts Ave. N.E., Washington, DC 20212

Key Words: Survey research methods, Cognitive techniques, Educational surveys

INTRODUCTION AND BACKGROUND

The Private School Survey (PSS) is a paper-and-pencil survey administered by mail toapproximately 27,000 private schools throughout theUnited States under the auspices of the NationalCenter for Education Statistics (NCES), U.S.Department of Education. The PSS form is generallycompleted at the private school site by the principalschool administrator or office administrativeassistant. The PSS is used for many purposes,including the identification and classification ofprivate schools in the United States. Specifically,five PSS items are used to classify private schoolsinto one of nine categories comprising a typologyscheme designed to enhance federal-level statisticalreporting about U.S. private school education.

The Bureau of Labor Statistics (BLS) wasrequested by NCES to conduct a series of tests onspecific PSS questions to improve their ability tocapture accurate information and improve theclassification of U.S. private schools. Of particularconcern was a question asking schools about theirmembership status relative to a series of schoolassociations; this item has, in the past, led torespondent burden and possible underreporting. BLSdeveloped a cognitive testing plan to addressquestions raised by NCES regarding the PSS. Thepurpose of the testing plan was to evaluate thelanguage and visual construction of the five PSSquestions, the sequencing between and withinquestions, and the manner in which respondentsinterpret the questions.

The cognitive testing plan is scheduled to be completed in two years of activities. The purpose of Year 1 activities was to conduct developmental questioning and directed probing to obtain information about the respondent's response process; Year 1 included a total of eighteen interviews with private schools and eleven interviews with educational experts. Year 1 activities were composed of three pretesting waves: Wave 1 and Wave 3 involved interviews with private school respondents, while Wave 2 consisted of interviews with eleven educational experts.


Feedback obtained from all Year 1 respondents was used to develop revision(s) of the PSS questions, which are to be field-tested using a large nationally representative sample during Year 2 of the study. As part of the nation-wide field test, a validation study will be conducted to estimate the degree of respondent error resulting from implementing the new PSS question(s). The results of Year 2 field-testing will yield a final set of items to be incorporated within NCES's Private School Survey.

This paper summarizes findings obtainedthrough the first year of pretesting conducted duringWaves 1-3. Final recommendations resulting fromYear 1 testing are provided.

METHODOLOGY

Wave 1 Testing: The overall purpose of Wave 1 testing was to obtain preliminary information regarding how respondents answered PSS items, identify problems resulting from the response process, obtain information about how PSS information is collected at the private school site, and compile suggestions offered by participating school personnel. The original PSS items and an interview protocol were administered to nine private school respondents. Data collected from Wave 1 testing were used to develop preliminary revisions of the PSS items that could be tested in later waves of testing.

Participating Schools: Private schoolswere categorized using grade level [kindergarten,elementary, and secondary] and type of affiliation[formal religious, informal religious, and non-sectarian] as major variables. The intersection ofthese two variables (and their three levels,respectively) resulted in the generation of a 3 X 3matrix. One school was selected to represent eachmatrix cell. Wave 1 private schools were selectedrandomly from a computer-generated listingproduced by NCES, which comprised all privateschools in Miami, FL, Atlanta, GA, Boston, MA, andMinneapolis-St. Paul, MN. Private schools wereselected from all four states, such that each of thenine cells in the 3 X 3 matrix would be represented.
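A minimal sketch of this cell-by-cell selection, assuming a made-up frame and simple random draws (the actual NCES listing and selection procedure may have differed), might look like the following Python fragment.

# Illustrative sketch of the 3 x 3 selection matrix used in Wave 1:
# one school is drawn at random for each (grade level, affiliation) cell.
# The school records below are invented for demonstration.
import random
from itertools import product

GRADE_LEVELS = ["kindergarten", "elementary", "secondary"]
AFFILIATIONS = ["formal religious", "informal religious", "non-sectarian"]

# Hypothetical frame records: (school name, grade level, affiliation)
frame = [
    ("School A", "elementary", "formal religious"),
    ("School B", "elementary", "non-sectarian"),
    ("School C", "secondary", "informal religious"),
    # ... remaining frame records would cover every cell ...
]

selected = {}
for level, affil in product(GRADE_LEVELS, AFFILIATIONS):
    cell = [name for name, lv, af in frame if lv == level and af == affil]
    if cell:  # a full frame would populate all nine cells
        selected[(level, affil)] = random.choice(cell)

for cell, school in selected.items():
    print(cell, "->", school)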

Data Collection Procedures: Wave 1

respondents were private school administratorsand/or administrative assistants responsible forcompletion of the PSS on behalf of the school.Participants were contacted by telephone andappointments were scheduled at a mutuallyconvenient time. Interviews were conducted at theprivate school site and were approximately one hourin length. Respondents participated voluntarily andwere not paid for their participation.

Cognitive Interviews: Interviews consisted of face-to-face cognitive tasks and the administration of an interview protocol. Respondents were queried about several aspects of the PSS, including: the definition of the terms "orientation," "purpose," or "affiliation;" the meaning of the term "school association;" and the process by which schools become members of a school association. Respondents completed a task in which they clustered 38 private school associations into groups and labeled them. They also provided a list of associations with which the school is affiliated. Their feedback and recommendations were used to develop preliminary revisions of the PSS items and examine their effectiveness in later waves of pretesting.

Wave 2 Testing: Wave 2 testing involvedthe conduct of cognitive interviews with eleveneducational experts, representing national-leveleducational and religious school associations, basedprimarily in the greater metropolitan WashingtonD.C. area. Experts were consulted regarding theoriginal PSS items, preliminary revisions of the PSSitems, conceptual definitions, and their insights intoprivate education. Their feedback andrecommendations were used to modify thepreliminary revisions of the PSS items and improvetheir efficacy and appropriateness for future testing.

Data Collection Procedures: Elevenassociation representatives were contacted bytelephone and appointments were scheduled at amutually convenient time. Interviews wereconducted face-to-face at the association site andwere approximately one hour in length. Respondentsparticipated in the study voluntarily and were notcompensated for their participation. All interviewswere conducted within the greater metropolitanWashington, D.C. area (including Maryland andnorthern Virginia) and the state of Missouri.

Cognitive Interviews: Experts werequeried about the PSS questions, conceptualdefinitions, and their insights into private education.Experts completed the original PSS items as thoughthey were staff members in one of the schoolsbelonging to their association. Many experts hadbeen teachers and/or school principals and were ableto accomplish this task easily. Feedback wasobtained from the experts regarding the effectiveness


of the original items. Experts also evaluatedpreliminary revisions of the PSS items, and providedtheir opinions about the effectiveness of the proposedrevisions. Their responses and additional commentswere used to generate a final version of the PSS itemsto be tested with private schools during Wave 3testing.

Wave 3 Testing: The major objective ofWave 3 testing was to test the effectiveness of thenewly revised PSS item revisions and compare theireffectiveness to the original PSS items, as part of aninterview conducted with private school respondents.Respondents were encouraged to offer comments andsuggestions regarding the effectiveness of the revisedversions of the PSS items.

Participating Schools: The 3 X 3 matrixused during Wave 1 testing was used to identifyprivate schools participating in Wave 3 activities.Nine private schools were randomly selected from acomputer-generated listing provided by NCES of allWashington, D.C. area private schools (includingMaryland and Northern Virginia). Private schoolswere selected to ensure greater representation ofprivate schools across the nine-part typology used byNCES; specifically, a home school, an Islamicschool, an Orthodox Jewish school, a Montessoriprogram school, and a school that only servesstudents with identified learning disabilities wereidentified to enhance the representativeness ofsampled private schools.

Data Collection Procedures: Wave 3respondents were private school administratorsand/or administrative assistants responsible forcompletion of the PSS on behalf of the school.Participants were contacted by telephone andappointments were scheduled at a mutuallyconvenient time at the private school site. Interviewslasted approximately forty-five to sixty minutes.Respondents participated in the study voluntarily andwere not be paid for their participation.

Cognitive Interviews: The nine interviewsconsisted of a face-to-face cognitive interview and abrief, customized set of questions geared to thespecialized needs of the participating private school.Respondents also completed the original and revisedversions of Item 15(a)-15(d) and the original and tworevised versions of PSS [16]. Respondents providedtheir evaluation of each version of the PSS items, andwere encouraged to offer suggestions about ways tomaking the PSS more relevant to the specializedneeds of their school. These informal interviewswere designed to obtain additional information aboutthe diverse and varied group of private schools whoparticipated in the study.

Pretesting Results: Issues Associated withCompleting Each PSS Item

15(a): Does this school have a religious orientation, purpose, or affiliation?

Response Issues: Pretesting identified the following potential sources of response error:
1. Respondents generally distinguished the terms religious "orientation" and "purpose" from "affiliation," which was perceived to refer to a formal relationship with an established religious group, institution, or denomination, with possible implications for school finances, curriculum, hiring practices, and expectations of students.
2. Many respondents had difficulty distinguishing religious "purpose" from "orientation," which were perceived to be virtually synonymous, and generally agreed that a school with a religious orientation or purpose is not necessarily beholden in any way to a religious institution or group. Those respondents who distinguished "purpose" from "orientation" reported that a private school's purpose stems from its orientation or vice versa.
3. Some respondents reported that some private schools have a formal affiliation with a religious organization, yet have no actual religious orientation or purpose. These schools were often originally founded by an established church, but no longer maintained a religious orientation, purpose, or programming. These schools differ from schools that maintain a religious affiliation with a religious organization or body, and/or a religious orientation or purpose that is expressed in the school's daily life.
4. Experts also identified problems in distinguishing the three terms "orientation," "purpose," and "affiliation;" however, they were unable to achieve consensus regarding the best means to improve 15(a).
5. Definitional problems affect responses to 15(a), but can also affect responses to 15(b) and 15(c). Respondents who endorse "yes" to 15(a) are directed via an accompanying arrow to respond to 15(b) and 15(c), while respondents who select "no" are directed via another arrow to "GO to item 16, page 10." Thus, errors in responding to 15(a) can result in response errors for Items 15(b) and 15(c).

Final Modification of Item 15(a): Pretesting results verify that the use of all three terms "orientation," "purpose," and "affiliation" is necessary to account for the subtle complexities associated with the many types of relationships religiously-oriented private schools have with religious institutions and/or groups. As presently worded, 15(a) appears to most effectively capture the greatest number of correct endorsements. Therefore, no modifications were


made to 15(a), and its original form was retained for the PSS field test.

15(b): Is this school/program formally affiliatedwith a national religious denomination?

Response Issues: Pretesting identified the following potential sources of response error:
1. The term "formally affiliated" was conceptually difficult for many respondents. Because some private schools are operated independently by a single church or temple that sponsors the school, many respondents (including Jewish and non-denominational Christian respondents) do not perceive themselves to have a "formal" affiliation with a national religious denomination, and fail to endorse "yes" to 15(b).
2. The terms "national" and "denomination" in the phrase "national religious denomination" provided sources of cognitive confusion for some respondents. Roman Catholic, Jewish, and Islamic respondents were unclear whether they should endorse "yes" to 15(b), because they perceived their religions to be international rather than "national" in scope. Similarly, the term "denomination" had little meaning for these respondents; many favored the use of a more inclusive term less associated with Protestant mainstream religious groups, such as "organization" or "institution." These issues were also identified by the experts during Wave 2 testing.

Final Modification of Item 15(b): Item 15(b) wasrevised to address the potential sources of responseerror identified through pretesting. The followingmodifications were made to 15(b) to facilitate therespondent's ability to endorse the appropriateresponse:

- Item 15(b) was revised to drop the word "national," which posed difficulties for many religiously-oriented respondents. This modification was approved by all nine respondents during Wave 3 testing (including Jewish, Roman Catholic, and Islamic).
- The term "formally" was dropped and replaced with "affiliated in some way" to account for all possible types of affiliations private schools may have with a religious group and/or a single religious institution (e.g., church, temple, etc.).
- The phrase "religious denomination" was amended to read "religious denomination or organization" to include all respondents from religions that do not recognize the term "denomination;" this should facilitate the respondent's ability to endorse "yes" appropriately. The term "organization" was added because it subsumes both religious groups and individual and independent religious institutions.

15(b) was amended for the field test as follows:

15(b): Is this school/program affiliated in some way with a religious denomination or organization?

15(c): What is this school's religious orientationor affiliation?

Response Issues: Item 15(c) contains a list of 20 response options representing major religious groups that exist in the U.S. (e.g., Roman Catholic, Amish, Calvinist, Islamic, Jewish, etc.). "Roman Catholic" is the first option provided, because the majority of U.S. private schools are Roman Catholic. Four Lutheran options, representing the four major U.S. Lutheran synods, are also provided within the list. These options are labeled by their acronyms, and are not separated from the other available options. The final option provided is "other;" a blank line is also provided for respondents who choose to write in a religious group not included within the list of options. Pretesting identified the following potential sources of response error for 15(c):
1. Some respondents were concerned that "Roman Catholic" was not placed alphabetically in the list, and hypothesized that its placement as the first option might offend some respondents. A few experts also favored alphabetizing the list of religions for this reason.
2. Some respondents were concerned that the inclusion of four Lutheran groups within the list of religious groups may cause the list to appear to be weighted too heavily in favor of one group. Therefore, for pretesting Waves 2 and 3, the four Lutheran groups were collapsed into a single option "Lutheran." However, four Wave 2 experts indicated that all four sub-groupings of "Lutheran" should be retained in the final version of 15(c), because the four Lutheran school groups varied substantively in management and philosophy, and these distinctions should be duly noted. All four experts agreed it would be acceptable to place the four Lutheran groups under the greater category heading of "Lutheran."
3. Respondents reported no problems with the stem of 15(c) as presented in the current PSS.

Final Modification of Item 15(c): Item 15(c) wasrevised as specified below to address the potentialsources of response error identified throughpretesting:

- Because more than one-half of U.S. private schools are Roman Catholic, it is reasonable, logical, and convenient to place the most frequently selected option first in a list of options. Three experts representing religious private school associations (only one of which was Catholic) agreed it was acceptable to retain the current placement of "Roman Catholic," because of the large number of Roman Catholic private school respondents. Therefore, the current placement of the option "Roman Catholic" was retained for the field test.
- All four types of Lutheran synods have been retained within the final version of 15(c) to ensure appropriate representation. However, these four options have been subsumed beneath the greater term "Lutheran" and an arrow has been provided to direct respondents who select "Lutheran" to the list of four Lutheran synods.
- Based on respondents' recommendations, the option "Gospel/Full Gospel" was added to the list of religious groups provided as options. The option "Greek Orthodox" was also replaced with the more inclusive term "Eastern Orthodox."

15(d): Which of the following categories bestdescribes this school?

Parochial (or inter-parochial)
Diocesan
Private

Answer this question only if you marked"Roman Catholic" for question c above.Mark (X) only one box.

Response Issues: This item is designed exclusively for Roman Catholic respondents. Response options represent Roman Catholic organizational structures that frequently maintain, support, or sponsor the Roman Catholic private school. The respondent is directed to respond to 15(d) by an arrow stemming from the option "Roman Catholic" in 15(c) and extending along the left-hand margin to 15(d). Pretesting identified the following potential sources of response error:
1. Responses to 15(c) can affect responses to 15(d). Specifically, the arrow stemming from the option "Roman Catholic" in 15(c), which leads the respondent to 15(d), is frequently missed by respondents. Some respondents do not read the directions supplied beneath the stem in 15(d), which indicate that 15(d) is designed exclusively for Roman Catholic respondents. Thus, some non-Catholic religiously-oriented and non-sectarian respondents selected "parochial" and/or "private" under 15(d), because they simply scanned the page, saw the word "private," interpreted the item as including all private school respondents, and endorsed either "parochial" or "private."


Final Modifications of Item 15(d): Item 15(d) wasrevised as specified below to address the potentialsources of response error identified throughpretesting:

Item 15(d) will be eliminated altogether, and theresponse options beneath 15(d) will be moved toa new location beneath the option "RomanCatholic" within 15(c). By collapsing theseoptions, Roman Catholic respondents will beable to proceed directly (by means of an arrow)to the additional set of options "parochial,""diocesan," and "private," after identifyingthemselves as Roman Catholic in 15(c). All nineprivate schools participating in Wave 3 testingagreed that this revision was an appropriate andmore advantageous solution.

16: To which of the following associations ororganizations does this school belong?

Mark (X) all that apply.

Response Issues: PSS [16] requires respondents to reply to the question above using a list of 38 private school associations from which to identify the associations they belong to; respondents also have the option to select "None of the Above." The list contains some informal grouping of related association types but is predominantly in alphabetical order. There are also eleven "Other ..." options, such as "Other Montessori association(s)" and "Other religious school association(s)." All 38 association names are presented in the same type, consecutively, and without breaks. Pretesting identified the following potential sources of response error:
1. The list of associations was haphazardly organized and difficult to read. Some respondents had difficulty finding their associations.
2. "None of the Above" was at the bottom of the list, and respondents who wanted to endorse this option were required to scan the entire list prior to finding it.
3. Many respondents only knew the acronym for an association rather than the association's full name (e.g., AMS - American Montessori Society). This made completing [16] more difficult for some respondents, who struggled to remember the full name of their association.
4. One respondent recommended a line be added for each of the "other" options provided within the list of associations so that schools could write in the name of their school association. This recommendation was predicated on the assumption that private school respondents would like to feel they are a part of a recognized school association. Three educational experts favored this modification.


5. Some private school respondents identified additional associations to be included within the list. These additional associations were examined to evaluate whether a sufficient number of schools pertained to each association. Associations meeting this criterion were added to the final list of associations provided in [16].

Sorting Task: A sorting task was used to evaluatewhether the list could be divided into major sectionheadings that would facilitate responding to [16].The primary objective of the sorting task was toprovide a categorization schema for the privateschool associations listed in [16] and simplify thetask of completing [16]. The introduction ofassociation category headings was hypothesized tosimplify the task of completing [16] by allowingrespondents to quickly scan the list of associations,and more readily discern where the associations theybelong to are located. Category headings would alsominimize reading required of respondents, decreaserespondents' boredom and inattention, and decreasethe likelihood that association(s) the school belongsto is/are missed by the respondent.

A set of sorting cards was created, eachmade of durable cardboard and labeled with the nameof an association from the PSS list. Sorting cardswere numbered from 1 to 38, and provided torespondents in the order of presentation currentlyused in the PSS form (e.g., 1, 2, 3,..., 38) to moreclosely approximate the actual task of completing[16].

Respondents were asked to sort the cardsinto clusters of associations representing logical andreasonable groupings. Respondents identified amean number of 9.1 cluster names (range 4-14).Respondents exhibited several patterns whenclustering associations. A common pattern was theidentification of very small clusters of only two orthree associations (e.g., bilingual, international, andalternative associations). Four respondents identifiedten or more such cluster names.

Another clustering pattern that occurredfrequently was the grouping of religiously-orientedprivate school associations by religion (e.g.,Christian, Jewish, and Friends); this pattern emergedfrequently from respondents from religiously-oriented private schools. Conversely, non-sectarianschools were more likely to group all religiousassociations together under a neutral and inclusivecategory name such as "Religious Orientation."

Some respondents classified a small numberof specialized associations into clusters such as"Early Childhood," "Special Needs" or "ExceptionalEducation," or clusters with an educational orprogrammatic focus such as "Special Interest" and"Special Emphasis." Finally, several respondents

distinguished religiously-oriented associations fromnon-sectarian associations, which were labeled"Independent School Orientation" and"Other/Independent non-Religious."

The task of developing category headingsrequires that headings be sufficiently inclusive toallow several associations to be grouped within theassociation heading name. However, it is essentialthat the new categorization system be parsimonious,because the inclusion of too many headings can becumbersome and detract from the primary purpose ofsimplifying the list of associations. Therefore, amaximum of three category headings were identifiedto simplify the task of completing [16].

Final Modification of Item 16: Item 16 was revisedas specified below to address the potential sources ofresponse error identified through pretesting:

- Three category headings were identified to simplify the task of completing [16]: 1) Religious; 2) Special Emphasis; and 3) Other School Associations or Organizations. Ten educational experts unanimously favored this revision, because it was easier for private school respondents to grasp. Category headings provided the respondent with a structure conducive to ease in responding, and facilitated visual acuity so that respondents would be less likely to miss their school association. Several experts reported respondents would save time by simply going to the association heading that contained the association(s) where they were members. Although the term "Other Private School Associations or Organizations" might potentially be offensive to some respondents, the majority of experts did not anticipate this would be a serious difficulty.
- Four additional associations were added to the list as recommended by respondents and experts.
- Acronyms for each association were placed in parentheses in capital boldface letters next to each association name, so that respondents who only recognized the association acronym would be able to quickly identify the association from the list. This feature allows respondents to scan the list of acronyms to find their association, and should improve the likelihood that respondents can find their association.
- A screening item was added at the top of the list to screen out private schools that were not members of any school association. This category would include private schools that 1) have not yet applied to any associations for membership; 2) cannot afford association membership dues; and 3) may not join any private school associations for religious or other reasons. The screening item was worded "This school/program does NOT belong to any association or organization [Mark X]."
- An "other" category was provided beneath each of the major categories, such as "Other Special Emphasis associations." This feature would allow private schools with memberships in educational associations not included in the greater list to provide a response so they would not feel unimportant. "Other" options were printed in italics to distinguish them from association names; this feature makes these options more readily visible.
- An "other" option was included at the bottom of the list with a line available for respondents to write in their school association(s) to alleviate any possible feelings of inferiority that would result from failing to check any options provided in the list; this feature would also allow NCES to consult with the association and evaluate whether it should be included in later versions of the PSS.

CONCLUSION

The Private School Survey is used to identify and categorize private schools in the United States. The first year of testing of the PSS items resulted in the implementation of several modifications to the original PSS items. These modifications were identified through suggestions made by participating private school respondents and educational experts during the early waves of testing. Revisions were refined based upon feedback received during later waves of testing.

Currently, a field test of the revised PSSitems is underway with a proportional representativesample of approximately 1800 private schoolsthroughout the United States. The results of the fieldtest will be compared to responses generated byrespondents who completed the PSS form during the1996 administration. The comparison group will bedrawn in a manner to ensure its statisticalcomparability with the field test sample. The resultsof field testing are expected to yield a final set ofitems that will be incorporated into the PrivateSchool Survey and used to categorize private schoolsthroughout the United States.

The opinions expressed in this paper are those of theauthors and do not necessarily represent those of theBureau of Labor Statistics or the National Center forEducation Statistics.

AN ANALYSIS OF RESPONSE RATES OF SASS 1993-94

Sameena M. Salvucci, Fan Zhang, Mingxiu Hu, David Monaco, Synectics for Management Decisions, Inc.,Kerry Gruber, National Center for Education Statistics

Fan Zhang, Synectics for Management Decisions, Inc., 3030 Clarendon Blvd., Arlington, VA 22201

Key Words: Complex sample surveys, Unit nonresponse, Hierarchical nonresponse, Multivariate modeling, Cooperation rates, Sampling frame

1. Introduction

This paper addresses the most pervasive and challenging source of nonsampling error in estimates from sample surveys, which is the error associated with incomplete data. Incomplete data resulting from three sources are of particular importance in sample surveys: item nonresponse, unit nonresponse, and undercoverage.[1] The concern for nonresponse, whether item or unit, is twofold. Nonresponse reduces the sample size and thus increases the sampling variance. Respondents may also differ significantly from nonrespondents; thus, the estimate obtained from respondents can be biased and the magnitude of this bias may be unknown. Concerns about bias are generally greater as the rate of nonresponse increases.
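As a reminder of why such differences matter, the bias of a respondent-only mean can be written, under a deterministic view of response, as the product of the nonresponse rate and the respondent-nonrespondent difference. This standard decomposition is not given in the paper, but it motivates the analysis:

\[
\operatorname{bias}(\bar{y}_r) \;\approx\; \frac{N_{nr}}{N}\,\bigl(\bar{Y}_r - \bar{Y}_{nr}\bigr),
\]

where \(\bar{Y}_r\) and \(\bar{Y}_{nr}\) are the population means of respondents and nonrespondents and \(N_{nr}/N\) is the nonresponse rate. The bias vanishes only when nonresponse is low or when respondents and nonrespondents are similar on the characteristic of interest.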

The particular focus of this paper is to quantify the extent of unit nonresponse in the 1993-94 Schools and Staffing Survey (SASS) conducted by the National Center for Education Statistics (NCES) and to assess the impact of differences in the known characteristics of respondents and nonrespondents for different subgroups of the survey populations, in order to provide some indication of the potential effects of nonresponse bias. The results of this study can be used to further control and adjust survey estimates for bias, and to improve survey operations. While the scope of this paper is chiefly descriptive, inferential modeling of the response rates is also provided as an example for future SASS research.

2. 1993-94 SASS

The 1993-94 SASS is the third study of public andprivate elementary and secondary schools in a series ofsurveys begun in 1987-88 by NCES. Survey data fromschools, local education agencies (LEAs),administrators, and teachers in the United States werecollected by mail with telephone follow-up ofnonrespondents first during the 1987-88 school yearand again during the 1990-91 and the 1993-94 schoolyears. The series provides data on school and teachercharacteristics, school operations, programs and

[1] Madow, W., Nisselson, H., and Olkin, I. (1983). Incomplete Data in Sample Surveys, Vol. 1, Report and Case Studies. New York: Academic Press.

policies, teacher demand and supply, and the opinionsand attitudes of teachers and school administratorsabout policies and working conditions. The analyticpower of the data is enhanced by the ability to linksurvey data for individual LEAs, schools,administrators, and teachers. In 1993-94 new library,librarian and student SASS components were initiatedthat could also be linked. In addition, computerassisted telephone interviewing (CATI) facilities wereintroduced for the first time during the 1993-94 SASSand were used extensively for nonresponse follow-up.

The 1993-94 SASS consists of thirteen components: the School Surveys, the School Administrator Surveys, the Teacher Surveys, the Teacher Demand and Shortage Survey, the Library Surveys, the Librarian Surveys, and the Student Record Surveys. Some 13,000 schools and administrators, and 67,000 teachers were selected. In addition, 5,500 local education agencies associated with the selected schools and 100 districts not associated with schools were selected in the 1993-94 SASS. Some 7,600 libraries and librarians, and 6,900 student records were also selected. Details pertaining to the frame, stratification, and sample selection for each of the survey components are presented in Abramson et al. (1996).

3. Weighted Unit Response Rates

For each survey of SASS, weighted unit responserates were calculated. The weighted response rateswere derived by dividing the sum of the basic weightsfor the interview cases by the sum of the basic weightsfor the eligible cases (the number of sampled casesminus the out-of-scope cases). The basic weight foreach sample case was assigned at the time of samplingas the inverse of the probability of selection.
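A small Python sketch of this calculation follows; the disposition codes and weights are invented for illustration and are not the actual SASS processing variables.

# Illustrative weighted unit response rate: sum of basic weights for
# interviewed cases over the sum for eligible cases (sampled cases minus
# out-of-scope cases). Field names and codes are hypothetical.
cases = [
    # (basic weight = 1 / selection probability, disposition)
    (120.0, "interview"),
    (120.0, "refusal"),
    ( 80.0, "interview"),
    ( 80.0, "out_of_scope"),     # ineligible; excluded from the denominator
    ( 45.5, "unable_to_contact"),
]

interviewed = sum(w for w, d in cases if d == "interview")
eligible    = sum(w for w, d in cases if d != "out_of_scope")

weighted_response_rate = 100.0 * interviewed / eligible
print(f"Weighted unit response rate: {weighted_response_rate:.1f}%")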

In the first stage of this study we tested whether there is a significant difference between respondents and nonrespondents for a range of characteristics. To make this kind of inference to the underlying population based on SASS data, the conventional Pearson chi-squared statistic is no longer appropriate. Tests were performed using a modified Pearson test statistic called Rao-Scott3 (RS3).[2] A

[2] Rao, J. and Scott, A. (1981). "The Analysis of Categorical Data from Complex Sample Surveys: Chi-squared Tests for Goodness of Fit and Independence in Two-way Tables." Journal of the American Statistical Association, 76: 221-230. Rao, J. and Scott, A. (1984). "On Chi-squared Tests for Multiway Contingency Tables with Cell Proportions Estimated from Survey Data." The Annals of Statistics, 12: 46-60.

statistical software package called WesVarPC® 2.0 provides a convenient procedure for this purpose. WesVarPC® not only calculates the weighted response rates, the standard errors, the sample sizes, and the design effects, but also provides the Rao-Scott3 statistic that reflects the complex sample design.
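For intuition only, the fragment below illustrates the simpler first-order version of the Rao-Scott idea, deflating the ordinary Pearson chi-square by an assumed mean design effect. This is a rough stand-in for, not a reproduction of, the RS3 statistic that WesVarPC computes from replicate weights; the cell counts and design effect are invented.

# Rough first-order illustration of the Rao-Scott idea: deflate the
# ordinary Pearson chi-square by a mean generalized design effect before
# referring it to the chi-square distribution. Numbers are invented.
from scipy.stats import chi2_contingency, chi2

observed = [[420, 60],    # e.g., respondents / nonrespondents
            [310, 95]]    # by two levels of some school characteristic

pearson_x2, p_naive, dof, expected = chi2_contingency(observed)

mean_deff = 1.8           # assumed average design effect for these cells
adjusted_x2 = pearson_x2 / mean_deff
p_value = chi2.sf(adjusted_x2, dof)

print(f"Naive X2 = {pearson_x2:.2f} (p = {p_naive:.4f})")
print(f"Adjusted X2 = {adjusted_x2:.2f}, df = {dof}, p = {p_value:.4f}")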

Within the public and private sectors, the results of the significance tests are fairly uniform. That is, if respondents differ significantly (or not significantly) from nonrespondents on a variable for one public sector survey component, then the difference is likely significant (or not significant) for other public sector components as well. There are some interesting differences in the results of the significance tests when compared across the public and private survey components. The most striking contrast exists for the variable "school sampled in 1990-91 SASS." Whether or not a school was surveyed in the 1990-91 SASS proved not to be significant for any of the public school components, while it was significant for all of the private school components.

Some interesting patterns arise when response rates are examined across surveys. Tables 3.1 (public components) and 3.2 (private components) show the test results and rankings of response rates for different levels of a selected set of variables that are common across all surveys. For public components, the response rates for minority enrollment and urbanicity show some very strong patterns. Schools with a minority enrollment greater than 50.5 percent had the lowest response rates for all public components except the Student Record Component. Furthermore, minority enrollment showed a significant association with response status for all public components except the Student Record Component. Urbanicity showed a very strong pattern, with rural/small towns having the highest response rates, followed by urban fringe/large towns, and then central cities with the lowest response rates. This pattern was the same for all components except the Student Record Component. But, as with minority enrollment, urbanicity showed a significant association with response status for all public components except the Student Record Component. For private components, the response rates for region and school size show some patterns. The Midwest region had the highest response rates for all private components except the Student Record Component. Response rates for schools with 1 to 149 students were always the lowest for all private components



except the Student Record Component. Similar to the public side, region and school size showed a significant association with response status for all private components except the Student Record Component.

The rankings, when viewed across the public and private components, show two variables with similarities: school level and urbanicity. For school level, eight of the 12 public and private components have secondary schools with the highest response rate, followed by elementary schools, and then combined schools. School urbanicity also showed a fairly strong ranking pattern, where eight of the 12 public and private components have schools in rural/small towns with the highest response rate, followed by those in urban fringe/large towns, and then those in central cities.

4. Hierarchical Response Patterns

In the second stage of this study we examined thehierarchical nature of the nonresponse in the 1993-94SASS. The aim was to find out about the jointness ofnonresponse; for example to learn whetheradministrators in responding schools are more or lesslikely to respond than administrators in nonrespondingschools. Specifically, we tested to see if there is asignificant difference in response rates of each of thefollowing types of respondents: (1) public and privateschool administrators, (2) public and private schools,(3) public and private school teachers, (4) public andprivate school libraries, (5) public and private schoollibrarians, and (6) local education agencies (LEAs)when "linked" with the response status of other SASScomponents.

The results indicated that all units in the 1993-94 SASS (e.g., administrators that are "linked" with other units such as schools) are more likely to respond when the "linked" unit responds, and in a large number of cases the difference in response is significant.

5. Components of Nonresponse/Cooperation Rate

To measure the ability of a survey to establish contact with sampled units, the reasons for nonresponse are important. In the 1993-94 SASS, three categories of reasons were recorded: 1) refusal, the nonrespondent refuses to take part in the survey; 2) unable-to-contact, contact with the nonrespondent could not be made through the nonresponse follow-up procedures; and 3) other, for example the questionnaire was not returned or the questionnaire was returned but was incomplete.

Two teacher components had very high unable-to-contact rates (17.9%). For those instances with high unable-to-contact rates it is sensible to look at a cooperation rate, which is the response rate given that the cases can be contacted. The cooperation rate is the number of interviews divided by the number of eligible cases contacted:

Cooperation Rate = Interview / (Interview + Refusal + Other)

Compare with:

Response Rate = Interview / (Interview + Refusal + Unable to Contact + Other)

The advantage of using the cooperation rate is that it controls for differences due to the unable-to-contact cases. Using the cooperation rate eliminates the confounding effect associated with unable-to-contact cases.
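The arithmetic distinction between the two rates is easy to illustrate; the counts below are invented and chosen only to mimic a component with a high unable-to-contact rate.

# Illustrative contrast between the response rate and the cooperation
# rate when many cases cannot be contacted. Counts are invented, not
# SASS figures.
interview, refusal, unable_to_contact, other = 820, 60, 180, 40

response_rate    = interview / (interview + refusal + unable_to_contact + other)
cooperation_rate = interview / (interview + refusal + other)

print(f"Response rate:    {100 * response_rate:.1f}%")     # about 74.5%
print(f"Cooperation rate: {100 * cooperation_rate:.1f}%")  # about 89.1%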

To illustrate this confounding effect, the significance tests for the teacher components were calculated using the cooperation rates, since their unable-to-contact rates are the highest of all the components and the differences between their response and cooperation rates were among the highest (see Tables 5.1 and 5.2 below).

Table 5.1 Weighted response and cooperation rates: Public School Teacher Component (rates in percent)

Variable            Response Rate   Cooperation Rate
School Type
  Regular               88.26           89.25
  Non-regular           86.25           88.42

Table 5.2 Weighted response and cooperation rates: Private School Teacher Component (rates in percent)

Variable                    Response Rate   Cooperation Rate
Urbanicity
  Rural/Small town              83.10           85.38
  Urban fringe/Large town       80.41           82.48
  Central city                  78.79           82.64
New Teacher
  Yes                           81.02           85.16
  No                            80.05           82.76

The test results for the public teacher component indicated that, using cooperation rates, the variable school type was no longer significant (see Table 5.3). The reason for this is that the low response rates for the non-regular schools are due to a higher unable-to-contact rate than for regular schools.

Table 5.3 P-values of the independence test: Public School Teacher Component

Variable        P-value based on    P-value based on
                Response Rate       Cooperation Rate
School Type         0.0172              0.1092

For the private school component, the lower response rate in central cities is due to a high unable-to-contact rate. After adjusting for this, by removing the unable-to-contact cases, urbanicity is not significant (see Table 5.4). On the other hand, there is a high unable-to-contact rate for new teachers, and that caused a low response rate for the new teachers. After the unable-to-contact cases are removed, the new teachers have a significantly higher cooperation rate than the others, and the variable new teacher becomes significant.


Table 5.4 P-values of the independence test: Private School Teacher Component

Variable        P-value based on    P-value based on
                Response Rate       Cooperation Rate
Urbanicity          0.0094              0.0923
New Teacher         0.3206              0.0107

6. Multivariate Model (Public School Component)

In the last stage of our study we assessed the

multivariate adjusted effects (on the response rates) ofthe significant variables which were identified in thefirst stage of our study (see section 3). We fittedmultivariate logistic regression models. In this sectionthree issues will be discussed: model selection, modelinterpretation, and comparisons of univariateunadjusted results with multivariate adjusted results.

In our model selection, we began with the following potential variables which were considered in the univariate analysis: urbanicity, region, school level, school size, school type, minority enrollment, sampled with certainty, submitted a teacher list, source, and sampled in the 1990-91 SASS. The variable submitted a teacher list, which has the most significant effect on school nonresponse, was eliminated from the multivariate model due to interpretation difficulties. In terms of when it is observed, this variable is more of a questionnaire variable than a design variable; we cannot use it to predict the probability of school response. The variables sampled in the 1990-91 SASS and sampled with certainty were dropped from the model due to their negligible contribution to the model.

The software package WesVarPC® was used to fitthe multivariate logistic regression model with theseven selected independent variables as well as sevenseparate univariate logistic regression models for thosevariables. Table 6.1 presents a comparison of the p-values for the Rao-Scott, univariate logistic regressionmodel, and multivariate logistic regression model tests.
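For readers without WesVarPC, a roughly analogous weighted logistic fit can be sketched with open-source tools, as below. The data and variable names are fabricated, and the standard errors from such a fit do not reflect the complex SASS design the way WesVarPC's replication-based variances do.

# Sketch of a survey-weighted logistic regression of school response
# status on a few design variables. Data and variable names are
# hypothetical; standard errors here ignore the complex sample design,
# unlike the replication-based variances produced by WesVarPC.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "responded":    rng.integers(0, 2, n),    # 1 = interview obtained
    "school_size":  rng.choice(["1-149", "150-499", "500-749", "750+"], n),
    "school_level": rng.choice(["elementary", "secondary", "combined"], n),
    "school_type":  rng.choice(["regular", "non-regular"], n),
    "basic_weight": rng.integers(10, 300, n), # rounded inverse selection prob.
})

model = smf.glm(
    "responded ~ C(school_size) + C(school_level) + C(school_type)",
    data=df,
    family=sm.families.Binomial(),
    freq_weights=df["basic_weight"],          # weights treated as case weights
).fit()

print(model.summary())
print("Odds ratios:\n", np.exp(model.params))  # e.g., size 1-149 vs. reference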

It is noted that the p-values for the Rao-Scott test (Rao and Scott, 1984, or RS3 in WesVarPC output) and the univariate logistic regression are quite close. The only notable difference between these two tests is for the variables source and school type, but their p-values are still comparable. If we test the hypotheses at the 0.01 level, both tests reach the same conclusion of significance.

However, the multivariate logistic model test resultsare very different from Rao-Scott test results and theunivariate logistic model test results, especially forvariables minority enrollment, urbanicity, region, andschool type. Variables minority enrollment, urbanicity,and region are highly significant in the Rao-Scott testsand the univariate logistic regression model, but theyare not significant at all, with high p-values of 0.3936,


0.1016 and 0.1115, respectively, in the multivariate logistic regression model when we adjust for other variables simultaneously. This happens because there exists an antagonism effect among these variables. We must take this antagonism effect into account when we interpret the effects of minority enrollment, urbanicity, and region. The significant effects of these three variables shown in the univariate analysis are simply caused by imbalance of the other significant variables across these three variables. On the other hand, there exists a synergism effect for the variable school type. In the multivariate logistic regression model, school type is significant (at the 0.01 level) with a p-value of 0.0047, but it is not significant in the univariate logistic regression model or by the Rao-Scott test, with p-values of 0.0397 and 0.0719, respectively. That means that some information about school nonresponse provided by school type is covered by the noise of other factors. We must retrieve that part of the information by adjusting for other factors simultaneously through a multivariate model.

For the other three variables, school size, school level, and source, there appears to be neither an antagonism effect nor a synergism effect: the univariate results are almost identical to the multivariate results for those variables. The four-level variable school size is the most significant variable for explaining the variation in school nonresponse.

The entropy for the multivariate model is 2.13%. As pointed out by the WesVarPC® documentation, this entropy may not be an appropriate measure of the strength of the association.

Table 6.2 presents the parameter estimates, standard errors, odds ratios, and p-values of the tests for the dummy variables that represent the independent variables in the multivariate logistic regression model. The parameter estimates and odds ratios describe the nature of the association between school nonresponse and the selected independent factors.

We found that the response rate of a rural/small town school is only marginally significantly higher than that of a central city school, with an odds ratio of 1.377, although the overall factor urbanicity is not significant (p-value 0.1016). School level, school size, and school type are all significant factors for school nonresponse. A combined school and an elementary school are both less likely to respond to the survey than a secondary school, and a smaller school is more likely to respond than a larger school; the odds ratio comparing a school with enrollment between 1 and 149 students to a school with an enrollment of 750 or more students is 2.316. A non-regular school is less likely to respond than a regular school, with an odds ratio of about one-half. However, minority enrollment, region, and source have no significant effect on school nonresponse.
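The odds ratios in Table 6.2 are simply the exponentials of the fitted logistic regression coefficients. The short sketch below (an illustration added here, not part of the original analysis) verifies a few of the reported values in Python.

import math

# Selected parameter estimates (log-odds) from Table 6.2.
estimates = {
    "Rural/small town vs Central City": 0.32,
    "1 to 149 vs 750 or more": 0.84,
    "Non-regular vs Regular": -0.68,
}

# The odds ratio for a contrast is exp(coefficient).
for contrast, beta in estimates.items():
    print(f"{contrast}: odds ratio = {math.exp(beta):.3f}")

# Prints roughly 1.377, 2.316, and 0.507, matching Table 6.2.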

We also fit a reduced model, eliminating all dummy variables that were not significant at the 0.1 level in the full model. The results of the reduced model for school level, school size, school type, and source are almost identical to those of the full model presented in Table 6.2. In the reduced model, however, urbanicity is highly significant with a p-value of 0.0063, but this p-value is for testing the difference in school response rates between rural/small town schools and all other schools. Similarly, the test comparing the Midwest with the other three regions is barely significant at the 0.05 level.

In summary, we find that school size, school level, and school type are the only three factors that have a significant effect on school nonresponse. Neither the three-level urbanicity variable nor the four-level region variable has a significant overall effect, but a rural/small town school has a significantly higher probability of responding than an urban fringe/large town or a central city school, and the Midwest has a significantly higher response rate than the other regions. Minority enrollment, which is highly significant in the univariate model, is not significant at all in the multivariate model. The sampling frame source "CCD update" is somewhat better than the other three sources (close to significant), but the other three sources are not significantly different from one another.

7. Conclusions

Results of assessing the differences in known characteristics of respondents and nonrespondents for different subgroups of the sampled populations indicated that patterns of nonresponse among characteristics such as region, urbanicity, school level, and school size persisted from the 1990-91 survey round to the current round. For example, response rates for rural/small town public schools were the highest and response rates for central city schools were the lowest in both 1990-91 and 1993-94. In addition, a set of characteristics, including some of those mentioned above, whether a school submitted a teacher list, and minority enrollment in a school, showed significant differences between respondents and nonrespondents.

One of the more striking results of our analysis pertains to the examination of whether response patterns in a survey component are hierarchically associated with response patterns of linked components. Our analysis showed that response rates were higher among linked responding units than among linked nonresponding units. For example, response rates for LEAs were higher for those LEAs linked with responding schools than for those linked with nonresponding schools.


Response rate components (e.g., out-of-scope rates, refusals, non-locatables, etc.) were examined in an attempt to provide tools to monitor the quality of the SASS frame and the corresponding 1993-94 SASS survey statistics. Out-of-scope rates for the public components were lower than their private counterparts. In addition, out-of-scope rates for the private school library component, and for both the public and private school librarian components, were quite high. Cooperation rates for components with high non-locatable rates, such as the teacher component, were calculated and tested. In most cases, for the teacher survey, the significance results were different for cooperation rates than for response rates, indicating that the unable-to-contact cases had a confounding effect on the results of these significance tests.

Finally, the results of fitting a multivariate logistic regression nonresponse model for the public school component were compared to the univariate-level significance results. School size, school level, and school type were the three factors shown to jointly have a significant effect on school nonresponse. These results show that the effects of some variables on the response status can be explained by other variables; hence a reduced model is preferable. The model results can be used to adjust weights for nonresponse.

References

Abramson, R., Cole, C., Jackson, B., Panner, R., and Kaufman, S. (1996), "1993-94 Schools and Staffing Survey: Sample Design and Estimation," technical report, National Center for Education Statistics, Washington, DC.

Jabine, Thomas B. (1994), "Quality Profile for SASS," NCES 94-340, National Center for Education Statistics, Washington, DC.

Rao, J. and Scott, A. (1984), "On chi-squared tests for multiway contingency tables with cell proportions estimated from survey data," The Annals of Statistics, 12, 46-60.

Scheuren, F., Monaco, D., Zhang, F., Ikosi, G., Chang, M., and Gruber, K. (1996), "An Exploratory Analysis of Response Rates in the 1990-91 Schools and Staffing Survey," forthcoming report, National Center for Education Statistics, Washington, DC.

Westat, Inc. (1996), "A User's Guide to WesVarPC," Westat, Inc., 1650 Research Boulevard, Rockville, MD 20850.

Table 3.1 - Public component response rate ranks: Schools and Staffing Survey 1993-94, Public Administrator, School, Teacher, Library, Librarian, and Student Components.

Variable                            Administrator  School  Teacher  Library  Librarian  Student
Minority Enrollment (test result)         S          S        S        S         S        NS
  Less than 5.5%                          1          1        1        2         3         1
  5.5 - 20.5%                             2          2        2        1         1         4
  20.5 - 50.5%                            3          3        3        3         2         2
  Greater than 50.5%                      4          4        4        4         4         3
Region (test result)                      S          S        S        NS        NS        S
  Midwest                                 1          1        1        2         2         1
  Northeast                               3          4        4        3         1         3
  South                                   2          2        2        1         3         2
  West                                    4          3        3        4         4         4
School Level (test result)                NS         S        NS       S         S        NS
  Elementary                              3          2        2        2         2         2
  Secondary                               2          1        1        1         1         3
  Combined                                1          3        3        3         3         1
School Size (test result)                 S          S        S        S         NS       NS
  1 to 149                                1          1        1        4         4         1
  150 to 499                              2          2        2        3         2         3
  500 to 749                              3          3        3        1         3         2
  750 or more                             4          4        4        2         1         4
Urbanicity (test result)                  S          S        S        S         S        NS
  Rural/small town                        1          1        1        1         1         2
  Urban fringe/large town                 2          2        2        2         2         3
  Central City                            3          3        3        3         3         1

"S" indicates a significant association between respondents and nonrespondents for the different levels of the variable.
"NS" indicates that there is not a significant association between respondents and nonrespondents for the different levels of the variable.


Table 3.2 - Private component response rate ranks: Schools and Staffing Survey 1993-94, Private Administrator, School, Teacher, Library, Librarian, and Student Components.

Variable                            Administrator  School  Teacher  Library  Librarian  Student
Region (test result)                      S          S        S        S         S        NS
  Midwest                                 1          1        1        1         1         2
  Northeast                               3          3        3        4         4         3
  South                                   4          2        2        2         3         4
  West                                    2          4        4        3         2         1
School Level (test result)                S          S        S        S         S        NS
  Elementary                              2          1        2        2         2         1
  Secondary                               1          1        1        1         1         2
  Combined                                3          3        3        3         3         3
School Size (test result)                 S          S        S        S         S        NS
  1 to 149                                4          4        4        4         4         3
  150 to 499                              3          1        2        3         3         2
  500 to 749                              2          3        1        2         1         4
  750 or more                             1          2        3        1         2         1
Urbanicity (test result)                  NS         NS       S        S         NS       NS
  Rural/small town                        3          1        1        1         3         2
  Urban fringe/large town                 2          2        2        2         2         3
  Central City                            1          3        3        3         1         1

"S" indicates a significant association between respondents and nonrespondents for the different levels of the variable.
"NS" indicates that there is not a significant association between respondents and nonrespondents for the different levels of the variable.

Table 6.1 - P-values for Rao-Scott, univariate logistic regression model, and multivariate logistic regression model tests (Public School Component).

Variable               Rao-Scott (RS3)   Univariate Model   Multivariate Model
Urbanicity                  0.0001             0.0001              0.1016
Region                      0.0030             0.0109              0.1115
Minority Enrollment         0.0002             0.0002              0.3936
Source                      0.0175             0.0605              0.0746
School Level                0.0100             0.0119              0.0116
School Size                 0.0000             0.0001              0.0001
School Type                 0.0719             0.0397              0.0047

Table 6.2 - Parameter Estimate, Odds Ratio and P-value: Public School.

Pairwise Comparison                           Parameter   Standard   Odds     P-value
                                               Estimate     Error    Ratio
Urbanicity
  Rural/small town vs Central City               0.32       0.154    1.377    0.0410
  Urban fringe/large town vs Central City        0.07       0.159    1.073    0.6600
Region
  Midwest vs West                                0.29       0.162    1.336    0.0844
  Northeast vs West                             -0.12       0.179    0.887    0.4928
  South vs West                                  0.18       0.136    1.197    0.2019
Minority Enrollment
  Less than 5.5% vs Greater than 20.5%           0.16       0.136    1.174    0.2458
  5.5-20.5% vs Greater than 20.5%                0.11       0.119    1.116    0.3615
Source
  CCD update vs others                           0.46       0.254    1.584    0.0746
School Level
  Combined vs Secondary                         -0.36       0.156    0.698    0.0268
  Elementary vs Secondary                       -0.23       0.089    0.795    0.0140
School Size
  1 to 149 vs 750 or more                        0.84       0.156    2.316    0.0000
  150 to 499 vs 750 or more                      0.40       0.120    1.492    0.0015
  500 to 749 vs 750 or more                      0.30       0.152    1.350    0.0543
School Type
  Non-regular vs Regular                        -0.68       0.229    0.507    0.0047


AN OVERVIEW OF NCES SURVEYS REINTERVIEW PROGRAMS

Valerie Conley, Steven Fink, Mehrdad Saba, Synectics for Management Decisions, Inc.; Steven Kaufman, National Center for Education Statistics

Valerie Conley, Synectics for Management Decisions, Inc., 3030 Clarendon Blvd #305, Arlington VA 22201

KEY WORDS: Reinterview Programs, Simple Response Variance, Gross Difference Rate, Index of Inconsistency, Response Bias

1. Introduction

The National Center for Education Statistics (NCES) conducts a variety of programs to assess the quality of the data it collects in its surveys. Traditionally, the emphasis has been on estimating nonsampling error components in a survey model: reinterview programs and validity evaluations are part of the overall survey design for most of its complex sample surveys. This paper highlights the NCES application of the reinterview and also serves as an overview of the techniques and methods used in NCES data quality assessment to quantify measurement error.

2. Programs Purpose and Background

A reinterview -- replicated measurement on the same unit -- is a new interview which repeats all or a subset of the original interview questions. At the 1991 American Statistical Association (ASA) meeting, the National Agricultural Statistics Service (NASS) presented a paper which traced the history of reinterview studies at NASS. The authors concluded that "An important product of reinterview surveys has been the identification of reasons for reporting errors. These include definitional problems, misinterpretation of questions and survey concepts, and simple reporting errors. Such cognitive information obtained from reinterviews has been valuable in survey instrument development, training, and the interpretation of survey results" (Hanuschak et al., 1991).

The purpose of NCES reinterview programs is twofold: 1) to determine how good questions are with respect to measurement of response error, and 2) to assess the quality of the data collected. The extent of the research effort varies across surveys, from small-sample reinterview programs conducted as part of a survey questionnaire field test to larger samples that range between 1 and 11 percent of a full-scale study. These programs have been used for three major purposes:


- Identifying specific questions that may be problematic for respondents and result in high variability
- Quantifying the magnitude of the measurement error
- Providing feedback on the design of questionnaire items for future surveys

Specifically, the purpose of the reinterview is to gain insight into the adequacy of questions. This gain can be achieved analytically by measuring two components of survey response -- response variance and response bias. These two measures are explained in more detail in section 5.

Another common purpose of reinterview programs is to verify that the original interviews were genuine. NCES often uses a combination of mail and computer-assisted telephone interview (CATI) for its surveys, and most of the NCES reinterview programs were done using CATI in a centralized setting. Since the CATI interviews were closely monitored, it is highly unlikely that a telephone interviewer could invent or falsify interviews. Therefore, this aspect of reinterview is not typically a focus of NCES reinterview programs.

3. Surveys and Reinterview Design

Several NCES surveys have conducted reinterview programs for more than one round or cycle of the survey, specifically Baccalaureate and Beyond (B&B), Beginning Postsecondary Students (BPS), the National Household Education Survey (NHES), the Schools and Staffing Survey (SASS), and the Teacher Follow-up Survey (TFS). Most of the programs do not include the same items on subsequent rounds of the reinterview, however. The BPS reinterview programs, for example, are designed "to build on previous analyses by targeting revised or new items not previously evaluated" (Burkheimer et al. 1992). Most of the NCES reinterview programs were developed to estimate response variance, but some, such as the Adult Education component of NHES:95, included a response bias study as well. However, the surveys that involved testing, such as the National Adult Literacy Survey (NALS), the National Assessment of Educational Progress (NAEP), and the National Educational Longitudinal Study (NELS), never considered retesting or reinterviewing because NCES considered such activities too much of a burden on the respondents.

The following issues are considered in NCES reinterview designs:

Time Lag

Time lag between the original and the reinterview for most NCES surveys is usually stated as a range of days or weeks following the original survey, such as "the reinterviews were conducted in October and November, about 4 to 6 weeks after the original interview" (Brick, Cahalan et al., 1994, p. 3-3). Early Childhood Education (ECE) reinterviews for NHES were designed for 14 days after the completion of the original ECE interview, but they were actually completed between 14 and 20 days after the original interview. BPS reinterviews were conducted up to 8 weeks after completing the original interview, and reinterviews in NPSAS were conducted between one and three months after the original interview.

Reinterview Instrument

The reinterview instrument is a subset of the original questionnaire, but the question wording is almost always identical between the original and the reinterview instrument. In some cases, however, adjustments were made to the question wording in an attempt to obtain more reliable data, as in the National Survey of Postsecondary Faculty (NSOPF). This is most often the case if the reinterview is conducted as part of the field test and not as part of the full-scale study. Other examples are the National Postsecondary Student Aid Study (NPSAS) and B&B.

Mode

The mode of reinterviews is usually telephone regardless of the original interview mode. Conducting all the reinterviews by telephone violates survey error model assumptions that require the reinterview to be an independent replication of the original interview in order to estimate response variance accurately. Therefore, SASS included research in its 1991 reinterview program to determine the impact a mode change might have on data quality. Most of the mail respondents were reinterviewed by mail and the telephone follow-up cases were reinterviewed by telephone. Generally, reinterviews conducted by mail showed lower response variance than the telephone reinterviews (Royce, 1994).

4. Sample Size and Response Rates

The design of NCES reinterview programs typically includes a target number or percentage of completed reinterviews. The reinterview sample size for RCG:91, for example, was 583, with a goal of 500 completed reinterviews. SASS reinterviews 10 percent of the School and Administrator samples and one percent of the Teacher sample, giving a reinterview sample of just over 1,000 for each of its components. The reinterview sample sizes are considerably smaller when the reinterview program is conducted as part of the field test; the reinterview sample size for the NPSAS 1992-93 field test, for example, was 237.

NCES reinterview response rates vary from a low of 51 percent to a high of 94 percent, with most reinterview programs having response rates in the mid-eighties to low nineties.

5. Measurement Error Estimation

Reinterview programs at NCES tend to measure response variance and response bias using simple measures of consistency. Response variance is a component of measurement error which examines how consistently respondents answer questions in a survey. Response bias, on the other hand, measures systematic nonsampling errors. In order to estimate response variance and response bias, it is necessary to define a general measurement error model:

Let Y_t1 = 1 if "Yes" is recorded for unit t in the original interview, and Y_t1 = 0 otherwise.

Let Y_t2 = 1 if "Yes" is recorded for unit t in the reinterview, and Y_t2 = 0 otherwise.

For unit t (t = 1, 2, ..., n) and the ith measurement (i = 1, 2), the assumed model is

    Y_ti = X_t + B_ti + e_ti

where Y_ti is a Bernoulli random variable, X_t is the "true" value of unit t, assumed unchanged between measurements, B_ti is the response bias, and e_ti is a random measurement error.


To illustrate the model, consider the crossclassification of two measurements of an individual population characteristic (for example, whether a person who participates is a college graduate) obtained from an original interview and a reinterview of the same sample of individuals. Table 1 shows the crossclassification (Brick, Cahalan et al., 1994):

Table 1: Two-measurement crossclassification

Reinterview                  Original Interview
Response             Yes         No          Total
Yes                  a           b           a + b
No                   c           d           c + d
Total                a + c       b + d       n = a + b + c + d
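As an added illustration (not part of the original paper), the cell counts a, b, c, and d can be tabulated directly from paired original and reinterview responses. The Python sketch below assumes two hypothetical lists of 0/1 answers and uses the pandas library.

import pandas as pd

# Hypothetical 0/1 responses for the same units (1 = "Yes", 0 = "No").
original    = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]
reinterview = [1, 0, 0, 1, 1, 0, 1, 0, 1, 0]

# Cross-classify the two measurements, as in Table 1.
table = pd.crosstab(pd.Series(reinterview, name="Reinterview"),
                    pd.Series(original, name="Original"), margins=True)
print(table)

# Cell counts used in the GDR, IOI, and NDR formulas below.
a = table.loc[1, 1]   # "Yes" in both interviews
b = table.loc[1, 0]   # "Yes" in the reinterview only
c = table.loc[0, 1]   # "Yes" in the original interview only
d = table.loc[0, 0]   # "No" in both interviews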

In their simplest form, reinterview results are analyzed using measures derived from this crossclassification. These measures include the overall deviation between the interview and the reinterview, deviations on individual responses, and the index of crude agreement. The three specific measures most commonly used by NCES are:

Gross Difference Rate (GDR)
- Measures the weighted percentage of cases reported differently in the original interview and the reinterview as (b + c) / 2n

Index of Inconsistency (IOI)
- Estimates the proportion of total survey variance due to simple response variance as n(b + c) / [2(a + c)(b + d)]
- Assumes simple random sampling with replacement

Net Difference Rate (NDR)
- Computed after reconciliation for each answer category of a question
- Weighted difference of the false positive and false negative rates, calculated as (c - b) / n

Typically, the first two measures, GDR and IOI, are used to estimate simple response variance. The third measure, NDR, is used to estimate response bias, which most measurement models at NCES assume to be constant in the repeated measurement. NDR is also used to test the independence of the reinterview.
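A minimal Python sketch of these three measures, added here as an unweighted illustration that follows the formulas exactly as given above, might look like this (the cell counts a, b, c, and d are hypothetical):

def gross_difference_rate(a, b, c, d):
    # Cases reported differently in the two interviews, per the GDR formula above.
    n = a + b + c + d
    return 100.0 * (b + c) / (2 * n)

def index_of_inconsistency(a, b, c, d):
    # Proportion of total survey variance due to simple response variance.
    n = a + b + c + d
    return 100.0 * n * (b + c) / (2 * (a + c) * (b + d))

def net_difference_rate(b, c, n):
    # Weighted difference of the false positive and false negative rates.
    return 100.0 * (c - b) / n

# Example with hypothetical cell counts from a 2 x 2 crossclassification.
a, b, c, d = 50, 8, 12, 30
print(gross_difference_rate(a, b, c, d))
print(index_of_inconsistency(a, b, c, d))
print(net_difference_rate(b, c, a + b + c + d))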


6. NCES Reinterview Results

This section will present a summary of selected results from reinterview programs conducted by NCES. Only results from reinterview programs conducted as part of full-scale studies are provided here. These studies include NHES, RCG, and SASS.

National Household Education Survey (NHES)

The National Household Education Survey (NHES) is designed to collect education data from U.S. households through telephone interviews, using random digit dialing (RDD) and computer-assisted telephone interviewing (CATI) procedures. The sample is drawn from the noninstitutionalized civilian population in households having a telephone in the 50 states and the District of Columbia.

NCES has conducted four comprehensive reinterview programs for the full-scale NHES surveys. The reinterview program for NHES:91 was administered only on the early childhood (ECE) component. In NHES:93 both components underwent reinterviews, while only the adult education component was reinterviewed for NHES:95.

All NHES reinterview programs have used the gross difference rate (GDR) and net difference rate (NDR), and all except NHES:95 used the index of inconsistency (IOI), as measures of response variability and response bias for critical items in the surveys.

The NHES:91 reinterview results suggested that the ECE interview measured some variables with relative success, but they also revealed some items that needed to be handled carefully when tabulating findings and for which alternative methods of collection should be considered (Brick et al., 1991).

The early childhood component of the NHES:91 reinterview program included questions on current enrollment (whether the child was attending school and, if so, what grade) and home environment (reading and television habits). All seven of the enrollment items had low GDRs and IOIs. Of the four home environment variables, two are worth noting:

P19/E36: How often do you or other family members read stories to (child)?

P22/E40: How many hours each day does (child) watch television or videotapes?

Brick et al. (1991) suggested the relatively large IOI (42.0) for the television question might be due to the "general ambiguity in the item, the crude measurement scale (whole hours) relative to the internal variability in the item, and differing circumstances" (p. D-20). The reading question also had a relatively large GDR (23.3) and IOI (33.5).

The two topical components of NHES:93 were: 1) the School Readiness (SR) interview of parents of children (ages 3-7 and 8-10) enrolled in second grade or below and 2) the School Safety and Discipline (SS&D) interview of parents of students enrolled in grades 3-12 and youths enrolled in grades 6 through 12. The subset of original SR and SS&D questionnaire items was selected because the items were substantively important, not highly time dependent, and not examined in the NHES:91 reinterview. The reinterview sample sizes were substantially increased from the 604 of the NHES:91 reinterview program to 977 for SR and 1,131 for SS&D in order to obtain more reliable estimates of the response variance for key questions. The reinterview did not reveal any items with response problems severe enough to cause researchers to question analyses based on the item.

The NHES:95 reinterview program examined and estimated measurement errors as components of nonsampling error in the Adult Education (AE) survey. A subset of items from the original interview was selected, and the original and reinterview responses were then compared to estimate the consistency of reporting. Interviews were sampled at different rates for participants and nonparticipants (i.e., people who did not participate in adult education activities), with a total of 1,289 cases selected for reinterview.

The GDRs for the NHES:95 reinterview program were low for the adult education participation and education background items, indicating that responses to those questions were consistent. The GDRs for barrier-to-participation items (such as obstacles that prevented respondents from participating in adult education activities) were much higher than for the other subject areas, indicating that responses were not consistent. Only four (out of 15) barrier items had GDRs of less than 10 percent, and the highest GDR approached 50 percent. This inconsistency may have been related to factors like recoding the questions, additional eligibility criteria, and small sample sizes. Nonetheless, barrier items had some response problems and did not appear to be reliable.

NCES also conducted a separate study to measure bias for NHES:95. The methodology used for this survey appeared to have potential for detecting biases; however, this method -- intensive interviews -- was not as successful as the standard NHES reinterviews for estimating consistency of reporting.

Recent College Graduates (RCG)

The 1991 Survey of Recent College Graduates (RCG:91) provides data on the occupational and educational outcomes of bachelor's degree and master's degree recipients one year after graduation. Telephone interviews were conducted between July 1991 and December 1991 using computer-assisted telephone interviewing (CATI).

Two measurement error models were estimated from the reinterview data. The first model (the simple response variance model) assumed the errors were all from random sources. This model was then expanded to allow for systematic errors or biases. Both models assumed that the interviewers were not a source of systematic error in the data collection process, but the first assumed that the measurement errors were the same across sampled graduates and from one trial to the next. Thus, if the reinterview was uncorrelated with the original interview, then the number of original and reinterview errors should be roughly equal.

Of the 16 reinterview items in these categories, only two had an IOI greater than fifty percent. One item was related to employment experience, while the other was a question dealing specifically with teacher certification and employment:

Q24: Were you looking for work during the week of April 22, 1991?

Q62: Prior to completing the requirements for your 1989-90 degree, were you at any time employed as a school teacher at any grade level, from prekindergarten through grade 12? Please exclude student or practice teaching and work as a teacher's aide.

Question 24 was asked only of a subset of the sample of graduates -- those who were unemployed. The reduced sample size may have contributed to a larger GDR and IOI. There were also potential recall problems, since the question referred to a specific period of time. For question 62, no explanation was offered for the possibly high random measurement error.

The overall conclusions of Brick, Cahalan et al. were that even though measurement errors were an important source of error in RCG:91, the estimates from the survey were not greatly distorted by these errors. The relatively small GDRs indicated that responses were consistent; however, the generally moderate IOIs implied that improvements in questionnaire wording and construction might help to reduce measurement errors in future surveys.

Schools and Staffing Survey (SASS)

The Schools and Staffing Survey (SASS) is a periodic, integrated system of surveys designed to collect data on characteristics of public and private school teachers, administrators, and their workplaces. The first two rounds of SASS (1987-88 and 1990-91) included the School Survey, the School Administrator Survey, the Teacher Demand and Shortage Survey (TDS), the Teacher Survey, and, one year later each time, the Teacher Follow-up Survey. SASS includes reinterview programs as part of its survey design, although it has used other methodologies for measuring response variance and response bias as well, including validation studies, such as the Teacher Listing Validation Study (TLVS), and follow-up cognitive research.

The following part of this paper summarizes overall results and a comparison of the 1987-88 and 1990-91 reinterviews. There was no difference in response variance between public and private administrators, schools, or teachers.

Thirty-nine percent of the 1990-91 SASS reinterview questions showed low response variance. This was significantly better than the 11 percent of reinterview questions for SASS 1987-88 with low response variance (see table 2). Moreover, there was a 23 percentage point difference between the proportions of 1990-91 and 1987-88 SASS items with high response variance (26 percent versus 49 percent) (Royce, 1994).

It is important to note that the results for 1987-88 and 1990-91 are not strictly comparable, since different sets of questions were used for the two reinterviews. Among the 15 factual questions common to both years, 11 showed significant revisions in 1991. Four of these items displayed reduced response variance, which indicates improvement in these questions (Bushery, Royce and Kasprzyk, 1992, p. 459).

Table 2. Summary of 1987-88 and 1990-91 SASS reinterview response variance results*

                              Low               Moderate              High
                        Number  Percent    Number  Percent     Number  Percent
All three components:
  1988                     4      11%        14      40%         17      49%
  1991                    43      39%        38      35%         28      26%
Administrator Survey:
  1988                     1      11%         4      44%          4      44%
  1991                     5      20%        10      40%         10      40%
Teacher Survey:
  1988                     3      25%         4      33%          5      42%
  1991                    21      44%        16      33%         11      23%
School Survey:
  1988                     0       0%         6      43%          8      57%
  1991                    17      47%        12      33%          7      19%

*Questions for which the index could be reliably estimated.

SOURCE: Royce, D. (1994), 1991 Schools and Staffing Survey (SASS) Reinterview Response Variance Report (Working Paper 94-03), U.S. Department of Education, Office of Educational Research and Improvement, National Center for Education Statistics.



7. Conclusion

This study is part of the adjudicated report "NCES Measurement Error Programs," which is scheduled for publication in the spring of 1997. That report synthesizes results from a sample of NCES reinterview programs, validity evaluations, and cognitive research studies.

Results indicate that the measurement error programs have helped NCES to improve the quality of its data. Over different rounds of the surveys, the reinterview sample sizes have increased and response variance in most of the surveys has improved in several areas. In some surveys it was found that inconsistencies between responses were attributable to factors like recoding the questions, lack of knowledge about the questions, eligibility criteria, and small sample sizes. More recently, NCES has applied alternative methods, separate from the reinterview programs, to measure response bias. These methods were effective but costly, and it was suggested that they be used when there is an indication of reporting errors.

Although overall the studies indicated that questions had low to moderate GDRs and low to moderate IOIs in several different NCES surveys, improvements in questionnaire wording and construction might help to reduce measurement errors in future surveys.

References

Brick, J., Cahalan, M., Gray, L., Severynse, J., and Stowe, P. 1994. A Study of Selected Nonsampling Errors in the 1991 Survey of Recent College Graduates. (NCES 95-640). Washington, DC: U.S. Department of Education, Office of Educational Research and Improvement, National Center for Education Statistics.

Brick, J., Collins, M., Celebuski, C., Nolin, M., Squadere, T., Ha, P., Wernimont, J., West, J., Chandler, K., Hausken, E., and Owings, J. 1991. NHES:91 1991 National Household Education Survey Methodology Report. (NCES 92-019). Washington, DC: U.S. Department of Education, Office of Educational Research and Improvement, National Center for Education Statistics.

Brick, J., Collins, M., Nolin, M., Ha, P., Levinsohn, M., and Chandler, K. 1994a. National Household Education Survey of 1993: School Safety and Discipline Data File User's Manual. (NCES 94-218). Washington, DC: U.S. Department of Education, Office of Educational Research and Improvement, National Center for Education Statistics.

Brick, J., Collins, M., Nolin, M., Ha, P., Levinsohn, M., and Chandler, K. 1994b. National Household Education Survey of 1993: School Readiness Data File User's Manual. (NCES 94-193). Washington, DC: U.S. Department of Education, Office of Educational Research and Improvement, National Center for Education Statistics.

Burkheimer, G., Forsyth, B., Wheeless, S., Mowbray, K., Boehnlein, L., Knight, S., Veith, K., and Knepper, P. 1992. Beginning Postsecondary Students Longitudinal Study Field Test Methodology Report, BPS:90/92. (NCES 92-160). Washington, DC: U.S. Department of Education, Office of Educational Research and Improvement, National Center for Education Statistics.

Bushery, J., Royce, D., and Kasprzyk, D. 1992. The Schools and Staffing Survey: How Reinterview Measures Data Quality. Proceedings of the Section on Survey Research Methods, American Statistical Association. Alexandria, VA: ASA, 458-463.

Forsman, G., and Schreiner, I. 1991. The design and analysis of reinterview: An overview. In P. Biemer, R. Groves, L. Lyberg, N. Mathiowetz, and S. Sudman (eds.), Measurement Error in Surveys, 279-302. New York: John Wiley & Sons, Inc.

Hanuschak, G., Atkinson, D., Iwig, W., and Tolemeo, V. 1991. History of Reinterview Studies at NASS. Proceedings of the Section on Survey Research Methods, American Statistical Association.

Kasprzyk, D. 1992. The Schools and Staffing Survey: Research Issues. Proceedings of the Section on Survey Research Methods, American Statistical Association.

Royce, D. 1994. 1991 Schools and Staffing Survey (SASS) Reinterview Response Variance Report. (Working Paper 94-03). Washington, DC: U.S. Department of Education, Office of Educational Research and Improvement, National Center for Education Statistics.


ESTIMATING RESPONSE BIAS IN AN ADULT EDUCATION SURVEY

J. Michael Brick and David Morganstein, Westat, Inc.
David Morganstein, Westat, Inc., 1650 Research Boulevard, Rockville, Maryland 20850

KEY WORDS: Reinterview, telephone survey,measurement errors

1. Introduction

Estimates from surveys are subject to both variable and systematic nonsampling errors. Variable nonsampling errors, or response variance, are those that might vary across repeated surveys administered to the same sample, assuming that the conditions of the interview could be controlled so that the surveys were independent. For example, the same respondent might report annual income differently when asked in repetitions of the same survey because the method used by the respondent to estimate income might vary (records might be used, recall might be used, or the value might be estimated using different schemes). These circumstances would lead to variable errors for estimates of income.

Systematic nonsampling errors, on the other hand, are those that have a particular direction. For example, if respondents tend to omit certain types of income, say interest income from savings, then the estimated income would be expected to be lower than the true income. In repetitions of the same survey, the estimated income would always be less than the true income. These types of systematic errors are called response bias. Survey estimates can be subject to both response variance and response bias.

Measuring response bias is typically very difficult. This study examines an intensive reinterview as a particular approach to estimating response bias. Other approaches for measuring response bias, the reasons for using an intensive reinterview, and the goals of the study are presented in the next section, after describing the source of the data. Section 3 outlines the methods used to collect the intensive reinterview data. Section 4 gives the estimates of the response bias and possible explanations of the findings. The last section summarizes the highlights of the study and the applicability of the method to other surveys. A more complete analysis of this study is given in Brick et al. (1996a).

2. Study Design

The source of the data for this analysis is a special methodological study undertaken as part of the 1995 National Household Education Survey (NHES:95). The NHES is an ongoing data collection system of the National Center for Education Statistics (NCES), conducted by Westat, Inc., designed to address a wide range of education-related issues. It is a telephone survey of the noninstitutionalized civilian population of the US that has been conducted in 1991, 1993, 1995 and 1996.

In the NHES, households are selected for the survey using random digit dialing (RDD) methods and data are collected using computer-assisted telephone interviewing (CATI) procedures. Approximately 60,000 households are screened for each administration. The NHES survey for a given year typically consists of a Screener, which collects household composition and demographic data, and extended interviews on two substantive components addressing education-related topics. This study is based on the Adult Education (AE) component of the NHES:95. It was designed to estimate the percentage of adults participating in adult education activities and the characteristics of both participants and nonparticipants.

As noted above, it is often difficult to measure response bias. A frequently used method is to compare the results of the survey against answers from a more definitive source, such as an administrative record file. However, record checks have their own limitations; e.g., they can only be used if records exist on the survey topic and those records can be accessed. Brick et al. (1994) found that, even for the well-defined topic of teacher certification, records were not complete and accurate and could not be matched to the survey respondents without error.

Another way of measuring response bias is through the use of reinterviews. Reinterviews are ordinarily undertaken to measure response variance rather than response bias. However, sometimes a process called reconciliation is used in reinterviewing to measure bias. If the original and reinterview responses are different, then the respondent is asked to reconcile the differences, and the resulting response is called the reconciled response. The reconciliation is often conducted by a supervisor rather than a regular interviewer, on the assumption that this will make the reconciled response less subject to error. Under these assumptions, the difference between the original and reconciled response has been used to estimate response bias (Forsman and Schreiner 1991).

Reconciliation has been used in earlier NHES studies to estimate response bias (Brick et al. 1996b), and a reinterview was also conducted for the NHES:95 AE interviews (Brick et al. 1996c). However, there is little evidence that reinterviews, even the reconciled responses, actually measure response bias. As a result, the NHES:95 reinterview study was designed to estimate the response variance.

It should not be too surprising that this approach does not provide reliable estimates of response bias. The methods used in the reinterviews -- such as selecting interviewers from the original interviewer pool, asking the questions in much the same way as in the original interview, not informing the interviewer or the respondent of the answers from the original interview, and waiting at least 14 days between interviews so that the respondent will not remember the details of the original interview -- are all designed to support the measurement of response variance rather than response bias.

The intensive reinterview was designed to be an alternative method of estimating response bias that did not suffer from some of the shortcomings of the record check or regular reinterviews. The intensive reinterview method was pioneered by Belson (1986), who focused on difficult or sensitive topics primarily in opinion and marketing research.

The intensive reinterview differed from the regular reinterview in a number of ways. The interviewers were not selected from the regular pool of telephone interviewers, but were persons with previous experience in interviewing using less structured methods. The interviewers were trained to use a protocol and to conduct the reinterviews in a conversational mode, using probes and other devices to trigger recall and comprehension. The reinterview was focused on a few topics and ample time was allowed for discussing these few points. The respondents were encouraged to voice their opinions and understanding of the topics. Furthermore, attempts were made to engage the respondents in the interview by explicitly asking for their advice on ways to improve the interview. The hope was that these methods would lead to more complete and accurate reporting in the intensive reinterview.

Although there were four major research objectives of the study, only two of them are discussed in this paper. The first goal of the study was to examine the potential bias in the estimates of the percentage of adults who participated in adult education activities. The bias could be due to either underreporting participation or overreporting activities that took place outside of the time frame of the survey (i.e., the past 12 months prior to the original interview). Respondents might underreport participation either because they might not recall a qualifying activity during the 12 months before they were interviewed or because they might not comprehend the range of activities that were included as adult education. These types of underreporting would lead to downward bias in participation rates. However, an upward bias could occur if respondents "telescoped" some activities. Telescoping is reporting activities that took place outside of the time frame of the survey as having taken place within that time frame. As described below, underreporting was expected to be minimal in all types of adult education (the six types of activities were: ESL, adult basic education/GED preparation classes, credential programs, apprenticeships, work-related courses, and personal development courses), except work-related and personal development courses. As a result, the intensive reinterview focused on these two types of participation in order to assess the bias in the overall participation rate.

The second goal was to obtain more accurate estimates of participation in work-related and personal development courses, separately. As a result of the differences between the estimates of participation from earlier AE surveys and cognitive laboratory work, it was suspected that work-related courses and personal development courses were susceptible to underreporting. One of the major concerns for reporting these types of courses is that respondents might not comprehend the full range of activities that are included as work-related and personal development courses. These comprehension problems could combine with recall problems and result in underreporting of work-related and personal development courses.

3. Intensive Reinterview Method

In an attempt to more closely determine the respondent's actual status or opinions, the intensive reinterview was more of a directed conversation between the respondent and the interviewer than a formally scripted interview. Respondents were reminded of their answers in the original survey and asked if the answers were still true for them. They were asked to recall other details related to their responses. Interviewers were fully knowledgeable about the original answers given by the respondent. Tactics similar to those used in cognitive laboratory work, such as asking open-ended questions and using probes to encourage the respondent to elaborate on his or her answer, were used. The goal was to obtain more detailed and accurate information by understanding the respondent's perspective and the reasons for his or her answers.

The intensive reinterview was a new undertaking and presented several challenges. For example, the interviewers who conducted the NHES interviews were thoroughly trained to read the questions verbatim and to avoid affective behavior that might influence the respondent. Adopting the conversational and unstructured interviewing method called for in the intensive reinterview required major changes in their behavior. The interviewers were also called upon to implement some methods used in cognitive research, but they were not previously trained in these methods. The respondents also faced a challenge because the intensive reinterview differed significantly from the type of interview they had already done. They were called upon to give reasons for their responses and provide details rather than choose among response alternatives.

To address these challenges, a protocol and data collection methods were developed especially for this study. The full details of the protocol development, the methods used to select and train the interviewers, the sampling of respondents to the original interview, and the data collection methods are provided by Brick et al. (1996a). These issues are very important, but space limitations allow only a few important features of the final sample to be given in this paper.

Although the goal of the study was to develop estimates of bias, only a very limited sample size could be fielded. Because of the small sample sizes, it was decided that the typical design-based estimates gathered from the original interview would be subject to very large sampling errors and relationships would be obscured by these sampling errors. Thus, the results from this relatively small sample were analyzed assuming the observations were from independent, identically distributed random variables, and sampling weights were not used. The sample was randomly selected from both participants and nonparticipants who completed the AE extended interview. In order for a case to be eligible for the study, certain conditions had to be met; for example, the original interview had to be conducted in English.

A sample of 230 adults was selected to meet specific targets by participation status and educational level. Of the 206 sampled adults who completed the intensive reinterviews (90 percent), 115 were nonparticipants in the original interview and 91 were participants.

4. Findings

4.1 Bias in Participation Rate Estimates

The first goal of the study was to estimate the response bias associated with estimates of the rate of participation in AE from the NHES:95. Adults were classified as participants in AE if they had participated in one or more of six different types of adult education activities during the past 12 months. Based on the responses in the NHES:95, 40 percent of all adults had participated in one or more of these activities in the last 12 months (Kim et al. 1995).


As noted earlier, responding to the items about participation in work-related and personal development courses was identified as being problematic during the design phase of the NHES:95. After the survey was completed, the results from the reinterview confirmed that these two types of participation were much more likely to be reported inconsistently than any of the other types (Brick et al. 1996c). These results support the decision to restrict this study to an in-depth examination of reporting of work-related and personal development courses.

Adults were classified as nonparticipants in the original interview if they said they had not taken any courses in the last 12 months. Of the 115 nonparticipants who responded to the intensive reinterview, 41 percent indicated that they had taken one or more work-related or personal development courses (in the intensive reinterview, respondents were not asked about other types of courses). Since none of the participants sampled for the intensive reinterview denied having taken courses, the response bias in the overall participation rate is one-directional and substantial.

Assuming the responding nonparticipants in the Bias Study are a simple random sample of all adults classified as nonparticipants in the NHES:95 (the analysis is thus unweighted), the bias in the NHES:95 estimate is 24 percent. The bias is estimated by multiplying the percent of all adults who were nonparticipants as reported in the NHES:95 by the percent of the nonparticipants who reported participating in the intensive reinterview. In general, the estimated bias is

    b(p0) = p0 * q^p - (100 - p0) * q^np     (1)

where p0 is the estimate of the percentage of adults classified as participants in the initial interview, q^p is the estimate of the proportion of participants in the initial interview who reported not participating in the intensive reinterview, and q^np is the estimate of the proportion of nonparticipants in the initial interview who reported participating in the intensive reinterview. In this case, the term involving q^p is zero (q^p = 0), because no initial participants said they had not taken any courses during the intensive reinterview. If the bias in the estimated percentage of adults who participated in AE is 24 percent, then the bias-corrected estimate is that 64 percent of adults participated in AE in 1995. This is substantially larger than the 40 percent reported in the NHES:95. Both the bias and the percentage participating from the NHES:95 are subject to sampling error, and because of the sample size the sampling error of the bias is very large relative to that for the estimate from the NHES:95. Taking advantage of the fact that q^p = 0, the estimated bias can be written as

    b(p0) = (p0 - 100) * q^np

Thus, the estimated bias is a product of random variables, and the approximate variance for a product of independent random variables is (Hansen et al. 1953)

    Var(b(p0)) = (q^np)^2 * Var(p0) + (p0 - 100)^2 * Var(q^np)     (2)

Substituting the estimated values into (2) and taking the square root, the standard error of the estimated bias of 24 percent is 2.7 percent. Thus, a 95 percent confidence interval for the estimated bias is from 19 percent to 29 percent, and for the percent of adults participating, the confidence interval is from 59 to 69 percent.
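As an added illustration (not part of the original paper), the bias and its standard error can be reproduced approximately from the figures reported above. The sketch below assumes q^np = 0.41 based on the 115 responding nonparticipants and treats the sampling variance of the NHES:95 estimate p0 as negligible relative to that of q^np, since its value is not reported here; small differences from the published figures reflect rounding of these inputs.

import math

p0 = 40.0      # NHES:95 estimate of the AE participation rate (percent)
n_np = 115     # responding nonparticipants in the intensive reinterview
q_np = 0.41    # proportion of them who reported courses in the reinterview

# Equation (1) with q^p = 0 reduces to b = (p0 - 100) * q^np.
bias = (p0 - 100.0) * q_np            # about -24.6 percent (underreporting)

# Equation (2), ignoring Var(p0): Var(b) = (p0 - 100)^2 * Var(q^np).
var_q = q_np * (1.0 - q_np) / n_np    # binomial variance of the proportion
se_bias = abs(p0 - 100.0) * math.sqrt(var_q)   # about 2.7 percent

corrected = p0 - bias                 # bias-corrected participation rate
print(f"bias {bias:.1f}, s.e. {se_bias:.1f}, corrected rate {corrected:.1f}")
# The paper reports a bias of 24 percent, a standard error of 2.7 percent,
# and a corrected rate of 64 percent (95 percent CI roughly 59 to 69).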

Treating the nonparticipating respondents to the intensive reinterview as a simple random sample of all nonparticipants in the original survey is a key assumption in estimating the response bias. Usually, the sampling procedures would ensure that this assumption holds, but the sampling methods described earlier were primarily concerned with making sure the sample sizes for specific groups were large enough to provide some nonparticipants with various characteristics. In addition, the small sample size does not allow for broad generalizations. Thus, these estimates are exploratory and should not be used to make bias corrections to the NHES:95 estimates.

To evaluate the reasonableness of this assumption, the characteristics of the responding nonparticipants from the intensive reinterview were compared to the characteristics of all nonparticipants from the NHES:95. While the age and sex distributions are similar, the educational attainment distributions are different, with a much larger percentage of intensive reinterview nonparticipants having less than a high school education. This is a consequence of the sampling methods used for the intensive reinterview. This difference highlights the fact that the point estimates and confidence intervals from the study are subject to specification errors that cannot be measured. Despite this shortcoming, the findings clearly show that a relatively large fraction of the adults classified as nonparticipants in the original survey did identify AE activities in the intensive reinterview.

In addition to the nonparticipants, participants in the original survey who were sampled were asked, using the same types of probes described above, if there were any courses they had not reported in the initial interview. All of the work-related and personal development courses participants reported in the NHES:95 were verified as being within the eligible 12-month time period. In addition, about one-third of the sampled participants identified additional courses that were not reported in the original interview.

The reporting of additional work-related and personal development courses by adults classified as participants in the original survey is a further indication that the respondents may have had a more restrictive understanding of the scope of activities than was intended. Drawing on the work of Schwarz (1995), one interpretation of this finding is that respondents might have reacted to the context of the original interview in determining what was an eligible activity. The NHES:95 interview began by asking about more formal types of participation, and some respondents may have created a response paradigm before the questions about the less formal activities were asked. In the intensive reinterview the context was different because the only types of courses discussed were work-related and personal development courses.

4.2 Bias in Work-Related and Personal Development Participation Estimates

Overall, about half the adults who named additional courses reported work-related courses and half reported personal development courses. Participants were more likely to add personal development courses and nonparticipants were more likely to add work-related courses, but these differences are not statistically significant.

Based on the NHES:95 responses, 21 percent of adults were estimated to have participated in work-related courses during the previous 12 months and 20 percent were estimated to have participated in personal development courses (Kim et al. 1995). The extent of the bias in these estimates can be estimated using equation (1). The bias for the work-related participation rate is

    b(p0,wr) = (p0,wr - 100) * q^np_wr     (3)

where p0,wr is the estimate of the percentage of adults classified as work-related participants in the initial interview, and q^np_wr is the estimate of the proportion of adults who did not report participating in work-related activities in the initial interview but reported participating in the intensive reinterview. Because we are now dealing with participation in a particular type of adult education, the value of q^np_wr has two components: those classified as nonparticipants who reported taking work-related courses in the intensive reinterview, and participants in the initial survey who reported taking work-related courses for the first time in the intensive reinterview. In the intensive reinterview, 23 percent of the nonparticipants reported taking work-related courses and 8 percent of the participants reported taking work-related courses for the first time.

Substituting the values into (3), the estimated bias for the percent of adults participating in work-related courses is 16 percent. The standard error can be computed using (2), where q^np_wr is treated as a sum of the two components described above. Using this approach, the standard error of the estimated bias is 3 percent and the 95 percent confidence interval for the estimated bias is from 10 to 22 percent.

The same calculations can be performed for personal development courses to compute the estimated bias and its standard error. The estimated biases are summarized in Table 1 below.

These estimates show that the underreporting bias is approximately the same for both work-related and personal development courses. While these estimates are subject to the same caveats as the overall estimates of participation rates, they also have an interesting implication because of the difference in the wording of the questions about the two types of participation. The introduction to the question about work-related courses does not include specific examples, but does mention courses taken at work, taken somewhere else but related to work or career, and courses taken to obtain a license or certificate related to work or career. On the other hand, the introduction to the question about personal development specifically mentions courses including "arts and crafts, sports or recreation, first aid or childbirth, Bible study, or any other course."

One way of interpreting the equal biases for the two types of participation is that adding examples does not improve the quality of reporting in this situation. This interpretation is consistent with the hypothesis that respondents develop a response paradigm in the original interview that includes only more formal courses. If this is true, then the addition of specific examples may not cause respondents to change their paradigm, and a different approach might be needed to address the exclusion of less formal courses in reporting.

5. Summary

The estimated bias in the overall participation rate of adults was 24 percent, and the bias-corrected estimate is that 64 percent of adults participated in AE in 1995. This is substantially larger than the 40 percent reported in the NHES:95. The underreporting of participation for work-related and personal development courses was also substantial and of about the same magnitude for each of these types of participation.

One explanation for the underreporting may be related to how respondents react to the context of the interview. Some respondents may have created a response paradigm that restricted their answers to more formal courses before the questions about the work-related and personal development courses were asked. Despite the fact that more examples were used for the personal development course question than for the work-related course question, the estimated biases were approximately the same for the two types of participation. This suggests that simply adding examples to the wordings of the questions may not improve the quality of reporting and that other approaches to the underreporting problem may be needed.

If the adults have developed a response paradigm that focuses on formal types of participation (i.e., traditional schooling or formal programs), then a relatively drastic intervention may be needed to modify this behavior. For example, a modification in which the respondents are asked to actively cooperate in changing the focus, for example by giving examples of less formal courses, might be more effective.

The intensive reinterview methodology appears to have good potential as a method for detecting biases, especially if more traditional methods like record check studies are not feasible. The alternative approach of using reconciled reinterviews, on the other hand, has not proven to be successful for estimating bias. However, from an operational perspective, it is important to understand that the intensive reinterview is more costly than a regular reinterview. As a result, this method should be used primarily when there is an indication of reporting errors and the estimates subject to the biases are important to the survey objectives.


Table 1. Estimates of bias in overall, work-related, and personal development participation rates.

Type of participation    NHES:95 estimate   Estimated bias   Sampling error   Bias-corrected estimate
Overall                  40%                24%              2.7%             64%
Work-related             21%                16%              3.0%             37%
Personal development     20%                14%              3.0%             34%

Source: U.S. Department of Education, National Center for Education Statistics, Bias Study of the National Household Education Survey, 1995.


OPTIMAL PERIODICITY OF A SURVEY: EXTENSIONS OF PROBABLE-ERROR MODELS

Wray Smith, Dhiren Ghosh, and Michael Chang, Synectics for Management Decisions
Wray Smith, Synectics for Management Decisions, 3030 Clarendon Blvd #305, Arlington VA 22201

KEY WORDS: Absolute error modeling, Data deterioration, Fixed-and-variable costs, Repeated surveys, Sampling designs

This paper extends prior work on the problem of choosing optimal periodicity (and associated sample sizes) for repeated surveys of public and private schools with joint consideration of data deterioration (resulting from unobserved year-to-year changes in the underlying process variables), sampling error, and cost. The family of "probable-error models" that was first described in Ghosh et al. (1994) has been extended and empirical results obtained for state-level as well as national-level estimates using data from three rounds of the Schools and Staffing Survey (SASS). As noted in the 1994 paper, the models provide "a direct approximate method for characterizing the decision problem of making a joint choice of inter-survey intervals and sample sizes under a fixed cost constraint." The extensions reviewed in the present paper assume, for the most part, that conventional direct estimation methods will be used by the data user. In the case of a proposed alternative sampling design suggested by the modeling results, the data user may wish to consider the use of an indirect estimation (time series modeling) approach along the lines discussed in Smith et al. (1995).

SASS was conducted at three-year intervals for school years 1987-88, 1990-91, and 1993-94. Future rounds may be conducted at intersurvey intervals of 4, 5, or 6 years. The modeling extensions are illustrated here in a review of two of several new models that were formulated as modifications of the earlier models. The two models provide alternative formulations to account for the approximate average errors incurred by a data user within successive 12-month periods following a SASS data collection and up to the time of the next data collection. Projected absolute errors have been estimated for future national-level and typical state-level data collections for selected policy variables and a range of fixed-to-variable cost ratios for each possible periodicity.

The two illustrative models, denoted as Model 3A and Model 4M, are modifications of Model 3 and Model 4 of the 1994 paper. They combine a sampling absolute error (s.a.e.) and a process shift D over time in different ways to obtain, for different periodicities, estimates of the year-by-year projected absolute errors that would be incurred by a data user as well as average projected absolute errors for each multi-year periodicity. Annual dollar resources for SASS are assumed to be fixed. For each of several scenarios this assumption constrains the total annualized cost to a fixed amount and hence determines the sample size for each combination of a periodicity (4, 5, or 6 years) and a fixed-to-variable cost ratio.

We assume that data users will keep on using the data obtained from the most recent past survey until a new survey is undertaken and the newly collected data are processed and released to data users. Thus, if the inter-survey period is long, "deterioration" of the data could affect the quality of decisions made by users. On the other hand, if the survey is undertaken very frequently, the costs of conducting the survey and analyzing the data, and the indirect costs of the response burden, may be judged to exceed the benefits achieved in using fresh data. In the context of repeated surveys, it is useful to distinguish both opportunities and problems presented by different designs.

Typical analyses of cost-benefit tradeoffs tend to focus on the best use of a fixed resource amount over a time period that would include two or more survey data collections. The present budgetary restrictions for the 1990s are such that the "fixed" resource amount may be arbitrarily depressed and may overconstrain any realistic formulation of the optimization problem. In fact, the "truly optimal" formulation may be precluded by external constraints.

The usual cost model for a sample survey assumes a start-up cost C_0 and a per unit (ultimate sample unit) cost C_1. Thus, the total cost is represented as C = C_0 + n C_1. However, the start-up cost may depend on the periodicity. If so, we represent the start-up cost as C_{0,k} (where k is the periodicity), which may be regarded as increasing with increasing periodicity; that is, the start-up cost may be more if the periodicity is five years compared to the start-up cost for a periodicity of four years, and so on. On the other hand, the start-up cost may be considered to be constant; that is, it may not depend on the periodicity of the survey. Further details are given in Ghosh et al. (1994).
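A minimal sketch of how a fixed annualized budget pins down the per-round sample size for each periodicity. The budget figure, the unit cost, and the assumption that the start-up cost equals the fixed share p of the whole per-round budget are illustrative choices, not figures from the paper.

def sample_size(annual_budget, periodicity_k, fixed_share_p, unit_cost_c1):
    """Per-round sample size when the annualized cost is held fixed.

    Assumes total per-round cost C = C0 + n*C1 with C0 = p * (k * annual_budget),
    i.e., the fixed-to-variable split applies to the whole per-round budget.
    These assumptions are illustrative, not taken from Ghosh et al. (1994).
    """
    round_budget = periodicity_k * annual_budget          # dollars available per round
    variable_budget = (1.0 - fixed_share_p) * round_budget
    return int(variable_budget / unit_cost_c1)

# Example with placeholder numbers only.
for k in (4, 5, 6):
    print(k, sample_size(annual_budget=1.0e6, periodicity_k=k,
                         fixed_share_p=0.5, unit_cost_c1=300.0))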

We assume that the true value of a variable remains constant for a year after the survey date. This is an appropriate assumption for the SASS survey system since nearly all of the observed variables under the various SASS questionnaires have an annual accounting period and the SASS data user is interested in changes in variables which are specified to change as of some conventional time point. For example, the official figure for enrollment and number of teachers in a public school is the enrollment "on or about October 1" of the school year. The corresponding number of teachers or the full-time equivalent (FTE) number of teachers are counted at about the same point in time. The student enrollment and the teacher count may fluctuate during the academic year, but SASS and the Common Core of Data (CCD) are, in effect, taking snapshots at the same time over a sequence of years. The error committed in using a survey estimate is exactly equal to the difference between the survey estimate and the true value. Within the first twelve-month interval from the survey date any user incurs an error which equals the difference between the true value and the survey estimate. The estimated standard error of the survey estimate provides an indication of this difference.

If one were interested in estimating from SASS data for a survey year the mean of some characteristic for a specified group of schools, such as the average "number of K-12 teachers that are new to the school this year" for all regular public elementary schools in the state of California, then the estimate would be constructed by applying the school weight for each school to the reported number of new teachers for that school, summing the products and dividing by the sum of the school weights. For some of the SASS-based public school statistics published by NCES, such as those in the Statistical Profiles for each round of SASS, the NCES publications include tables of state-by-state estimates of the statistics and, for a selected subset of the state-by-state statistics, they also include tables of the estimated standard errors for those statistics. For example, the publication Schools and Staffing in the United States: A Statistical Profile, 1990-91 includes for public schools both estimates of the statistics and estimated standard errors for these statistics on a state-by-state basis for (1) Number of public schools and students and average number of students per full-time-equivalent (FTE) teacher, (2) Percentage distribution of public school teachers by sex and race-ethnicity, percent minority teachers, and average teacher age, and (3) Average base salary for full-time public school teachers and average public school principal salary. As stated in the technical notes to that publication, "Standard errors were estimated using a balanced repeated replications procedure that incorporates the design features of this complex survey."

The difference between the true value and the survey estimate is the deviation from the mean m in the normal distribution of the survey estimate x considered as a random variable. We denote the average of the absolute deviations as the "sampling absolute error" or (s.a.e.). Assuming a normal distribution, the projected absolute error incurred by a user during the first year after the survey is 0.8 s/sqrt(n), where s/sqrt(n) is the standard error of the estimate, assuming simple random sampling. At the end of each year we assume that the true value undergoes a change. The magnitude of this change at the end of each year is denoted |D|. The sampling error component is 0.8 s/sqrt(n). Thus the expected value of the total error committed by a data user is dependent on (s.a.e.) and on |D|. The magnitude of the change at the end of the second year is also |D|, and so on.

In Model 3A, which is a variant of the Model 3 described in Ghosh et al. (1994), we assume the year-to-year process disturbance (process error) to be a normal variable with a zero mean. (If needed, a process error with a nonzero mean could be incorporated into the analysis framework.) Since the process error and the sampling error are both assumed to be normally distributed, they can be readily combined. The projected absolute error is then a linear combination of the process absolute error and the sampling absolute error.
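For completeness, the 0.8 multiplier is simply the mean absolute deviation of a normally distributed error expressed as a fraction of its standard deviation:

E\,|X - \mu| \;=\; \sigma\sqrt{2/\pi} \;\approx\; 0.7979\,\sigma \;\approx\; 0.8\,\sigma,
\qquad X \sim N(\mu, \sigma^2).

Applied to the sampling error, with \sigma = s/\sqrt{n}, this gives the sampling absolute error (s.a.e.) of approximately 0.8\, s/\sqrt{n} used above.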

In Model 4M, we explicitly assume that the process change which occurs each year (for example, every October) occurs in accordance with a Random Walk process in discrete time. That is,

x_t = x_{t-1} + w_t,

where w_t has mean zero. We then calculate the average error for different possible periodicities of the repeated survey. The optimal intersurvey interval can be determined if the process variance and the sampling variance are known. In a Random Walk model, the current level of the process is the best current forecast for any future year. One assumes that any known trend component has already been subtracted out. In general, data users will typically use the last available survey value as long as no new survey has been conducted. This assumption concerning user behavior is consistent with our assumption of an underlying Random Walk process.

As noted above, Model 3A is a variant of Model 3 of Ghosh et al. (1994). The new Model 4M is a modification and replacement for Model 4 of that 1994 paper. In the original Model 4 we introduced the concept of a loss parameter that converted the sampling error together with the unobserved process shift in non-survey years to a loss expressed in monetary units. The combination of average cost and average error over a period of years was minimized to determine the optimum periodicity. This was a variation on an approach in Smith (1980) and also on an analysis suggested by S. Kaufman. Model 4M, however, does away with the need for a separate loss parameter, thus avoiding the introduction of a subjective judgment on the part of the survey administrator.

The following table sets forth the year-by-year evolution of the projected absolute errors for the two models. In Model 3A the evolution is based on |D|, the magnitude of the annual change in the true value, and the sampling absolute error, (s.a.e.). For Model 4M, the evolution is based on D^2, which is proportional to the variance of the process disturbance, and the sampling absolute error, (s.a.e.). The (s.a.e.) depends on the sample size which, in turn, depends on the chosen periodicity under the constraint of fixed annualized cost.

Projected Absolute Errors for Selected Models

Year          Model 3A                               Model 4M
1             (s.a.e.)                               (s.a.e.)
2             0.8 |D| + (s.a.e.)                     0.8 sqrt( D^2 + [(s.a.e.)/0.8]^2 )
3             0.8 sqrt(2) |D| + (s.a.e.)             0.8 sqrt( 2 D^2 + [(s.a.e.)/0.8]^2 )
4             0.8 sqrt(3) |D| + (s.a.e.)             0.8 sqrt( 3 D^2 + [(s.a.e.)/0.8]^2 )
5             0.8 sqrt(4) |D| + (s.a.e.)             0.8 sqrt( 4 D^2 + [(s.a.e.)/0.8]^2 )
6             0.8 sqrt(5) |D| + (s.a.e.)             0.8 sqrt( 5 D^2 + [(s.a.e.)/0.8]^2 )
Avg p.a.e.    (0.8/6) sum_{i=1..6} sqrt(i-1) |D|     (0.8/6) sum_{i=1..6} sqrt( (i-1) D^2
(6 yrs)          + (s.a.e.)                             + [(s.a.e.)/0.8]^2 )
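A small sketch of the two columns of the table, convenient for checking the year-by-year and average values; the D and s.a.e. inputs below are placeholders rather than SASS figures, and the formulas are as reconstructed above.

import math

def pae_model_3a(year, D_abs, sae):
    """Projected absolute error in a given year after the survey, Model 3A:
    0.8*sqrt(year-1)*|D| + (s.a.e.)."""
    return 0.8 * math.sqrt(year - 1) * D_abs + sae

def pae_model_4m(year, D_abs, sae):
    """Projected absolute error in a given year after the survey, Model 4M:
    0.8*sqrt((year-1)*D^2 + [(s.a.e.)/0.8]^2)."""
    return 0.8 * math.sqrt((year - 1) * D_abs ** 2 + (sae / 0.8) ** 2)

def average_pae(model, periodicity, D_abs, sae):
    """Average projected absolute error over one inter-survey interval."""
    return sum(model(i, D_abs, sae) for i in range(1, periodicity + 1)) / periodicity

# Placeholder inputs, not SASS values.
for k in (4, 5, 6):
    print(k,
          round(average_pae(pae_model_3a, k, D_abs=0.5, sae=1.0), 3),
          round(average_pae(pae_model_4m, k, D_abs=0.5, sae=1.0), 3))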

We applied the models described above using three rounds of SASS data at the national level (U.S.) and at the state level for three selected states (California, Iowa, and New York). The following twelve variables were selected from the School, Administrator, and Teacher questionnaires:

Item 1.   Number of students served by Chapter 1 services (Schools--public).
Item 4.   Number of K-12 teachers that are new to the school this year (Schools--public).
Item 6.   Percentage of all schools with minority principals (Administrator--public and private).
Item 7A.  Number of students per FTE teacher, by sector (Schools--public).
Item 7B.  Number of students per FTE teacher, by sector (Schools--private).
Item 8A.  Percentage of schools in which various programs and services were available (Schools--public).
Item 8B.  Percentage of schools in which various programs and services were available (Schools--private).
Item 9.   Percentage of principals having a master's degree (Administrator--public).
Item 10A. Percentage of full-time teachers who received various types of compensation (Teacher--public).
Item 10B. Percentage of full-time teachers who received various types of compensation (Teacher--private).
Item 11A. Percentage of full-time teachers newly hired and who were first-time teachers (Teacher--public).
Item 11B. Percentage of full-time teachers newly hired and who were first-time teachers (Teacher--private).

Private school items 7B, 8B, 10B, and 11B were omitted from the state-level computer runs since state-level estimates are not published by NCES for private schools. Item 6, which is based on pooled data for public and private schools combined, was retained in all runs.

We obtained approximate estimates for the fixed cost and variable cost elements of SASS. We applied the two models for each variable listed above, and computed the projected absolute error for periodicities of four, five, and six years and for specified scenarios of fixed-to-variable cost. The accompanying graphs for Model 3A and Model 4M, respectively, show the average rel p.a.e. (where the rel p.a.e. for each variable is its p.a.e. divided by its mean) for Iowa, New York, California, and the U.S. for a set of eight policy variables with a fixed total cost and a fixed-to-variable cost ratio of 50:50. We see that for the U.S. as a whole, shorter periodicities (even with their smaller sample sizes) result in smaller relative projected absolute errors. For California and Iowa, under both Model 3A and Model 4M, the averages of the relative projected absolute errors are larger for short periodicities and smaller for longer periodicities. For New York, the mean values of the rel p.a.e. are essentially flat under Model 3A over the periodicity range from 2 to 6 years, but under Model 4M the values decline initially, with a minimum at a periodicity of 4 years, and then rise slightly for periodicities of 5 and 6 years.

Under the probable-error models the data users who are primarily interested in carrying out analyses for individual States will generally incur smaller errors if they are provided with datasets from longer periodicities and hence larger sample sizes. Data users who are primarily interested in carrying out analyses for the U.S. as a whole will incur smaller errors if they are provided with datasets from shorter periodicities and correspondingly smaller sample sizes.

These observations have led to an alternating large-and-small-sample scenario which was formulated as follows: Assume the same fixed annualized resource budget that would otherwise support the large-sample scenarios with a periodicity of five years over a range of cost ratios (0.3, 0.4, 0.5, 0.6, or 0.7). Then the assumed sample sizes for the U.S. at the mid-point cost ratio p = 0.5 were 9,000 public schools and 48,000 teachers. These sizes were proportionally smaller or larger for smaller or larger cost ratios. Assign these sample sizes to a periodicity of six years instead of to a periodicity of five years. This results in a "cost dividend" of 20 percent which can be invested in a one-fifth U.S. sample of 1,800 public schools and 9,600 teachers for a data collection which can be conducted at the halfway point between two full-sample data collections; namely, three years after the previous large data collection. Assume that there is no processing delay. For simplicity, assume that the schools in the one-fifth sample are nonoverlapping with the schools in the full sample. Further assume that for the U.S. as a whole only direct estimates will be of interest and, hence, the two independent samples (the full sample and the one-fifth sample) will be treated as independent cross-section surveys three years apart.
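A back-of-envelope check of the "cost dividend" arithmetic, ignoring any change in the fixed-cost share; the simplification is ours and is made only for this illustration.

# One full-sample round costs five years of the (fixed) annual budget under the
# five-year design.  Stretching the same round to a six-year periodicity frees
# one year of budget, i.e., 1/5 = 20 percent of a round, which is assumed here
# to buy a proportionally scaled one-fifth sample at the mid-point.
annual_budget = 1.0                       # arbitrary units
full_round_cost = 5 * annual_budget       # cost of a round under the 5-year design
freed = 6 * annual_budget - full_round_cost
dividend = freed / full_round_cost        # 0.20

full_schools, full_teachers = 9_000, 48_000
print(dividend, int(dividend * full_schools), int(dividend * full_teachers))
# 0.2  1800  9600  -- the one-fifth sample sizes quoted in the text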

Now consider the projected absolute errors that will be incurred by a data user over a six-year period. During the first, second, and third years after a full-sample data collection a data user who is interested in national data will continue to use that sample. Under either Model 3A or Model 4M the projected absolute errors will increase each year. In the fourth year after the large sample data collection a new dataset from the one-fifth size national sample will become available. The data user then disregards the data in the old large sample and begins to use the data from the new one-fifth sample and continues to use it until data from the next full sample becomes available in the seventh year. For the U.S., the sample sizes in the one-fifth national sample are large enough that quite good estimates may be made. That is, the user is not heavily penalized in shifting every three years between the full national sample and the one-fifth national sample.

For the cost ratio p = 0.5 the average rel p.a.e. values for an alternating large-and-small sample design with large sample periodicity of six years are less than or equal to the average rel p.a.e. values for the single large-sample scenario with periodicity of five years. Furthermore, the user of U.S.-level data will be receiving the benefits associated with the receipt of fresh data every three years instead of every five years. Related numerical results will be found in Smith, Ghosh, and Chang (1996).

Our main conclusion from the present study is that the National Center for Education Statistics should consider adopting an alternating large-and-small-sample design for SASS with an appropriate full-sample periodicity together with a mid-period fractional sample to provide a periodic update at the national level and for larger States.

References

Abramson, R. et al. (1996), "1993-94 Schools and Staffing Survey: Sample Design and Estimation," Technical Report NCES 96-089, Washington, DC: U.S. Department of Education, Office of Educational Research and Improvement, National Center for Education Statistics (forthcoming).

Ghosh, D., Kaufman, S., Smith, W., and Chang, M. (1994), "Optimal Periodicity of a Survey: Sampling Error, Data Deterioration, and Cost," 1994 Proceedings of the ASA Section on Survey Research Methods, 1122-1127.

Kaufman, S. (1991), "1988 Schools and Staffing Survey Sample Design and Estimation," Technical Report NCES 91-127, Washington, DC: U.S. Department of Education, Office of Educational Research and Improvement, National Center for Education Statistics.

Kaufman, S. and Huang, H. (1993), "1990-91 Schools and Staffing Survey: Sample Design and Estimation," Technical Report NCES 93-449, Washington, DC: U.S. Department of Education, Office of Educational Research and Improvement, National Center for Education Statistics.

Scott, A.J., Smith, T.M.F., and Jones, R.G. (1977), "The Application of Time Series Methods to the Analysis of Repeated Surveys," International Statistical Review, 45, 13-28.

Smith, W. (1980), "Sample Size and Timing Decisions for Repeated Socioeconomic Surveys," unpublished D.Sc. dissertation, The George Washington University, School of Engineering and Applied Science.

Smith, W. and Barzily, Z. (1982), "Kalman Filter Techniques for Control of Repeated Economic Surveys," Journal of Economic Dynamics and Control, 4, 261-279.

Smith, W., Ghosh, D., and Chang, M. (1995), "Optimal Periodicity of a Survey: Alternatives Under Cost and Policy Constraints," 1995 Proceedings of the ASA Section on Survey Research Methods.

Smith, W., Ghosh, D., and Chang, M. (1996), "Optimizing the Periodicity of the Schools and Staffing Survey: An Updated Assessment Based on Three Rounds of SASS Data," Technical Report, Synectics for Management Decisions, Inc., Arlington, VA (forthcoming).


[Figures: Average rel p.a.e. for Iowa, New York, California, and the U.S. for eight policy variables with fixed total cost and p = 0.5, under Model 3A and Model 4M.]



ESTIMATING THE VARIANCE IN THE PRESENCE OF IMPUTATION USING A RESIDUAL

Steven Kaufman, National Center for Education Statistics
Room 422d, 555 New Jersey Ave. N.W., Washington, D.C. 20208

Key Words: Imputation, Variance, Simulation, Half-Sample Replication

1. Introduction

With most surveys there is always some item nonresponse. One way of dealing with item nonresponse is to include imputations for the missing data. This makes it easy to compute appropriate population estimates. However, the variance will be underestimated in the presence of imputation. This follows from the main assumption with design based estimation; the only random variable is the variable specifying which units are in sample. When an imputation is generated from a sample, the imputation is not known given the unit requiring imputation. However, since it is assumed that the imputed value is known (i.e., it has no variability), standard variance estimation can not properly reflect the imputation variance.

A number of methodologies have been proposed to estimate imputation variance. Multiple imputation (Rubin, 1978), a modified jackknife (Rao and Shao, 1992) and a model assisted methodology (Särndal, 1990) are some of the approaches that have been proposed. Rao and Shao's and Särndal's methodologies require special software to estimate the variance. Rubin's methodology requires the computation of the variance estimate associated with the average of the multiple imputation estimates. With respect to additional software requirements, the multiple imputation variance estimate, although not as complicated as the other methods, is still more complicated than standard variance estimation.

The methodology proposed in this paper is a mixture of the methodologies described above. Like multiple imputation, two potential imputations are assigned to each nonrespondent. The difference between these values is a residual that is added to appropriate data elements to reflect the imputation variance. The residual term differs from the one used in the modified jackknife, in that the modified jackknife residual introduces the variability replicate by replicate, while here the variability is introduced to the actual data elements. Once the residuals have been added, standard replication programs can compute the total variance. The questions this paper addresses are: 1) when will this process appropriately measure the total variance; and 2) when the process is not appropriate, what must be added to provide appropriate estimates.

A simulation study, modeled after NCES's Schools and Staffing Survey (SASS), which has a complex sample design, will demonstrate the proposed procedures. A nearest neighbor imputation will be used. The nonresponse will be modeled assuming unequal nonresponse rates per cell and that larger units are more likely to be nonrespondents.

2. Imputations

The nearest neighbor imputations used in this paper are done within imputation cells, after the schools have been sorted by the number of students per school. The imputation cells are state/school level/urbanicity. There are three school levels - elementary, secondary and combined schools. There are three levels of urbanicity - central city, urban fringe/large town and rural/small town. After the file is sorted, it is accessed sequentially using the nearest responding school as the donor for a nonresponding school. Two imputations will be determined for each nonrespondent: one where the file is sorted in ascending order and another where the file is sorted in descending order. A random selection is used to determine which imputation is used in the estimate of interest ($\hat{y}_{\cdot}$).

3. Definitions

r is the set of responding units

nr is the set of nonresponding units

3.1 Terms defining the imputation and residual

$$\tilde{y}_k = \begin{cases} y_k & \text{if } k \in r \\ y_{k1} I_k + y_{k2}(1 - I_k) & \text{if } k \in nr \end{cases}$$

$I_k$ represents an independent selection within each unit $k$ with probability of .5 for a value of 1, and 0 otherwise

$y_k$ is the response from unit $k$

$y_{k1}$ is the nearest neighbor imputation in ascending order

$y_{k2}$ is the nearest neighbor imputation in descending order

$\hat{y}_{\cdot} = \sum_{k \in s} w_k \tilde{y}_k$, the main estimate of interest

$w_k$ is the sampling weight for unit $k$

$s$ is a probability sample of units

$\hat{y} = \sum_{k \in s} w_k y_k$, the estimate with complete response

$$\tilde{d}_k^R = \begin{cases} 0 & \text{if } k \in r \\ y_{j_k 2} - y_{j_k 1} & \text{if } k \in nr \text{ and } y_{k1} \text{ is used in } \hat{y}_{\cdot} \\ y_{j_k 1} - y_{j_k 2} & \text{if } k \in nr \text{ and } y_{k2} \text{ is used in } \hat{y}_{\cdot} \end{cases}$$

where: $j_k$ is a unit independently and randomly selected from $\{j \mid j \in nr$ and within $k$'s imputation cell$\}$. The selection is done proportional to the $w_j$'s. In addition, $R$ will be used to represent the selection of unit $j_k$.

$$\hat{d}^R = \sum_{k \in s} w_k \tilde{d}_k^R$$

3.2 Terms used in section 4

$$\tilde{d}_k^{T_i} = \begin{cases} 0 & \text{if } k \in r \\ y_{m_k 2} - y_{m_k 1} & \text{if } k \in nr \text{ and } i = 1 \\ y_{m_k 1} - y_{m_k 2} & \text{if } k \in nr \text{ and } i = 2 \end{cases}$$

$m_k$ is defined like $j_k$ above, independently for $i = 1$ and 2. $T$ is used to represent the $m_k$ selection.

$$\hat{d}^{T_i} = \sum_{k \in s} w_k \tilde{d}_k^{T_i}$$

$$\tilde{y}_{ki} = \begin{cases} y_k & \text{if } k \in r \\ y_{k1} & \text{if } k \in nr \text{ and } i = 1 \\ y_{k2} & \text{if } k \in nr \text{ and } i = 2 \end{cases} \qquad\qquad \hat{y}_{\cdot i} = \sum_{k \in s} w_k \tilde{y}_{ki}$$

$$\bar{y}_{\cdot} = \tfrac{1}{2}(\hat{y}_{\cdot 1} + \hat{y}_{\cdot 2})$$

3.3 Terms used in section 5

$$\tilde{d}_k = \begin{cases} 0 & \text{if } k \in r \\ y_{k2} - y_{k1} & \text{if } k \in nr \text{ and } y_{k1} \text{ is used in } \hat{y}_{\cdot} \\ y_{k1} - y_{k2} & \text{if } k \in nr \text{ and } y_{k2} \text{ is used in } \hat{y}_{\cdot} \end{cases}$$

$y_k^{*}$ is the term subtracted in $\tilde{d}_k$

$$\hat{y}^{*} = \sum_{k \in s} w_k y_k^{*} \qquad\qquad \hat{d} = \sum_{k \in s} w_k \tilde{d}_k$$

4. Measuring the Imputation Variance

The goal of this paper is to present a methodology which measures the imputation variance using standard replication variance software packages, in a simple manner. One way of doing this is adding an independent residual term ($\tilde{d}_k^R$) to the data elements ($\tilde{y}_k$), so that an appropriate amount of imputation variance is added. The estimate $\hat{y}_{\cdot}$ is transformed into $\hat{Y} = \hat{y}_{\cdot} + \hat{d}^R$. An appropriate constant is added to the $\tilde{d}_k^R$'s, so that $\hat{Y} = \hat{y}_{\cdot}$. Now, the questions are when does $V(\hat{Y})$ appropriately measure the true variance, and, when $V(\hat{Y})$ is not appropriate, what must be done to make it appropriate. The first step is to compute $V(\hat{Y})$.
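For concreteness, a minimal sketch of the "standard replication" step the proposal relies on -- a fully balanced half-sample variance estimator for a weighted total, assuming exactly two sampled units per stratum (as in the simulation design of section 7.2 below). The function and data layout are illustrative, not the author's code.

import numpy as np
from scipy.linalg import hadamard

def balanced_half_sample_variance(values, weights, strata):
    """Fully balanced half-sample replication variance of a weighted total.

    values  : residual-augmented data elements (here, y~_k + d~_k^R)
    weights : full-sample weights w_k
    strata  : stratum labels, exactly two sampled units per stratum
    """
    values, weights, strata = map(np.asarray, (values, weights, strata))
    labels = np.unique(strata)
    n_rep = 1
    while n_rep <= len(labels):
        n_rep *= 2
    H = hadamard(n_rep)[:, 1:len(labels) + 1]   # skip the all-ones column for balance
    full = float(np.sum(weights * values))
    reps = []
    for row in H:
        w_rep = weights.astype(float).copy()
        for h, lab in zip(row, labels):
            first, second = np.flatnonzero(strata == lab)
            keep, drop = (first, second) if h == 1 else (second, first)
            w_rep[keep] *= 2.0      # retained half-sample unit gets double weight
            w_rep[drop] = 0.0       # the other unit is dropped from this replicate
        reps.append(float(np.sum(w_rep * values)))
    return float(np.mean((np.array(reps) - full) ** 2))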

4.1 Computation of the Variance of $\hat{Y} = \hat{y}_{\cdot} + \hat{d}^R$

Let $\hat{Y} = \sum_{k \in s} w_k (\tilde{y}_k + \tilde{d}_k^R)$. Then

$$V(\hat{Y}) = V_1 E_2(\hat{Y}) + E_1 V_2(\hat{Y}), \qquad (1)$$

where 1 represents the selection of the sample $s$ and 2 represents the imputation selection of $I_k$ and residual selection $R$.

4.2 Computation of $V_1 E_2(\hat{Y})$

$$E_2(\hat{Y}) = \tfrac{1}{2}(\hat{y}_{\cdot 1} + \hat{d}^{T_1}) + \tfrac{1}{2}(\hat{y}_{\cdot 2} + \hat{d}^{T_2})$$

$\hat{d}^{T_i}$ has been normalized to equal zero. Looking at the right hand term first:

$$\tfrac{1}{4}\,V(\hat{y}_{\cdot 1} + \hat{d}^{T_1}) = \tfrac{1}{4}\big[ V(\hat{y}_{\cdot 1}) + V(\hat{d}^{T_1}) \big], \quad \text{since } \mathrm{cov}(\hat{y}_{\cdot 1}, \hat{d}^{T_1}) = 0$$
$$\approx \tfrac{1}{4}\big[ V(\hat{y}) + V(\hat{y}_{\cdot 1} - \hat{y}) + V(\hat{y} - \hat{y}_{\cdot 1}) + 2\,\mathrm{cov}(\hat{y}_{\cdot 1} - \hat{y}, \hat{y}) \big], \qquad (2)$$

assuming $\hat{d}^{T_1}$ is distributed as $\hat{y}_{\cdot 1} - \hat{y}$. Likewise,

$$\tfrac{1}{4}\,V(\hat{y}_{\cdot 2} + \hat{d}^{T_2}) \approx \tfrac{1}{4}\big[ V(\hat{y}) + V(\hat{y}_{\cdot 2} - \hat{y}) + V(\hat{y} - \hat{y}_{\cdot 2}) + 2\,\mathrm{cov}(\hat{y}_{\cdot 2} - \hat{y}, \hat{y}) \big]. \qquad (3)$$

Combining (2) and (3) gives:

$$V_1 E_2(\hat{Y}) = V(\hat{y}) + V(\hat{y}_{\cdot} - \hat{y}) + 2\,\mathrm{cov}(\hat{y}_{\cdot} - \hat{y}, \hat{y}), \qquad (4)$$

assuming that $\hat{d}^{T_1}$ and $\hat{d}^{T_2}$ are distributed as $\hat{y}_{\cdot 1} - \hat{y}$ and $\hat{y}_{\cdot 2} - \hat{y}$, and that the cross covariances between the $i = 1$ and $i = 2$ terms are zero by the independence in the $T$ selection.


4.3 Computation of $E_1 V_2(\hat{Y})$

$$V_2(\hat{Y}) = V_2(\hat{y}_{\cdot}) + V^{PPz}(\hat{d}^R), \qquad (5)$$

where $V^{PPz}$ is the variance associated with probability proportionate to size sampling with replacement. The size measure is the $w_k$'s for $k \in nr$ within each imputation cell.

4.4 Computation of $V(\hat{Y})$

Putting (1), (4) and (5) together gives:

$$V(\hat{Y}) = V(\hat{y}) + V(\hat{y}_{\cdot} - \hat{y}) + 2\,\mathrm{cov}(\hat{y}_{\cdot} - \hat{y}, \hat{y}) + E_1 V_2(\hat{y}_{\cdot}) + E_1 V^{PPz}(\hat{d}^R)$$
$$= V(\hat{y}) + V(\hat{y} - \hat{y}_{\cdot}) + 2\,\mathrm{cov}(\hat{y}_{\cdot} - \hat{y}, \hat{y}) + E_1 V^{PPz}(\hat{d}^R)$$

As formulated by Särndal (1994), $V(\hat{y})$ is the variance assuming no nonresponse and $V(\hat{y} - \hat{y}_{\cdot})$ is the imputation variance. The sum of these components equals the total variance, assuming $\mathrm{cov}(\hat{y}_{\cdot} - \hat{y}, \hat{y}) = 0$. The simulation done in Särndal (1994), using nearest neighbor imputation, indicates that the covariance term is small and can be approximated by zero.

$2\,\mathrm{cov}(\hat{y}_{\cdot} - \hat{y}, \hat{y})$ and $E_1 V^{PPz}(\hat{d}^R)$ are terms not included in the estimate of the true variance. There are two ways of handling these terms. The first is to estimate the terms and subtract them from $V(\hat{Y})$. The second way is realizing that the only non-zero values in these terms come from nonrespondents and, if the item response rate is relatively high, these terms should be small and can be ignored.

5. Computation of $\mathrm{cov}(\hat{y}_{\cdot} - \hat{y}, \hat{y})$

Since $\mathrm{cov}(\hat{y}_{\cdot} - \hat{y}, \hat{y})$ requires $y_k$ for $k \in nr$, it cannot be computed with the given sample. However, it can be approximated using $\tilde{d}_k$. Given how $\tilde{d}_k$ is formed, when $\tilde{d}_k = y_{k2} - y_{k1}$, $y_{k2}$ is the nearest neighbor imputation for $y_{k1}$. Therefore, $\tilde{d}_k$ can be viewed as the difference between the nearest neighbor imputation for a known value and the known value. Since $\tilde{d}_k$, in terms of the nearest neighbor imputation, is close to unit $k$, the covariance might be approximated by $\mathrm{cov}(\hat{d}, \hat{y}^{*})$.

6. Computation of $E_1 V^{PPz}(\hat{d}^R)$

$E_1 V^{PPz}(\hat{d}^R)$ is estimated from a sample by computing $V^{PPz}(\hat{d}^R)$ for that sample. This is done by either using an exact formula (see Cochran, 1977) or by replication given a set of replicate weights designed for the conditional variance. For this report the exact formula will be used.

7. Simulations

In order to determine how well the proposed imputations work, a simulation study is performed for two variations of the proposed variance methodology, using four states within the SASS sample design and performing 5,000 simulations.

For each simulation sample, a nearest neighbor imputation ($\tilde{y}_k$) is performed on the selected sample of nonrespondents. Three populations of nonrespondents, with 5%, 15% and 30% nonresponse, will be generated and compared.

The simulation estimates are based on the design and collection variables that can be found on the frame. In this way, estimates for any selected sample, as well as estimates of the true variance, can be computed.
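A skeleton of the simulation loop described here. The five callables passed in stand for the sample selection, nonrespondent flagging, nearest neighbor imputation, estimation, and variance estimation steps; they are hypothetical hooks, not part of the paper, and only the bookkeeping for the section 8 summaries is shown.

import numpy as np

def run_simulation(frame, draw_sample, flag_nonrespondents, impute,
                   point_estimate, variance_estimate, n_sims=5000, seed=1234):
    """Monte Carlo loop: draw a SASS-like sample, flag nonrespondents, impute,
    and accumulate point estimates and variance estimates for comparison."""
    rng = np.random.default_rng(seed)
    estimates, variances = [], []
    for _ in range(n_sims):
        sample = draw_sample(frame, rng)                 # two schools per stratum
        sample = flag_nonrespondents(sample, rng)        # 5%, 15% or 30% populations
        sample = impute(sample)                          # ascending/descending donors
        estimates.append(point_estimate(sample))         # e.g. y-hat-dot
        variances.append(variance_estimate(sample))      # e.g. V_R or V_EC
    estimates, variances = np.array(estimates), np.array(variances)

    true_var = estimates.var()                           # simulation analogue of V_T
    rel_bias = (np.sqrt(variances).mean() - np.sqrt(true_var)) / np.sqrt(true_var)
    half_width = 1.96 * np.sqrt(variances)
    coverage = np.mean(np.abs(estimates - estimates.mean()) <= half_width)  # C_m analogue
    return rel_bias, coverage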

7.1 Population of Nonrespondents

In order to do the analysis, a population of nonrespondents must be defined. A school $k$ is chosen to be a member of the nonrespondent population by independently selecting $k$, proportional to $p_k^{NR}$. $p_k^{NR}$ is determined to obtain an expected X% unweighted nonresponse rate in $s$.

$$p_k^{NR} = \Big( p_k \Big/ \sum_{k \in c} p_k^2 \Big)\, N_c^{NR}$$

$p_k$ is the probability of selecting $k$ in $s$

$c$ is an imputation cell

$N_c^{NR}$ is the expected number of nonrespondents in $c$

$$N_c^{NR} = \big( en_c \, r_c \, N_I^{NR} \big) \Big/ \Big( \sum_{c \in I} en_c \, r_c \Big)$$

$en_c$ is the expected sample size in $c$ (i.e., $\sum_{k \in c} p_k$)

$r_c$ is the relative rate of nonresponse in $c$ (i.e., 2 for central city schools, 1 otherwise)

$N_I^{NR}$ is the expected number of nonrespondents required for the analysis in stratum $I$

$$N_I^{NR} = \Big( \sum_{c \in I} en_c \Big) \times (X/100)$$

It is easy to verify: 1) the expected nonresponse rate for $s$ will be X%; 2) the relative rate of nonresponse in central city cells is twice that of other cells; and 3) larger schools, as measured by $p_k$, will have a higher probability of being selected as nonrespondents.
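One way to implement the nonrespondent selection consistent with properties (1)-(3), following the formulas as reconstructed above; the function and data layout are a sketch, not the author's code.

import numpy as np

def nonrespondent_probabilities(p, cell, is_central_city, X=30.0):
    """Sketch of p_k^NR: expected X% unweighted nonresponse in the sample,
    central-city cells at twice the relative rate, and larger schools (larger
    p_k) more likely to be nonrespondents.  Urbanicity is part of the cell
    definition, so is_central_city is constant within a cell."""
    p = np.asarray(p, dtype=float)
    cell = np.asarray(cell)
    r = np.where(np.asarray(is_central_city), 2.0, 1.0)

    cells = np.unique(cell)
    en_c = np.array([p[cell == c].sum() for c in cells])       # expected sample size in c
    r_c = np.array([r[cell == c][0] for c in cells])           # relative nonresponse rate
    N_I = en_c.sum() * X / 100.0                               # nonrespondents needed overall
    N_c = en_c * r_c * N_I / (en_c * r_c).sum()                # per-cell allocation

    p_nr = np.empty_like(p)
    for c, n_c in zip(cells, N_c):
        mask = cell == c
        p_nr[mask] = p[mask] / (p[mask] ** 2).sum() * n_c      # proportional to p_k within c
    return np.clip(p_nr, 0.0, 1.0)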

7.2 Simulation Sample Design

The Schools and Staffing Survey (SASS) is a stratified probability proportionate to size sample of elementary, secondary and combined schools. The selection is done systematically using the square root of the number of teachers per school as the measure of size. State by school level cells define the stratification. Before selection, schools are sorted to provide a good geographic distribution. Estimates are designed to provide state estimates.

In order to assure unbiased variance estimation using half-sample replication, the simulation design has been slightly altered. Each state/school level stratum has been split into a number of strata so that exactly two schools are selected within each stratum, while maintaining the original sample size. Another modification is that the selection within stratum is done with replacement.

7.3 Estimates

Three estimates per state are computed:

$$\hat{y}_e = \sum_{k \in s} S_k p_{ke}; \qquad \hat{y}_m = \sum_{k \in s} S_k p_{km}; \qquad \hat{y}_h = \sum_{k \in s} S_k p_{kh}$$

$S_k$ is the known number of students in school $k$

$p_{ke}$ is the proportion of the students in school $k$ in grades pre-kindergarten to 6

$p_{km}$ is the proportion of the students in school $k$ in grades 7 to 9

$p_{kh}$ is the proportion of the students in school $k$ in grades 10 to 12

It is assumed that $S_k$ is known for all $k$ and that only the $p$'s require collection. Therefore, when $k$ requires imputation, a nearest neighbor donor's $p$ will be applied to $S_k$. This is a common SASS imputation that can be duplicated from the frame.
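A sketch of the nearest neighbor step as described in section 2: within each imputation cell, schools are sorted by enrollment and each nonrespondent takes a responding school's grade-span proportion, once in ascending and once in descending order. A simple sequential sweep using the most recently seen respondent is one reading of the description; the data layout is hypothetical.

import numpy as np

def nearest_neighbor_proportions(students, proportion, responded, cell):
    """Return two imputed versions of a grade-span proportion (ascending-order
    and descending-order donors) within imputation cells.  Nonrespondents seen
    before any respondent in a sweep keep their original value."""
    students = np.asarray(students, dtype=float)
    proportion = np.asarray(proportion, dtype=float)
    responded = np.asarray(responded, dtype=bool)
    cell = np.asarray(cell)
    p_asc, p_desc = proportion.copy(), proportion.copy()

    for c in np.unique(cell):
        idx = np.flatnonzero(cell == c)
        order = idx[np.argsort(students[idx])]        # sort the cell by enrollment
        for seq, out in ((order, p_asc), (order[::-1], p_desc)):
            donor = None
            for k in seq:                             # sweep, carrying the last respondent
                if responded[k]:
                    donor = k
                elif donor is not None:
                    out[k] = proportion[donor]
    return p_asc, p_desc

# y-hat_e is then sum(S_k * p), with a random choice between the ascending and
# descending imputation for each nonrespondent, as in the text.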

7.4 Simulated Variance Estimates

All variances below, except for $V_T(\hat{y}_{\cdot})$ and $V^{PPz}(\hat{d}^R)$, are estimated using a fully balanced half-sample replication variance estimator (Wolter, 1985).


The four variance estimates computed within each sample and averaged across samples are:

$$\hat{V}(\hat{y}), \qquad \hat{V}(\hat{y}_{\cdot}), \qquad \hat{V}_R(\hat{Y}) = \hat{V}(\hat{Y}), \qquad \hat{V}_{EC}(\hat{Y}) = \hat{V}(\hat{Y}) - 2\,\widehat{\mathrm{cov}}(\hat{d}, \hat{y}^{*}) - \hat{V}^{PPz}(\hat{d}^R)$$

$\hat{V}(\hat{y})$ is the variance estimate when all sample cases respond

$\hat{V}(\hat{y}_{\cdot})$ is the variance estimate of $\hat{y}_{\cdot}$

The following estimates can not be computed from a single sample. However, they can be computed in the simulation setting and are used for comparison purposes. The first is the variance estimate using a true estimate of the covariance and the second is an estimate of the true variance.

$$\hat{V}_{TC}(\hat{Y}) = \hat{V}(\hat{Y}) - 2\,\widehat{\mathrm{cov}}(\hat{y}_{\cdot} - \hat{y}, \hat{y}) - \hat{V}^{PPz}(\hat{d}^R)$$

$$V_T(\hat{y}_{\cdot}) = \frac{1}{n} \sum_{s=1}^{n} (\hat{y}_{\cdot s} - \bar{y})^2$$

$\hat{y}_{\cdot s}$ and $\bar{y}$ are the value of $\hat{y}_{\cdot}$ for the $s$th simulation and the average of the $\hat{y}_{\cdot s}$'s, respectively.

8. Analysis Statistics

To evaluate the imputation methodology, the relative bias of the estimated standard error (RB) and coverage rates ($C_T$ and $C_m$) are computed.

$$RB = \Big( \sqrt{\hat{V}(\cdot)} - \sqrt{V_T(\hat{y}_{\cdot})} \Big) \Big/ \sqrt{V_T(\hat{y}_{\cdot})}$$

$\hat{V}(\cdot)$ is one of the variance estimates defined above

$C_T$ is the coverage rate testing whether the true estimate with complete response is in the 95% confidence interval

$C_m$ is the coverage rate testing whether the average of the simulated estimates ($\bar{y}$) is in the 95% confidence interval

Since the estimates with imputation are biased, the levels for $C_T$ are unknown. One, however, expects them to be smaller than 95%. Since $C_m$ should be closer to 95%, it is used for comparison.

9. Results

Tables 1-3 provide results for the populations of nonrespondents with 30%, 15% and 5% nonresponse, respectively.

By comparing $C_T$ and $C_m$ for $\hat{V}(\hat{y}_{\cdot})$, it can be seen that the bias in $\hat{y}_{\cdot}$, relative to the coverage rates, is small. The only exception is state 24's $\hat{y}_h$ in Table 1, where the coverage rates differ by 9 points (90-81).

The half-sample coverage rates (the $C_T$'s for $\hat{V}(\hat{y})$) are usually less than 95%. Since $C_T$ for $\hat{V}(\hat{y})$ provides the coverage rate with complete response, it will be assumed that the best imputation coverage rate, instead of being the one closest to 95%, will be the coverage rate coming from the procedure that gets closest to the $C_T$'s for $\hat{V}(\hat{y})$.

From Table 1, it can be seen that when the nonresponse rate is high, $\hat{V}_{EC}(\hat{Y})$ is superior to $\hat{V}_R(\hat{Y})$. This is true for both relative bias and coverage rates. $\hat{V}_{EC}(\hat{Y})$ has better coverage rates 11 times, while $\hat{V}_R(\hat{Y})$ is better only once; $\hat{V}_{EC}(\hat{Y})$ has smaller relative biases 9 times, while $\hat{V}_R(\hat{Y})$ is better only 3 times. Therefore, $\hat{V}_{EC}(\hat{Y})$ should be used when the item nonresponse rate is high.

When the nonresponse rate is moderate in size, Table 2 shows that $\hat{V}_R(\hat{Y})$ and $\hat{V}_{EC}(\hat{Y})$ are comparable with respect to coverage rates. $\hat{V}_{EC}(\hat{Y})$ is better or equal to $\hat{V}_R(\hat{Y})$ 7 times, while $\hat{V}_R(\hat{Y})$ is also better or equal to $\hat{V}_{EC}(\hat{Y})$ 7 times. With respect to relative bias, $\hat{V}_R(\hat{Y})$ is slightly better than $\hat{V}_{EC}(\hat{Y})$: $\hat{V}_R(\hat{Y})$ has smaller or equal relative biases 8 times, compared to $\hat{V}_{EC}(\hat{Y})$ being better or equal 5 times. Hence, when computational simplicity is important, $\hat{V}_R(\hat{Y})$ can be used. However, it may still be safer to use $\hat{V}_{EC}(\hat{Y})$.

When the nonresponse rate is low, Table 3 shows that $\hat{V}_{EC}(\hat{Y})$ has better coverage rates than $\hat{V}_R(\hat{Y})$. $\hat{V}_{EC}(\hat{Y})$ is better or equal to $\hat{V}_R(\hat{Y})$ 10 times, while $\hat{V}_R(\hat{Y})$ is better or equal to $\hat{V}_{EC}(\hat{Y})$ 7 times. However, $\hat{V}_R(\hat{Y})$ has smaller relative biases: $\hat{V}_R(\hat{Y})$ is better or equal 9 times, while $\hat{V}_{EC}(\hat{Y})$ is better or equal 4 times. This probably means that it is safe to assume the terms subtracted in $\hat{V}_{EC}(\hat{Y})$ are close to zero and $\hat{V}_R(\hat{Y})$ can safely be used to estimate the total variance. However, since $\hat{V}_{EC}(\hat{Y})$ has better coverage rates, it should still be considered.

Tables 1-3 show that $\hat{V}_{EC}(\hat{Y})$ and $\hat{V}_{TC}(\hat{Y})$ work equally well for relative biases. This means that, for estimating the total variance, $\widehat{\mathrm{cov}}(\hat{d}, \hat{y}^{*})$ is a good approximation for $\mathrm{cov}(\hat{y}_{\cdot} - \hat{y}, \hat{y})$.

10. Conclusions

For the design and imputations simulated, $\hat{V}_{EC}(\hat{Y})$ should be used when the item nonresponse rate is high, although $\hat{V}_R(\hat{Y})$ is still an improvement over $\hat{V}(\hat{y}_{\cdot})$. When the item nonresponse rate is moderate or low, $\hat{V}_R(\hat{Y})$ provides good results and, given its simplicity, can be used to estimate the total variance.

Table 1. Relative Bias (RB) and Coverage Rates ($C_T$ and $C_m$) for Population with 30% Nonresponse

                        Percent RB ($C_T$)                      Percent RB ($C_m$)
State  Estimate         $\hat{V}(\hat{y})$  $\hat{V}(\hat{y}_\cdot)$   $\hat{V}(\hat{y}_\cdot)$  $\hat{V}_{TC}(\hat{Y})$  $\hat{V}_{EC}(\hat{Y})$  $\hat{V}_R(\hat{Y})$

2      $\hat{y}_e$       0 (94)   -13 (90)    -13 (90)    0 (91)    9 (94)    2 (93)
       $\hat{y}_m$       1 (93)   -25 (81)    -25 (83)   -3 (90)    9 (93)   -3 (90)
       $\hat{y}_h$       1 (91)   -28 (80)    -28 (81)  -11 (85)   -5 (89)  -14 (86)
9      $\hat{y}_e$       1 (94)   -28 (83)    -28 (83)    2 (94)    0 (94)  -13 (89)
       $\hat{y}_m$       0 (94)   -29 (82)    -29 (82)    1 (93)    0 (93)  -14 (88)
       $\hat{y}_h$       1 (94)   -15 (88)    -15 (89)    3 (94)   -1 (93)   -7 (91)
10     $\hat{y}_e$       1 (91)   -22 (85)    -22 (85)   -7 (90)   -5 (91)  -13 (88)
       $\hat{y}_m$       0 (92)   -26 (83)    -26 (83)   -6 (91)   -3 (91)  -13 (88)
       $\hat{y}_h$       1 (87)   -31 (72)    -31 (73)   -1 (88)  -10 (83)  -21 (77)
24     $\hat{y}_e$      -1 (94)   -30 (82)    -30 (81)   -3 (93)   -1 (93)  -15 (88)
       $\hat{y}_m$       0 (93)   -29 (81)    -29 (81)   -2 (93)    2 (94)  -13 (88)
       $\hat{y}_h$       0 (93)   -13 (81)    -13 (90)    2 (93)   15 (96)    2 (93)


Table 2. Relative Bias (RB) and Coverage Rates ($C_T$ and $C_m$) for Population with 15% Nonresponse

                        Percent RB ($C_T$)                      Percent RB ($C_m$)
State  Estimate         $\hat{V}(\hat{y})$  $\hat{V}(\hat{y}_\cdot)$   $\hat{V}(\hat{y}_\cdot)$  $\hat{V}_{TC}(\hat{Y})$  $\hat{V}_{EC}(\hat{Y})$  $\hat{V}_R(\hat{Y})$

2      $\hat{y}_e$       0 (94)    -6 (92)     -6 (92)    2 (93)    5 (95)    1 (94)
       $\hat{y}_m$       1 (93)    -9 (89)     -9 (91)    2 (92)    8 (93)    0 (92)
       $\hat{y}_h$       1 (91)    -1 (91)     -1 (92)    2 (91)    9 (93)    3 (92)
9      $\hat{y}_e$       1 (94)   -10 (90)    -10 (91)    4 (94)    2 (94)   -4 (92)
       $\hat{y}_m$       0 (94)   -11 (89)    -11 (90)    6 (95)    4 (94)   -4 (92)
       $\hat{y}_h$       1 (94)    -7 (92)     -7 (92)    7 (95)    3 (94)   -2 (93)
10     $\hat{y}_e$       1 (91)    -9 (89)     -9 (90)    2 (92)    3 (93)   -4 (91)
       $\hat{y}_m$       0 (92)   -10 (89)    -10 (91)    1 (93)    1 (94)   -5 (92)
       $\hat{y}_h$      -1 (87)   -10 (82)    -10 (86)    7 (90)   -3 (86)   -9 (86)
24     $\hat{y}_e$      -1 (94)   -14 (90)    -14 (90)    7 (94)    8 (94)   -4 (92)
       $\hat{y}_m$       0 (93)   -13 (90)    -13 (90)    8 (95)   11 (95)   -1 (93)
       $\hat{y}_h$       0 (93)    -5 (92)     -5 (91)    5 (93)    5 (93)    0 (93)

Table 3. Relative Bias (RB) and Coverage Rates ($C_T$ and $C_m$) for Population with 5% Nonresponse

                        Percent RB ($C_T$)                      Percent RB ($C_m$)
State  Estimate         $\hat{V}(\hat{y})$  $\hat{V}(\hat{y}_\cdot)$   $\hat{V}(\hat{y}_\cdot)$  $\hat{V}_{TC}(\hat{Y})$  $\hat{V}_{EC}(\hat{Y})$  $\hat{V}_R(\hat{Y})$

2      $\hat{y}_e$       0 (94)    -4 (92)     -4 (93)   -1 (94)   -1 (94)   -2 (93)
       $\hat{y}_m$       1 (93)    -7 (88)     -7 (91)   11 (95)    8 (93)    0 (92)
       $\hat{y}_h$       1 (91)    -6 (88)     -6 (90)   -6 (87)    3 (91)   -2 (91)
9      $\hat{y}_e$       1 (94)    -3 (93)     -3 (93)    6 (95)    5 (95)   -1 (94)
       $\hat{y}_m$       0 (94)    -5 (92)     -5 (92)    2 (94)    3 (94)   -2 (93)
       $\hat{y}_h$       1 (94)    -1 (94)     -1 (94)    0 (93)    0 (93)   -1 (93)
10     $\hat{y}_e$       1 (91)    -2 (91)     -2 (91)    5 (92)    2 (91)   -1 (91)
       $\hat{y}_m$       0 (92)    -1 (92)     -1 (92)    2 (92)    2 (93)    0 (92)
       $\hat{y}_h$      -1 (87)    -2 (87)     -2 (87)   -2 (88)   -2 (87)   -2 (87)
24     $\hat{y}_e$      -1 (94)    -7 (91)     -7 (92)    1 (95)    1 (94)   -4 (93)
       $\hat{y}_m$       0 (93)    -5 (90)     -5 (92)    4 (94)    4 (93)   -2 (92)
       $\hat{y}_h$       0 (93)     1 (93)      1 (93)    1 (93)    2 (93)    1 (93)

References

Cochran, W.G. (1977), Sampling Techniques, New York: John Wiley & Sons, p. 253.

Rao, J.N.K., and Shao, J. (1992), "Jackknife Variance Estimation with Survey Data Under Hot Deck Imputation," Biometrika, 79, pp. 811-822.

Rubin, D.B. (1978), Multiple Imputation for Nonresponse in Surveys, New York: John Wiley & Sons.

Särndal, C.E. (1990), "Methods for Estimating the Precision of Survey Estimates When Imputation Has Been Used," Proceedings of Statistics Canada's Symposium '90: Measurement and Improvement of Data Quality, pp. 369-380.

Särndal, C.E. (1994), "Estimation of the Variance in the Presence of Nearest Neighbor Imputation," Proceedings of the Section on Survey Research Methods, ASA Annual Meetings, pp. 888-893.

Wolter, K.M. (1985), Introduction to Variance Estimation, New York: Springer-Verlag.

WHERE WILL IT ALL END?
SOME ALTERNATIVE SASS ESTIMATION RESEARCH OPPORTUNITIES

Steven Kaufman and Fritz Scheuren
Fritz Scheuren, The George Washington University, Washington, D.C. 20052-0001

KEY WORDS: GENERALIZED REWEIGHTING, MASS IMPUTATION

1. INTRODUCTION

For 1993-94, both NCES's Schools and Staffing Survey (SASS) and its Private School Survey (PSS) were conducted. SASS has a large private school component as part of its overall sample. PSS is essentially a census of private schools -- but with considerably less item content than SASS. Over the last two years at these Joint Statistical Meetings, a modified Generalized Least Squares (GLS) estimator has been explored in SASS to see if we could achieve simultaneous intersurvey consistency in comparable totals for schools, teachers, and students between SASS and PSS.

We were successful (Scheuren and Li 1996); but, in the end, the results were ultimately disappointing in that so little use had been made of the exceptionally rich PSS data. If our goal had been more general -- say, improving SASS estimates -- not just trying to achieve a limited consistency with PSS, then alternatives to GLS become attractive. In this year's paper we look generally at what can be done when conducting a survey in a data rich setting. These general observations are then applied to private school surveys. Among the methods advocated in this setting, "mass imputation" is given special attention, since it has not been used as often as its considerable strengths might warrant.

Organizationally, the present paper is divided into four parts: This introduction begins the discussion (Section 1). A summary of our results with Generalized Least Squares (GLS) estimators makes up Section 2. Section 3 presents an alternative to reweighting SASS which has been called "mass imputation." This technique, now roughly 20 years old (Colledge et al. 1978), imputes records from a survey back to its sampling frame and, in a sense, operates in making estimates as if there had been a census. The final section discusses some "What Nexts."

2. GLS ESTIMATION IN SASS

As already noted, a Generalized Least Squares (GLS) technique was used to achieve simultaneous consistency or near consistency in totals for schools, teachers, and students between the private school component of the 1993-94 SASS and the 1993-94 PSS (Scheuren and Li, 1996). This technique is described briefly below (subsections 2.1 and 2.2), then our 1993-94 results are summarized (subsection 2.3). A discussion of our (possibly misplaced) expectations concludes the section (in subsection 2.4).

2.1 Generalized Least Squares. Advocated by Deville and Särndal (1992), GLS can be used (as in Imbens and Hellerstein 1993) to achieve consistency between SASS and PSS. To see how GLS works in this setting it is necessary to define some notation; in particular --

$w_i$ is the original SASS Private School base weight for the $i$th SASS observation, $i = 1, \ldots, n$.

$t_i$ is the SASS total of teachers for the $i$th SASS observation, $i = 1, \ldots, n$.

$s_i$ is the SASS total of students for the $i$th SASS observation, $i = 1, \ldots, n$.

$N$ is the total estimated number of schools, as given by PSS.

$T$ is the total estimated number of teachers, as given by PSS.

$S$ is the total estimated number of students, as given by PSS.

In reweighting SASS three constraints are imposed on the new weights $u_i$:

$$\sum u_i = N, \qquad \sum u_i t_i = T, \qquad \sum u_i s_i = S.$$

For our application the new weights $u_i$, subject to these constraints, are to be chosen, as in Burton (1989), to minimize a loss function which can be written as the sum of squares, $\sum (u_i - w_i)^2$. Motivating this loss function here is outside our present scope, except to say that the sensitivity of the final results to the loss function chosen (e.g., Deville and Särndal 1992; Deville et al. 1993) seems not to be too great. Now the usual Lagrange multiplier formulation of this problem yields, after some algebra, that the new weights are of the form

$$u_i = w_i + \lambda_1 + \lambda_2 t_i + \lambda_3 s_i,$$

where the $\lambda$'s are obtained from the matrix expression $d = M\Lambda$, with the vector $d$ consisting of three elements, each a difference between the corresponding PSS and SASS totals for schools (first component), teachers (second component), and students (third component); in particular

$$d = \begin{pmatrix} N - \sum w_i \\ T - \sum w_i t_i \\ S - \sum w_i s_i \end{pmatrix}$$

where the summations are over the SASS sample observations and the quantities $N$, $T$, and $S$ are known PSS totals for schools ($N$), teachers ($T$), and students ($S$) respectively.

The matrix $M$ is given by:

$$M = \begin{pmatrix} n & \sum t_i & \sum s_i \\ \sum t_i & \sum t_i^2 & \sum t_i s_i \\ \sum s_i & \sum t_i s_i & \sum s_i^2 \end{pmatrix}$$

and $\Lambda$ is the vector of unknown GLS adjustment factors obtained from $\Lambda = M^{-1} d$. Notice that the $M$ matrix is based solely on the unweighted sample relationships among schools, teachers and students. This is not an essential feature of our approach; and, indeed, had we used another loss function, a weighted version of the $M$ matrix could have been employed.
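A small numerical sketch of the reweighting just described, using the unweighted cross-product matrix $M$; numpy does the linear algebra and the data are made-up placeholders.

import numpy as np

def gls_weights(w, t, s, N, T, S):
    """New weights u = w + lambda1 + lambda2*t + lambda3*s that reproduce the
    PSS totals (N, T, S) while minimizing sum((u - w)^2), as in the text."""
    w, t, s = map(np.asarray, (w, t, s))
    X = np.column_stack([np.ones_like(w), t, s])       # columns 1, t_i, s_i
    M = X.T @ X                                        # the (unweighted) M matrix
    d = np.array([N - w.sum(), T - (w * t).sum(), S - (w * s).sum()])
    lam = np.linalg.solve(M, d)                        # Lambda = M^{-1} d
    return w + X @ lam

# Tiny illustrative example (made-up numbers).
w = np.array([10.0, 12.0, 8.0, 15.0])
t = np.array([20.0, 35.0, 12.0, 50.0])
s = np.array([300.0, 500.0, 150.0, 800.0])
u = gls_weights(w, t, s, N=47.0, T=1450.0, S=21000.0)
print(u.sum(), (u * t).sum(), (u * s).sum())           # reproduces N, T, S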

2.2 Olkin Modified GLS. Based on concerns about negative weights raised in our pilot application of GLS (Scheuren and Li 1995), it seemed worthwhile to see if a ratio adjustment step could be introduced before the GLS algorithm was employed. An old idea of Olkin (1958) formed our starting point. Assume we have a total, say of student enrollment in the current application. Suppose further, as is the case here, that this is to be estimated from a sample. Following Olkin we tested a multivariate ratio estimator for this total of the form

$$\hat{Y} = a_1 R_1 w_{\cdot} + a_2 R_2 t_{\cdot} + a_3 R_3 s_{\cdot},$$

where the $a_i$ are positive and add to 1, where $w_{\cdot}$, $t_{\cdot}$, and $s_{\cdot}$ are sample totals, as before, and the $R_i$ are the ratios $R_1 = S/N$, $R_2 = S/T$, and $R_3 = S/S$. In our application, the $a_i$ were simply chosen to be equal to one-third; however, a more natural approach would be to select them so as to minimize the variance of $\hat{Y}$. Given the complex sample design of SASS, though, this has been left for the future.

In principle, an Olkin adjustment to the original weights could be produced within whatever domain is desired; then, in order to determine the "new" weight for that domain, all the cases would be adjusted such that they would have new weights $u_i = r w_i$, where the overall ratio $r$ is obtained by taking $\hat{Y}$ and dividing it by the corresponding estimate obtained from the original sample. The intuition is that if the Olkin estimation is first carried out for small (appropriate) subdomains, then there would be a direct benefit from this step in those subdomains; and, among other things, the number of negative weights would be reduced.
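A sketch of the Olkin step with $a_i = 1/3$, followed by the single ratio adjustment of the weights; the PSS totals and sample data below are placeholders.

import numpy as np

def olkin_ratio_weights(w, t, s, N, T, S, a=(1/3, 1/3, 1/3)):
    """Olkin multivariate ratio estimate of the student total, then a single
    ratio adjustment u = r*w so the weighted student total matches it."""
    w, t, s = map(np.asarray, (w, t, s))
    w_dot, t_dot, s_dot = w.sum(), (w * t).sum(), (w * s).sum()   # sample totals
    Y = a[0] * (S / N) * w_dot + a[1] * (S / T) * t_dot + a[2] * (S / S) * s_dot
    r = Y / s_dot                          # overall ratio against the original estimate
    return r * w

w = np.array([10.0, 12.0, 8.0, 15.0])
t = np.array([20.0, 35.0, 12.0, 50.0])
s = np.array([300.0, 500.0, 150.0, 800.0])
u = olkin_ratio_weights(w, t, s, N=47.0, T=1450.0, S=21000.0)
print((u * s).sum())                       # equals the Olkin estimate Y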

2.3 Results of 1993-94 GLS Reweighting. -- For the nine typologies that make up the private school population separate GLS and Olkin GLS reweighting attempts were made. Sometimes this was straightforward; sometimes extremely difficult. In each case the complex nature of the PSS and SASS sample designs was considered, operational problems were documented, and independent comparisons were made to PSS school size and community type information. Measures of benefit and harm could be developed because of the comparisons possible. Extensive tabular, graphical, and analytic material have been looked at in making the assessments required (Scheuren and Li 1996). Table A at the end of this paper summarizes the results.

Our operational assessment of the Olkin GLS adjustment to SASS was judged to be good to excellent. In only one case, that for the Other Unaffiliated typology, was the evidence unclear. We consider this typology "unclear" because the Olkin GLS did not work without a considerable amount of ad hoc tinkering.

Based on the independent assessment by community type and school size, the Olkin GLS seemed to do no apparent harm and may have even been of benefit -- beyond the basic consistency achieved with PSS. However, in the independent assessment we never judged the Olkin GLS as "excellent" because, especially by community type, the Olkin GLS was never best overall. Regularly, it did a "good" job, usually by school size, but even here the performance was less than hoped. In three cases, the Olkin GLS was judged only "fair." These were instances where very mixed results were achieved: Some estimates much improved, others quite adversely impacted.

2.4 Discussion. -- While a detailed comparison was also conducted for the Basic GLS, we have omitted any detailed comment on it here, because generally the Basic GLS was inferior to the Olkin GLS.


Frankly, as already noted, our expectations were really not satisfied, even in the Olkin GLS case. Upon reflection, we feel that part of our problem is that the expectations were misplaced. After all, why should introducing just three totals from PSS make a big improvement in SASS? Conversely, why should such a seemingly small change sometimes be so hard?

3. MASS IMPUTATION ALTERNATIVE

Because the positive benefits of the Olkin GLS were often disappointing, we began to question the entire reweighting approach. In order to avoid negative and small weights we had already set aside a few of the largest schools (about a half dozen to a dozen per typology) to be imputed rather than reweighted. We asked ourselves why not do more than just a few? In fact, why not impute the entire SASS file to the PSS in order to take full advantage of the opportunity that having PSS and SASS fielded for the same year offered? In other words, why not do "mass imputation"?

In particular, let us suppose that mass imputation were to be conducted as part of an overall change in SASS estimation. How would it be done? Suppose, for the sake of discussion, that we had PSS and SASS done in the same year. What would the steps be?

Take a specific typology, "Other Religious Unaffiliated" schools. For the 1993-94 round of SASS, there were 329 schools in the SASS sample with this designation. In the corresponding PSS for the same period, there were 3,193 such schools. The original SASS estimate of students in other religious unaffiliated schools was 462,934. From PSS, the estimate was 37,578 smaller -- at 425,356 students.

An Olkin GLS reweighting approach was taken to this problem to "solve it." However, as noted earlier, we were not satisfied that we had done enough to use the PSS data to improve SASS. If number of students was the major predictive variable, a sensible mass imputation method that could be applied would be to simply impute the SASS records to nearby PSS cases, where nearness is defined simply by student enrollment. For parts of the distribution where the SASS sample is sparse, the SASS observation could be used over and over as a donor, perhaps up to, say, 1.5 times its original SASS weight. (This factor, about 1.5, is clearly arbitrary and depends on how much of a potential variance price one is willing to pay to get the "nearness" desired. In many weighting settings (e.g., Oh and Scheuren 1987), truncating factors under the square root of 2 work well.) Conversely, in parts of the distribution where there were lots of SASS cases relative to those in the PSS, the SASS cases would be used as donors less often than their original SASS weights would suggest. The SASS observation would always be used at least once, of course, to represent itself.

It may be useful to think of choosing a mass imputation approach after successively imputing SASS to each of the PSS variables separately and looking at how often each SASS observation was used as a donor. If this range of donor use is not too large, then a single, perhaps nearest neighbor, imputation model could work well. Widely discrepant values in terms of donor use would suggest that the imputation is sensitive to one's beliefs as to the predictive power of the variables being used in the imputation. In such settings a case can be made for doing several different imputations that might be made available to the final users for possibly different purposes. It may even make sense simply to use some convex combination of the separate imputations.
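A sketch of the single-variable version discussed above: each PSS frame record receives the SASS donor nearest in student enrollment, with donor use capped at roughly 1.5 times the donor's original SASS weight. The cap handling, the brute-force search, and the data layout are illustrative choices for a sketch, not the authors' procedure.

import numpy as np

def mass_impute_by_enrollment(pss_enrollment, sass_enrollment, sass_weight, cap=1.5):
    """Assign each PSS frame record a SASS donor index, nearest by enrollment,
    skipping donors already used about cap times their original SASS weight."""
    pss = np.asarray(pss_enrollment, dtype=float)
    sass = np.asarray(sass_enrollment, dtype=float)
    budget = np.ceil(cap * np.asarray(sass_weight, dtype=float)).astype(int)
    donors = np.empty(len(pss), dtype=int)
    for i, value in enumerate(pss):
        for j in np.argsort(np.abs(sass - value)):     # candidate donors by closeness
            if budget[j] > 0:
                donors[i] = j
                budget[j] -= 1
                break
        else:                                          # every cap exhausted: take nearest
            donors[i] = int(np.argmin(np.abs(sass - value)))
    return donors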

In Kovar and Whitridge (1995), there is an excellent discussion of mass imputation. Among other things, they comment on the parallels that can exist between weighting and imputation. They call attention to the work of Folsom (1981) in this connection. Evidence that imputation model sensitivity can be a serious problem exists, as they point out -- citing Cox and Cohen (1985), among others.

Difficulties exist in calculating variances and covariances when using mass imputation. A multiple imputation approach to their estimation has been advocated (Rubin 1996) and could be workable since, by design, the missing data is missing at random.

In another application, by Wong and Ho (1991), bootstrapping was employed successfully. We think a form of bootstrapping might be the best approach for SASS. The presentation by Kaufman (1996), also given at these meetings, presents related work.

4. POSSIBLE NEXT STEPS

In this paper we have brainstormed at length about possible improvements in SASS that could be undertaken, only one small aspect of which has been summarized in these Proceedings (see Scheuren 1996 for further details). We are not of one mind as to next steps. For one of us, some effort to try out mass imputation seems warranted. Maybe a single typology should be taken and alternative approaches tried out. We both agree that mass imputation could be a lot of fun.

On the other hand, efforts at reweighting SASS have not all been explored, and an additional look at some form of poststratification, using GLS or some other calibration estimator, makes a lot of sense too, especially since it is unlikely that PSS and SASS will ever be fielded again in the same year.


REFERENCES

Burton, R. (1989). Unpublished Memorandum, National Center for Education Statistics.

Colledge, M., Johnson, J., Pare, R., and Sande, I. (1978). "Large Scale Imputation of Survey Data," Proceedings of the Survey Research Methods Section, American Statistical Association.

Cox, B. and Cohen, S. (1985). Methodological Issues for Health Care Surveys, Marcel Dekker: New York.

Deville, J.C., and Särndal, C.E. (1992). "Calibration Estimators in Survey Sampling," Journal of the American Statistical Association, 87, 376-382.

Deville, J.C., Särndal, C.E., and Sautory, O. (1993). "Generalized Raking Procedures in Survey Sampling," Journal of the American Statistical Association, 88, 1013-1020.

Folsom, R. (1981). "The Equivalence of Generalized Double Sampling Regression Estimators, Weight Adjustments and Randomized Hot Deck Imputations," Proceedings of the Survey Research Methods Section, American Statistical Association.

Imbens, G.W. and Hellerstein, J.K. (1993). "Raking and Regression," Discussion Paper Number 1658, Cambridge, MA: Harvard Institute of Economic Research, Harvard University.

Kaufman, S. (1996). Properties of the Schools and Staffing Survey's Bootstrap Estimator in Nearest Neighbor Matching. Paper given at the Chicago meetings of the American Statistical Association.

Kovar, J. and Whitridge, P. (1995). "Imputation of Business Survey Data," in Cox, Binder, Chinnappa, Christianson, Colledge, and Kott, eds., Business Survey Methods, Wiley: New York.

Oh, H.L. and Scheuren, F. (1987). "Modified Raking Ratio Estimation in the Corporate Statistics of Income," Survey Methodology.

Olkin, I. (1958). "Multivariate Ratio Estimation for Finite Populations," Biometrika, 45, 154-165.

Scheuren, F., and Li, B. (1995). Intersurvey Consistency in NCES Private School Surveys. Prepared for U.S. Department of Education, National Center for Education Statistics.

Scheuren, F. and Li, B. (1996). Intersurvey Consistency in NCES Private School Surveys for 1993-94. Prepared for U.S. Department of Education, National Center for Education Statistics.

Whitridge, P., Bureau, M., and Kovar, J. (1990). Mass Imputation at Statistics Canada. U.S. Bureau of the Census, Sixth Annual Research Conference.

Wong, W. and Ho, C. (1991). "Bootstrapping Post-Stratification and Regression Estimates from a Highly Skewed Distribution," 1991 Proceedings of the American Statistical Association, Section on Survey Research Methods.

Table A. Olkin GLS Comparisons to Original Weighted SASS Data, by Typology

SASS Typology                      Operational Assessment    Independent Assessment
Catholic Parochial                 excellent                 good
Catholic Diocesan                  excellent                 fair
Catholic Private                   excellent                 good
Conservative Christian             good                      fair
Other Affiliated                   excellent                 good
Other Unaffiliated                 unclear                   good
Non-sectarian Regular              good                      good
Non-sectarian Special Emphasis     good                      good
Non-sectarian Special Education    good                      fair

Notes: The admittedly subjective conventions employed in Table A were devised to separate typologies by level of perceived difficulty or benefit. This was done as follows:

(1) Operationally, typologies where a simple visual inspection was all that was needed to remove outliers are labelled "excellent" in the operational assessment column.

(2) Typologies labelled "good" operationally were ones where an analytic (potentially iterative) process was required to identify SASS cases that might best be treated by imputation to similar PSS cases rather than being reweighted.

(3) Operationally, only in one case, that of the "Other Unaffiliated" typology, was the label "unclear" used. This was done because constructing the Olkin GLS weights proved enormously difficult and required great patience and persistence. (Parenthetically, this typology may also have been the most instructive in terms of learning more about how to employ the GLS.)

(4) Based on the independent assessment by community type and school size, the Olkin GLS seemed to do no apparent harm and may even have been of benefit, beyond the basic consistency achieved with PSS. The comparisons made are to the original SASS weighted data.

(5) The independent assessment column was never coded "excellent" because, especially by community type, the Olkin GLS was never best overall. Regularly, it did a "good" job, usually by school size, but even here the performance was less than hoped. In three cases, the Olkin GLS was judged only "fair." These were instances where very mixed results were achieved: some estimates much improved, others quite negatively impacted.



Estimating State Totals from the Private School Universe Survey

Easley Hoy, Beverley Causey, Leroy Bailey, Bureau of the Census, and Steven Kaufman, NCES
Easley Hoy, Bureau of the Census, Room 3000-4, Washington, DC 20233

Key Words: Coverage Adjustment, Indirect Estimation, Raking

I. Background and History

The Private School Survey (PSS) is designed and conducted by the Census Bureau for the National Center for Education Statistics to collect data for all private elementary and secondary schools in the 50 states and D.C. Every two years the survey collects data in an attempt to obtain a complete count of all private schools, along with counts of students, teachers, and graduates.

The survey collects data from an administrative list of private schools. To improve the coverage of this list frame, additional lists were obtained from private school associations, state records, and other sources. These lists were matched and unduplicated against the list frame. These operations added about 4900 schools for 1991 and about 2300 schools for 1993. Despite these efforts, the private school list frame remains incomplete, with around 8% of private schools missing from the list. The list enumeration estimates are therefore supplemented by a followup area sample that aims to find and represent unlisted private schools. While direct estimation from the followup area sample produces estimates for unlisted schools of adequate precision for the four geographical regions, it fails to do so for individual states. This paper reports the empirical results of an alternative method for providing estimates of such state totals.

II. Current Methodology from the PSS

For this followup survey, a stratified sample of primary sampling units (PSUs) is drawn with probability proportionate to size (PPS). The PSUs are comprised of counties or groups of counties, and the strata cross state boundaries. Eight of the largest counties are included with certainty, and about 115 PSUs are selected with noncertainty. In each sample PSU, seven different sources (e.g., yellow pages, local government offices, etc.) were used to identify missing schools which were not on the list frame. For 1991, the search identified a total of 355 missing schools, and for 1993 there were 421 such schools identified.

For direct estimation, each "added school" is first multiplied by its sampling weight (the reciprocal of the PSU's probability of selection). Then the weighted schools are added up to the PSU level, summed over PSUs within a state to obtain state totals, and summed over states to obtain the four region totals of the number of schools missed by the list frame. Similar estimates can be obtained for students, teachers, and graduates. Such regional totals have adequate precision, but the state totals depend upon which of a state's PSUs are selected for the followup stratified area sample. For the largest states, there should be no problem, as they will surely have at least one sample PSU with which to estimate their private school undercoverage rate. However, the smaller states may not have a sample PSU in the followup area survey, and thus there would be no estimate of the undercoverage rate for such states. The direct estimation approach does provide an unbiased estimator of the total added schools in an area, and the sampling variance is relatively easily estimated.
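A minimal sketch of the direct estimator just described, assuming a small hypothetical data frame of added schools with their PSU selection probabilities; the column names and values are invented for illustration.

```python
# Direct estimation sketch: weight each added school by 1 / (PSU selection
# probability), then sum within states and regions. Hypothetical data.
import pandas as pd

adds = pd.DataFrame({
    "state":    ["VT", "VT", "WY"],
    "region":   ["Northeast", "Northeast", "West"],
    "psu_prob": [0.20, 0.20, 0.10],      # PSU probability of selection
    "students": [45, 120, 60],
})
adds["weight"] = 1.0 / adds["psu_prob"]

state_totals  = adds.groupby("state")[["weight"]].sum()                       # added schools by state
region_totals = (adds["weight"] * adds["students"]).groupby(adds["region"]).sum()  # added students by region
print(state_totals)
print(region_totals)
```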

III. Proposed Methodology

For the indirect estimator, each sample PSU receives the actual number of unweighted added schools from the followup survey. Each nonsample PSU school gets an upward adjustment based on the sample PSUs' estimates of the add rates.

1. Assume for the moment an across-the-board adjustment. Also suppose that in the area frame sample, the sample PSUs have a total of n schools. For school i, let $w_i$ be the weight associated with the school, and let $y_i = 1$ if school i is an add and $y_i = 0$ if school i is an original (originally on the list frame). Let p be the weighted proportion of school adds,

$$p = \frac{\sum_i w_i y_i}{w_\cdot}, \qquad w_\cdot = \sum_i w_i .$$

Then q = 1 - p is the weighted proportion of original schools. Consequently, 1/q is the ratio of the weighted total of schools to the number of schools in the original listing. Let r = 1/q be the adjustment factor by which each school outside the sample PSUs is revised upward to reflect the undercount. Equivalently, we will be estimating, for each list school outside the sample PSUs, p/q schools on average to be added, since

$$r = \frac{1}{q} = \frac{1}{1-p} = 1 + \frac{p}{q} .$$
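The following toy calculation illustrates the across-the-board adjustment factor with made-up weights and add indicators; these are not the PSS data.

```python
# Worked illustration of the across-the-board adjustment r = 1/q.
import numpy as np

w = np.array([5.0, 5.0, 10.0, 10.0, 2.0])   # sampling weights, hypothetical
y = np.array([1,   0,   0,    1,    0  ])   # add indicator (1 = added school)

p = (w * y).sum() / w.sum()   # weighted proportion of adds
q = 1.0 - p
r = 1.0 / q                   # adjustment applied to schools outside sample PSUs
print(p, q, r)                # p = 0.469, r = 1.882 for these toy numbers
```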

2. Knowledge of Poststrata

Values of p and r can be quite different for different kinds of schools. NCES has identified 9 groups of such schools:



1. Catholic Parochial
2. Catholic Diocesan
3. Catholic Private
4. Conservative Christian
5. Other Religious, Affiliated
6. Other Religious, Unaffiliated
7. Nonsectarian Regular
8. Nonsectarian Special Emphasis
9. Nonsectarian Special Education

Such groupings help to distinguish different coverage patterns for different schools. Therefore we will develop a value of p within each poststratum j. But we can also use the fact that big schools (in number of students) are easier to find than small schools. So rather than just fit a value $p_j$ for poststratum j, we fit a relationship $p_j(x)$ so that a school outside the sample belonging to poststratum j and of size x receives an adjustment factor $r_j(x)$.

Within poststratum j we will drop the subscript j and fit a logistic regression,

$$p_i = \frac{1}{1 + \exp(A + B x_i)},$$

subject to the constraints

$$\sum_{i:\,y_i=0} w_i R_i = \sum_{i:\,y_i=1} w_i, \qquad \sum_{i:\,y_i=0} w_i x_i R_i = \sum_{i:\,y_i=1} w_i x_i, \tag{1}$$

where

$$R_i = \frac{p_i}{q_i} = \frac{1}{\exp(A + B x_i)}, \tag{2}$$

the summations on the left-hand side of (1) run over the original schools ($y_i = 0$), and the summations on the right-hand side run over the added schools ($y_i = 1$). We can now solve for the coefficients A and B to be used for deriving the adjustment factors. The adjustment factor for the ith school is $r_i = 1 + R_i$.
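One way to solve the calibration constraints (1)-(2) numerically is with a generic root finder, as in the hedged sketch below. The weights, school sizes, and starting values are hypothetical, and the authors' actual fitting procedure may differ.

```python
# Sketch: solve the two calibration constraints for A and B with scipy's fsolve.
import numpy as np
from scipy.optimize import fsolve

w = np.array([4.0, 4.0, 6.0, 6.0, 3.0, 3.0])      # weights (hypothetical)
x = np.array([30., 250., 60., 400., 120., 90.])    # school size in students
y = np.array([0,   0,    0,   0,    1,   1])       # 1 = added school

orig, add = (y == 0), (y == 1)

def constraints(params):
    A, B = params
    R = np.exp(-(A + B * x[orig]))                  # R_i = p_i / q_i for originals
    return [np.sum(w[orig] * R)           - np.sum(w[add]),
            np.sum(w[orig] * x[orig] * R) - np.sum(w[add] * x[add])]

A_hat, B_hat = fsolve(constraints, x0=[1.0, 0.0])
r = 1.0 + np.exp(-(A_hat + B_hat * x))              # adjustment factor r_i = 1 + R_i
print(A_hat, B_hat)
```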

Table 1
1993 Coefficients for Linear Adjustment

Poststratum      A        B
    1          4.2144    .6137
    2          4.5463    .0742
    3          1.2842   1.6214
    4          2.9289    .2379
    5           .9181   1.0255
    6          1.6024    .5807
    7          1.6024    .5807
    8           .9071    .6212
    9          1.8757   2.3367

For the estimate of teacher adds, we refit the A values, with the x variable in equation (2) representing teachers. For the estimate of graduates, the data were sparse, so we collapsed across poststrata j into a single stratum.

IV. Results

A. Application Rules of Proposed Methodology for State Totals

The following summarizes the procedure for state estimation in the PSS:

1. A school, its students, teachers, and graduates from a sample PSU are included in the adjusted state counts (including adds) but not in the original count if the school is an add.

2. A school, its students, teachers, and graduates from a nonsample PSU are included in the original count and, multiplied by the factor r, in the adjusted count.

For each state, we derived original and adjusted counts of schools, students, teachers, and graduates; the ratio of students to teachers and the ratio of adjusted to original counts were also computed for each of these categories. In the full paper and the appendix, to be published in a separate document later, the adjustments for all 50 states and D.C. for both 1991 and 1993 will be provided. The range of adjusted values for most states was 8-12% for 1991 and 5-11% for 1993. The 1993 adjustment results for 6 states (California, Indiana, New York, Texas, Vermont, and Wyoming) are provided in Tables A, B, and C at the end of this paper as examples for discussion.

We begin the discussion with Table A, the numbers of private schools, students, teachers, graduates, and the student-teacher ratio, by state, for 1993. Note that for each state the ratio adjustment for schools is greater than that for students and teachers. For California, New York, and Indiana, the ratio factors are near average. For Texas, Vermont, and Wyoming, the ratio factors are above average.


B. Variance Estimates for Proposed Methodology

For the proposed methodology, there are two components of variance. The first component is the sampling error due to the use of the sample to estimate parameters A and B of the model. The second component reflects the variability due to the model. It turns out that the second component is many times larger than the first component.

Table B provides the adjusted estimates with standard deviations reflecting both components of variance.

Note that the standard deviations are small relative to the estimates. Also, the first component is a small proportion of the total error.

C. Raking Adjustment to Regional Totals

For the sake of precision and consistency, we want the regional totals for the proposed method to equal those for the current method.

Table 2
NCES Totals for Regions

             Northeast    Midwest     South       West
Schools      6183         7146        7558        5207
Students     1275924      1309211     1386268     865039
Teachers     94622        81862       105509      56128
Graduates    77513        60547       67842       36965

An area sample PSU's actual count includes the original list plus the unweighted adds. Subtracting these actual counts from the regional totals gives a set of reduced regional totals. For each region we rake the nonsample PSU estimates across the board so that their sum equals the reduced total.
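A toy illustration of this raking step is sketched below; the regional control totals echo Table 2, while the sample-PSU counts and nonsample-PSU estimates are hypothetical.

```python
# Raking sketch: subtract sample-PSU actual counts from the regional control
# totals, then scale every nonsample-PSU estimate in a region by one factor.
region_total       = {"Northeast": 6183, "West": 5207}   # control totals (Table 2)
sample_psu_actual  = {"Northeast": 900,  "West": 760}    # list + unweighted adds (hypothetical)
nonsample_estimate = {"Northeast": 5400, "West": 4650}   # indirect estimates (hypothetical)

raking_factor = {r: (region_total[r] - sample_psu_actual[r]) / nonsample_estimate[r]
                 for r in region_total}
raked = {r: raking_factor[r] * nonsample_estimate[r] for r in region_total}
print(raking_factor)   # factors near 1.0, as in Table 3
print(raked)           # the raked sums reproduce the reduced regional totals
```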


Table 2 above gives the 1993 regional totals based on the complete enumeration plus the followup area sample estimates of the adds. The raking factors associated with the proposed procedure are given in Table 3.

Table 3
Raking Factors for Regions

             Northeast    Midwest     South       West
Schools      .9789        .9725       1.0518      .9557
Students     .9933        .9813       1.0156      .9838
Teachers     .9880        .9796       1.0199      .9862
Graduates    .9962        .9973       1.0044      .9862


Note that the factors are all within 5% of 1.0000. It seems that in the South our indirect estimator slightly overestimated the adds. Finally, for the raked counts by state for 1993 for the proposed method, we have Table C. The table entries, which are referred to as scaled counts, are the adjusted state counts of Table B after the application of the raking procedure. The indicated errors are root mean square errors. The bias component of the estimates was derived from estimates of bias at the regional level, based on differences between totals using the current estimation procedure (unbiased) and the proposed indirect method. These regional bias estimates were proportionately allocated across the respective states.

V. Summary and Recommendations

On the basis of our test results, we recommend the indirect estimation approach to adjust the list frame counts for the PSS, along with the raking to regional totals, to obtain the adjusted state totals. We would use poststratification by NCES typology and size of school rather than an across-the-board adjustment or poststratification based on the sample design. We have made preliminary evaluations of the results of the techniques on 1991 and 1993 data and plan to apply them to 1995 data.




Table A
Numbers of Private Schools, Students, Teachers, Graduates, and Student-Teacher Ratio, by State, for 1993

State / Row        Schools     Students    Teachers    S-T Ratio   Graduates

CALIFORNIA
  Unadjusted       3009        562847      34501       16          23746
  Adjusted         3224        576047      35718       16          24039
  Ratio A/U        1.071609    1.023451    1.035274    0.988580    1.012336

INDIANA
  Unadjusted       619         91985       6138        14          4012
  Adjusted         686         95447       6450        14          4097
  Ratio A/U        1.108312    1.037631    1.050840    0.987429    1.021151

NEW YORK
  Unadjusted       1865        464172      33812       13          25682
  Adjusted         1974        472562      34735       13          26048
  Ratio A/U        1.058298    1.018076    1.027305    0.991016    1.014259

TEXAS
  Unadjusted       1024        186975      14529       12          7424
  Adjusted         1177        199967      15728       12          7787
  Ratio A/U        1.149020    1.069484    1.034274    0.987960    1.048821

VERMONT
  Unadjusted       84          9107        945         9           1081
  Adjusted         98          9648        1021        9           1116
  Ratio A/U        1.165207    1.059432    1.080959    0.980085    1.032137

WYOMING
  Unadjusted       34          1918        167         11          29
  Adjusted         40          2112        192         11          35
  Ratio A/U        1.185055    1.101031    1.148996    0.958255    1.210001


Table B
Adjusted Figures and Standard Deviations, with Percent of Variance Attributed to 1st Component, for 1993

State / Row        Schools     Students    Teachers    S-T Ratio   Graduates

CALIFORNIA
  Adjusted         3224        576047      35718       16          24039
  Stand. Dev.      33          2967        240         0.0373      104
  % 1st Comp.      28.982      15.746      20.568                  6.478

INDIANA
  Adjusted         686         95447       6450        14          4097
  Stand. Dev.      16          1213        96          0.0643      58
  % 1st Comp.      39.289      13.057      16.186                  2.955

NEW YORK
  Adjusted         1974        472562      34735       13          26048
  Stand. Dev.      19          2512        249         0.0337      131
  % 1st Comp.      22.620      8.874       9.750                   4.776

TEXAS
  Adjusted         1177        199967      15728       12          7787
  Stand. Dev.      15          1829        160         0.0359      66
  % 1st Comp.      23.996      11.081      14.921                  4.893

VERMONT
  Adjusted         98          9648        1021        9           1116
  Stand. Dev.      4           272         35          0.0975      38
  % 1st Comp.      13.271      6.618       7.326                   2.258

WYOMING
  Adjusted         40          2112        192         11          35
  Stand. Dev.      2           93          11          0.2242      5
  % 1st Comp.      9.514       3.925       5.630                   1.258



Table C
Scaled Counts, and Student-Teacher Ratio, with Associated Errors, by State, for 1993: Proposed Method

State / Row        Schools     Students    Teachers    S-T Ratio   Graduates

CALIFORNIA
  Scaled Value     3081        566722      35116       16          23707
  Error            140.045     9627.003    636.866     0.563       342.895

INDIANA
  Scaled Value     667         93663       6319        14          4086
  Error            24.163      2117.049    159.529     0.853       58.888

NEW YORK
  Scaled Value     1932        469394      34317       13          25950
  Error            44.925      4016.141    481.261     0.281       163.220

TEXAS
  Scaled Value     1238        203089      16041       12          7821
  Error            66.289      3675.202    359.663     0.367       75.029

VERMONT
  Scaled Value     96          9583        1009        9           1112
  Error            4.818       278.540     36.876      0.737       38.791

WYOMING
  Scaled Value     39          2109        191         11          36
  Error            2.814       92.107      11.296      1.828       5.023


EFFECT OF HIGH SCHOOL PROGRAMS ON OUT-MIGRATION OF RURAL GRADUATES

Gary Huang, Synectics for Management Decisions
Michael P. Cohen, National Center for Education Statistics
Stanley Weng and Fan Zhang, Synectics for Management Decisions
Stanley Weng, Synectics for Management Decisions, Inc., Arlington, VA 22201

Key Words: High School and Beyond, Scale transformation, Multilevel logit modeling, Random intercept, Weighting

This article presents a study which addressed a policy research issue regarding outmigration among rural high school graduates. The study analyzed data from a national longitudinal survey, High School and Beyond (HS&B), conducted by the National Center for Education Statistics, using two-level logit models to generate reliable estimates of organization effects on individual behavior. The study examined the post-school outmigration pattern in connection with students' coursework and schools' curriculum, focusing on the effect of academic versus vocational programs, adjusting for the effects of local labor market conditions and student sociodemographic background. The study also systematically examined the implementation of two-level logit models through the software MLn. Related statistical issues were addressed.

1. Introduction

Rural America has been experiencing chronic depression, economically and psychologically, as a result of more than a century's industrialization and urbanization (Theobald, 1990). Rural-to-urban migration of educated rural youth is a factor leading to economic marginality and community decline in rural America (McGranahan & Ghelfi, 1991; O'Hare, 1988; Reid, 1990). Some analysts say that the existing public school curriculum has contributed to the problems (Berry, 1990; Theobald, 1992; Snauwaert, 1990; Thompson & Kutach, 1990). Schools, they argue, have failed in educating the young to strengthen their community identity and to preserve the environment and community. Typical rural school programs provide predominantly occupational skills, which essentially prepare students for the urban labor market. Thus, rural schools are contributing to the decline of rural areas (Berry, 1990).

Some rural advocates suggest that it is necessary to strengthen liberal arts and humanities education, rather than occupational training, in rural schools (Berry, 1989, 1990). The idea has drawn attention from some rural educators who support strengthening liberal arts and humanities curricula (Theobald, 1992; Snauwaert, 1990; Thompson & Kutach, 1990). They believe that liberal education can nurture the understanding and appreciation of the links between man and the land, local heritage and the global civilization, and help students explore meanings of rural life and maintain their attachment to the community. Only with a strong root in the community is it possible for young people to contribute to community development while achieving personal wellness.

On the other hand, many argue that vocational and technical education oriented toward providing marketable skills to students is essential for local community development as well as for individual earnings (e.g., Klerman & Karoly, 1995; Vandegrift & Danzig, 1993; Muraskin, 1993). In this perspective, vocational/technical training, often developed to fulfill local market needs, not only helps students get jobs but also fosters entrepreneurship that contributes to local economic development (Finch, 1993; Sher, 1977).

Further, vocational and technical education may be particularly helpful for youth who are disadvantaged by socioeconomic background, disabilities, or rural locale, by training them to learn cognitive skills in job-specific contexts (Teixeira & Swaim, 1991). The programs require a shorter period for students to complete than academic programs do, and allow them to promptly apply learned knowledge and skills in their workplace (Muraskin, 1993; Finch, 1993). Such features of vocational/technical education may contribute to graduates' success in the labor market of local areas or elsewhere.

If we are to address problems in rural development, it is critical for research to understand the mechanism by which educated youth move out of rural areas (Teixeira, 1993). The question about the functions of the content of rural education with regard to outmigration calls for systematic research. This study is a response to such research needs. We recognize, however, the differential benefits of specific school programs to different student subgroups, the unique concerns of local communities, and the diverse contexts wherein schooling takes place. The study is not meant to offer a uniform answer to the question of what kind of program is suited to a particular community.

Prior Studies

The research literature on rural outmigration has been largely motivated by economic and demographic concerns. In dealing with the factors responsible for rural outmigration, research has often focused on the job market; life quality; housing conditions; and the demographic background of migrants such as race, age, income, and residential history (e.g., Cromartie, 1993; Sandefur & Jeon, 1991; Voss & Futuitt, 1991; Johnson, 1989; Adamchak, 1987). Educational attainment has been studied as a predictor variable for migration, but the content of schooling in connection with subsequent mobility has rarely been studied. Little is known about the differential effects of secondary school curriculum and student coursework on later geographic mobility.

There is a study available (Pollard, O'Hare & Berg, 1990) that provides a ground for further research. This study examined a series of correlates of post-high school migration with HS&B data, including students' enrollment in different curricular programs. It describes a pattern in which a large portion of outmigrants, relative to those who stayed in the community, were enrolled in high school academic programs. The analysis, however, is not adequate to answer the questions we pose. While dealing with a large number of variables in predicting migration, it did not focus on the impact of the content of schooling, nor did it differentiate the effects of school program provision vis-a-vis individual student coursework.

Building upon this line of research, we examine the issue by focusing on the joint effect of student coursework and school instructional programs in predicting post-school moving, adjusting for the effects of student background and community context. With the two perspectives about school curriculum as a conceptual framework, we analyze data simultaneously at the student and school levels to systematically investigate the differential effects on outmigration of academic and vocational/technical education in terms of both student coursetaking and school program provisions.

The analysis links post-school migration to liberal arts education embodied in the academic curriculum, in contrast to vocational/technical education geared to acquiring occupational skills. Theoretically, academic learning differs from job-skill acquisition in influencing students' willingness to stay in the local community. We consider the school "compositional" effect and the effect of individual students' coursework as two distinct variables contributing to post-school mobility. The compositional effect at the school level can be represented by high enrollment in a particular program. High enrollment in a program reflects the school's emphasis on the given program. A large proportion of students in the academic program often indicates the school's valuing of academic learning and its commitment of resources and staff to the program. Moreover, high enrollment in the academic program probably suggests an overall school atmosphere pressing students toward academic learning. Likewise, high enrollment in vocational/technical programs represents the school's emphasis on job skill training and likely a strong climate that fosters student interest and effort to acquire such skills.

Normally, students who are in academic programs tend to be involved in more liberal arts education (literature, history, math, and science), whereas students in vocational and technical programs are more likely to learn practical, specific occupational skills. Specific coursetaking also should be a factor influencing the amount of students' exposure to liberal arts and vocational learning.

2. Study Design

2.1 Data

We used the HS&B (1980-1992) 1980 sophomore cohort data combined with the senior cohort in order to have adequate within-school student samples. For the 1980 sophomore cohort, we used: (1) the first followup (1982, the senior year) data on coursetaking (from the transcripts file), curriculum program (academic vs. vocational), test scores (composite and 1982 IRT scores on reading and math), and other background measures; and (2) the third followup (1986) data on migration. For the senior cohort, similar data from the base year (1980, the senior year) and the second followup were used. Local economic indicators from the Census component were attached, as school-level variables, to each student record of both cohorts.

Students whose schools contained fewer than 3 students in the sample were removed from the analysis. The final file contains 875 schools with a total of 16,492 students. The within-school sample size averaged 18.8, with a standard deviation of 7.78, ranging from 3 to 48. The student sample represents the 1980 and 1982 public high school seniors in the U.S. who participated in the HS&B followup surveys as specified above. The school sample represents the public high schools to which these students went.

To improve the data quality, we edited the data, including recoding, rescaling, and imputation for missing values. Data were largely compatible between the two cohorts with one exception, the curriculum credits. For sophomores, an official transcripts survey provided valid records of their coursework, and composites for academic and vocational credits were in the file. For seniors, only self-reported coursework is available. To retain the good quality of the sophomore transcript data while making the senior self-reported



credit data comparable, we took the following approach: first, scales for senior self-reported academic and vocational credits were developed; second, both senior and sophomore scales were standardized so that the measures were statistically compatible.
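A minimal sketch of this two-step scaling, assuming a hypothetical student-level data frame with a cohort indicator and raw credit measures; standardizing within cohort is one plausible reading of making the measures statistically compatible, not necessarily the exact procedure used.

```python
# Sketch: z-standardize credit measures within each cohort so the sophomore
# transcript-based and senior self-reported scales are on a comparable metric.
import pandas as pd

def zscore(s: pd.Series) -> pd.Series:
    return (s - s.mean()) / s.std()

def standardize_credits(df: pd.DataFrame) -> pd.DataFrame:
    # df has one row per student, a 'cohort' column in {"sophomore", "senior"},
    # and raw academic/vocational credit measures (hypothetical column names)
    df = df.copy()
    df["acadcrc"] = df.groupby("cohort")["raw_academic_credits"].transform(zscore)
    df["voctcrc"] = df.groupby("cohort")["raw_vocational_credits"].transform(zscore)
    return df
```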

2.2 Variables

The following variables were selected for the analysis.

Outcome variable: Outmigration rate (MIGRATE).

A binary indicator for outmigrant. It is defined by the student-reported residential location of 50 or more miles away from the original community four years after high school completion.

Student-level variables:

Academic coursework measure (ACADCRC). A standardized score from two sources of information: seniors' self-reported coursetaking and sophomores' academic curriculum credits from the transcripts survey file.

Vocational coursework measure (VOCTCRC). Scaled with the same approach as for academic coursework.

Senior-year test score (FUTESTC). A standardized measure of academic achievement.

Parents' education (PCOLLEGE). Used as an indicator of student socioeconomic status because of substantial missingness in the student socioeconomic status composite data.

Minority (MINORITY). A dummy variable (Black, Hispanic, Native American, and others versus otherwise).

School-level variables:

School academic program enrollment rate (ACADEMM).

School vocational program enrollment rate (VOCATM).

Locale indicator of rural versus others (RURAL).

County-level employment growth rate between 1980 and 1982 (CEMPG02). An indicator of local labor market conditions, used as a school-level variable. It was deemed more relevant to post-school outmigration than the other three indicators available from the local economic indicators file: metropolitan area employment growth rate, county per capita income relative to the national average, and metropolitan area per capita income relative to the national average.

Technical handling of the data included:

Imputation. Missing values for most variables were a small portion and were imputed with the sample means. MIGRATE, with approximately 3 percent nonresponse, was examined in relation to race, parents' education, program enrollment, coursework, and local economic indicators. A pattern identified with these predictors was used to generate imputed values. The rate of outmigration among the imputed cases is similar to that of the total sample.

Centering. All student-level variables were centered around their school mean. Centered metrics were used since the level-2 intra-unit correlation was found to be substantial (for the debate on this issue, see Plewis et al., 1989).
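A short sketch of the group-mean centering step, with hypothetical column names:

```python
# Sketch: center student-level variables around their school means.
import pandas as pd

def center_within_school(df: pd.DataFrame, cols, school_id="school_id") -> pd.DataFrame:
    df = df.copy()
    for c in cols:
        df[c + "_c"] = df[c] - df.groupby(school_id)[c].transform("mean")
    return df

# usage: center_within_school(students, ["acadcrc", "voctcrc", "futestc"])
```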

2.3 Weighting. After exploring a number of weighting procedures, we decided not to use any of them, due to the restrictions imposed by the software and logistic regression procedures. The weighting issue will be discussed later, in section 5.1 of this article.

3. Multilevel Logit Modeling

Multilevel logit models (Goldstein, 1995) were used to analyze the data. The software MLn (Rasbash et al., 1995; Woodhouse et al., 1995) was used to implement the models.

For binary responses $y_{ij}$ for student $i$ in school $j$, with outcome probability $\pi_{ij}$, the two-level logit model is described as follows (Goldstein, 1995). At level 1,

$$y_{ij} = \pi_{ij} + z_{ij} e_{ij},$$

where $z_{ij} e_{ij}$ is the level-1 binomial error, $z_{ij} = [\pi_{ij}(1-\pi_{ij})]^{1/2}$, $\sigma^2_{e} = 1$, and

$$\mathrm{logit}(\pi_{ij}) = \log\!\left(\frac{\pi_{ij}}{1-\pi_{ij}}\right)$$

is modeled by a linear predictor that contains student explanatory variables with coefficients containing random components representing the variation of school characteristics. In a baseline model addressing the question of how much the variance of outmigration is across schools, the linear predictor consists of only a random intercept $\beta_{0j}$, with

$$\beta_{0j} = \gamma_{00} + u_{0j}, \tag{1}$$

where $\gamma_{00}$ is the fixed intercept and $u_{0j}$ is the level-2 error associated with the intercept, $u_{0j} \sim N(0, \sigma^2_{u0})$.

A more elaborate random-intercept-only model contains school-level explanatory variables:

$$\beta_{0j} = \gamma_{00} + \gamma_{01}(\mathrm{RURAL}) + \gamma_{02}(\mathrm{CEMPG02}) + \gamma_{03}(\mathrm{ACADEMM}) + \gamma_{04}(\mathrm{VOCATMM}) + \gamma_{05}(\mathrm{RU\_ACADM}) + \gamma_{06}(\mathrm{RU\_VOCTM}) + u_{0j}, \tag{2}$$


addressing such questions as how school-level variables contribute to the average outmigration rate and how they explain the variance of the outmigration rate.

With all five student variables included in the level-1 model, the linear predictor is formed as

$$\beta_{0j} + \beta_{1j}(\mathrm{ACADCRC}) + \beta_{2j}(\mathrm{VOCTCRC}) + \beta_{3j}(\mathrm{FUTESTC}) + \beta_{4j}(\mathrm{PCOLLEGE}) + \beta_{5j}(\mathrm{MINORITY}).$$

For our data, we found that the slopes in the level-1 model do not vary significantly at level 2. Thus we focused on the random-intercept-only model with fixed level-1 covariates.
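The models here were fitted in MLn. As a rough, hedged alternative, a comparable random-intercept logit could be fitted in Python with statsmodels' Bayesian mixed GLM, as sketched below. The variable names follow the paper's labels, but the data frame, the estimator, and the variational approximation are assumptions, not the authors' procedure, and the fit will not reproduce the MLn estimates exactly.

```python
# Sketch: a two-level random-intercept logit fitted with statsmodels
# (variational Bayes), as an approximate stand-in for the MLn model.
import pandas as pd
from statsmodels.genmod.bayes_mixed_glm import BinomialBayesMixedGLM

def fit_random_intercept_logit(df: pd.DataFrame):
    # one random intercept per school: the u_0j term in equation (1)
    vc = {"school": "0 + C(school_id)"}
    model = BinomialBayesMixedGLM.from_formula(
        "migrate ~ rural + cempg02 + academm + vocatmm + ru_acadm + ru_voctm"
        " + acadcrc + voctcrc + futestc + pcollege + minority",
        vc, df)
    return model.fit_vb()   # variational Bayes fit; fit_map() is an alternative

# result = fit_random_intercept_logit(students)
# print(result.summary())
```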

Table 1 presents the estimates from two random-intercept models: without and with level-1 covariates.

Table 1. Estimates of Two-level Random-Intercept Logit Models (standard error in parentheses): The sample of the 1980 and 1982 seniors of US public high schools

Estimate                     Without student     With student
                             variables           variables
Fixed effects
  School mean outmigrt.      1.160 (.134)        -1.245 (.140)
  RURAL                       .499 (.053)          .809 (.214)
  CEMPG02                    -.009 (.004)         -.009 (.004)
  ACADEMM                     .806 (.206)          .794 (.215)
  VOCATMM                    -.911 (.228)         -.893 (.239)
  RU_ACADM                   -.578 (.353)         -.655 (.370)
  RU_VOCTM                   -.135 (.385)         -.182 (.403)
  ACADCRC                                          .217 (.023)
  VOCATCRC                                        -.128 (.021)
  FUTESTC                                          .041 (.003)
  PCOLLEGE                                         .469 (.042)
  MINORITY                                         .042 (.050)
Random effects
  School mean outmigrt.       .202 (.023)          .232 (.025)

Log likelihood               19747.4              18304.2

Analysis of data on non-college goers

Findings from the two analyses are largely consistent. Similar to the pattern found in the total sample analysis, random-intercept models with the non-college goer data confirm that outmigration is positively related to school academic curriculum and student academic coursework (in the model with level-1 covariates, the two estimates are respectively .794 and .217 in logit), and negatively related to school vocational program enrollment and student vocational coursework (respectively, -.449 and -.131). The two analyses also generate similar estimates of the effects of other variables at both levels.

4. Discussion

This study systematically examined the effects of public school curriculum on rural youth outmigration with HS&B longitudinal data. Two-level random-intercept logistic regression models were tested, taking the student-level logit of outmigration and school average outmigration as the outcome variables in simultaneous analyses. We decomposed the effects into school- and individual-level components, controlled for the effects of local employment conditions and student test scores and sociodemographic background, and introduced interaction terms to determine the specific effects of curriculum on rural school average outmigration.

The analyses revealed that (1) school average outmigration is positively related to the school emphasis on academic programs and negatively related to the school emphasis on vocational programs; (2) the probability of student outmigration is positively related to students' academic coursework and negatively related to students' vocational coursework; (3) these relationships hold in the subsample of youth who did not go to college in the four years after high school completion, as well as in the total sample; and (4) overall, curriculum effects do not differ in rural schools vis-a-vis schools elsewhere, except that among the subsample of non-college goers, school academic emphasis seems to reduce rural school average migration in a marginal fashion. In brief, vocational education seems to work better than academic programs in retaining youth in rural areas.

As discussed earlier, the findings from this study are intended to further our general understanding of the functions of public school curriculum in connection with postsecondary student mobility in rural areas. They cannot be applied to assessing the value of specific programs in local schools. An implication is that vocational and technical education programs tailored to local economic needs are capable of serving rural communities in retaining educated youth and consequently contributing to social and economic development in rural areas. An emerging consensus seems to favor curriculum integration that encompasses cooperative learning, experiential education, and outdoor education, in addition to both conventional academic and vocational education.

Future research may be directed to look at more specific elements of school curriculum content in explaining student postsecondary mobility. Within large categories of programs, subgroups of courses that are more relevant to the needs of local development may be identified and examined. For example, rural areas with tourist resources may benefit from educational programs that concentrate on tourist business, whereas areas with correctional facilities may need criminal justice and other related training programs.

Further, innovative approaches to developing community-oriented curricula should be studied. Community-based curriculum appears promising. Teaching and learning about local communities, including historical and cultural studies of the place and practical research into local problems and solutions, is an effective way to orient students to community needs. Such programs help students understand their communities and acquire skills useful for improving local conditions. More data on such innovative programs are needed for research.

The rapid development of telecommunication technology, for example, is making it possible for professionals to work in places that are remote from urban centers. Rural areas may anticipate opportunities for growth thanks to such development. In responding to such changes, schools are providing increasingly more courses on new technology. This sort of program enables students to enter the labor market with competitive skills or to pursue postsecondary education with good exposure to updated technology. The changing curriculum is likely to contribute to retaining educated youth in rural areas. How and to what extent this actually happens is an issue that needs further study.

Finally, it is also interesting to focus on those who had extensive exposure to academic education yet took vocational programs as the basis for post-high school career choice. This group, compared to those who merely took vocational studies and missed a large amount of liberal arts courses, perhaps would attain marketable skills while managing to stay in the community after high school graduation. We need to conceptualize such issues and examine them with empirical data.

5. Statistical Issues

In this study, with the implementation of multilevel logit models, related statistical issues in weighting and model assessment were addressed methodologically and empirically.

5.1 Weighting

Weighting for survey data in multilevel logit modeling was an issue (Skinner, 1996). A literature review showed there were few studies reported in multilevel logit modeling of binary responses, and none of them involved the use of weighted models (McArdle and Hamagami, 1994; Rodriguez and Goldman, 1995).

Both MLn and HLM (Bryk et al., 1996), another widely used software package for multilevel modeling, can perform weighted multilevel linear modeling, though they have different designs for the weighting procedure. However, neither MLn nor HLM is ready to perform weighted multilevel logit modeling. In fact, HLM's newly available procedure for hierarchical generalized linear modeling (HGLM) does not contain a weighting device. And for MLn, our examination showed its weighting procedure is also applicable only to linear modeling. We thus decided to use unweighted models for this study.

In an attempt to perform weighted multilevel logit modeling, we explored ways of adapting MLn's weighting device to implement weighted logit models. A promising strategy is to use weights as frequencies for modeling binary data in the event/trial model syntax (SAS, 1992).

5.2 Intraclass Correlation

It is desirable to have a simple measure assessing the effect of clustering in the context of the multilevel logit model, as there is for the multilevel linear model. We derived the intraclass correlation associated with a random-intercept-only logit model, as an analogue of the intraclass correlation for the multilevel linear model (Goldstein, 1995; Bryk and Raudenbush, 1992), using the technique of linearization (Rodriguez and Goldman, 1995). It takes the following form:

$$\rho = \frac{\sigma^2_{u0}}{\sigma^2_{u0} + \left[\mu_0(1-\mu_0)\right]^{-1}},$$

where $\sigma^2_{u0}$ is the level-2 variance associated with the intercept and $\mu_0 = \mathrm{logit}^{-1}(\gamma_{00})$, where $\gamma_{00}$ is the fixed intercept of the logit model. We applied this measure to our data. The intraclass correlation is calculated as 0.054 for a simple random-intercept-only model (not containing school variables). Thus the level-2 variance accounts for about 5 percent of the total variance. This assessment is consistent with the models we obtained.
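A small numeric check of this formula, using illustrative values of the level-2 variance and fixed intercept (not the paper's exact simple-model estimates, which are not reported here):

```python
# Illustrative intraclass correlation for a random-intercept-only logit model.
import math

def icc_logit(sigma2_u0: float, gamma00: float) -> float:
    mu0 = 1.0 / (1.0 + math.exp(-gamma00))     # mu0 = inverse logit of gamma00
    level1_var = 1.0 / (mu0 * (1.0 - mu0))     # linearized level-1 variance
    return sigma2_u0 / (sigma2_u0 + level1_var)

print(icc_logit(0.31, -1.2))   # roughly 0.05, of the order reported in the text
```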

References

Adamchak, D.J. (1987). "Further Evidence on Economic and Noneconomic Reasons for Turnaround Migration," Rural Sociology, 52, 1, 108-118.

Aitkin, M., Anderson, D., Francis, B., and Hinde, J. (1989). Statistical Modeling in GLIM. Oxford: Oxford University Press.


Berry, W. (1990). What Are People For? San Francisco: North Point Press.

Berry, W. (1989). The Hidden Wound. San Francisco: North Point Press.

Bryk, A.S. and Raudenbush, S.W. (1992). Hierarchical Linear Models: Applications and Data Analysis Methods. London: Sage.

Bryk, A.S., Raudenbush, S.W., and Congdon, R.T. (1996). HLM: Hierarchical Linear and Nonlinear Modeling with the HLM/2L and HLM/3L Programs. Chicago, IL: Scientific Software International.

Cromartie, J.B. (1993). "Leaving the Countryside: Young Adults Follow Complex Migration Patterns," Rural Development Perspectives, 8, 2, 22-27.

DeYoung, A.J. (1987). "The Status of American Rural Education Research: An Integrated Review and Commentary," Review of Educational Research, 57(2), 123-148.

Gober, P. (1993). "Americans on the Move," Population Bulletin, 48, 32-40.

Goldstein, H. (1995). Multilevel Statistical Models, 2nd edition. London: Edward Arnold.

Johnson, K.M. (1989). "Recent Population Redistribution Trends in Nonmetropolitan America," Rural Sociology, 54, 3, 301-326.

Lee, V.E. and Bryk, A.S. (1989). "A Multilevel Model of the Social Distribution of High School Achievement," Sociology of Education, 62, 2, 172-192.

McArdle, J.J. and Hamagami, F. (1994). "Logit and Multilevel Logit Modeling of College Graduation for 1984-1985 Freshman Student-Athletes," Journal of the American Statistical Association, 89, 1107-1123.

O'Hare, W. (1988). The Rise of Poverty in Rural America. Washington, DC: Population Reference Bureau. ERIC Document Reproduction No. ED 302 350.

Paterson, L. (1992). "The Influence of Opportunity on Aspirations amongst Prospective University Entrants from Scottish Schools, 1970-1988," Journal of the Royal Statistical Society, A, 155, 37-60.

Pollard, K., O'Hare, W., & Berg (1991). "Selective Migration of Rural High School Seniors in the 1980s," Staff Working Papers. Washington, DC: Population Reference Bureau, Inc.

Rasbash, J., Woodhouse, G., Goldstein, H., Yang, M., Howarth, J., and Plewis, I. (1995). MLn Command Reference. London: Institute of Education, University of London.


Reid, J. (1990). "Education and Rural Development: A Review of Recent Evidence," presented at the annual meeting of the American Educational Research Association, Boston, MA.

Rodriguez, G. and Goldman, N. (1995). "An Assessment of Estimation Procedures for Multilevel Models with Binary Responses," Journal of the Royal Statistical Society, A, 158, Part 1, 73-89.

Sandefur, G.D. & Jeon, J. (1991). "Migration, Race and Ethnicity, 1960-1980," International Migration Review, 25, 2, 392-407.

SAS (1992). SAS/STAT User's Guide, Version 6, Fourth Edition. Cary, NC: SAS Institute.

Skinner, C. (1996). Regression Models for Complex Survey Data, presented at the Joint Program in Survey Methodology, University of Maryland at College Park.

Snauwaert, D.T. (1990). "Wendell Berry, Liberalism, and Democratic Theory: Implications for the Rural School," Peabody Journal of Education, 67, 4, 118-130.

Theobald, P. (1992). "Rural Philosophy for Education: Wendell Berry's Tradition," ERIC Digest. Charleston, WV: ERIC Clearinghouse on Rural Education and Small Schools. ERIC Document Reproduction No.

Theobald, P. (1990). "A Look at Rural Education in the United States," Peabody Journal of Education, 67, 4, 1-6.

Thompson, P.B. & Kutach, D.N. (1990). "Agricultural Ethics in Rural Education," Peabody Journal of Education, 67, 4, 131-153.

Voss, P.R. & Futuitt, G.V. (1991). "The Impact of Migration on Southern Rural Areas of Chronic Depression," Rural Sociology, 56, 4, 660-679.

Woodhouse, G., Rasbash, J., Goldstein, H., Yang, M., Howarth, J., and Plewis, I. (1995). A Guide to MLn for New Users. London: Institute of Education, University of London.


Listing of NCES Working Papers to Date

Please contact Ruth R. Harris at (202) 219-1831 if you are interested in any of the following papers.

Number          Title (Contact)

94-01 (July)    Schools and Staffing Survey (SASS) Papers Presented at Meetings of the American Statistical Association (Dan Kasprzyk)
94-02 (July)    Generalized Variance Estimate for Schools and Staffing Survey (SASS) (Dan Kasprzyk)
94-03 (July)    1991 Schools and Staffing Survey (SASS) Reinterview Response Variance Report (Dan Kasprzyk)
94-04 (July)    The Accuracy of Teachers' Self-reports on their Postsecondary Education: Teacher Transcript Study, Schools and Staffing Survey (Dan Kasprzyk)
94-05 (July)    Cost-of-Education Differentials Across the States (William Fowler)
94-06 (July)    Six Papers on Teachers from the 1990-91 Schools and Staffing Survey and Other Related Surveys (Dan Kasprzyk)
94-07 (Nov.)    Data Comparability and Public Policy: New Interest in Public Library Data Papers Presented at Meetings of the American Statistical Association (Carrol Kindel)
95-01 (Jan.)    Schools and Staffing Survey: 1994 Papers Presented at the 1994 Meeting of the American Statistical Association (Dan Kasprzyk)
95-02 (Jan.)    QED Estimates of the 1990-91 Schools and Staffing Survey: Deriving and Comparing QED School Estimates with CCD Estimates (Dan Kasprzyk)
95-03 (Jan.)    Schools and Staffing Survey: 1990-91 SASS Cross-Questionnaire Analysis (Dan Kasprzyk)
95-04 (Jan.)    National Education Longitudinal Study of 1988: Second Follow-up Questionnaire Content Areas and Research Issues (Jeffrey Owings)
95-05 (Jan.)    National Education Longitudinal Study of 1988: Conducting Trend Analyses of NLS-72, HS&B, and NELS:88 Seniors (Jeffrey Owings)

Listing of NCES Working Papers to Date--Continued

Number          Title (Contact)

95-06 (Jan.)    National Education Longitudinal Study of 1988: Conducting Cross-Cohort Comparisons Using HS&B, NAEP, and NELS:88 Academic Transcript Data (Jeffrey Owings)
95-07 (Jan.)    National Education Longitudinal Study of 1988: Conducting Trend Analyses HS&B and NELS:88 Sophomore Cohort Dropouts (Jeffrey Owings)
95-08 (Feb.)    CCD Adjustment to the 1990-91 SASS: A Comparison of Estimates (Dan Kasprzyk)
95-09 (Feb.)    The Results of the 1993 Teacher List Validation Study (TLVS) (Dan Kasprzyk)
95-10 (Feb.)    The Results of the 1991-92 Teacher Follow-up Survey (TFS) Reinterview and Extensive Reconciliation (Dan Kasprzyk)
95-11 (Mar.)    Measuring Instruction, Curriculum Content, and Instructional Resources: The Status of Recent Work (Sharon Bobbitt & John Ralph)
95-12 (Mar.)    Rural Education Data User's Guide (Samuel Peng)
95-13 (Mar.)    Assessing Students with Disabilities and Limited English Proficiency (James Houser)
95-14 (Mar.)    Empirical Evaluation of Social, Psychological, & Educational Construct Variables Used in NCES Surveys (Samuel Peng)
95-15 (Apr.)    Classroom Instructional Processes: A Review of Existing Measurement Approaches and Their Applicability for the Teacher Follow-up Survey (Sharon Bobbitt)
95-16 (Apr.)    Intersurvey Consistency in NCES Private School Surveys (Steven Kaufman)
95-17 (May)     Estimates of Expenditures for Private K-12 Schools (Stephen Broughman)
95-18 (Nov.)    An Agenda for Research on Teachers and Schools: Revisiting NCES' Schools and Staffing Survey (Dan Kasprzyk)
96-01 (Jan.)    Methodological Issues in the Study of Teachers' Careers: Critical Features of a Truly Longitudinal Study (Dan Kasprzyk)

Listing of NCES Working Papers to Date--Continued

Number          Title (Contact)

96-02 (Feb.)    Schools and Staffing Survey (SASS): 1995 Selected Papers Presented at the 1995 Meeting of the American Statistical Association (Dan Kasprzyk)
96-03 (Feb.)    National Education Longitudinal Study of 1988 (NELS:88) Research Framework and Issues (Jeffrey Owings)
96-04 (Feb.)    Census Mapping Project/School District Data Book (Tai Phan)
96-05 (Feb.)    Cognitive Research on the Teacher Listing Form for the Schools and Staffing Survey (Dan Kasprzyk)
96-06 (Mar.)    The Schools and Staffing Survey (SASS) for 1998-99: Design Recommendations to Inform Broad Education Policy (Dan Kasprzyk)
96-07 (Mar.)    Should SASS Measure Instructional Processes and Teacher Effectiveness? (Dan Kasprzyk)
96-08 (Apr.)    How Accurate are Teacher Judgments of Students' Academic Performance? (Jerry West)
96-09 (Apr.)    Making Data Relevant for Policy Discussions: Redesigning the School Administrator Questionnaire for the 1998-99 SASS (Dan Kasprzyk)
96-10 (Apr.)    1998-99 Schools and Staffing Survey: Issues Related to Survey Depth (Dan Kasprzyk)
96-11 (June)    Towards an Organizational Database on America's Schools: A Proposal for the Future of SASS, with Comments on School Reform, Governance, and Finance (Dan Kasprzyk)
96-12 (June)    Predictors of Retention, Transfer, and Attrition of Special and General Education Teachers: Data from the 1989 Teacher Followup Survey (Dan Kasprzyk)
96-13 (June)    Estimation of Response Bias in the NHES:95 Adult Education Survey (Steven Kaufman)
96-14 (June)    The 1995 National Household Education Survey: Reinterview Results for the Adult Education Component (Steven Kaufman)

Listing of NCES Working Papers to Date--Continued

Number          Title (Contact)

96-15 (June)    Nested Structures: District-Level Data in the Schools and Staffing Survey (Dan Kasprzyk)
96-16 (June)    Strategies for Collecting Finance Data from Private Schools (Stephen Broughman)
96-17 (July)    National Postsecondary Student Aid Study: 1996 Field Test Methodology Report (Andrew G. Malizio)
96-18 (Aug.)    Assessment of Social Competence, Adaptive Behaviors, and Approaches to Learning with Young Children (Jerry West)
96-19 (Oct.)    Assessment and Analysis of School-Level Expenditures (William Fowler)
96-20 (Oct.)    1991 National Household Education Survey (NHES:91) Questionnaires: Screener, Early Childhood Education, and Adult Education (Kathryn Chandler)
96-21 (Oct.)    1993 National Household Education Survey (NHES:93) Questionnaires: Screener, School Readiness, and School Safety and Discipline (Kathryn Chandler)
96-22 (Oct.)    1995 National Household Education Survey (NHES:95) Questionnaires: Screener, Early Childhood Program Participation, and Adult Education (Kathryn Chandler)
96-23 (Oct.)    Linking Student Data to SASS: Why, When, How (Dan Kasprzyk)
96-24 (Oct.)    National Assessments of Teacher Quality (Dan Kasprzyk)
96-25 (Oct.)    Measures of Inservice Professional Development: Suggested Items for the 1998-1999 Schools and Staffing Survey (Dan Kasprzyk)
96-26 (Nov.)    Improving the Coverage of Private Elementary-Secondary Schools (Steven Kaufman)
96-27 (Nov.)    Intersurvey Consistency in NCES Private School Surveys for 1993-94 (Steven Kaufman)

Listing of NCES Working Papers to Date--Continued

Number          Title (Contact)

96-28 (Nov.)    Student Learning, Teaching Quality, and Professional Development: Theoretical Linkages, Current Measurement, and Recommendations for Future Data Collection (Mary Rollefson)
96-29 (Nov.)    Undercoverage Bias in Estimates of Characteristics of Adults and 0- to 2-Year-Olds in the 1995 National Household Education Survey (NHES:95) (Kathryn Chandler)
96-30 (Dec.)    Comparison of Estimates from the 1995 National Household Education Survey (NHES:95) (Kathryn Chandler)
97-01 (Feb.)    Selected Papers on Education Surveys: Papers Presented at the 1996 Meeting of the American Statistical Association (Dan Kasprzyk)
