1
Data Linkage for Educational Research
Royal Statistical Society March 19th 2007
Andrew Jenkins and Rosalind Levačić
Institute of Education, University of London
2
Examples of Data Linkage
• (1) Data Linkage with the Longitudinal Survey of Young People in England (LSYPE) and the National Pupil Database (NPD)
• (2) Linking NPD to a survey of student experiences at school for evaluation of Diversity Pathfinders
3
Structure of presentations
Introduce the datasets used
Outline why data linkage was useful/important
How the datasets were combined
Any practical problems which arose in linking data
Methodological issues in using linked data
4
Main aim of research project (1)
To use data from first wave of Longitudinal Survey of Young People in England (LSYPE), combined with other datasets, to try to separate out effects of family background and neighbourhood on students’ attainment.
5
Value of Data Linkage
• Richer and more detailed models
– e.g. administrative data may include little about pupil background
• Better control variables
– e.g. controlling for family background factors when modelling neighbourhood effects on attainment
6
Datasets to be combined:
Pupil level: LSYPE; NPD
School level: Edubase; Annual School Census
Neighbourhood level: area variables from 2001 Census; Indices of Deprivation
7
Longitudinal Survey of Young People in England
• Begins at age 14, in 2004
• Annual Interviews until age 25
• Currently only wave 1 data available
• Includes interviews with young person and parent/adult
8
LSYPE variables: some examples
Family: siblings, mother’s education, mother’s occupation, single parent household, state benefits/tax credit
Pupil: attitudes to school, homework, future plans, risk factors (e.g. in contact with police, truanting)
Parent: expectations for child’s education, helping with homework, family joint activities, parent involvement in school
9
Overview of National Pupil Database (NPD)
• Information on all state school pupils in England
• Includes national test score results
• It is longitudinal
– Pupils can be tracked through Key Stages
• NPD includes Pupil Level Annual School Census (PLASC)
– PLASC provides pupil background data e.g. ethnicity, SEN
• NPD owned by DfES who manage access to the data
10
Variables and data
National Pupil Database
Pupil level variables: Key Stage 3 scores and Key Stage 2 prior attainment in maths, English, science; gender, SEN
Family variables: FSM eligibility, ethnicity, EAL
11
Neighbourhood variables
Census area variables (examples)                 Indices of Deprivation (examples)
Proportion unemployed                            Employment deprivation score
Proportion lone parent households                Income deprivation score
Proportion with level 1 or lower qualification   Skills deprivation score
Proportions from various ethnic groups           Children’s educational deprivation score
12
Linking pupils and schools
• DfES provided us with linked LSYPE/NPD data
• Linkage to school-level data using LEA and Establishment numbers
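A minimal sketch of the school-level linkage, assuming each pupil record carries the LEA and Establishment numbers of its school. The field names (`pupil_id`, `lea`, `estab`, `school_type`) and all values are hypothetical and are not taken from the LSYPE, NPD or Edubase files.

```python
# Hypothetical pupil and school records; field names and values are illustrative only.
pupils = [
    {"pupil_id": 1, "lea": 201, "estab": 4001},
    {"pupil_id": 2, "lea": 201, "estab": 4002},
    {"pupil_id": 3, "lea": 305, "estab": 4001},  # no matching school below
]
schools = [
    {"lea": 201, "estab": 4001, "school_type": "Community"},
    {"lea": 201, "estab": 4002, "school_type": "Foundation"},
]


def link_pupils_to_schools(pupils, schools):
    """Attach school-level fields to each pupil using (LEA, Establishment) as the key."""
    # Assumption: the establishment number is only unique within an LEA,
    # so both numbers together form the linkage key.
    school_index = {(s["lea"], s["estab"]): s for s in schools}
    linked, unlinked = [], []
    for p in pupils:
        school = school_index.get((p["lea"], p["estab"]))
        if school is None:
            unlinked.append(p)  # keep a record of pupils that fail to link
        else:
            linked.append({**school, **p})
    return linked, unlinked


linked, unlinked = link_pupils_to_schools(pupils, schools)
```

Returning the unlinked pupils alongside the linked ones makes it straightforward to report how many cases were lost at this step.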
13
Combining pupil and neighbourhood data
• National Pupil Database includes pupil postcodes
• Census data and Indices of Deprivation linked to the National Pupil Database using National Statistics Postcode Directory (NSPD, formerly AFPD)
• The NSPD provides a look-up between postcodes and various administrative geography codes
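The two-step postcode linkage described above can be sketched as follows. The look-up table stands in for the NSPD; the postcodes, area codes, variable names and values are all made up for illustration, and the normalisation rule is an assumption about how postcodes might be formatted in the two files.

```python
def normalise_postcode(postcode):
    """Upper-case and remove all spaces so differently formatted postcodes compare equal."""
    return "".join(postcode.split()).upper()


# Hypothetical look-up rows (postcode -> area code) standing in for the NSPD.
postcode_to_area = {
    normalise_postcode("AB1 2CD"): "AREA001",
    normalise_postcode("EF3 4GH"): "AREA002",
}

# Hypothetical area-level variables keyed by the same area code.
area_variables = {
    "AREA001": {"prop_unemployed": 0.05},
    "AREA002": {"prop_unemployed": 0.11},
}


def attach_area_variables(pupil, lookup, area_vars):
    """Return the pupil record with area variables added, or None if either step fails."""
    area = lookup.get(normalise_postcode(pupil["postcode"]))
    if area is None or area not in area_vars:
        return None
    return {**pupil, "area_code": area, **area_vars[area]}


matched = attach_area_variables(
    {"pupil_id": 1, "postcode": "ab12cd"}, postcode_to_area, area_variables
)
unmatched = attach_area_variables(
    {"pupil_id": 2, "postcode": "ZZ9 9ZZ"}, postcode_to_area, area_variables
)
```

A pupil drops out if the postcode is absent from the look-up or if the resulting area code has no area-level data, which is one way cases can fail to merge with neighbourhood data.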
14
Some problems in using linked data
• Reductions in sample size
– NPD has approx. 0.5 million cases per year
– LSYPE has sample size of around 15,000
• Missing data
• Representativeness of data which does link successfully
• Getting access to linked data
15
Outcomes of data linkage with LSYPE
                                         N       %
Total sample in LSYPE (Wave 1)           15,770  100.0
Did not merge with neighbourhood data    838     5.3
Did not merge with school data           675     4.3
Remaining cases                          14,257  90.4
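The figures in this table can be checked with a few lines of arithmetic. The sketch below uses only the counts reported on the slide; the variable names are illustrative. The subtraction treats the two unmatched groups as non-overlapping, which is what the reported total implies.

```python
total = 15770            # total sample in LSYPE (Wave 1)
no_neighbourhood = 838   # did not merge with neighbourhood data
no_school = 675          # did not merge with school data

# Cases that survive both merges.
remaining = total - no_neighbourhood - no_school


def pct(n, base=total):
    """Percentage of the base sample, rounded to one decimal place."""
    return round(100 * n / base, 1)


shares = {
    "no_neighbourhood": pct(no_neighbourhood),
    "no_school": pct(no_school),
    "remaining": pct(remaining),
}
```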
16
Linking a pupil survey to National Pupil Database: Diversity Pathfinders Project
Purpose of survey: to collect data as part of a 3.5 year evaluation of Diversity Pathfinders (2002-2006).
Six Local Authorities were provided with some funding by DfES to promote collaboration between groups of secondary schools, with the purpose of raising standards and promoting diversity through attaining specialist status.
Largely a qualitative study using interviews and some participant observation, supplemented by an analysis of examination performance and a survey of students’ views and experiences ‘before’ and ‘after’ three years.
17
DP research design
Since DP was ‘pathfinding’ it was not a uniform treatment with controls.
By intention each LA developed its own approach and own way of selecting and grouping schools for collaboration within the DP project.
The research team selected 31 schools as case studies, for which evidence was collected through interviews.
These schools were also the ones selected for a survey.
Each school selected one mixed ability Year 11 form to respond to the survey on-line.
18
Purpose of the DP student survey
To establish:
• how did students rate aspects of their learning experience?
• did students in 2005/6 rate their learning experiences better than those in 2002/3, especially with regard to increased working with students from other schools?
• did students’ learning experiences differ by school and by student characteristics?
• did more disadvantaged students have an improved learning experience after 3 years of DP?
19
Use of National Pupil Database
NPD provides data on students’
• prior attainment (KS2 and KS3)
• gender
• ethnicity
• special educational needs
• eligibility for free school meals
• English as an additional language
20
Advantages of data linkage
Obtaining data on student characteristics without needing to ask intrusive questions on the survey or extend length of questionnaire;
Did not need to ask the schools to supply the data, which would have added to the burden of the survey on schools and further reduced the response rate.
21
Mechanics of achieving data linkage between DP survey and NPD
NPD consists of Pupil Level Annual Census plus test results.
Each pupil has a Unique Pupil Number (UPN) used by the school when reporting data to DfES.
We needed the schools to give us the UPNs of the students in the form doing the survey, plus date of birth (DoB) in case it was needed for matching.
UPNs are highly confidential, so DfES sent a letter to schools requesting them.
Problem: getting UPNs out of each school. 28 schools responded in 2002/3 but only 16 in 2005/6.
22
How UPNs used
Each pupil given a DP project identifier number which was attached to a questionnaire.
At school pupil used id number to download own questionnaire.
NPD uses matching pupil reference number.
We sent UPNs to DfES and they matched with pupil reference number and sent us matched NPD data for these students.
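The chain of identifiers described on this slide can be sketched as two look-ups: project pupil id to UPN (supplied by schools), then UPN to matching pupil reference number (held by DfES). The function names, field names and every identifier value below are hypothetical; in the project the second look-up was carried out by DfES, not by the research team.

```python
def build_crosswalk(pupilid_to_upn, upn_to_pmr):
    """Return (crosswalk, unmatched): crosswalk maps PMR -> project pupil id."""
    crosswalk, unmatched = {}, []
    for pupilid, upn in pupilid_to_upn.items():
        pmr = upn_to_pmr.get(upn)
        if pmr is None:
            unmatched.append(pupilid)  # UPN could not be matched to a PMR
        else:
            crosswalk[pmr] = pupilid
    return crosswalk, unmatched


def link_survey_to_npd(survey, npd_extract, crosswalk):
    """Join survey responses (keyed by pupil id) to NPD rows (keyed by PMR)."""
    linked = []
    for pmr, npd_row in npd_extract.items():
        pupilid = crosswalk.get(pmr)
        if pupilid in survey:
            linked.append({"pupilid": pupilid, **survey[pupilid], **npd_row})
    return linked


# Illustrative values only; none of these are real identifiers or responses.
pupilid_to_upn = {101: "UPN_A", 102: "UPN_B", 103: "UPN_C"}
upn_to_pmr = {"UPN_A": "PMR_1", "UPN_B": "PMR_2"}
survey = {101: {"satisfaction": 4}, 102: {"satisfaction": 2}, 103: {"satisfaction": 5}}
npd_extract = {"PMR_1": {"fsm": False}, "PMR_2": {"fsm": True}}

crosswalk, unmatched = build_crosswalk(pupilid_to_upn, upn_to_pmr)
linked = link_survey_to_npd(survey, npd_extract, crosswalk)
```

Keeping the UPN out of the final linked records, as here, means the analysis file carries only the project identifier.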
23
Linking UPN, matching pupil reference number and DP survey pupil identity number
(example: not actual data)
pupilid   UPN             PMR
104184    J330414491063   CCF850CD35D9B8F0DD
104205    B330714491039   CCF850CE37D9BFFEDD
104212    F330414491095   CCF850CD30DCBFF9DD
104245    N330414491094   CCF850CE31D3BEF1DD
104247    F330414561003   CCF850CF31D8BBF8AA
104259    F770414491025   CCF850CF30D3BCF1BB
104279    A330414541014   CCF850CA3CDEB5ECE
104289    A359414491041   CCF850CC33DAB4FBH2
104272    D330514491011
105057    B330505781077   CCF850CA3CD9B5FDD5
105047    G330325791127   CCF850CE34D3J8F1D6
105088    A336405791079   CCF850CA3CDCB4G3B3
24
Methodological issues
Missing data:
• from schools that do not supply UPNs
• due to non-matching of UPNs and PMRs
• due to missing data in NPD
Raises questions about how representative the data are.
Inconsistent data between DP survey and NPD: gender in some cases.
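One way to keep track of these issues is to classify every survey record by its linkage outcome. The sketch below is an illustration under stated assumptions, not the project's procedure: the field names (`upn`, `pmr`, `npd_gender`, `survey_gender`) are hypothetical, and "missing data in NPD" is represented only by a missing gender value.

```python
from collections import Counter


def audit_linkage(records):
    """Count survey records by linkage outcome, checking the sources of loss in order."""
    outcome = Counter()
    for r in records:
        if r.get("upn") is None:
            outcome["no UPN supplied by school"] += 1
        elif r.get("pmr") is None:
            outcome["UPN did not match a PMR"] += 1
        elif r.get("npd_gender") is None:
            outcome["missing data in NPD"] += 1
        elif r["npd_gender"] != r["survey_gender"]:
            outcome["linked, gender inconsistent"] += 1
        else:
            outcome["linked, gender consistent"] += 1
    return outcome


# Illustrative records only.
records = [
    {"upn": None, "pmr": None, "npd_gender": None, "survey_gender": "F"},
    {"upn": "UPN_A", "pmr": None, "npd_gender": None, "survey_gender": "M"},
    {"upn": "UPN_B", "pmr": "PMR_2", "npd_gender": None, "survey_gender": "F"},
    {"upn": "UPN_C", "pmr": "PMR_3", "npd_gender": "M", "survey_gender": "F"},
    {"upn": "UPN_D", "pmr": "PMR_4", "npd_gender": "F", "survey_gender": "F"},
]
summary = audit_linkage(records)
```

Comparing the characteristics of records in each category would then give a first indication of how representative the successfully linked data are.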
25
DP survey: some results using data linkage
School satisfaction construct: pupil factors which are significant
Quality of teaching: girl (negative); KS3 attainment (negative)
Quality of school: Indian subcontinent ethnicity (negative); first language is not English (positive)
Perceived teacher support: girl (negative)
Negative attitudes to school: prior attainment (inversely associated)
School harassment: prior attainment (negatively associated); Indian subcontinent (positively associated, only just significant)
26
Advantages of data linkage for DP evaluation
Able to compare students from two waves of the survey
Able to control for pupil characteristics in analysis of questionnaire responses when comparing years or schools.
Able to address research questions on relationship between pupils’ characteristics and experience of school.