test scores and teacher selection


  • 8/6/2019 Test Scores and Teacher Selection


    TEACHERS COLLEGE, COLUMBIA UNIVERSITY

    TEST SCORES AND TEACHER SELECTION

    AN EMPIRICAL ANALYSIS FOR TURKEY

    M. ALPER DINCER

    4/26/2011

    [In 2002 in Turkey, a decentralized model of teacher hiring was replaced with a teacher selection model which operates through centralized testing. This study evaluates the impact of this new teacher selection policy on mathematics and science test scores of 8th graders. The findings show that a 0.17 standard deviation increase in test scores can be attributed to the new teacher selection policy and the estimated impact is much higher for below-median achievers and students with female teachers. The findings also provide evidence exhibiting that the new teacher selection policy assigns more teachers to relatively poor schools and classrooms.]


    1. Introduction: test scores

    The primary and secondary education systems in Turkey have been undergoing a

    restructuring since the late 1990s in response to swift developments in the formation of its

    economy and the demographics of its young population. One of the main goals of this

    restructuring is to increase the quality of learning outcomes in Turkey (Aksit, 2007).

    Thus it is important to investigate empirically whether these reform efforts achieve the

    intended outcomes or not.

    The Trends in International Mathematics and Science Study (TIMSS) and the Programme for

    International Student Assessment (PISA) periodically measure student achievement

    on an international scale and assemble information about students, their families and

    schools. Thus with the help of these projects it is possible to track student achievement in

    participating countries and make cross-country comparisons. Therefore these projects

    provide the necessary data in order to analyze the trend of learning outcomes in Turkey.

    A representative sample of students in the 8th grade, which is the final grade of mandatory

    schooling in Turkey, participated in TIMSS 1999 and 2007. The average mathematics and

    science scores of students in Turkey in 1999 were 429 and 433 whereas the international

    average scores were 487 and 488, respectively. Similarly the average mathematics and

    science scores of students in Turkey in 2007 were 433 and 454 whereas the international

    average scores were 488 and 500, respectively. Thus the students in Turkey performed

    below the international average in both cycles. The following table gives the

    percentages of students reaching the TIMSS international benchmarks:


    Table 1: The percentages of students reaching the TIMSS international benchmarks

                         Advanced   High   Intermediate   Low
    Mathematics  1999        1        6         20         38
    Mathematics  2007        5       10         18         26
    Science      1999        1        5         19         37
    Science      2007        3       13         24         31

    Source: (Martin et al., 2001a), (Martin et al., 2001b), (Martin, Mullis, Foy, & Olson, 2008a), (Martin, Mullis, Foy, & Olson, 2008b)

    As a cautionary note, it should be stated that these percentages are not directly

    comparable between 1999 and 2007 for Turkey (Martin, et al., 2008a, 2008b). However

    these figures present the same pattern in mathematics and science for the students in

    Turkey. There are more students at the advanced and high international benchmark levels

    and there are fewer students at the low international benchmark level¹.

    PISA offers more definitive information about the trend of learning outcomes of students

    in Turkey. Similar to TIMSS, PISA measures the reading, mathematics and science test

    scores of a student body which is representative of the 15-year-old student population in

    each participating country. Turkey participated in PISA in 2003, 2006 and 2009 and the

    trend in mathematics score is comparable between 2003 and 2009 and the trend in

    science test score is comparable between 2006 and 2009 (OECD, 2010).

    According to PISA results, the average mathematics score of 15-year-old students in Turkey

    increased by 22 points (more than 0.2 standard deviation) and the average science score of

    15-year-old students in Turkey increased by 30 points (approximately 0.3 standard

    deviation) (OECD, 2010).

    1 For the description of these benchmark proficiency levels please see (Martin, et al., 2008b) and

    (Martin, et al., 2008a).
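    The conversion from scale points to standard deviations quoted for the PISA gains can be reproduced with a short sketch. It assumes the nominal standard deviation of 100 points on the PISA reporting scale, which is a property of the scale and not a figure given in this paper; the function and variable names are illustrative only.

```python
# Assumption: the PISA reporting scale has a nominal standard deviation of 100 points.
PISA_SCALE_SD = 100.0


def points_to_sd(points: float, scale_sd: float = PISA_SCALE_SD) -> float:
    """Express a change in scale points as a fraction of a standard deviation."""
    return points / scale_sd


math_gain_sd = points_to_sd(22)     # mathematics gain reported by OECD (2010)
science_gain_sd = points_to_sd(30)  # science gain reported by OECD (2010)

print(f"Mathematics: {math_gain_sd:.2f} SD, Science: {science_gain_sd:.2f} SD")
```

    Under this assumption the two gains correspond to the "more than 0.2" and "approximately 0.3" standard deviations stated in the text.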

    PISA data also show in which segment of the student body these improvements

    occurred. The percentage of students who fall below proficiency level 2 decreased

    from 52 to 42 in mathematics, and in science the same percentage dropped from 47 to 30.

    On the other hand the percentages of top performers did not show any increase or

    decrease between the respective periods (Figure 1).

    These figures on the trend of the average student achievement in mathematics and

    science in Turkey highlight at least three important facts. First, for a period which follows

    1999, average student achievement in mathematics and science is increasing for the

    student population which is either in grade 8 or 15 years old. Second, this increase in

    average student achievement is not homogeneous. Indeed it is much more pronounced at the

    lower end of the student achievement distribution in these subjects. Third, these

    improvements in average student achievement in Turkey are not due to inflation in test

    score scales; the average performance of students in Turkey is converging to international

    benchmarks as defined by either TIMSS or PISA. This convergence is fairly rapid, at

    least according to the measure PISA provides.

    These facts immediately raise several questions: Are these changes in student

    achievement related to restructuring in the education system in Turkey? If yes, which

    aspects of the reform initiative in Turkey led to higher learning outcomes in

    mathematics and science? Is it possible to identify the channels through which the policy

    intervention leads to increases in student achievement? This study attempts to offer some

    candidate answers to these questions.


    2. Possible explanations

    OECD (2010) stresses the role of the Basic Education Programme (BEP) in increasing

    learning outcomes in Turkey. This World Bank-supported programme defined the

    framework for the education reform initiative in Turkey according to Law No. 4306².

    With this legislation the Ministry of National Education (MONE) aimed at

    expanding primary school education, improving the quality of education and overall

    student outcomes, closing the performance gap between boys and girls, providing equal

    opportunities, matching the performance indicators of the European Union, developing

    school libraries, increasing the efficiency of the education system, ensuring that qualified

    personnel were employed, integrating information and communication technologies into

    the education system and creating local learning centers, based in schools, that are open

    to everyone³.

    In response to these efforts the attendance rate in the eight-year primary education system

    soared from 85 to 100 percent. Similarly, the attendance rate in pre-primary education

    system increased from 10 to 25 percent. These increases led to an expansion of the

    education system by 3.5 million pupils. These quantitative expansions of the education

    system were accompanied by qualitative improvements: During the same period average

    class size was reduced from approximately 40 to 30, conditions were improved in all

    rural schools and computer laboratories were established in every primary school.

    The cost of the BEP exceeded the equivalent of USD 11 billion (OECD, 2010).

    2 http://mevzuat.meb.gov.tr/html/24.html

    3 http://www.meb.gov.tr/Stats/Apk2002/502.htm


    OECD (2010) as well as MONE also highlight the importance of the recent curriculum

    change in mathematics and science (TTKB, 2008): New curricula were launched in the

    2006-2007 school year, starting from the 6th grade. Similarly, mathematics and language

    curricula were also updated and, starting from the 9th grade, a new science curriculum

    came into force in the 2008-2009 school year. According to the Board of Education (TTKB)

    the aim of this change was to update the content of school education as well as to change

    the teaching philosophy and culture within schools.

    Although the new curricula are the preferred explanation of MONE and some other

    research institutions in Turkey⁴ for the increased learning outcomes, the connection is not

    clear and there are several problems with this explanation: First, given that TIMSS

    covers the period between 1999 and 2007, the new curricula, which were introduced from

    the 6th grade in the 2006-2007 school year, cannot explain

    the improvement in learning outcomes which is evident in TIMSS data. Second, average

    achievement in mathematics in PISA is not comparable between 2006 and 2009.

    Therefore the timing of the inception of the new curricula and the increase in average

    mathematics achievement in Turkey do not overlap. Third, the students who were subject

    to the curriculum change in science are 9th graders, who constitute only a portion of the

    PISA 2009 sample in Turkey; moreover they experienced the new curriculum for only two

    semesters. It is not clear whether these students could drive a 0.3 standard deviation

    increase in the average student achievement in science between 2006 and 2009.

    As mentioned earlier, one of the targets of the BEP was to ensure that qualified personnel

    were employed. In line with this goal teacher selection policy was changed in 2002 in

    4 http://www.setav.org/public/HaberDetay.aspx?Dil=tr&hid=57559&q=pisa-yi-dogru-okumak ,

    http://www.tepav.org.tr/upload/files/1292255907-8.PISA_2009_Sonuclarina_Iliskin_Bir_Degerlendirme.pdf


    The difference in teacher quality may lead to substantial difference in student

    achievement. In order to understand the relative significance of teacher quality, Rivkin et

    al. (2005) analyze a unique matched panel data set from the UTD Texas Schools Project

    which allows them to identify teacher quality based on student performance. They

    conclude that the contribution of a ten-student reduction in class size to learning is less

    than that of a one standard deviation increase in teacher quality.

    In another study, Rockoff (2004) analyzes a 10-year panel data set of test scores and teacher

    assignments to understand how much teachers affect learning. The panel structure allows

    him to focus on differences in the performance of the same student with different teachers

    and to separate the variation in teacher quality from the variation in student

    characteristics. His analysis shows that variation in teacher quality explains 23 percent of

    the variation in the test scores which is potentially open to policy influence.

    Third, teacher characteristics such as qualifications, teaching experience and teacher

    education do not exhibit consistently clear and strong effects on student achievement:

    Hanushek (2002, 2003) reviews the studies focusing on the United States and concludes that

    overall there are no systematic effects of characteristics such as teacher education or

    teacher experience. Thus it is a challenging inquiry to identify the components which

    characterize the quality of teachers.

    In the same reviews Hanushek (2002, 2003) also highlights that there is convincingly

    strong support for the effects of teachers' academic ability as measured by teacher test

    scores. In line with Hanushek's inference, the National Council on Teacher Quality (NCTQ)

    (2004) reports that teachers' academic aptitude has a clear, measurable effect on learning

    and this finding is robust and consistent. The same report emphasizes that a teacher's

    literacy ability as measured by standardized tests has an impact on learning more than

    any other measurable teacher characteristic. Thus a broad conclusion emerges from

    research connecting teacher quality to teachers' test scores: Teachers' test scores may be

    a good measure of teacher quality if these tests are measuring academic aptitude.

    Interestingly, there are some studies from Turkey which are in line with these findings.

    Several studies which analyze PISA 2006 data for Turkey show that students who were

    taught by teachers who passed rigorous testing procedures are associated with higher test

    scores (Alacaci & Erbas, 2010; Dincer & Uysal, 2010).

    The literature leads to two main conclusions in this respect: First, teacher quality is an

    essential ingredient of education production and it is open to policy influence. Second,

    screening teachers with testing which measures academic ability may lead to an increase

    in the teacher quality.

    4. Basic characteristics of teacher labor market in Turkey

    The main characteristic of the teacher labor market in Turkey is the excess supply of

    teachers. As of 2010, approximately 327 thousand teachers are waiting to be employed by the

    public sector and the number of applicants is three to four times higher than the number

    of open teaching positions (Figure 2). This army of inactive teachers represents a

    significant population given that the number of employed teachers in the public sector is

    680 thousand. MONE also estimates that the optimal number of employed teachers in the

    public education system is 717 thousand⁵. Under these circumstances the gap between the

    supply and the demand of teachers widens cumulatively.

    5 http://icden.meb.gov.tr/digeryaziler/MEB_ic_denetim_faaliyet_raporu_2009.pdf


    As of 2010, MONE demanded 782 mathematics teachers and it received 2798

    applications. For science these figures are 861 and 3546⁶, respectively, and the gap is

    evident to a greater or lesser extent in every subject; thus excess supply is not specific to

    particular subjects.
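    As a quick check of the "three to four times" statement, the ratio of applications to open positions can be computed directly from the 2010 figures quoted in the text. The snippet below is only an arithmetic sketch; the dictionary layout and the function name are illustrative choices, not part of the original analysis.

```python
# 2010 figures quoted in the text: (open positions, applications received).
openings = {"Mathematics": (782, 2798), "Science": (861, 3546)}


def applicants_per_opening(positions: int, applications: int) -> float:
    """Return the number of applications received per open position."""
    return applications / positions


ratios = {
    subject: applicants_per_opening(positions, applications)
    for subject, (positions, applications) in openings.items()
}

for subject, ratio in ratios.items():
    print(f"{subject}: {ratio:.2f} applications per open position")
```

    Both ratios fall between three and five, which is consistent with the excess supply described in the text for these two subjects.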

    Figure 2: The number of open positions and applicants by subject

    [Bar chart. Horizontal axis: Math, Science and Tech, Physics, Biology, Chemistry. Vertical axis: 0 to 4000. Series: # Open positions, # Applicants.]

    Source: Author's own calculations from http://personel.meb.gov.tr/ana_sayfa.asp

    One candidate explanation for this excess supply would be very attractive

    teacher salaries in Turkey. However, teacher salaries are not attractive at all in Turkey.

    In the public sector the starting salary of a teacher is around USD 14,000 and it does not

    improve much with experience (Figure 3). The salary of a teacher with 15 years of

    experience is around USD 16,000 (OECD, 2009).

    Dolton and Marcenaro Gutierrez (2011) present a cross-country analysis of teacher pay and

    performance by taking the relative earning distribution in each country into account.

    Their analysis confirms that teacher salaries are not especially attractive in Turkey

    and the salary-experience profile is flat (Figure 4).

    6 http://personel.meb.gov.tr/ana_sayfa.asp

    Figure 3: Ratio of salary after 15 years of experience to GDP per capita

    [Bar chart. Vertical axis: 0 to 2.5. Countries in the order shown: Korea, Germany, Portugal, Japan, Scotland, New Zealand, Switzerland, Mexico, Spain, England, Czech Republic, Turkey, Slovenia, Ireland, Belgium (Fl.), Australia, OECD average, Greece, Netherlands, Belgium (Fr.), Denmark, Chile, Finland, Austria, Italy, France, United States, Sweden, Luxembourg, Hungary, Iceland, Norway, Israel, Estonia.]

    Source: (OECD, 2009)

    Figure 4: Average teacher wage-experience profile in Turkey

    Source: (Dolton & Marcenaro Gutierrez, 2011)

    Therefore neither the starting salaries nor the expectation of relatively higher salaries in the

    teaching profession can explain the excess supply in the teacher labor market in Turkey.


    Another important feature of the teacher labor market is that all public servants in

    Turkey are protected by law and unions, and job separation is a very unlikely event.

    As a result the teaching profession offers substantial job security and, given the presence of

    very high chronic unemployment rates, individuals value job security heavily. One study

    (Caner & Okten, 2010) analyzes the college major choice decision in a risk and return

    framework using university entrance exam data from Turkey and shows that individuals

    are very sensitive to risk during career choice.

    It should also be noted that total enrollment in education faculties in Turkey increased

    steadily over time: The total enrollment increased from 33 thousand in 2007 to 45 thousand

    in 2008 and 54 thousand in 2009 and MONE expands the teaching force by

    approximately 40 thousand each year⁷.

    Thus a combination of an intense demand for job security and increased quotas of

    education faculties may provide a more sensible explanation for the excess supply in

    teacher labor market in Turkey.

    5. Legal framework of teacher selection in Turkey

    There are three main legal sources which regulate the hiring of teachers in Turkey. First,

    teachers working in the public sector are subject to Law No. 657. This law has defined the

    rights as well as legal obligations of public servants since 1965. Second, the regulation of

    the tests concerning the assignments of public servant candidates has described the testing

    procedure for public servant posts since 2002. Third, MONE's regulation of teacher

    assignment and replacement explains how the testing procedure and test results apply to

    the teacher selection process. The current version of this regulation was legislated in 2010 and

    it has changed many times in the past according to the needs of MONE.

    7 http://www.ogretmenportali.net/HaberGoster/228716e4-64bf-4b55-bb17-fc0ee89baf38/atanmayan-ogretmen-ordusu-buyuyor.aspx

    The regulation of the tests concerning the assignments of public servant candidates

    forms a turning point in teacher selection because it caused a radical change in

    teacher selection policy in Turkey.

    In the teacher selection system before the legislation of this regulation, i.e. prior to 2002, any

    eligible teacher candidate was able to apply to any available position announced by

    MONE. The applications were processed in provincial offices of MONE and then the

    final decision was made by the headquarters of MONE in the capital, Ankara (Figure 5).

    Figure 5: A presentation of teacher selection system before 2002

    This system was a cause of concern in MONE as well as in the State Planning Organization

    (SPO) (SPO, 1989). One of the main issues of the pre-2002 system was highlighted by

    MONE as a constant imbalance of teacher population across regions. According to the

    Research and Development department of MONE, one preliminary report of the 1993

    National Education Assembly stressed that more than 10 percent of teachers employed by

    MONE in urban areas did not teach a single class. Another issue documented in MONE's

    records on the pre-2002 system was that political pressures and interventions

    damaged the fairness and equality principles in teacher employment and caused unrest

    among teachers (EARGED, 1995). Indeed it was publicly well known that having

    connections in provincial offices as well as in the capital was essential to getting hired. Thus

    nepotism was a general worry about this selection process.

    Following the legislation of the above-mentioned testing regulation, the Center of Student

    Selection and Placement (OSYM) launched a central examination process which is

    known as the Public Servant Selection Examination (KPSS). This exam has two sessions:

    In the first session the teacher candidates have to answer 120 multiple choice questions

    about Turkish, Mathematics, History, Citizenship, General Culture and Geography in 180

    minutes. In the second session the teacher candidates have to answer 120 multiple choice

    questions about educational psychology, educational programs and teaching, and

    educational guidance in 180 minutes. Then applicants are assigned to teaching positions

    centrally by MONE according to their test scores in the central examination and their

    ranked list of preferred teaching positions (Figure 6). OSYM conducts the exam annually

    and if a teacher candidate fails to be placed in a teaching position then s/he has to take the

    exam again in the following year.

    Figure 6: A hypothetical presentation of teacher selection after 2002
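    The centralized assignment described above, in which posts are allocated by test score and ranked preferences, can be illustrated with a minimal sketch. It assumes a simple "highest score chooses first" rule (often called serial dictatorship); the text does not document the exact procedure MONE uses, so the function, the tie-breaking rule, and the candidate and post names below are all hypothetical.

```python
from typing import Dict, List, Optional


def assign_positions(
    scores: Dict[str, float],
    preferences: Dict[str, List[str]],
    capacity: Dict[str, int],
) -> Dict[str, Optional[str]]:
    """Place candidates in score order into their most preferred post with a free slot."""
    remaining = dict(capacity)
    placement: Dict[str, Optional[str]] = {}
    # Highest score chooses first; ties are broken by name only to keep the sketch deterministic.
    for candidate in sorted(scores, key=lambda c: (-scores[c], c)):
        placement[candidate] = None  # stays None if every listed post is already full
        for post in preferences.get(candidate, []):
            if remaining.get(post, 0) > 0:
                remaining[post] -= 1
                placement[candidate] = post
                break
    return placement


# Hypothetical example: three candidates, two posts with one opening each.
scores = {"A": 88.0, "B": 92.5, "C": 75.0}
preferences = {"A": ["Ankara", "Van"], "B": ["Ankara"], "C": ["Ankara", "Van"]}
capacity = {"Ankara": 1, "Van": 1}

placement = assign_positions(scores, preferences, capacity)
print(placement)
```

    In this hypothetical run the lowest-scoring candidate is left unplaced, which corresponds to the case in the text where a candidate has to take the exam again in the following year.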

    In this teacher selection system it is not possible to game the hiring process and it is also

    not possible to leverage nepotism in order to get a teaching position. Thus it is reasonable

    to claim that the central examination and allocation of teaching positions based on test

    scores address the problem of lack of fairness. However two questions remain to be

    answered: Does the new system ensure that qualified teachers are employed? Does

    this system have an impact on the regional imbalance of teacher population? The first

    question is critical because this was one of the main goals of the BEP. The second question is

    critical because this imbalance was a chronic problem of the education system (EARGED, 1995; SPO,

    1989).

    6. Data

    In order to answer these research questions I employed the TIMSS 1999⁸ and TIMSS 2007⁹

    data sets for Turkey. These data sets have some very important qualities which render

    them very suitable for analyzing the questions of interest.

    First, as mentioned earlier, these projects assess a representative set of 8th graders in the

    participating countries. 8th grade is the final grade of primary education in Turkey and

    thus students in the sample should have spent at least a couple of years in their current

    institutions.

    Second, it is possible to link teachers to students in the same classroom which makes

    these data sets especially attractive for this analysis.

    Third, the TIMSS project conducts four questionnaires, i.e. student, school, mathematics

    teacher and science teacher questionnaires. The student and teacher questionnaires

    contain extensive information about demographic and socioeconomic characteristics of

    students and teachers. In addition, the school questionnaire contains information on

    school location, resources and governance.

    8 http://timss.bc.edu/timss1999.html

    9 http://timss.bc.edu/timss2007/index.html

    Fourth, the information collected in 1999 and 2007 is comparable to a certain extent. The

    questionnaires in 1999 and 2007 do not overlap extensively; however most of the

    essential information is available in both data sets.

    Fifth and most importantly, the policy change which is subject to evaluation in this

    study falls between 1999 and 2007, the years in which Turkey participated in TIMSS.

    This allows me to have a reasonable number of observations that are subject to the policy

    change which was launched in 2002.

    Lastly, the teacher experience is reported in years such as 1, 2, 3 etc. but not in year

    categories such as 0-4, 5-8 etc. This distinction is crucial for this analysis because the

    data on teacher experience in TIMSS allow me to define the treatment and control groups

    with respect to the inception date of the policy change.

    7. Methodology and empirical analysis

    For the empirical analysis, I first merged the student, school and teacher data sets for

    1999 and for 2007 and then combined the 1999 and 2007 TIMSS data sets. Then I defined the

    treatment group as the students whose teachers have four or fewer years of experience. This

    assumption is necessary because I do not observe whether the teachers were selected via

    central examination or not. Thus I claim that this definition of the treatment group

    approximates the ideal case.

    The justification of this assumption is based on the timing of the TIMSS application and

    the central examination. The first central examination in Turkey was conducted in July

    2002; OSYM announced the test scores in August 2002¹⁰ and MONE distributed the

    teaching posts based on announced test scores in September, October and November

    2002¹¹. On the other hand, the TIMSS 2007 application in Turkey was conducted in

    April, May and June 2007 (Olson, Martin, Mullis, & Arora, 2008). Thus a teacher who

    was selected with the first central examination would have been assigned to the post as early

    as September 2002 and the same teacher would have answered the TIMSS teacher

    questionnaire as late as June 2007. According to this hypothetical example this teacher

    would not yet have five years of experience at the time of the TIMSS application. Therefore the

    treatment group is assumed to be as defined above.
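    The timing argument can be checked with a small date calculation. The sketch below takes the earliest assignment month (September 2002) and the latest questionnaire month (June 2007) from the text; the specific days of the month and the helper function are assumptions made only for illustration.

```python
from datetime import date


def full_years_between(start: date, end: date) -> int:
    """Return the number of completed years between two dates."""
    years = end.year - start.year
    # Subtract one year if the anniversary has not yet been reached in the end year.
    if (end.month, end.day) < (start.month, start.day):
        years -= 1
    return years


# Months come from the text; the days of the month are assumed for illustration.
earliest_assignment = date(2002, 9, 1)
latest_questionnaire = date(2007, 6, 30)

max_experience_years = full_years_between(earliest_assignment, latest_questionnaire)
print(max_experience_years)
```

    Even in the most favorable case a teacher from the first centrally examined cohort has completed fewer than five years of service, which is why a cutoff of four years is used for the treatment group.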

    However this is an imperfect measure of selection via central examination: First, teacher

    turnover leads to measurement error because it is possible to quit and later return to teaching,

    which may be especially an issue for female teachers who may substitute teaching with

    child raising for a couple of years. Second, OSYM conducted another central

    examination which is known as the Central Elimination Examination for Institutions (KMS)

    in 2001¹². KMS was different from KPSS and it is not clear how many teaching posts

    were distributed based on KMS scores as well as whether KMS scores were the sole

    determinant of the teacher assignments. This issue may also lead to measurement error.

    Keeping these shortcomings in mind, I compared the difference in average

    student achievement between treatment and control groups in 1999 and 2007 with a basic

    differences-in-differences approach. The main assumption of this approach is that the

    change in mean test scores that the control group experiences over time reflects the same

    change that the treatment group would have experienced had they not been exposed to the

    treatment. Another important assumption of the differences-in-differences approach is that

    unobserved characteristics have the same distribution across time points and across

    treatment groups. I will discuss the validity of these assumptions in the subsequent

    sections.

    10 http://www.osym.gov.tr/belge/1-6128/2002-sinavlari.html

    11 http://personel.meb.gov.tr/sayfa_goster.asp?ID=207

    12 http://www.osym.gov.tr/belge/1-12485/2001-sinavlari.html

    For the differences-in-differences analysis I have estimated the following regression

    models:

    Table 2: Difference-in-Differences estimations
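    The estimated specifications themselves did not survive in this transcript. For orientation only, a generic two-period differences-in-differences regression of the kind described in the text can be written as follows. Every symbol here is an assumption introduced for illustration and is not the author's notation: $Y_{ict}$ is the test score of student $i$ in classroom $c$ and TIMSS cycle $t$, $\mathrm{Year2007}_t$ indicates the 2007 cycle, $\mathrm{Treat}_{ct}$ indicates a teacher with four or fewer years of experience, and $X_{ict}$ collects control variables.

```latex
Y_{ict} = \beta_0
        + \beta_1 \,\mathrm{Year2007}_t
        + \beta_2 \,\mathrm{Treat}_{ct}
        + \delta \,\bigl(\mathrm{Year2007}_t \times \mathrm{Treat}_{ct}\bigr)
        + X_{ict}'\gamma
        + \varepsilon_{ict}
```

    In such a specification the coefficient on the interaction term, written $\delta$ here, is the differences-in-differences estimate: the change over time in the score gap between students of less experienced and more experienced teachers.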


    In these regression models the dependent variable is either the mathematics or the

    science test score. It should be noted, however, that TIMSS does not provide point

    estimates of mathematics and science test scores; instead it gives five plausible values

    of mathematics and science ability. For the sake of simplicity I averaged the five

    plausible values for each subject and used the averages as the measure of the subject

    test score. The TIMSS 2007 Technical Report highlights that taking the average of the

    plausible values will not yield suitable estimates of individual student scores (Olson,

    et al., 2008). As a check I repeated some of the estimations with the separate plausible

    values and compared the point estimates and standard errors of the population parameter

    of interest, i.e. the treatment effect. In all cases the point estimates were very close

    to each other, and the standard errors were slightly larger, which did not affect the

    statistical significance levels.
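
The robustness check described above, in which the model is estimated once per plausible value instead of once on the average, can be sketched as follows. This is only an illustration under stated assumptions: the function name and the example numbers are hypothetical, and the combination rule is the usual formula for multiply imputed data (Rubin's rules), which is not necessarily the exact procedure used in this study.

```python
import math


def combine_plausible_value_estimates(estimates, std_errors):
    """Pool one coefficient estimated separately on each plausible value."""
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need matching lists with at least two entries")
    # Point estimate: mean of the per-plausible-value estimates.
    pooled = sum(estimates) / m
    # Within-imputation variance: mean of the squared standard errors.
    within = sum(se ** 2 for se in std_errors) / m
    # Between-imputation variance: sample variance of the estimates.
    between = sum((b - pooled) ** 2 for b in estimates) / (m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)


# Hypothetical coefficients and standard errors from five regressions,
# one per plausible value (illustrative numbers, not taken from the paper).
coefs = [14.0, 15.0, 16.0, 15.0, 15.0]
ses = [7.0, 7.0, 7.0, 7.0, 7.0]
pooled_coef, pooled_se = combine_plausible_value_estimates(coefs, ses)
```

Because the between-imputation term is never negative, the pooled standard error is at least as large as the typical single-regression standard error, which is consistent with the slightly larger standard errors reported above.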

    In these regression models the time variable stands for the TIMSS cycle (1999 and 2007),

    and the treatment variable equals 1 if the subject teacher has four or fewer years of

    experience. Observed information regarding teachers, students, classes and schools

    enters the regression models as control variables (Table 3).
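
For orientation, a generic difference-in-differences regression that matches the verbal description above can be written as below. This is a sketch only: every symbol is assumed notation rather than the author's, with $Y_{ict}$ the test score of student $i$ in class $c$ in cycle $t$, $\mathrm{Post}_t$ equal to 1 for the 2007 cycle, $\mathrm{Treat}_{ct}$ equal to 1 if the subject teacher has four or fewer years of experience, and $X_{ict}$ the control variables.

```latex
Y_{ict} = \alpha
        + \beta_1 \,\mathrm{Post}_t
        + \beta_2 \,\mathrm{Treat}_{ct}
        + \delta \,(\mathrm{Post}_t \times \mathrm{Treat}_{ct})
        + X_{ict}'\gamma
        + \varepsilon_{ict}
```

In this notation $\delta$ is the difference-in-differences coefficient, and a specification without control variables corresponds to dropping the $X_{ict}'\gamma$ term.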

    The list of control variables was constructed within the limitations of the data. The

    variables available in the TIMSS 1999 and 2007 data sets overlap only partially, and in

    some cases, although the necessary variables are available in both data sets, the scales

    of measurement differ. For example, this was a serious issue for the school location

    variable. All in all, I experimented with every variable that is available in both data

    sets. The number of missing observations also partly shaped the final list of control

    variables.

    Table 3: List of Control Variables

    Teacher           Class                                    Student                    School
    characteristics   characteristics                          characteristics            resources

    Sex               Diversity in academic ability            Sex                        An indicator for school resources
    Age               Diversity in socioeconomic background    Age                        Location
    Subject degree    Presence of disruptive students          Parental education
    Experience        Class size                               # books at home
                      Instructional time                       Computer at home
                                                               Language spoken at home

    Following the difference-in-differences analysis of mathematics and science

    achievement, I utilized another aspect of the data structure: the treatment variable

    varies by subject. This means that the same student may have a mathematics teacher who

    has four or fewer years of experience whereas his or her science teacher may have more

    than four years of experience (or vice versa). Given that both the mathematics and

    science test scores are observed for each student, this structure allows me to employ

    individual fixed effects. For that purpose I pooled the mathematics and science data sets

    and incorporated student fixed effects into the regression models defined in Table 2.

    This approach allowed me to relax one of the assumptions associated with the

    difference-in-differences approach. After adding student fixed effects to the model I do

    not have to assume that unobserved student and school characteristics have the same

    distribution across time points and across treatment groups. However, I still have to

    assume that unobserved class characteristics have the same distribution across time

    points and across treatment groups (Table 4). Lastly, it should also be mentioned that there are

    other studies that employ very similar identification strategies, such as the study of

    Lavy (2010). In that study the researcher establishes a causal link between instructional

    time and student achievement by making use of the within-individual variation in test

    scores and the within-subject variation in instructional time. In essence, the

    identification strategy I am employing is identical to the approach Lavy (2010) used, with

    the one exception that I embedded it into a difference-in-differences framework (Table 4).

    Table 4: Fixed effects and difference-in-differences estimations

    Although this identification strategy allows me to relax some of the assumptions of the

    differences-in-differences approach, it also has its own shortcomings. First, it

    automatically reduces the sample size, and this problem becomes more pronounced in

    sub-group analysis. Second, it is not possible to decompose the effect into two parts,

    namely learning gains in mathematics and learning gains in science.
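
The logic of the student fixed effects can be illustrated with a small sketch. It is a simplified cross-section, not the full model of this study: the data are invented, the function names are hypothetical, and the only regressor is the treatment indicator. Each student is observed twice (mathematics and science), and subtracting the student's own mean removes everything that is constant within the student.

```python
def pooled_slope(rows):
    """OLS slope of score on treated, ignoring student identity."""
    n = len(rows)
    mean_x = sum(r["treated"] for r in rows) / n
    mean_y = sum(r["score"] for r in rows) / n
    sxy = sum((r["treated"] - mean_x) * (r["score"] - mean_y) for r in rows)
    sxx = sum((r["treated"] - mean_x) ** 2 for r in rows)
    return sxy / sxx


def within_student_slope(rows):
    """Slope after subtracting each student's own mean (student fixed effects)."""
    by_student = {}
    for r in rows:
        by_student.setdefault(r["student"], []).append(r)
    sxy = sxx = 0.0
    for obs in by_student.values():
        mean_x = sum(o["treated"] for o in obs) / len(obs)
        mean_y = sum(o["score"] for o in obs) / len(obs)
        for o in obs:
            sxy += (o["treated"] - mean_x) * (o["score"] - mean_y)
            sxx += (o["treated"] - mean_x) ** 2
    return sxy / sxx


# Invented data: score = student-specific level + 10 * treated.
rows = [
    {"student": 1, "subject": "math", "treated": 1, "score": 510.0},
    {"student": 1, "subject": "science", "treated": 0, "score": 500.0},
    {"student": 2, "subject": "math", "treated": 0, "score": 400.0},
    {"student": 2, "subject": "science", "treated": 1, "score": 410.0},
    {"student": 3, "subject": "math", "treated": 1, "score": 460.0},
    {"student": 3, "subject": "science", "treated": 1, "score": 460.0},
    {"student": 4, "subject": "math", "treated": 0, "score": 550.0},
    {"student": 4, "subject": "science", "treated": 0, "score": 550.0},
]

naive = pooled_slope(rows)           # -40.0, distorted by student-level differences
within = within_student_slope(rows)  # 10.0, the effect built into the invented data
```

In this sketch students 3 and 4 have the same treatment status in both subjects, so they contribute nothing to the within-student estimate. This is one way to see why the effective sample shrinks under this strategy.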

    8. Findings

    The following table gives the estimated values of the coefficient of interest under the

    different specifications described in Table 2, together with sub-group estimates of this

    coefficient. The analysis has been conducted separately for mathematics and science test

    scores (Table 5).

    Table 5: Estimation results of difference-in-differences

             Whole sample               Female teacher             Male teacher               Below median               Above median
                                        sample                     sample                     achievers sample           achievers sample
             Coef      Std Err  Adj R²  Coef      Std Err  Adj R²  Coef      Std Err  Adj R²  Coef      Std Err  Adj R²  Coef      Std Err

    Mathematics
    Model 1  -17.86    [13.94]  0.04    -23.53    [22.02]  0.07    -15.50    [17.44]  0.03    -2.38     [6.32]   0.03    -11.90    [8.50]
    Model 2  -28.15*   [16.46]  0.08    -25.76    [20.89]  0.19    -20.10    [21.03]  0.06    -4.34     [7.47]   0.04    -18.43*   [10.30]
    Model 3  -27.12    [17.26]  0.10    -10.76    [21.23]  0.26    -28.35    [24.38]  0.09    -2.80     [7.12]   0.05    -16.59    [11.18]
    Model 4  -14.19    [14.02]  0.27    -7.01     [18.09]  0.38    -10.20    [20.20]  0.24    -2.40     [6.82]   0.10    -8.38     [9.29]
    Model 5  -0.61     [13.56]  0.30    6.45      [17.84]  0.40    0.55      [19.41]  0.26    0.22      [6.66]   0.11    -0.26     [9.14]
    Obs.     6,750                      2,757                      3,993                      3,354                      3,396

    Science
    Model 1  -15.49    [12.30]  0.07    -42.53**  [16.51]  0.10    4.36      [17.24]  0.05    -3.41     [6.31]   0.01    -8.46     [6.25]
    Model 2  -17.42    [12.34]  0.09    -28.14    [17.63]  0.12    -2.40     [17.70]  0.08    -5.99     [5.76]   0.03    -5.87     [6.94]
    Model 3  -17.85    [13.71]  0.14    -15.22    [21.73]  0.18    -12.49    [17.03]  0.17    -4.83     [6.05]   0.04    -4.79     [7.31]
    Model 4  -6.89     [10.63]  0.31    -4.98     [17.31]  0.35    -5.00     [13.05]  0.31    -2.70     [5.23]   0.10    -2.21     [6.52]
    Model 5  -6.29     [10.56]  0.31    -11.90    [18.50]  0.36    -2.73     [12.71]  0.31    -4.15     [5.32]   0.11    -1.38     [6.53]
    Obs.     7,085                      3,131                      3,954                      3,536                      3,549

    Robust standard errors in brackets clustered at the class level, *** p

    The results in Table 5 draw attention to several important issues. First, the standard

    errors are very large. Among the 50 point estimates of the treatment effect only three are

    statistically different from zero at the 10 percent significance level or better.

    Second, almost all of the point estimates have a negative sign. Third, the point estimates

    are not stable. In Model 1, without any control variables, the point estimates are

    negative and large; however, the addition of teacher, class, student and school

    characteristics to the regression model shrinks this negative treatment effect towards

    zero. In some sub-groups the addition of these control variables also led to sign changes.

    A closer look at the female teacher and male teacher sub-groups shows that this problem

    is much more severe in the female teacher sub-group. All in all, the

    difference-in-differences analysis does not provide any information about the possible

    impact of the treatment on student learning. Because of the very large standard errors the

    treatment effect may be negative, zero or positive. However, the analysis also shows that

    observed class, student and school characteristics do not have the same distribution

    across time points and across treatment groups, given that the point estimates are

    unstable and change signs. Therefore it is also very likely that unobserved class, student

    and school characteristics do not have the same distribution across time points and

    across treatment groups, which violates the assumptions underlying the

    difference-in-differences approach. This may also be a sign of differential assignment of

    teachers with four or fewer years of experience to classrooms between 1999 and 2007. In

    what follows I incorporate student fixed effects into the regression models in order to

    take into account the factors at the student and school levels (Table 6). However, teacher

    and class characteristics vary between the subjects; thus the regressions contain

    controls for observed teacher and class characteristics.
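
How large the standard errors in Table 5 are relative to the coefficients can be checked with a few lines of arithmetic. The sketch below is an approximation under stated assumptions: it uses normal critical values (1.645 for the 10 percent level, 1.96 for a 95 percent interval), whereas the reported significance stars presumably come from the estimation software, and the helper names are hypothetical.

```python
# (label, coefficient, clustered standard error) copied from Table 5.
entries = [
    ("Mathematics, whole sample, Model 2", -28.15, 16.46),
    ("Mathematics, above median achievers, Model 2", -18.43, 10.30),
    ("Science, female teacher sample, Model 1", -42.53, 16.51),
    ("Mathematics, whole sample, Model 5", -0.61, 13.56),
]


def t_ratio(coef, std_err):
    """Coefficient divided by its standard error."""
    return coef / std_err


def interval_95(coef, std_err, z=1.96):
    """Approximate 95 percent confidence interval (normal approximation)."""
    return coef - z * std_err, coef + z * std_err


for label, coef, se in entries:
    low, high = interval_95(coef, se)
    print(f"{label}: t = {t_ratio(coef, se):.2f}, 95% CI = [{low:.1f}, {high:.1f}]")
```

For the Model 5 estimate in the whole mathematics sample the approximate interval runs from about -27 to about 26 score points, which illustrates why the sign of the effect cannot be determined from Table 5.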

    Table 6: Estimation results of student fixed effects and difference-in-differences

    Mathematics & science scores combined

             Whole sample               Female teacher             Male teacher               Below median               Above median
                                        sample                     sample                     achievers sample           achievers sample
             Coef      Std Err  Adj R²  Coef      Std Err  Adj R²  Coef      Std Err  Adj R²  Coef      Std Err  Adj R²  Coef      Std Err

    Model 1  3.68      [10.62]  0.01    15.10     [12.92]  0.04    7.11      [10.85]  0.03    3.94      [12.59]  0.02    2.94      [8.44]
    Model 2  4.28      [9.36]   0.09    16.07     [12.61]  0.10    8.59      [13.17]  0.14    -0.60     [12.49]  0.20    2.42      [8.13]
    Model 3  14.77**   [6.89]   0.22    41.56**   [18.32]  0.30    17.63     [14.13]  0.23    20.67***  [5.32]   0.52    6.17      [9.91]
    Obs.     4619                       612                        1166                       2959                       1675

    Robust standard errors in brackets clustered at the class level, *** p

    The results in Table 6 stand in contrast to the results in Table 5. Generally the standard

    errors are smaller; more interestingly, with one exception all of the point estimates of

    the treatment effect are positive. The point estimates are not sensitive to the addition

    of teacher characteristics to the regression; however, they are very sensitive to the

    addition of class characteristics. According to Model 3, i.e. after controlling for

    teacher and class characteristics, the impact of the treatment is estimated precisely for

    the whole, female teacher and below median achievers samples.

    The standard deviation of the dependent variable in the whole sample is 89. Thus the

    impact of the policy change in 2002 on student achievement is around 0.17 standard

    deviations (14.77 / 89). However, the sub-group analysis shows that this impact is

    channeled mostly through female teachers. The estimated treatment effect in the female

    teacher sample is 2.8 times as large as in the whole sample, whereas in the male teacher

    sample the impact is not precisely estimated. Another important inference which can be

    drawn from Table 6 is that below median achievers benefit more from the new teacher

    selection policy than above median achievers. Thus the treatment effect is

    concentrated on below median achievers. Lastly, the sensitivity of the point estimates to

    the addition of class characteristics is in line with the findings in Table 5. This may be

    due to the within-school (between-classroom) differential assignment of teachers with

    four or fewer years of experience to classrooms between 1999 and 2007.
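
The effect sizes quoted above follow directly from the Model 3 row of Table 6. The short sketch below reproduces the arithmetic; scaling the sub-group coefficients by the whole-sample standard deviation of 89 is an assumption made here for comparability, and the variable names are hypothetical.

```python
SD_SCORE = 89.0  # standard deviation of the dependent variable, whole sample

# Model 3 point estimates from Table 6 (TIMSS score points).
model3 = {
    "whole": 14.77,
    "female_teacher": 41.56,
    "below_median": 20.67,
}

# Express each estimate in whole-sample standard deviation units.
effect_in_sd = {name: coef / SD_SCORE for name, coef in model3.items()}

# Ratio of the female teacher estimate to the whole-sample estimate.
female_to_whole = model3["female_teacher"] / model3["whole"]

for name, value in effect_in_sd.items():
    print(f"{name}: {value:.2f} SD")
print(f"female teacher / whole sample: {female_to_whole:.1f}")
```

Rounded to two decimals the whole-sample effect is 0.17 standard deviations, and the female teacher estimate is about 2.8 times the whole-sample estimate.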

    The findings in Table 6 provide evidence in favor of a positive and moderately large

    treatment effect. Thus it may be claimed that within the contextual framework in Turkey

    teacher selection with centralized testing may lead to higher learning outcomes compared

    to a decentralized recruitment system. However, there may be other underlying reasons

    which can potentially explain the findings in Table 6. For example, there may be a

    secular increase in the quality of education faculties in Turkey. If this is the case, the

    estimated impact may be due to the quality increase in education faculties instead of the

    new teacher selection policy. Along the same lines, more and more high-ability high

    school students may be opting for education faculties, so that the ability distribution of

    the pool of teacher candidates shifts over time. However, if these arguments are true I

    should expect to detect positive estimates of the treatment effect for different segments

    of teachers. In order to test these arguments I divided the sample of teachers who have

    more than four years of experience into three parts such that the sizes of the subsamples

    are equal. These segments are 5-8, 9-20 and 20+ years of experience. Each of these

    categories defined an alternative treatment variable, and for each case I repeated the

    individual fixed effects exercise with the full model, which includes teacher and class

    characteristics as controls. In Table 7 none of the point estimates is both positive and

    statistically significant; additionally, the statistically insignificant point estimates

    are small compared with the positive point estimates in Table 6. Thus I failed to detect

    any positive treatment effect with the alternative treatment definitions. Therefore it is

    more likely that the estimated impact is due to the new selection policy rather than a

    secular increase in the quality of education faculties or of the student body.

    9. Conclusion

    These findings are suggestive in nature and are not suitable for making causal

    inferences. Combining individual fixed effects with difference-in-differences allows for a

    relatively precise estimate of the treatment effect. The remaining problem with this

    approach is the lack of a complete set of classroom characteristics. The point estimates

    are sensitive to the classroom characteristics, and unobserved classroom characteristics

    may bias the estimate, although the analysis as a whole suggests that the possible

    direction of this bias is downward.

    The findings also provide a reasonable explanation for the trend in TIMSS and PISA

    results. First, since the analyzed period precedes the curriculum reform in Turkey, the

    findings cannot be attributed to the curriculum reform. Second, the findings show a

    concentrated impact on below median achievers and no impact on above median

    achievers. This is in line with what we observe in the PISA cycles for students in

    Turkey.

    The findings are also in accordance with the literature on teacher quality. As mentioned

    earlier, teachers' academic ability is one of the most robust indicators of teachers'

    effectiveness (Hanushek, 2002, 2003; NCTQ, 2004). Basturk (2008) shows that test

    scores in the college entrance exam are highly predictive of the KPSS test score. Therefore

    it is reasonable to interpret success in the KPSS as an indication of higher academic

    ability.

    Lastly, the following table depicts the degree of differential assignment of teachers to

    schools and classrooms. These results can be interpreted as evidence of MONE's attempts to

    ensure a more balanced distribution of teachers across resource-rich and resource-poor

    regions. As mentioned earlier, MONE as well as SPO were concerned about the imbalance of

    the teaching force across regions (Table 8).

    After the introduction of the central examination the teaching force became much more

    female, and the new teachers were assigned to classrooms which were much more diverse in

    terms of socioeconomic background and had fewer resources for instruction. The

    Lavy, V. (2010). Do Differences in Schools' Instruction Time Explain International

    Achievement Gaps in Math, Science, and Reading? Evidence from Developed

    and Developing Countries: National Bureau of Economic Research.

    Martin, M.O., Mullis, I.V.S., Foy, P., & Olson, J.F. (2008a). TIMSS 2007: International

    Mathematics Report: Findings from IEA's Trends in International Mathematics

    and Science Study at the Fourth and Eighth Grades: IEA TIMSS & PIRLS

    International Study Center, Lynch School of Education, Boston College.

    Martin, M.O., Mullis, I.V.S., Foy, P., & Olson, J.F. (2008b). TIMSS 2007: International

    Science Report: Findings from IEA's Trends in International Mathematics and

    Science Study at the Fourth and Eighth Grades: IEA TIMSS & PIRLS

    International Study Center, Lynch School of Education, Boston College.

    Martin, M.O., Mullis, I.V.S., O'Connor, K.M., Chrostowski, S.J., Gregory, K.D., Smith,

    T.A., & Garden, R.A. (2001a). Mathematics benchmarking report: TIMSS

    1999–Eighth grade. Chestnut Hill, MA: International Study Center.

    Martin, M.O., Mullis, I.V.S., O'Connor, K.M., Chrostowski, S.J., Gregory, K.D., Smith,

    T.A., & Garden, R.A. (2001b). Science benchmarking report: TIMSS

    1999–Eighth grade. Chestnut Hill, MA: International Study Center, Lynch School of

    Education, Boston College.

    NCTQ. (2004). Increasing the Odds: How Good Policies Can Yield Better Teachers:

    NCTQ.

    OECD. (2009). Education at a Glance 2009: OECD Indicators: Organization for

    Economic Cooperation and Development.

    OECD. (2010). PISA 2009 Results: Learning Trends: OECD.

    Olson, J.F., Martin, M.O., Mullis, I.V.S., & Arora, A. (2008). TIMSS 2007: Technical

    Report: International Association for the Evaluation of Educational Achievement.

    Rivkin, S.G., Hanushek, E.A., & Kain, J.F. (2005). Teachers, schools, and academic

    achievement. Econometrica, 73(2), 417-458.

    Rockoff, J.E. (2004). The impact of individual teachers on student achievement:

    Evidence from panel data. The American Economic Review, 94(2), 247-252.

    Santiago, P. (2002). Teacher demand and supply: Improving teaching quality and

    addressing teacher shortages. OECD Education Working Papers.

    Schacter, J., & Thum, Y.M. (2004). Paying for high- and low-quality teaching. Economics

    of Education Review, 23(4), 411-430.

    SPO. (1989). Altinci bes yillik kalkinma plani 1990-1994. Ankara: SPO.

    TTKB. (2008). İlköğretim Matematik Dersi 6-8. Sınıflar Öğretim Programı ve Kılavuzu

    (Teaching Syllabus and Curriculum Guidebook for Elementary school mathematics

    course: Grades 6 to 8). Ankara: Ministry of National Education (MONE)