mfrm to adjust for rater severity leniency

Upload: farah-bahrouni

Post on 06-Apr-2018


  • 8/3/2019 Mfrm to Adjust for Rater Severity Leniency

    1/13

    Sultan Qaboos University

    Language Centre

    MFRM TO ADJUST FOR RATER SEVERITY/LENIENCY

    Presentation for the LC Conference

    by

    Farah Bahrouni/[email protected]

    April 20, 2011

    Farah Bahrouni/LC Conf./April 20, 2011


    Plan

    Briefing about MFRM

    Run the analysis for 5 facets: candidate, rater, background, experience & category

    Adjusting scores as per FACETS estimates

    Conclusion


    Student 1

              TA:25      CC:25      LR:25      GR:25      Total: 100
    Mean      19.62132   19.38971   18.20956   16.45588
    Max       25         24         23         22         94
    Min       14         13         14         10         51
    Range     11         11         9          12         43
    Count     68         68         68         68

    Student 2

              TA:25      CC:25      LR:25      GR:25      Total: 100
    Mean      20.13971   20.09926   19.88235   18.88971
    Max       25         25         25         24         99
    Min       14         13         12         11         50
    Range     11         12         13         13         49
    Count     68         68         68         68

    Student 3

              TA:25      CC:25      LR:25      GR:25      Total: 100
    Mean      15.16544   15.79559   15.48162   18.88971
    Max       25         23         20         24         92
    Min       10         10         8          11         39
    Range     15         13         12         13         53
    Count     68         68         68         68


    Assessment of language proficiency: Speaking/Writing subjectivity

    A number of distinct factors directly or indirectly impinge upon the assessment/measurement outcomes.

    These factors are referred to as facets.


    A facet has been defined as

    Any factor, variable, or component [e.g. examinees, tasks, raters, interviewers, etc.] of the measurement situation that is assumed to affect test scores in a systematic way.

    (Bachman, 2004; Linacre, 2002; Wolfe & Dobria, 2008, cited in Eckes, 2009: 2)


    The error-prone nature of most measurement facets brings about serious concerns about both the reliability and validity of the obtained scores.


    The usual approaches to dealing with rater variability include:

    rater training

    using 2 or more raters in the scoring of performance assessment

    calling for an adjudicator (a 3rd/4th rater, usually a more experienced/senior/expert one)

    developing rubrics that spell out the proficiency levels

    identifying anchor papers to provide concrete examples of each proficiency level

    (for details see Johnson et al., 2000, 2001, 2003, 2005)


    Nevertheless, research has found that none of these methods is effective enough to guarantee reliable, objective scores.

    The methods are diverse enough to raise questions about the quality of the resolved scores.

    Underlying these resolution models is the common assumption that the discrepant scores might lack the requisite levels of reliability and validity, and that adjudication might improve this deficit to some extent (Johnson et al., 2005: 123).


    As for rater training, it has been found that even with proper training, substantial differences between raters persist.

    (Linacre, 1990; Hamp-Lyons, 1991; Weigle, 1994, 1998, 2002; Lumley & McNamara, 1995; McNamara, 1996; Lumley, 2005)

    Rater differences are reduced by training, but do persist. (McNamara, 1996: 118)

    Reason:

    Some see severity much as a personality trait that is inherently brought to any rating situation.

    (Myford et al., 2003)


    The Multi-facet Rasch Model (MFRM) provides a rich set of highly flexible tools to account, and compensate, for measurement error, especially rater-dependent measurement error.

    It is an extension of the basic Rasch model that incorporates more facets than the 2 usually included in dichotomous item tests, i.e. candidates and items.
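
    For reference, one common way of writing a many-facet rating scale model with three facets (candidate, item and rater) is sketched below. The symbols are notation chosen here for illustration and do not come from the slides; further facets, such as the background and experience facets in the plan, would typically enter as additional subtracted terms.

```latex
% P_{nijk}: probability that candidate n receives category k (rather than k-1)
%           from rater j on item i
% \theta_n: ability of candidate n      \delta_i: difficulty of item i
% \alpha_j: severity of rater j         \tau_k:   threshold between categories k-1 and k
\log\frac{P_{nijk}}{P_{nij(k-1)}} = \theta_n - \delta_i - \alpha_j - \tau_k
```

    With the rater term removed, this reduces to a two-facet (candidate and item) rating scale model, which is the sense in which MFRM extends the basic model.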


    Multifaceted Rasch measurement is a stochastic model; the analysis is run using FACETS, a computer program developed by Linacre (1989).

    Candidate ability is estimated from all ratings given by all raters on all items (Lunz & Wright, 1997; McNamara, 1996: 132).

    Item difficulty (TA, CC, LR & GR) is estimated from all responses across all candidates to that item (ibid).

    Rater severity is estimated from all ratings given across all candidates and items (ibid).
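
    To make the role of the three estimates concrete, here is a minimal Python sketch of how a rating scale form of the model turns candidate ability, item difficulty and rater severity into an expected rating, and how setting the rater term to zero yields a severity-adjusted ("fair") expectation. It is an illustration under stated assumptions, not the FACETS implementation: the function names, the 0 to 4 category scale, the thresholds and all logit values are hypothetical.

```python
import math

def category_probabilities(ability, difficulty, severity, thresholds):
    """Category probabilities under a rating scale many-facet Rasch model.

    All arguments are in logits. `thresholds` holds tau_1..tau_m, so the
    rating categories run from 0 to m.
    """
    eta = ability - difficulty - severity
    # Log-numerator of category k is the sum of (eta - tau_h) for h = 1..k;
    # category 0 has log-numerator 0.
    log_num = [0.0]
    for tau in thresholds:
        log_num.append(log_num[-1] + eta - tau)
    peak = max(log_num)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in log_num]
    total = sum(weights)
    return [w / total for w in weights]

def expected_score(ability, difficulty, severity, thresholds):
    """Model-expected rating: sum over categories of k * P(k)."""
    probs = category_probabilities(ability, difficulty, severity, thresholds)
    return sum(k * p for k, p in enumerate(probs))

# Hypothetical values: one candidate, one item, a five-category scale (0-4).
thresholds = [-1.5, -0.5, 0.5, 1.5]
ability, difficulty = 1.0, 0.0

severe = expected_score(ability, difficulty, 0.8, thresholds)    # harsh rater
lenient = expected_score(ability, difficulty, -0.8, thresholds)  # lenient rater
# Adjusted expectation: rater severity set to 0, assumed here to be the
# average severity (i.e. the rater facet is centred at zero).
fair = expected_score(ability, difficulty, 0.0, thresholds)

print(f"severe rater:  {severe:.2f}")
print(f"lenient rater: {lenient:.2f}")
print(f"adjusted:      {fair:.2f}")
```

    The same candidate is expected to receive a lower rating from the severe rater and a higher one from the lenient rater; the adjusted value removes that difference by evaluating the model at a common severity.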


    In addition, MFRM has 2 more very informative functions:

    Bias analysis

    Fit analysis

    These 2 functions enable researchers to look at

    how individual raters, ratees, or traits included in the analysis are performing (fit analysis: z-score values between -2 and +2 are usually accepted in contexts similar to ours)

    how the individual elements within the facets interact, i.e. the individual-level effects of the various elements (bias analysis: z-score values between -2 and +2)

    Thus, the source(s) of variation in the scores are efficiently determined. (Myford et al., 2003; Lunz & Wright, 1997)
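
    The -2 to +2 screening rule mentioned above can be sketched in a few lines of Python. This assumes the z-score values have already been taken from the analysis output; the function name, rater labels and values below are hypothetical and serve only to show the rule.

```python
def flag_outside_band(z_values, lower=-2.0, upper=2.0):
    """Return the names whose z-score falls outside [lower, upper], sorted."""
    return sorted(
        name for name, z in z_values.items() if z < lower or z > upper
    )

# Hypothetical standardized fit values for four raters.
rater_fit_z = {
    "Rater A": 0.4,
    "Rater B": 2.7,   # above +2: flagged
    "Rater C": -1.1,
    "Rater D": -2.3,  # below -2: flagged
}

flagged = flag_outside_band(rater_fit_z)
print(flagged)  # ['Rater B', 'Rater D']
```

    The same check applies to bias (interaction) terms: each rater-by-element pairing with a z-score outside the band would be listed for closer inspection.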


    Conclusion

    Owing to the above features, MFRM has been found to be a model with great potential to improve our capacity to produce objective measures of the ability of test takers in performance assessment contexts. It is practical and can be used in our context along with pair rating.

    (Linacre et al., 1990; Engelhard, 1991, 1992, 1994, 1996; Engelhard & Myford, 2003; Hamp-Lyons, 1991; Lunz, 1996, 1997a, 1997b; Lunz & Wright, 1997; Weigle, 1994, 1998, 2002; Schaefer, 2003, 2008; Kondo-Brown, 2002; Lumley & McNamara, 1995; Lumley, 2005; McNamara, 1991, 1996, 1997, 2000, 2002, 2008; McNamara & Roever, 2006; Myford et al., 2003, 2004; Shaw & Weir, 2007; Wigglesworth, 1993, 1994).
