Theme 10: Evaluation
In this theme we discuss in detail the topic “evaluation”. This is a comprehensive and complex theme. Therefore, during this session, we discuss only a first part of the overall theme.
In relation to “Evaluation”, we will discuss three main themes and related subthemes.
Evaluation
• Definition
– Measuring
– Valuing
– Scoring
• Quality criteria
– Validity
– Reliability
– Recency
– Authenticity
• Trends in evaluation: dimensions
– Aggregation level
– Functions of evaluation?
– Who is responsible?
– When to evaluate?
– Evaluation techniques
We start with a focus on the first main theme: the definition and the concept of evaluation.
Evaluation: the concept
Defining the concept of evaluation is difficult, since the term itself emphasizes only one aspect of what evaluation fully embraces, namely “giving a value” to what is being observed.
As we will see, it also does not help to replace the concept by other popular concepts, such as “assessment”. Again, only one particular aspect of the whole process is being emphasized.
Read the following description of evaluation: “Evaluation is the entire process of collecting, analysing and interpreting information about potentially every aspect of an instructional activity, with the aim of giving conclusions about the efficacy, efficiency and/or any other impact” (Thorpe, 1988).
You can observe that evaluation is a comprehensive process that can be related to potentially every element in our educational frame of reference.
In the literature, an important distinction is made between evaluation and assessment.
• Assessment or “measuring” refers to the process of collecting and analysing information (Burke, 1999; Feden & Vogel, 2004).
• Evaluation refers to, as stated earlier, adding a value to what has been collected and analysed, in view of coming to a conclusion about the efficacy, efficiency or any other impact.
But in the literature, an even more detailed distinction is made between:
• Measuring/testing: collecting information
• Evaluating/valuing: what is this information worth?
• Scoring/grading: depending on the “worth”, what score will we give?
It is essential to distinguish these three approaches. One can measure without valuing or scoring. And one cannot score without collecting and valuing information.
We now move to the second main theme, which centers on quality criteria.
Evaluation: quality requirements
Prior to a discussion of recent developments in the field of evaluation, we first deal with some critical quality requirements that are central in discussions about evaluation:
• Validity
• Reliability
• Authenticity
• Recency
Validity
Validity refers to the extent to which the content of what is being measured, valued and scored is related to the initial evaluation objective. Typical questions that are raised in this context are:
• What if we only measure geometry, when we want to come to conclusions about mathematics performance in primary school?
• What if we only get questions from chapter 5 during an exam?
• What if we only ask memorization questions in a test when we also worked in the laboratory and solved chemistry problems?
Reliability
Reliability refers to the extent to which our measurement is stable. Typical questions raised are:
• If I repeat the same test tomorrow, will I get the same results (stability)?
• Is there a large difference in the ability to solve the different questions about the same topic (internal consistency)?
• If someone else measured, valued and scored the test, would he/she end up with the same results?
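The first two questions can be made concrete with numbers. The following is a minimal sketch, assuming that stability is summarised as the Pearson correlation between two administrations of the same test and that internal consistency is summarised as Cronbach's alpha. Neither statistic is named in the slides, and all scores and function names below are hypothetical.

```python
from statistics import mean, pvariance


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def cronbach_alpha(item_scores):
    """Cronbach's alpha; item_scores[i][j] is the score of learner j on item i."""
    k = len(item_scores)
    # Total score per learner across all items.
    totals = [sum(per_learner) for per_learner in zip(*item_scores)]
    item_var_sum = sum(pvariance(item) for item in item_scores)
    return k / (k - 1) * (1 - item_var_sum / pvariance(totals))


# Hypothetical data: the same five learners tested today and tomorrow.
test_today = [12, 15, 9, 18, 14]
test_tomorrow = [13, 14, 10, 17, 15]
stability = pearson_r(test_today, test_tomorrow)

# Hypothetical data: three questions on the same topic, five learners.
items = [
    [2, 3, 1, 4, 3],
    [3, 3, 2, 4, 2],
    [2, 4, 1, 5, 3],
]
consistency = cronbach_alpha(items)
```

Values of either statistic close to 1 indicate a stable measurement. The third question, about a second scorer, could be approached in the same way by correlating the scores given by two evaluators.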
Authenticity
Authenticity refers to the extent to which the information we gather mirrors reality in a relevant, adequate, and authentic way. Examples of related questions:
• Is it sufficient to ask student nurses to give injections on a doll to evaluate their injection skills?
• Is it adequate to give a flying license to someone who was only tested in a flight simulator?
• Is it sufficient to say that one is able to “teach” after evaluating his/her capacities with small group teaching?
Recency
Recency questions the “date” at which information has been collected, valued or scored in view of evaluation:
• Can we accept credits obtained 5 years ago from someone who asks to be relieved of courses in a new study program?
• Can we hire a young house mother who got her degree 10 years ago?
• Are the Basic Life Support skills mastered six months ago still relevant today in an active first aid officer?
From here on, we move to the third main theme in this session about the recent developments in evaluation. Five subthemes are discussed.
Recent developments in evaluation
Recent developments in evaluation can be clustered along five dimensions:
• At what aggregation level is the evaluation being set up?
• What are the functions/roles of the evaluation?
• Who carries out the evaluation?
• When is the evaluation being set up?
• What evaluation techniques are being adopted?
We discuss some examples in relation to each dimension.
Dimension 1: aggregation levels
Firstly, we observe that evolutions in evaluation are related to the aggregation levels in our educational frame of reference:
• Micro level
• Meso level
• Macro level
We look, in relation to each aggregation level, at particular new developments.
At each aggregation level, the same elements re-appear. Evaluation can be related to every element in the educational frame of reference:
• Responsible for the instruction
• Learner
• Learning activities
• Organisation
• Context
• Instructional activities (objectives, learning content, media, didactical strategies, evaluation)
Micro level
Example 1: evaluation of the extent to which the learning objectives have been attained;
Example 2: evaluation of didactical strategies.
Micro level: evaluation of learning objectives
During evaluation we measure the behavior, we value the behavior and we give a score. The question is: “What is the basis for giving a certain value?”
• Based on a criterion?
– Criterion referenced assessment
• Based on a norm, e.g., group mean?
– Norm referenced assessment
• Based on earlier performance of the learner?
– Ipsative assessment or self-referenced assessment
Example: athletics, 15-year-olds have to run 100 meters.
• Criterion referenced assessment
– Every performance is compared to an a priori stated criterion; e.g., less than 15 seconds.
• Norm referenced assessment
– Every performance is compared to the classroom mean (imagine you are in a class with fast runners).
• Ipsative assessment or self-referenced assessment
– Every performance is compared to the earlier performance of the individual learner; emphasis on progress.
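The three bases for valuing can be contrasted on one and the same measurement. The sketch below is illustrative only: the function names and the running times are hypothetical, and only the 15-second criterion comes from the example.

```python
from statistics import mean


def criterion_referenced(time_s, criterion_s=15.0):
    """True if the 100 m time beats the a priori stated criterion."""
    return time_s < criterion_s


def norm_referenced(time_s, class_times_s):
    """Seconds faster (positive) or slower (negative) than the class mean."""
    return mean(class_times_s) - time_s


def ipsative(time_s, earlier_time_s):
    """Seconds gained (positive) compared with the learner's own earlier run."""
    return earlier_time_s - time_s


# Hypothetical data: one learner in a class with fast runners.
class_times = [13.2, 13.5, 13.8, 14.0, 14.5]
learner_now, learner_before = 14.5, 15.4

meets_criterion = criterion_referenced(learner_now)   # valued against the criterion
vs_class = norm_referenced(learner_now, class_times)  # valued against the group
progress = ipsative(learner_now, learner_before)      # valued against own past
```

With these numbers the same 14.5-second run meets the criterion, is slower than the class mean, and shows progress compared with the learner's earlier run. The measurement is identical in all three cases; only the reference used for valuing differs.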
Micro level: evaluation of instructional strategies
Hattie (2009) discusses instructional activities in his meta-analysis. These analyses examine whether different instructional strategies have a differential impact on learners. Do they matter? In the following example you see that the didactical strategy “homework” has an average “effect size” of d = .29. This is far below the benchmark of d = .40.
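An effect size d expresses the difference between two group means in standard deviation units. The following is a minimal sketch, assuming Cohen's d with a pooled standard deviation (the slides do not state which variant underlies the reported averages) and using hypothetical test scores.

```python
from statistics import mean, variance


def cohens_d(treatment, control):
    """Standardised mean difference using the pooled sample standard deviation."""
    n_t, n_c = len(treatment), len(control)
    pooled_var = (
        (n_t - 1) * variance(treatment) + (n_c - 1) * variance(control)
    ) / (n_t + n_c - 2)
    return (mean(treatment) - mean(control)) / pooled_var ** 0.5


# Hypothetical scores: a class with homework versus a class without.
with_homework = [62, 70, 68, 75, 66, 71]
without_homework = [60, 69, 66, 73, 65, 69]

d = cohens_d(with_homework, without_homework)
BENCHMARK = 0.40  # benchmark value quoted in the text
exceeds_benchmark = d > BENCHMARK
```

This computes d for a single comparison between two classes. The value reported for a strategy such as homework is an average over many such comparisons.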
Meso level: evaluation at school level
• Recent developments at the school level examine whether “schools” have a value-added; this means an additional value that results in better learning performance.
• But can we simply compare schools with one another? Does this not lead to simple ranking as depicted in this journal?
• One cannot simply compare schools.
• Calder (1994) puts forward, in this context, the CIPP model to consider everything in balance:
– Context evaluation: the geographical position of a school, the available budget, the legal base, etc.
– Input evaluation: what the school actually uses as resources, its program, its policies, the number and type of staff members, etc.
– Process evaluation: the way a program is implemented, the strategies being used, the evaluation approach, the professional development of the staff, etc.
– Product evaluation: the effects, such as goal attainment, throughput, return on investment, etc.
• Comparing schools with the CIPP model can as such imply that:
– A school with a lot of migrants outperforms a school with dominantly upper class children.
– A school can be good in attaining certain goals, but can be less qualified in attaining other goals.
– A school can be criticized as to its policies.
– One will consider the geographical location of a school when discussing results (e.g., an unsafe neighbourhood).
– We will also look at what the learners do later when they go to another school (e.g., success at university).
• Schools are being assessed by the inspection on the basis of the CIPP model.
• The inspection reports are public.
Macro level: school effectiveness
Read the following description:
• “The aim of school effectiveness research is to describe and explain the differences between schools on the base of specific criteria. This research explores the differences in performance on the base of differences in those responsible for teaching, the learners, the classes, the school.”
You can see that, as in the CIPP model, explanations are sought at the level of all schools in the educational system.
This development started from very critical reports as to the value-added of schools:
– Coleman report (1966, chapter 1): “Schools have little effect on students’ achievement that is independent of their family background and social context.”
– Plowden report (1967, p.35): “Differences between parents will explain more of the variation in children than differences between schools. (…) Parental factors, in fact, accounted for 58% of the variance in student achievement in this study.”
• Schools want, in contrast to these reports, proof that they make a difference and contribute to learner performance.
A central critique of the Coleman and Plowden reports is that they neglect the complex interplay that helps to explain differences between schools; see the CIPP model.
Instead of simply administering tests and comparing results, we have to look, next to “product effects”, at the processes and variables that are linked to these results. These are labelled with the concept of performance indicators.
Macro level: performance indicators
• Performance indicators are: “statistical data, numbers, costs or any other information that measures and clarifies the outcomes of an institution in line with preset goals.”
• You can notice that the emphasis in performance indicators is on the description and explanation of differences in performance.
• One of the best known performance indicator studies is the three-yearly PISA study: Programme for International Student Assessment. E.g., in PISA 2006, the performance of schools in 54 countries was compared.
• Results of PISA in 2006 show, for example, the high performance of Flemish schools for sciences, mathematics, and reading literacy.
PISA results are not only described. They are also explained. In this graphic, one sees how the PISA results are associated with the socio-economic status (SES) of the learners. The higher the status, the higher the results. SES is determined by the educational level of the parents, their income, their possession of cultural goods (e.g., books), etc.
Dimension 2: Functions of evaluation
Why do we evaluate? There might be different reasons:
• Formative evaluation
– To see where one is in the learning process and how we can redirect the learning process.
• Summative evaluation
– To determine the final attainment of the goals.
• Prediction function
– To predict future performance (e.g., success in higher education).
• Selection function
– To see whether one is fit for a job or task.
Abroad, a lot of attention is paid to the selection function; see the emphasis on entrance exams. In this example, one sees a lucky candidate (and his mother) who succeeded in the entrance exam for a Chinese university.
Earlier, there was a major emphasis on summative evaluation. Nowadays this emphasis has shifted towards formative evaluation. Why?
• Does one learn from evaluative feedback? This is also called consequential validity.
• Do the evaluation results not imply that the teacher has to redirect the instruction, the support, the learning materials, etc.?
• Does a learner already reach a preliminary attainment level?
Dimension 3: Who is responsible?
Traditionally, the teacher is responsible for the evaluation. But there are new developments:
• The learner him/herself carries out the evaluation: self assessment
• The learner and peers carry out the evaluation together: peer assessment
• An external evaluator carries out the evaluation (e.g., another teacher).
• An external company carries out the evaluation: assessment centers
• …
• New development: self assessment
• Self-assessment is seen as a type of evaluation that aims at fostering the learning process (assessment-as-learning): formative evaluation function
• Two main steps to be taken:
– Initial training to develop criteria and tools, and to discuss the value of what is being measured.
– Next, usage of the tools/instruments and developing a personal opinion. Scoring is not an issue here.
• Very useful technique: rubrics (see further)
• Assessment centers: external company that carries out evaluation; mostly with a selection function
• “Standardized procedure to assess complex behavior on the base of multiple information bases. The behavior is assessed in simulated contexts. Multiple persons evaluate and come to a shared vision.”
• Different evaluators are involved and guarantee a 360° approach to the evaluation
• This technique fulfills a selection function, e.g., when screening candidates for a job
Dimension 4: When to evaluate?
There is a shift in the moment the evaluation is being set up: towards “prior to” and “during” the learning process; serving a formative evaluation function:
• Prior
– Prior knowledge testing
• During
– Progress testing
– Portfolio evaluation
• After
– Final evaluation
Dimension 5: What technique?
Next to traditional evaluation tests with multiple choice questions, open answer questions, fill-in questions, sort questions, … we observe a series of new techniques. Examples:
• Rubrics: attention is paid to criteria and indicators
• Portfolios: file with letters, information, illustrations, products, … as the information base for the evaluation
• …
Dimension 5: Technique rubrics
Rubrics:
• Define clear criteria: concrete elements of a complex learning objective that are being measured, valued and scored
• Determine for each criterion a number of quality indicators: indicators exemplify the level at which a certain criterion is being met, answered, attained
Example rubric: “mixing colours”
In next steps of the learning process, we can add criteria and/or performance indicators to the rubric.
Criteria | Performance indicator 1 | 2 | 3 | 4
Amount of paint being used? | Learner does not consider the amount of different colours being used | - | - | Learner uses right from the start minimal amounts of paint to start mixing colours
What colour is mixed first? | Starts with the darkest colour to mix | - | - | Starts with the lightest colour to mix
What order in mixing colours? | … | | | …
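A rubric can be represented as criteria that map to level descriptors. The sketch below is one possible encoding of the two complete rows of the “mixing colours” example. The dictionary layout and the `describe` helper are hypothetical, and levels 2 and 3 are left out because the example does not describe them.

```python
# Each criterion maps to performance indicators per level (1 to 4).
RUBRIC = {
    "Amount of paint being used?": {
        1: "Learner does not consider the amount of different colours being used",
        4: "Learner uses right from the start minimal amounts of paint to start mixing colours",
    },
    "What colour is mixed first?": {
        1: "Starts with the darkest colour to mix",
        4: "Starts with the lightest colour to mix",
    },
}


def describe(rubric, observed_levels):
    """Return, per criterion, the observed level and its indicator (None if undescribed)."""
    report = {}
    for criterion, level in observed_levels.items():
        if criterion not in rubric:
            raise KeyError(f"Unknown criterion: {criterion}")
        report[criterion] = (level, rubric[criterion].get(level))
    return report


# Hypothetical observation of one learner.
observation = {"Amount of paint being used?": 4, "What colour is mixed first?": 2}
report = describe(RUBRIC, observation)
```

In this layout, adding a criterion or a performance indicator in a later step of the learning process amounts to adding a key to the dictionary.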
Example rubric: “Writing of a historical fiction story”
Dimension 5: Technique Portfolio
Read this description of a portfolio: A portfolio is a file with letters, information, illustrations, products, … that is used as an information base for the evaluation.
• Types of portfolios:
– A document portfolio or product portfolio: documentation that helps to describe the activities in the training, internship, practical experience, … (measurement). In addition to this information, learners can add their reflections (valuing). Typically used with student doctors, nurses, teachers, …
– A process portfolio: a logbook. Documentation of the progress in the learning process, enriched with reflections. Typically used with student doctors, nurses, midwives, teachers, …
– A showcase portfolio: “the best of …”. Bundle of the best work of a student that helps to come to a conclusion about his/her performance. Typically used in decorative arts, music, theater, architecture, …
Example of a process portfolio for student teachers
We hope you have now developed a first comprehensive picture of evaluation.
End of this instruction package
Now take the final test. Go back to your Minerva workspace.