
Philosophies and types of evaluation research

Elliot Stern

In:

Descy, P.; Tessaring, M. (eds)

The foundations of evaluation and impact research
Third report on vocational training research in Europe: background report.

Luxembourg: Office for Official Publications of the European Communities, 2004 (Cedefop Reference series, 58)

Reproduction is authorised provided the source is acknowledged

Additional information on Cedefop’s research reports can be found on: http://www.trainingvillage.gr/etv/Projects_Networks/ResearchLab/

For your information:

• the background report to the third report on vocational training research in Europe contains original contributions from researchers. They are regrouped in three volumes published separately in English only. A list of contents is on the next page.

• A synthesis report based on these contributions and with additional research findings is being published in English, French and German.

Bibliographical reference of the English version: Descy, P.; Tessaring, M. Evaluation and impact of education and training: the value of learning. Third report on vocational training research in Europe: synthesis report. Luxembourg: Office for Official Publications of the European Communities (Cedefop Reference series)

• In addition, an executive summary in all EU languages will be available.

The background and synthesis reports will be available from national EU sales offices or from Cedefop.

For further information contact:

Cedefop, PO Box 22427, GR-55102 Thessaloniki
Tel.: (30)2310 490 111
Fax: (30)2310 490 102
E-mail: [email protected]
Homepage: www.cedefop.eu.int
Interactive website: www.trainingvillage.gr

Contributions to the background report of the third research report

Impact of education and training

Preface

The impact of human capital on economic growth: a review
Rob A. Wilson, Geoff Briscoe

Empirical analysis of human capital development and economic growth in European regions
Hiro Izushi, Robert Huggins

Non-material benefits of education, training and skills at a macro level
Andy Green, John Preston, Lars-Erik Malmberg

Macroeconometric evaluation of active labour-market policy – a case study for Germany
Reinhard Hujer, Marco Caliendo, Christopher Zeiss

Active policies and measures: impact on integration and reintegration in the labour market and social life
Kenneth Walsh and David J. Parsons

The impact of human capital and human capital investments on company performance. Evidence from literature and European survey results
Bo Hansson, Ulf Johanson, Karl-Heinz Leitner

The benefits of education, training and skills from an individual life-course perspective with a particular focus on life-course and biographical research
Maren Heise, Wolfgang Meyer

The foundations of evaluation and impact research

Preface

Philosophies and types of evaluation research
Elliot Stern

Developing standards to evaluate vocational education and training programmes
Wolfgang Beywl; Sandra Speer

Methods and limitations of evaluation and impact research
Reinhard Hujer, Marco Caliendo, Dubravko Radic

From project to policy evaluation in vocational education and training – possible concepts and tools. Evidence from countries in transition.
Evelyn Viertel, Søren P. Nielsen, David L. Parkes, Søren Poulsen

Look, listen and learn: an international evaluation of adult learning
Beatriz Pont and Patrick Werquin

Measurement and evaluation of competence
Gerald A. Straka

An overarching conceptual framework for assessing key competences. Lessons from an interdisciplinary and policy-oriented approach
Dominique Simone Rychen

Evaluation of systems and programmes

Preface

Evaluating the impact of reforms of vocational education and training: examples of practice
Mike Coles

Evaluating systems’ reform in vocational education and training. Learning from Danish and Dutch cases
Loek Nieuwenhuis, Hanne Shapiro

Evaluation of EU and international programmes and initiatives promoting mobility – selected case studies
Wolfgang Hellwig, Uwe Lauterbach, Hermann-Günter Hesse, Sabine Fabriz

Consultancy for free? Evaluation practice in the European Union and central and eastern Europe. Findings from selected EU programmes
Bernd Baumgartl, Olga Strietska-Ilina, Gerhard Schaumberger

Quasi-market reforms in employment and training services: first experiences and evaluation results
Ludo Struyven, Geert Steurs

Evaluation activities in the European Commission
Josep Molsosa

Philosophies and types of evaluation research

Elliot Stern

Abstract

This chapter considers different types of evaluation in vocational education and training (VET). It does so from two standpoints: debates among evaluation researchers and the way contexts of use and evaluation capacity shape evaluation in practice. The nature of VET as an evaluation object is discussed and theories of evaluation are located in wider debates about the nature of knowledge and philosophies of science. The various roles of evaluation in steering and regulating decentralised policy systems are discussed, as is the way evaluation itself is regulated through the development of standards and professional codes of behaviour.

Table of contents

1. Introduction
1.1. Scope of this chapter
2. Can evaluation be defined?
2.1. Assessing or explaining outcomes
2.2. Evaluation, change and values
2.3. Quantitative and qualitative methods
2.4. Evaluation types
2.5. A focus on results and outcomes
2.6. Participatory methods and devolved evaluations
2.7. Theory and practice
3. The object of VET evaluation
3.1. The domain of VET
3.2. Overarching characteristics of evaluation objects
4. Philosophical foundations
4.1. Positivism, observation and theory
4.2. Scientific realism
4.3. Constructivists
5. Evaluation theory
5.1. Programme theory
5.2. Theory based evaluation and theories of change
5.3. A wider theoretical frame
6. Determinants of evaluation use
6.1. Instrumental versus cumulative use
6.2. Process use of evaluation
6.3. The importance of methodology
6.4. Evaluation and learning
6.5. The institutionalisation of evaluation
6.6. Organisation of evaluation in public agencies
6.7. Evaluation as dialogue
6.8. Strategies and types of evaluation use
7. Evaluation standards and regulation
7.1. Evaluation codes and standards
7.2. Evaluation as a method for regulating decentralised systems
8. Conclusions and future scenarios
List of abbreviations
References

List of tables

Table 1: Overlaps between methodology and purpose
Table 2: Evaluation types
Table 3: Rules guiding realistic enquiry and method
Table 4: Comparing constructivist and ‘conventional’ evaluation
Table 5: Grid for a synthetic assessment of quality of evaluation work

1. Introduction

Until quite recently, evaluation thinking has been centred in North America. Only over the last ten years have we seen the growth and spread of evaluation in Europe. This has been associated with a significant expansion of evaluations supported by the European Union (EU), especially in relation to Structural Funds (European Commission, 1999), and the establishment of evaluation societies at Member State and European levels. There have also been the beginnings of a tradition of evaluation publishing in Europe, including the emergence of a major new evaluation journal edited for the first time from a European base and with a high proportion of European content. Many factors account for the growth in evaluation activities in Europe in recent years (Leeuw et al., 1999; Rist et al., 2001; Toulemonde, 2001). These include both structural and management considerations. Expenditure pressures, both at national and European level, have increased demands for improved performance and greater effectiveness within the public sector. Furthermore, public action is becoming increasingly complex both in terms of the goals of programmes and policies and the organisational arrangements through which they are delivered.

The decentralisation of public agencies, together with the introduction of results based management and other principles commonly described under the heading ‘new public management’, have created new demands for accountability; these are often in multi-agency and partnership environments. Without the bottom line of financial measures to judge success, new ways of demonstrating impacts and results are being demanded; so are new ways of regulating and steering decentralised systems. We see evaluation nowadays not only applied to programmes or policy instruments but also built into the routines of administration. This is often associated with standards that are set by policy-makers in relation to the performance of those expected to deliver public services. Standards, a concept central to evaluation, have also come to be applied to evaluation itself. Even evaluation is not free from the demands to deliver reliable, high quality output.

1.1. Scope of this chapter

It is against this background that this chapter on ‘types and philosophies’ of evaluation has been prepared. The main sections are as follows.

Chapter 2 begins by seeking to define, or, more accurately, to review attempts to define, evaluation through the work of various scholars and experts. It allows us to clarify the main types of evaluation that are in use in different institutional and administrative settings, even though the writing of scholars and experts focuses on ‘pure types’ when compared with evaluation as practised. This section also begins to highlight issues related to standards and their role in evaluation more broadly.

Chapter 3 then considers the nature of evaluation in the context of vocational education and training (VET) and addresses the question: what characterises evaluation in this domain; is it in any way distinctive? The section includes both specific consideration of VET and more general characteristics of evaluation objects and configurations.

Chapter 4 seeks to locate evaluation theory within the broader setting of the nature of theory in the philosophy of science. Many of the debates in evaluation are shaped by, and reflect, these wider debates.

Evaluation theory – narrowly conceived – is then reviewed in Chapter 5. In many ways theory within evaluation (as will be discussed) is a very particular construction, though this does not invalidate wider understandings of the role of theory.

The reality and practice of evaluation are then considered in Chapter 6, bringing together a substantial body of research into evaluation use and institutionalisation in order to understand better different types of evaluation in situ.

Evaluation standards and codes of behaviour and ethics for evaluators are reviewed in Chapter 7, drawing on experience in North America, Australasia and, more recently, Europe.

Finally, in Chapter 8, the discussion returns to the question of evaluation standards and the role they play both in the evaluation process and in governance and regulatory processes to steer institutions and promote policies and reforms.


2. Can evaluation be defined?

There are numerous definitions and types of evaluation. There are, for example, many definitions of evaluation put forward in handbooks, evaluation guidelines and administrative procedures, by bodies that commission and use evaluation. All of these definitions draw selectively on a wider debate as to the scope and focus of evaluation. A recent book identifies 22 foundation models for 21st century programme evaluation (Stufflebeam, 2000a), although the authors suggest that a smaller subset of nine are the strongest. Rather than begin with types and models, this chapter begins with an attempt to review and bring together the main ideas and orientations that underpin evaluation thinking.

Indicating potential problems with ‘definition’ by a question mark in the title of this section warns the reader not to expect straightforward or consistent statements. Evaluation has grown up through different historical periods in different policy environments, with inputs from many disciplines and methodologies, from diverse value positions and rooted in hard-fought debates in philosophy of science and theories of knowledge. While there is some agreement, there is also persistent difference: evaluation is contested terrain. Most of these sources are from North America, where evaluation has been established – as a discipline and practice – and debated for 30 or more years.

2.1. Assessing or explaining outcomes

Among the most frequently quoted definitions is that of Scriven, who has produced an evaluation Thesaurus, his own extensive handbook of evaluation terminology: ‘“evaluation” refers to the process of determining the merit, worth or value of something, or the product of that process […] The evaluation process normally involves some identification of relevant standards of merit, worth or value; some investigation of the performance of evaluands on these standards; and some integration or synthesis of the results to achieve an overall evaluation or set of associated evaluations.’ (Scriven, 1991; p. 139).

This definition prepares the way for what has been called ‘the logic of evaluation’ (Scriven, 1991; Fournier, 1995). This logic is expressed in a sequence of four stages:
(a) establishing evaluation criteria and related dimensions;
(b) constructing standards of performance in relation to these criteria and dimensions;
(c) measuring performance in practice;
(d) reaching a conclusion about the worth of the object in question.

This logic is not without its critics (e.g. Schwandt, 1997), especially among those of a naturalistic or constructivist turn who cast doubt on the claims of evaluators to know, to judge and ultimately to control. Other stakeholders, it is argued, have a role, and this changed relationship with stakeholders is discussed further below.
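To make this four-stage logic concrete, here is a minimal Python sketch; the criteria, thresholds and scores are invented for illustration and do not come from Scriven's text.

```python
# A minimal sketch of the four-stage 'logic of evaluation'.
# All criteria, standards and scores below are hypothetical.

# (a) establish evaluation criteria and related dimensions
criteria = ["relevance", "effectiveness", "efficiency"]

# (b) construct standards of performance for each criterion
# (here: the minimum score, on a 0-10 scale, counted as satisfactory)
standards = {"relevance": 6, "effectiveness": 7, "efficiency": 5}

# (c) measure performance in practice (e.g. from indicators or surveys)
measured = {"relevance": 8, "effectiveness": 6, "efficiency": 7}

# (d) integrate the results into an overall evaluative conclusion
def synthesise(measured, standards, criteria):
    met = {c: measured[c] >= standards[c] for c in criteria}
    verdict = "satisfactory" if all(met.values()) else "not yet satisfactory"
    return met, verdict

met, verdict = synthesise(measured, standards, criteria)
print(met)      # which standards were met on each criterion
print(verdict)  # the overall judgement of merit or worth
```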

The most popular textbook definition of evaluation can be found in Rossi et al.’s book Evaluation – a systematic approach: ‘Program evaluation is the use of social research procedures to systematically investigate the effectiveness of social intervention programs. More specifically, evaluation researchers (evaluators) use social research methods to study, appraise, and help improve social programmes in all their important aspects, including the diagnosis of the social problems they address, their conceptualization and design, their implementation and administration, their outcomes, and their efficiency.’ (Rossi et al., 1999; p. 4).

Using words such as effectiveness rather than Scriven’s favoured ‘merit, worth or value’ begins to shift the perspective of this definition towards the explanation of outcomes and impacts. This is partly because Rossi and his colleagues identify helping improve social programmes as one of the purposes of evaluation. Once there is an intention to make programmes more effective, the need to explain how they work becomes more important. Yet explanation is an important and intentionally absent element in Scriven’s definitions of evaluation: ‘By contrast with evaluation, which identifies the value of something, explanation involves answering a Why or How question about it or a call for some other type of understanding. Often, explanation involves identifying the cause of a phenomenon, rather than its effects (which is a major part of evaluation). When it is possible, without jeopardizing the main goals of an evaluation, a good evaluation design tries to uncover microexplanations (e.g. by identifying those components of the curriculum package that are producing the major part of the good or bad effects, and/or those that are having little effect). The first priority, however, is to resolve the evaluation issues (is the package any good at all, the best available? etc.). Too often the research orientation and training of evaluators leads them to do a poor job on evaluation because they became interested in explanation.’ (Scriven, 1991, p. 158).

Scriven himself recognises that one pressure moving evaluation to pay greater attention to explanation is the emergence of programme theory, with its concern about how programmes operate so that they can be improved or better implemented. A parallel pressure comes from the uptake of impact assessment associated with the growth of performance management and other managerial reforms within public sector administrations. The intellectual basis for this work was most consistently elaborated by Wholey and colleagues. They start from the position that evaluation should be concerned with the efficiency and effectiveness of the way governments deliver public services. A core concept within this approach is what is called ‘evaluability assessment’ (Wholey, 1981). The starting point for this assessment is a critical review of the logic of programmes and the assumptions that underpin them. This work constitutes the foundation for most of the thinking about programme theory and logical frameworks. It also prefigures a later generation of evaluation thinking, rooted more in policy analysis, that is concerned with the institutionalisation of evaluation within public agencies (Boyle and Lemaire, 1999), as discussed further below.

These management reforms generally link interventions with outcomes. As Rossi et al. recognise, this takes us to the heart of broader debates in the social sciences about causality: ‘The problem of establishing a program’s impact is identical to the problem of establishing that the program is a cause of some specified effect. Hence, establishing impact essentially amounts to establishing causality.’ (Rossi et al., 1999).

The difficulties of establishing perfect, rather than good enough, impact assessments are recognised by Rossi and colleagues. This takes us into the territory of experimentation and causal inference associated with some of the most influential founders of North American evaluation, such as Campbell, with his interest in experimental and quasi-experimental designs, but also his interest in later years in the explanatory potential of qualitative evaluation methods. The debate about experimentation and causality in evaluation continues to be vigorously pursued in various guises. For example, in a recent authoritative text on experimentation and causal inference (Shadish et al., 2002), the authors begin to take on board contemporary criticisms of experimental methods that have come from the philosophy of science and the social sciences more generally. In recent years, we have also seen a sustained realist critique of experimental methods led in Europe by Pawson and Tilley (1997). But, whatever their orientations to experimentation and causal inference, explanations remain at the heart of the concerns of an important constituency within evaluation.
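By way of illustration only, the following sketch lays out the kind of programme logic an evaluability assessment reviews; the programme, its assumptions and the generated questions are hypothetical and are not drawn from Wholey's work.

```python
# Hypothetical logical framework for a training subsidy programme,
# of the kind an evaluability assessment would critically review.
logic_model = {
    "inputs": ["subsidy budget", "training providers"],
    "activities": ["fund additional courses for apprentices"],
    "outputs": ["number of apprentices trained"],
    "outcomes": ["higher completion of recognised qualifications",
                 "improved employment prospects"],
    "assumptions": ["employers take up the subsidy",
                    "training quality is adequate"],
}

def evaluability_questions(model):
    """Generate the kind of questions an evaluability assessment asks."""
    questions = []
    for outcome in model["outcomes"]:
        questions.append(f"Is '{outcome}' defined precisely enough to be measured?")
    for assumption in model["assumptions"]:
        questions.append(f"Is the assumption '{assumption}' plausible and testable?")
    return questions

for question in evaluability_questions(logic_model):
    print(question)
```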

2.2. Evaluation, change and values

Another important strand in evaluation thinking concerns the relationship between evaluation and action or change. One comparison is between ‘summative’ and ‘formative’ evaluation methods, terms also coined by Scriven. The former assesses or judges results and the latter seeks to influence or promote change. Various authors have contributed to an understanding of the role of evaluation and change. For example, Cronbach (1982, 1989), rooted in policy analysis and education, sees an important if limited role for evaluation in shaping policy ‘at the margins’ through ‘piecemeal adaptations’. The role of evaluation in Cronbach’s framework is to inform policies and programmes through the generation of knowledge that feeds into the ‘policy shaping community’ of experts, administrators and policy-makers. Stake (1996), on the other hand, with his notion of ‘responsive evaluation’, sees this as a ‘service’ to programme stakeholders and to participants. By working with those who are directly involved in a programme, Stake sees the evaluator as supporting their participation and possibilities for initiating change. This contrasts with Cronbach’s position and even more strongly with that of Wholey (referred to earlier), given Stake’s scepticism about the possibilities of change at the level of large scale national (or, in the US context, Federal and State) programmes and their management. Similarly, Patton (1997 and earlier editions), who has tended to eschew work at programme and national level, shares with Stake a commitment to working with stakeholders and (local) users. His concern is for ‘intended use by intended users’.

Virtually everyone in the field recognises the political and value basis of much evaluation activity, albeit in different ways. While Stake, Cronbach and Wholey may recognise the importance of values within evaluation, the values that they recognise are variously those of stakeholders, participants and programme managers. There is another strand within the general orientation towards evaluation and change which is decidedly normative. This category includes House, with his emphasis on evaluation for social justice, and the emancipatory logic of Fetterman et al. (1996) and ‘empowerment evaluation’. Within the view of Fetterman and his colleagues, evaluation itself is not undertaken by external experts but rather is a self-help activity in which – because people empower themselves – the role of any external input is to support self-help. So, one of the main differences among those evaluators who explicitly address issues of programme and societal change is in terms of the role of evaluators, be they experts who act, facilitators and advocates, or enablers of self-help.

2.3. Quantitative and qualitative methods

Much of the literature that forms the foundation of the explanatory strand in evaluation is quantitative, even though we have noted that the later Campbell began to emphasise more the importance of qualitative understanding. However, among those concerned with formative, responsive and other change-oriented evaluations, there is a predominance of qualitative methods. The scope of ‘qualitative’ includes processes as well as phenomena, which by their nature require qualitative description. This would include, for example, the means through which a programme was being implemented as well as the dynamics that occur during the course of evaluation (e.g. learning to do things better, improving procedures, overcoming resistance).

Stake emphasises qualitative methods. He is often associated with the introduction of case studies into evaluation practice, although he also advocates a full range of observational, interview and conversational techniques (Stake, 1995). Patton’s commitment to qualitative methods is reinforced by his interest in the use by programme managers of the results of evaluations. ‘They must be interested in the stories, experiences and perceptions of programme participants’ (Patton, 2002; p. 10).

2.4. Evaluation types

After this ‘tour’ around some of the main arguments and positions in evaluation, it becomes possible to return to the matter of definition and types of evaluation. There is not a simple or single definition, but types of evaluation can be seen to cohere around two main axes. The first axis is methodological and the second concerns purposes.

In terms of methodologies, looking across the different approaches to evaluation discussed above, we can distinguish three methodological positions:
(a) the criteria or standards based position, which is concerned with judging success and performance by the application of standards;
(b) the causal inference position, which is concerned with explaining programme impacts and success;
(c) the formative or change oriented position, which seeks to bring about improvements both for programmes and for those who participate in them.

Alongside these methodological distinctions are a series of definitions that are concerned with evaluation purposes. Distinguishing evaluation in terms of purpose has been taken up by many authors including Vedung (1997), evaluators at the Tavistock Institute (Stern, 1992; Stern et al., 1992) and Chelimsky (1995, 1997). Most of these authors distinguish between different evaluation purposes that are clearly consistent with the overview presented above. Along this axis, we can distinguish between the following purposes:
(a) accountability, where the intention is to give an account to sponsors and policy-makers of the achievements of a programme or policy;
(b) development, where the intention is to improve the delivery or management of a programme during its term;
(c) knowledge production, where the intention is to develop new knowledge and understanding;
(d) social improvement, where the intention is to improve the situation of the presumed beneficiaries of public interventions.

There is a degree of correlation between these two axes, as Table 1 suggests.

Table 1: Overlaps between methodology and purpose

Purposes             | Criteria and standards                           | Causal inference                                | Change orientation
Accountability       | Outcome and impact evaluations. Mainly summative |                                                 |
Development          |                                                  |                                                 | Formative evaluation of programmes
Knowledge production |                                                  | ‘What works’ – improving future policy/practice |
Social improvement   |                                                  |                                                 | Empowerment and participative evaluations

Evaluation for the purpose of accountability tends to be concerned with criteria and standards (or indicator studies). Development evaluations use change oriented methods to pursue the desired improvements in programme delivery. Evaluations for the purpose of knowledge production are often concerned with drawing causal inference from evaluation data. Finally, evaluations for the purposes of social improvement are also preoccupied with change oriented methods, though to improve the circumstances of programme participants and citizens rather than programme management per se. However, this is not to suggest a one-to-one association between methodologies and purposes.

In the world of evaluation in practice, there are also incompatibilities and tensions. Thus, the accountability driven goal of evaluation often sits alongside, and sometimes competes with, management and delivery logic. Evaluation is often seen by programme managers as a means of supporting improved effectiveness of implementation. In many institutional settings, funds are committed for evaluation to meet accountability purposes (Vedung, 1997), but are spent mainly for managerial, formative and developmental purposes. Nor is causal inference always absent from evaluation purposes concerned with social improvement. Nonetheless, the clusterings represented in Table 1 do summarise the main types of evaluation. These are:
(a) accountability for policy-making evaluations that rely on criteria, standards and indicators;
(b) development evaluations that adopt a change orientated approach in order to improve programmes;
(c) knowledge production evaluations that are concerned to establish causal links, explanations and valid knowledge;
(d) social improvement evaluations that seek to improve the circumstances of beneficiaries by deploying change, advocacy and facilitation skills.

Already implicit in the above discussions and definitions is the dimension of time. Evaluations for the purpose of accountability tend to occur at the end of a programme cycle. Development oriented evaluations tend to occur while the programme is ongoing, and knowledge production evaluations can continue long after the initial programme cycle has ended. Wholey’s concept of evaluability focuses attention on the initial programme logic, while Cronbach’s interest in the ‘policy shaping community’ carries over into the long term and the periods of transition between one programme and another. Notions of ex-ante evaluation (and appraisal or needs analysis), ongoing or mid-term evaluations, and ex-post evaluations that have been adopted as a basic framework by the European Commission and other agencies derive from these different understandings of when evaluation activity is most relevant.

The main types of evaluation identified above can be further elaborated in terms of the kinds of questions they ask, the stakeholders that are included and the focus of their activities.

Accountability for policy-making evaluation meets the needs of external stakeholders who require the delivery of programme or policy outputs. Management may also demand accountability, but here we mean external management rather than management internal to a programme or policy area. This has become a dominant form of evaluation in public administrations, consistent with the growth of performance management philosophies more generally. Evaluations of this type tend to occur at the end of a programme or policy cycle and focus on results.

Development evaluation follows the lifecycle of an initiative with the aim of improving how it is managed and delivered. These evaluations are more likely to meet the needs of internal managers and partners rather than external stakeholders. Formative evaluations and process evaluations tend to fall into this category.

Knowledge production evaluation is mainly concerned with understanding in the longer term. These evaluations often seek to synthesise understanding coming from a number of evaluations. While both of the previous evaluation types are expected to affect current programme learning and knowledge production, this type looks to apply lessons to future programmes and policies.

Social improvement evaluation can take many forms. Many social and economic programmes depend for their success on consensus among the intended beneficiaries. Participative evaluations that seek to involve target groups contribute to the development of consensus and consent. This type of evaluation may also take on an advocacy role: promoting certain interests or groups. It is within this evaluation purpose that programme beneficiaries are most likely to be directly involved, not merely consulted.

These different evaluation types can be further elaborated in terms of the following questions:
(a) who are the stakeholders?
(b) what is the focus of the evaluation?
(c) what are the main approaches and methods?
(d) what are the key questions that can be asked?

Table 2 presents the main elements of the four evaluation types in relation to these questions. However, we would not wish to suggest that these types do full justice to the diversity of evaluation models; rather they summarise the main high level differences. It is possible, for example, to see the emergence of sometimes contradictory evaluation subtypes in recent years. Two examples of these are outcome focused evaluations and participation focused evaluations.

2.5. A focus on results and outcomes

The concern that public interventions should lead to specific and measurable results is mirrored in the development of evaluation practice. In complex socioeconomic programmes in particular, the tendency is often to focus on intermediate outcomes and processes of implementation. Sometimes this is inevitable, when the final results of interventions will only be discernible in the long term. Contemporary models of public management create a demand for methods that focus on results and there has been considerable investment in such methodologies in recent years. These methodologies tend to be in three areas.

The first deals with systematic reviews. Reaching policy conclusions and taking actions on the basis of the evaluation of single projects, or even programmes, has long been criticised. The evidence-based policy movement works on the assumption that it is necessary to aggregate the results of different evaluations through systematic reviews in order to produce reliable evidence.
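One common way of aggregating results, shown as a rough sketch below, is fixed-effect (inverse-variance) pooling of effect estimates across studies; the effect sizes and standard errors here are made up purely to show the arithmetic and stand in for real review data.

```python
import math

# Hypothetical effect estimates from three evaluations of the same measure:
# (estimated effect, standard error)
studies = [(0.30, 0.10), (0.10, 0.08), (0.22, 0.15)]

# Fixed-effect pooling: weight each study by the inverse of its variance
weights = [1 / se ** 2 for _, se in studies]
pooled = sum(w * effect for (effect, _), w in zip(studies, weights)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))

print(f"Pooled effect = {pooled:.3f} (standard error {pooled_se:.3f})")
```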

Next is results based management. This is now a feature of most public management systems and can be variously expressed in terms of targets, league tables, payment-by-results and outcome funding. Within the Commission there has been a move in this direction, under the label of activity based management. It is also the underlying principle of the performance reserve within the Structural Funds and relevant to current debates about impact assessment.

Finally there are macro and micro economic models. These seek to simulate the relationship between key variables and explain outputs through a mixture of available data and assumed causal relationships. Such models are especially useful where data sources are incomplete and results have to be estimated rather than precisely measured.
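As a toy illustration of estimating rather than measuring, the sketch below infers an outcome from an assumed causal relationship; the elasticity and the spending figure are invented placeholders for the calibrated parameters and observed data a real model would use.

```python
# Toy model-based estimate: an outcome inferred from an assumed causal
# relationship where direct measurement is unavailable.
assumed_elasticity = 0.05       # hypothetical % productivity gain per % extra training spend
observed_spend_increase = 12.0  # % increase in training investment (the available data)

estimated_productivity_gain = assumed_elasticity * observed_spend_increase
print(f"Estimated productivity gain: {estimated_productivity_gain:.1f} %")
```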

2.6. Participatory methods and devolved evaluations

There is a general tendency in programme and policy evaluation for multiple stakeholder and citizen involvement. These general developments have led to a spate of innovations among evaluators, who are now able to draw on an extensive repertoire of participative methods and techniques, many of them pioneered in international development contexts. They include: rapid appraisal methods, empowerment evaluation, methods for involving stakeholders, and user-focused evaluations.

Table 2: Evaluation types

Accountability for policy-makers
Stakeholders: Parliaments, Ministers, funders/sponsors, management boards
Focus: impacts, outcomes, achievement of targets, value for money
Main evaluation approaches: indicators, performance measures, value-for-money studies, quantitative surveys
Key questions: What have been the results? Are they intended or unintended? Are resources well used?

Development for programme improvement
Stakeholders: project coordinators, partner organisations, programme managers
Focus: identifying constraints and how they should be overcome; delivery and implementation strategies
Main evaluation approaches: relating inputs to outputs, qualitative description, following processes over time
Key questions: How well is the programme being managed? Can it be implemented better?

Knowledge production and explanation
Stakeholders: programme planners, policy-makers, academics
Focus: dissemination of good practice; what works; organisational change
Main evaluation approaches: experimental and quasi-experimental studies, case studies, systematic reviews and syntheses
Key questions: What is being learnt? Are there lessons that can be applied elsewhere? How would we do it next time?

Social improvement and social change
Stakeholders: programme beneficiaries and civil society
Focus: ensuring full involvement, influence and control by citizens and affected groups
Main evaluation approaches: stakeholder involvement, participative reviews, advocacy
Key questions: What is the best way to involve affected groups? How can equal opportunities and social inclusion be ensured?

Evaluation is often seen as an instrument for developing social consensus and strengthening social cohesion. The expectation is that ownership and commitment by citizens to public policy priorities will be maximised when they have also been involved in setting these priorities and evaluating the outcomes of interventions. There is a strong managerial logic within large scale decentralised programmes to use evaluation as an instrument to strengthen programme management by diffusing the culture of evaluation among all programme participants.

This also focuses attention more generally on who undertakes evaluations and where evaluation is located. Among the types outlined above, the assumption is that some outside expert occupies the evaluator role. Already within the participative subtype just referred to, the role of the evaluator is far less prominent. The role of the evaluator – as orchestrator, facilitator and enabler – is further elaborated in the discussion of constructivist evaluation in a later section of this chapter. However, even within other types of evaluation there are different possible operationalisations and locations of the evaluation role. One important variant is devolved evaluation.

It is becoming increasingly common for evaluation to become a devolved ‘obligation’ for programme beneficiaries. In the European Structural Funds, requirements for ex-ante and mid-term evaluations are now explicitly the responsibility of Member States and monitoring committees. The same is true of international development aid within the CEC, where project evaluation is consistently devolved to beneficiaries. Often, those who evaluate on such a ‘self-help’ basis are required to undertake the evaluations and must demonstrate that they incorporate and use findings. In fact, the ‘devolution chain’ is far more extended. In the European Structural Funds, monitoring committees will often require beneficiaries and programme managers to conduct their own evaluations. The same is true for national programmes. In the UK, for example, ‘local evaluations’ conducted by projects within a programme are the norm. These are variously intended to focus on local concerns, inform local management and generate data that will be useful for accountability purposes.

These intentions are not conflict-free. For example, top-down demands by the EU or by central governments can easily undermine the local focus on local needs (Biott and Cook, 2000).

Nonetheless, the role of devolved evaluation in the management and ‘steering’ of programmes has been a noticeable trend over the last ten years. By requiring programme participants to clarify their priorities, collect information, interpret findings and reflect on the implications, it is assumed that programme management at a systemic level will be improved.

2.7. Theory and practice

It has not been the intention to focus on evaluation practice in this chapter. However, it is worth reflecting briefly on how evaluation practice relates to some of the main debates outlined above:
(a) evaluations in the public sector are firmly within ‘accountability’ and ‘programme management’ purposes (e.g. Nagarajan and Vanheukelen, 1997);
(b) the notion of ‘goal-free’ evaluation that does not start from the objectives of programmes has never been favoured by public administrations in Europe or elsewhere. Although there is often scope to examine overall impacts and to consider ‘unintentional consequences’, the design of most evaluations is firmly anchored around goals and objectives;
(c) there is a trend to take on board evaluation criteria such as relevance, efficiency, effectiveness, impact and sustainability (the now standard World Bank and OECD criteria). In the EU guide referred to above, these are applied as evaluative judgements, in relation to programme objectives and in relation to socioeconomic problems as they affect target populations;
(d) there is sometimes confusion between economic appraisal and evaluation. In most public administrations judgements have to be made, before new policy initiatives are launched, on whether to proceed or not (e.g. the UK Treasury’s Green Book and recent EU guidance on impact assessment). For many economists, this pre-launch appraisal is seen as the same as evaluation. In general it is sensible to confine the term evaluation to what happens once a programme or policy has been decided on;
(e) macro-economic methods in particular are more difficult to apply when the resource input is relatively small. Where policy inputs are large scale and can be isolated from other inputs (e.g. in Objective 1, but not Objective 2 or 3 within EU Structural Funds), they can be more easily applied;
(f) the status of stakeholders has undoubtedly been enhanced in most evaluations in recent years. However, the role of stakeholders is generally as informants rather than as sources of evaluative criteria, let alone as judges of merit and worth. There is considerable scope within decentralised and devolved evaluation systems for participative and constructivist approaches. What is more common is close working with stakeholders to define criteria for evaluation and to contribute to consensus processes;
(g) the boundaries between research and evaluation remain clouded. Many studies commissioned as evaluation contribute to knowledge production and are indistinguishable from research. Various distinctions have been proposed, including the short- rather than long-term nature of evaluation and its mainly instrumental intent. However, few of these distinctions are watertight. For example, many of the elements within this overall study could be defined as research and, arguably, once a research-generated study is deployed for evaluative purposes, its character changes.

3. The object of VET evaluation

What is evaluated is a factor in how evaluation is practised. The object of evaluation is different in different domains (health, transport, education, vocational training, etc.) and this partly shapes what we call evaluation in these various domains. At the same time, there are overarching characteristics of evaluation objects that are similar across domains. In this section, we consider both approaches to the object of evaluation: those that follow from the nature of the domain and those that follow from the characteristics of what is being evaluated.

3.1. The domain of VET

VET is a broad field that, at a minimum, includes initial vocational training, continuing vocational training, work-based learning and VET systems.

However, this selection understates the scope of VET as an object of evaluation. VET has become more complex and multi-faceted over the years, as can be seen in previous Cedefop reports on vocational training research in Europe. This is mainly because there has been a shift from decontextualised studies of impact to studies that increasingly incorporate context. So VET, even at the level of the firm, is seen as being embedded in other corporate policies and procedures such as marketing, the organisation of production, supervisory and managerial practice and human resource management. In order to describe, let alone explain, the impact of VET, this broader set of factors needs to be considered. The same is true for policy level interventions. For example, what are called ‘active labour-market policies’, especially for those who are marginalised in the labour market, usually include VET, but this is embedded in a raft of other policies including subsidies to employers, restructuring of benefits and new screening and matching processes.

This recontextualisation of the objects of evaluation is happening across many fields of evaluative enquiry. Evaluations of health are no longer confined to studies of illness. The ‘new public health’ incorporates environmental, lifestyle and policy elements alongside data on illness topics such as morbidity and mortality. Similarly, evaluation in education is now more likely to include learning processes, socioeconomic and cultural factors and broader pedagogic understandings alongside studies of classroom behaviour. It is probably more useful to think of evaluation configurations as composites of contingent evaluation objects rather than a single evaluation object.

As we shall see below, methodological developments within evaluation mirror this contextualisation. There is a move away from ceteris paribus assumptions, to focus increasingly on impacts in context. It is likely that the broadening conception of evaluation configurations such as VET is the result of new methods and theories helping redefine the core concept. As is often the case, methods and methodologies interact with core content, which they also help to shape.

Overall, most classes of evaluation object can be found under the umbrella of VET and it is not possible to associate the evaluation and impact of VET with a particular type of evaluation object or configuration. What is clear is that VET, as an object of evaluation, calls on a vast range of disciplinary understandings, levels of analysis and potential areas of impact. The importance of interdisciplinary evaluation efforts is highlighted by this discussion.

The scope of VET itself is further complicated by the different understandings of impact that characterise the field. We have particular studies of the impact of continuing vocational training (CVT) on company performance; on active labour markets; on individual employment and pay prospects; of pedagogic methods as they influence learning outcomes and competences; of VET system reform affecting training outcomes; and of knowledge and qualifications as an influence on national economic performance.

It is these clusters of interest – the preoccupations of a domain at any given time – that circumscribe the object of evaluation. It is the sets of objects and understandings around what has been called configurations that best describe what distinguishes evaluation of VET from other domains. The impacts of CVT on company performance, the way in which it is possible to improve initial vocational training through changing qualification systems, and how VET affects economic performance and social integration are all examples of what defines the object of evaluation within VET. Such preoccupations also change over time. It is worth adding that such preoccupations are also encapsulated in theoretical form. Topical theories – such as social exclusion, human capital, cultural capital, corporate innovation – will be widely accepted in the VET domain as in others. Today’s theories also help define the evaluation object (see below for more general discussion of theory in evaluation).

3.2. Overarching characteristics of evaluation objects

Although it is not possible within the scope of this chapter to offer a full typology of evaluation objects, it is worth highlighting the kinds of differences that occur not only in the evaluation of VET but also in many other evaluation domains. There are many ways in which evaluation configurations can be differentiated; for simplicity’s sake the following examples concentrate on common dimensions such as similarity or difference, more or less, etc. Of course, there are also much more complex descriptions of evaluation configurations.

There are a number of important dimensions of evaluation configurations, including input characteristics. Most programmes are operationalised through inputs or policy instruments; in VET these include new curricula, new forms of funding for enterprise-based training or new training courses. Such inputs may be standardised across a programme or may be more or less diverse. It is, for example, common for inputs to be carefully tailored to individual, local or enterprise needs. This will have consequences for the sampling and scale of an evaluation. More seriously, it will have implications for the possibilities of generalisations that can be made on the basis of evaluation findings.

Another dimension is the immediate context. The context or setting within which an input is located can also be relatively standardised or relatively diverse. This might apply at a spatial level (characteristics of the area) or in terms of the context of delivery or the institutional setting within which programmes are located or policies are expected to have an impact. In VET, the relevant context may be a labour market, a training provider or an enterprise. A highly diversified initiative may be located across different kinds of contexts and, even within a single context, there may be considerable variety. The diversity or standardisation of the immediate context will have many implications, in particular for how policies and programmes are implemented and how much effort needs to be devoted to the evaluation of implementation.

Modes of delivery are also important, since the same input or instrument can be delivered in very different ways. For example, a needs analysis may be undertaken through a local survey as part of the recruitment process of potential trainees or by a company reanalysing its personnel data. Nowadays it would be more common for programmes to be delivered through partnership arrangements rather than through a single administrative chain. This will often be the case, for example, in VET measures delivered through EU Structural Funds.

Settings need consideration as well, given the embedded and contextualised nature of many evaluation objects and the fact that isolated evaluation objects are increasingly rare. With conceptualisations that incorporate context, evaluation objects have a tendency to become configurations. A classic example of an evaluation object that is presumed to be isolated is classroom-based studies that ignore the overall school context or the socioeconomic characteristics of a catchment area. By contrast, a VET measure that is bundled together with a package of incentives, vocational guidance measures and qualifications will need to be evaluated in this wider context.

A further dimension is the number of stakeholders. In any evaluation, there will be those who have an interest in the evaluation and what is being evaluated. Within decentralised, multi-agency programmes there are often many stakeholders, each with their own evaluation questions and judgement criteria. These might, for example, include regional authorities, training providers, sectoral representatives, social partners and European institutions.


Finally, there is the degree of consensus. Policies and programmes may be contentious and will be supported by a greater or lesser degree of consensus among stakeholders. Numerous stakeholders are often associated with lower levels of consensus. Evaluations which draw on a high level of consensus will be able easily to apply agreed criteria. Where there is lower consensus, quite different criteria may need to be applied to evaluation data and more work may need to be done to bring together different interests and perspectives. This not only shapes methodology but also the work required of evaluators.

While each of these characteristics or dimensions has consequences for the design of an evaluation and how it is organised, they also interact. For example, we can envisage two different scenarios. In the first, a single subsidy is available to employers within firms in the retail sector to provide additional training to young apprentices following a recognised national qualification. In the second, a package of measures locally determined by partnerships of companies, training providers and regional authorities is available to firms, colleges and private training providers, to improve the vocational skills and work preparedness of the young unemployed.

Within the first scenario it would be possible and appropriate to assess success in terms of a limited range of output and outcome measures and possibly to apply experimental and random assignment techniques as part of the evaluation procedure. Within this scenario there would be limited resources devoted to the evaluation of the processes of implementation. There is also likely to be a limited number of stakeholders involved in the evaluation.

Within the second scenario there will be a need for several different measures of output and outcomes. Comparisons across the programme will be difficult to standardise given the diversity of modes of delivery and types of input or policy instrument. There is also likely to be limited consensus among the many different stakeholders involved in the programme and its implementation. The use of experimental methods (e.g. control groups and before and after measures) may be possible in such a configuration. It is also likely that case studies that illustrate the way all the various dimensions come together will be appropriate.
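As an illustration of the kind of analysis the first scenario permits, the sketch below simulates random assignment of a training subsidy and estimates its effect as a simple difference in mean outcomes; the completion rates and the assumed treatment effect are invented for the example.

```python
import random

random.seed(1)

# Simulated apprentices: each is randomly assigned to receive the subsidised
# training (treatment) or not (control). The outcome is completion of the
# national qualification (1) or not (0); underlying rates are hypothetical.
def simulate(n=1000, base_rate=0.55, treatment_lift=0.10):
    records = []
    for _ in range(n):
        treated = random.random() < 0.5  # random assignment
        p_complete = base_rate + (treatment_lift if treated else 0.0)
        completed = 1 if random.random() < p_complete else 0
        records.append((treated, completed))
    return records

records = simulate()
treated = [c for t, c in records if t]
control = [c for t, c in records if not t]

# Difference in mean completion rates estimates the programme effect
effect = sum(treated) / len(treated) - sum(control) / len(control)
print(f"Estimated effect on completion rate: {effect:+.3f}")
```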


4. Philosophical foundations

4.1. Positivism, observation and theory

Before addressing particular aspects of evaluation theory it is important to locate the role of theory in evaluation within the broader set of debates within the philosophy of science. The dominant school, much criticised but of continuing influence in the way we understand the world, is logical positivism. Despite being largely discredited in academic circles for some 50 years, this school still holds sway in policy debates. It constitutes the base model around which variants are positioned. With a history that stretches back to Comte, Hume, Locke, Hobbes and Mill, positivism emerged partly as a reaction to metaphysical explanations: that there was an ‘essence’ of a phenomenon that could be distinguished from its appearance. At the heart of positivism, therefore, is a belief that it is possible to obtain objective knowledge through observation and that such knowledge is verified by statements about the circumstances in which such knowledge is true.

In the field of evaluation, House (1983) has discussed this tradition under the label of objectivism: ‘Evaluation information is considered to be “scientifically objective.” This objectivity is achieved by using “objective” instruments like tests or questionnaires. Presumably, results produced with these instruments are reproducible. The data are analysed by quantitative techniques which are also “objective” in the sense that they can be verified by logical inspection regardless of who uses the techniques.’ (House, 1983; p. 51).

House goes on to emphasise that part of the objectivist tradition that he calls ‘methodological individualism’, in Mill’s work in particular. Thus, repeated observation of individual phenomena is the way to identify uniformity within a category of phenomena. This is one important strand in the mainstream of explanations within the social and economic sciences. It is the basis for reductionism: the belief that it is possible to understand the whole by investigating its constituent parts.

‘By methodological individualism, I mean whatever methodologically useful doctrine is asserted in the vague claim that social explanations should be ultimately reducible to explanations in terms of people’s beliefs, dispositions, and situations. […] It is a working doctrine of most economists, political scientists, and political historians in North America and Britain.’ (Miller, 1991; p. 749).

In this world-view, explanations rest on the aggregation of individual elements and their behaviours and interactions. It is worth noting that this has been described as a ‘doctrine’ as well as a methodological statement. It underpins many of the survey based and economic models that are used in evaluation.

There is now widespread agreement that empirical work cannot rely only on observations. There are difficulties in empirically observing the entirety of any phenomenon; all description is partial and incomplete, with important unobservable elements. ‘Scientists must be understood as engaged in a metaphysical project whose very rules are irretrievably determined by theoretical conceptions regarding largely unobservable phenomena.’ (Boyd, 1991; p. 12). This is even more true for mechanisms, which it is generally recognised can be imputed but not observed. As Boyd goes on to say, ‘it is an important fact, now universally accepted, that many or all of the central methods of science are theory dependent’.

This recognition of the theory dependence of all scientific inquiry underpins the now familiar critiques of logical positivism, even though there is considerable difference between the alternatives that the critics of positivism advocate.

The two most familiar critiques of positivism are scientific realism and constructivism.

4.2. Scientific realism

Scientific realism, while acknowledging the limits of what we can know about phenomena, asserts that theory describes real features of a not fully observable world. Not all realists are the same and the European tradition, currently inspired mainly by the work of Pawson (Pawson and Tilley, 1997; Pawson, 2002a and b), can be distinguished in various ways from US realist thinking. For example, some prominent North American realists commenting on Pawson and Tilley’s work have questioned the extent to which realists need completely to reject experimental and quasi-experimental designs, and suggest that more attention should be paid in the realist project to values. This is especially important if, in addition to explanation, realists are to influence decisions (Julnes et al., 1998). Nonetheless, this chapter draws mainly on the work of Pawson and Tilley to describe the realist position in evaluation.

In some ways realism continues the positivist project: it too seeks explanation and believes in the possibility of accumulating reliable knowledge about the real world, albeit through different methodological spectacles. According to Pawson and Tilley, it seeks to open the 'black box' within programmes or policies to uncover the mechanisms that account for what brings about change. It does so by situating such mechanisms in contexts and attributing to contexts the key to what makes mechanisms work or not work. This is especially important in domains such as VET where the evaluation objects are varied and drawn from different elements into different configurations in differentiated contexts.

'What we want to resist here is the notion that programs are targeted at subjects and that as a consequence program efficacy is simply a matter of changing the individual subject.' (Pawson and Tilley, 1997; p. 64).

Rather than accept a logic that sees programmes and policies as simple chains of cause and effect, they are better seen as embedded in multilayered (or stratified) social and organisational processes. Evaluators need to focus on 'underlying mechanisms': those decisions or actions that lead to change, which is embedded in a broader social reality. However, these mechanisms are not uniform or consistent even within a single programme. Different mechanisms come into play in different contexts, which is why some programmes or policy instruments work in some, but not all, situations.

Like all those interested in causal inference, realists are also interested in making sense of patterns or regularities. These are not seen at the level of some programme-level aggregation but rather at the underlying level where mechanisms operate. As Pawson and Tilley (1997; p. 71) note: 'regularity = mechanism + context'. Outcomes are the results of mechanisms unleashed by particular programmes. It is the mechanisms that bring about change and any programme will probably rely on more than one mechanism, not all of which may be evident to programme architects or policy-makers.

As Pawson and Tilley summarise the logic of realist explanation: 'The basic task of social inquiry is to explain interesting, puzzling, socially significant regularities (R). Explanation takes the form of positing some underlying mechanism (M) which generates the regularity and thus consists of propositions about how the interplay between structure and agency has constituted the regularity. Within realist investigation there is also investigation of how the workings of such mechanisms are contingent and conditional, and thus only fired in particular local, historical or institutional contexts (C)' (Pawson and Tilley, 1997; p. 71).

Applying this logic to VET, we may note, for example, that subsidies to increase work-based learning and CVT in firms sometimes lead to greater uptake by the intended beneficiaries. The programme need not be assessed as ineffective simply because, for example, positive outcomes can be observed in only 30 % of cases. We try rather to understand the mechanisms and contexts which lead to success. Is the context one where firms showing positive outcomes are in a particular sector or value chain or type of region? Or is it more to do with the skill composition of the firms concerned? Are the mechanisms that work in these contexts effective because a previous investment has been made in work-based learning at the firm level, or is it because of the local or regional training infrastructure? Which mechanisms are at play and in what context (a sketch of how such configurations might be recorded follows this list):

(a) the competitive instincts of managers (mechanism), who fear that their competitors will benefit (context) unless they also increase their CVT efforts?

(b) the demands of trade unions concerned about the professionalisation and labour-market strength of their members (mechanism), sparked off by their awareness of the availability of subsidies (context)?

(c) the increased effectiveness of the marketing efforts of training providers (mechanism) made possible by the subsidies they have received (context)?
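To make the CMO logic more concrete, the sketch below (in Python, purely illustrative and not taken from Pawson and Tilley) records the three hypothesised configurations from the subsidy example as simple data; all names and strings are assumptions introduced for illustration.

from dataclasses import dataclass

@dataclass
class CMOConfiguration:
    context: str      # the conditions in which the mechanism can fire
    mechanism: str    # the decision or action that brings about change
    outcome: str      # the observed regularity attributed to the mechanism

# The three hypothesised configurations from the subsidy example above.
configurations = [
    CMOConfiguration(
        context="firms fear that competitors will benefit from the subsidy",
        mechanism="competitive instincts of managers increase CVT effort",
        outcome="higher uptake of work-based learning",
    ),
    CMOConfiguration(
        context="trade unions are aware that subsidies are available",
        mechanism="union demands for professionalisation of members",
        outcome="higher uptake of work-based learning",
    ),
    CMOConfiguration(
        context="training providers receive subsidies",
        mechanism="more effective marketing by training providers",
        outcome="higher uptake of work-based learning",
    ),
]

# A realist evaluation would compare such configurations against observed
# outcomes to see in which contexts which mechanisms actually fired.
for c in configurations:
    print(f"IF {c.context}\n  THEN {c.mechanism}\n  -> {c.outcome}\n")

Comparing such recorded configurations against observed outcomes is, in realist terms, how an evaluator works out in which contexts which mechanisms actually fire.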

According to the realists, it is by examining and comparing the mechanisms and contexts in which they operate in relation to observed outcomes, that it becomes possible to understand success and describe it. For Pawson and Tilley, all revolves around these CMO (context, mechanism, outcome) configurations.

Policy-makers are then in a position to consider options such as:

(a) focusing the programme more narrowly at beneficiaries that are likely to change because of the mechanisms that work in the contexts they inhabit;

(b) differentiating a programme and its instruments more clearly to ensure that different mechanisms that work in different contexts are adequately covered;

(c) seeking to influence the contexts within which the programme aims to be effective.

The table below, taken from the concluding chapter of Pawson and Tilley's book, provides a brief summary of the realist position, in terms of eight 'rules' that are seen as encapsulating the key ideas of realistic enquiry and method.

Table 3: Rules guiding realistic enquiry and method

Rule 1: Generative causation. Evaluators need to attend to how and why social programmes have the potential to cause change.

Rule 2: Ontological depth. Evaluators need to penetrate beneath the surface of observable inputs and outputs of a programme.

Rule 3: Mechanisms. Evaluators need to focus on how the causal mechanisms which generate social and behavioural problems are removed or countered through the alternative causal mechanisms introduced in a social programme.

Rule 4: Contexts. Evaluators need to understand the contexts within which problem mechanisms are activated and in which programme mechanisms can be successfully fired.

Rule 5: Outcomes. Evaluators need to understand what are the outcomes of an initiative and how they are produced.

Rule 6: CMO configurations. In order to develop transferable and cumulative lessons from research, evaluators need to orient their thinking to context-mechanism-outcome pattern configurations (CMO configurations).

Rule 7: Teacher-learner processes. In order to construct and test context-mechanism-outcome pattern explanations, evaluators need to engage in a teacher-learner relationship with program policy-makers, practitioners and participants.

Rule 8: Open systems. Evaluators need to acknowledge that programmes are implemented in a changing and permeable social world, and that programme effectiveness may thus be subverted or enhanced through the unanticipated intrusion of new contexts and new causal powers.

Source: Adapted from Pawson and Tilley (1997)

4.3. Constructivists

Constructivists deny the possibility of objective knowledge about the world. They follow more in the tradition of Kant and other continental European philosophers than the mainly Anglo-Saxon school that underpins positivism and realism. It is only through the theorisations of the observer that the world can be understood.

'Socially constructed causal and metaphysical phenomena are, according to the constructivist, real. They are as real as anything scientists can study ever gets. The impression that there is some sort of socially unconstructed reality that is somehow deeper than the socially constructed variety rests, the constructivist maintains, on a failure to appreciate the theory-dependence of all our methods. The only sort of reality any of our methods are good for studying is a theory-dependent reality.' (Boyd, 1991; p. 13).

The way we know, whatever the instruments and methods we use, is constructed by human actors or stakeholders. According to Stufflebeam in his review of Foundation models for 21st century program evaluation: 'Constructivism rejects the existence of any ultimate reality and employs a subjectivist epistemology. It sees knowledge gained as one or more human constructions, uncertifiable, and constantly problematic and changing. It places the evaluators and program stakeholder at the centre of the inquiry process, employing all of them as the evaluation's "human instruments". The approach insists that evaluators be totally ethical in respecting and advocating for all the participants, especially the disenfranchised.' (Stufflebeam, 2000a; pp. 71-72).

The most articulate advocates of constructivism in evaluation are Guba and Lincoln. They have mapped out the main differences between constructivists and the 'conventional' position (as they label positivists) in their well-known text Fourth generation evaluation (Guba and Lincoln, 1989). The highlights of this comparison are summarised in the table below:


Table 4: Comparing constructivist and 'conventional' evaluation

Nature of truth
Conventional: The truth of any proposition (its factual quality) can be determined by testing it empirically in the natural world. Any proposition that has withstood such a test is true; such truth is absolute.
Constructivist: The truth of any proposition (its credibility) can be determined by submitting it semiotically to the judgement of a group of informed and sophisticated holders of what may be different constructions. Any proposition that has achieved consensus through such a test is regarded as true until reconstructed in the light of more information or increased sophistication; any truth is relative.

Limits of truth
Conventional: A proposition that has not been tested empirically cannot be known to be true. Likewise, a proposition incapable of empirical test can never be confirmed to be true.
Constructivist: A proposition is neither tested nor untested. It can only be known to be true (credible) in relation to and in terms of informed and sophisticated constructions.

Measurability
Conventional: Whatever exists, exists in some measurable amount. If it cannot be measured it does not exist.
Constructivist: Constructions exist only in the minds of constructors and typically cannot be divided into measurable entities. If something can be measured, the measurement may fit into some constructions but it is likely, at best, to play a supportive role.

Independence of facts and theories
Conventional: Facts are aspects of the natural world that do not depend on theories that happen to guide any given inquiry. Observational and theoretical languages are independent.
Constructivist: Facts are always theory-laden, that is, they have no independent meaning except within some theoretical framework. There can be no separate observational and theoretical languages.

Independence of facts and values
Conventional: Facts and values are independent. Facts can be uncovered and arrayed independently of the values that may later be brought to bear to interpret or give meaning to them. There are separate factual and valuational languages, the former describing 'isness' and the latter 'oughtness'.
Constructivist: Facts and values are interdependent. Facts have no meaning except within some value framework; they are value-laden. There can be no separate observational and valuational languages.

Source: adapted from Guba and Lincoln, 1989

According to Guba and Lincoln, when considering the purpose of evaluations, one needs to distinguish both between merit and worth and between summative and formative intent (a brief sketch of the four resulting combinations follows this list):

(a) a formative merit evaluation is one concerned with assessing the intrinsic value of some evaluand with the intent of improving it; so, for example, a proposed new curriculum could be assessed for modernity, integrity, continuity, sequence, and so on, for the sake of discovering ways in which those characteristics might be improved;

(b) a formative worth evaluation is one concerned with assessing the extrinsic value of some evaluand with the intent of improving it; so, for example, a proposed new curriculum could be assessed for the extent to which desired outcomes are produced in some actual context of application, for the sake of discovering ways in which its performance might be improved;

(c) a summative merit evaluation is one concerned with assessing the intrinsic value of some evaluand with the intent of determining whether it meets some minimal (or normative or optimal) standard for modernity, integrity, and so on. A positive evaluation results in the evaluand being warranted as meeting its internal design specifications;

(d) a summative worth evaluation is one concerned with assessing the extrinsic value of some evaluand for use in some actual context of application. A positive evaluation results in the evaluand being warranted for use in that context. (Guba and Lincoln, 1989; pp. 189-190).
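The four purposes above amount to a simple two-by-two classification. The following minimal sketch is not part of Guba and Lincoln's text; the enum names, labels and helper function are illustrative assumptions that simply make the combinations explicit.

from enum import Enum

class ValueFocus(Enum):
    MERIT = "intrinsic value of the evaluand"
    WORTH = "extrinsic value in some actual context of application"

class Intent(Enum):
    FORMATIVE = "with the intent of improving the evaluand"
    SUMMATIVE = "with the intent of determining whether a standard is met"

def describe(value_focus: ValueFocus, intent: Intent) -> str:
    """Return a one-line description of the evaluation purpose."""
    return (f"{intent.name.lower()} {value_focus.name.lower()} evaluation: "
            f"assessing the {value_focus.value}, {intent.value}")

# The four combinations listed in points (a) to (d) above.
for intent in Intent:
    for focus in ValueFocus:
        print(describe(focus, intent))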

In practical terms, regarding what the evaluator should actually do, Guba and Lincoln start from the 'claims, concerns and issues' that are identified by stakeholders, people 'who are put at some risk by the evaluation'. It is therefore necessary for evaluators to be 'responsive'. 'One of the major tasks for the evaluator is to conduct the evaluation in such a way that each group must confront and deal with the constructions of all others, a process we shall refer to as hermeneutic dialectic. […] Ideally responsive evaluation seeks to reach consensus on all claims, concerns and issues […]' (Guba and Lincoln, 1989; p. 41).

A distinctive role of the evaluator, therefore, is to help put together 'hermeneutic' circles. This is defined by Guba and Lincoln as a process that brings together divergent views and seeks to interpret and synthesise them mainly to 'allow their mutual exploration by all parties' (Guba and Lincoln, 1989; p. 149). As Schwandt has argued from a postmodernist standpoint, 'only through situated use in discursive practices or language games do human actions acquire meaning' (Schwandt, 1997; p. 69). Applied to evaluation, this position argues for the importance of the 'dialogic encounter' in which evaluators are 'becoming partners in an ethically informed, reasoned conversation about essentially contested concepts […]' (Schwandt, 1997; p. 79).

In more down-to-earth terms, Guba and Lincoln emphasise the role of the evaluator to:

(a) prioritise those unresolved claims, concerns and issues of stakeholders that have survived earlier rounds of dialogue orchestrated by the evaluator;

(b) collect information through a variety of means – collating the results of other evaluations, reanalysing the information previously generated in dialogue among stakeholders, conducting further studies – that may lead to the 'reconstruction' of understandings among stakeholders;

(c) prepare and carry out negotiations that, as far as possible and within the resources available, resolve that which can be resolved and (possibly) identify new issues that the stakeholders wish to take further in another evaluation round.

So how might this be exemplified in the VET domain? It should be noted that what follows does not fully conform to Guba and Lincoln's vision of constructivist evaluation, largely because it is situated in a larger-scale socioeconomic policy context than many of their own smaller-scale case examples. But it should also be noted that constructivist thinking is, to some extent, relevant to many contemporary evaluation challenges and the example below is intended to illustrate such potential relevance.

So, to apply this logic to VET, constructivist thinking can be especially helpful where there is a problem area with many stakeholders and the entire system will only be able to progress if there is a broad consensus. For example, there may be a political desire to become more inclusive and involve previously marginalised groups in training opportunities. The problem is how to ensure that certain groups such as women, young people and ethnic communities are given a higher profile in VET. Here, the involvement of many stakeholders will be inevitable. Furthermore, the views of these stakeholders are more than data for the evaluator: they are the determinants and shapers of possible action and change. Unless the trainers, employers, advocacy groups, funding authorities and employment services responsible for job-matching and the groups being 'targeted' cooperate, change will not occur. It is also likely that these stakeholders hold vital information and insights into the past experience of similar efforts; what went wrong and right and what could be done to bring about improvements in the future.

The evaluator might then follow much of the constructivist logic outlined above (a minimal sketch of these steps follows this list):

(a) identify the different stakeholders who potentially have a stake in these areas of concern;

(b) conduct a series of initial discussions to clarify what they know, what they want and what their interests are;

(c) feed back to all stakeholders their own and each other's interests, knowledge and concerns in a way that emphasises the similarities and differences;

(d) clarify areas of agreement and disagreement and initiate discussions among the stakeholders and their representatives to clarify areas of consensus and continuing dissent;

(e) agree what other sources of information could help move the stakeholders forward – perhaps by synthesising other available studies, perhaps by initiating new studies;

(f) reach the best possible consensus about what should be done to improve VET provision and participation for the groups concerned.
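The sketch below walks through steps (a) to (d) of this sequence as a simple data flow; the stakeholder names and claims are invented for illustration and this is not a prescribed implementation of Guba and Lincoln's method.

from collections import defaultdict

# Steps (a)-(b): stakeholders and what they claim, are concerned about, or raise as issues.
claims_by_stakeholder = {
    "trainers": {"need funding for outreach to marginalised groups"},
    "employers": {"need funding for outreach to marginalised groups",
                  "want job-ready candidates"},
    "advocacy groups": {"want targeted provision for women and young people"},
}

# Steps (c)-(d): feed back all claims and sort them into consensus and dissent.
support = defaultdict(set)
for stakeholder, claims in claims_by_stakeholder.items():
    for claim in claims:
        support[claim].add(stakeholder)

consensus = [c for c, s in support.items() if len(s) == len(claims_by_stakeholder)]
unresolved = [c for c, s in support.items() if len(s) < len(claims_by_stakeholder)]

print("Agreed claims:", consensus or "none yet")
print("Unresolved claims (carried into the next negotiation round):", unresolved)
# Steps (e)-(f) would bring new information to bear on the unresolved claims
# and repeat the cycle until the best possible consensus is reached.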

It is worth highlighting that the balance of activities within constructivist evaluation is very different from both positivist and realist variants. It emphasises the responsive, interactive, dialogic and 'orchestrating' role of the evaluator because the sources of data that are privileged are seen to reside with stakeholders, as much as with new studies and externally generated data.


5. Evaluation theory

There has been a strong bias within evaluation as it has evolved to focus on method, technique and, to a lesser extent, methodology. It is only in recent years that there has been an upsurge in interest in the role of theory in evaluation. To some extent, this reflects the wider debates from within the philosophy of science that have been sketched out above. From the early 1990s onwards there has been a rebalancing of attention towards theory. Chen's book, Theory-driven evaluations (1990), has become a landmark in this shift in focus towards theory. The now classic text Foundations of evaluation (Shadish et al., 1991) is organised around five main bodies of theory: social programming, knowledge construction, valuing, knowledge use and evaluation practice.

As has been widely recognised, Weiss was among the first to direct our attention to the importance of theory (Weiss, 1972) and has actively carried forward this debate under the umbrella of the Aspen Institute's New approaches to evaluating Community initiatives (Connel et al., 1995; Fulbright-Anderson et al., 1998). While the starting point of the discussion that follows is these authors, it still needs to be situated in the broader philosophical debates outlined earlier.

5.1. Programme theory

The dominant school of theory in evaluation is 'programme theory'. This is concerned with opening up the programme 'black box', going beyond input/output descriptions and seeking to understand how programmes do and do not work.

Chen’s conceptualisation distinguishes‘normative’ and ‘causative’ components ofprogramme theory which he defines as ‘a specifi-cation of what must be done to achieve thedesired goals, what other important impacts mayalso be anticipated and how these goals andimpacts would be generated’ (Chen, 1990; p. 43).

Chen’s conceptualisation extends to what heidentifies as ‘six domains’.

'The following three domain theories are part of the general normative theory: 1) treatment theory specifies what the nature of the program treatment should be; 2) implementation environment theory specifies the nature of the contextual environment within which the program should be implemented; 3) outcomes theory specifies what the nature of the program outcomes should be.

The following three domain theories are related to the general causative theory: 1) impact theory specifies the causal effect between the treatment and the outcome; 2) intervening mechanism theory specifies how the underlying intervening processes operate; 3) generalization theory specifies the generalizability of evaluation results to the topics or circumstances of interest to stakeholders.' (Chen, 1990; pp. 49 and 51)

Although avowedly seeking to escape from the limitation of input/output thinking, Chen's conceptualisation is still linear. His domains follow the treatment/implementation/outcome logic and incorporate concepts such as intervening mechanisms: 'the causal processes underlying a program so that the reasons a programme does or does not work can be understood' (Chen, 1990; p. 191).

The underlying logic of causality favoured by Chen is essentially consistent with classic experimental thinking. For example, with regard to a programme that used comic books to influence adolescent smoking: 'The underlying causal mechanism of this program is the assumption that the comic book will attract adolescents' interest and attention and that they will read it closely and frequently, and thereby pick up the important anti-smoking message contained in it. The message will in turn change their attitudes, beliefs, and behaviour regarding smoking. The causal structure of this program is that the program treatment variable (exposure to the comic book) attempts to affect the intervening variable (the intensity of reading), which in turn will affect the outcome variables (attitudes, beliefs, and behaviour toward smoking)' (Chen, 1990, p. 193).
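Chen's linear causal structure can be sketched as a simple chain of functions. The sketch below is illustrative only: the function names and coefficients are invented assumptions that make the treatment/intervening variable/outcome chain concrete, and the closing comment notes what such a chain leaves out.

def intervening_variable(exposure_to_comic: float) -> float:
    """Intensity of reading, assumed here to rise with exposure."""
    return 0.8 * exposure_to_comic

def outcome_variable(reading_intensity: float) -> float:
    """Change in anti-smoking attitudes/behaviour, assumed to rise with reading."""
    return 0.5 * reading_intensity

exposure = 1.0                                # programme treatment variable
reading = intervening_variable(exposure)      # intervening variable
attitude_change = outcome_variable(reading)   # outcome variable
print(f"exposure={exposure} -> reading intensity={reading} -> attitude change={attitude_change}")

# Note what this linear chain leaves out, as the text goes on to argue: it does
# not say why some adolescents respond to exposure and others do not, i.e. the
# underlying mechanisms and the contexts in which they fire.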

In comparison to realist evaluation approaches (see above), there continues to be an emphasis on the programme's interventions rather than on the mechanism operating within the context. For example, if we try to answer the question 'Why are some adolescents more likely to be influenced by such exposure?', answers do not fall out easily from traditional programme theory logic. It is these 'underlying mechanisms' that are not explained in this framework. Nor are the contexts within which these interventions occur explored in detail.

5.2. Theory based evaluation and theories of change

The other main strand of theory in evaluation is labelled 'theory based evaluation' and latterly 'theory of change' and is associated with the Aspen round table on comprehensive community initiatives. In Volume 1 of the Aspen collection, Weiss identifies four main rationales for theory based evaluation: '1) It concentrates evaluation attention and resources on key aspects of the program; 2) it facilitates aggregation of evaluation results into a broader base of theoretical and program knowledge; 3) it asks program practitioners to make their assumptions explicit and to reach consensus with their colleagues about what they are trying to do and why; 4) evaluations that address the theoretical assumptions embedded in programs may have more influence on both policy and popular opinion.' (Weiss, 1995; p. 69).

This is also, in essence, a programme theory approach. In a subsequent volume, Connel and Kubisch define this evaluation approach 'as a systematic and cumulative study of the links between activities, outcomes, and contexts of the initiative.' (Fulbright-Anderson et al., 1998; p. 16). They emphasise collaborative working with stakeholders to bring to the surface underlying mechanisms.

Weiss herself takes these ideas of joint working further: 'Three big advantages for pursuing a theory of change evaluation are as follows:

(a) a theory of change evaluation allows evaluators to give early word of events without having to wait until the end of the whole program sequence;

(b) the evaluators can identify which assumptions are working out and which are not. They can pinpoint where in the theory the assumptions break down. This should enable the program to take corrective action before too much time goes by;

(c) the results of a theory of change evaluation can be more readily generalized across programs. Seeing the successes and the failures between closely linked assumptions, such as between greater parental attention to children and better child behaviour, is easier than between, say, parenting education programs and better child behaviour.' (Weiss, 2000).

By focusing on the assumptions of 'programme practitioners' and aspiring to encourage consensus among them, this vision of evaluation theory shares many features with the participative and even constructivist schools of evaluation. As described by some of its main proponents, the 'theory of change' evaluation approach – as it has come to be called – is also a collaborative, dialogic process. It takes the themes of practitioners, makes them coherent, explicit and testable and then seeks to measure and describe programme outcomes in these terms.

Overall, over the last seven or eight years in particular, there has been a gradual blurring of what is meant by programme theory. The term now seems to encompass several approaches that unpick the logic of programmes, make explicit their assumptions, work with stakeholders, monitor progress and explain the outcomes that are observed (Rogers, 2000, is an example of this tendency). As such, it has come to include what is now classic programme theory together with theory based evaluations and realistic evaluations such as those advocated by Pawson and Tilley (Rogers, 2000; p. 219).

The programme theory approach has also been taken on board by those who advocate logic models or logical frameworks that link outcomes with programme activities and processes. Thus, a recent W.K. Kellogg Foundation guide (2000) also makes the link with 'theoretical assumptions/principles of the programme' and devotes an entire chapter to 'developing a theory of change logic model for your programme'. This is an important development given the power of logic models in the world of evaluation. These were initiated by the World Bank and taken on also by the EU. One of the main criticisms of these models is their lack of explanatory power and a-theoretical nature. Bringing theory-based approaches into the logic model framework begins to address some of these criticisms.
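As an illustration of what attaching a theory of change to a logic model might look like in practice, the sketch below uses a hypothetical VET subsidy example; it is not drawn from the Kellogg guide, and every field value is an invented assumption. It simply adds an explicit list of assumptions to the familiar inputs/activities/outputs/outcomes chain.

from dataclasses import dataclass, field

@dataclass
class LogicModel:
    inputs: list
    activities: list
    outputs: list
    outcomes: list
    # the theory-of-change layer: why each step is expected to lead to the next
    assumptions: list = field(default_factory=list)

vet_subsidy_model = LogicModel(
    inputs=["subsidy budget", "regional training infrastructure"],
    activities=["pay CVT subsidies to firms", "market subsidies to employers"],
    outputs=["number of firms taking up subsidies", "training hours delivered"],
    outcomes=["higher skill levels", "improved firm performance"],
    assumptions=[
        "firms under-invest in CVT mainly because of cost",
        "training delivered actually matches firms' skill needs",
    ],
)

# Making the assumptions explicit is what turns an a-theoretical logic model
# into something an evaluation can test step by step.
for assumption in vet_subsidy_model.assumptions:
    print("Assumption to test:", assumption)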

5.3. A wider theoretical frame

However, we would not wish to confine descriptions of theory solely to programme theory and associated elaborations. The Shadish et al. (1991) framework includes a theory of social programming. However, their focus is more on a theory of what social programmes do and how effective they are. This is consistent with their overall approach, which is to elaborate the theoretical basis for evaluation practice. Other theoretical focuses for Shadish et al. concern:

(a) the theory of use, i.e. what is known about how to encourage use;

(b) the theory of valuing, i.e. about judging outcomes and the role of values and stakeholder interests in such judgements;

(c) the theory of knowledge in evaluation, i.e. the familiar questions of what constitutes knowledge, explanation and valid method;

(d) the theory of practice in evaluation, i.e. the main decisions about resource allocation, the choice of methods, questions to ask and evaluation purposes.

Beyond the various focuses identified – that is, programme theory and its various elaborations – and theories of evaluation itself, as articulated by Shadish et al., there are a number of other bodies of theory that are undoubtedly relevant and come into the discourse of evaluators. In particular, there are domain theories and theories of implementation and change.

In every policy domain or field where evaluation occurs, there are bodies of theory unrelated to the practice of evaluation and to the logic of programmes. Thus, in social welfare, there are theories related to the welfare state, the nature of social solidarity, the behavioural consequences of different benefit regimes and the interactions between social welfare and labour-market performance. Similar bodies of theory exist in relation to VET as they do in other domains such as research and development, regional planning, education and criminal justice. In the European context, at least, there seems to be an expectation that evaluators will have some knowledge of the domain contexts within which they work, including relevant domain theories.

There are also substantial bodies of relevant theory about policy change and implementation deriving mainly from political science and policy studies. For example, this extends beyond Chen's description of implementation environment evaluation (Chen, 1990). It is well exemplified by the work of people such as Sabatier (1988) and Sabatier and Jenkins-Smith (1993). There is also more generic literature on implementation and change, often encapsulated under the heading 'the diffusion of innovation' and following on from the work of Rogers (1995). This is particularly relevant to the issue of generalisability of innovations that are first broached on a pilot basis.

In summary, five bodies of theory appear to be relevant to evaluators:

(a) theories of evaluation. These would include programme theory, theories of change approaches and realist approaches which emphasise the identification of mechanisms underlying successful change which have to be understood in specific contexts and settings;

(b) theories about evaluation. Thus there is a growing literature on evaluation practice, use, design and capacity. Included in this category would be particular aspects of practice identified by Shadish et al. such as theories of valuing;

(c) theories of knowledge, including the main debates about the nature of knowledge, epistemology, methodology, etc., and about the nature of causal inference;

(d) domain and thematic theories, which could be described as theory of the evaluation object. This would include bodies of theory about domains such as human resource development, skill acquisition, the development of human capital and equal opportunities that could inform evaluation design, programme/policy implementation and outcomes;

(e) theories of implementation and change often seen as relevant by evaluators. We would include here understandings of policy change, the diffusion of innovation and administrative behaviour. Such bodies of theory are likely to condition the success of programme interventions and can be quite separate from the kind of programme theories referred to above.

Finally, it is worth restating the main reasons that theory is seen as important in evaluation:

(a) theory can help support interpretations. This follows from the widespread recognition that all evaluations are based on data that can be interpreted in different ways. Theory provides an explicit framework for such an interpretation;

(b) theory can help fill in the gaps in incomplete data. This follows from the recognition that however thorough evaluators may be, they will never have the complete picture. Theory can provide a plausible way of filling gaps in available evaluation data;

(c) theory can provide a framework to work with stakeholders. This follows from the increasingly common practice of dialogue and collaboration between evaluators, presumed beneficiaries and others who are affected by programmes and policy instruments;

(d) theory can help prediction and explanation. This is the classic scientific role of theory: to suggest and explain causal links and likely outcomes;

(e) theory can make explicit the constructed objects of evaluation. Many contemporary objects of evaluation are constructed. They are abstracted ideas which do not have a direct empirical referent (e.g. efficiency, a learning organisation and an enterprise culture would all be examples of constructed objects). In order to describe and measure these objects, theory is needed.

At a different level, we are also beginning to see theoretical development around the issue of complexity in socioeconomic programmes (Sanderson, 2000). Many interventions are not self-contained; they interact with other programmes and with other social and organisational processes. Thus in VET, a new training system is embedded in an institutional and educational context which supports and constrains this system. Similarly, a training initiative at the level of a particular work group is mediated by the way work is organised, different management styles and the labour-market behaviour of employers and workers. Often there are multiple programmes and interventions operating simultaneously on a particular target group or area. The interaction of these various processes and programmes is one of the greatest challenges that evaluators face. Complexity theory is a very new entrant into evaluation thinking. Questions are raised more often than answers are provided.


6. Determinants of evaluation use

Evaluation types and philosophies are shaped by contexts of use. This term encompasses both factors that encourage use and what is sometimes called 'evaluation capacity development'. These contexts and capacities are not only conceived of differently by different scholars but also change as context changes. It is these kinds of use and capacity considerations that ultimately decide how evaluation contributes to policy-making and the delivery of public policies.

6.1. Instrumental versus cumulative use

There has been debate about the use of evaluation for over 25 years. This is primarily associated with two main scholars: Weiss (1976) and Patton (1997, 2002 and earlier editions). In the mid-1970s Weiss began to focus on evaluation use, criticising simplistic notions of instrumental use. Based on empirical studies of how policy-makers use evaluation, Weiss has been associated with a complex understanding of decision-making and policy-making in which evaluation findings are internalised, selectively used and rarely lead directly to specific decisions or changes in policy. This theory, which is sometimes labelled 'an enlightenment view of evaluation', takes a long-term incremental view about the way evaluation findings feed through to policy-making. In this regard, she is close to Rossi's understanding of the conceptual use of evaluation, i.e. 'the use of evaluations to influence thinking about issues in a general way' (Rossi et al., 1999). She challenges the rational model of decision-making and bases her conclusions on studies of how organisations actually work. She effectively argues for cumulative learning across many evaluations rather than direct use from particular evaluations. In some of her more recent work (Weiss, 1999) she associates herself with the arguments of Sabatier and Jenkins-Smith (1993) about the importance of policy forums that bring together academics and policy-makers and can act as a vehicle for the absorption of evaluation findings. She nonetheless remains true to her earlier arguments that evaluators should never expect their inputs to override political agendas and administrative necessities, which may push decisions in quite different directions from their recommendations.

Patton is more committed to the instrumental purposes of evaluation. 'By utilization I mean intended use by intended users' (Patton, 1997, 2002 and earlier editions). He places considerable emphasis on understanding the priorities of decision-makers, engaging with them, and encouraging them to own evaluations and their results in order to enhance use. Furthermore, Patton tends to emphasise the example of individual decision-makers, particular people and not the decision itself. Indeed he is interested in individual decision-makers' cognitive style and logic. Unlike Weiss, he is less interested in organisational and administrative processes and more with the act of decision and the particular decision-maker.

It has been commonly observed (e.g. Alkin, 1990) that an important determinant of the difference in perspective between Weiss and Patton is their respective fields of practice and study. Patton has worked mainly in local community and voluntary organisations; where he has worked with public administrations, these have also been at a local level. Weiss has been more occupied with large-scale national programmes (beginning, for example, with the 1970s' war on poverty in the US). This probably goes a long way to explaining the one's preoccupation with complex organisational processes and the other's preoccupation with individual decision-makers and their decisions.

6.2. Process use of evaluation

Although the above summarises, in a simplified way, the main debate between Weiss and Patton, both have contributed far more to our understanding of evaluation use. In particular, Patton is probably the originator of the term 'process use', i.e. 'changes […] that occur among those involved in evaluation as a result of learning that occurs during the evaluation process' (Patton, 1997). Considerations of the process aspects of evaluation also open up discussions of different, more action-orientated, views of evaluation purpose. For example, Patton also discusses intervention-orientated evaluation, participatory evaluation and empowerment evaluation. These kinds of evaluation approaches are less relevant to the immediate concerns of this particular study, but do raise interesting methodological questions about how, in the course of evaluations, one can increase commitment and ownership by attending to the details of the evaluation process.

6.3. The importance of methodology

Weiss, especially in her early work, opens up another aspect of evaluation use, the importance of sound methodology and reliable data (Weiss and Bucuvalas, 1980). Indeed in some of this early work Weiss anticipates some very contemporary discussions about evidence-based policy making and the use of experimental methods. It should be acknowledged, however, that in later years, as she investigated more carefully the actual uses made of evaluation, she shifted towards the position outlined above, that we had to understand more about use in an organisational context. In her concern for sound methods and valid research, Weiss is more 'hands on' than many of her contemporaries: Campbell, for example, sees use as being essentially part of politics and of little concern for evaluators, who should be driven by 'truth'.

In many ways we can perceive a pendulum swing, between organisational and implementation concerns on the one hand and methodological and quality assurance concerns on the other, back and forth over the last 25 years. The current interest in evidence-based policy, the rediscovery of 'the gold standard' of experimental methods and randomised control trials (RCTs) and the 'realistic critique' of such trials and experiments, represents a renewed movement toward the methodological end of the pendulum. Thus the 'What works?' school (Davies et al., 2000) is pre-eminently concerned with the nature of research evidence and the need for rigorous methodology. The adherents of this school often regard RCTs as the most reliable form of primary evidence, and systematic reviews and meta-analysis – which make quantitative estimates of the overall effects of interventions by aggregating primary studies – as the most reliable approach for secondary evidence. However, already in Davies et al. (2000) the editors of this collection themselves begin to address questions of better understanding the policy process. This is taken further by Nutley et al. (2003) when they move beyond data and findings to consider how evidence is disseminated and incorporated into practice. Thus these authors also follow the pendulum between methodological and implementation concerns.

However, the central components of the evidence-based policy debate remain methodological, as in the debate around meta-analysis and systematic reviews (Gallo, 1978; Pawson, 2002a). The core of this debate is the importance of drawing together evidence from previous related initiatives before embarking on new initiatives. For some, narrative reviews, which are essentially structured qualitative literature reviews drawing together lessons, are the preferred approach. Others seek to quantify precise effects by looking to quantitative and experimental studies along the lines of RCTs in medical drug trials. Of course, the possibility of undertaking such quantitative studies is limited to a subset of policy areas, usually in the social welfare, social benefit and welfare-to-work areas. Despite the methodological disagreements among the proponents of different approaches to such secondary analysis, there is a widespread consensus that bringing together evidence from across many evaluations is an important guarantor of the reliability of findings.

6.4. Evaluation and learning

The most recent attempt to revisit the issue of evaluation use in a comprehensive fashion is a publication of the American Evaluation Association (AEA), The expanding scope of evaluation use (Caracelli and Preskill, 2000).


In Chapter 2, Preskill and Torres consider 'the learning dimension of evaluation use'. These authors highlight, in particular, learning at an organisational level that occurs as part of the evaluation process. They draw on constructivist theories of learning, in which learners themselves engage in an active process of interpretation and dialogue in order to construct their own meanings. They link this perspective with Lave and Wenger's (1991) notions of 'communities of practice', i.e. groups of individuals working together, who are interdependent and whose tacit knowledge and problem-solving capacities are integrated into their social and professional life. The authors suggest that learning from evaluations 'will most likely occur when evaluation is collaborative, is grounded in constructivist and transformational learning theories, and builds communities of evaluation practice.' The implication of this argument for evaluation use is to reinforce the importance of developing communities of both evaluators and users within organisations, if evaluation is to become part of an active learning process. It also steers us away from the narrow view of intended use that certainly informed the earlier work of Patton. Given that communities of practice will interpret and construct their own meanings using data and findings that they bring into their own context, the use of evaluation may not be as evaluators or the commissioners of evaluation originally intended. However, transforming such evaluative outputs and processes into their own organisational context still constitutes evaluation use.

6.5. The institutionalisation of evaluation

There is extensive literature around the role of evaluation in policy systems. This literature is largely based in policy analysis and public administration rather than in evaluation per se. The preoccupations of this body of literature can be seen as mainly about evaluation capacity, including: organisational learning in the public sector; how to build evaluation into public administrations; the nature of effective evaluation capacity; institutionalising and making evaluation more professional; and analyses of the policy process itself. This latter set of preoccupations is the closest 'cousin' to the earlier discussion of evaluation use. However, within this literature the starting point is the nature of the policy process, which then moves on to considerations of evaluation practice. Within the earlier cited literature, the starting point is usually evaluation practice, which then moves on to considerations of policy-making.

Organisational learning has become a common metaphor within studies of various kinds of organisations, although most of this work has centred on the private sector. Leeuw et al. (1994) bring together a range of experiences from the public sector across Canada, the Netherlands, Norway, Sweden and the US that seeks to apply the concept of organisational learning through evaluation to public sector institutions. Among the themes highlighted in this study is the role of internal evaluations within public bodies (see especially the chapters by Sonnichsen and by Mayne). The authors highlight the important role that can be performed by internal evaluation offices and how these can influence the overall organisation's evaluation practice and learning. They suggest that developing a 'double-loop learning' process – reporting not only to programme managers but also to top management within government departments – is an important part of the contribution that evaluation units can make. However, these studies also suggest that, while internal units can make an important contribution to organisational learning, they depend ultimately on what Sonnichsen calls 'a disposition towards critical self-examination'. His notion that 'self-reflection is crucial before organisations begin to learn' highlights the importance of creating a general evaluation culture within an organisation.

In the concluding chapter of this collection, Rist identifies two sets of preconditions for learning from evaluation. The first set arises from the importance of fitting in with the policy cycle. This recognises that organisations need 'information at different phases of the policy cycle'. Synchronising evaluation outputs with different policy needs over time is seen as an important means of encouraging learning. The second set of preconditions for learning emphasises how information is transmitted and filtered within public organisations. Thus studies within this framework suggest that 'governmental organisations appear more receptive to information produced internally than that which comes from external sources'. This is especially so when the news is bad! Bad news is easier to receive from internal rather than external evaluators. Another precondition appears to be the credibility of the sources of information. This may, on occasion, favour internally generated evaluation findings, but may also depend on who the sponsors or gatekeepers are who bring information into a public body. This can be seen as related to wider issues of relationship-building and trust. The existence of such organisational attributes seems an important precondition for receptivity to evaluation findings and to organisational learning within the public sector.

Another collection we have considered focuses directly on evaluation capacity (Boyle and Lemaire, 1999). Although some of the authors in this volume overlap with the Leeuw et al. collection referred to above, the concerns here are broader. Building evaluation capacity is seen in terms of 'national evaluation systems'. Thus, evaluation capacity goes beyond the internal organisation of public bodies to include the location of evaluation in the executive or the legislature (e.g. Parliament) and broader issues of governance and institutional arrangements. For our purposes a number of themes explored in this collection are particularly relevant.

One such theme is the design of evaluation functions and offices. Sonnichsen considers the advantages and disadvantages of a centralised and a decentralised model of evaluation functions. He notes the potential for greater independence and credibility of centralised functions and their ability to develop strategic evaluation plans. This model has been favoured in Canada (for example) since the late 1970s. The downside of centralised units and functions is also recognised. They are often seen as threatening by other units of administration, with attendant resistance to change and potentially strained relationships.

6.6. Organisation of evaluation in public agencies

How evaluation is organised appears to be an important factor in effective evaluation take-up and implementation. There are a number of major topics within this debate:

(a) it is a matter of values and attitudes, a belief that it is right to look critically at policies and programmes and to gather and consider evidence about what works and how to improve performance;

(b) it is a matter of administrative practice, how administrations are organised, how stakeholders are involved in the evaluation process and how appropriate levels of separation and integration are maintained between those who implement and those who evaluate public policies;

(c) it is a matter of system integration; systems that are supportive of an evaluation culture are usually networked through professional associations and adhere to common professional standards.

In the section that follows (which continues to draw mainly on Boyle and Lemaire), we concentrate mainly on the administrative arrangements.

Decentralised evaluation units are most likely to be intended to support decision-making and programme effectiveness at a programme level. They are usually aligned with programme management and are hence less threatening. However, issues of independence and bias can arise from their closeness to programme personnel. Another weakness may be lack of methodological skills in evaluation, though the most important criticism raised by Sonnichsen is the possible lack of power that decentralised units have, especially where decisions about programme and policy futures are still made at a centralised level. Ultimately this debate resolves itself into one of evaluation purpose. Where the purpose is primarily to improve programme and policy implementation, there appear to be strong arguments for a decentralised model, which will also favour learning at a decentralised level. Where the purpose is primarily to support central strategies and policy-making, the argument for a centralised model appears to be stronger: here the learning would tend to occur centrally rather than at programme or policy division level. Issues of professional competence and skills acquisition appear to be stronger within a decentralised framework. This issue is further examined in the same volume by Boyle, who discusses the human resources aspect of evaluation specialists and the professionalisation of evaluation. He highlights, in particular, the importance of appropriate training courses and education curricula, the way expertise is deployed mixing different skills together, and the importance of continuous in-service training. From the point of view of this review, much of this discussion is generic rather than focused on use. However, from a use perspective, Boyle makes a strong argument for developing evaluation users. Evaluation users also need to be trained and developed to become consumers. In some ways this can be seen as one of the consequences of introducing managing-for-results approaches in public service organisations. This, in Boyle's terms, creates the link between the supply and demand sides of evaluation use.

The shift in the institutional practice of evaluation within the public sector from monitoring and management to accountability and performance is widely noted (see especially Bastoe, 1999). One definite tendency that this and other studies demonstrate is the integration of evaluation and monitoring with various approaches to performance management.

Among the elements of this approach, following an influential OECD paper (OECD, 1995), are:

(a) objective and target setting;

(b) management responsibility to implement against targets;

(c) the monitoring of performance;

(d) the feed-in of such performance into future policy-making and programming;

(e) the provision of information to external parliamentary and audit committees for ex-post review.

There is also disagreement about the need for and benefit of linking audit and evaluation, which Bastoe also recognises, especially when implementing performance management systems. It is, for example, important also to consider 'how learning actually takes place in organizations'. This attention to learning is also the preoccupation of other authors writing on the institutionalisation of evaluation in the public sector.

6.7. Evaluation as dialogue

A quite different approach to analysing evaluation use in policy settings is suggested by Van der Knaap (1995) in his article Policy evaluation and learning: feedback, enlightenment or argumentation? He challenges the traditional rational-objectivist model of policy evaluation, favouring rather a constructivist view in which policy-makers conduct dialogues about evaluation findings in order to reach their conclusions. Thus 'policy-making is conceived of as an ongoing dialogue, in which both governmental and societal actors contest their views on policy issues by exchanging arguments'. At heart, this argument challenges the 'positivist idea that policy evaluators may provide the policy-maker with neutral or objective feedback information or recommendations'. Rather than enlighten the policy-maker, 'at best, the evaluator might contribute to the quality of policy discourse by entering the negotiations that compose the policy-making processes with informed arguments and a willingness to listen, argue, and persuade or be persuaded'. This shift from the rational to the argumentative is, according to Van der Knaap, a way to 'institutionalise policy orientated learning'. This is not to suggest that the evaluator is relieved of the responsibility to provide reliable information and findings, but that there is a need also to supplement traditional analysis with material that will stimulate debate and allow different stakeholders to consider material presented from different perspectives.

A similar logic informs a recent article by Valovirta (2002). 'Rather than regarding evaluative information as indisputable knowledge, it is viewed as a collection of arguments, which can be debated, accepted and disputed'. According to the author, utilisation of evaluation should be regarded as a process that runs through four stages:

(a) familiarisation with evaluation results and involvement in the evaluation process;

(b) interpretations based on expectations, assessments of the quality of research (truth test) and the feasibility of actions proposed or implied (utility test);

(c) argumentation in which 'individual interpretations are […] subject to collective deliberation, discussion, negotiation and decision-making';

(d) effects, which may take the form of decisions and actions, new shared comprehensions and increased awareness; and increased or undermined legitimacy.

Within this perspective on evaluation use, 'evaluations force people to present well-grounded arguments for refuting evaluation conclusions and recommendations. This opens up possibilities for new understandings to emerge.' (Valovirta, 2002; p. 77).

Valovirta makes an important distinction between the different contexts in which evaluation takes place. In particular he distinguishes between settings where there is a high level of consensus versus those with a high level of conflict; and settings where there is a low pressure for change versus those with a high pressure for change. He suggests that these contextual differences will determine the nature of the argumentation that takes place around evaluation findings.

6.8. Strategies and types of evaluation use

In summary, and cutting across the theoretical debates described above, we can identify six main strategies or approaches to evaluation use from this discussion:

(a) instrumentalist, when evaluation is used instrumentally to achieve an intended and explicit type of use, e.g. making recommendations that are then implemented;

(b) incrementalist, when evaluation becomes useful cumulatively over time by bringing together evaluation findings from different evaluations, e.g. through meta-analysis and synthesis, in order to inform action;

(c) process-oriented, when evaluation is useful as much because of the processes of engagement and debate it engenders among stakeholders as because of the results it produces: thinking changes even if recommendations are not implemented;

(d) administrative proceduralist, when the procedures through which evaluation is organised and delivered make evaluation use more likely, e.g. well-structured terms of reference plus requirements that programme managers act on or, at least, respond to evaluation findings;

(e) systemic proceduralist, when the wider system in which evaluation is embedded, including dissemination networks, communities of practice and administrative cultures, encourages evaluation use;

(f) performance management, when the demands of improving administrative performance and achieving targets create a market for evaluation outputs.

In the real world, these categories are not mutually exclusive and most public agencies will pursue more than one. However, they do constitute major alternatives; few users of evaluation attempt many of these strategies simultaneously. Some of these different approaches to evaluation use are more or less supportive of different types of evaluation. For example, instrumental approaches are more likely to be consistent with an accountability or outcome and impact type of evaluation; a performance management approach is mainly concerned with improving and developing programmes, evaluation being formative for these programmes. However, different contexts of use can support very different types of evaluation; processual strategies may conduct their debates and arguments about evaluations that are neither concerned with processes nor set out to be formative. The main implication from the above discussion is that evaluation is not simply an applied technology or method. Rather it is embedded in contexts of use that shape what evaluation becomes in particular contexts and fields of application.
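A minimal sketch of this typology follows; it simply enumerates the six strategies listed above and, reflecting the point that the categories are not mutually exclusive, models an agency as holding a set of them. The example agency and its chosen strategies are invented for illustration.

from enum import Enum, auto

class UseStrategy(Enum):
    INSTRUMENTALIST = auto()
    INCREMENTALIST = auto()
    PROCESS_ORIENTED = auto()
    ADMINISTRATIVE_PROCEDURALIST = auto()
    SYSTEMIC_PROCEDURALIST = auto()
    PERFORMANCE_MANAGEMENT = auto()

# Illustrative only: a public agency typically pursues more than one strategy,
# but rarely many at once.
agency_strategies = {UseStrategy.INCREMENTALIST, UseStrategy.PERFORMANCE_MANAGEMENT}
print("Strategies pursued:", sorted(s.name.lower() for s in agency_strategies))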

7. Evaluation standards and regulation

7.1. Evaluation codes and standards

Questions about the roles of evaluators are unavoidable, partly because the dominant logic of evaluation sees them exercise so much control. As with any other debate about the responsibility of professionals (doctors, auditors, research scientists), the question of quis custodiet ipsos custodes? is soon heard. Given that the usual answer among professionals is ‘through self-regulation’, the subsequent question is: ‘through what means and against what criteria and standards does regulation occur?’. It is in order to establish some shared agreement among evaluators, commissioners and those affected by evaluation that evaluation codes and standards have become so prevalent. However, codes and standards also have a broader purpose. In a decentralised system composed of many stakeholders, standards are a way of regulating behaviour across organisational boundaries, provided, that is, that all parties accept these norms.

As the previous section has tried to show, there is widespread concern among evaluators about evaluation use. This has partly been fuelled by the academic debates reviewed above, which have highlighted the problem of evaluations not being used. This is one impetus behind the development of guidelines and standards, usually for evaluators but also for commissioners, that are intended to promote evaluation use directly and indirectly. They do so directly because they often concern use itself; they do so indirectly because they always seek to enhance the quality of evaluation, which is widely assumed to be a factor in enhancing evaluation use and practice more generally.

The earliest of these efforts were produced by the AEA (1995) and its precursor organisations, for example the Joint Committee on Standards for Educational Evaluation (1994).

The Program evaluation standards, a guide for evaluators, particularly from a programme perspective, identifies standards under four main headings:

(a) utility standards ‘are intended to ensure that an evaluation will serve the information needs of intended users’;

(b) feasibility standards ‘are intended to ensure that an evaluation will be realistic, prudent, diplomatic, and frugal’;

(c) propriety standards ‘are intended to ensure that an evaluation will be conducted legally, ethically, and with due regard for the welfare of those involved in the evaluation, as well as those affected by its results’;

(d) accuracy standards ‘are intended to ensure that an evaluation will reveal and convey technically adequate information about the features that determine worth or merit of the program being evaluated’ (from Joint Committee on Standards for Educational Evaluation, 1994).

These standards are variously concerned with sound methods, timely dissemination, the independence and impartiality of evaluators and the necessary level of evaluator skill and competence.

Perhaps the clearest indication of how such standards are intended to establish norms that will influence the conduct of evaluation is to be found in the way ‘utility’ is elaborated as a standard. Thus utility includes:

(a) being clear about stakeholders so that their needs can be addressed;

(b) ensuring the credibility of the evaluators so that their results are likely to be accepted;

(c) collecting relevant information from a broad range of sources as understood by clients and stakeholders;

(d) being clear about the value judgements used to interpret findings;

(e) reporting clearly so that the information provided in reports is easily understood;

(f) disseminating reports to intended users in a timely fashion;

(g) planning evaluations from the outset in a way that encourages follow-through from stakeholders.

It is worth noting that other standards also have implications for the conduct of evaluation. For example, feasibility standards include a substandard entitled political viability. This is concerned with obtaining the cooperation of different interest groups in order to limit bias or misapplication of results. Similarly, the propriety standards include a substandard entitled service orientation, which is designed to assist organisations to address and effectively serve the needs of the full range of targeted participants, and one entitled conflicts of interest, which seeks to avoid compromising evaluations and their results. Accuracy standards are also relevant: valid information, one of the substandards of accuracy, is justified in terms of ensuring that the interpretations arrived at are valid for the intended use.
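By way of illustration only, the short Python sketch below represents the four headings and the sub-standards mentioned above as a simple nested data structure and flattens them into a checklist. The grouping follows the text; the dictionary layout, the names used and the helper function are assumptions for illustration, not part of the standards themselves or of any official tooling.

# Illustrative sketch only: the 1994 programme evaluation standards represented
# as a nested data structure. The headings and sub-standards follow the text
# above; the layout, names and helper function are assumptions, not an official tool.
PROGRAM_EVALUATION_STANDARDS = {
    "utility": [
        "clarity about stakeholders so that their needs can be addressed",
        "credibility of the evaluators so that results are likely to be accepted",
        "relevant information from a broad range of sources",
        "clarity about the value judgements used to interpret findings",
        "clear reporting",
        "timely dissemination to intended users",
        "planning that encourages follow-through from stakeholders",
    ],
    # Only the sub-standards mentioned in the text are listed under these headings.
    "feasibility": ["political viability"],
    "propriety": ["service orientation", "conflicts of interest"],
    "accuracy": ["valid information"],
}

def checklist(standards: dict) -> list:
    """Flatten the framework into a simple checklist for reviewing an evaluation."""
    return [f"{heading}: {item}" for heading, items in standards.items() for item in items]

for line in checklist(PROGRAM_EVALUATION_STANDARDS):
    print(line)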

This set of standards has been widely imitated and adapted to different national contexts, including, most recently, by the Deutsche Gesellschaft für Evaluation (DeGEval, 2002; see also Beywl and Speer in this report). There are also discussions and plans in France and the UK to develop standards. These discussions have, however, moved away from directly adopting the North American model.

Alongside the widespread adoption of the AEA programme evaluation guidelines there has also been the more recent emergence of a set of guidelines for the ethical conduct of evaluations. These differ from programme guidelines, being more concerned with the ethical dilemmas that both commissioners and evaluators face in the course of the evaluation process. Such guidelines have been variously prepared by the Australasian Evaluation Society (AES, 1997), the Canadian Evaluation Society (CES) (1) and the AEA. These ethical guidelines are relevant to a discussion about evaluation itself because they, too, are concerned with the credibility as well as the feasibility of evaluation.

Both the CES and the AEA direct their attention to evaluators. Thus the CES guidelines, under the three main headings of competence, integrity and accountability, are concerned that evaluators should be competent, act with integrity and be accountable. Similarly, the AEA in its Guiding principles for evaluators – under the headings of systematic inquiry, competence, integrity/honesty, respect for people, responsibilities for general and public welfare, and recommendation for continued work – also directs its attention to what evaluators ought to do.

One set of ethical guidelines drawn up by evaluators’ professional bodies, produced by the AES, stands out from the others. These guidelines are concerned with ethical behaviour and decision-making among commissioners, users and teachers of evaluation as well as evaluators themselves. According to the AES, the primary groups addressed by its guidelines are commissioners and evaluation teams or evaluators. In this regard also the AES diverges from some of the other guidelines referred to above: it acknowledges that evaluators often work in teams rather than mainly as individuals.

The AES Guidelines for the ethical conduct of evaluations follows the evaluation cycle by grouping its guidance under three main headings:

(a) commissioning and preparing for an evaluation;

(b) conducting an evaluation;

(c) reporting the evaluation results.

One view of the AES guidelines is that they constitute a quality assurance framework for the entire evaluation process. These guidelines focus on a number of ways in which the credibility of evaluations can be enhanced, i.e.:

(a) shared expectations between evaluators and commissioners about what can be delivered through an evaluation;

(b) strengthening the basis for evaluation judgements;

(c) reducing conflicts during the course of the evaluation;

(d) ensuring balance and simplicity in the way reports are presented.

However, it should be noted that the AES regards these guidelines as complementary to, rather than as a substitute for, other guidelines such as the programme evaluation guidelines; indeed it encourages the use of the Guidelines for the ethical conduct of evaluations jointly with the Program evaluation standards.

A set of evaluation guidelines and standards that is firmly set within the concerns of the programme managers and commissioners of evaluation has recently been issued by the European Commission (2). These are linked to the introduction of ‘activity based management’, the Commission’s form of results-based management, and are consistent with many of the approaches to strengthening evaluation capacity and evaluation use discussed above. They cover how evaluation ‘functions’ across the Commission should be organised and resourced, how evaluations should be planned and managed, how results should be used and disseminated and how to ensure good quality reports.

(1) www.evaluationcanada.ca
(2) http://europa.eu.int/comm/budget/evaluation/index_en.htm

A variety of other guidelines for evaluation are indicative of the widespread interest in evaluation standards. For example, the Means collection (European Commission, 1999), Volume 1, Evaluation design and management, has a section on optimising the use of evaluation. This focuses on dissemination, distinguishing between different communication channels (e.g. reports, syntheses, articles, confidential notes) and audiences (e.g. commissioners of the evaluation, steering groups, managers, European institutions, citizens and journalists). The Means collection authors also acknowledge the ‘absence of a direct short-term link between recommendations and decisions’.

Evaluation standards are not only intended as a framework for the design of particular evaluations. They are also used in meta-evaluations, to try to describe the range of evaluation practice, and as a quality assessment tool in relation to completed evaluations. In the European Commission, for example, the Means ‘quality criteria’ are widely used as a framework for assessing evaluations, both at proposal and completion stages (Table 5 below).


Table 5: Grid for a synthetic assessment of quality of evaluation work

Each criterion is scored on a four-point scale: with regard to this criterion, the evaluation report is 1. unacceptable; 2. acceptable; 3. good; 4. excellent.

Meeting needs: does the evaluation adequately address the requests for information formulated by the commissioners and does it correspond to the terms of reference?

Relevant scope: have the rationale of the programme, its outputs, results, impacts, interactions with other policies and unexpected effects been carefully studied?

Defensible design: is the design of the evaluation appropriate and adequate for obtaining the results (with their limits of validity) needed to answer the main evaluative questions?

Reliable data: are the primary and secondary data collected or selected suitable? Are they sufficiently reliable compared to the expected use?

Sound analysis: are quantitative and qualitative data analysed in accordance with established rules, and are they complete and appropriate for answering the evaluative questions correctly?

Credible results: are the results logical and justified by the analysis of data and by interpretations based on carefully presented explanatory hypotheses?

Impartial conclusions: are the conclusions just and non-biased by personal or partisan considerations, and are they detailed enough to be implemented concretely?

Clear report: does the report describe the context and goal, as well as the organisation and results of the evaluated programme, in such a way that the information provided is easily understood?

Overall: in view of the contextual constraints bearing on the evaluation, the evaluation report is considered to be (1-4).

Source: European Commission, 1999
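As a purely illustrative aside, the short Python sketch below shows how such a grid might be applied in practice: a report is scored against the eight criteria on the four-point scale and a synthetic judgement is derived. The criterion labels and the 1-4 scale come from Table 5; the class, the example scores and the simple averaging rule are assumptions for illustration only, since the grid itself leaves the overall judgement to the assessor in view of contextual constraints.

# A minimal sketch of the Table 5 quality grid as a scoring rubric.
# Criterion labels follow the Means collection; everything else (the class,
# the example scores, the averaging rule) is illustrative, not an official tool.
from dataclasses import dataclass, field

RATING = {1: "unacceptable", 2: "acceptable", 3: "good", 4: "excellent"}

CRITERIA = [
    "Meeting needs", "Relevant scope", "Defensible design", "Reliable data",
    "Sound analysis", "Credible results", "Impartial conclusions", "Clear report",
]

@dataclass
class QualityAssessment:
    scores: dict = field(default_factory=dict)

    def rate(self, criterion: str, score: int) -> None:
        # Reject unknown criteria or scores outside the 1-4 scale.
        if criterion not in CRITERIA or score not in RATING:
            raise ValueError("unknown criterion or score outside the 1-4 scale")
        self.scores[criterion] = score

    def overall(self) -> str:
        # Crude illustrative rule: round the mean score. The actual grid leaves
        # the synthetic judgement to the assessor, taking context into account.
        mean = round(sum(self.scores.values()) / len(self.scores))
        return RATING[mean]

assessment = QualityAssessment()
for criterion in CRITERIA:
    assessment.rate(criterion, 3)   # e.g. a report judged ‘good’ on every criterion
print(assessment.overall())         # prints: good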

On the boundaries of the territory covered here is a wider literature on innovation and good practice. Thus, in many studies that have been conducted at a European level there is a concern for the dissemination of innovation. An interesting example of this is found in Mannila et al. (2001). In this volume, Sonderberg discusses mainstreaming, innovation and knowledge networks. The way in which innovations are spread is close to the problem of how to encourage the take-up of the results of evaluations. Sonderberg also focuses on knowledge networks (which are close to notions of policy communities and communities of practice) as a means of ensuring the dissemination and mainstreaming of new ideas. Arguably, this broader literature on the diffusion of innovation could be regarded more generally as a source of insights into how evaluation practice and findings are disseminated.

While there has been a great deal of discussion around standards and codes in North America, Australasia and, most recently, in Europe, the evidence of take-up of such standards and codes is sparse. Most evaluators can come up with examples of ‘breaches’ in most of their evaluations. The systematic application of standards is rare.

7.2. Evaluation as a method for regulating decentralised systems

There is a need to understand the pervasiveness of evaluation across sectors, branches of public administration and the public sector more widely, and its growth as a practice. As was suggested in the terms of reference for this study, part of the answer rests with the nature of public sector reform, including deregulation and decentralisation, that has occurred over the last two decades. This process has fragmented previously established mechanisms of control and management. No longer are programmes, let alone policies, delivered by single agencies. The de facto norms and standards that characterised well-established public bodies are no longer shared among those responsible for programmes and policy instruments. This matters not only for retrospective accounting for success or failure but also for planning and implementing major interagency initiatives. Evaluation has become one of the key elements in the new technology of coordination and interagency management required by the restructured public sector.

This has been noted by evaluation researchers in the education sector in particular. Thus Henkel (1997) has noted how the external assessment of higher education institutions in the UK has led to a reduction in their traditional autonomy and, arguably, a shift towards increased managerial intervention inside these institutions. Segerholm (2001) develops a more extensive theory of ‘national evaluations as governing instruments’. She describes how the evaluation of education programmes in Swedish higher education ‘worked as an administrative technique for disciplining and […] governing’. In this, Segerholm follows Foucault in arguing that diffused evaluation systems, with criteria that are internalised by institutions and practices that make the application of such criteria visible, increase opportunities for central control. Similar arguments have been advanced in other parts of the public sector that have been subject to wholesale reform and decentralisation – but are still held accountable – especially in northern Europe and North America.

A second explanation for the spread and take-up of evaluation has been the new demand for accountability that comes from a better informed and less deferential citizenry. Those responsible for delivering public policy are faced with conflicting demands. On the one hand, their political masters require success, transparent operations, evidence of efficiency and explanations of failure. On the other hand, communities, presumed beneficiaries and those affected by policy initiatives, schooled in the market rhetoric of customers who receive a service, are increasingly sceptical of government action. The growth in ever more detailed management-oriented and explanatory evaluation frameworks, and the simultaneous growth of participative, bottom-up, stakeholder-led evaluation practice, is a response to these contradictory demands.

Evaluation in these contexts can be seen as a control process within the cybernetic meaning of the term. Terms such as steering, guidance and regulation are frequently attached to descriptions of evaluative activity. In these terms, standards become especially important. Standards become not only a tool for judging and valuing; they also become the means to develop and express consensus among fragmented policy communities and communities of practice.

The role of standards in evaluation, therefore, needs to be seen much more broadly than the debate about the standards that apply to evaluation practice and outputs. Four main understandings of standards in evaluation can be identified:

(a) those concerned with standards (and criteria) to judge outcomes and effectiveness;

(b) those concerned with required standards of performance in decentralised administrative systems engaged in programme delivery;

(c) those concerned with devolved self-evaluating systems operating within a predetermined framework;

(d) those concerned with the required behaviour of evaluators and those who commission evaluation.

Each of these has been discussed in this chapter. However, the nature of these standards differs subtly, for example:

(a) evaluation standards allow for judgements to be made based on norms and/or the beliefs of stakeholders;

(b) programme delivery standards are usually set by higher levels of an administration (or possibly through regulation) in order to exercise influence over the performance of others;

(c) the standards that operate in devolved systems are processual; they concern the obligation of those to whom powers are devolved to follow certain procedures, including making evaluation outputs available;

(d) standards for evaluators are part of the self-regulation agenda of a recently emerged professional group, mainly developed from within that profession.

The types of evaluation identified in this chapter derive in part from research and conceptualisation about evaluation. However, this research-based activity interacts with two others. First, evaluators as practitioners reflect on their own practice and have become more professional, to the point where issues of professional self-regulation have been highlighted. Second, evaluation is shaped by a complex web of contractual and institutional demands. Both of these activities, evaluators as reflective practitioners and evaluation as an institutionalised and market/network-based practice, determine what types of evaluation survive as well as what types are possible or advocated.

8. Conclusions and future scenarios

As this study is conceived within a European context, there are also questions to be asked as to the likely evolution of evaluation within Europe. This is not to suggest that evaluation in Europe is sui generis, rather that there are distinctive aspects of the EU’s institutions and the challenges that evaluators face in Europe. This distinctiveness is shaped by a number of underlying dynamics, the most important of which are discussed below.

The first is accountability and participation. The policy system, the main customer of evaluation, faces contradictory demands. On the one hand, the demand for accountability for monies spent and for promises made generates mounting pressure for top-down, performance-oriented, quasi-audit types of evaluation. On the other hand, citizens, who often feel estranged from their politicians and question the basis of distant, context-free decisions, demand closer involvement and participation in the way policies are designed and implemented. The extent to which evaluation evolves in an accountability or participative direction will depend on how these contradictory demands are – or are not – resolved. This is not to suggest that both variants cannot coexist, but the two cultures of evaluation are very different and different balances of thinking and resources constitute genuinely alternative scenarios.

Diversity and convergence is a second consideration. The ever-expanding European ‘space’ is increasingly diverse in its traditions, institutions, languages and culture. The European project was founded on a vision of integration that assumes convergence. Even though the contrary vision of subsidiarity emphasises continued diversity as much as convergence, both tendencies remain strong. This matters for evaluation because the diversity/convergence dynamic underpins many of the roles that evaluators are expected to take up. In particular it affects how evaluators relate to stakeholders. When diversity is accepted, value-sensitivity and working closely with stakeholders are taken for granted. When convergence is assumed, there is more likely to be a presumption of homogeneity among those affected by public intervention and a tendency to favour methods and modus operandi that take limited account of the particular circumstances of policy and programme implementation.

Reconciliation of evaluation cultures is also important. Alongside the cultural diversity of Europe, and not disconnected from it, is cultural diversity among evaluators. Within the social and economic sciences in particular, there are familiar cleavages, many of which have been discussed in this chapter. For example, evaluation has its advocates for largely empirical and positivist methodologies as well as advocates for theory-based investigation in various forms. What constitutes evidence, validity, generalisability and legitimate conclusions is hotly debated among evaluators, as it is in other academic and professional circles. To an extent, these cultural divides among evaluators mirror, and are reinforced by, cultural divisions within Europe. The philosophical patrimony of Latin, Scandinavian and Anglo-Saxon countries can make them receptive to different evaluation approaches and methodologies. There is also a history of policy borrowing within Europe, with established patterns of shared professional and policy networks that predispose some countries to adopt practices more easily from particular countries than from others. (Italians are more likely to ‘borrow’ from the French, Scandinavians from each other and the British from North Americans.) So, the alternatives of convergence or national – or at least regional – specificity are available to evaluators as they are to Europe’s citizens and their Member States. Among the influences that will shape these alternatives, the growth of evaluation societies at a European level and the policies of the EU are likely to be important, as is the dissemination of evaluation standards and procedures by European institutions.

Solidarity and social exclusion also feature. In many socioeconomic programmes, including VET, lack of social solidarity is a factor that both underpins the problems that the programme seeks to address and constrains the policy responses that are possible. In extremis, social exclusion is at the heart of these programmes. However, the methods of evaluation that are frequently deployed assume solidarity and make little contribution towards social cohesion. From within a presumption of solidarity, evaluators can easily become part of a regulatory or control regime. A counter-tendency within contemporary evaluation is to engage actively in formative, developmental and trust-building activities as part of the evaluation itself. This goes beyond participation or offering stakeholders a voice: it takes the opportunity to support social inclusion and solidarity by the way an evaluation is conducted. For example, not only do evaluators study the realities of partnership working, they also contribute to the strengthening of partnerships by the methods and research strategies they adopt.

There is also tension between complexity and linearity. Characteristically, much evaluation work is increasingly complex. Such complexity underpins a high level of uncertainty in policy interventions. Success is unsure and goals need to be redefined (or rationalised) along the way. However, many policy interventions are simple in their logic. They assume a linear input leading to a predictable output: increased investment in VET leads to higher wages, or more work-based learning leads to greater competitiveness. When reinforced by a strong accountability ethos, this places pressure on evaluators to attempt to calibrate the future (e.g. through impact assessment) and demonstrate, through ‘success stories’, early wins and the achievement of anticipated goals. When the policy system is able to acknowledge uncertainty – which implies an unusual degree of openness to citizens and electorates – new possibilities are opened up also for evaluators. They become part of a more reflexive and iterative ‘learning culture’, with lessons learned from mistakes as well as from evidence of success.

Finally, there continue to be important divisions between policy-makers and evaluators. Some of these derive from the dynamics identified above: for example, policy-makers are driven by the ethos of accountability and the need to demonstrate success, while evaluators tend to be driven more by a preoccupation with complexity and value difference. Bridging the cultural divide between policy-makers and evaluators has become a theme in many European debates about evaluation policy and practice. In the first generation of these debates, the emphasis has been on educating evaluators to understand the priorities and pressures on policy-makers: the importance of deadlines, the need for clear recommendations, the need to accept the parameters of current conventional wisdom. This is often accompanied by a demand from evaluators that ‘policy-makers become more like us’. The emergence of networks and communities of practice that span both evaluators and policy-makers opens up a second generation of debate. This offers the possibility of shared frameworks, an understanding of what evaluation can and cannot achieve, and an understanding of what policy-makers need in order to learn from and to use evaluation outputs and processes. The extent to which this second-generation debate becomes more commonplace, so that European evaluators and policy-makers can move beyond the ‘why can’t they be more like us’ refrain, will also shape the way in which European evaluation evolves in the future.

Considering the possible futures of evaluation brings into focus many of the issues raised elsewhere in this article, but from a particular perspective. While evaluators, policy-makers and evaluation researchers advocate different evaluation approaches, models and practices, what this concluding section has sought to emphasise is that evaluation itself is shaped by societal dynamics and contingencies. There are choices between: types of evaluation and underlying philosophies of science; capacity development and evaluation use; evaluation theory and theory more generally; and the prominence of participation, empowerment and self-regulation on the one hand, and top-down, policy-driven variants of evaluation on the other. These choices are not open and unencumbered. Rather, like other (social) technologies, evaluation is shaped by wider societal, political and institutional dynamics. The future of evaluation in Europe, as elsewhere, should therefore be understood also in terms of the bigger picture: the way policy systems adapt to socioeconomic, cultural and political challenges, and the way evaluators themselves engage as actors, making choices about how they can contribute to these challenges.


List of abbreviations

AEA American Evaluation Association

AES Australasian Evaluation Society

CES Canadian Evaluation Society

CMO Context, mechanism, outcome

CVT Continuing vocational training

RCTs randomised controlled trials

VET Vocational education and training

References

Alkin, M. C. Debates on evaluation. London: Sage, 1990.
AEA – American Evaluation Association Task Force on Guiding Principles for Evaluators. Guiding principles for evaluators. In: Shadish, W. R. et al. (eds) New directions for program evaluation – guiding principles for evaluators, 1995, No 66, p. 19-24.
AES – Australasian Evaluation Society. Guidelines for the ethical conduct of evaluation, 1997. Available from Internet: http://www.aes.asn.au/aesguide02.pdf [cited 30.6.2003].
Biott, C.; Cook, T. Local evaluation in national early years excellence centres pilot programme. Evaluation – the International Journal of Theory, Research and Practice, 2000, Vol. 6, No 4, p. 399-413.
Boyd, R. Confirmation, semantics and the interpretation of scientific theories. In: Boyd, R.; Gasper, P.; Trout, J. D. The philosophy of science. Cambridge, MA: MIT Press, 1991.
Bastoe, P. O. Linking evaluation with strategic planning, budgeting, monitoring and auditing. In: Boyle, R.; Lemaire, D. (eds) Building effective evaluation capacity: lessons from practice. London: Transaction Publishers, 1999, p. 93-110 (Comparative policy analysis series).
Boyle, R.; Lemaire, D. (eds) Building effective evaluation capacity: lessons from practice. London: Transaction Publishers, 1999 (Comparative policy analysis series).
Caracelli, V. J.; Preskill, H. (eds) New directions for evaluation: the expanding scope of evaluation use. San Francisco: Jossey-Bass, 2000.
Chelimsky, E. Politics, policy and research synthesis. Evaluation – the International Journal of Theory, Research and Practice, 1995, Vol. 1, No 1, p. 97-104.
Chelimsky, E. Thoughts for a new evaluation society. Evaluation – the International Journal of Theory, Research and Practice, 1997, Vol. 3, No 1, p. 97-118.
Chen, H.-T. Theory-driven evaluations. London: Sage, 1990.
Connell, J. P. et al. (eds) New approaches to evaluating community initiatives: concepts, methods and contexts. New York: The Aspen Institute, 1995.

Cronbach, L. J. Designing evaluations of educational and social programs. San Francisco: Jossey-Bass, 1982.
Cronbach, L. J. Construct validation after thirty years. In: Linn, R. L. (ed.) Intelligence: measurement, theory and public policy. Urbana: University of Illinois Press, 1989.
DeGEval – Deutsche Gesellschaft für Evaluation [German Evaluation Society]. Standards für Evaluation. Köln: DeGEval, 2002. Available from Internet: www.degeval.de/standards/index.htm [cited 30.6.2003].
Davies, H. T. O.; Nutley, S. M.; Smith, P. C. (eds) What works? Bristol: The Policy Press, 2000.
European Commission. Evaluation design and management. In: Means collection: evaluating socio-economic programmes. Luxembourg: Office for Official Publications of the European Union, 1999, Vol. 1.
Fetterman, D. M.; Kaftarian, S. J.; Wandersman, A. (eds) Empowerment evaluation: knowledge and tools for self-assessment and accountability. London: Sage, 1996.
Fournier, D. M. Establishing evaluative conclusions: a distinction of general and working logic. In: Fournier, D. M. (ed.) Reasoning in evaluation: inferential links and leaps. San Francisco: Jossey-Bass, 1995, p. 15-32 (New directions for program evaluation, 68).
Fulbright-Anderson, K.; Kubisch, A. C.; Connell, J. P. New approaches to evaluating community initiatives, Vol. 2: theory, measurement and analysis. Washington, DC: The Aspen Institute, 1998.
Gallo, P. S. Meta-analysis – a mixed metaphor. American Psychologist, 1978, Vol. 33, p. 515-517.
Guba, E. G.; Lincoln, Y. S. Fourth generation evaluation. London: Sage, 1989.
Henkel, M. Teaching quality assessments: public accountability and academic autonomy in higher education. Evaluation, 1997, Vol. 3, No 1, p. 9-23.

House, E. R. Assumptions underlying evaluation models. In: Madaus, G. F.; Scriven, M.; Stufflebeam, D. L. Evaluation models: viewpoints on educational and human services evaluation. The Hague: Kluwer-Nijhoff Publishing, 1983.
Joint Committee on Standards for Educational Evaluation. The program evaluation standards: how to assess evaluations of educational programs. Thousand Oaks: Sage, 1994.
Julnes, G.; Mark, M. M.; Henry, G. T. Promoting realism in evaluation: realistic evaluation and the broader context. Evaluation – the International Journal of Theory, Research and Practice, 1998, Vol. 4, No 4, p. 483-504.

Lave, J.; Wenger, E. R. Situated learning: legitimate peripheral participation. Cambridge: Cambridge University Press, 1991.
Leeuw, F. L.; Rist, R. C.; Sonnichsen, R. C. Can governments learn? Comparative perspectives on evaluation and organization learning. Somerset, NJ: Transaction Publishers, 1994 (Comparative policy analysis series).
Leeuw, F. L.; Toulemonde, J.; Brouwers, A. Evaluation activities in Europe: a quick scan of the market in 1998. Evaluation – the International Journal of Theory, Research and Practice, 1999, Vol. 5, No 4, p. 487-496.
Mannila, S. M.; Ala-Kauhaluoma, M.; Valjakka, S. (eds) Good practice in finding good practice: international workshop evaluation. Naperville, IL: Rehabilitation Foundation, 2001.
Miller, R. W. Fact and method in the social sciences. In: Boyd, R.; Gasper, P.; Trout, J. D. The philosophy of science. Cambridge, MA: MIT Press, 1991.
Nagarajan, N.; Vanheukelen, M. Evaluating EU expenditure programmes: a guide to intermediate and ex-post evaluation. Luxembourg: Office for Official Publications of the European Communities, 1997.
Nutley, S. M.; Walter, I.; Davies, H. T. O. From knowing to doing: a framework for understanding the evidence-into-practice agenda. Evaluation – the International Journal of Theory, Research and Practice, 2003, Vol. 9, No 2.
OECD – Organisation for Economic Cooperation and Development. Budgeting for results: perspectives on public expenditure management. Paris: OECD, 1995.
Patton, M. Q. Utilization-focussed evaluation: the new century text (3rd edition). London: Sage, 1997.
Patton, M. Q. Qualitative research and evaluation methods. London: Sage, 2002.
Pawson, R. Evidence based policy: in search of a method. Evaluation – the International Journal of Theory, Research and Practice, 2002(a), Vol. 8, No 2, p. 157-181.
Pawson, R. Evidence based policy: the promise of ‘realist synthesis’. Evaluation – the International Journal of Theory, Research and Practice, 2002(b), Vol. 8, No 3, p. 340-358.
Pawson, R.; Tilley, N. Realistic evaluation. London: Sage, 1997.
Rist, R. C.; Furubo, J. E.; Sandahl, R. (eds) The evaluation atlas. London, 2001.
Rogers, E. Diffusion of innovation. New York: The Free Press, 1995.
Rogers, P. J. Program theory: not whether programs work but how they work. In: Stufflebeam, D. L.; Madaus, G. F.; Kellaghan, T. Evaluation models: viewpoints on educational and human services evaluation (2nd edition). Dordrecht: Kluwer Academic Publishers, 2000.
Rossi, P. H.; Freeman, H. E.; Lipsey, M. W. Evaluation: a systematic approach (6th edition). London: Sage, 1999.
Sabatier, P. A. An advocacy coalition framework of policy change and the role of policy-oriented learning therein. Policy Sciences, 1988, Vol. 21, p. 129-168.
Sabatier, P. A.; Jenkins-Smith, H. C. Policy change and learning – an advocacy coalition approach. Boulder, CO: Westview Press, 1993.
Sanderson, I. Evaluation in complex policy systems. Evaluation – the International Journal of Theory, Research and Practice, 2000, Vol. 6, No 4, p. 433-454.
Schwandt, T. Evaluation as practical hermeneutics. Evaluation – the International Journal of Theory, Research and Practice, 1997, Vol. 3, No 1, p. 69-83.
Scriven, M. Evaluation thesaurus (4th edition). London: Sage, 1991.
Segerholm, C. National evaluations as governing instruments: how do they govern? Evaluation, 2001, Vol. 7, No 4, p. 427-438.
Shadish, W. R.; Cook, T. D.; Leviton, L. C. Foundations of program evaluation: theories of practice. London: Sage, 1991.


Shadish, W. R.; Cook, T. D.; Campbell, D. T. Experimental and quasi-experimental designs for generalised causal inference. Boston: Houghton Mifflin Company, 2002.
Stake, R. The art of case study research. London: Sage, 1995.
Stake, R. For all program evaluations there’s a criterion problem. Evaluation – the International Journal of Theory, Research and Practice, 1996, Vol. 2, No 1, p. 99-103.
Stern, E. The characteristics of programmes and their evaluation. In: OECD. National programmes in support of local initiatives. Paris: OECD, 1992.
Stern, E.; Kelleher, J.; Cullen, J. Preliminary common evaluation framework. London: The Tavistock Institute, 1992 (Articulate Deliverable, No 1).
Stufflebeam, D. L. Foundational models for 21st century program evaluation. In: Stufflebeam, D. L.; Madaus, G. F.; Kellaghan, T. Evaluation models: viewpoints on educational and human services evaluation (2nd edition). Dordrecht: Kluwer Academic Publishers, 2000(a).
Stufflebeam, D. L. Professional standards and principles for evaluations. In: Stufflebeam, D. L.; Madaus, G. F.; Kellaghan, T. Evaluation models: viewpoints on educational and human services evaluation (2nd edition). Dordrecht: Kluwer Academic Publishers, 2000(b).
Toulemonde, J. Evaluation identities, 2001.
Valovirta, V. Evaluation utilization as argumentation. Evaluation – the International Journal of Theory, Research and Practice, 2002, Vol. 8, No 1, p. 60-80.
Van der Knaap, P. Policy evaluation and learning: feedback, enlightenment or argumentation. Evaluation – the International Journal of Theory, Research and Practice, 1995, Vol. 1, No 2, p. 189-216.
Vedung, V. Public policy and programme evaluation. New Brunswick, NJ: Transaction, 1997.
W. K. Kellogg Foundation. Logic model development guide. Michigan: W. K. Kellogg Foundation, 2000.
Weiss, C. H. Evaluation research: methods of assessing program effectiveness. New Jersey: Prentice-Hall, 1972.
Weiss, C. H.; Bucuvalas, M. J. Social science research and decision-making. New York: Columbia University Press, 1980.
Weiss, C. H. Nothing as practical as good theory: exploring theory-based evaluation for comprehensive community initiatives for children and families. In: Connell, J. P. et al. (eds) New approaches to evaluating community initiatives: concepts, methods and contexts. New York: The Aspen Institute, 1995.
Weiss, C. H. The interface between evaluation and public policy. Evaluation – the International Journal of Theory, Research and Practice, 1999, Vol. 5, No 4, p. 468-486.
Weiss, C. H. Theory-based evaluation: theories of change for poverty reduction programs. In: Feinstein, O.; Picciotto, R. (eds) Evaluation and poverty reduction. Washington: World Bank, 2000.
Weiss, C. H. Using research in the policy process: potential and constraints. Policy Studies Journal, 1976, Vol. 4, p. 224-228.
Wholey, J. S. Using evaluation to improve program performance. In: Levine, R. A. et al. (eds) Evaluation research and practice: comparative and international perspectives. Beverly Hills, CA: Sage, 1981.
