
The role of evaluation in decision-making about management and leadership programmes

Shirine Voller, Ashridge Business School

Postgraduate Students’ Day, 28 June 2010
InCULT Conference, University of Hertfordshire

(Paper linked most closely to theme: Teaching & Learning and Assessment Issues for the Future)


Abstract

This paper is positioned in the field of executive/management development, though its broader implications about evaluation will also be of interest to the higher education sector, particularly in the context of professional education and development.

Billions of dollars are invested annually in management development (Reade and Thomas, 2004). Organisations invest based on the premise that by developing their staff there will be an ultimate return for the organisation, whether in improved profit, productivity, morale, loyalty, or a combination of factors.

Significant effort is also invested in assessing the impact of management development programmes through means other than formal assessment, which is rare in this sector. The emphasis tends to be less on what an individual student has achieved and more on how well a programme has met its stated objectives, with the sponsoring organisation as the ‘client’ of such assessment.

Formal programme assessment is generally called evaluation. Numerous evaluation models exist, by far the most dominant and well used being Kirkpatrick’s four-level model, first articulated in the late 1950s (Kirkpatrick, 1959a; 1959b; 1960a; 1960b) and further developed since (Kirkpatrick, 1998). Alternatives to Kirkpatrick have attempted to be more systemic and all-encompassing, involving, for example, variables pertaining to individual capability and propensity for development, the organisational climate into which an individual returns post-programme and the level of top management support (Pawson and Tilley, 1994; Holton, 1996; Collins and Denyer, 2008; D'Netto et al., 2008).

Despite the abundance of theoretical models of evaluation, which seem to be good explanatory models, most are rarely used in practice (Holton and Naquin, 2005). Therein lies a conundrum: organisations want to invest wisely in the development of their managers; surely one way of assuring wise future investment is to assess how well a programme has delivered on its objectives; and yet in practice, formal evaluation appears to be sparsely used, or used at a very basic level. This lament is not new, and whilst the drive from HRD researchers for more effort to be invested in developing more sophisticated models of evaluation has been an ongoing theme for many years, Holton and Naquin (2005) propose that we are barking up the wrong tree.

They argue that most models of evaluation have developed from the field of education, with scant attention paid to the literature on organisational decision-making. Most evaluation models are derived from normative decision theory (Beach and Connolly, 2005), which has been shown not to work in practice in organisational contexts. As such, whilst great effort may be invested in evaluation, it is not being done in a way that supports the decisions an organisation has to make about a programme.

In my research I have set out to test the assertions made by Holton and Naquin (2005). I have collected empirical data from ten case study organisations on how key decisions about a management development programme are made, and how programme evaluation is used to inform decision-making. Organisations have been selected from within the portfolio of Ashridge’s private sector clients.

I completed my data collection at the end of May and am in the early stages of analysis. As such, I will be in a position to reveal more about my findings during my presentation at the InCULT conference than is contained within this paper. This paper should be seen as work-in-progress. It focuses mainly on the literature informing my choice of empirical work and describes the methodology for this work.


Section 1: Setting the scene

1.1 Introducing management and leadership development

Management and leadership development are concerned with promoting, encouraging and assisting the expansion of knowledge and expertise required to optimise management and leadership potential and performance (Brungardt, 1996). The purpose, ultimately, of such development is the improved effectiveness of the individuals within an organisation (Bramley, 1999) in order to maintain, if not enhance, its performance (Herling, 2000; Collins and Holton, 2004).

Development can take a variety of forms, including formal training programmes alongside action learning, mentoring, job assignments, on-the-job experiences, feedback systems, exposure to senior executives and leader-follower relationships (McCauley et al., 1998; Suutari and Viitala, 2008). On-the-job experience is considered by some to be the primary learning arena for managerial learning (Verlander, 1992) and there has been a shift away from programmes and towards experience-based methodologies (Suutari and Viitala, 2008). Having said this, many contemporary programmes have sophisticated designs, including components such as action learning, mentoring and work-based projects alongside formal inputs (Doh and Stumpf, 2007; Gibb, 1994; Dovey, 2002), with experiential methods increasingly emphasised over a didactic approach (Farrington, 2003).

Programme content can be classified in various ways, a useful heuristic being Hogan and Warrenfeltz’s (2003) domain model of competencies, which outlines four inter-related areas of focus: intrapersonal skills, interpersonal skills, leadership skills and business skills.

Figure 1. Interpretation of Hogan & Warrenfeltz’s (2003) domain model of executive education

Another way of categorising programmes is by audience. Simply put, programmes are either ‘open’ or ‘customised’ (Lippert, 2001; Tushman et al., 2007). Within this distinction there is some blurring, for example, where an open programme is customised to a particular sector, or a single client organisation reserves all the places on an open


programme. However, in the main, open programmes are those that advertise a standard set of learning objectives, to be achieved through a standard design, and are attended by individuals from a range of organisations. Customised programmes, in contrast, are usually developed in close liaison with a single client organisation and designed to address a specific organisational need. So, the degree of customisation is one dimension along which programmes can vary. A second, more complex to unravel, dimension is that of individual versus team, or other ‘unit’, focus (Tushman et al., 2007).

The design of management and leadership development programmes has varying degrees of sophistication and, as discussed above, programmes may include a range of developmental experiences over and above formal programmatic elements. Within a programme, a range of teaching and learning techniques and methods tend to be employed, including lectures, discussion, case analysis and simulations (Lippert, 2001), learning journals, experiential exercises, psychometric tests and 360-degree assessment (Vicere, 1996).

In sum, management and leadership development is manifested in a variety of forms. With a focus on formal training programmes, their emphasis may be on one, or a combination, of domains of skill, categorised as interpersonal, intrapersonal, leadership and business. In terms of programme types, a standard classification is ‘open’ or ‘customised’, with a focus that varies from individual through to organisational development. Within any programme type, a range of teaching and learning methodologies are employed, resulting in designs that vary from simple to complex.

1.2 What is evaluation?

A variety of conceptions of evaluation exist. One useful definition sees evaluation as the systematic collection of data to determine success in terms of quality, effectiveness or value (Goldstein, 1986; Hannum and Martineau, 2008). It occurs when specified outcome measures are conceptually related to intended learning objectives (Kraiger et al., 1993). Evaluation arguably should encompass the total value of a training intervention in social and financial terms (Talbot, 1992) and, as such, be distinguished from validation, which is limited to assessing achievement of objectives. However, this definition and distinction are not commonly agreed upon, and do not reflect the breadth of what evaluation can offer.

By and large, in practice evaluation is distinguished from ‘research’ in that the emphasis is on measuring results, whilst research seeks to understand relationships among variables or describe phenomena as a means of developing knowledge that can be generalised and applied (Hannum and Martineau, 2008). However, evaluation can also contribute to theory-building (Burgoyne and Singh, 1977), and a number of models of evaluation attempt to incorporate an explanatory dimension in addition to pure measurement, including the so-called systemic models of evaluation (Warr et al., 1970; Stufflebeam, 1989) and other models incorporating contextual factors (Holton, 1996; D'Netto et al., 2008; Baldwin and Ford, 1988). In determining the extent to which evaluation can be considered ‘research’ it is important to consider purpose.

1.3 Why evaluate?

In order to make decisions about evaluation designs and methodologies it is important to clarify the purpose of any evaluation. There is a range of reasons why evaluation might be important, succinctly categorised by Easterby-Smith (1986) as


Proving, Improving, Learning or Controlling. Whether evaluation is to ‘prove’ or ‘improve’ is at the heart of the distinction between summative and formative evaluation, where summative evaluation is about proof and formative evaluation is about improvement (Michalski and Cousins, 2001).

The purpose of evaluation can be complicated by a lack of goal clarity in the intervention itself. Burgoyne and Singh (1977) argue that a vicious circle is created when training lacks clear goals, leading to a lack of certainty as to how and whether the goals have been achieved, and the absence of a coherent evaluation strategy. Each factor reinforces the others and, they argue, possibly the only point at which this circle can be broken is through the somewhat back-to-front process of developing a coherent evaluation strategy, which drives the effort to elicit goal clarity.

An analysis of evaluation from a decision-making perspective (Holton and Naquin, 2005) concluded that most evaluation models fit within a rational-economic framework, which decision-making research has shown does not work in practice. Thus, if the purpose of evaluation is to inform decisions about investment in training and development, then the majority of models are inadequate. Models that incorporate decision-making theory more fully are called for, with Holton’s (1996) model and the Evaluative Inquiry for Learning in Organizations (EILO) model (Preskill and Torres, 1999) cited as exemplars.

1.4 Evaluation models from the training domain

Kirkpatrick is undoubtedly the name most strongly associated with the evaluation of training. His four-step evaluation model has been in circulation since the late 1950s (1959a; 1959b; 1960a; 1960b) and dominates the evaluation industry (Holton, 1996; Alliger and Janak, 1989), both in terms of its direct use and its influence on other models and frameworks. The model has four ‘steps’ – now more often referred to as ‘levels’: reaction, learning, behaviour and results, with a fifth level sometimes added, focusing on the ultimate value of training (Holton, 1996; Hamblin, 1974; Phillips, 1995).

1. Reaction is defined as participants’ immediate response in terms of liking of, or feelings about, a programme.

2. Learning refers to the ‘principles, facts and techniques understood and absorbed’ by participants.

3. Behaviour is about the on-the-job use of what has been learned.

4. Results relates to end goals, such as achievement of results, reduction of costs, reduction in turnover and absenteeism, increased productivity or improved morale.

Alliger et al. (1997) developed an augmented version of Kirkpatrick’s four-level framework, primarily to facilitate their meta-analysis of correlations between the levels. This modified framework segmented the definitions of the four levels, subdividing level one into Reactions as affect, i.e. to do with liking, Reactions as utility judgements, to do with perceived usefulness, and a category that encompassed both, Combined reactions. Level two, learning, was similarly subdivided, this time into Immediate post-training knowledge, Knowledge retention, i.e. a similar test of knowledge applied at some later date, and Behaviour/skill demonstration, as demonstrated in a training rather than on-the-job context. Levels three, transfer, and four, results, were defined similarly to Kirkpatrick’s original model.
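To make the framework concrete, the sketch below models evaluation records tagged by level and averages them. It is illustrative only: the level names follow Kirkpatrick, the split of level one follows Alliger et al.'s (1997) augmentation, and the participants and scores are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from statistics import mean

class Level(Enum):
    """Kirkpatrick's levels, with level one split per Alliger et al. (1997)."""
    REACTION_AFFECT = "Reactions as affect"
    REACTION_UTILITY = "Reactions as utility judgements"
    LEARNING = "Learning"
    BEHAVIOUR = "Behaviour (transfer)"
    RESULTS = "Results"

@dataclass
class EvaluationRecord:
    participant: str
    level: Level
    score: float  # e.g. a 1-5 rating or a test mark, depending on the level

def mean_by_level(records):
    """Average the scores collected at each evaluation level."""
    by_level = {}
    for record in records:
        by_level.setdefault(record.level, []).append(record.score)
    return {level: mean(scores) for level, scores in by_level.items()}

# Hypothetical post-programme data for two participants.
records = [
    EvaluationRecord("P1", Level.REACTION_AFFECT, 4.5),
    EvaluationRecord("P2", Level.REACTION_AFFECT, 3.5),
    EvaluationRecord("P1", Level.REACTION_UTILITY, 4.0),
    EvaluationRecord("P1", Level.LEARNING, 78.0),
    EvaluationRecord("P2", Level.LEARNING, 64.0),
]

for level, average in mean_by_level(records).items():
    print(f"{level.value}: {average:.1f}")
```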

Whilst not concerning evaluation as such, Baldwin and Ford’s (1988) conceptual framework of the learning transfer process is fundamental to the development of later models of evaluation, and to the understanding of the influences on training effectiveness. Transfer, the degree to which trainees effectively apply the knowledge,


skills and attitudes gained in a training context to the job (Newstrom, 1984, in Baldwin and Ford, 1988), is argued to be a function of more than the original learning (Atkinson, 1972), and for transfer to have occurred “learned behaviour must be generalised to the job context and maintained over a period of time on the job”. In their model of the transfer process, Baldwin and Ford specify three constituent phases: Training inputs, Training outputs and Conditions of transfer.

A contender to Kirkpatrick is the Holton model, which was informed by Baldwin and Ford’s (1988) model for transfer of training. Holton (1996) proposes an integrative evaluation model that, he claims, accounts for the impact of primary and secondary intervening variables (p.7). Like Baldwin and Ford, he notes that factors external to training design will affect the impact of training. Holton’s model proposes three primary outcome measures: learning, individual performance and organisational results, which show similarities with Kirkpatrick’s levels two, three and four. However, unlike Kirkpatrick, Holton recognises the potential influences of trainee reactions, motivation to learn and ability on learning outcomes, and of motivation to transfer, transfer conditions, or environment, and ability on performance outcomes.

Whilst not referring to Holton’s (1996) or Baldwin and Ford’s (1988) models, D’Netto et al. (2008) cover similar territory with their theoretical model. They divide management development into a three-stage process which comprises needs assessment, the development programme, and evaluation. By taking the organisation’s perspective, whereby the organisation is often not in a position to control the delivery of a programme itself, they focus purely on what happens before and after the programme. In their model they include antecedent components and post-programme components (p.4). Antecedent components are: organisational learning culture, individual initiative, top management support and the programme’s link to corporate strategy, whilst post-programme components include evaluation, line manager support and opportunities to use the skills learned (p.4).

Critics of evaluation argue that it is frequently based on a ‘deficit model’ of training, whereby a programme is ‘administered’ to a population and an assessment made of whether or not, and by how much, the intervention has worked. Collins and Denyer (2008) argue that such approaches cannot answer crucial ‘why’ questions which would help to explain the intricacies and contextual factors that lead to variation in the success of a given training programme. Arguably, models such as Holton’s (1996) and D’Netto et al.’s (2008) incorporate an explanatory element in terms of specifying potential confounding variables beyond the design and delivery of a programme itself, and several so-called systems models of evaluation have also been developed which emphasise the importance of context. These include the CIRO (Context, Input, Reaction, Outcome), CIPP (Context, Input, Process, Product) and CIMO (Context, Intervention, Mechanism, Outcome) approaches (Warr et al., 1970; Stufflebeam, 1989; and Denyer et al., 2008, respectively).

1.5 Evaluation designs

A number of models have been presented, which provide overarching, and perhaps idealised, theoretical frameworks for understanding evaluation. Research design deals with the practical aspects of how an evaluation is conducted and, alongside the magnitude of the evaluation and the criteria collected, is one of the core components of an evaluation strategy. Magnitude is a trade-off decision between the cost and benefit of larger evaluations over smaller ones in terms of reliability of findings. Training criteria, in Tannenbaum and Woods’ (1992) categorisation, range from reactions through to attitude change, learning, behaviour and tangible results, reflecting Kirkpatrick (1959a; 1959b; 1960a; 1960b) and other influences. Research design sits on a spectrum from simple through to complex, with simplistic designs being easy to implement, yet yielding data of limited value, and


complex designs having the potential to deliver insightful findings, but being met with the challenges of participant resistance and difficulty of implementation.
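One way to see the magnitude trade-off in rough numbers is a standard normal-approximation sample-size calculation for a two-group comparison. The sketch below is illustrative only; the effect size, significance level and power figures are assumptions, not values from any study cited here.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate participants needed per group to detect a standardised
    mean difference (Cohen's d) in a two-group comparison, using the
    normal approximation: n = 2 * ((z_(1-alpha/2) + z_power) / d) ** 2."""
    z = NormalDist().inv_cdf
    return ceil(2 * ((z(1 - alpha / 2) + z(power)) / effect_size) ** 2)

# Illustrative figures: detecting a small effect needs many more participants
# (and hence a bigger, costlier evaluation) than detecting a large one.
for d in (0.8, 0.5, 0.2):
    print(f"effect size {d}: about {n_per_group(d)} participants per group")
```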

A hierarchical classification of research design places pre-experimental, quasi-experimental and experimental designs (Tannenbaum and Woods, 1992) in increasing order of sophistication. Pre-experimental designs include case study work or post-training comparison of two groups (a control and the experimental group).

Experimental designs, in contrast, require random assignment of individuals into training or control groups and, often, the collection of pre- and post-training data, a design referred to as Pretest-Posttest with control (PPWC) or, if only retrospective data is collected, Posttest only with control (POWC) (Collins and Holton, 2004). In practice, true experimental designs have significant problems of implementation and run the risk of ‘the tail wagging the dog’, whereby the stringent requirements of evaluation take precedence over, for example, the needs-based selection of participants, and individuals who could potentially benefit from a programme are excluded because they are allocated to the control group (Kearns, 2005; Preskill and Torres, 1999). The question of purpose, “what is the evaluation for?”, needs to be borne in mind at all times, and in the majority of cases, pragmatism and business needs will take priority over academic purity.

Quasi-experimental designs provide rigour over and above that enabled by pre-experimental designs, but do not go so far as random assignment. They may involve collecting data from participants who are yet to attend a programme and comparing this with participant data, and/or collecting data at multiple time-points before, during and after a programme, a design referred to as Single Group Pretest-Posttest (SGPP) (Collins and Holton, 2004). SGPP designs often receive academic condescension for their weak controls and threats to internal and external validity but, in reality, are often used to evaluate training programmes (Carlson and Schmidt, 1999).
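To illustrate the difference in what these designs can claim, here is a minimal sketch with entirely hypothetical ratings: the SGPP estimate attributes all change in the trained group to the programme, whereas a pretest-posttest-with-control (PPWC) design nets out the change seen in a comparison group.

```python
from statistics import mean

# Hypothetical 1-10 ratings of a target behaviour before and after a programme.
trained_pre = [5.1, 4.8, 6.0, 5.5]
trained_post = [6.9, 6.2, 7.4, 7.0]
control_pre = [5.3, 5.0, 5.8, 5.6]
control_post = [5.9, 5.4, 6.3, 6.0]

# Single Group Pretest-Posttest (SGPP): only the trained group is measured,
# so any background change is folded into the programme effect.
sgpp_estimate = mean(trained_post) - mean(trained_pre)

# Pretest-Posttest with control (PPWC): subtract the control group's change
# (a difference in differences), which is what the comparison group is for.
ppwc_estimate = sgpp_estimate - (mean(control_post) - mean(control_pre))

print(f"SGPP estimate of programme effect: {sgpp_estimate:.2f}")
print(f"PPWC estimate of programme effect: {ppwc_estimate:.2f}")
```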

1.6 Methods of evaluation data collection and analysis

Closely related to design are the methods of data collection and analysis that enable a design to be realised. Methods of data collection will also be significantly influenced by the chosen model of evaluation and its purpose.

Surveys, or so-called ‘happy sheets’, are the most widely used method of collecting data on in-programme or immediate post-programme participant feedback (Tannenbaum and Woods, 1992) and capture Kirkpatrick’s level one: Reactions. This is not fully reflected in the academic literature, which privileges more sophisticated studies of evaluation for their potential to contribute to theory development (Alliger and Janak, 1989; Arthur Jr et al., 2003). Questionnaires, semi-structured interviews with participants and other key stakeholders, focus groups (Hannum, 2004) and a wide range of bespoke and pre-existing measurement instruments (see Collins, 2002) are frequently used methods, whilst observation (Collins and Denyer, 2008; Collins and Holton, 2004; Hayes, 2007), 360-degree feedback (Rosti and Shipper, 1998), journal analysis, site visits (Russon and Reinelt, 2004), ethnography (Tanton and Fox, 1987) and repertory grid are amongst other methods that are deployed.

For outcomes at the system level, various objective measures may be used, such as employee retention, turnover and business results, alongside other measures collected as part of standard HR processes, for example, employee satisfaction surveys and balanced scorecards (Sirianni and Frey, 2001).

Triangulation of data sources is not uncommon, and is likely to yield more robust results by providing confirmatory or conflicting evidence about effectiveness.
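As a purely illustrative sketch of the triangulation idea, the snippet below records whether hypothetical evidence sources support, contradict or are neutral on a single claimed outcome, and reports whether they converge; the sources and judgements are invented for the example.

```python
# Hypothetical judgements about one claimed outcome of a programme
# ("delegation behaviour improved"): +1 supports, -1 contradicts, 0 neutral.
evidence = {
    "participant survey (self-report)": +1,
    "line-manager interviews": +1,
    "HR retention figures": 0,
}

supporting = [source for source, verdict in evidence.items() if verdict > 0]
contradicting = [source for source, verdict in evidence.items() if verdict < 0]

if supporting and not contradicting:
    conclusion = "sources converge on the outcome"
elif supporting and contradicting:
    conclusion = "sources conflict - investigate further"
else:
    conclusion = "insufficient evidence either way"

print(f"{len(supporting)} of {len(evidence)} sources supportive: {conclusion}")
```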


Section 2: What the latest literature says about evaluation

Following a wide scan of relevant literature, as outlined above, I conducted a focused systematic review of the post-2000 literature on evaluation in management and leadership development. Several meta-analyses conducted prior to 2000 had aggregated and analysed the evaluation literature, but I found no such study published since 2000. I was intrigued to find out what key questions contemporary researchers were addressing, in order to position my own study in a niche worthy of exploration.

Here I will simply provide an overview of the themes emerging from the systematic review, rather than full details of the review methodology and findings. The purpose of presenting this summary is to set the scene for the empirical work which follows. Thirty-one studies were included in the review, following a rigorous selection process.

2.1 Health sector bias and private sector under-representation

A review of the post-2000 literature on the evaluation of management and leadership development programmes shows that published peer-reviewed papers are dominated by the public sector and, specifically, the health sector. This sector skew is thought to be a result of the disposition in the health sector toward evidence-based decision-making, a requirement to demonstrate value for public investment and to make evidence available in the public domain, and perhaps also the orientation of researchers toward academic outputs. The private sector is dramatically under-represented, and this is thought to be due to the lack of incentive to publish evaluation studies whose intent is business-focused. Further to this, commercial sensitivity may be a good reason not to publish evaluation findings, and when evaluation is done to satisfy a business need, drivers such as pragmatism and resource minimisation, perhaps coupled with a shortage of research skills, may mean that the methodology and results fall short of the standard required for academic publication.

2.2 Popularity of customised, single-case studies

There was a dominance of customised programmes over open programmes in the studies under review. Many of the studies were conducted with a single programme under scrutiny, and the focus of evaluation was on the extent to which that programme was effective, and in what ways, relating it back to the theoretical underpinnings of its design and delivery. This narrow focus is clearly beneficial if the goal is to understand in detail what works and why in a particular situation, but it does seem that an opportunity is being missed for comparative evaluations between programmes.

2.3 Lack of attention paid to stakeholder dynamics

For any evaluation there are a number of potential stakeholder groups and, as Michalski and Cousins (2001) pointed out, whilst different groups might value a similar set of training outcomes, the relative importance placed on these outcomes as criteria for evaluation varies. In the studies under review, a range of stakeholder dynamics was evident, but generally little, if any, attention was paid to the relationship between the authors of a study, the commissioners, the evaluators and the programme designers and deliverers. In a minority of cases, all these relationships were made explicit, with examples of programme deliverers taking responsibility for the evaluation and the writing up of a study in close liaison with the programme’s commissioners, or academics


working in partnership with commissioners to research a programme’s impact. The stakeholder group which seems, by and large, to have been excluded from the design process is that of programme participants. The exceptions are the few studies that took an action inquiry stance, purposefully embracing participant reflection as a meaningful outcome of evaluation.

2.4 Limitations of evaluation models for decision-making

An additional reason to pay more attention to stakeholders comes from Holton and Naquin’s (2005) critical analysis of evaluation models. They argue that one of the reasons that evaluation models are not widely used is that most are derived from a rational-economic framework, and that rational-economic models do not work in practice for making decisions. They set bounded rationality models in contrast, which have as “a foundational assumption that decision makers have neither the time nor resources to conduct complex ROI-type evaluations” and that “individuals use limited pieces of information to find a satisfactory resolution rather than an optimal decision” (p.265). Also favoured by Holton and Naquin are naturalistic evaluation models, involving collaborative, participatory and learning-oriented approaches, which, through the close involvement of organisational stakeholders, are most likely to result in a decision-making process which is natural for the organisation and leads to change. The evaluations which most closely resemble the idealised type for decision-making described by Holton and Naquin (2005) are those which involved multiple stakeholder groups in their design and delivery through various action inquiry, participative inquiry or action research approaches, of which there were several in this review.

2.5 Limited reference to theoretical models in empirical studies

The background literature synopsis that preceded this systematic review identified a range of models of evaluation. Perhaps unsurprisingly, given Holton and Naquin’s (2005) critical analysis of theoretical models of evaluation, virtually no reference was made in the review articles to any of these models, or others. Only Kirkpatrick, Pawson and Tilley’s (1994) realist stance and Harper and Beacham’s (1991) impact evaluation were explicitly discussed, and only in a handful of cases. Rather than aligning their design with a particular model of evaluation, other studies focused on a specific research approach, such as appreciative inquiry, intrinsic and instrumental case study or action learning, using the argument that it could capture the complexity of a situation. Thus it seems that whilst theoretical models, which attempt to disentangle complexity into the suite of variables which impact upon the success of a programme, have largely been ignored, certain methodologies applied in practice cover more or less the same ground.

2.6 Prevalence of Single Group Posttest Only design

Evaluation designs involving a control or comparison group were in the small minority, and the most popular design choice by far was Single Group Posttest Only (SGPO). Whilst Single Group Pretest-Posttest (SGPP) designs – and by implication, to an even greater extent, SGPO designs – have been critiqued in the past for methodological weaknesses, this study confirms earlier research showing that, in practice, such designs are often used (Carlson and Schmidt, 1999; Sackett and Mullen, 1993). It is suggested that a couple of reasons underpin this reality. Firstly, control/comparison designs are derived from a positivistic experimental mindset, whereas single group designs focusing on the study group only are more closely aligned with a realist philosophy and explanatory intent. It has already been noted that most of the studies in this review were concerned with a particular programme, thus it should come as no surprise that SGPP and SGPO designs were in the majority. Secondly, there are substantial practical


difficulties of manipulating a management programme into an experimental situation: ethical considerations of involving individuals who are not intended to benefit from a programme, and the nonsense of assigning managers at random to control or experimental groups without taking individual and organisational needs into account (Kearns, 2005).

2.7 Assessment of multiple outcome levels

In terms of what was evaluated, programmes in the review tended to be evaluated at two or three outcome levels (of a possible maximum of six), where levels are categorised as relating to knowledge, behaviour/expertise and systemic results/performance and are either subjectively or objectively assessed. Amongst the handful of studies evaluating at a single level, none relied solely on knowledge outcomes. This implies two things: firstly, knowledge acquired is not a sufficient outcome of a management or leadership development experience and, secondly, knowledge acquisition is not a reliable proxy for programme effectiveness, demonstrating an appreciation of the derailing effects that stand between learning and change at behavioural or systemic levels. Collins and Holton (2004) found that behavioural outcomes were assessed in the majority of studies. Their findings are supported by this review, but are in conflict with industry studies conducted in the 1960s and 70s (Catanello and Kirkpatrick, 1968; Kirkpatrick, 1978) which showed a clear prevalence of knowledge evaluation only.

2.8 Subjective outcome assessments more common than objective ones

An interesting pattern emerged, with subjective outcome measures being twice as prevalent as objective measures. It is probably fair to say that in situations where programme objectives are fuzzy, and there is only a vague notion of the correlation between training and performance, it is easier to opt for subjective accounts of improvement or change than objective ones, which require respondents to be precise about the extent to which specific attitudes or behaviours have changed. Thus the demands on programme commissioners and designers are less onerous if subjective measures are to be used, and perhaps evaluators are doing them a disservice in the long run by not challenging them up front to articulate the specific objectives of a programme and how its design is intended to deliver these. On a practical point, the subjective assessments are often made by participants, an easier group to access, due to existing relationships and contact points, than a wider group of stakeholders. That is not to say that subjective measures are necessarily easier to implement and analyse than objective ones. In fact, they are often more time-consuming to interpret and synthesise data from, because they tend to be more open and wide-ranging. They can also provide more valuable and relevant data.

2.9 ‘Horses for courses’ approach: no single right way to evaluate

In sum, the systematic review identified that a multiplicity of evaluation approaches, designs and methods have been used in recent literature on the effectiveness of management and leadership development programmes. In each case, the resultant evaluation is a consequence of its purpose, who the stakeholders driving it are and what resources are available, which lead into choices about design, what outcomes are measured and what methods are used to collect and analyse data. Whilst there are certainly some studies that have been conducted at a higher standard than others, there is no single ‘right’ way to evaluate. The ‘horses for courses’ situation very much reflects the fact that evaluation of management and leadership development is complex and context-dependent.


Section 3: Empirical work

A number of interesting avenues for further research emerged from the systematic literature review. The one that sparked my interest most was to explore the role of evaluation in how decisions are made about investment in management and leadership development, since this seems to be an under-researched area. There is a lot of research about different types of evaluation and about programme effectiveness, but not much about how evaluation is used in practice to inform the way an organisation thinks about commissioning and running management programmes.

Conducting empirical research in this area addresses one of Holton and Naquin’s (2005) suggestions for future research: “Researchers need to conduct naturalistic decision research to understand how HRD decisions are really made in organisations” (p.277). It also aims to test out whether Holton and Naquin are right about the limitations of existing evaluation models for decision-making, and whether, perhaps, ‘evaluation’ needs to be re-conceptualised as a decision-making process.

3.1 Methodology

I adopted a case study approach for data collection. The unit of analysis for each case was a single management or leadership development programme, ten in total. My two methods of data collection were interviews and document analysis.

A case study is deemed a suitable research methodology in situations ‘where a how or why question is being asked about a contemporary set of events over which the investigator has little or no control’ (Yin, 2009, p.13). This fitted well with the research question and context for this study. Whilst Yin (2009) is critical of the inadequacy of Schramm’s (1971) definition of a case study as trying to ‘illuminate a decision or set of decisions: why they were taken, how they were implemented and with what result’, this definition captured rather well my intention for this work.

Yin identifies three types of case: explanatory, descriptive and exploratory, each of which can include a range of data collection methods. My purpose is aligned most closely with the explanatory case, in that I am trying to explain how evaluation is used in organisational decision-making. There was also a substantial descriptive element, in terms of capturing basic information about a programme and the organisational context, but this served more to provide context than to form the main thrust of the research.

3.1.1 Selecting cases

I selected cases using Ashridge’s 2009/10 private sector tailored client list. I wanted to ensure that the cases were based on current or recent programmes to maximise the recall accuracy and level of detail provided by interviewees. Of the clients on the 2009 list, I looked for companies that had incurred a minimum spend of £50,000. I approached the Ashridge client directors of these companies and, through a process of elimination, either by the client director not being in a position to allow access, or the client themselves turning down the opportunity to be involved, I reached a total of 10 client organisations. This included three companies that had not been on my initial list but were proposed by a client director in conversation, one programme that was run by Cranfield rather than Ashridge (although the client in question also runs programmes with Ashridge), and one programme that had lapsed since 2008 but will be run again in 2010.

The client organisations included in the data collection process were: British Medical Journal (BMJ) Group, Barclaycard, Catlin Holdings, Danfoss, Japan Tobacco International


(JTI), Kuwait Petroleum International (KPI), Mott MacDonald, Novelis, OMV and Tetra Pak. It is likely that I will exclude the BMJ Group and KPI from my analysis (see 3.1.4 below).

3.1.2 Interviews

I conducted semi-structured, or non-standard, interviews (Healey and Rawlinson, 2004) of approximately 45 minutes with two or three key stakeholders per case, as indicated in Figure 2 below. I started with the key client with whom Ashridge has contact and, during this interview, asked for a suggestion of another key stakeholder to interview. Nominations were put forward for individuals such as a senior business sponsor of the programme, an HR colleague or a senior HR figure, if this was someone other than the key Ashridge client. I followed up all suggestions, with success in most cases. I complemented client interviews with an interview with the Ashridge client director for each case study programme.

In an ideal world I would have liked to have interviewed four to six individuals per case study, and developed fewer case studies, to get a rich multi-stakeholder perspective of each programme. However, I quickly realised that this level of access was not feasible. This was particularly so because the case study programmes were generally targeted towards senior management. Many of the key stakeholders were senior executives and board members, to whom access is very difficult to negotiate.

Figure 2. Interviewee roles for each case study: Ashridge programme director, programme commissioner, programme manager, business sponsor and senior HR. Typically three interviews were conducted, and some of the roles were held by the same person.

Interviews were conducted either face-to-face, when feasible, or by phone. The semi-structured nature of the interview allowed particularly interesting strands of inquiry to be pursued, whilst providing a template to ensure that at least a framework of similar information was collected for each case.

The interviews covered the full range from the initial conception and commissioning of a programme to its design, development, implementation, and subsequent evaluation. Particular emphasis was placed on evaluation, in terms of both formal and informal approaches. Both factual and subjective questions were posed. Factual questions included those about the programme design, duration, target population and what evaluation processes were in place. Questions that drew out more subjective


responses included those asking about whether or not the programme was a success, what and who influenced programme decisions, and what the likely factors would be in deciding to stop or postpone a programme.

All interviews were recorded and transcribed verbatim.

3.1.3 Document analysis

Documentary secondary data (Saunders et al., 2007) includes written data in the form of meeting notes, minutes and reports, and can also include non-written materials, such as recordings of meetings. Only written documents, and only a few key documents at that, were used in this work.

The purpose of analysing written documents was to provide background factual details about a programme in preparation for interviews, so as to appear professional to the interviewee, and also to provide confirmatory evidence of data provided by interviewees.

Documents reviewed include:

- Programme aims and objectives
- Programme timetable and design
- Evaluation reports, including description of the evaluation design, methodology and findings.

3.1.4 Data analysis

Twenty-three interviews were conducted. All have been transcribed verbatim and imported into NVivo 8 qualitative analysis software. It is likely that three of these interviews will be excluded from the final analysis, either because I was unable to get more than a single stakeholder perspective on a programme (one case: KPI) or because the client organisation turned out to have a very different profile, i.e. much smaller than all the other case study organisations (one case: BMJ Group).

First, I free-coded three interview transcripts, taken from three different case study organisations. I used the free codes to build a coding tree. I am currently in the process of working through and coding each interview transcript according to the tree, adapting and building on the initial framework as I go. At the time of writing, I have coded eight of the 20 interviews I intend to include in my final analysis. I have reached a point where I need to slightly revise my coding tree structure and will then continue coding the remaining 12 interviews.

My intention, once all coding is complete, is to cross-reference and query the data in a variety of ways. This is likely to include: by case, in order to build a picture of each case study; by stakeholder type, to see if client directors, programme managers and senior business/HR stakeholders have different attitudes toward evaluation; by decision type, to look at who is involved in which kinds of decisions about a programme and what sorts of information they use to make decisions; and by evaluation type, to see if certain approaches to evaluation appear to be more influential when it comes to decision-making, and how formal and informal evaluation co-exist.
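The coding and querying itself is being done in NVivo; purely as an illustration of the kind of cross-referencing described above, the sketch below shows the same idea in plain Python, with invented cases, codes and excerpts.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class CodedSegment:
    case: str         # case study organisation (anonymised here)
    stakeholder: str  # e.g. "client director", "programme manager", "senior HR"
    code: str         # node in the coding tree
    excerpt: str      # interview quote (invented for this example)

segments = [
    CodedSegment("Case A", "programme manager", "evaluation: happy sheets",
                 "we look at the end-of-module feedback scores"),
    CodedSegment("Case A", "senior HR", "decision: continue programme",
                 "the board signed it off again largely on word of mouth"),
    CodedSegment("Case B", "client director", "evaluation: informal feedback",
                 "the sponsor rings me after each cohort"),
]

# Query: which codes appear for each stakeholder type?
codes_by_stakeholder = defaultdict(set)
for segment in segments:
    codes_by_stakeholder[segment.stakeholder].add(segment.code)

for stakeholder, codes in sorted(codes_by_stakeholder.items()):
    print(f"{stakeholder}: {sorted(codes)}")
```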

I intend to complement the interview data with evidence about each programme and any evaluation thereof, particularly in providing a short description of each case study programme.


3.2 Findings

At the time of writing it is too early to discuss any findings from my empirical data collection. However, by the time of the InCULT conference I hope to have completed coding and progressed my analysis to a point where I can draw some tentative conclusions about the role of evaluation in how decisions are made about management and leadership development programmes.

I will be interested in the audience’s reaction to my work, and I look forward to constructive criticism and suggestions for further work.


References

Alliger, G. M. and Janak, E. A. (1989), "Kirkpatrick's Levels of Training Criteria: Thirty Years Later", Personnel Psychology, vol. 42, no. 2, pp. 331-343.

Alliger, G., Tannenbaum, S. I., Bennett, J. W., Traver, H. and Shotland, A. (1997), "A meta-analysis of the relations among training criteria", Personnel Psychology, vol. 50, no. 2, pp. 341-358.

Arthur Jr, W., Bennett Jr., W., Edens, P. S. and Bell, S. T. (2003), "Effectiveness of training in organizations: A meta-analysis of design and evaluation features", Journal of Applied Psychology, vol. 88, no. 2, pp. 234-245.

Atkinson, R. (1972), "Ingredients for a theory of instruction", American Psychologist, vol. 27, pp. 927-931.

Baldwin, T. T. and Ford, J. K. (1988), "Transfer of Training: A Review and Directions for Future Research", Personnel Psychology, vol. 41, no. 1, pp. 63-106.

Beach, L. R. and Connolly, T. (2005), The Psychology of Decision Making, Second edition, Sage, Thousand Oaks, California.

Bramley, P. (1999), "Evaluating effective management learning", Journal of European Industrial Training, vol. 23, no. 3, pp. 145-153.

Brungardt, C. (1996), "The making of leaders: a review of the research in leadership development and leadership education", The Journal of Leadership Studies, vol. 3, pp. 81-95.

Burgoyne, J. G. and Singh, R. (1977), "Evaluation of Training and Education", Journal of European Industrial Training, vol. 1, no. 1, pp. 17-21.

Carlson, K. D. and Schmidt, F. L. (1999), "Impact of experimental design on effect size: Findings from the research literature on training", Journal of Applied Psychology, vol. 84, no. 6, pp. 851-862.

Catanello, R. and Kirkpatrick, D. L. (1968), "Evaluating training programs - the state of the art", Training & Development, vol. 22, no. 5, pp. 2-9.

Collins, D. (2002), "Performance-Level Evaluation Methods Used in Management Development Studies From 1986 to 2000", Human Resource Development Review, vol. 1, no. 1, pp. 91-110.

Collins, D. B. and Holton, E. F. (2004), "The Effectiveness of Managerial Leadership Development Programs: A Meta-Analysis of Studies from 1982 to 2001", Human Resource Development Quarterly, vol. 15, no. 2, pp. 217-248.

Collins, J. and Denyer, D. (2008), "Leadership, learning and development: A framework for evaluation", in Turnbull James, K. and Collins, J. (eds.) Leadership Learning: Knowledge into Action, Palgrave, London.

Denyer, D., Tranfield, D. and Ernst van Aken, J. (2008), "Developing Design Propositions through Research Synthesis", Organization Studies, vol. 29, pp. 393-413.

D'Netto, B., Bakas, F. and Bordia, P. (2008), "Predictors of management development effectiveness: an Australian perspective", International Journal of Training & Development, vol. 12, no. 1, pp. 2-23.

Doh, J. P. and Stumpf, S. A. (2007), "Executive Education: A View From the Top", Academy of Management Learning & Education, vol. 6, no. 3, pp. 388-400.

Dovey, K. (2002), "Leadership development in a South African health service", The International Journal of Public Sector Management, vol. 15, no. 6/7, pp. 520-533.

Easterby-Smith, M. (1986), Evaluation of Management Education, Training and Development, Gower, Aldershot.


Farrington, B. (2003), "Action-centred learning", Industrial and Commercial Training, vol. 35, no. 2/3, pp. 112-118.

Gibb, S. (1994), "Evaluating mentoring", Education & Training, vol. 36, no. 5, pp. 32-39.

Goldstein, I. (1986), Training in Organizations: Needs Assessment, Development and Evaluation, Brooks/Cole, Monterey, CA.

Hamblin, A. (1974), Evaluation and Control of Training, McGraw-Hill, New York.

Hannum, K. (2004), "Best practices: Choosing the right methods for evaluation", Leadership in Action, vol. 23, no. 6, pp. 15-20.

Hannum, K. and Martineau, J. (2008), Evaluating the Impact of Leadership Development, Pfeiffer, San Francisco.

Harper, J. and Beacham, T. (1991), Training Performance Measurement: A Guide to Some Practical Approaches, NHS Training Authority, Bristol.

Hayes, J. (2007), "Evaluating a Leadership Development Program", Organization Development Journal, vol. 25, no. 4, pp. 89-95.

Healey, M. J. and Rawlinson, M. B. (2004), "Interviewing techniques in business and management research", in Wass, V. J. and Wells, P. E. (eds.) Principles and Practice in Business and Management Research, Dartmouth, Aldershot, pp. 123-146.

Herling, R. (2000), "Operational Definitions of Expertise and Competence", Advances in Developing Human Resources, vol. 2, no. 1, pp. 8-21.

Hogan, R. and Warrenfeltz, R. (2003), "Educating the modern manager", Academy of Management Learning & Education, vol. 2, no. 1, pp. 74-84.

Holton, E. (1996), "The flawed four-level evaluation model", Human Resource Development Quarterly, vol. 7, no. 1, pp. 5-21.

Holton, E. and Naquin, S. (2005), "A critical analysis of HRD evaluation models from a decision-making perspective", Human Resource Development Quarterly, vol. 16, no. 2, pp. 257-280.

Kearns, P. (2005), Evaluating the ROI from Learning, Chartered Institute of Personnel and Development, London.

Kirkpatrick, D. L. (1998), Evaluating Training Programs: The Four Levels, Berrett-Koehler, San Francisco.

Kirkpatrick, D. L. (1978), "Evaluating in-house training programs", Training & Development, vol. 32, no. 9, pp. 6-9.

Kirkpatrick, D. L. (1960a), "Techniques for evaluating training programs: Part 3 - Behavior", Journal of ASTD, vol. 14, no. 1, pp. 13-18.

Kirkpatrick, D. L. (1960b), "Techniques for evaluating training programs: Part 4 - Results", Journal of ASTD, vol. 14, no. 2, pp. 28-32.

Kirkpatrick, D. L. (1959a), "Techniques for evaluating training programs", Journal of ASTD, vol. 13, no. 11, pp. 3-9.

Kirkpatrick, D. L. (1959b), "Techniques for evaluating training programs: Part 2 - Learning", Journal of ASTD, vol. 13, no. 12, pp. 21-26.

Kraiger, K., Ford, J. K. and Salas, E. (1993), "Application of cognitive, skill-based, and affective theories of learning outcomes to new methods of training evaluation", Journal of Applied Psychology, vol. 78, no. 2, pp. 311-328.


Lippert, R. L. (2001), "Whither Executive Education?", Business & Economic Review, vol. 47, no. 3, pp. 3-9.

McCauley, C., Moxley, R. and Van Velsor, E. (1998), The Center for Creative Leadership Handbook of Leadership Development, Jossey-Bass, San Francisco.

Michalski, G. and Cousins, B. (2001), "Multiple perspectives on training evaluation: probing stakeholder perceptions in a global network development firm", American Journal of Evaluation, vol. 22, no. 1, pp. 37-53.

Newstrom, J. (1984), "A role-taker/time-differentiated integration of transfer strategies", August 1984, Toronto, Ontario.

Pawson, R. and Tilley, N. (1994), "What works in evaluation research?", British Journal of Criminology, vol. 34, pp. 291-306.

Phillips, J. (1995), "Return on Investment - Beyond the four levels", in Holton, E. (ed.), Academy of HRD.

Preskill, H. and Torres, R. (1999), Evaluative Inquiry for Learning in Organizations, Sage, Thousand Oaks, California.

Reade, Q. and Thomas, D. (2004), Critics question value of leadership training.

Rosti, R. T. and Shipper, F. (1998), "A study of the impact of training in a management development program based on 360 feedback", Journal of Managerial Psychology, vol. 13, no. 1/2, pp. 77-89.

Russon, C. and Reinelt, C. (2004), "The Results of an Evaluation Scan of 55 Leadership Development Programs", Journal of Leadership & Organizational Studies, vol. 10, no. 3, pp. 104-107.

Sackett, P. R. and Mullen, E. J. (1993), "Beyond formal experimental design: Towards an expanded view", Personnel Psychology, vol. 46, no. 3, pp. 613-628.

Saunders, M., Lewis, P. and Thornhill, A. (2007), Research Methods for Business Students, Fourth edition, FT Prentice Hall, Harlow.

Schramm, W. (1971), "Notes on case studies of instructional media projects", December 1971, Washington.

Sirianni, P. M. and Frey, B. A. (2001), "Changing a Culture: Evaluation of a Leadership Development Program at Mellon Financial Services", International Journal of Training & Development, vol. 5, no. 4, pp. 290-301.

Stufflebeam, D. (1989), "The CIPP model for program evaluation", in Scriven, M. and Stufflebeam, D. (eds.) Evaluation Models, Kluwer-Nijhoff, Boston.

Suutari, V. and Viitala, R. (2008), "Management development of senior executives", Personnel Review, vol. 37, no. 4, pp. 375-392.

Talbot, C. (1992), "Evaluation and Validation: A Mixed Approach", Journal of European Industrial Training, vol. 16, no. 5, pp. 26-32.

Tannenbaum, S. I. and Woods, S. B. (1992), "Determining a Strategy for Evaluating Training: Operating Within Organizational Constraints", Human Resource Planning, vol. 15, no. 2, pp. 63-81.

Tanton, M. and Fox, S. (1987), "The Evaluation of Management Education and Development: Participant Satisfaction and Ethnographic Methodology", Personnel Review, vol. 16, no. 4, pp. 33-40.

Tushman, M. L., O'Reilly, C. A., Fenollosa, A., Kleinbaum, A. M. and McGrath, D. (2007), "Relevance and Rigor: Executive Education as a Lever in Shaping Practice and Research", Academy of Management Learning & Education, vol. 6, no. 3, pp. 345-362.

Verlander, E. G. (1992), "Executive Education for Managing Complex Organizational Learning", Human Resource Planning, vol. 15, no. 2, pp. 1-18.


Vicere, A. A. (1996), "Executive Education: The Leading Edge", Organizational Dynamics, vol. 25, no. 2, pp. 67-81.

Warr, P., Bird, M. and Rackham, N. (1970), Evaluation of Management Training, Gower, Aldershot.

Yin, R. K. (2009), Case Study Research, Fourth edition, Sage Publications, Thousand Oaks, CA.
