Using Regression in Customer Experience Analysis

We draw upon our experience to examine the following key topics within the context of customer experience analysis:

• How regression analysis works;

• What confidence intervals are;

• How regression analysis should be interpreted;

• How to interpret changes over time.

White Paper written by John Bee, Director, White Space Insight
June 2011




Introduction

Regression analysis is used as standard by White Space in most customer experience and consumer strategy projects. When used properly, it can help predict future trends or identify key drivers of customer behaviour, but if used incorrectly it can lead to dangerously inaccurate findings and recommendations.

This white paper explores some of the key themes and challenges consultants and their clients need to address when using and interpreting regression analysis.

In particular, we draw upon our experience to examine the following key topics:

• How regression analysis works;

• What confidence intervals are;

• How regression analysis should be interpreted;

• How to interpret changes over time.

1. How Does Regression Work?

The main aim of using regression analysis in customer experience work is to estimate which factors affect overall satisfaction or spend. This will allow a company to focus their resources on the key drivers of satisfaction/spend, and not waste these resources on areas which might seem important but in fact are not.

Regression works by calculating a line of best fit between survey respondents’ answers to one or more ‘input’ questions and a single ‘output’ question. By using this line of best fit, you can then predict what any given respondent’s response to the ‘output’ question will be, based on the answers they give to the ‘input’ question(s).

As with most statistics (including the most basic measures such as averages), regression is only ever able to provide an estimation of the real relationships which exist within a population. It is therefore subject to a degree of error, with results subject to revision as more data becomes available.

Regression’s accuracy depends on several factors, including:

• The sample size: larger sample size = more accurate regression

• The sample structure: tighter sample structure = more accurate regression

• The level of natural spread in the data (i.e. how closely the data points are scattered around the line of best fit): less spread = more accurate regression
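The effect of sample size and spread on accuracy can be sketched numerically. The example below uses the textbook formula for the standard error of the slope in a one-input regression; the function name and all figures are hypothetical and chosen only to illustrate the pattern, and sample structure is not modelled here.

```python
import math


def slope_standard_error(residual_sd: float, input_sd: float, n: int) -> float:
    """Standard error of the slope in a one-input regression.

    Textbook formula: SE = residual_sd / (input_sd * sqrt(n - 1)), where
    residual_sd measures the spread of points around the line of best fit
    and input_sd is the standard deviation of the 'input' scores.
    """
    if n < 3:
        raise ValueError("need at least 3 respondents")
    return residual_sd / (input_sd * math.sqrt(n - 1))


# Hypothetical survey figures, not taken from any real project.
base = slope_standard_error(residual_sd=1.5, input_sd=2.0, n=101)
more_respondents = slope_standard_error(residual_sd=1.5, input_sd=2.0, n=401)
less_spread = slope_standard_error(residual_sd=0.75, input_sd=2.0, n=101)

print(f"n=101, spread 1.5:  SE = {base:.4f}")
print(f"n=401, spread 1.5:  SE = {more_respondents:.4f}")
print(f"n=101, spread 0.75: SE = {less_spread:.4f}")
```

With these figures, roughly quadrupling the sample halves the standard error, as does halving the spread around the line.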


Regression Worked Example

For example, the ‘output’ measure might be ‘Overall satisfaction’, and the ‘input’ factors might be ‘Satisfaction with product’ and ‘Satisfaction with service’. Taking the example of just one ‘input’ factor, ‘Satisfaction with product’, we can compare how customers rate satisfaction with product versus satisfaction overall by plotting all survey responses:

As can be seen in the diagram above, there is a possible pattern in the data, with a higher score in ‘Satisfaction with product’ coinciding with a higher score in ‘Overall satisfaction’. Using regression we can fit a line of best fit to this data to help us predict how an increase in ‘Satisfaction with product’ might affect ‘Overall satisfaction’.

The gradient (slope) of this line tells us how much of an increase in ‘Overall satisfaction’ there is, on average, if ‘Satisfaction with product’ goes up by one unit. This value is called the regression coefficient. In the case above, the value is 0.69, which means that for an increase in ‘Satisfaction with product’ of 1, on average there will be an increase in ‘Overall satisfaction’ of 0.69.
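As a minimal sketch of how a line of best fit and its gradient are obtained, the example below applies ordinary least squares to a handful of made-up scores. The data and the names `fit_line` and `predict` are hypothetical; this is not the survey data behind the 0.69 coefficient quoted above, so the fitted gradient will differ.

```python
def fit_line(xs, ys):
    """Ordinary least squares line of best fit: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of input and output, and variation of the input alone.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up scores on a 1-10 scale (illustrative only).
product = [2, 3, 4, 5, 5, 6, 7, 8, 9, 10]
overall = [3, 4, 4, 5, 6, 6, 6, 8, 8, 9]

intercept, slope = fit_line(product, overall)


def predict(product_score):
    """Predicted 'Overall satisfaction' for a given product score."""
    return intercept + slope * product_score


print(f"regression coefficient (gradient): {slope:.2f}")
print(f"predicted overall score for product score 7: {predict(7):.2f}")
```

Whatever the data, `predict(x + 1) - predict(x)` equals the gradient, which is the "one unit up" reading of the regression coefficient described above.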


Testing for Significance

In addition to creating a best fit model, the regression process also calculates whether this line is statistically significant. This is important because you can plot a line of best fit through any randomly plotted points on a graph; the key is to know whether they have a meaningful pattern, or whether the variables are actually linked.

Significance tests are performed using the differences between the regression model’s predicted values (the line) and the actual values. If these distances are too large relative to the fitted trend, the model or its coefficients will be judged not significant and rejected. In the model above, the model and the coefficient were judged to be statistically significant at a 95% level of confidence. This means that, if ‘Satisfaction with product’ and ‘Overall satisfaction’ were in fact unrelated, a pattern this strong would arise by chance in fewer than 1 sample in 20 (i.e. it is very likely that the two variables are genuinely linked, although the test on its own cannot show that a change in one causes a change in the other).
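One way to see what "could this pattern have arisen by chance?" means is a permutation test, sketched below. This is a swapped-in illustration, not the test that statistical packages normally report for regression (they typically use t- or F-tests); the data, function names and number of shuffles are all hypothetical.

```python
import random


def slope(xs, ys):
    """Gradient of the least squares line of best fit."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def permutation_p_value(xs, ys, n_shuffles=2000, seed=0):
    """Share of random re-pairings whose gradient is at least as steep as observed."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(slope(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # break any real link between input and output
        if abs(slope(xs, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_shuffles + 1)


# Made-up scores on a 1-10 scale (illustrative only).
product = [2, 3, 4, 5, 5, 6, 7, 8, 9, 10]
overall = [3, 4, 4, 5, 6, 6, 6, 8, 8, 9]

p_value = permutation_p_value(product, overall)
print("significant at the 95% level" if p_value < 0.05 else "not significant")
```

A p-value below 0.05 corresponds to the "fewer than 1 in 20" threshold used for a 95% level of confidence.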

Multiple Regression

Although interesting, regression on a single factor is of limited use in a real world situation in which multiple factors influence behaviour. The true power of regression becomes apparent when the influence of more than one factor is assessed, e.g. satisfaction with product, service, customer services etc.

Using multiple regression techniques it is possible to model how a large number of factors affect overall satisfaction and their relative influences. It is also possible to identify those factors that do not have a statistically significant effect and remove them.
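The sketch below shows, under simplifying assumptions, how multiple regression estimates a separate coefficient for each input at once, by solving the least squares normal equations. The respondents are synthetic and the overall scores are built from a known, arbitrary rule (1.0 + 0.5 × product + 0.25 × service) so that the recovered coefficients can be checked by eye; `solve` and `fit_multiple` are hypothetical helper names.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_multiple(rows, ys):
    """Least squares fit of ys on several inputs: [intercept, b1, b2, ...]."""
    design = [[1.0] + list(row) for row in rows]  # leading 1.0 gives the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic respondents: (product score, service score).
inputs = [(2, 5), (4, 3), (5, 8), (6, 4), (7, 9), (8, 6), (9, 2), (10, 7)]
# Overall scores generated from a known rule, with no noise added.
overall = [1.0 + 0.5 * p + 0.25 * s for p, s in inputs]

intercept, b_product, b_service = fit_multiple(inputs, overall)
print(f"product: {b_product:.2f}, service: {b_service:.2f}")
```

Real survey data contains noise, so in practice each coefficient comes with a standard error and a significance test rather than being recovered exactly as it is here.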

2. What are Confidence Intervals?

Without an extremely large sample size, regression analysis is only ever able to approximately estimate to what extent specified ‘input’ variables affect an ‘output’ variable. In reality, the true scale of this effect may be bigger or smaller. It is therefore useful to know what the likely maximum and minimum scale of the effect might be (i.e. the largest and smallest values the regression coefficients are likely to have).

To do this, regression analysis can provide a confidence interval, expressed as a maximum value and a minimum value, which, in 95% of cases, will contain the true value of the coefficient. This might look as follows, assessing the impact of satisfaction with product and service on overall satisfaction:

The conclusions we would draw from this chart are as follows:

• The most likely true value of the coefficient for ‘Product’ is 0.28. To a 95% level of certainty, the true value lies between 0.20 and 0.36

• The most likely true value of the coefficient for ‘Service’ is 0.21. To a 95% level of certainty, the true value lies between 0.12 and 0.30

• ‘Satisfaction With Product’ and ‘Satisfaction With Service’ are both highly likely to be significant drivers of customer satisfaction

• It is most likely that ‘Satisfaction With Product’ is a more important cause than ‘Satisfaction With Service’. However, because the confidence intervals overlap, we cannot say this to a high level of certainty: there is a good chance that, if the survey were repeated with a different sample, the values (and their order) would be different. Even so, 19 times out of 20 (95% of the time), intervals calculated in this way would contain the true values
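A 95% confidence interval of this kind is commonly approximated as the estimated coefficient plus or minus 1.96 standard errors. The sketch below applies that approximation to the two coefficients quoted above. The standard errors are assumptions, back-calculated so that the intervals match the quoted ranges; they do not come from the original analysis.

```python
def confidence_interval(estimate, standard_error, z=1.96):
    """Approximate 95% interval: estimate plus or minus z standard errors."""
    margin = z * standard_error
    return estimate - margin, estimate + margin


# Coefficients as quoted above; standard errors are ASSUMED for illustration.
drivers = {"Product": (0.28, 0.0408), "Service": (0.21, 0.0459)}

for name, (coef, se) in drivers.items():
    low, high = confidence_interval(coef, se)
    # An interval that excludes zero indicates a significant driver.
    excludes_zero = low > 0 or high < 0
    print(f"{name}: {coef:.2f} [{low:.2f}, {high:.2f}] significant={excludes_zero}")
```

Both intervals sit entirely above zero, which is what supports reading both factors as significant drivers.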


What determines the width of a confidence interval, and how should confidence intervals be used?

If the sample size is small, the confidence interval will usually be wide, as there is little data on which to base calculations. If the sample size is big, the confidence interval will usually be narrower, as there is more data to work with, to make a more accurate prediction.

This does not mean that it is not useful (and usual) to report the single estimated coefficient values. On the basis of the sample data available, this is the best estimate possible of what the true value is. This is therefore the value White Space usually report to clients.

However, it can sometimes be useful to also consider and report the confidence interval. This is for 2 reasons:

1) It provides an indication of the highest and lowest values the true value is likely to take

2) It provides a useful way of working out whether the differences between two different regression coefficients should be considered significant. This is useful when either comparing two different regression coefficients in the same survey wave, or when comparing old and new versions of the same regression coefficient in different survey waves

When comparing regression coefficients, if the confidence intervals of two regression coefficients overlap, there is a possibility that the true values may actually be the same. When confidence intervals overlap, White Space therefore generally consider both factors to be of similar importance, whilst noting that the factor with the higher value is still more likely to be the more important.
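The overlap rule described above can be sketched as a simple check on two intervals. The intervals used are the ‘Product’ and ‘Service’ ranges from the earlier example; the function name and the wording of the verdicts are hypothetical.

```python
def intervals_overlap(a, b):
    """True if two (low, high) confidence intervals share any values."""
    return a[0] <= b[1] and b[0] <= a[1]


product_ci = (0.20, 0.36)  # 'Product' interval from the earlier example
service_ci = (0.12, 0.30)  # 'Service' interval from the earlier example

if intervals_overlap(product_ci, service_ci):
    verdict = "similar importance (ranking not certain)"
else:
    verdict = "clearly different importance"
print(verdict)
```

Treating overlap as "similar importance" is a cautious rule of thumb: it errs on the side of not declaring a difference.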

3. How Should Regression Analysis Be Interpreted?

Regression analysis, though a very useful tool, can be difficult to interpret partly due to the complexity of the outputs, and partly due to the probabilistic reasoning upon which it is based. Making a regression relevant to a business requires understanding both its strengths and limitations. The key points to interpreting the results of a regression are listed below.

1. The results must make practical sense

As with all analysis, the results of a regression should be sense checked against the reality of a business. For example, it is possible for a regression to show a factor as significant, because it is very important for a small proportion of the customer base, but does not affect the rest of the customer base. In this case, it makes little business sense to make such a factor a business priority even if it is the most important factor in the regression. Alternatively, regression may indicate that a certain factor is very important, but the client may have no practical ability to change in this area. This would reduce the overall significance of the result, and the extent to which we would draw upon it in decision making.

2. Confidence intervals should be considered, as well as actual coefficients

The actual coefficient provides a very useful precise number, which can be used as a basis for modelling and prediction. However, it should be considered as part of its confidence interval (even if, ultimately, only the actual coefficient is reported). This allows us to assess the maximum and minimum effect a given ‘input’ factor is likely to have on the ‘output’ measure, and provides a solid basis to compare movement of coefficients over time.

It also means that it is often difficult to identify a single most important priority, as overlapping confidence intervals mean that two factors could be interpreted as being of similar importance (even if the actual coefficient values are clearly different).

3. When comparing regression analyses over time, what remains constant is as important as what changes

When looking at a series of regression analyses of the same customer base, it is key from a business perspective to identify what remains constant, to gain increased certainty that this is (or is not) truly a key driver of satisfaction. Repeated independent surveys with regression analysis are a good way to validate choice of business priorities from a previous wave. There will often be changes in a regression from one wave to the next, but often the most significant factors do not change. If there is a change in the most significant factors (beyond the boundaries of the confidence intervals) further analysis should be performed to try to identify the reason why.

Changes will naturally occur from one wave to the next due to a variety of factors. Some of these may be ‘natural’ and not indicate a change in customer needs (such as variation in sample), some may be due to temporary changes in customer needs (e.g. seasonality) and some may be due to longer term changes in customer needs (e.g. unemployment increasing price sensitivity). However, the key is in identifying significant changes, outside of the confidence intervals, that have real business relevance.

4. How Should Changes Over Time Be Interpreted?

Because of the complex nature of regression analysis, requiring straight lines to be fitted to often highly spread data points, the amount of natural variation in coefficients can be quite large over time (much more so than for means, where the maths involved is much simpler).

However, coefficients are unlikely to change to be smaller or larger than the boundaries of their confidence intervals, unless the underlying causes of satisfaction have changed. The only other potential cause of this would be a change in sampling method and/or sample structure (which would normally lead to a change in coefficients because the nature of the survey data would be fundamentally different).

White Space therefore expect to see a level of change in coefficients over time naturally, due to the nature of sampling and the complexity of the analysis involved. However, we usually only pay close attention to these changes when they are beyond the boundaries of the confidence intervals.
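The approach to wave-on-wave changes described above can be sketched as a filter that flags only those factors whose new coefficient falls outside the previous wave’s confidence interval. The factor names, intervals and coefficients below are hypothetical, as is the function name `flag_changes`.

```python
def flag_changes(previous_wave, current_wave):
    """Return factors whose new coefficient falls outside the previous interval.

    previous_wave maps factor -> (low, high) confidence interval.
    current_wave maps factor -> newly estimated coefficient.
    """
    flagged = []
    for factor, (low, high) in previous_wave.items():
        new_value = current_wave.get(factor)
        if new_value is not None and not (low <= new_value <= high):
            flagged.append(factor)
    return flagged


# Hypothetical figures for two survey waves.
wave_1_intervals = {
    "Product": (0.20, 0.36),
    "Service": (0.12, 0.30),
    "Price": (0.05, 0.15),
}
wave_2_coefficients = {"Product": 0.31, "Service": 0.18, "Price": 0.24}

print(flag_changes(wave_1_intervals, wave_2_coefficients))
```

Here only ‘Price’ is flagged; the movements in ‘Product’ and ‘Service’ stay inside their earlier intervals and would be treated as natural variation.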

5. Conclusion: A Best Practice Approach

This white paper has demonstrated that regression analysis is both extremely powerful and complex to set up and interpret accurately. This is in contrast to how easy it is to run the actual test procedure on most software packages such as SPSS. This creates a serious risk of it being run incorrectly by people who can run the test procedure but don’t know how to set it up properly. In turn, this can lead to false conclusions which seem highly compelling at a surface level, but are actually highly dangerous.

In order to avoid this pitfall, consultants and clients should ask the following 5 key questions when using or interpreting regression:

1) What process has been used to run the regression, from data preparation to development of the final model?

2) What are the confidence intervals for the coefficients in the final model?

3) Is the original sample size and structure statistically valid?

4) Does the person running the regression analysis understand the statistical principles involved?

5) Are the final outputs clearly presented and explained so that senior decision makers with no background in statistics can understand/use them?

In summary, it is critically important to ensure regression analysis is set up, run and interpreted correctly, based around these key questions. When this is done, the full potential of regression to drive improvement of the customer experience can be safely unlocked.

For further information on the themes contained within this white paper, please contact

John Bee, Director [email protected]

About White Space

White Space is one of the UK’s leading providers of research consultancy services, bridging the gap between traditional market research and management consultancy. We use cutting edge research methods and consultancy tools to help blue chip clients find answers to strategic questions. Our clients include many of the world’s biggest companies, global private equity houses, top tier strategy consultancies and smaller organisations ambitious to succeed. Over 85% of them repeat purchase. To find out why, go to www.whitespaceinsight.com