Splunk ITSI Sandbox Guidebook

START HERE
1 - Fly Over the Product
2 - Prepare for the Journey: Core Concepts
3 - Tour the Glass Tables
4 - Troubleshoot with Glass Tables and Deep Dives
5 - Tour the Deep Dive
6 - Tour the Notable Event Viewer
7 - Tour the Service Analyzer
Document Revision History

Date         Notes
2015 Nov 29  Initial version (dmillis)
2015 Dec 03  Filled out first 4 chapters (dmillis)
2015 Dec 07  Completed "tour" chapters (jlebaugh, dmillis)
2016 Mar 15  Updates for ITSI 2.1
START HERE

Welcome to the ITSI Sandbox Playbook! It is intended as a travel guide to help you explore the features, capabilities and possibilities of IT Service Intelligence, using your new Splunk ITSI Online Sandbox. If you do not already have an ITSI Sandbox, go to the ITSI Homepage (http://www.splunk.com/en_us/products/premium-solutions/it-service-intelligence.html) and click the green "Free Online Sandbox" button. It only takes a few minutes!

The playbook contains a series of chapters, or exercises, to facilitate the exploration of ITSI and illustrate how it could be useful in actual "real world" environments. The student should already have a basic understanding of core Splunk, especially how to create searches and reports. This playbook should not be considered "real training"; please see Splunk Education (http://www.splunk.com/view/education/SP-CAAAAH9) for in-depth courses on ITSI and other topics.

"Fly-Over" and "Tour" chapters show features and capabilities, in less detail and more detail, respectively. "Dive in" chapters go into the most detail about how to set up and configure. Other chapters cover how to create new components, how to use ITSI to troubleshoot problems quickly, and how to mock up visualizations for your own high-value services.

Although the ITSI Sandbox is not set up to allow outside machine data to be brought in, it does contain an event generator to simulate the events which might be seen in a typical IT environment, including failure scenarios. It also contains a number of pre-built KPIs, services, Glass Tables and other goodies to make the journey more interesting.

Generally, the chapters are laid out with the more basic concepts and exercises first, and more advanced topics later. Students can skip chapters and jump around as they care to; each chapter lists the recommended pre-requisite chapters.

Ultimately, the purpose of this playbook is to allow students to work with and understand the full capabilities of IT Service Intelligence, and explore how ITSI could help solve actual, useful, high-value challenges in their own IT environments.
1 - Fly Over the Product

For the traveller who is in a hurry, who wants the 30,000-foot view, this is the section for you! It is also the best place to begin, for the student who is largely unfamiliar with IT Service Intelligence.

Instructions

1. After logging into Splunk, click on "Product Tour"
2. Click through the 11 slides to preview services, entities, KPIs, thresholding, Deep Dives, Multi-KPI Alerts, Notable Events and the Service Analyzer
3. These topics, and more, are covered in more detail in the following chapters
2 - Prepare for the Journey: Core Concepts

Before we begin the journey, it is helpful to understand a few core concepts of IT Service Intelligence.
IT Service Intelligence – Core Concepts

[Diagram: services such as Web, Mobile API/Middleware, DNS, Support Desk and Customer Transactions, grouped into Technical Services and Business Services, each receiving requests and returning responses]

Conceptually, a Service is a "black box" to which we send requests and from which we expect responses. This includes both lower-level (technical) and higher-level (business) services.
[Diagram: the Web service spanning several IT domains (Packet Network, Hypervisor and Hosts, RBMDBs, Storage Tier, API Services, Web Services), shown alongside Customer Transactions, Mobile API/Middleware, Support Desk and DNS]

In ITSI, a Service is a logical group of KPIs which describe its health and status; they typically span multiple IT domains.
[Diagram: the Web service and its IT domains (Packet Network, Hypervisor and Hosts, RBMDBs, Storage Tier, API Services, Web Services), with example KPIs and a Health Score]

Example KPIs:

• Number of requests
• Error rate
• Average response time
• Server CPU load
• Server network I/F errors

KPIs include both metric values and health scores of the values. A Service's Composite Health score is determined by the health scores of its KPIs and dependent services.
A Health Score is a score from 0-100 (0 being critical and 100 being normal) that helps determine the health of a Service. It is calculated once every minute, based on the importance and status (e.g. green, orange, red) of all of the Service's KPIs.

A Key Performance Indicator (KPI) is a Splunk saved search, created within the ITSI UI, that helps monitor a specific field like CPU, Memory, Number of Errors and so on. KPIs are contained within Services.

Service Analyzer – Auto-generated, filterable and tiled view of Service health scores and KPIs
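
The exact formula ITSI uses for the Health Score is not given here. Purely as an illustration of how KPI importance and status could combine into a 0-100 score, the idea can be sketched as an importance-weighted average. The status-to-score mapping, the importance values and the function name below are all assumptions made for this example, not ITSI's actual calculation.

```python
# Hypothetical mapping from KPI status color to a 0-100 score (assumed values).
STATUS_SCORE = {"green": 100, "orange": 50, "red": 0}


def composite_health(kpis):
    """Return an importance-weighted average score in the range 0-100.

    kpis: list of (status, importance) pairs, with importance > 0.
    """
    total_weight = sum(importance for _, importance in kpis)
    if total_weight == 0:
        raise ValueError("at least one KPI with non-zero importance is required")
    weighted = sum(STATUS_SCORE[status] * importance for status, importance in kpis)
    return weighted / total_weight


# Three KPIs: a healthy high-importance one, a degraded one, and a failing one.
example = [("green", 5), ("orange", 3), ("red", 2)]
print(composite_health(example))  # 65.0
```

In this sketch a single red KPI with low importance lowers the score only slightly, while the same status on a high-importance KPI would lower it much more.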
A Glass Table is a customizable, freeform drawing dashboard for viewing Health Scores and KPIs of choice, with visual tools to create context with live widgets. From a Glass Table you can go deeper to a Deep Dive view.
Deep Dive – Swimlane analysis dashboard to show KPI indicators over time for investigations
Multi-KPI Alerts – Visual tool to create correlation searches based on KPIs
Notable Events

Notable Events are generated by correlation searches that indicate service degradation. They are like Notable Events in ES but have a slightly different field set. The correlation searches are generated either through the correlation search UI or the Multi-KPI Alert UI.
ITSI represents a new way of dealing with IT Service challenges:

• Data-driven approach uses ALL IT data: events, metrics, logs, structured, unstructured, from-the-device, from-the-wire, etc.
• Service-awareness provides actionable insights into high-visibility services
• Customized contextual visualizations can be tailored for any person or group: highly technical to business-oriented
• Mitigate problems before they impact customers
3 - Tour the Glass Tables

Glass Tables are a new type of dashboard which allow ITSI services, KPIs and health scores to be visualized in highly customizable ways. Glass Tables can be tailored to show very detailed technical views, or higher-level business views with customer/revenue-relevant metrics. From the technical "soldiers in the trenches" to executive management, Glass Tables can be crafted to show services, service relationships, transaction flows, health scores, key business metrics and other content which is relevant to the users. And they're a lot of fun to build, too! This section shows a number of example Glass Tables.

Instructions

1. Navigate to the Glass Table list by clicking on 'Glass Tables' in the top menu bar
2. From the list of Glass Tables, click on a Title to view that Glass Table
3. Select "Business Status (Medium)"

   This Glass Table shows the "bottom line" status of an online store, including overall health, revenue and checkouts metrics. It could be used by service owners, executive management or others who need to quickly understand the "big picture".

4. Select "OnLine Transaction Service"

   This Glass Table shows a detailed view of a customer-facing service, including transaction flow, component relationships and dependencies, and critical health scores and metrics of key service points along the way. It makes excellent use of a pre-existing drawing, with live ITSI "widgets" placed strategically on top. This Glass Table would be helpful for NOC, Tier 1 & 2 and similar support personnel who need to understand the complex relationships of all the service components supporting an important business service.

5. Select "End to End Health (Medium)"

   This Glass Table shows a streamlined view of a customer-facing service -- the "online store" summarized in the "Business Status" Glass Table. This view provides more detail of the underlying technical services, their dependencies, and the overall transaction flow. It uses native Glass Table drawing tools, as well as service & KPI widgets which display health & metric values live (updating over time). These widgets have configurable drill-down capabilities, including the ability to navigate to other, even-more-detailed Glass Tables. For example, if you click on the widget next to "Web Tier", you will navigate to...

6. "Web Tier (Medium)"

   This Glass Table represents a more detailed visualization of the KPIs, overall Web Tier health score, and the health score of its dependent service, "Middleware". Such Glass Tables allow technical personnel to quickly troubleshoot problems by being able to drill down to the detailed technical metrics which matter.

7. Select "End to End Health (Medium)" (again)

   Several drill-down options are available when a widget is clicked. Click on the widget next to "Database"; this will navigate to a Deep Dive.
Glass Tables allow services, dependencies, health scores, KPIs and other critical information to be visualized in a contextual way that is truly meaningful to the targeted audience. This allows users to quickly size up service delivery health and, when necessary, efficiently isolate problems.
4 - Troubleshoot with Glass Tables and Deep Dives

This section describes a possible problem scenario, and how ITSI could be used to efficiently troubleshoot to find root cause. This would typically be driven by a NOC or Tier 1/2 support person. We're going to "set up" the failure scenario and first see how Glass Tables can accelerate the troubleshooting process, then continue isolating root cause with Deep Dives.

Pre-Requisites

You should already be familiar with:

• Core Concepts (Ch. 2)
• Glass Tables (Ch. 3)

About the event generator...

In order to make the ITSI Sandbox more interesting to play in, an event generator is included which continuously generates a simulated stream of realistic machine events, including web access, database, Linux metrics (from the *nix Technology Add-on App) and others. Included in this stream of events is a failure scenario, showing a sequence of failures and resulting service degradations, repeating approximately once per hour. Typically, the initial failures occur at about 30 minutes past the hour (XX:30), and reset back to "OK" around the top of the hour (XX:00).

However, the event generator (eventgen) timing is not precise. The failure scenarios may occur at slightly different times from hour to hour, and may vary from sandbox to sandbox. Thus, within the Sandbox, it is impossible to predict exactly how the health scores and KPIs will appear during any specific hour. This makes it difficult to set up a "clean" failure simulation.

Please pardon any eventgen inconsistencies. We decided to put most of our effort into developing ITSI -- not an event generator.
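
Because failures typically begin near XX:30 and reset near XX:00, a time such as XX:50 in the previous hour usually falls inside a failure window. The following is a small sketch for working out that time as an HH:MM:SS.sss string, the format used by the Glass Table time picker in the exercise. The function name and the default of 50 minutes are illustrative choices, not part of ITSI.

```python
from datetime import datetime, timedelta


def previous_hour_time(now, minute=50):
    """Return HH:MM:SS.sss for the given minute of the hour before `now`."""
    # Truncate to the top of the current hour, then step back one hour.
    previous = now.replace(minute=0, second=0, microsecond=0) - timedelta(hours=1)
    target = previous.replace(minute=minute)
    # %f yields six digits of microseconds; drop the last three for milliseconds.
    return target.strftime("%H:%M:%S.%f")[:-3]


print(previous_hour_time(datetime(2016, 3, 15, 23, 12, 0)))  # 22:50:00.000
```

Since eventgen timing is approximate, trying a few nearby minutes (for example 40, 45 and 50) may be needed before a degraded state shows up.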
Instructions

1. Navigate to the Glass Table called "End to End Health (Medium)":
   a. Click on "Glass Tables" in the upper menu bar to navigate to the page "Saved Glass Tables"
   b. Click on "End to End Health (Medium)" to navigate to this Glass Table

   As noted above, it is not possible to predict exactly when a failure scenario occurs within a particular past hour. However, interesting things usually occur around XX:40 to XX:50.

2. Modify the view time by clicking on the time picker in the upper right corner. In the pop-up window, type in an explicit time from the past, such as XX:50:00.0 from the previous hour (or the hour before that, etc). Be sure to use the correct HH:MM:SS.sss format (example: "22:50:00.0")
3. In a few seconds, the colors of the widgets will change, to indicate their states at that particular time in the past. Try different times in the past to see how this works.
4. For the purposes of this troubleshooting exercise, imagine that your Glass Table looks like the following:
5. The scenario: Customer Care has informed us that customers are calling to complain when they try to purchase through the Online Store; they are seeing slow response and occasional errors. The problems seem to be affecting both web-based and mobile-based customers.
6. Based on just the reports that the customer-facing web-based service is having problems, most support persons would begin troubleshooting "from the top" -- the web & mobile tiers in this case. If no obvious problems are found, they would proceed down the service dependency tree -- to the middleware tier, etc.
7. But using a Glass Table such as "End to End Health" provides instant and context-relevant visibility into service health scores and important KPIs, all in one place. In the above example, which supporting tier seems to be in distress? (Database) By being able to visualize the relevant services and their health scores, we have the ability to immediately focus our troubleshooting on the areas which are degraded. This can save huge amounts of time and greatly reduce the time required to find root cause.
8. On your Sandbox Glass Table, click on the widget beneath "Database" to drill down into the Database tier to continue the troubleshooting exercise. (Select "Leave This Page" if prompted)
   (Now in "DB Deep Dive")

9. Click on the ">" next to "Focus" to collapse the service tree navigator panel on the right side; we will explore this feature later.
10. Change the Primary Time Range to "Last 90 minutes" by clicking on the time picker in the lower left corner:
    a. In the "Relative" section, type in "90" and select "Minutes Ago"
    b. Click on "Apply"

    (The Deep Dive now displays a couple of periods of "mostly green" and "mostly red")
11. Position the pointer near the left side of a "mostly green" section, then click-hold and drag right to include most of a "mostly red" section as well. Release the mouse to zoom into this time period. At this point, your Deep Dive should look similar to this:
12. We are looking at the Aggregated Health Score (top swimlane) and KPIs for the DB Service, across a range of time which shows the service moving from "healthy" to "not healthy".
13. Slowly mouse over the swimlanes to compare values at various points in time.
14. Toggle on/off the "Enable Threshold" in the upper left corner, to compare the swimlanes with and without the threshold colors/states overlaid.
15. Note that the Service Health Score in the top swimlane is an aggregation of the service's KPIs and dependent services, ranging from 100-0. When did the health score begin to deteriorate, and which KPI(s) may have been part of the root cause?
16. Click on the name-box for "DB Hosts Used Disk Space Percent", then drag it upwards to reposition this swimlane.
17. Mouse over the name-box for "DB Host CPU Load Percent" to reveal the "options wheel", then select it to view available options. Select "Delete Lane" to (temporarily) remove this swimlane from our Deep Dive.
18. Click on the darker blue tile within the "DB Errors" swimlane to reveal "raw errors" from the underlying Splunk search. Click on "Hide Events" to dismiss.
19. Mouse over the "DB Hosts Used Disk Space Percent" swimlane, in the place where it goes from green to red. Note the high & low metric values shown for the swimlane, and that this metric has gone to 100%, indicating that a filesystem is full.
20. Mouse over the name-box for "DB Hosts Used Disk Space Percent" to reveal the "options wheel", then select it to view available options. Select "Lane Overlay Options". When the popup appears, select "Enable Overlays - Yes", then "Done".

    (Several colored lines have now replaced the single black line in this swimlane)
21. Click anywhere within the "DB Hosts Used Disk Space Percent" swimlane to reveal an options popup. Select "Add Overlay as Lane".

    (Three new swimlanes are added at the bottom, representing the separate KPI values for the individual entities (hosts) which comprise this KPI)

22. Which host/server is suffering from a filesystem-full condition? (mysql-02)
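
The per-entity breakdown in steps 19 to 22 amounts to splitting one aggregate KPI into one series per host and looking for the host that has reached the limit. A minimal sketch of that idea follows, using made-up sample values. The host names other than mysql-02, the numbers and the function name are hypothetical, and this is not how ITSI itself computes overlays.

```python
def hosts_at_or_above(samples, limit=100.0):
    """Return hosts whose latest used-disk percentage is at or above `limit`."""
    return sorted(
        host for host, series in samples.items()
        if series and series[-1] >= limit
    )


# Hypothetical used-disk-space percentages per host, oldest sample first.
samples = {
    "mysql-01": [41.0, 41.5, 42.0],
    "mysql-02": [88.0, 96.0, 100.0],
    "mysql-03": [55.0, 55.2, 55.1],
}

# The aggregate view only reveals that some host hit 100%...
aggregate_max = max(series[-1] for series in samples.values())
print(aggregate_max)               # 100.0
# ...while the per-host view shows which one.
print(hosts_at_or_above(samples))  # ['mysql-02']
```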
5 - Tour the Deep Dive

Deep Dives allow KPI metrics and health scores to be compared in side-by-side swimlanes, which allows trends and correlations to be more easily and quickly discovered. This chapter explores Deep Dives and how they can be used.

Pre-Requisites

You should already be familiar with:

• Core Concepts (Ch. 2)
• Troubleshooting (Ch. 4) also goes into Deep Dives

Instructions

1. Navigate to the Deep Dive called "DB Deep Dive":
   a. Click on "Deep Dives" in the upper menu bar to navigate to the page "Saved Deep Dives"
   b. Click on "DB Deep Dive" to navigate to this Deep Dive
2. Click on the ">" next to "Focus" to collapse the service tree navigator panel on the right side; we will explore this feature later.
3. Select an arbitrary time range by clicking on the Primary Time Range menu option at the bottom right; it functions like a standard Splunk search bar time picker
4. Zoom into a tighter time range in the current view by click-holding anywhere in the swimlanes, then dragging horizontally to select the range.
5. Toggle the threshold health score colors by clicking on "Enable Threshold" in the upper left corner.
6. Click on the "<" above "Focus" to open the service tree navigator panel on the right side.
   a. Click on a service node to navigate up and down the dependency tree of services
   b. After clicking on a service node, note that that service's KPIs are listed below.
   c. Click on the "+" on a listed KPI to add it to the current swimlanes
   d. Click on the ">" next to "Focus" to collapse the service tree navigator panel on the right side
7. Mouse over the name-box for any swimlane to reveal the "options wheel", then select it to view available options:
8. The student is encouraged to explore these options, which are covered in more detail at http://docs.splunk.com/Documentation/ITSI/latest/User/DeepDives
9. Click-hold on the name-box for any swimlane, then drag it vertically to reposition this swimlane.
10. Click on the darker blue tile within the "DB Errors" (or any "event"-style) swimlane to reveal "raw errors" from the underlying Splunk search. Click on "Hide Events" to dismiss.
11. To save a Deep Dive after modifying the layout and/or visualization options, click on the "Edit" menu option in the upper right corner, then select "Save"
12. To compare the current time range against a different time range, click on "Compare to..." in the lower left corner, then select a comparison time range. This causes each KPI to display twin swimlanes: primary time range above comparison time range. Note that when mousing over the swimlanes, the time display at the top now shows both times.
13. To dismiss the "twin" lanes display, deselect the checkbox next to "Compare to..." in the lower left corner
6 - Tour the Notable Event Viewer

Multi-KPI Alerts are correlation searches which can combine any KPIs to create meaningful, actionable alerts, using multiple correlation factors such as KPI threshold indications, length of time in this state, time-of-day, and others. Multi-KPI alerts can find not just "failures", but early "canary in the coal mine" indications that the service is becoming unstable; it is possible to find problems BEFORE they impact customer-facing services. When a Multi-KPI alert fires, it creates a Notable Event; it could also execute a script and/or send email.

A Notable Event is like a "poor man's trouble-ticket", useful for tactical triage of problems or potential problems. The Notable Event Viewer has the ability to filter Notable Events based on various criteria, such as Severity, Status, Service and others. It also allows Notable Events to be modified, to change Owner, Severity, Status, and/or add comments. Notable Events can also have workflow actions associated with them, to allow an operator to quickly hit troubleshooting options, fire mitigation scripts, or open a "real" Incident Management trouble-ticket.
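
To make the idea of combining correlation factors more concrete, here is a minimal sketch of one possible rule: fire only when every listed KPI has stayed at or above a given severity for a minimum number of consecutive one-minute samples. The severity names and their ordering, the sample data and the function names are assumptions for illustration only. In ITSI these rules are built through the Multi-KPI Alert UI, not through code like this.

```python
# Assumed severity ordering, lowest to highest (illustrative only).
SEVERITY = {"normal": 0, "medium": 1, "high": 2, "critical": 3}


def sustained(statuses, min_severity, minutes):
    """True if the last `minutes` samples are all at or above `min_severity`."""
    if minutes < 1 or len(statuses) < minutes:
        return False
    recent = statuses[-minutes:]
    return all(SEVERITY[s] >= SEVERITY[min_severity] for s in recent)


def multi_kpi_alert(history, rules):
    """Fire only if every rule holds.

    history: KPI name -> list of per-minute statuses, oldest first.
    rules:   KPI name -> (min_severity, minutes).
    """
    return all(
        sustained(history.get(kpi, []), severity, minutes)
        for kpi, (severity, minutes) in rules.items()
    )


history = {
    "DB Errors": ["normal", "high", "high", "critical"],
    "DB Host CPU Load Percent": ["normal", "medium", "medium", "medium"],
}
rules = {
    "DB Errors": ("high", 3),
    "DB Host CPU Load Percent": ("medium", 2),
}
print(multi_kpi_alert(history, rules))  # True
```

Requiring a condition to persist for several samples is one way to catch a service that is becoming unstable without alerting on a single momentary spike.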
Pre-Requisites

You should already be familiar with:

• Core Concepts (Ch. 2)

Instructions

1. Navigate to the Notable Events Viewer by clicking on "Notable Events Review" in the upper menu bar
2. Filter the Notable Events by Severity: Click on a colored Severity bar to select/deselect that particular severity. Severities which have been deselected appear as gray.
3. Filter by Status, Owner, Service, Time Range, Name ("Title") or freeform search criteria by using the input options in the upper left quadrant.
4. Click "Submit" to (re)apply search filter criteria
5. Edit one or more Notable Events by selecting their checkboxes on the left side, then select "Edit all selected"
   a. Change Status, Severity or Owner
   b. Add freeform comments
   c. Click "Done"
6. View available workflow actions for a Notable Event by clicking the "V" under "Actions"
7 - Tour the Service Analyzer

The Service Analyzer is a "Big Picture" view of all services, and the "most interesting" KPIs (i.e., KPIs with degraded health scores). It is "no frills", designed for NOCs, Tier 1/2 support, and others who need a high level view of all services/KPIs, or a subset.

Pre-Requisites

You should already be familiar with:

• Core Concepts (Ch. 2)

Instructions

1. Navigate to the Service Analyzer by clicking on "Service Analyzer" in the upper menu bar
2. Click on any service to drill down into a deep dive containing the KPIs for that service
   a. Notice the deep dive has been built for you on the fly, containing all the KPIs associated with that service
3. Navigate back to Service Analyzer

What if I only want to see a few services or KPIs?

4. Click in the "Select service(s) to monitor" box to select & show only certain services
5. Click on the "Option Wheel" next to "Top ... Services" to control how many services are shown
6. Click on the "Option Wheel" next to "Top ... KPIs" to control how many KPIs are shown, and to select which KPIs are shown

I want to combine & drill down into "interesting" KPIs and Service Health Scores...

7. To create an ad-hoc Deep Dive:
   a. Mouse over one or more Service or KPI tiles, then select the checkbox in the upper right corner of the tile (select at least three tiles)
   b. Click "Drilldown on Selection"