Delivering Integrated Monitoring @GE Capital · 2017-10-13


Page 1

Copyright © 2014 Splunk Inc.

Thiru Venkat, Sr. Enterprise Architect

Tim March, Middleware EA Leader

Delivering Integrated Monitoring @GE Capital

Page 2

Disclaimer

During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only, and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.

Page 3

Agenda

• Business Case
• Private Cloud
• Architecture
• Business Activity Monitoring
• Centralized Log Monitoring
• Dashboard Samples

Page 4

Business Case

Page 5

What If?

Detecting problems in this system was as easy as… detecting problems in this system?

Page 6

20/20 Data… Hindsight & Foresight

Outage Diagnostics

Timeline: Partial outage 11/1/13
‒ ~9:30 AM: outage reported to helpdesk
‒ 10:10 AM: IT-side outage call opened
‒ 12:55 PM: Root cause identified
‒ 1:15 PM: Issue resolved

2.5 hours to diagnose issue, 20 minutes to resolve…

Understand IT system problem areas

Reduced Down-time

Predictive Data

Load Balancing

Improved Capacity Planning

Planning & Prevention

Understand & predict peak volume for risk & ops. Staff accordingly: hourly, daily, monthly...

Use data to predict & prevent outages
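The diagnose and resolve durations quoted for the 11/1/13 outage can be checked from the timeline. The sketch below is only an illustration: it assumes the "~9:30 AM" report time is exactly 9:30, and it computes the diagnosis interval from both possible start points, since the slide does not say which one its "2.5 hours" figure uses.

```python
from datetime import datetime

# Timestamps copied from the outage timeline (all on 11/1/13).
# Assumption: the approximate "~9:30 AM" is treated as exactly 9:30 AM.
FMT = "%m/%d/%y %I:%M %p"
reported = datetime.strptime("11/1/13 9:30 AM", FMT)
call_opened = datetime.strptime("11/1/13 10:10 AM", FMT)
root_cause = datetime.strptime("11/1/13 12:55 PM", FMT)
resolved = datetime.strptime("11/1/13 1:15 PM", FMT)


def minutes_between(start, end):
    """Return the whole number of minutes from start to end."""
    return int((end - start).total_seconds() // 60)


# Diagnosis time measured from when the IT-side outage call opened.
diagnose_from_call = minutes_between(call_opened, root_cause)
# Diagnosis time measured from the first helpdesk report.
diagnose_from_report = minutes_between(reported, root_cause)
# Time from root cause identification to resolution.
resolve = minutes_between(root_cause, resolved)

print(diagnose_from_call, diagnose_from_report, resolve)
```

Either way, diagnosis takes several times longer than the fix itself, which is the point the slide is making.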

Page 7

Voice of the Customer…

"A lot of time goes into understanding the true nature of the error and which system is actually throwing the error. There are always asks on the IT Outage Management calls for 'What is the exact error message?' and we wait for someone to log in and tell us what it is. 20/20 simply eliminates this problem."

Re: Access to Error & Log entries via Splunk

"Direct visibility to the status of a specific transaction. Direct visibility to which system it currently 'resides in' or 'is stuck at'. A huge plus in reducing BTTR around 'Customer Critical' tickets."

Re: Transaction level monitoring & error messages

"Dashboard visibility to the health of various applications at the transactional level."

Re: Service Delivery

"Traditionally a data issue or an XML issue would be reported by a business user before Service Delivery is even aware of it. The idea of knowing when an issue occurs without having to wait for a customer to report it, thereby understanding your true Failed Customer Interactions, would be key to the next level of value add."

Re: Failed Customer Interactions

Page 8

Today's World…

Rallying…

‒ Business team submits a ServiceNow Request: "…a transaction has not come through the system in the last 30 minutes"
‒ Service Delivery opens an IT Outage Management call and begins to collect various system owners onto a bridge to troubleshoot the problem

Troubleshooting…

‒ Each team uses the various technologies at their disposal to trace which system is failing. This can sometimes take 30 minutes or longer
‒ At which point business has now been stopped for approximately 1 hr

Enlightenment…

‒ Problem system has been identified and the appropriate team is taking corrective action to address the issue

Page 9

What are Continuous Insights?

• The "What"…
‒ Monitoring a single transaction across disparate systems
• The "Why"…
‒ Monitor, measure, and minimize "Failed Customer Interactions"
• The "How"…
‒ Business Activity Monitoring (Top-Down)
‒ Centralized Log Indexing & Analytics (Bottom-Up)
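The "What" and "Why" can be illustrated with a small sketch that groups log events by transaction across systems. Everything here is hypothetical: the event fields (`txn_id`, `system`, `time`, `status`) and the sample data are invented for illustration and are not taken from the GE Capital implementation.

```python
from collections import defaultdict

# Hypothetical, simplified log events from three systems.
# Field names and values are illustrative assumptions only.
events = [
    {"txn_id": "T1", "system": "web", "time": 1, "status": "ok"},
    {"txn_id": "T1", "system": "application", "time": 2, "status": "ok"},
    {"txn_id": "T1", "system": "database", "time": 3, "status": "ok"},
    {"txn_id": "T2", "system": "web", "time": 1, "status": "ok"},
    {"txn_id": "T2", "system": "application", "time": 2, "status": "error"},
]


def last_known_position(events):
    """Map each transaction to the (system, status) of its latest event."""
    by_txn = defaultdict(list)
    for event in events:
        by_txn[event["txn_id"]].append(event)
    positions = {}
    for txn_id, txn_events in by_txn.items():
        latest = max(txn_events, key=lambda e: e["time"])
        positions[txn_id] = (latest["system"], latest["status"])
    return positions


def failed_customer_interactions(events):
    """Return the sorted IDs of transactions that logged any error."""
    return sorted({e["txn_id"] for e in events if e["status"] == "error"})


print(last_known_position(events))
print(failed_customer_interactions(events))
```

In this toy data, transaction T2 is last seen in the application tier with an error, which is the kind of "where is it stuck" answer the slide describes.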

Page 10

Why Splunk?

• Proven operational intelligence and log monitoring solution with tons of features & benefits
• Splunk has been in GE for more than 5 years, primarily used for Security and IT Risk monitoring
• As a corporate policy, Splunk is installed on all our servers through the hardware provisioning process
• Dedicated IT Risk team manages the entire Splunk infrastructure
• Application monitoring gets much easier with this: just add to the forwarder configuration and develop dashboards
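As a rough idea of what "add to the forwarder configuration" can look like, the sketch below generates a file-monitor stanza in the style of a Splunk `inputs.conf`. The log path, index name, and sourcetype are hypothetical placeholders; the slides do not show GE Capital's actual configuration.

```python
import configparser
import io


def monitor_stanza(log_path, index, sourcetype):
    """Render an inputs.conf-style monitor stanza as text.

    All argument values used with this helper are hypothetical examples.
    """
    config = configparser.ConfigParser(interpolation=None)
    # A monitor stanza name is "monitor://" followed by the file path.
    section = f"monitor://{log_path}"
    config[section] = {
        "index": index,
        "sourcetype": sourcetype,
        "disabled": "false",
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


# Hypothetical application log path, index, and sourcetype.
print(monitor_stanza("/opt/app/logs/app.log", "app_logs", "app_log"))
```

Once a stanza like this is deployed to a forwarder, the remaining work is on the search and dashboard side, which matches the slide's claim that onboarding an application is mostly configuration.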

Page 11

GE Capital's Private Cloud

Page 12

GE Capital Private Cloud – IPAS

Servers, Storage, Networking

Virtualization

Management

Security

Middleware Patterns

Automated Deployment

Applications

Fewer "moving parts" for speed & stability

We're not in the "IT integration business"

Challenge: People & processes, not tech

Install App Servers

Install OS

Install Physical Servers

Configure network

Configure security

Debug!

Page 13

Dev/Test Cloud Group

Dev Pattern, QA Prod Pattern

Production Cloud Group

QA Prod Pattern

Enterprise Architecture

Pattern Library

Developer

Dev, QA, & Prod provisioned in minutes

Speed, Consistency & Repeatability with Patterns

Page 14

Private Cloud Architecture (MC-1)

[Architecture diagram. Labels: F5; IHS 1, IHS 2; ODR 1, ODR 2; DMgr, ODR, WEB; WAS CELL; VM Node 1, VM Node 2; Static Cluster 1, Static Cluster 2, Static Cluster N; APP 1, APP 2, APP n (each shown twice); APP VM; DMgr VM; NFS Share; Plugin-cfg.xml; Plugin-cfg.xml (cached); Splunk (shown twice)]

Plugin generated from the ODR cluster and copied to the NFS Share for the WEB tier to read

Provisioning Patterns: GECA-MC1-STATIC-ODR-V1.0, GECA-MC1-STATIC-APP-V1.0

Health Policies: GC Percentage, Memory Leak, Excessive Memory Usage, Max CPU, Server Age, Email Notification

Page 15

Continuous Insights Architecture

Page 16

CI Architecture – Touchless

Page 17

IT Applications Monitoring

Event sources: Web, Application, Database, External Services and Apps

Centralized Log Indexing and Analytics

Search & Analysis, Predictions, Action items, FCI Dashboard

Log events are captured across IT systems continuously to provide business and IT critical perspectives and dashboards

Page 18

GE Capital Splunk Architecture

IBM Pure Forwarders

Deployment Server

Indexers

Search Head

Browser

Page 19

Business Activity Monitoring

Page 20

Business Activity Monitoring

Application components are drawn in a user-understandable graphical diagram

Business Monitor shows each step of the transaction in progress

Page 21

Outage Detection & Response

Service Delivery Action: A component or sub-system failure is highlighted in red so that Service Delivery can engage the right team to get the outage resolved

Page 22

Centralized Log Monitoring

Page 23

IT & Business Use Cases: Current Scope

Page 24

Application Development

• Real-time access to logs
• Used by both developers and testers
• Complete transaction traceability from front-end UI to database/system of record
• Quickly identify issue root cause
• Troubleshoot performance bottlenecks
• Splunk enabled in all environments from development to production and DR
• Monitor various background resources providing in-depth view to application errors

Page 25

Operational Visibility

• Hardware components monitoring
• Software components monitoring
• Hardware utilization and capacity monitoring
• Correlation of hardware utilization with application usage patterns
• Continuous proactive monitoring and alerts
• Operations teams get clear understanding of the incident and can take action instantaneously

Page 26

Performance Monitoring & Support

• Wing-to-wing view of issues/impact propagation across various sub-systems using top-down and bottom-up approaches
• History to understand performance over time
• Drill-down to see specific transactions not meeting performance goals
• Configured alerts to notify when performance degrades below defined threshold
• Provides business with heat maps, operational analytics and SLA monitoring through dashboards
• Support can identify problems in seconds to minutes
• Engage right support resources to resolve the issue
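The drill-down and alerting bullets can be sketched as a simple threshold check over response times. This is a minimal illustration under stated assumptions: "performance degrades" is read here as response time rising above a limit, and the sample data, field names, and threshold values are hypothetical rather than taken from the deck.

```python
from statistics import mean

# Hypothetical response-time samples; values and field names are illustrative.
samples = [
    {"txn_id": "T1", "response_ms": 500},
    {"txn_id": "T2", "response_ms": 3500},
    {"txn_id": "T3", "response_ms": 800},
]


def find_slow_transactions(samples, threshold_ms):
    """Drill-down: return the samples that miss the performance goal."""
    return [s for s in samples if s["response_ms"] > threshold_ms]


def should_alert(samples, threshold_ms):
    """Alert when the average response time exceeds the threshold."""
    return mean(s["response_ms"] for s in samples) > threshold_ms


slow = find_slow_transactions(samples, threshold_ms=2000)
print([s["txn_id"] for s in slow])
print(should_alert(samples, threshold_ms=1500))
```

Separating the per-transaction drill-down from the aggregate alert mirrors the two bullets: one answers "which transactions were slow", the other answers "should someone be notified".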

Page 27

Implementation Stats

• Application Log Sources
‒ Business Application
‒ Application Servers, Web Servers, FileNet, Siebel, etc.
‒ Oracle database
‒ IBM Worklight Mobile & Tea Leaf
• Application Environments
‒ Production (51)
‒ QA (56)
‒ Development (58)
‒ SIT (22)
‒ DR (8)

Page 28

Expected Benefits & Cost Savings

• Reduction in troubleshooting time for P1 issues by 25%
• Reduction in "Customer Critical" issues:
‒ Reduction in troubleshooting time: 50-75%
‒ Improvement in correct team engagement
‒ 40% effort reduction per incident, from the current average of 73 minutes to 43 minutes
• 34% reduction in support tickets for log requests
• Enables true value-add work across associated teams
• Soft dollar savings for Business (lost business, etc.)
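The per-incident effort figure can be checked with quick arithmetic. The sketch below uses only the two averages quoted in the list (73 and 43 minutes) and shows that the stated 40% is a rounded value.

```python
# Average effort per incident, in minutes, as quoted in the benefits list.
current_minutes = 73
expected_minutes = 43

# Absolute and relative reduction in effort per incident.
saved_minutes = current_minutes - expected_minutes
reduction_pct = saved_minutes / current_minutes * 100

print(f"{saved_minutes} minutes saved per incident ({reduction_pct:.1f}%)")
```

The computed reduction is slightly above 40%, so the slide's figure is consistent with the quoted averages.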

Page 29

Dashboard Samples

Page 30

20/20 – Integrated Dashboard

Page 31

IPAS Dashboard

Page 32

Critical Errors

Page 33

Application Dashboard

Page 34

Search and Export Logs

Page 35

Dashboard & Email Alerts

Page 36

Special Offer: Try Splunk MINT Express for Free!

Splunk MINT offers a fast path to mobile intelligence. How fast? Find out with a 6-month trial*

• Register for your free trial: http://mint.splunk.com/conf2014offer
• Download the Splunk MINT SDKs
• Add the Splunk MINT line of SDK code and publish**
• Start getting digital intelligence at your fingertips!

*Offer valid for .conf2014 attendees and coworkers of attendees only.

**Trial allows monitoring of up to 750,000 monthly active users (MAUs).

Page 37

THANK YOU