vmware log insight
Insight into the World of Logs with VMware vRealize Log Insight
Iwan Rahabok, VMware
Karl Fultz, VMware
Manny Sidhu
MGT7685
#MGT7685
Disclaimer
• This presentation may contain product features that are currently under development.
• This overview of new technology represents no commitment from VMware to deliver these features in any generally available product.
• Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.
• Technical feasibility and market demand will affect final delivery.
• Pricing and packaging for any new technologies or features discussed or presented have not been determined.
CONFIDENTIAL 2
Insight into the World of Logs With Log Insight
Keep in Touch!
Iwan ‘e1’ Rahabok: virtual-red-dot.info, @e1_ang, Linkedin.com/in/e1ang
Karl [email protected]/in/kfultz
Manny Sidhu: virtual10.com, @mannySidhu2, Linkedin.com/in/mannysidhu10
Introduction: Environment Landscape
[Diagram: software-defined data center stack. Layers shown: Physical Infrastructure; Virtualized Infrastructure (Compute, Network, Storage); Cloud Management Platform (CMP); Applications; End-User Computing; Extensibility; Hybrid Cloud (Private / Public)]
Log sources
• VMware logs
• OS and app logs
• Physical infrastructure logs
200 ESXi hosts + VMs = 200 GB or 2B log events per day
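The volume figure on this slide can be turned into rates that are easier to size against. The following is a minimal arithmetic sketch, assuming decimal gigabytes and that the 200 GB and 2B events describe the same daily volume; the variable names are illustrative only.

```python
# Figures taken from the slide: 200 GB and 2 billion log events per day.
GB = 10**9                      # assumption: decimal gigabytes
SECONDS_PER_DAY = 24 * 60 * 60

daily_bytes = 200 * GB
daily_events = 2_000_000_000

# Average size of one event implied by the two figures.
avg_event_bytes = daily_bytes / daily_events

# Sustained ingest rate if the load were spread evenly over the day.
events_per_second = daily_events / SECONDS_PER_DAY

print(f"average event size: {avg_event_bytes:.0f} bytes")
print(f"average ingest rate: {events_per_second:,.0f} events/s")
```

Real traffic is bursty, so peak rates will sit well above this even average.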
Primary Use Cases
Troubleshooting and Root Cause Analysis
• Follow the trail from vRealize Operations Manager to the logs to get to the root cause of an observed problem
• Identify the needle in the haystack in real time when troubleshooting a problem
Monitoring
• Monitor metrics and events (performance & change) that are visible only in logs
• Identify problems proactively, ensure SLAs, and comply with IT policies
Unstructured Data Warehouse
• Collect all the data in one place without the need for custom parsing or transformation of data
• Get full visibility across your entire IT environment from a single place
Log Insight Technical Overview
[Diagram: sources in the Cloud / Data Center (OS logs, VC logs, app logs, system stats, security logs) feed Log Management through the API and syslog]
Analyze
• Can analyze any unstructured time-series data, configuration etc.
• Automatically identifies structures in the data then uses machine learning to group data
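To make the idea of grouping unstructured messages concrete, here is a minimal sketch. It is not Log Insight's algorithm: it swaps machine learning for simple regex masking, replacing variable-looking tokens with placeholders so that messages differing only in those tokens share one template. The sample messages are hypothetical.

```python
import re
from collections import defaultdict

# Order matters: mask IP addresses before bare numbers.
_PATTERNS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<ip>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<hex>"),
    (re.compile(r"\b\d+\b"), "<num>"),
]


def template_of(message):
    """Return the message with variable tokens replaced by placeholders."""
    for pattern, placeholder in _PATTERNS:
        message = pattern.sub(placeholder, message)
    return message


def group_by_template(messages):
    """Group raw messages under their shared template."""
    groups = defaultdict(list)
    for message in messages:
        groups[template_of(message)].append(message)
    return dict(groups)


# Hypothetical log messages, for illustration only.
sample = [
    "connection from 10.0.0.5 closed after 120 ms",
    "connection from 10.0.0.9 closed after 87 ms",
    "disk 3 latency 45 ms",
]
groups = group_by_template(sample)
for template, members in groups.items():
    print(len(members), template)
```

The two connection messages collapse into one group, which is the kind of summarising that makes a large event stream readable.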
Scale
• Central, scale-out store (no-SQL) for all collected logs
• Configurable retention and archiving
• Maintenance free
Best for SDDC
• Queries, alerts, fields, charts in the vSphere Content Pack
Contents
• Use Cases
– Audit & Compliance & Configuration
– Performance and Capacity
• Customer Sharing
• Log Management Platform
Audit, Compliance, Configuration
• Auditor related queries
– Who modified what and when
– Who snapshot which VM and when
– Who changed VM power status (on/off) and when
• License compliance (e.g. Oracle)
vCenter Tasks vCenter Events
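The auditor questions above all have the same shape: filter events by action and target, then read off who and when. Below is a minimal sketch of that shape, assuming events have already been extracted into structured records. The `AuditEvent` fields, action names, users, and VM names are hypothetical and do not reflect the real vCenter event schema.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AuditEvent:
    when: datetime
    user: str
    action: str   # hypothetical action name, e.g. "snapshot.create"
    target: str   # hypothetical VM name


def audit_trail(events, action_prefix="", target=None):
    """Return matching events, oldest first: who did what to which VM, and when."""
    matches = [
        e for e in events
        if e.action.startswith(action_prefix)
        and (target is None or e.target == target)
    ]
    return sorted(matches, key=lambda e: e.when)


# Hypothetical sample data, for illustration only.
events = [
    AuditEvent(datetime(2015, 5, 18, 9, 30), "alice", "power.off", "app-vm-01"),
    AuditEvent(datetime(2015, 5, 17, 22, 5), "bob", "snapshot.create", "db-vm-02"),
    AuditEvent(datetime(2015, 5, 19, 1, 15), "bob", "snapshot.remove", "db-vm-02"),
]

for e in audit_trail(events, action_prefix="snapshot."):
    print(f"{e.when:%Y-%m-%d %H:%M}  {e.user:<6} {e.action:<16} {e.target}")
```

In Log Insight the same filter would be expressed as a query on extracted fields; the sketch only shows the logic.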
Who Snapshot What VM?
• vCenter tracks the data
OOTB dashboard. Grouped by VM Name and Snapshot Operations Type
You can know what time the snapshot was created or consolidated
Who did the snapshot?
A Jedi did
Example from Production Environment
vCenter Template
• Who changed what template and when?
• Who converted which VM into a template, or vice versa?
http://virtual-red-dot.info/monitoring-changes-to-vmware-vsphere-template/
Where has each VM run?
Tracking that Oracle VM. We know which ESXi host it was running on, on any given day.
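Answering "which host was this VM on that day" amounts to replaying placement events in time order. The sketch below assumes the logs have been reduced to a sorted list of (timestamp, host) pairs, each meaning "the VM started running on this host at this time". The host names and dates are hypothetical.

```python
from bisect import bisect_right
from datetime import date, datetime, time, timedelta


def host_at(placements, when):
    """Host the VM was on at `when`, or None if before the first record."""
    times = [t for t, _ in placements]
    i = bisect_right(times, when)
    return placements[i - 1][1] if i else None


def hosts_on_day(placements, day):
    """Every host the VM touched during the given calendar day."""
    start = datetime.combine(day, time.min)
    end = start + timedelta(days=1)
    hosts = set()
    carried_over = host_at(placements, start)   # host it was already on at midnight
    if carried_over is not None:
        hosts.add(carried_over)
    hosts.update(h for t, h in placements if start <= t < end)
    return hosts


# Hypothetical placement history for one VM, sorted by time.
placements = [
    (datetime(2015, 5, 1, 8, 0), "esxi-01"),
    (datetime(2015, 5, 3, 14, 30), "esxi-02"),
    (datetime(2015, 5, 3, 22, 0), "esxi-01"),
]

print(host_at(placements, datetime(2015, 5, 3, 15, 0)))
print(sorted(hosts_on_day(placements, date(2015, 5, 3))))
```

For license compliance the per-day set matters more than the point-in-time answer, since a short vMotion to another host still counts as having run there.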
Performance and Capacity
• Which VM hit high CPU Usage and when?
• Detailed storage latency from vmkernel
• Is vMotion impacting performance?
• Which VSAN events happen, and when?
• Network device monitoring
Which VM Hit High CPU Usage and When?
Which VM Needs More CPU?
• This is a badly sized environment
• A lot of VMs hit high CPU Usage within a period of just 1 week
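A chart like this one is essentially a count of high-CPU events per VM over a time window. The following is a minimal sketch, assuming each alarm has been reduced to a (timestamp, VM name) pair; the VM names and timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def high_cpu_counts(events, start, days=7):
    """Count high-CPU events per VM in the window [start, start + days)."""
    end = start + timedelta(days=days)
    return Counter(vm for when, vm in events if start <= when < end)


# Hypothetical alarm events, for illustration only.
events = [
    (datetime(2015, 5, 17, 10, 0), "web-01"),
    (datetime(2015, 5, 18, 11, 0), "web-01"),
    (datetime(2015, 5, 19, 9, 0), "db-02"),
    (datetime(2015, 5, 30, 9, 0), "db-02"),   # outside the one-week window
]

counts = high_cpu_counts(events, start=datetime(2015, 5, 17))
for vm, n in counts.most_common():
    print(f"{vm}: {n}")
```

VMs at the top of the list are the first candidates for more CPU.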
Detailed Storage Latency
Detailed Storage Latency
Zooming into May 17 – 23. We also exclude all the magnetic disks. Device ID naa.55* is SSD, while naa.5000* is magnetic.
SSD latency is high.
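The device ID prefixes on this slide are enough to separate SSD from magnetic disks in a filter. Below is a minimal sketch using glob-style matching, assuming latency samples have been extracted as (device ID, latency in ms) pairs. The full device IDs and latency values are hypothetical; only the `naa.55*` and `naa.5000*` patterns come from the slide.

```python
from collections import defaultdict
from fnmatch import fnmatchcase
from statistics import mean

SSD_PATTERN = "naa.55*"         # from the slide
MAGNETIC_PATTERN = "naa.5000*"  # from the slide


def device_kind(device_id):
    """Classify a device ID by the prefixes given on the slide."""
    if fnmatchcase(device_id, SSD_PATTERN):
        return "ssd"
    if fnmatchcase(device_id, MAGNETIC_PATTERN):
        return "magnetic"
    return "unknown"


def ssd_mean_latency(samples):
    """Mean latency per SSD device, with magnetic disks excluded."""
    per_device = defaultdict(list)
    for device_id, latency_ms in samples:
        if device_kind(device_id) == "ssd":
            per_device[device_id].append(latency_ms)
    return {dev: mean(values) for dev, values in per_device.items()}


# Hypothetical samples, for illustration only.
samples = [
    ("naa.55cd2e404b66a6f0", 12.0),
    ("naa.55cd2e404b66a6f0", 18.0),
    ("naa.5000c5008e2a1b3c", 40.0),
]
print(ssd_mean_latency(samples))
```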
Checkpoint Firewall: CPU Temperature
Customer Sharing
Manny Sidhu
Business Requirements
• Auditing for Privileged User Access Management (PUAM)
• Auditing for Change Compliance
• Ability to search and export log entries (even after vCenter has rolled over historical logs)
Technical Requirements
• Need to simplify troubleshooting for a large vSphere environment. (This was a major requirement for the Operations Teams).
Architecture
Heavily loaded regional Datacenter Log Insight instance
Equates to approx. 285 Million events per day
Auditor’s dashboard
Business Outcomes / Value Achieved so far…
• Auditors now have visibility into Privileged User Access Management (PUAM) changes. (Tracks any changes or spikes in activity made by PUAM users)
• Auditors now have visibility into Change Compliance events taking place within the vSphere environment
• Ability to export logs as CSV files has fulfilled one of the specific audit requirements
• Reduced work effort required by audit team to sift through logs
Technical Outcomes / Value Achieved so far
• BEST PART – makes troubleshooting a lot easier
– Got a problem host? Log in to the Log Insight console, plug in the hostname, filter by name (HA event etc.), adjust the time interval, fix the problem. DONE
– No need to generate log bundles. VMware Support remote in, take a look at the Log Insight console, and away they go!
– Content packs (Vblock, SRM, SQL, etc.)
– Email alerts
– Centralized logging (including the potential to do SNMP trap forwarding and integration with vR Ops)
• Super slick, very responsive interface
Lessons Learnt
• Best to create a cluster for larger deployments (think HA and load distribution)
• Ensure QoS over remote links or keep tabs on utilization somehow
• Size them right!
• Should have deployed this sooner!
“Adjusted” to present situation
Log Management Platform
Log Management Platform (LMP)
• A platform for logs and events from all sources
– vSphere and beyond
• Why?
– Your environment is healthy. Sure. What do the logs say?
• Benefits
– No need to upload logs
– Assisted analysis via content packs
– Help in mastering the products you’re managing
– Long-term archival
– Portal for specific roles
– All the use cases we’ve covered.
Properties of Enterprise LMP
• Remote office handling
• Easy to create dashboards and easy to use
• Scalability
• Rich OOTB content packs
• HA
• DR
– Active/Active instances at the application layer. This saves DR testing, as it becomes irrelevant.
• Predictable cost
– Log volume can get out of hand, as apps can generate excessive logs when hitting errors/bugs
Active Active LMP
• Distributed
• Scalable
– Log beyond vSphere
• Special users
– Auditors
– Security Team
Auditor Log Insight
• The disk consumption is very low
• The screenshot shows that it holds only 4 million events from the past 22 days
• We can keep the data online for years
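A rough projection shows why years of online retention is plausible at this event rate. This is a minimal arithmetic sketch: the 4 million events in 22 days come from the slide, while the 100 bytes per event is an assumption (the average implied by the deck's "200 GB or 2B log events per day" figure). Index overhead and compression are ignored.

```python
# From the slide: 4 million events over the past 22 days.
events_observed = 4_000_000
days_observed = 22

# Assumption: about 100 bytes per event, implied by 200 GB / 2B events.
avg_event_bytes = 100
GB = 10**9  # assumption: decimal gigabytes

events_per_day = events_observed / days_observed
events_per_year = events_per_day * 365
gb_per_year = events_per_year * avg_event_bytes / GB

print(f"events per day:  {events_per_day:,.0f}")
print(f"events per year: {events_per_year:,.0f}")
print(f"raw data per year: {gb_per_year:.1f} GB")
```

Even if the true per-event size were several times larger, the yearly total would stay small next to a general-purpose instance.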
Distributed or Centralized
• Not something we change halfway. Decide carefully.
• They are opposites: the benefits of Distributed are the disadvantages of Centralized, and vice versa.
• Benefits of Distributed
– A lot more bandwidth friendly
– Smaller VM
  • Less performance impact on other VMs in the same cluster. Without scale-out architecture, the VM can be an 8-16 vCPU VM
  • Easier to back up and restore
– Less risky. A failure in 1 instance does not render all unavailable.
• Disadvantages of Distributed
– More Log Insight instances to manage (deploy, secure, update, upgrade)
– More storage and resources required
Remote (ROBO) Office
• Syslog is chewing up WAN link
• Log is lost if WAN Link is down
• Syslog is not encrypted
• Need to tag logs so we know which DC it comes from
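Two of these problems, lost logs during a WAN outage and missing origin tags, can be illustrated with a small store-and-forward buffer at the remote site. This is a minimal sketch, not how Log Insight implements forwarding: the `RemoteForwarder` class, the `send` callable, and the `datacenter` field name are all hypothetical, and encryption is left to whatever transport `send` wraps.

```python
from collections import deque


class RemoteForwarder:
    """Tags each event with its datacenter and buffers while the link is down."""

    def __init__(self, datacenter, send, max_buffered=10_000):
        self.datacenter = datacenter
        self.send = send  # hypothetical transport: returns True on success
        # Bounded buffer: when full, the oldest events are dropped first.
        self.buffer = deque(maxlen=max_buffered)

    def submit(self, text):
        self.buffer.append({"datacenter": self.datacenter, "text": text})
        self.flush()

    def flush(self):
        """Send buffered events in order; stop at the first failure."""
        while self.buffer:
            if not self.send(self.buffer[0]):
                return  # link is down, keep events for the next attempt
            self.buffer.popleft()


# Simulated WAN link, for illustration only.
link = {"up": False}
delivered = []


def send(event):
    if link["up"]:
        delivered.append(event)
    return link["up"]


fwd = RemoteForwarder("dc-east", send)
fwd.submit("host esx01 lost heartbeat")   # link down: buffered
link["up"] = True
fwd.submit("host esx01 reconnected")      # link up: both delivered, in order
print(delivered)
```

The bounded buffer trades completeness for predictable memory use, which fits a small remote-office VM.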
Operations: Support (GSS)
• Speed up resolution of SR
– VMware BCS/MCS can simply WebEx in and analyse the logs in Log Insight
– Analysis is sped up as Log Insight makes querying, charting, etc. easier
• Higher chance that the log has not been rotated
• Encourage self-service
– Since the WebEx session is done together with the customer engineer, this encourages knowledge transfer during the joint analysis. This speeds up future analysis and troubleshooting, as the customer engineer can set up alerts.
• Caveat
– GSS still requires logs to be uploaded, especially for non-MCS/BCS customers
Learn More
Try the Hands-on Lab. Nothing to download!
Visit the website for resources, 60-day free trial, evaluation guide, and purchasing information.
@VMLogInsight
Website: www.vmware.com/products/vrealize-log-insight
Hands-on Lab 1701, 1710: vmware.com/go/vRealize-Ops-Insight-HOL
Log Insight Community: loginsight.vmware.com/