Tips and Tricks with vCenter Log Insight (NEW!)

Michael White, VMware

VCM4528

#VCM4528


Problem: Operate and Troubleshoot a Complex System

VMware Logs

OS and App Logs

200 ESXi hosts + VMs = 200 GB or 2B log events per day

Physical Infrastructure Logs

For a specific example of this, think of an SRM install. It will have:

• Two vCenter Servers (vC)
• Two sets of hosts, each with one or more ESXi hosts
• Two SRM servers
• Perhaps vSphere Replication, which means more appliances: likely at least two, but more is possible
• At least one SRA at each site, but perhaps more
• At least two storage arrays
• Protected applications

Problems can occur in any layer, and symptoms can be seen in any layer. Searching for clues across the entire stack is much easier when you have one UI, and you can do things like search a time range across the stack!

This is the solution. But what is it?


Introducing VMware vCenter Log Insight

▪ VMware’s New Log Analytics Solution
• Make sense of all your log data
• Best for vSphere logs, extensible to OS, app, storage and networking device logs
• Easy-to-use virtual appliance
• Simple and predictable pricing model

▪ Key Use Cases
• IT Operations – Troubleshooting, Monitoring, Root Cause Analysis
• Security Monitoring, Compliance, Business Transaction Monitoring, …

▪ Available Now!
• 60-day Trial: www.vmware.com/try-vmware

Agenda

▪ Install
▪ Configure
▪ Reporters
▪ Tagging
▪ Content Pack
▪ Scalability
▪ Examples
▪ Demo
▪ Miscellaneous
▪ The End and Thank you!
▪ (Appendix)

There is a lot of material in the Appendix, and the notes field (where this text is written) sometimes holds important details.

Install Tidbits

▪ Use FQDN for name during deploy

▪ Before power on, add disk

▪ Add 100 GB to start and figure out what you need (we’ll help)

▪ Have at least one source configured before install

▪ No spelling checker in the Network info area – double check!

▪ Data-core should be what you added + 97GB – this is storage for events

Full details and a walkthrough of the install are in the Appendix. LI is a virtual appliance.

The screenshot above, taken after logging in at the console and running df -h (disk usage, human readable), shows 197G for data-core, which is where all events sent to vC Log Insight are stored. If you see 97 instead of 197, you didn’t add storage. Power off, check your disk settings, and restart. The disk change takes effect at start.

Configure

▪ Once installed, we need to configure for use

▪ Before you start configuring, change root password at console – this will enable SSH support.

Remember that full install info and screenshots are in the Appendix. The old password is blank. Note the odd password requirements.

Configure – Continued

▪ Now connect to the vC Log Insight URL

This only comes up the first time. If you need to configure again, look in the Admin menu.

Configure – Continued

This email address is not currently used for anything, but still point it at an operational distribution list, just in case.

Configure – Continued

Add your license and use the Set Key button

Remember that full install info and screenshots are in the Appendix. After you hit Set Key you will know whether the license key is good and for how long, or for how many devices. BTW, for a while we had a bug in LI licensing that meant we saw Site in the License Type field no matter what. I believe that was fixed in 1.5 TP 2. Now you should see OSI, and we do not enforce either.

Configure – Continued

This email address will get system messages, such as the notice sent one month before data rolls to archive or deletion. It is important to check, and leave checked, the option to send the weekly trace: it is anonymous and will help improve the product!

Configure – Continued

Time is critical to vC LI. It will do its best with what it gets, but it helps to have correct time first, everywhere! For western Canada, use the NTP servers 132.246.168.148, 128.100.56.135, 136.159.2.9.

Make sure to use the Test button!

Configure – Continued

You need this to work. System-generated messages and alerts will be sent out, so make sure to use the Send Test button.

Configure – Continued

Make sure to test. This account for vC Ops integration needs to be admin level, and I have had the best luck with the admin account. Don’t forget to push the Enable button. You will need to configure vC connections after you have finished configuring and can log in. We are working to improve that.

Configure – Continued

This allows you to use AD groups or users as your authentication source. Notice the odd format of the AD account? We are working to improve that. This account is not an admin-level account; it is in the Domain Users group in AD.

Configure – Continued

More info on this is ahead.


Configure – Continued

Almost finished, just need to restart.


Configure – Continued

Here we are. And now you see why I like to have a source working before I install / configure.

Configure – Continued

▪ Now we need to configure our connection to vC

▪ Log into LI, access the Administration area, and change to vSphere as seen below.

Make sure to test! Use a vC read-only account for vC. LI collects Events, Alarms, and Tasks from vC and can retain them longer than the vC database. For now, please connect no more than two vCs here. That limit will be increased in the future. It is a soft limit, and if you have a very beefy LI machine you likely will not have an issue if you add more.

The second option will configure your ESXi hosts. The little help popup will tell you a bit more. If you already have this configured, there is no need to worry about it.

Sources

▪ Whole stack is key!

▪ Storage – some easier than others

▪ Networking – Cisco, vCNS – both easy

▪ ESX(i) – easy

▪ vCenter (vC) – harder

▪ vCenter Server Appliance (vCSA) – easy but with a catch

▪ View – can send only events and nothing else – so treat like Windows vC

▪ Things to know
• Links in Appendix
• ESXi stops reporting when interrupted – needs attention

If ESXi loses its network connection to the syslog server, it stops sending syslog traffic. That is expected, but when the network comes back ESXi doesn’t start sending again. We are working on fixing this. In the meantime, a link in the Appendix describes how to detect this and alert, or execute a script to fix it. View can forward events but not the rest of the View logs or the PCoIP logs. We will improve that in the next several releases.
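As a rough illustration of spotting hosts that have gone quiet, here is a minimal Python sketch. It is not the vC alarm approach from the article linked in the Appendix; it only assumes you can obtain a last-event timestamp per host from somewhere. The function name, the host names, and the 15-minute threshold are all hypothetical.

```python
from datetime import datetime, timedelta


def silent_hosts(last_seen, now, max_silence=timedelta(minutes=15)):
    """Return the hosts whose most recent event is older than max_silence."""
    return sorted(host for host, ts in last_seen.items() if now - ts > max_silence)


# Hypothetical data: when each host last sent a syslog event.
now = datetime(2013, 8, 26, 12, 0, 0)
last_seen = {
    "esx01.example.com": now - timedelta(minutes=2),
    "esx02.example.com": now - timedelta(hours=3),
}

print(silent_hosts(last_seen, now))  # ['esx02.example.com']
```

A host on that list is a candidate for having stopped forwarding after an interruption, so it is worth checking before assuming the host is simply idle.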


Sources – Continued

▪ Things to know – Continued
• Windows is harder – need to use a forwarder – I use Datagram
• When using a forwarder, log location is key – check Appendix for locations

The screen above shows vC, but with the View log locations you could be monitoring View, or even VNX log files if you know where they are on the Control Station. You can use Snare or Cygwin, or do something with a script. I would recommend Datagram as seen here, or Snare / Epilog. Cygwin means installing a Linux-like environment on Windows, and while that may have advantages, I think it is a little overkill.

It should be noted that I currently use Datagram, but it is a dead product that is not being improved. Snare for Windows and Epilog have been improved (I think at my recommendation, in fact), but I have not tested them yet.

http://www.intersectalliance.com/projects/SnareWindows/

http://www.cygwin.com/

Sources – Continued

It is hard to see, but the highlighted field is the log file location. Suggest Settings will examine the log file and suggest how things should be configured.

Tagging

▪ Important for when you have one host or VM with many log files being sent to LI

▪ Doing a search will normally search all of the log files from a host

▪ If you use tagging, you can do a search on host AND tag, and assuming one tag per log file you can do a much more granular search which is quicker and more applicable

In the future VMware may do this for you in some of our products.


Tagging – Continued – No Tagging – on a vCSA

# vpxd source log
source vpxd {
    file("/var/log/vmware/vpx/vpxd.log" follow_freq(1) flags(no-parse));
    file("/var/log/vmware/vpx/vpxd-alert.log" follow_freq(1) flags(no-parse));
    file("/var/log/vmware/vpx/vws.log" follow_freq(1) flags(no-parse));
    file("/var/log/vmware/vpx/vmware-vpxd.log" follow_freq(1) flags(no-parse));
    file("/var/log/vmware/vpx/inventoryservice/ds.log" follow_freq(1) flags(no-parse));
};

# Remote Syslog Host
destination remote_syslog {
    udp("a.b.c.d" port(514));
};

# Log vCenter Server vpxd log remotely
log {
    source(vpxd);
    destination(remote_syslog);
};

So a search will cover all of the log files, even when you specify the host, as there is no way to separate searches or confine them to a specific file.

BTW, this goes at the bottom of the syslog-ng.conf file, which is in /etc/syslog-ng.conf, and you will need to run service syslog restart after you make the edits.

Tagging – Continued

So using the tags looks like:

▪ file("/var/log/vmware/vpx/vpxd.log" follow_freq(1) log_prefix("VC_APP: ") flags(no-parse));
▪ file("/var/log/vmware/vpx/vpxd-alert.log" follow_freq(1) log_prefix("VC_ALERT: ") flags(no-parse));
▪ file("/var/log/vmware/vpx/vws.log" follow_freq(1) log_prefix("VC_VWS: ") flags(no-parse));
▪ file("/var/log/vmware/vpx/vmware-vpxd.log" follow_freq(1) log_prefix("VC_VMW_VPX: ") flags(no-parse));
▪ file("/var/log/vmware/vpx/inventoryservice/ds.log" follow_freq(1) log_prefix("VC_IS: ") flags(no-parse));

So at the top are the updated lines for syslog-ng.conf, and at the bottom is a tail of the syslog file showing the events with the tags.

Thanks to William Lam for the screenshot. Check out his blog post on this here:

http://www.virtuallyghetto.com/2013/05/how-to-add-tag-log-prefix-to-syslog.html
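If you have many files to tag, typing each line by hand invites typos. The following Python sketch generates a source block in the same shape as the tagged lines on this slide. The helper name tagged_source is hypothetical, and the sketch only builds text; you would still paste the result into syslog-ng.conf and restart syslog yourself.

```python
def tagged_source(name, files):
    """Build a syslog-ng source block with one log_prefix tag per file."""
    lines = [f"source {name} {{"]
    for path, tag in files:
        lines.append(
            f'    file("{path}" follow_freq(1) log_prefix("{tag}: ") flags(no-parse));'
        )
    lines.append("};")
    return "\n".join(lines)


# Paths and tags taken from the slide above.
VPXD_FILES = [
    ("/var/log/vmware/vpx/vpxd.log", "VC_APP"),
    ("/var/log/vmware/vpx/vpxd-alert.log", "VC_ALERT"),
    ("/var/log/vmware/vpx/vws.log", "VC_VWS"),
    ("/var/log/vmware/vpx/vmware-vpxd.log", "VC_VMW_VPX"),
    ("/var/log/vmware/vpx/inventoryservice/ds.log", "VC_IS"),
]

print(tagged_source("vpxd", VPXD_FILES))
```

Keeping one tag per file is what makes the later host AND tag searches possible.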


Tagging – Continued – Normal

So searching with the tags looks like:

Note how we are searching a particular host that has many log files, and we are searching all of them except one for a particular condition. The value of this is that we can search for errors in all log files except one, or just search for errors in one log file. It can make things faster, and with fewer distractions.
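To make the idea concrete outside the UI, here is a small Python sketch of the same host-plus-tag filtering. It is not how Log Insight implements search. The events are invented, the host names are hypothetical, and each message is simplified so that it starts directly with its tag.

```python
def search(events, host, text="", include_tag=None, exclude_tag=None):
    """Return messages from one host that contain text, filtered by tag prefix."""
    hits = []
    for event_host, message in events:
        if event_host != host or text not in message:
            continue
        if include_tag and not message.startswith(include_tag + ":"):
            continue
        if exclude_tag and message.startswith(exclude_tag + ":"):
            continue
        hits.append(message)
    return hits


# Invented sample events: (host, tagged message).
events = [
    ("vcsa01", "VC_APP: error connecting to database"),
    ("vcsa01", "VC_VWS: error in web service"),
    ("vcsa01", "VC_IS: inventory sync complete"),
    ("esx01", "VC_APP: error on another host"),
]

# Errors in every log file on vcsa01 except the one tagged VC_VWS.
print(search(events, "vcsa01", text="error", exclude_tag="VC_VWS"))
# Errors in just the VC_VWS log file.
print(search(events, "vcsa01", text="error", include_tag="VC_VWS"))
```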


Content Packs

▪ A Content Pack provides best practices and knowledge about the logs

▪ It consists of: Queries, alerts, dashboards and field extractions

▪ VMware and our partners are working on Content Packs

▪ vSphere Content Pack
• Ships out of the box
• Knowledge about ESXi and vCenter Server logs as well as vC Alarms, Events & Tasks
• It consists of: Queries, alerts, dashboards and field extractions
• Divided into functional categories
• ESX, Storage and vCenter including Alarms
• vSphere and Content Pack dashboards are NOT editable – users can clone them into their workspace

When working in the Content Pack area, you can in fact export your own dashboard as a content pack. The whole thing is exported; there is no option to filter or take only a subset. Including alerts is quite cool. The vSphere Content Pack comes with two excellent alerts, so always check out a content pack via the Admin area to see what it has.

Content Packs – Continued

This is the vSphere Content Pack in the Content Packs area. Note that this screen is from a post-1.0 build. If you don’t see it in your Log Insight yet, you will soon!

Content Packs – Continued

This is the vSphere Content Pack in the dashboard area, ready for use.


Announcing the Log Insight Content Pack Market Place

Extend vCenter Log Insight with Content Packs from:

[Vendor logos] And more…

https://solutionexchange.vmware.com/store/loginsight

BTW, some of the vendors on this list (View and Puppet) had an issue in their Content Pack for a short time. So if you cannot import them, download them again.

Scalability – Guidelines

▪ Watch ‘outside’ of VM with your normal tools, e.g. vC Operations Manager

▪ Watch ‘inside’ of vC LI with System Monitor

The virtual appliance ships with 2 vCPU and 8 GB of memory.

IOPS are query driven: the more queries you issue against the system, the more IOPS you will need. Please note that alerts count as queries when enabled.

Scalability – Guidelines – Continued

Notice Storage, Memory and CPU here. You should watch from the outside using vC or vC Operations Manager, and from the inside with these tools.

Scalability – Guidelines – Continued

Watch for dropped events here. They can indicate a very busy LI VM or even network issues, but they are often the sign of an issue that is not seen in many places yet. Note that this will only show dropped TCP packets. UDP packets can be dropped before they even get to LI.

Scalability – Guidelines – Storage

▪ In case we misjudge on storage, enable Data Archiving

▪ Remember that events, once in vC LI, are rotated out as usable disk space runs low – either to trash or to Data Archiving (system alert) – first in, first out

▪ If you have to import archived events, then use a new instance of LI!

▪ Rough guide – 250 MB per day per ESX host, and 50 MB per day for other devices – retention time is decided by available storage and archiving
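The rough guide above turns into a quick back-of-the-envelope calculation. This Python sketch assumes 1 GB = 1000 MB and uses a hypothetical environment of 10 ESX hosts and 20 other devices writing to the 197 GB data-core from the install slide. The function names are made up for illustration.

```python
ESX_MB_PER_DAY = 250    # rough guide per ESX host, from the slide
OTHER_MB_PER_DAY = 50   # rough guide per other device, from the slide


def daily_mb(esx_hosts, other_devices):
    """Estimated log volume per day in MB."""
    return esx_hosts * ESX_MB_PER_DAY + other_devices * OTHER_MB_PER_DAY


def retention_days(storage_gb, esx_hosts, other_devices):
    """Estimated days of events that fit in storage_gb (assuming 1 GB = 1000 MB)."""
    return storage_gb * 1000 / daily_mb(esx_hosts, other_devices)


print(daily_mb(10, 20))                        # 3500
print(round(retention_days(197, 10, 20), 1))   # 56.3
```

Treat the result as a starting point only, since log volume increases during failures.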


Scalability – Guidelines – Storage

▪ You can enable Data Archiving on the Storage window in Administration. Once enabled, you will be alerted when archiving is about to occur. At that time you can add disk, or not!

▪ Archiving occurs in 1 GB chunks, but the events remain in LI. Once storage is constrained, events are retired and are gone from LI, but the archived copy is still there.

The screenshot above shows configuring storage during post-install configuration, but the screen is very similar to what you see in the Administration area.

LI will alert before events are deleted or retired. However, once events are archived, there are no further alerts until storage is actually full. No warning alerts. Important note: if you restart LI, that is an interruption for ESXi v5.x hosts and they will stop logging to LI. The Appendix has a link to a blog article that can help you set up a vC alert for this, and in fact restart the logging on the ESXi hosts automatically via a script.

How Much Disk Space for 30 Days Retention?

▪ Gross estimate: 267 bytes/message

▪ This example: 23*267*60*60*24*30 = ~16GB per 30 days

▪ More accurate estimations can be found in runtime.log

▪ During failures, log volume will increase significantly
• Overprovision!

Runtime.log will show the estimated event size. Multiply that by the number of events per second, then by seconds per minute, minutes per hour, hours per day, and days per month. You get a very large number, but that is roughly what you will accumulate over a 30-day month.
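The slide's arithmetic can be reproduced in a few lines of Python. This sketch reads the 23 in the example as events per second, which is what the notes imply, and the function name is hypothetical.

```python
def bytes_for_retention(events_per_second, bytes_per_event=267, days=30):
    """Estimated bytes accumulated over the retention period."""
    seconds = 60 * 60 * 24 * days
    return events_per_second * bytes_per_event * seconds


total = bytes_for_retention(23)
print(total)                  # 15917472000
print(round(total / 1e9, 1))  # 15.9, i.e. roughly 16 GB per 30 days
```

Swap in the event size from runtime.log and your own event rate to get a number for your environment, then overprovision.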


Examples – Bad Credentials

It turns out that the host 10.140.50.122 is trying to log in to the vCenter Server EVERY FEW SECONDS using the credentials MGMT\ADMINISTRATOR, and the login is failing due to bad credentials. The password had been changed and not updated on the solution (in this case the vC Ops adapter).

Examples – High Latency by Host

This illustration shows that high latency exists. Here we are looking at high latency by host, and you can see when the peak times are.

Miscellaneous

▪ Support Log
• UI – On the Appliance page of Settings Administration
• CLI – log in on console and execute loginsight-support
• With every support call!

▪ Backup
• VDP, VDPA, etc.
• Image

▪ vC Ops
• Launch in Context

For vC Ops, remember that the most exciting part is launch in context; reporting alerts in vC Ops is not all there yet. Make sure you test vC Ops and vC integration during configuration, but you can also do it in the Admin menu under VMware Integration. Backup is, or should be, done at the VM level using tools like VDP, VDPA, Veeam, etc. Back up the whole VM and restore the whole VM.


Miscellaneous – Continued – Alerts

• vC Ops option requires the integration enabled and email requires SMTP

• User alerts are different from system alerts

• The admin cannot disable individual alerts

When a user receives an alert, the email message contains a link back to the query that caused the alert! Raise an alert has a number of choices, and one of them will depend on the query type. The vSphere Content Pack comes with several cool alerts, for example when a vC server stops logging, or red alarms in vC.

Miscellaneous – Continued

▪ Upgrades / Updates
• Will be a short outage
• Can update / upgrade in the UI – Administration \ Appliance area
• In-place which makes it easy
• Can also upgrade in CLI - get .rpm same place on vmware.com you got .ova
• SCP update file to LI in /tmp and execute with rpm -Uvh file_name
• Then test and check Settings \ About for new version – does it match?

The version / build in the RPM should match what is seen in Settings \ About when you are done. Note the use of -Uvh and NOT the expected -ivh. You must use -Uvh; if you don’t, you will get errors about paths. After those errors it is OK to run the -Uvh and continue. If you get an error 500 after the upgrade, then restart the VM. This is not normally necessary.

Miscellaneous – Continued

▪ Fixing IP issues
• Not too hard but tricky – is best to get it right!
• Install again correctly is great choice
• vApp modifications is other choice – make sure VM is off, and then Edit Settings – vApp Options
• Not aware of any other safe alternatives!

Doing it right the first time is key, but the vApp options work too. The best thing is to use an FQDN and get the IP info right!

Demo

Important notes: in Interactive Analytics, using Custom Time Range, last week / day / hour, yesterday, day before, and now all work. You can use mw* but not *mw, and mw? but not ?mw. I will show these off in a UI tour: Interactive Analytics and a peek at the vSphere Content Pack, along with alerts and Health.
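The wildcard rule in the notes is easy to check before you run a search. This Python sketch encodes only what the notes say: a term may not start with * or ?. It makes no claim about other wildcard positions, and the function name is hypothetical.

```python
def is_supported_term(term):
    """True if the search term does not begin with a * or ? wildcard."""
    return bool(term) and term[0] not in "*?"


for term in ["mw*", "*mw", "mw?", "?mw"]:
    print(term, is_supported_term(term))
```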


Summary

▪ Source(s) working first

▪ Add disk at the beginning to avoid outages

▪ Ensure SMTP / vC / vC Operations connections good!

▪ Set a good system email address destination

▪ Monitor disk / processor carefully at first

▪ Use Data Archiving

▪ Most important – make sure your entire stack is reporting

▪ Update as often as you can!


Other VMware Activities Related to This Session

▪ HOL: VMware Log Insight

▪ VMware Booth: VMware Cloud Operations

▪ Breakout Session: Deep Dive, VCM4445 – check schedule

▪ 5 Free License Trial available when you follow @vmLogInsight

▪ HOL:HOL-SDC-1301 VMware vCenter Log Insight - Unchained from the Allegory


THANK YOU


Appendix

▪ Links
• Configuring Remote Syslog on VMware products - http://sflanders.net/2013/06/24/configure-remote-syslog-on-vmware-products/

• Datagram - http://www.syslogserver.com/syslogagent.html

• Release notes - http://www.vmware.com/support/log-insight/doc/log-insight-10-release-notes.html

• NetApp syslog - https://communities.netapp.com/docs/DOC-5048

• vCloud Suite - http://www.virtuallyghetto.com/2013/06/forwarding-logs-from-vcloud-suite-to.html - includes a helper script, which also covers tagging!

• ESXi, syslog, and logins – great blog about how to capture logins – of different types in ESXi - http://blogs.vmware.com/vsphere/2013/07/capturing-logins-to-esxi-by-a-root-account.html

• Symmetrix - http://codyhosterman.com/2013/07/10/using-vmwares-vcenter-log-insight-with-symmetrix-vmax/

I use Datagram to forward Windows text files or event logs to syslog.


Appendix – Continued

▪ Links – Continued
• Detecting stopped ESXi syslog forwarding - http://www.virtuallyghetto.com/2012/07/detecting-esxi-remote-syslog-connection.html - important, and I suggest using script option

• VM Monitoring log forwarding - http://www.virtuallyghetto.com/2013/07/a-hidden-vsphere-51-gem-forwarding_10.html

• Install and Admin Guide - http://www.vmware.com/pdf/log-insight-10-install-admin-guide.pdf

• Users Guide - http://www.vmware.com/pdf/log-insight-10-users-guide.pdf

• Security Guide - http://www.vmware.com/pdf/log-insight-10-security-guide.pdf

• Sample for firewall - http://www.virtualclouds.co.za/?p=740

• Sending Alerts to vC Ops - http://www.virtualclouds.co.za/?p=771

• Location of log files for VMware products – http://kb.vmware.com/kb/1021806

• LI community - http://loginsight.vmware.com

• Try it out - http://www.vmware.com/go/try-log-insight

The Detecting stopped ESXi syslog forwarding article is important to know. It talks about how to alert for it, but also how to trigger a script that restarts the ESXi syslog which is great.


Appendix – Continued

▪ Links – Continued
• http://sflanders.net/2013/09/23/log-insight-remote-syslog-architectures/

• http://sflanders.net/2013/11/04/sending-netflow-syslog/

• http://sflanders.net/2013/10/25/syslog-agents-windows/

• http://sflanders.net/2013/10/22/syslog-agents-linux/

• http://sflanders.net/2013/11/07/managing-fields-log-insight/


Architecture Overview: Log Insight Deployment Option 1

Considerations:
• Good for log management greenfield
• Less flexible, as syslog-ng can split the logs into multiple destinations (e.g. one to syslog, one to local disk) but LI cannot. Some senders might still be able to split reporting
• One UI for everything!
• Easy

[Diagram: No syslog-ng/rsyslog. ESXi #1, ESXi #2 … ESXi #n send directly to Log Insight; Windows uses Epilog or Datagram Syslog Agent for file-to-syslog.]

http://sflanders.net/2013/09/23/log-insight-remote-syslog-architectures/

Architecture Overview: Log Insight Deployment Option 2

Considerations:
• Requires managing another syslog server
• More flexible, as syslog-ng can split the logs into multiple destinations (e.g. one to syslog, one to local disk)
• For large installations can be more scalable, as you can have multiple levels of rollups (e.g. one for each “pod” or datacenter)

[Diagram: Using a syslog-ng/rsyslog relay. ESXi #1, ESXi #2 … ESXi #n send to a syslog relay, which forwards to Log Insight; Windows uses Epilog or Datagram Syslog Agent for file-to-syslog.]

http://sflanders.net/2013/09/23/log-insight-remote-syslog-architectures/

Appendix – Continued

▪ Install Outline

▪ Working in vSphere Web Client


Appendix – Continued

▪ Install Outline – Continued

▪ vSphere Web Client doesn’t show .ova files by default (it filters on .ovf), so you need to switch the file type to see it – should be different soon – maybe!

Appendix – Continued

▪ Install Outline – Continued

▪ Most Important – use fully qualified domain name!

Very important. In fact, do it for all virtual appliances from now on. It helps with a variety of things, including joining the vCSA to AD, which may not work if you did not use an FQDN.

Appendix – Continued

▪ Install Outline – Continued

▪ Make sure to have enough space for now, and room to grow!


Appendix – Continued

▪ Install Outline – Continued

▪ No spelling checker here – get it all right!!

It is important to get the right info here, as it is hard to fix. You get odd results if you put wrong info in the IP fields.

Appendix – Continued

▪ Install Outline – Continued

▪ No power on, as we need to adjust disk to start


Appendix – Continued

▪ Install Outline – Continued

New disk: 100 GB to start. That is a good amount to add, as it should give you enough time to figure out what you really need.

Appendix – Continued

▪ Install Outline - Continued


Appendix – Continued

▪ Install Outline – finished!

Note the info for different kinds of keyboards!
