VMware Advanced Troubleshooting Workshop - Day 2
TRANSCRIPT
Introduction to vSphere Networking
Day 2
VMware vSphere: Install, Configure, Manage
Content
• Virtual Networking
• Introduction to vSphere Standard Switches
• Troubleshooting of vSS
• Scenarios
• Introduction to vSphere Distributed Switches
• Troubleshooting of vDS
• NSX
Introduction to vSphere Standard Switches
Learner Objectives
By the end of this lesson, you should be able to meet the following objectives:
• Describe the virtual switch connection types
• Configure and view standard switch configurations, such as virtual machine port group, VMkernel port, VLAN, and so on
About Virtual Networks
A virtual network is a network of virtual machines running on a physical machine that are connected logically to one another so that they can send data to and receive data from one another. Virtual machines can be connected to the virtual networks that you create when you add a network.
Types of Virtual Switch Connections
A virtual switch has specific connection types:
• Virtual machine port groups
• VMkernel port:
– For IP storage, VMware vSphere® vMotion® migration, VMware vSphere® Fault Tolerance, VMware Virtual SAN™, and VMware vSphere® Replication™
– For the ESXi management network
[Diagram: a virtual switch with virtual machine port groups (Production, TestDev, DMZ), VMkernel ports (vSphere vMotion, Management), and uplink ports.]
Virtual Switch Connection Examples
More than one network can coexist on the same virtual switch. Or networks can exist on separate virtual switches.
[Diagram: the Production, TestDev, iSCSI, vSphere vMotion, and Management networks on a single virtual switch, compared with the same five networks on five separate virtual switches.]
Types of Virtual Switches
A virtual network supports these types of virtual switches:
• Standard switches:
– Virtual switch configuration for a single host
• Distributed switches:
– Virtual switches that provide a consistent network configuration for virtual machines as they migrate across multiple hosts
Standard Switch Components
A standard switch provides connections for virtual machines to communicate with one another, whether they are on the same host or on different hosts.
[Diagram: a standard switch with VM1, VM2, VM3, and a VMkernel port attached through virtual NICs. Labels: Test VLAN 101, Production VLAN 102, IP Storage VLAN 103, Management VLAN 104, Management Network, IP storage.]
Viewing the Standard Switch Configuration
You can view a host’s standard switch configuration by clicking Networking on the Manage tab.
Delete the port group.
Display Cisco Discovery Protocol information.
Display port group properties.
About VLANs
Virtual Local Area Networks (VLANs) allow you to have many virtual networks running over a single physical network.
VLANs use a standard format, IEEE 802.1Q, to “tag” Ethernet frames. This information in the header tells the network device which VLAN the frame belongs to.
Although VLANs logically separate traffic, someone who has access to a network segment can see all traffic on all VLANs in that segment.
[Diagram: virtual machines on VLAN 105 and VLAN 106 and a VMkernel port on a virtual switch, connected through a physical NIC to a trunk port on the physical switch.]
VLAN RECOMMENDATIONS
• The number of VLANs you can use is often dictated by your networking equipment.
• VLAN 0 is reserved.
• VLAN 1 is usually the default, but it is suggested not to use it unless necessary.
• Most equipment can go over 4,000 VLANs, but you may not be able to use them all at once.
• Make sure you know which (if any) VLAN on the physical uplink switch is set as the native VLAN. Native VLAN frames are sent out untagged, which can cause problems.
• It is suggested to trunk only the VLANs you actually need. This does require more administration when you need to pass an additional VLAN.
VLAN TAGGING
There are three places where a frame can be tagged with a VLAN.
External Switch Tagging
• All VLAN tagging of packets is performed on the physical switch.
• ESXi/ESX host network adapters are connected to access ports on the physical switch.
• The port groups connected to the virtual switch must have their VLAN ID set to 0.
Virtual Switch Tagging
• All VLAN tagging of packets is performed by the virtual switch before leaving the ESXi/ESX host.
• The ESXi/ESX host network adapters must be connected to trunk ports on the physical switch.
• The port groups connected to the virtual switch must have an appropriate VLAN ID specified.
Virtual Guest Tagging
• All VLAN tagging is performed by the virtual machine.
• You must install an 802.1Q VLAN trunking driver inside the virtual machine.
• VLAN tags are preserved between the virtual machine networking stack and external switch when frames are passed to/from virtual switches.
• Physical switch ports are set to trunk port.
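As a rough illustration, the three tagging modes can be told apart by the VLAN ID set on a standard switch port group. The Python sketch below assumes the common vSphere convention that VLAN ID 4095 selects Virtual Guest Tagging, which the slides do not state; the function name is hypothetical.

```python
def tagging_mode(vlan_id: int) -> str:
    """Classify a standard switch port group VLAN ID into a tagging mode."""
    if not 0 <= vlan_id <= 4095:
        raise ValueError("VLAN ID must be between 0 and 4095")
    if vlan_id == 0:
        # The physical switch does the tagging; the port group sees untagged frames.
        return "External Switch Tagging"
    if vlan_id == 4095:
        # Assumption: 4095 passes guest-applied tags through unchanged.
        return "Virtual Guest Tagging"
    # The virtual switch adds the tag on egress and strips it on ingress.
    return "Virtual Switch Tagging"


for vid in (0, 105, 4095):
    print(vid, "->", tagging_mode(vid))
```

A check like this can help when auditing port group settings against the physical switch port mode (access or trunk).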
VLAN Header
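The IEEE 802.1Q tag is a 4-byte field inserted into the Ethernet header: a 16-bit tag protocol identifier (0x8100) followed by 16 bits of tag control information (3-bit priority, 1-bit drop eligible indicator, 12-bit VLAN ID). The Python sketch below packs and unpacks that field; the function names are illustrative and not part of any vSphere tool.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q-tagged frame


def build_vlan_tag(vlan_id: int, priority: int = 0, dei: int = 0) -> bytes:
    """Pack the 4-byte 802.1Q tag: 16-bit TPID followed by 16-bit TCI."""
    if not 0 <= vlan_id <= 0xFFF:
        raise ValueError("VLAN ID is a 12-bit field (0-4095)")
    if not 0 <= priority <= 7 or dei not in (0, 1):
        raise ValueError("priority is 3 bits and DEI is 1 bit")
    tci = (priority << 13) | (dei << 12) | vlan_id
    return struct.pack("!HH", TPID_8021Q, tci)


def parse_vlan_tag(tag: bytes) -> dict:
    """Unpack a 4-byte 802.1Q tag into its fields."""
    tpid, tci = struct.unpack("!HH", tag)
    if tpid != TPID_8021Q:
        raise ValueError("not an 802.1Q tag")
    return {"priority": tci >> 13, "dei": (tci >> 12) & 1, "vlan_id": tci & 0xFFF}


tag = build_vlan_tag(105, priority=3)
print(tag.hex())
print(parse_vlan_tag(tag))
```

Because the VLAN ID is only 12 bits wide, there are at most 4096 possible values, which is why equipment tops out just above 4,000 VLANs.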
Network Adapter Properties
A physical adapter can become a bottleneck for network traffic if the adapter speed does not match application requirements.
Review of Learner Objectives
You should be able to meet the following objectives:
• Describe the virtual switch connection types
• Configure and view standard switch configurations, such as virtual machine port group, VMkernel port, VLAN, and so on
Configuring Standard Switch Policies
Learner Objectives
By the end of this lesson, you should be able to meet the following objectives:
• Explain how to set the security policies for a standard switch port group
• Explain how to set the traffic shaping policies for a standard switch port group
• Explain how to set the NIC teaming and failover policies for a standard switch port group
Network Switch and Port Policies
Policies that are set at the standard switch level apply to all port groups on the standard switch by default.
Available network policies:
• Security
• Traffic shaping
• NIC teaming and failover
Policies are defined at the following levels:
• Standard switch level:
– Default policies for all the ports on the standard switch.
• Port group level:
– Effective policies: Policies defined at this level override the default policies that are set at the standard switch level.
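The override relationship between the two levels can be pictured as a dictionary merge. This is only a conceptual sketch in Python: the policy keys and values are hypothetical and do not correspond to actual vSphere API property names.

```python
def effective_policy(switch_defaults: dict, portgroup_overrides: dict) -> dict:
    """Port group settings win; anything not overridden falls back to the switch."""
    return {**switch_defaults, **portgroup_overrides}


# Hypothetical switch-level defaults.
switch_defaults = {
    "promiscuous_mode": "Reject",
    "traffic_shaping": "Disabled",
    "load_balancing": "Originating virtual port ID",
}

# Hypothetical port group that overrides a single setting.
dmz_overrides = {"promiscuous_mode": "Accept"}

print(effective_policy(switch_defaults, dmz_overrides))
```

A port group with no overrides simply inherits every switch-level value.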
Configuring Security Policies
Administrators can define security policies at both the standard switch level and the port group level:
• Promiscuous mode: Allows a virtual switch or port group to forward all traffic regardless of the destination.
• MAC address changes: Accept or reject inbound traffic when the MAC address has been altered by the guest.
• Forged transmits: Accept or reject outbound traffic when the MAC address has been altered by the guest.
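One simplified way to think about the last two policies is as comparisons between MAC addresses. The Python model below is an assumption-laden sketch, not the VMkernel's logic: it treats "initial MAC" as the address configured for the virtual adapter and "effective MAC" as the address the guest is currently using, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SecurityPolicy:
    # True means Accept, False means Reject.
    mac_address_changes: bool = False
    forged_transmits: bool = False


def allow_inbound(policy: SecurityPolicy, initial_mac: str, effective_mac: str) -> bool:
    """MAC address changes: inbound traffic is affected when the guest altered its MAC."""
    unchanged = initial_mac.lower() == effective_mac.lower()
    return unchanged or policy.mac_address_changes


def allow_outbound(policy: SecurityPolicy, effective_mac: str, frame_src_mac: str) -> bool:
    """Forged transmits: outbound frames are affected when the source MAC differs."""
    matches = effective_mac.lower() == frame_src_mac.lower()
    return matches or policy.forged_transmits


reject_all = SecurityPolicy()
print(allow_inbound(reject_all, "00:50:56:aa:bb:01", "00:50:56:aa:bb:01"))
print(allow_inbound(reject_all, "00:50:56:aa:bb:01", "00:50:56:aa:bb:99"))
print(allow_outbound(reject_all, "00:50:56:aa:bb:01", "00:50:56:aa:bb:99"))
```

With both policies set to Reject, only traffic that uses the originally assigned address passes in this model.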
Traffic-Shaping Policy
Network traffic shaping is a mechanism for limiting a virtual machine’s consumption of available network bandwidth.
Average rate, peak rate, and burst size are configurable.
[Graph: outbound bandwidth over time, showing the average and peak bandwidth levels. Burst Size = Bandwidth x Time]
Configuring Traffic Shaping
A traffic-shaping policy is defined by average bandwidth, peak bandwidth, and burst size. You can establish a traffic-shaping policy for each port group and each distributed port or distributed port group:
• Traffic shaping is disabled by default.
• Parameters apply to each virtual NIC in the standard switch.
• On a standard switch, traffic shaping controls only outbound traffic.
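The relationship burst size = bandwidth x time can be rearranged to estimate how long a virtual NIC could transmit at the peak rate before the burst allowance is used up. The Python sketch below assumes bandwidth is entered in Kbit/s and burst size in KB; the example values are made up.

```python
def seconds_at_peak(burst_size_kb: float, peak_kbps: float) -> float:
    """Time = burst size / bandwidth, with the burst converted from KB to Kbit."""
    if peak_kbps <= 0:
        raise ValueError("peak bandwidth must be positive")
    return burst_size_kb * 8 / peak_kbps


# Hypothetical policy: 200,000 Kbit/s peak with a 102,400 KB burst size.
print(seconds_at_peak(102400, 200000))
```

This is a simplification: it ignores how the burst allowance is accumulated while traffic stays under the average rate.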
NIC Teaming and Failover Policies
Administrators can edit the NIC teaming and failover policy by configuring specific options.
Load-Balancing Method: Originating Virtual Port ID
The diagram shows routing based on the originating port ID, called virtual port ID load balancing.
[Diagram: virtual NICs, virtual switch, physical NICs, physical switch.]
Load-Balancing Method: Source MAC Hash
The diagram shows routing based on source MAC hash.
[Diagram: virtual NICs, virtual switch, physical NICs, physical switch, Internet.]
Load-Balancing Method: Source and Destination IP Hash
The diagram shows routing based on IP hash.
[Diagram: virtual NICs, virtual switch, physical NICs, physical switch, Internet.]
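The three load-balancing methods differ only in what they use as input when choosing an uplink. The Python sketch below is a deliberately simplified model (a modulo over the number of uplinks) and should not be read as the VMkernel's actual hashing; the function names and addresses are illustrative.

```python
import ipaddress


def by_port_id(port_id: int, uplinks: list) -> str:
    """Originating virtual port ID: the choice depends only on the virtual port."""
    return uplinks[port_id % len(uplinks)]


def by_source_mac(mac: str, uplinks: list) -> str:
    """Source MAC hash: the choice depends only on the virtual NIC's MAC address."""
    return uplinks[int(mac.replace(":", ""), 16) % len(uplinks)]


def by_ip_hash(src_ip: str, dst_ip: str, uplinks: list) -> str:
    """IP hash: the choice depends on both the source and the destination address."""
    src = int(ipaddress.ip_address(src_ip))
    dst = int(ipaddress.ip_address(dst_ip))
    return uplinks[(src ^ dst) % len(uplinks)]


uplinks = ["vmnic0", "vmnic1"]
print(by_port_id(7, uplinks))
print(by_source_mac("00:50:56:aa:bb:01", uplinks))
print(by_ip_hash("10.0.0.5", "10.0.0.20", uplinks))
print(by_ip_hash("10.0.0.5", "10.0.0.21", uplinks))
```

In this model, the first two methods pin a virtual NIC to a single uplink, while IP hash can spread one virtual NIC's traffic across uplinks when it talks to different destinations.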
Detecting and Handling Network Failure
The VMkernel can use link status or beaconing or both to detect a network failure.
Network failure is detected by the VMkernel, which monitors the link state and performs beacon probing.
VMkernel notifies physical switches of changes in the physical location of a MAC address.
Failover is implemented by the VMkernel based on configurable parameters:
• Failback: How the physical adapter is returned to active duty after recovering from failure.
• Load-balancing option: Use explicit failover order. Always use the vmnic uplink at the top of the active adapter list.
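Explicit failover order can be sketched as "walk the adapter lists from the top and take the first uplink whose link is up". The Python example below is a conceptual model with hypothetical adapter names; it assumes failback is enabled, so a recovered adapter immediately becomes the preferred choice again.

```python
def select_uplink(active: list, standby: list, link_up: dict):
    """Return the highest-priority uplink whose link is up, or None if all are down."""
    for nic in active + standby:
        if link_up.get(nic, False):
            return nic
    return None


active = ["vmnic0", "vmnic1"]
standby = ["vmnic2"]

print(select_uplink(active, standby, {"vmnic0": True, "vmnic1": True, "vmnic2": True}))
print(select_uplink(active, standby, {"vmnic0": False, "vmnic1": True, "vmnic2": True}))
print(select_uplink(active, standby, {"vmnic0": False, "vmnic1": False, "vmnic2": True}))
```

Standby adapters are used only after every active adapter has failed.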
SR-IOV (Single Root IO Virtualization)
SR-IOV is a specification that allows a single Peripheral Component Interconnect Express (PCIe) physical device under a single root port to appear as multiple separate physical devices to the hypervisor or the guest operating system.
SR-IOV uses physical functions (PFs) and virtual functions (VFs) to manage global functions for the SR-IOV devices. PFs are full PCIe functions that are capable of configuring and managing the SR-IOV functionality. It is possible to configure or control PCIe devices using PFs, and the PF has full ability to move data in and out of the device. VFs are lightweight PCIe functions that support data flow but have a restricted set of configuration resources.
The number of virtual functions provided to the hypervisor or the guest operating system depends on the device. SR-IOV enabled PCIe devices require appropriate BIOS and hardware support, as well as SR-IOV support in the guest operating system driver or hypervisor instance.
Switch Discovery Protocol
Switch discovery protocols help vSphere administrators to determine which port of the physical switch is connected to a vSphere standard switch or vSphere distributed switch.
vSphere 5.0 and later supports Cisco Discovery Protocol (CDP) and Link Layer Discovery Protocol (LLDP). CDP is available for vSphere standard switches and vSphere distributed switches connected to Cisco physical switches. LLDP is available for vSphere distributed switches version 5.0.0 and later.
When CDP or LLDP is enabled for a particular vSphere distributed switch or vSphere standard switch, you can view properties of the peer physical switch such as device ID, software version, and timeout from the vSphere Web Client.
CDP
Cisco Discovery Protocol (CDP) enables vSphere administrators to determine which port of a physical Cisco switch connects to a vSphere Standard Switch or vSphere Distributed Switch. When CDP is enabled for a vSphere Distributed Switch, you can view the properties of the Cisco switch such as device ID, software version, and timeout.
LLDP
The Link Layer Discovery Protocol (LLDP) is a vendor-neutral link layer protocol in the Internet Protocol Suite used by network devices for advertising their identity, capabilities, and neighbors on an IEEE 802 local area network, principally wired Ethernet.
Jumbo Frames
Jumbo frames let ESXi hosts send larger frames out onto the physical network. The network must support jumbo frames end to end, including physical network adapters, physical switches, and storage devices.
Before enabling jumbo frames, check with your hardware vendor to ensure that your physical network adapter supports jumbo frames. You can enable jumbo frames on a vSphere distributed switch or vSphere standard switch by changing the maximum transmission unit (MTU) to a value greater than 1500 bytes. 9000 bytes is the maximum frame size that you can configure.
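A common way to confirm that jumbo frames work end to end is to send a ping that fills the MTU with fragmentation disabled. The Python sketch below computes the largest ICMP payload for a given MTU by subtracting the IPv4 header (20 bytes, assuming no IP options) and the ICMP header (8 bytes), then prints a matching vmkping command line. The target placeholder is intentionally left unfilled.

```python
IPV4_HEADER_BYTES = 20  # assumes an IPv4 header without options
ICMP_HEADER_BYTES = 8


def max_ping_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented packet."""
    payload = mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES
    if payload < 0:
        raise ValueError("MTU is too small to carry an ICMP echo request")
    return payload


for mtu in (1500, 9000):
    size = max_ping_payload(mtu)
    # -d sets the don't-fragment bit, -s sets the payload size.
    print(f"MTU {mtu}: vmkping -d -s {size} <target-ip>")
```

If the ping with the larger size fails while the smaller one succeeds, some device in the path is likely not passing jumbo frames.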
Review of Standard Switch
If a virtual machine loses network connectivity, the cause of the problem might be anywhere from the virtual machine’s NIC to the ESXi host’s physical network.
[Diagram: three virtual NICs and the ESXi host management network connected through a standard switch to the physical NICs vmnic0 and vmnic1.]
Review of Learner Objectives
You should be able to meet the following objectives:
• Explain how to set the security policies for a standard switch port group
• Explain how to set the traffic shaping policies for a standard switch port group
• Explain how to set the NIC teaming and failover policies for a standard switch port group
Network Troubleshooting
ESXCLI Command
To troubleshoot networking configurations from the ESXi command line, use ESXCLI, either in the ESXi Shell or over an SSH session with a client such as PuTTY.
There are a number of options available when running ‘esxcli’ in terms of network settings:
~ esxcli network
Netcat Command
Netcat can be used to test connectivity to and from your ESXi host.
~ nc -h
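A TCP connectivity test of the kind netcat performs boils down to attempting a connection and reporting whether it succeeded. The Python sketch below shows that idea with the standard socket module; it is an illustration of the technique, not a replacement for nc on the host. To keep the example self-contained, it tests against a temporary listener on the local machine rather than a real ESXi or vCenter address.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Temporary local listener so the example does not depend on any real host.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]

print(port_is_open("127.0.0.1", port))  # listener is up
listener.close()
print(port_is_open("127.0.0.1", port))  # listener is gone
```

A refused or timed-out connection only shows that the port is unreachable from this source; it does not say whether a firewall, routing, or the service itself is the cause.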
VMKPING Command
You can test connectivity to a remote ESXi host using the ping and vmkping utilities. Using vmkping to test connectivity via vMotion interfaces is a common practice. For example:
~ vmkping 192.168.1.20
OpenSSL Command
You can use the OpenSSL client present on an ESXi host to test connectivity to an SSL port, for example to vCenter or to another host. To do so:
~ openssl s_client -connect 192.168.1.100:443
TCPDUMP Command
This command is used to identify the packet flow on a NIC. To display packets on interface vmk0 you can run:
~ tcpdump-uw -i vmk0 | more
ESXCFG Command
The esxcfg-nics command provides information about the physical NICs in use by the VMkernel.
This prints the VMkernel name for the NIC, its PCI ID, driver, link state, speed, duplex, and a short PCI description of the card. It also allows users to set speed and duplex settings for a specific NIC.
~ esxcfg-nics <options> [nic]
PKTCAP-UW Tool
The pktcap-uw tool is an enhanced packet capture and analysis tool that can be used in place of the legacy tcpdump-uw tool. The pktcap-uw tool is included by default in ESXi 5.5.
vMA
The vSphere Management Assistant (vMA) allows administrators and developers to run scripts and agents to manage ESXi hosts and vCenter Server systems. vMA is a virtual machine that includes prepackaged software, a logging component, and an authentication component that supports non-interactive login.
As an alternative to esxcli, you can also use the vicfg-dns command from the vMA or vSphere CLI. Running the command without any parameters will display a host’s DNS configuration:
vi-admin@vma:~> vifptarget --set 192.168.88.134
vi-admin@vma:~[192.168.88.134]> vicfg-dns
DNS Configuration
Host Name esxi1
Domain Name vmlab.local
DHCP false
DNS Servers 10.0.0.1
Network Problem 1
The ESXi host has intermittent or no network connectivity to other systems.
As an initial check from VMware vSphere® ESXi™ Shell, ping a system that is known to be up and accessible by the ESXi host.
[Screenshot: DCUI command prompt.]
Identifying Possible Causes
If you know that your hardware is functioning correctly, take the top-down approach to troubleshooting, starting with the ESXi host configuration.
Possible Causes
ESXi host:
• The ESXi host network configuration is incorrect.
• The VLAN ID of the port group is incorrect.
• The speed and duplex of the network links are not consistent.
• The network link is down.
• NIC teaming is not configured properly.
Hardware (network, server):
• The network adapter or server hardware is not supported.
• The physical hardware is faulty or misconfigured.
• Network performance is slow.
Possible Cause: ESXi Network Misconfiguration (1)
Verify that your ESXi host network is configured properly:
• Check vSphere standard switches, vmnics, port groups, and VMkernel ports:
– In VMware vSphere® Management Assistant, use vicfg-vswitch -l
– In vSphere ESXi Shell, use esxcfg-vswitch -l and esxcfg-vmknic -l
• Check VLAN IDs of port groups:
– esxcli network vswitch standard portgroup list
Possible Cause: ESXi Network Misconfiguration (2)
Verify that your ESXi host network is configured properly:
• Speed and duplex:
– vicfg-nics -l
• Network uplink and NIC status (up or down):
– esxcfg-nics -l
– vicfg-nics -l
– esxcli network nic list
Resolving ESXi Network Misconfiguration
Adjust settings in your ESXi network configuration that are not configured properly:
• Standard switches, vmnics, port groups:
– Add standard switch: vicfg-vswitch -a vswitch#
– Add port group: vicfg-vswitch -A pg_name vswitch#
– Add uplink: vicfg-vswitch -L vmnic# vswitch#
• VLAN IDs of port groups:
– esxcli network vswitch standard portgroup set -p pg_name -v vlan_ID
• Speed and duplex:
– vicfg-nics -d duplex -s speed vmnic#
• Network link status (up or down):
– Connect network adapters to the intended physical switch ports.
Possible Cause: NIC Teaming Misconfiguration
Verify that NIC teaming is configured properly.
Possible Cause: Unsupported or Faulty Hardware
Verify that you are not encountering the following ESXi network hardware issues:
• The network adapter or server hardware is not supported:
– vicfg-nics -l
– Verify that the network hardware is listed in VMware Compatibility Guide.
• The physical hardware is faulty or misconfigured:
– lspci -p
Possible Cause: Slow Network Performance
Use esxtop (or resxtop) to view key network metrics that can help identify network performance problems.
The esxtop command is available in vSphere ESXi Shell, and the resxtop command is available in VMware vSphere® Command-Line Interface.
Sample resxtop Output
Review of Virtual Machine Connectivity
If your virtual machine loses network connectivity, the cause of the problem might be in the physical layer, the virtual layer, or the guest operating system itself.
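[Diagram: the layers between a virtual machine and the physical network: application, guest OS, and firewall inside the virtual machine; virtual NIC; port groups, virtual switch, and uplink ports; physical NICs.]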
Network Problem 2
The virtual machine has no network connectivity.
As an initial check, ping the virtual machine from another system.
If the ping command fails, ping other virtual machines on the same network to determine the scope of the problem.
Identifying Possible Causes
Take a top-down approach to troubleshooting, from the guest operating system to the virtual machine and the ESXi host.
Possible Causes
Application or guest OS:
• IP settings are misconfigured.
• The firewall in the guest OS is blocking traffic.
Virtual machine:
• The port group name does not exist.
• The virtual network adapter is not connected.
ESXi host:
• Underlying issues with ESXi network connectivity exist.
• Storage or resource contention on the ESXi host exists.
Possible Cause: IP Settings and Firewall Problems
IP settings and problems with firewalls might cause the problem.
Check IP settings to ensure that the TCP/IP settings in the guest operating system are correct.
The firewall in the guest operating system might be blocking traffic. Ensure that the firewall does not block required ports.
Possible Cause: Port Group Misconfiguration
The port group name that the virtual machine uses is incorrect:
• View the standard switch port group names on the ESXi host:
– vicfg-vswitch -l
• Verify that the virtual machine is using the correct port group.
The virtual network adapter is not connected to the port group:
• Verify that the network adapter is connected to the correct port group.
Possible Cause: ESXi Network Connectivity Problems
Storage or resource contention on the ESXi host can cause network connectivity issues:
• Ensure that the virtual machine has no underlying issues with storage and that it is not in resource contention.
Problems might exist with the ESXi host network, the port group ID, the speed or duplex settings, the physical network link, or the NIC teaming configuration.
To eliminate a NIC failure or physical configuration issue, connect the virtual machine to a virtual switch that uses NIC teaming.
Possible Cause: No Available Ports on Virtual Switch
If your vSphere version is earlier than version 5.5, you might encounter a problem. The virtual switch might not have an available port for the virtual machine to connect to:
• This situation can occur during a VMware vSphere® vMotion® migration.
Verify the number of configured ports:
The vicfg-vswitch command also shows the number of used ports.
Resolving the Issue of Unavailable Ports on a Virtual Switch
If the virtual switch does not have available ports for virtual machines to connect to, resolve the issue in one of the following ways:
• Increase the number of ports for the virtual switch and reboot the host to make the changes effective.
• Create a new virtual switch and spread the virtual machines and port groups across the two switches.
This problem is not relevant to vSphere version 5.5 and above.
Network Problem 3
An ESXi host frequently disconnects from VMware vCenter Server™.
Another symptom is that the ESXi host is successfully added to the vCenter Server inventory but disconnects 30 to 90 seconds after the task completes.
The problem is that heartbeat packets between vCenter Server and the ESXi host are being dropped, blocked, or lost.
Heartbeat Communication Between vCenter Server and ESXi
The ESXi host sends a heartbeat to vCenter Server to signal that the host is accessible by the management network.
[Diagram: the ESXi host sends heartbeats from its management network (vmk0) over UDP port 902, through Windows Firewall, to vCenter Server.]
Identifying Possible Causes
Take a top-down approach to troubleshooting, from the vCenter Server system to the ESXi host and the hardware.
Possible Causes
vCenter Server:
• Windows Firewall is enabled on the vCenter Server system, and UDP port 902 is blocked.
ESXi host:
• vCenter Server is not using port 902 for receiving heartbeats, or the ESXi firewall is blocking that port.
Hardware (CPU, memory, network, storage):
• The network between ESXi and vCenter Server is congested.
Possible Cause: Port Blocked by Windows Firewall
If Windows Firewall is enabled and UDP port 902 is blocked, view the ports blocked by Windows Firewall.
To resolve this problem, adjust Windows Firewall settings:
• If ports are not configured, disable Windows Firewall.
• If the firewall is configured to affect ports, ensure that Windows Firewall is not blocking UDP port 902.
Possible Cause: vCenter Server Not Using Port 902
By default, the vpxa agent on the ESXi host sends heartbeats to vCenter Server (vpxd) through UDP port 902.
A problem might exist if the host is configured to send heartbeats over a port other than 902.
Use the less /etc/vmware/vpxa/vpxa.cfg command on the host to determine the port that is used to send heartbeats:
…
Resolving the Use of a Port Other Than 902 (1)
If you prefer to use a nondefault port for heartbeats, ensure that the ESXi firewall is not blocking that port.
Contents of heartbeat.xml
Resolving the Use of a Port Other Than 902 (2)
Check the vCenter Server configuration to verify the port number used for heartbeats.
Possible Cause: Network Congestion
If the network between ESXi and vCenter Server is congested, dropped heartbeats might occur.
To verify whether your management network is congested, use a network packet analyzer:
• You can use the resxtop utility or graphical views to analyze traffic.
• The pktcap-uw command is an enhanced packet capture and analysis tool.
• The tcpdump-uw command is a legacy network traffic capture tool:
– Available in vSphere ESXi Shell, based on the standard tcpdump utility
– For example, to display packets on the VMkernel interface vmk0, run the command: tcpdump-uw -i vmk0
• Wireshark is a publicly available network analyzer:
– Captures live network traffic
– Displays packets with detailed protocol information
Resolving Network Congestion
Resolving network congestion has both short-term and long-term solutions.
Short-term solution to this problem:
• Increase the timeout limit in vCenter Server to keep the ESXi host continuously connected.
Long-term solution to this problem:
• Resolve the underlying network congestion problems.
• If using distributed switches, use VMware vSphere® Network I/O Control to reprioritize traffic and increase the number of shares for management traffic.
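To see why raising the number of shares helps, consider a simplified model in which a saturated uplink is divided among traffic types in proportion to their shares. The Python sketch below uses made-up share values and a hypothetical 10,000 Mbit/s link; it ignores limits and reservations, and shares matter only when the link is actually under contention.

```python
def allocate_by_shares(link_mbps: float, shares: dict) -> dict:
    """Split a contended link among traffic types in proportion to their shares."""
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("at least one traffic type needs a positive share value")
    return {name: link_mbps * value / total for name, value in shares.items()}


# Hypothetical share values before and after raising the management shares.
before = allocate_by_shares(10000, {"management": 50, "vmotion": 50, "vm": 100})
after = allocate_by_shares(10000, {"management": 100, "vmotion": 50, "vm": 100})

print(before["management"], after["management"])
```

In this model, doubling the management shares raises its portion of a saturated link from one quarter to two fifths.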
Network Problem 4
The ESXi host cannot be managed by vCenter Server.
This problem can occur if the ESXi host’s management network was misconfigured or manipulated from the command line.
For example, you can bring a physical network card up or down with the esxcli command:
• esxcli network nic up -n vmnic0
• esxcli network nic down -n vmnic0
• esxcli network nic list
Preventing Loss of Management Network Connectivity
vSphere network rollback prevents accidental misconfiguration of management networking and loss of connectivity:
• For example, if you try to change the IP address of your management VMkernel interface, VMware vSphere® Web Client returns the error message in the screenshot.
Host Networking Rollback
Rollback enables you to roll back to a previous valid configuration.
The host networking rollback is triggered when a network configuration change is made that disconnects the host.
Several events can trigger a host networking rollback:
• Updating DNS and routing settings
• Updating the speed or duplex of a physical NIC
• Changing the IP settings of a management VMkernel network adapter
• Updating teaming and failover policies to a port group that contains the management VMkernel network adapter
If a network disconnects for any of these reasons, the task fails and the host reverts to the last valid configuration.
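The rollback behavior follows a familiar "apply, verify, revert on failure" pattern. The Python sketch below illustrates only that pattern; it is not how vSphere implements rollback, and the configuration keys and the reachability check are hypothetical stand-ins.

```python
import copy


def apply_with_rollback(config: dict, change: dict, host_reachable) -> tuple:
    """Apply a change; keep it only if the host is still reachable afterwards."""
    previous = copy.deepcopy(config)  # last valid configuration
    candidate = {**config, **change}
    if host_reachable(candidate):
        return candidate, True
    return previous, False  # the task fails and the old settings are restored


def reachable_on_mgmt_subnet(cfg: dict) -> bool:
    # Hypothetical check: vCenter can reach the host only on 192.168.1.x.
    return cfg["vmk0_ip"].startswith("192.168.1.")


current = {"vmk0_ip": "192.168.1.10", "dns": "10.0.0.1"}
good, kept = apply_with_rollback(current, {"dns": "10.0.0.2"}, reachable_on_mgmt_subnet)
bad, kept_bad = apply_with_rollback(current, {"vmk0_ip": "172.16.0.10"}, reachable_on_mgmt_subnet)

print(kept, good)
print(kept_bad, bad)
```

The second change would cut the host off from management, so the sketch returns the previous configuration unchanged.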
Recovering a Lost Management Network: Standard Switch
If your management network is on a standard switch and you lose management network connectivity, the solution uses the Configure Management Network option in the DCUI.
Network Restore Options in the DCUI
To restore the network through the DCUI:
1. Select Network Restore Options.
2. Perform a full network restore.
3. Repair the Management network on a misconfigured standard or distributed switch.
The Restore Network Settings option deletes all the current network settings except for the Management network.
Review of Learner Objectives
You should be able to meet the following objectives:
• Provide a network troubleshooting overview
• Analyze and troubleshoot standard switch problems
• Analyze and troubleshoot virtual machine connectivity problems
• Analyze and troubleshoot management network problems
Key Points
• Virtual network connectivity problems might occur with standard switches, distributed switches, virtual machines, or management networks.
• A virtual machine connectivity problem might exist in the physical layer, the virtual layer, or the guest operating system.
• The ping command is useful when troubleshooting ESXi host and virtual machine connectivity issues.
• When an ESXi host frequently disconnects from vCenter Server, heartbeat packets are being lost between vCenter Server and the ESXi host.
• vSphere network rollback prevents accidental misconfiguration of management networking and loss of connectivity.
• A good practice is to back up your distributed switch configuration with the vSphere Web Client whenever you make a change to the configuration.
• You can use the restore or the import function to reset the distributed switch configuration.
Questions?