DC Compute SAN Carlos Lopez CCIE SAN, DC #21063 David Kester CCIE SAN #19555 Ed Mazurek CCIE SNA/IP, SAN #6448

Upload: trinhkiet

Post on 31-Aug-2018


Page 1: DC Compute SAN - Jive Software · DC Compute SAN. Carlos Lopez CCIE SAN, DC #21063. David Kester CCIE SAN #19555. Ed Mazurek CCIE SNA/IP, SAN #6448

DC Compute SAN

Carlos Lopez CCIE SAN, DC #21063

David Kester CCIE SAN #19555

Ed Mazurek CCIE SNA/IP, SAN #6448


© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public 2TAC-Time

Introductions

• Carlos Lopez

• CCIE SAN, DC #21063

• Technical Leader TAC

• Ed Mazurek

• CCIE SNA/IP, SAN #6448

• Technical Leader TAC

• David Kester

• CCIE SAN #19555

• Team Leader Storage Networking


Agenda

• Components

• Topology

• Nexus FC NPV vs FCoE-NPV

• Bugs

• MDS Slow Drain Troubleshooting Enhancements


Components


Cisco Multi-Protocol Product Portfolio: SAN, LAN, and Compute

12+ Years of Proven NX-OS Operating System

Cisco Prime Data Center Network Manager (DCNM)

Consistent, Simplified Features, Management, and Programmability

[Figure: product portfolio grouped under LAN/SAN, SAN 16G, and COMPUTE]

• LAN/SAN: Cisco Nexus 9000, Cisco Nexus 7000, Cisco Nexus 5600, Cisco Nexus 5500, Cisco Nexus 3000, Cisco Nexus 2000, Nexus 5672UP-16G, 16G FC: Nexus 2348UPQ

• SAN 16G: Cisco MDS 9700, Cisco MDS 9718, Cisco MDS 9250i, Cisco MDS 9148S, Cisco MDS 9396S; 48x16G Line-Rate FC, 48x10GE Line-Rate FCoE, 24x40GE Line-Rate FCoE

• COMPUTE: Cisco UCS C-Series Rack Servers, Cisco UCS B-Series Blade Servers, Cisco UCS 6200 Series FI, Cisco UCS 6300 Series FI


Cisco MDS 9000 Series 16G FC Director Switches

Driving Innovations for the Next Decade with a complete 16G Portfolio

Deploy Small, Medium, Large SANs with Cisco MDS 9000 Family

• Cisco MDS 9706 Director

• Cisco MDS 9710 Director

• Cisco MDS 9718 Director

• Cisco 48x16G Line-rate FC Module

• Cisco 48x10G Line-rate FCoE Module

• Cisco 24x40G Line-rate FCoE Module

Future Proof • Reliable • Multi-Protocol Flexibility • Investment Protection • Ease of Management


Cisco MDS 9000 Series 16G FC Fabric Switches

Driving Innovations for the Next Decade with a complete 16G Portfolio

Deploy Small, Medium, Large SANs with Cisco MDS 9000 Family

• Cisco MDS 9148S 16G FC Fabric Switch

• Cisco MDS 9396S 16G FC Fabric Switch

• Cisco MDS 9250i 16G Multi-Service Fabric Switch

Pay-as-You-Grow • Enterprise Class Features • Reliability • Multi-Protocol Flexibility • Ease of Management


Cisco MDS 9700 Directors Comparison

Hardware Feature                              MDS 9706         MDS 9710         MDS 9718
Height                                        9 RU             14 RU            26 RU
Line Card slots                               4                8                16
Line rate ports @ 16 Gbps FC or 10 Gbps FCoE  192              384              768
Line rate ports @ 40 Gbps FCoE                96               192              384
Fabric Module slots (available / default)     6 / 3            6 / 3            6 / 6
Sup slots                                     2                2                2
Fabric Module location                        Rear             Rear             Rear
Airflow                                       Front to Back    Front to Back    Front to Back
Power Supply slots                            4                8                16
Power Consumption (Typical / Max)             2425W / 2620W    4615W / 5020W    4742W / 8462W

Winning Points

• 32G FC line-rate ready
• Interchangeable line cards
• Redundant hardware
• Common PSUs, Linecards
• Single OS, Management
• Better UCS Interoperability
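The line-rate port counts in the comparison appear to follow directly from the number of line card slots multiplied by the ports per module (48 for the 16G FC and 10G FCoE modules, 24 for the 40G FCoE module). The sketch below only cross-checks that arithmetic; the dictionary names and the `line_rate_ports` helper are illustrative and not part of any Cisco tool.

```python
# Ports per line card, taken from the module names in the deck
# (48x16G FC, 48x10G FCoE, 24x40G FCoE).
PORTS_PER_CARD = {"16G FC / 10G FCoE": 48, "40G FCoE": 24}

# Line card slots per chassis, taken from the comparison table.
LINE_CARD_SLOTS = {"MDS 9706": 4, "MDS 9710": 8, "MDS 9718": 16}


def line_rate_ports(chassis: str, card_type: str) -> int:
    """Return slots * ports-per-card for a fully populated chassis."""
    return LINE_CARD_SLOTS[chassis] * PORTS_PER_CARD[card_type]


if __name__ == "__main__":
    for chassis in LINE_CARD_SLOTS:
        for card_type in PORTS_PER_CARD:
            print(chassis, card_type, line_rate_ports(chassis, card_type))
```

The computed values can be compared against the two "Line rate ports" rows of the table for each chassis.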


Converged FEX Architecture

Enabling Director-class resiliency at Converged Access

• Multi-Protocol Storage and Host Connectivity: FC, FCoE and IP

• Converged Architecture includes Cisco Data Center portfolio: SAN (MDS 9700), LAN (Nexus 2k-7k) and Compute (UCS Chassis) accessibility


MDS 9250i Multiservice Fabric Switch

Features

Multi-Protocol Support
• 16G FC, 10GE FCoE, 1GE/10GE FCIP, iSCSI

Intelligent Storage Services for FC and FCoE SANs
• Fibre Channel over IP (FCIP)
• IO Accelerator (IOA)
• Data Mobility Manager (DMM)
• Integrated Management via Data Center Network Manager (DCNM)

FICON Certified

Benefits

Single Platform for deploying Storage Services across FC, FCoE and IP based Storage Area Networks (SANs)
• High-Bandwidth SAN Extension across MAN/WAN
• Vendor independent array migration tools
• Interoperate data between FC and FCoE arrays


MDS 9250i Overview

• Next Generation Multiservice Intelligent Services-oriented Fabric Switch

• Provides FCIP, IOA and DMM

• Integrated 40x16G FC, 8x10GE FCoE, and 2x1GE/10GE FCIP/iSCSI ports

• Enclosure: 2 RU; Redundant and hot-swappable power supplies and fan trays

[Figure: MDS 9250i chassis with labeled ports: Console, USB, Mgmt0]


Cisco MDS 9148S Fabric Switch

High-Performance, Easy to Deploy, Enterprise-class Fabric Switch

VERSATILE
• Line-rate 16/8/4/2G FC Ports
• Industry-leading port range: start with 12-port base, scale up with 12-port license, or full 48-port option available

EASY TO USE
• Automated Provisioning
• Quick Configuration Wizard
• Same OS and Management across Industry’s broadest SAN Portfolio

ENTERPRISE-CLASS
• Non-disruptive software upgrades
• Up to 32 Virtual SANs (VSANs)
• Inter-VSAN Routing (IVR), QoS, PortChannels, N-Port ID Virtualization (NPIV), N-Port Virtualization (NPV), Comprehensive Security
• Hardware-based slow-drain detection and recovery

[Figure: 1 RU chassis, front and back views]
• Front: 48 x 16G FC Line Rate Performance; 12 to 48 ports in 12-port increments
• Back: Dual Power Supplies and Fans for Enterprise-Class Availability


Introducing MDS 9396S 96-Port 16G Fabric Switch

Industry’s Most Affordable 16G Fabric Switch Family

Cisco MDS 9396S (2 RU)

Versatile
• Start with 48-port base; scale up with 12-port license, or full 96-port option available

Easy to Use
• Automated provisioning
• Quick Configuration Wizard
• Same OS and management across industry’s broadest SAN portfolio

Enterprise-Class
• Dual power supplies and fans, non-disruptive software upgrades
• Up to 4095 B2B credits per port (MDS 9396S); up to 253 B2B credits per port (MDS 9148S)
• Up to 32 Virtual SANs (VSANs)
• Hardware-based slow-drain detection and recovery, Inter-VSAN routing, QoS, PortChannels, N-Port ID Virtualization (NPIV), N-Port Virtualization (NPV)
• Forward Error Correction, Link Encryption (FC TrustSec)


MDS 9396S B2B credits

• All ports in a port group can have a maximum of 500 B2B credits

• The Enterprise license enables extended credits, which means up to 4095 B2B credits per port in a port group

• A port group can have a maximum of 4150 B2B credits

• Best Practice: Avoid grouping all E ports in the same port group/IOSlice

• Generic Formula: For every 1 km of distance at 1 Gbps, 0.5 B2B credit is needed for a standard FC frame (2112 bytes)

• CLI command: show port-resources module 1
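The generic formula above can be turned into a quick sizing estimate. The following is a minimal sketch that assumes the rule of thumb scales linearly with distance and nominal link speed and that every frame is full size; the function names are illustrative, and the 4095 limit is the extended per-port figure quoted on the slide.

```python
import math

# Rule of thumb from the slide: 0.5 B2B credit per km per Gbps,
# assuming full-size (2112-byte) FC frames.
CREDITS_PER_KM_PER_GBPS = 0.5

# Extended per-port maximum quoted for the MDS 9396S with the Enterprise license.
MAX_EXTENDED_CREDITS_PER_PORT = 4095


def b2b_credits_needed(distance_km: float, speed_gbps: float) -> int:
    """Estimate the B2B credits needed to keep a link of this length busy."""
    return math.ceil(distance_km * speed_gbps * CREDITS_PER_KM_PER_GBPS)


def fits_on_one_port(distance_km: float, speed_gbps: float) -> bool:
    """Check the estimate against the extended per-port credit limit."""
    return b2b_credits_needed(distance_km, speed_gbps) <= MAX_EXTENDED_CREDITS_PER_PORT


if __name__ == "__main__":
    for km, gbps in [(10, 16), (100, 8), (600, 16)]:
        print(km, "km @", gbps, "G:", b2b_credits_needed(km, gbps),
              "credits, fits on one port:", fits_on_one_port(km, gbps))
```

Smaller average frame sizes need proportionally more credits, so treat the result as a lower bound and confirm the actual allocation with show port-resources.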


Topologies – UCS-FI N5K MDS


Separation makes sure that the design is highly available even when one of the fabrics goes down.


Redundant redundancy is not required


Working with TAC – Topology Commands

`show topology`

FC Topology for VSAN 1 :
       Interface  Peer Domain  Peer Interface   Peer IP Address(Switch Name)
---------------------------------------------------------------------
  port-channel 6     0x62(98)  port-channel 6   10.10.10.2 (sw201A)

Use the ‘show topology’ command to display the interswitch links


Working with TAC – Topology Commands

Core# show fcs ie

IE List for VSAN: 1
----------------------------------------------------------------------------
IE-WWN                   IE      Mgmt-Id   Mgmt-Addr (Switch-name)
----------------------------------------------------------------------------
20:01:00:0d:ec:39:19:c1  S(Rem)  0xfffc0e  10.10.10.1 (sw204A)
20:01:00:0d:ec:39:1a:01  S(Rem)  0xfffc03  10.10.10.9 (sw202A)
20:01:00:0d:ec:fb:88:41  S(Loc)  0xfffc65  10.10.10.3 (sw200A)
20:01:00:2a:6a:8c:0b:01  S(Adj)  0xfffc62  10.10.10.2 (sw201A)

Loc = Local = this switch
Adj = Adjacent = connected switch
Rem = Remote = more than one hop away

Use the ‘show fcs ie’ command to display all the switches in the VSAN
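When the output of show fcs ie is captured to a file, the Loc/Adj/Rem column can be used to sort switches by distance from the local switch. The sketch below is a hypothetical helper, not a Cisco tool; the regular expression is written against the sample rows shown on the slide and assumes every entry is a switch (`S(...)`) with lowercase hex WWNs.

```python
import re
from typing import List, NamedTuple


class FcsEntry(NamedTuple):
    wwn: str
    kind: str          # "Loc", "Adj" or "Rem"
    mgmt_id: str
    mgmt_addr: str
    switch_name: str


# Pattern derived from the sample rows only; other output variants may differ.
ROW = re.compile(
    r"^(?P<wwn>(?:[0-9a-f]{2}:){7}[0-9a-f]{2})\s+"
    r"S\((?P<kind>Loc|Adj|Rem)\)\s+"
    r"(?P<mgmt_id>0x[0-9a-f]+)\s+"
    r"(?P<addr>\S+)\s+\((?P<name>[^)]+)\)$"
)


def parse_fcs_ie(text: str) -> List[FcsEntry]:
    """Return one FcsEntry per switch row; header and ruler lines are skipped."""
    entries = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            entries.append(FcsEntry(match["wwn"], match["kind"], match["mgmt_id"],
                                    match["addr"], match["name"]))
    return entries


SAMPLE = """\
IE List for VSAN: 1
20:01:00:0d:ec:39:19:c1 S(Rem) 0xfffc0e 10.10.10.1 (sw204A)
20:01:00:0d:ec:39:1a:01 S(Rem) 0xfffc03 10.10.10.9 (sw202A)
20:01:00:0d:ec:fb:88:41 S(Loc) 0xfffc65 10.10.10.3 (sw200A)
20:01:00:2a:6a:8c:0b:01 S(Adj) 0xfffc62 10.10.10.2 (sw201A)
"""

if __name__ == "__main__":
    for entry in parse_fcs_ie(SAMPLE):
        print(entry.kind, entry.switch_name, entry.mgmt_addr)
```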


Working with TAC – Topology Commands

Use the show fcns database npv command to find Cisco NPV switches.

Core# show fcns database npv

------------------------
VSAN 1
------------------------
NPV NODE-NAME   : 20:01:00:0d:ec:51:06:01
NPV IP_ADDR     : 14.16.134.192
NPV INTERFACE   : port-channel 30
CORE SWITCH WWN : 20:00:00:0d:ec:24:ef:c0
CORE INTERFACE  : Po105


Working with TAC – Topology Commands

Use the show npv flogi-table command to show where the device is attached and its uplink.

NPV# show npv flogi-table
--------------------------------------------------------------------------------
SERVER                                                                  EXTERNAL
INTERFACE VSAN FCID             PORT NAME               NODE NAME       INTERFACE
--------------------------------------------------------------------------------
fc1/9     1905 0x0d0060 10:00:00:00:c9:71:04:4e 20:00:00:00:c9:71:04:4e Po30


Working with TAC – Topology Commands

NPV# show npv internal info external-interface all | grep addr:

fabric mgmt addr: 10.17.150.20

fabric mgmt addr: 10.17.150.20

Note: Ensure both links are connected to the same upstream IP address. This tells you which upstream switch the UCS FI is connected to in case you need to connect to upstream switch to verify the connectivity.
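The check described in the note can be scripted against captured output. This is a small sketch under the assumption that each uplink contributes one `fabric mgmt addr:` line, as in the sample above; the function names are illustrative.

```python
import re
from typing import Set

# Matches the lines left after "| grep addr:" in the sample output.
ADDR_LINE = re.compile(r"fabric mgmt addr:\s*(\S+)")


def upstream_addresses(output: str) -> Set[str]:
    """Collect the distinct upstream management addresses seen in the output."""
    return set(ADDR_LINE.findall(output))


def uplinks_share_upstream(output: str) -> bool:
    """True when every uplink reports the same upstream management address."""
    return len(upstream_addresses(output)) == 1


SAMPLE = """\
fabric mgmt addr: 10.17.150.20
fabric mgmt addr: 10.17.150.20
"""

if __name__ == "__main__":
    print(upstream_addresses(SAMPLE), uplinks_share_upstream(SAMPLE))
```

If the set contains more than one address, the uplinks terminate on different upstream switches and the cabling is worth re-checking.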


Nexus FC NPV vs FCoE-NPV


Nexus FC NPV vs FCoE-NPV

[Figure: two topologies side by side, "FC and NPV" and "FCoE-NPV"]

• FC and NPV (labels): N5K NPV; Nexus or MDS NPIV; FC or FCoE; FC; FCoE; Both FC and FCoE

• FCoE-NPV (labels): N5K FCoE-NPV; Nexus or MDS NPIV; FC or FCoE; FCoE; FCoE Only


Nexus FC NPV vs FCoE-NPV

Enabling FC and NPV

First Enable FCoE and then NPV

N5K(config)# feature fcoe
FC license checked out successfully
fc_plugin extracted successfully
FC plugin loaded successfully
FCoE manager enabled successfully
FC enabled on all modules successfully

N5K(config)# feature npv
Verify that boot variables are set and the changes are saved. Changing to npv mode erases the current configuration and reboots the switch in npv mode.
Do you want to continue? (y/n):y

Also enable fcoe qos

[Figure labels: Nexus or MDS NPIV; N5K NPV; FC or FCoE; FC; FCoE]


Nexus FC NPV vs FCoE-NPV

Enabling FC and NPV

What really happens after enabling fcoe and then NPV:

N5K(config)# feature npv
Verify that boot variables are set and the changes are saved. Changing to npv mode erases the current configuration and reboots the switch in npv mode.
Do you want to continue? (y/n):y

When the switch is reloaded in NPV mode, some configuration is saved:

• switchname
• management ip configuration and vrf
• boot variable
• username / password details
• ntp configuration
• callhome configuration
• snmp-server details
• feature fcoe

[Figure labels: Nexus or MDS NPIV; N5K NPV; FC or FCoE; FC; FCoE]


Nexus FC NPV vs FCoE-NPV

Enabling FCoE-NPV

Enable fcoe-npv

n5k(config)# feature fcoe-npv
FCoE NPV license checked out successfully
fc_plugin extracted successfully
FC plugin loaded successfully
FCoE manager enabled successfully
FCoE NPV enabled on all modules successfully

• No reload

• No reconfiguration

• fcoe qos still required

[Figure labels: Nexus or MDS NPIV; N5K FCoE-NPV; FC or FCoE; FCoE]


Nexus FC NPV vs FCoE-NPV Comparison

                              FC NPV                     FCoE-NPV
Protocols                     FC and/or FCoE             FCoE
License                       FC_FEATURES_PKG            FCOE_NPV_PKG
Command                       feature fcoe, feature npv  feature fcoe-npv
Write Erase Reload            On feature npv             No
Nexus Models                  N5K, N6K                   N5K, N6K, N9K
FCoE QoS                      Required                   Required
Disable FKA on Core for ISSU  No if uplink is FC         Yes


Nested NPV – Can I connect two NPV Switches?

• After enabling NPV, NPIV can also be enabled

• Connecting two Cisco NPV switches is not supported

[Figure: two topologies, one marked Unsupported and one marked Supported; device labels: Cisco NPIV, Cisco NPV+NPIV, Cisco NPV, 3rd Party Vendor; link labels: FC or FCoE]


Bugs


Frequent Bugs

CSCun41202 - Weak CBC mode and weak ciphers should be disabled in SSH server

Symptom: SSH servers on Cisco Nexus devices may be flagged by security scanners due to the inclusion of SSH ciphers and HMAC algorithms that are considered to be weak.

These may be identified as 'SSH Server CBC Mode Ciphers Enabled' and 'SSH Server weak MAC Algorithms Enabled' or similar.

Conditions: This issue applies to Cisco Nexus 7000, Cisco Nexus 5000 and MDS 9000 series switches. SSH functionality is enabled by default in Cisco NX-OS. The current SSH server status is displayed using the show ssh server command.


Frequent Bugs

CSCun41202 - Weak CBC mode and weak ciphers should be disabled in SSH server

With the Fix: If an SSH client configured to use weak ciphers is used to log in to a Cisco device with this fix, the login may fail. The following messages are logged in the switch syslog:

%DAEMON-2-SYSTEM_MSG: fatal: no matching cipher found: client 3des-cbc,blowfish-cbc server aes128-ctr,aes192-ctr,aes256-ctr - sshd

Reconfigure any SSH clients not to use weak ciphers like 3des-cbc or blowfish-cbc. DCNM uses SSH to manage Cisco devices and must be upgraded to at least 7.2(1) to work with devices with this fix.

Known-fixed-releases: 7.3(0)N1(1) 7.2(1)N1(1) 7.2(0)N1(1) 6.2(11c) 5.2(8g)
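The "no matching cipher found" message reflects how SSH chooses an encryption algorithm: the first entry in the client's preference list that the server also offers wins, and the connection fails if there is none. The sketch below models only that selection step, using the cipher lists from the syslog message; `negotiate_cipher` is an illustrative name, not an NX-OS or OpenSSH function.

```python
from typing import List, Optional

# Server cipher list as reported in the syslog message after the fix.
SERVER_CIPHERS = ["aes128-ctr", "aes192-ctr", "aes256-ctr"]


def negotiate_cipher(client: List[str], server: List[str]) -> Optional[str]:
    """Return the first client-preferred cipher the server offers, else None."""
    offered = set(server)
    for cipher in client:
        if cipher in offered:
            return cipher
    return None


if __name__ == "__main__":
    old_client = ["3des-cbc", "blowfish-cbc"]          # weak-only client from the log
    updated_client = ["aes256-ctr", "3des-cbc"]        # hypothetical reconfigured client
    print(negotiate_cipher(old_client, SERVER_CIPHERS))      # no common cipher
    print(negotiate_cipher(updated_client, SERVER_CIPHERS))  # common cipher found
```

A client whose list contains at least one of the CTR ciphers negotiates successfully, which is why reconfiguring the client resolves the login failure.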


Frequent Bugs

CSCue79881 - SNMP crashes on SNMP bulk get query

Symptom: The SNMP process may crash with the following messages displayed in the output of show logging log:

%KERN-2-SYSTEM_MSG: mts_is_q_space_available_new():1416:Total mtsbufsize 10070872 for sap 28, exceeds limit 15 perc of 67108864 - kernel

%KERN-2-SYSTEM_MSG: mts_acquire_q_space() failing - no space in sap 28, uuid 26 send_opc 3176, pid 3616, proc_name sctpt_rx_thr - kernel

%KERN-2-SYSTEM_MSG: [sap 28][pid 4406][comm:snmpd] sap recovering failed and so Killed - kernel

%SYSMGR-2-SERVICE_CRASHED: Service "snmpd" (PID 4406) hasn't caught signal 6 (core will be saved)


Frequent Bugs

CSCue79881 - SNMP crashes on SNMP bulk get query

Conditions: This bug affects both Nexus and MDS switches. It has been observed when a monitoring device uses snmp-bulk-get requests on the entity-MIB for multiple FEX modules at one time, or when there is continuous polling from multiple polling stations on slow MIBs.

Some examples of MIBs that may be affected by a continuous snmp bulk walk are:
• qos mib
• entity mib
• entity-fru mib
• bridge mib


Frequent Bugs

CSCue79881 - SNMP crashes on SNMP bulk get query

Workaround: A possible workaround is configuring the no snmp-server counter cache enable command. This command prevents SNMP bulk gets from being cached via MTS buffers, which stops the MTS buffers from being consumed and causing a process crash. As a result of the command, the interface table might be slower to update the statistics (since caching is disabled).

• Note: This workaround is only available on Nexus 7000 switches.

This defect is fixed in NX-OS releases 5.2.9 and 6.1.4 (Nexus 7000), 5.2.8g (MDS), 6.2.1 (Nexus 7000 and MDS), and 6.0(2)N1(2a) (Nexus 5K and 6K).

Further Problem Description: A possible way of verifying whether you are affected by this bug is to issue the command show system internal mts buffers summary and check if notifications for sap 28 are increasing.


Frequent Bugs

CSCus64671 - MDS 9700 show tech detail missing some commands

Symptom: MDS 9710 and MDS 9706 show tech detail missing some commands, like 'show running-config' and 'show startup-config'.

Conditions: MDS 9710 and MDS 9706 at NX-OS 6.2(11).

Workaround: Collect show tech-support all along with show tech-support detail.

Fixed in NX-OS 6.2(11c) and above.


Frequent Nexus 5K/6K Bugs

CSCuj87061 - Unified fc interfaces come up as Ethernet after disruptive upgrades

Symptom:

On Cisco Nexus 5000 series switches, a disruptive upgrade with reason "incompatible image" causes the Unified Ports configured as FC ports to come up as Ethernet ports after the upgrade.

However, the FC port configuration still exists in the running configuration.

Conditions:

An upgrade between any two incompatible images where the fc interfaces are unified interfaces requiring the slot and port commands:
slot z
port x - y mode fc

Back up the configuration before upgrading


Frequent Nexus 5K/6K Bugs

CSCuj87061 - Unified fc interfaces come up as Ethernet after disruptive upgrades

Proactive Workaround:
Do ISSU only between compatible images. Check the result of the install command for image compatibility. Consult the Release Notes for the non-disruptive upgrade path.

Reactive Workaround:
After the disruptive ISSU between incompatible images, do the following:
a. copy startup-config bootflash:
b. copy running-config startup-config
c. reload
After reload:
d. copy bootflash: running-config
e. copy running-config startup-config
Now the device should have the same configuration as before the upgrade.


Frequent Nexus 5K/6K Bugs

CSCup75270 - FC interfaces are not listed in IF-MIB snmp

Symptom: FC interfaces are not listed in IF-MIB snmp walk.

Device Manager is not working correctly with the Nexus 5548UP or 5596UP (GEM modules installed) when the expansion module ports are set to fibre channel mode.

Hovering over the ports with the mouse in Device Manager will display, for example, "Ethernet 1/17 Status: failed".

Looking at the same ports via CLI will show that the ports are really in FC mode and not configured as Ethernet ports.


Frequent Nexus 5K/6K Bugs

CSCup75270 - FC interfaces are not listed in IF-MIB snmp

Conditions:
• Nexus 5548UP or Nexus 5596UP running NX-OS 7.0(2)N1(1) with GEM Expansion module ports configured to operate in Fibre Channel mode
• Some ports are in Fibre Channel mode on the base chassis

More Info: NX-OS 7.0(1)N1(1) and all previous software versions are not affected by this defect. This is an NX-OS bug, not a Device Manager bug.

Fixed in 7.0(6)N1 and above.


Nexus 5K/6K Bug

CSCur10558 - Trunk Protocol Enable does not show in running config when disabled

Resolution Summary:

1. Made the command “[no] trunk protocol enable” hidden

2. Added appropriate warning message when the command is run on CLI

5548-1(config-if)# no trunk protocol
Warning: This will globally disable the switch's ability to form any trunks and impacts existing trunk ports
Do you wish to continue(y/n)? [n]

Fixed in 7.3(0)N1(1) 7.2(1)N1(1) 7.1(3)N1(1) 7.0(7)N1(1) 6.2(9)


Nexus 5K/6K Bug

CSCur10558 - Trunk Protocol Enable does not show in running config when disabled

[no] trunk protocol is not in the running config or the show tech detail.

Tip: show tech will contain show port internal info all. You should always have:
Epp state: Enabled

Port Trunking Protocol (PTP) and Port Channel Protocol (PCP) use the EPP frame.

There is no practical reason to disable fibre channel trunking.


Cisco MDS NX-OS 7.3(0)D1(1) OUI Enhancement

Example: Adding OUIs

Switch(config)# wwn oui 0x10001c

• OUI - A 24 bit globally unique number assigned by IEEE.

• Port-channel functionality includes Cisco OUI check of peer switch.


MDS Slow Drain Troubleshooting Enhancements


MDS Slow Drain Troubleshooting Enhancements

• NX-OS 6.2(9) and 6.2(13) added several enhancements
  • system timeout no-credit-drop triggered at exact time by HW
  • TxWait
  • slowport-monitor
  • New port-monitor counters
    • txwait
    • tx-slowport-oper-delay
    • tx-slowport-count

• show tech-support slowdrain

• DCNM Slow Drain Analysis

• TAC tool MDS_show_tech_slowdrain_analysis

• For more comprehensive information, see:
  • TAC-Time - SAN Congestion! Understanding, Troubleshooting, Mitigating in a Cisco Fabric (2016 Las Vegas)
  • https://www.ciscolive.com/online/connect/sessionDetail.ww?SESSION_ID=90897&backBtn=true


system timeout no-credit-drop

• no-credit-drop causes frames to be dropped immediately if the destination port is at 0 Tx credits for the time specified

• Previously, no-credit-drop was triggered by a SW process at 100ms intervals

• In NX-OS 6.2(9) and later, it is triggered by the HW at the exact time the threshold is reached

• Should be used in conjunction with lowering congestion-drop threshold

• Recommended for F ports

• Can drastically improve ISL performance under slow drain conditions

• xxx_FORCE_TIMEOUT_ON/OFF counter

• By default no-credit-drop is not enabled

Triggered by HW at exact time


system timeout no-credit-drop 200 mode f


TxWait enhancements

txwait

• txwait is a counter that increments every 2.5us when the port is at 0 Tx credits and there are frames queued for transmit

• txwait * 2.5 / 1000000 = seconds of time the port was unable to transmit

• Only applies to the following:
  • MDS 9500 with generation 4 linecards:
    • MDS 9000 Family 32-Port 8-Gbps Advanced Fibre Channel Switching Module (DS-X9232-256K9)
    • MDS 9000 Family 48-Port 8-Gbps Advanced Fibre Channel Switching Module (DS-X9248-256K9)
  • MDS 9700 48-Port 16-Gbps Fibre Channel Switching Module (DS-X9448-768K9)
  • MDS 9148S 16G Multilayer Fabric Switch
  • MDS 9250i Multiservice Fabric Switch
  • MDS 9396S 16G Multilayer Fabric Switch

• Others will return zero


TxWait enhancements - continued

txwait can be seen in the following:

• show interface counters
  • Raw value in 2.5us units
  • Percentage Tx credits not available for last 1s/1m/1h/72h

• show process creditmon txwait-history
  • 60sec, 60min, 72hour graphs

• show logging onboard txwait

• SNMP fcIfTxWaitCount variable


TxWait enhancements - continued

mds9710-1# show interface fc1/13 counters | i fc|waitfc1/136252650 2.5us Txwaits due to lack of transmit credits

6252650 * 2.5 / 1000000 = 15.631625 seconds

• Cumulative since the interface counters were last cleared

• The above indicates the MDS was not able to transmit for over 15 seconds since the counters were last cleared

txwait - show interface counters


TxWait enhancements - continued

• Utilizes the underlying txwait counter

txwait - Percentage Tx credits not available for last 1s/1m/1h/72h

MDS9710-1# show interface fc1/13 counters
fc1/13
…
    5 Transmit B2B credit transitions to zero
    2 Receive B2B credit transitions to zero
    557320 2.5us TxWait due to lack of transmit credits
    Percentage Tx credits not available for last 1s/1m/1h/72h: 1%/5%/3%/2%
    32 receive B2B credit remaining
    128 transmit B2B credit remaining
    128 low priority transmit B2B credit remaining
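When this output is collected by a script, the percentage summary line can be pulled apart with a regular expression. The sketch below assumes the line has exactly the wording shown in the sample output; the function name and the returned dictionary layout are hypothetical choices for this example.

```python
import re

# Window labels in the order they appear in the CLI line.
WINDOWS = ("1s", "1m", "1h", "72h")

# Assumes the wording of the sample output shown on the slide.
SUMMARY_RE = re.compile(r"last 1s/1m/1h/72h:\s*(\d+)%/(\d+)%/(\d+)%/(\d+)%")


def parse_tx_credit_unavailable(line: str) -> dict[str, int]:
    """Return the percentage of time Tx credits were unavailable per window."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError("line does not contain the percentage summary")
    return dict(zip(WINDOWS, map(int, match.groups())))


sample = "Percentage Tx credits not available for last 1s/1m/1h/72h: 1%/5%/3%/2%"
print(parse_tx_credit_unavailable(sample))
```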


Level 1: Latency - Troubleshooting

MDS9513# show logging onboard txwait module 4
…
---------------------------------
Module: 4 txwait count
---------------------------------
Notes:
- Sampling period is 20 seconds
- Only txwait delta >= 100 ms are logged

-----------------------------------------------------------------------------
| Interface | Delta TxWait Time     | Congestion | Timestamp                |
|           | 2.5us ticks | seconds |            |                          |
-----------------------------------------------------------------------------
| fc4/1     | 52927       | 0       | 0%         | Wed May 27 13:20:12 2015 |
| fc4/1     | 2005222     | 5       | 25%        | Wed May 27 13:19:52 2015 |
| fc4/1     | 105854      | 0       | 1%         | Wed May 27 13:19:32 2015 |
| fc4/1     | 52926       | 0       | 0%         | Wed May 27 13:19:12 2015 |

• Delta values are recorded only when they are at least 100ms in the 20 second interval
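The seconds and congestion columns can be reproduced from the delta tick count. The following is a rough sketch: the truncation to whole seconds and whole percent is an assumption inferred from the four sample rows, not documented behavior, and the names are hypothetical.

```python
TICKS_PER_SECOND = 400_000                     # 1 s / 2.5 us per tick
SAMPLE_PERIOD_S = 20                           # sampling period from the CLI notes
LOG_THRESHOLD_TICKS = TICKS_PER_SECOND // 10   # 100 ms logging threshold


def summarize_delta(delta_ticks: int) -> tuple[int, int, bool]:
    """Return (whole seconds, congestion percent, would-be-logged).

    Rounding down is an assumption that happens to match the sample rows.
    """
    whole_seconds = delta_ticks // TICKS_PER_SECOND
    congestion_pct = 100 * delta_ticks // (SAMPLE_PERIOD_S * TICKS_PER_SECOND)
    logged = delta_ticks >= LOG_THRESHOLD_TICKS
    return whole_seconds, congestion_pct, logged


# Delta values from the fc4/1 rows above.
for ticks in (52927, 2005222, 105854, 52926):
    print(ticks, summarize_delta(ticks))
```

Integer arithmetic is used so that values near the 100 ms threshold are not affected by floating-point rounding.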

txwait - show logging onboard txwait

Recorded every 20 seconds only when >= 100ms


Level 1: Latency - Troubleshooting

• Graphical display of time where Tx credits are not available

• Similar in format to cpu history

• 3 graphs per port
  • Last 60 seconds
  • Last 60 minutes
  • Last 72 hours

• Utilizes the underlying txwait counter

txwait-history

mds9710-1# show process creditmon txwait-history module 1 port 13

TxWait history for port fc1/13:
==============================

[ASCII bar graph omitted: one column per second for the last 60 seconds (x-axis 0 to 60), y-axis from 100 to 1000 in steps of 100]

Credit Not Available per second (last 60 seconds)
# = TxWait (ms)


Level 1: Latency - Troubleshooting

• system timeout slowport-monitor <1-500> mode e|f – Must be configured!

• Events are captured every 100ms

• Last 10 events per port captured in slowport-monitor-events

• Logging onboard slowport-monitor-events captures more events

• Currently implemented for:
  • 9500 - Gen 3 LCs - DS-X9248-48K9 and DS-X92xx-96K9 modules
  • 9500 - Gen 4 LCs - DS-X9232-256K9 and DS-X9248-256K9 modules
  • 9700 & 9396S (Gen 5)
  • 9250i & 9148S

• Differences exist between Gen3, Gen4 and 9700/9250i/9148S/9396S

slowport-monitor


Level 1: Latency - Troubleshooting

• Gen5/9250i/9148S/9396S have enhanced HW capabilities

• In each 100ms interval, the number of times Tx credits remained at 0 for the configured (admin) delay is counted.

• The average operational delay is determined – This is how long the port was at 0 Tx credits

• Recorded when at least one complete event occurred

• More events available via logging onboard slowport-monitor-events

slowport-monitor – 9700/9250i/9148S/9396S (Gen 5 LCs)

MDS9710-1# show process creditmon slowport-monitor-events

Module: 01 Slowport Detected: YES
=========================================================================
Interface = fc1/13
----------------------------------------------------------------
| admin | slowport  | oper  | Timestamp                    |
| delay | detection | delay |                              |
| (ms)  | count     | (ms)  |                              |
----------------------------------------------------------------
| 5     | 1300      | 20    |  1. 04/01/15 23:03:38.823    |
| 5     | 1296      | 19    |  2. 04/01/15 23:03:38.724    |
| 5     | 1291      | 19    |  3. 04/01/15 23:03:38.623    |
…
| 5     | 1256      | 19    | 10. 04/01/15 23:03:37.923    |
----------------------------------------------------------------

Configured delay (5ms)

Actual average delay

4 events in last 100ms

Note: Oper delay limited by no-credit-drop threshold
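The "4 events in last 100ms" callout follows from subtracting consecutive slowport detection counts (1300 - 1296). A small sketch of that subtraction is shown here; it assumes the count column is cumulative and that rows are listed newest first, which is inferred from the callout rather than stated on the slide. The function name is hypothetical.

```python
def events_per_interval(cumulative_counts: list[int]) -> list[int]:
    """Differences between consecutive cumulative counts, newest row first.

    Only valid for rows that are adjacent in time (one 100 ms interval apart).
    """
    return [newer - older
            for newer, older in zip(cumulative_counts, cumulative_counts[1:])]


# Rows 1 to 3 of the fc1/13 output; rows 4 to 9 are elided on the slide,
# so row 10 is deliberately left out.
print(events_per_interval([1300, 1296, 1291]))
```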


Slow Drain Alerting and Mitigation

• Port-monitor allows monitoring of several counters relating to slow drain
  • credit-loss-reco          Credit loss recovery counter
  • lr-rx                     The number of link resets received by the fc-port
  • lr-tx                     Link resets transmitted by the fc-port
  • timeout-discards          Timeout discards counter
  • tx-credit-not-available   Credit not available counter (in 100ms increments)
  • tx-discards               Tx discards counter
  • tx-slowport-count         Number of slowport events
  • tx-slowport-oper-delay    Slowport operational delay
  • txwait                    Amount of time at 0 Tx credits and packets queued
  • rx-datarate               Rx data rate as a percentage of link speed
  • tx-datarate               Tx data rate as a percentage of link speed

Port-monitor alerting

Note: There are other counters that are valuable and should also be considered for inclusion in monitoring but are not part of slow drain

New in 6.2(13)


Troubleshooting – Documentation for TAC

• Contains all the commands available that pertain to slow drain

• Contains “context” commands to understand the FC topology

• Contains name server commands to identify devices

• Contains active zonesets to understand device relationships

• Most useful when run from DCNM and gathered for the entire fabric
  • SAN Client -> Tools -> Run CLI Commands…

• When opening up a case with the TAC please have this available!

• Used for MDS_show_tech_slowdrain_analysis

show tech-support slowdrain


DCNM Slow Drain Analysis

• DCNM 7.1(1) added Slow Drain Analysis

• DCNM 7.2(2) added improvements

• DCNM 10.0(1) added improvements

• Used for pulling fabric wide slow drain counters for a defined period of time

• Useful for ongoing slow drain problems

• Accessed from the Web Client: Health -> Diagnostics -> Slow Drain Analysis


MDS_show_tech_slowdrain_analysis

Yellow indicates level 1 (latency)

Orange indicates level 2 (timeout drops)

Arrows indicate direction of congestion

Green indicates no congestion

Slow draining end device!

Red indicates level 3 (credit-loss)
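The three levels in the legend can be expressed as a simple "worst level wins" classification. The sketch below is purely illustrative: which counter maps to which level (txwait for latency, timeout-discards for timeout drops, credit-loss-reco for credit loss) is an assumption based on the counter names in this deck, and it is not a description of how MDS_show_tech_slowdrain_analysis decides its colors.

```python
from enum import IntEnum


class SlowDrainLevel(IntEnum):
    NONE = 0           # green in the legend
    LATENCY = 1        # yellow
    TIMEOUT_DROPS = 2  # orange
    CREDIT_LOSS = 3    # red


def classify(txwait_delta: int,
             timeout_discards_delta: int,
             credit_loss_reco_delta: int) -> SlowDrainLevel:
    """Return the most severe level seen on a port (hypothetical mapping)."""
    if credit_loss_reco_delta > 0:
        return SlowDrainLevel.CREDIT_LOSS
    if timeout_discards_delta > 0:
        return SlowDrainLevel.TIMEOUT_DROPS
    if txwait_delta > 0:
        return SlowDrainLevel.LATENCY
    return SlowDrainLevel.NONE


print(classify(txwait_delta=2005222, timeout_discards_delta=0,
               credit_loss_reco_delta=0).name)
```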


Thank you