NW18 - Building a Highly Available Network - Don't Bring Me Down

Copyright © 2015 Rockwell Automation, Inc. All Rights Reserved. Rockwell Automation TechED 2015 @ROKTechED #ROKTechED PUBLIC INFORMATION NW18 Building a Highly Available Network Don’t Bring Me Down!

Upload: rockwell-automation

Post on 16-Aug-2015


TRANSCRIPT

Page 1: NW18 - Building a Highly Available Network - Don't Bring Me Down


PUBLIC INFORMATION

NW18 – Building a Highly Available Network Don’t Bring Me Down!

Page 2: NW18 - Building a Highly Available Network - Don't Bring Me Down

Layers 8 & 9

What are Hidden Layers 8 & 9?

OSI 7 Layer Model

1 Physical

2 Data Link

3 Network

4 Transport

5 Session

6 Presentation

7 Application

8 $$ Money $$

9 Politics

Page 3: NW18 - Building a Highly Available Network - Don't Bring Me Down


Agenda

Redundant HSRP Pairs

VSS – Virtual Switching System

Switch Stacks

Requirements Gathering

Separate Parallel Redundancy (A/B)

Page 4: NW18 - Building a Highly Available Network - Don't Bring Me Down


What’s in the Rack?

Page 5: NW18 - Building a Highly Available Network - Don't Bring Me Down

Requirements Gathering

The availability of the Industrial Automation and Control System (IACS) has a direct correlation to:

Plant Uptime

Overall Equipment Effectiveness (OEE) of a manufacturing facility

Because the network is a key aspect of the overall system, these requirements translate directly to the IACS network.

Must understand the Customer’s Application

Meeting the needs of the System

Page 6: NW18 - Building a Highly Available Network - Don't Bring Me Down

Requirements Gathering

Some key considerations for high availability include:

Creating alternative data communication paths

Eliminating single points of failure

Utilizing resiliency techniques

Network Architectures

Dynamic Routing Protocols

Active Monitoring Integrated with Control System

Mean Time to Repair
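Two of these considerations, redundant paths and Mean Time to Repair, can be related to availability with the standard steady-state formulas. The sketch below is not from the session: the MTBF and MTTR figures are made-up inputs, the function names are hypothetical, and the parallel case assumes the paths fail independently, which may not hold when they share power, cabling routes, or configuration.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability of one component: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def series(*availabilities: float) -> float:
    """Availability of a chain of components: every one of them must be up."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


def parallel(*availabilities: float) -> float:
    """Availability of redundant paths: down only if every path is down.

    Assumes the paths fail independently of each other.
    """
    unavailability = 1.0
    for a in availabilities:
        unavailability *= (1.0 - a)
    return 1.0 - unavailability


if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    single_path = availability(mtbf_hours=10_000, mttr_hours=4)
    print(f"single path:         {single_path:.6f}")
    print(f"two redundant paths: {parallel(single_path, single_path):.9f}")
    # Halving the repair time raises availability without adding hardware.
    print(f"faster repair:       {availability(10_000, 2):.6f}")
```

The same helpers can be combined, for example `series(parallel(a, a), b)` for a redundant pair feeding a single non-redundant device, which makes any remaining single point of failure visible in the result.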

Page 7: NW18 - Building a Highly Available Network - Don't Bring Me Down

Requirements Gathering

The first step toward a highly available, resilient network is to determine the network's required availability.

Network Availability    Downtime per Year    Downtime per Week
95%                     438 hours            8.4 hours
99%                     87.6 hours           101 minutes
99.9%                   8.8 hours            10 minutes
99.99%                  52.6 minutes         1 minute
99.999%                 5.3 minutes          6 seconds

Key Topic: What Does Down Time Mean in Your Environment?
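The downtime figures follow directly from the availability percentage. As a minimal sketch (not part of the original slides), the following assumes a 365-day year and a 7-day week; the function names are hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years
HOURS_PER_WEEK = 7 * 24    # 168


def downtime_hours(availability_percent: float, period_hours: float) -> float:
    """Allowed downtime, in hours, for a given availability over a period."""
    return (1.0 - availability_percent / 100.0) * period_hours


def humanize(hours: float) -> str:
    """Render a duration in the largest unit that keeps the value at 1 or more."""
    if hours >= 1:
        return f"{hours:.1f} hours"
    minutes = hours * 60
    if minutes >= 1:
        return f"{minutes:.1f} minutes"
    return f"{minutes * 60:.1f} seconds"


if __name__ == "__main__":
    for pct in (95, 99, 99.9, 99.99, 99.999):
        per_year = humanize(downtime_hours(pct, HOURS_PER_YEAR))
        per_week = humanize(downtime_hours(pct, HOURS_PER_WEEK))
        print(f"{pct}%  per year: {per_year:<14} per week: {per_week}")
```

The printed values may differ from the table in rounding and in choice of unit (the table gives 101 minutes per week for 99%, where this sketch reports the same duration in hours).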

Page 8: NW18 - Building a Highly Available Network - Don't Bring Me Down

High Availability – Multiple Elements

Power

Control / Process Level

Client / Server (TF01 – Introduction to Virtualization for Manufacturing)

VMware

Stratus

Network Design and Architecture

Application Validation Testing – How will it perform?

Active Monitoring – Need to know when an issue occurs

Migration Planning – If Replacing Existing, How to Move to New Environment?

Page 9: NW18 - Building a Highly Available Network - Don't Bring Me Down

Why This Topic?

Power Plant: Migrated While Plant Was Running!

Semiconductor Plant: Downtime Could Cost Hundreds of Millions

Pharmaceutical: Network Anomalies Found After Production Started

Crystal Growth Plant: Network Upgrades Required After Startup

Salad Manufacturer: New Network Shut Down Plant

Page 10: NW18 - Building a Highly Available Network - Don't Bring Me Down

Design Goals

Network Anomalies Should Be Transparent To Production

How Will Switch IOS Or Firmware Versions Be Updated If Required?

Ensure Production

Network Uptime Must Be Higher Than Process Uptime

Does The Network Respond As Expected?

How Do I Know When There Was An Issue?

Do We Understand The Risks With The Desired Network Architecture?

Communications Are Transparent To The Control Layer

What Is The Budget?

Page 11: NW18 - Building a Highly Available Network - Don't Bring Me Down

Design Failure

Note Dated 4-10-2008

Assessment Dated 3-2-2010

“Do Not Use This System Until Further Notice. Using This System May Lock Up Devices on the Network.”

Page 12: NW18 - Building a Highly Available Network - Don't Bring Me Down


Switch Stacking

High Availability Method 1

Page 13: NW18 - Building a Highly Available Network - Don't Bring Me Down

Switch Stacking

High Availability Method 1 – Switches are Stacked Together

Multiple Physical Switches

Single Logical Switch

Page 14: NW18 - Building a Highly Available Network - Don't Bring Me Down

Switch Stacking

Cisco StackWise Technology Features

Cisco StackWise technology provides a method for collectively utilizing the capabilities of a stack of switches.

Switches can be added to and removed from a working stack without affecting performance.

Up to nine separate switches can be joined together.

32 Gigabit per Second Interconnection Between Switches

Also Supports Power Stacking

Page 15: NW18 - Building a Highly Available Network - Don't Bring Me Down

Switch Stacking

Cisco StackWise Technology Features

Information is shared by every switch in the stack, creating a single switching unit. This includes:

Configuration: only one configuration file, which is distributed to each member in the stack

Routing Tables

MAC Address Tables

Network Topology

Updates occur continuously through the stack interconnect.

Each stack of switches has a single IP address and is managed as a single object. This single IP management applies to activities such as fault detection, virtual LAN (VLAN) creation and modification, security, and QoS controls.

Page 16: NW18 - Building a Highly Available Network - Don't Bring Me Down

Switch Stacking

Cisco StackWise Technology Considerations

IOS Updates

Total time to complete the update on a 2-switch stack: ~65 minutes

Controller communication downtime: ~29 minutes

Reasons an IOS update may need to be performed

Feature Set Updates / Performance / Reported Anomalies

Master Switch or Stack Fault

Adding More Switches to the Stack

SSH after a Failed Telnet

SFP – Hot Swappable

Configuration Changes – 1 logical switch, so a configuration change will impact both switches
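To put the roughly 29 minutes of controller communication downtime in context, it can be compared against an annual downtime budget. This is a rough sketch, not part of the session: it assumes a 365-day year and that planned maintenance counts against the availability target (a site may define this differently), and the function names are hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years

# Figure reported on the slide for a two-switch stack.
COMMUNICATION_DOWNTIME_MIN = 29


def annual_budget_minutes(availability_percent: float) -> float:
    """Downtime allowed per year, in minutes, at the given availability."""
    return (1.0 - availability_percent / 100.0) * MINUTES_PER_YEAR


def updates_within_budget(availability_percent: float,
                          outage_minutes: float = COMMUNICATION_DOWNTIME_MIN) -> int:
    """How many whole outages of this length fit in one year's budget."""
    return int(annual_budget_minutes(availability_percent) // outage_minutes)


if __name__ == "__main__":
    for pct in (99.9, 99.99, 99.999):
        print(f"{pct}%: budget {annual_budget_minutes(pct):.1f} min/year, "
              f"room for {updates_within_budget(pct)} update(s)")
```

Under these assumptions a single stack update already exceeds a 99.999% annual budget, which is one reason the update procedure belongs in the availability requirements discussion.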

Page 17: NW18 - Building a Highly Available Network - Don't Bring Me Down

Switch Stacking

Cisco StackWise Technology Documented Anomaly

Caveats

CSCub20474 (Catalyst 3560, C3560v2, C3750, and C3750v2 switches)

In a switch stack, multicast traffic can be lost for up to 60 seconds when the master switch is reloaded. Because the platform does not support multicast non-stop forwarding (NSF), the time before traffic re-convergence after a switchover can vary.

Page 18: NW18 - Building a Highly Available Network - Don't Bring Me Down

Switch Stacking

Cisco StackWise Technology Failures

Caveats

CSCtz87828 (Catalyst 2960-S, 3750, and 3750v2 switches)

When a cross-stack EtherChannel is used and one of its links is brought down or up, a MAC address learned from this port-channel may either be prematurely cleared from the table or not aged out.

The workaround is to use a single-switch EtherChannel or to clear dynamically learned MAC addresses after links have been added to or removed from the channel.

Page 19: NW18 - Building a Highly Available Network - Don't Bring Me Down

Switch Stacking

Cisco StackWise Technology Failures

Caveats

CSCuj16899 (Catalyst 2960 and 3750v2 switches)

System memory may get exhausted on standalone switches with 64 MB of DRAM and stackable switches with 128 MB of DRAM when 802.1x authentication is enabled concurrently with other features. A switch stack of Catalyst 3750v2 switches with more than five members may exhaust system memory and become inoperable.

The workaround is to limit stacks to five members or fewer.

Page 20: NW18 - Building a Highly Available Network - Don't Bring Me Down

Switch Stacking – Demo 1

Topology of High Availability Method 1 – Switch Stacking

Page 21: NW18 - Building a Highly Available Network - Don't Bring Me Down


Hot Standby Router Protocol HSRP Peers

High Availability Method 2

Page 22: NW18 - Building a Highly Available Network - Don't Bring Me Down

Hot Standby Router Protocol (HSRP)

HSRP Features

Two or more Layer 3 switches or routers can act as a single, virtual Layer 3 switch or router.

They share a virtual IP address and a virtual MAC address.

The virtual IP address is the host’s default gateway.

One HSRP router can assume the routing responsibility of another if a router goes out of commission for either planned or unplanned reasons.

HSRP peers continually exchange status messages.

Routing change-overs are transparent to hosts, but can impact traffic flows.

Page 23: NW18 - Building a Highly Available Network - Don't Bring Me Down

Hot Standby Router Protocol (HSRP)

HSRP States

State: Definition

Initial: This is the state at the start. This state indicates that HSRP is not running. This state is entered through a configuration change or when an interface first becomes available.

Learn: The router has not determined the virtual IP address and has not yet seen an authenticated hello message from the active router. In this state, the router still waits to hear from the active router.

Listen: The router knows the virtual IP address, but the router is neither the active router nor the standby router. It listens for hello messages from those routers.

Speak: The router sends periodic hello messages and actively participates in the election of the active and/or standby router. A router cannot enter speak state unless the router has the virtual IP address.

Standby: The router is a candidate to become the next active router and sends periodic hello messages. With the exclusion of transient conditions, there is, at most, one router in the group in standby state.

Active: The router currently forwards packets that are sent to the group virtual MAC address. The router sends periodic hello messages. With the exclusion of transient conditions, there must be, at most, one router in active state in the group.
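The state definitions above can be captured in a small data structure. The following is an illustrative sketch only, not an HSRP implementation: it encodes just what the table states (which states send hello messages, which state forwards traffic for the virtual MAC address, and the "at most one" rule for active and standby), and the names `HsrpState`, `SENDS_HELLOS`, and `group_problems` are hypothetical.

```python
from collections import Counter
from enum import Enum
from typing import Iterable, List


class HsrpState(Enum):
    INITIAL = "Initial"
    LEARN = "Learn"
    LISTEN = "Listen"
    SPEAK = "Speak"
    STANDBY = "Standby"
    ACTIVE = "Active"


# States in which, per the table, the router sends periodic hello messages.
SENDS_HELLOS = {HsrpState.SPEAK, HsrpState.STANDBY, HsrpState.ACTIVE}


def forwards_virtual_mac_traffic(state: HsrpState) -> bool:
    """Only the active router forwards packets sent to the group virtual MAC."""
    return state is HsrpState.ACTIVE


def group_problems(states: Iterable[HsrpState]) -> List[str]:
    """Flag violations of the 'at most one' rule outside transient conditions."""
    counts = Counter(states)
    problems = []
    if counts[HsrpState.ACTIVE] > 1:
        problems.append("more than one active router")
    if counts[HsrpState.STANDBY] > 1:
        problems.append("more than one standby router")
    return problems


if __name__ == "__main__":
    snapshot = [HsrpState.ACTIVE, HsrpState.ACTIVE, HsrpState.LISTEN]
    print(group_problems(snapshot))
```

A check like `group_problems` is only meaningful on a snapshot that persists; two active routers seen briefly during an election fall under the transient conditions the table excludes.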

Page 24: NW18 - Building a Highly Available Network - Don't Bring Me Down

HSRP – Demo 2

Topology of High Availability Method 2 – HSRP

Page 25: NW18 - Building a Highly Available Network - Don't Bring Me Down

Hot Standby Router Protocol (HSRP)

HSRP Failure Use Cases

HSRP Standby IP address is reported as a duplicate IP address – STP

HSRP state continuously changes – STP

HSRP does not recognize peer – Physical Layer

Host MAC Address Flap Between Ports – STP

HSRP state changes and switch Runtime Diagnostic Message RTD Flap – STP

HSRP Intermittent State Changes – STP

Excessive flooding of Unicast traffic – Load Balancing Configuration, Packet Drops

HSRP virtual IP address is reported as a different IP address – STP

HSRP causes a MAC violation on a secure port – Configuration Issue

Page 26: NW18 - Building a Highly Available Network - Don't Bring Me Down

Topology Changes

Examples of topology changes from several customers’ production networks

Page 27: NW18 - Building a Highly Available Network - Don't Bring Me Down

Virtual Switching System (VSS)

High Availability Method 3

Page 28: NW18 - Building a Highly Available Network - Don't Bring Me Down

Virtual Switching System (VSS)

Page 29: NW18 - Building a Highly Available Network - Don't Bring Me Down

Virtual Switching System (VSS)

VSS Features

Virtual Switch Link

Control and Data Traffic

Interconnects Switches

Stateful Switchover (SSO)

Increases Network Availability During Switchover

Non-Stop Forwarding (NSF) – Continue Forwarding Along Known Routes

Multi-Chassis EtherChannel (MEC)

In-Service Software Upgrade (ISSU)

Page 30: NW18 - Building a Highly Available Network - Don't Bring Me Down

Virtual Switching System (VSS)

VSS Features

When you create or restart a VSS, the peer switches negotiate their roles.

One switch becomes the VSS active switch.

The other switch becomes the VSS standby switch.

The VSS active switch controls the VSS, running the Layer 2 and Layer 3 control protocols for the switching modules on both switches.

The VSS active switch also provides management functions for the VSS.

The VSS active and standby switches perform packet forwarding for ingress data traffic on their locally hosted interfaces.

The VSS standby switch sends all control traffic to the VSS active switch for processing.

Page 31: NW18 - Building a Highly Available Network - Don't Bring Me Down

Virtual Switching System (VSS)

VSS Active Switch Fails

Stateful Switchover Initiated

Standby Assumes Active Role

Failed Switch Reloads

VSS Standby Switch Fails

No Switchover

Failed Switch Reloads

MEC Links Stay Active, But Bandwidth Reduced Until Recovery Complete
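The two failure cases can be summarized as a simple lookup from the failed switch's role to the sequence of events. This is only a restatement of the slide in code form, with hypothetical names (`Role`, `failure_sequence`); it does not model timers, the virtual switch link, or any real device behavior.

```python
from enum import Enum
from typing import List


class Role(Enum):
    ACTIVE = "active"
    STANDBY = "standby"


def failure_sequence(failed: Role) -> List[str]:
    """Events listed on the slide for a failure of the switch in the given role."""
    if failed is Role.ACTIVE:
        return [
            "stateful switchover initiated",
            "standby assumes active role",
            "failed switch reloads",
        ]
    # Standby failure: the active switch keeps its role, so nothing switches over.
    # The slide lists the MEC note last; it is attached to this case here,
    # although it may well apply to both cases.
    return [
        "no switchover",
        "failed switch reloads",
        "MEC links stay active with reduced bandwidth until recovery completes",
    ]


if __name__ == "__main__":
    for role in Role:
        print(f"{role.value} fails:")
        for step in failure_sequence(role):
            print(f"  - {step}")
```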

Page 32: NW18 - Building a Highly Available Network - Don't Bring Me Down

VSS – Demo 3

Topology of High Availability Method 3 – VSS

Page 33: NW18 - Building a Highly Available Network - Don't Bring Me Down


Separate Parallel Redundancy (A | B)

High Availability Method 4

Page 34: NW18 - Building a Highly Available Network - Don't Bring Me Down

Separate Parallel Redundancy (A | B)

A | B Features

Can be combined with other high availability methods.

More intuitive for a Control System Engineer to understand.

Converges at the scan time of the PLC.

Hence, converges as fast as the process requires.

Has the ability to provide the highest possible availability, under the right circumstances.
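Convergence "at the scan time of the PLC" means the choice between the A and B networks is made by controller logic on every scan. The sketch below illustrates one possible selection scheme in Python; it is an assumption for illustration, not the method shown in the session. Real implementations would live in controller logic, and the heartbeat counter, `Sample`, `PathMonitor`, and `select` are all hypothetical names. It also assumes the sender updates its heartbeat at least once per scan.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    value: float
    heartbeat: int  # counter the sender increments on every update


class PathMonitor:
    """Tracks whether one network path delivered a new heartbeat since the last scan."""

    def __init__(self) -> None:
        self._last_heartbeat: Optional[int] = None

    def is_fresh(self, sample: Optional[Sample]) -> bool:
        if sample is None:
            return False
        fresh = sample.heartbeat != self._last_heartbeat
        self._last_heartbeat = sample.heartbeat
        return fresh


def select(sample_a: Optional[Sample], sample_b: Optional[Sample],
           monitor_a: PathMonitor, monitor_b: PathMonitor) -> Optional[float]:
    """Run once per scan: prefer path A, fall back to path B, else report no data."""
    fresh_a = monitor_a.is_fresh(sample_a)
    fresh_b = monitor_b.is_fresh(sample_b)  # evaluate both so both monitors stay current
    if fresh_a:
        return sample_a.value
    if fresh_b:
        return sample_b.value
    return None


if __name__ == "__main__":
    mon_a, mon_b = PathMonitor(), PathMonitor()
    print(select(Sample(1.0, 1), Sample(1.0, 1), mon_a, mon_b))  # both paths healthy
    print(select(Sample(1.0, 1), Sample(2.0, 2), mon_a, mon_b))  # path A stalled
```

Because the decision is re-evaluated on each scan, a stalled path is bypassed within one scan, which is the sense in which this method converges as fast as the process requires.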

Page 35: NW18 - Building a Highly Available Network - Don't Bring Me Down

Separate Parallel Redundancy (A | B)

A | B Failures

PLC Logic Dependent

Not all devices may be able to support it

I/O

May not be able to meet the requirements of all systems

May cost more to implement

Page 36: NW18 - Building a Highly Available Network - Don't Bring Me Down

A | B – Demo 4

Topology of High Availability Method 4 – A | B Networks

Page 37: NW18 - Building a Highly Available Network - Don't Bring Me Down

www.rockwellautomationteched.com


Thank you!