ethernet control systems using media redundancy

Acromag, Incorporated 30765 S Wixom Rd, PO Box 437, Wixom, MI 48393-7037 USA Tel: 248-295-0880 Fax: 248-624-9234 www.acromag.com

Copyright © Acromag, Inc. July 28, 2008

White Paper:

How to Develop High Reliability Ethernet Control Systems Using Media Redundancy

Extending Ethernet Redundancy Schemes All the Way Down to the I/O

www.acromag.com

- 2 -

How to Develop High Reliability Ethernet Control Systems Using Media Redundancy

Ethernet on the Plant Floor Ethernet was never designed for the plant floor industrial environment. It was designed for office systems and the environment of the enterprise business systems. In the business systems environment, making sure that the network has ultra high availability at all times is not necessary. For example, servers are often rebooted during off-peak use times, like at night, and on weekends. This of course will not work for the plant floor—there are few, if any, off-peak use times. A plant floor network must have availability 24/7 and probably 24/7/365. Networks cannot go down, and servers cannot be rebooted in the middle of production. But because Ethernet is entirely ubiquitous throughout the rest of the enterprise, its cost-of-implementation and cost-of-use has been reduced to the point where it is also commonly found on the plant floor. Most IT specialists understand and can troubleshoot an Ethernet network without special training, too. So it is entirely understandable why Ethernet has become the network of preference on the plant floor as well. The key to using Ethernet on the plant floor is to make the implementation highly available. That means in practice, various means of redundancy are employed. A redundant system must be fault tolerant. We typically make systems redundant by duplicating system components that have high probabilities of failure, like power supplies, software file systems, and other computer hardware. We also add what is called “media redundancy” which refers to the formation of a backup communication path when part of a network is suddenly unavailable. A redundant media system can survive the break of a network cable or link by switching to alternate communication paths as soon as the break is detected. Building media redundancy into an Ethernet network isn’t simply a matter of adding another cable connection. The Ethernet devices on the network must have the ability to manage communication over duplicate paths. Ethernet normally does not allow duplicate message paths in the same network, because doing so would cause the messages to travel the network infinitely, or “loop.” This drives so-called “message storms” that can ultimately shut down the network, or prevent real communication between network devices.

Hubs, Switches and Ring Topologies Ethernet networks began by connecting devices directly to other devices. It quickly became apparent that some means was needed to add more than two devices to a network. This device is called a hub, because the devices were connected in a set of spokes to the hub. This sort of configuration is also called a “star” topology. More than one “star” can be connected through another hub to form a “tree.” Star topologies are very practical for connecting edge devices, such as sensors, scanners, and other I/O devices to the network, but they do not address redundant message paths well.

- 3 -

A network hub electrically repeats an incoming network message at all of its connection ports. By itself, a network hub cannot be made redundant, because it has no way of knowing which messages are redundant, and therefore passes all of them. This can cause packets to collide and become garbled as messages are received and reformed. “Packet” or “message” storms can be created that could effectively shut down the network. Figure 1: OSI Reference Model

7. Application

6. Presentation

5. Session

4. Transport

3. Network

2. Data Link

1. Physical

The Seven-Layer OSI Networking Model.Ethernet defines Layers 1 and 2 of thenetwork topology.

Modbus, Ethernet/IP,EtherCAT, PROFInet,Powerlink, etc.

TCP, UDP, IP

Ethernet

User

Hubs work at the lower layer of the OSI Reference Model, the physical layer. Other devices, called bridges by the IEEE, and “switches” by most everybody else, perform the same function as a hub, but with more intelligence, because they sit on the second or data link layer of the OSI model. Switches are similar to hubs, however, they learn which of their ports are connected to which devices, and then forward messages to the correct port only, effectively reducing network traffic and preserving network bandwidth. Switches are designed to perform other intelligent functions to filter and direct network traffic as required. Switches can be “cascaded” or added upstream or downstream of switches already in the network to further segment the Ethernet network.

- 4 -

Before we discuss how switches can be used to manage redundant message paths in a network, we should review the five basic topologies used to form Ethernet networks: Star, Tree, Bus, Ring and Mesh. Bus networks have a single cable backbone, with many devices connected to it. Foundation Fieldbus HSE is an example of an Ethernet network formed as a Bus. The single cable backbone is the common point of failure, so industrial plant bus networks are usually designed with dual redundant cable pairs. Star topologies have all the nodes connected to a central site, or hub. It has the advantage over the Bus that if cable breaks between the hub and any node, the operation of the other nodes on the network are not impaired. Tree topology is a combination of Star and Bus topologies, providing the ability to link together smaller networks or Stars via a common Bus. Mesh topologies use lots of wire to connect every node with every other node, providing alternate paths for every node to maintain communication in the event of failure. Mesh topologies are very expensive, but are very fault tolerant. It is worth noting that the Internet is itself a mesh network, designed that way to operate even in the event of damage during a nuclear war. The last of the five is the Ring. In this topology, devices are connected in a series ring and messages go only one way from device to device. Figure 2 : Basic network topologies

Ring Mesh

Tree Bus

Star

With the exception of the more expensive Mesh topology, none of these ways of designing an Ethernet network provides any means for handling redundancy or redundant message paths. Recognize that Bus and Ring topologies can have single point of failure problems.

Ethernet Redundancy Schemes So how do we build redundant Ethernet networks without causing message loops? We simply utilize switches that can manage multiple connection paths to any one device.

- 5 -

That is, we connect our devices with switches capable of detecting a redundant path to the same address. This works independent of topology. There are four of these methodologies, or protocols, for achieving redundancy with switches: Spanning Tree (STP), Rapid Spanning Tree (RSTP), Proprietary Ring and Trunking. Each of these redundancy methods will take care of media failure within the network structure itself, that is, between the switches and hubs, but not necessarily to the network devices they connect to. Spanning Tree Protocol, or STP was the first protocol to be developed based on the IEEE 802.1D Media Access Control (MAC). Spanning Tree is based on timers and will operate with link segments and coaxial cable segments. Rapid Spanning Tree is a modernization of the STP protocol which, as its name implies, operates significantly faster than its older predecessor. Both protocols aren’t limited to Ring topologies, but will also work well with a Mesh topology. The Proprietary Ring is a combination of a topology and proprietary protocol that permits a network Ring to continue to function when one of its segments is broken. A master switch is set up to monitor and control packet traffic in a proprietary way. Proprietary rings are simple to understand and set up, but they have the disadvantage of being proprietary, and this limits hardware selection to a single vendor for the entire network. The Proprietary Ring is very fast, recovers well, and does an excellent job of managing redundancy with minimal usage of switches and cabling. Trunking, which is sometimes called link aggregation (as it is in the IEEE 802.1 standard), provides two or more parallel paths between device ports for redundancy. Trunking has the advantage of increased bandwidth and throughput, since it provides dual paths for data transmission between each switch. Figure 3: Trunking Diagram

P1

P3

P5

P7

TX

TX

P2

P4

P6

P1

P3P4

P6P7

PWR

FDX/COL

P8

RX

LNK/ACT

RX

P2

P5

P8

R.M.

R.M.

PWR2

PWR1

EIS-408FX-M

RESET

FAULT

SwitchPort 1 NIC 1

NIC 2SwitchPort 2

LinkAggregation

Fully Redundant Ethernet Networks Means All the Way to the I/O Because of Ethernet’s heritage from the enterprise or business system networks, most network designers limit themselves to containing media failure within the network infrastructure itself. That means that they provide redundant connectivity between the hubs and switches of the network, but they generally omit the devices the network connects to. In the case of plant floor networks, that generally means the I/O.

- 6 -

Ethernet I/O comes in two basic flavors. Some devices have digital connectivity themselves, such as the network ports found on barcode scanners or weigh scales. Other devices act as I/O aggregators with multiple analog, discrete, or digital inputs and outputs. Figure 4: Types of I/O

Ethernet I/O aggregators are available in a variety of form factors. Acromag’s BusWorks® and EtherStax® series demonstrate configurations to interface just a few or very many analog and discrete I/O signals. In order to provide appropriate redundancy to the network, we have to consider what the purpose of redundancy in a plant floor network is: high availability of the control system. That means we must provide redundancy not only to the network topology, but also all the way to the I/O, or we continue to have single point of failure issues that may cause the control system itself to fail.

Back to the Hub Remember that a hub will repeat an incoming message at all of its other ports. If two ports of the hub happen to be connected to an external redundant network switch, this causes the switch to immediately detect the redundant path, disable it, and hold it as a back-up path should the primary path fail. If the primary path fails, it will trigger the fail-over and recovery operation using the particular protocol of the redundant network switch. This provides some significant advantages when multi-port hub functionality is combined and embedded inside an I/O device. This simple methodology as implemented by Acromag’s EtherStax I/O system allows the I/O to become interoperable with standard Ethernet redundancy protocols like STP or RSTP. Further, it allows the I/O to also work seamlessly inside any proprietary Ring network. This means that media redundancy and reliability are now easily implemented all the way back to the “end-node” device rather than stopping at the connected switch. The Ethernet ports on the I/O can operate as switches or hubs which allows their use in virtually any Ethernet network architecture or in any redundancy scheme supported by switch manufacturers. Designers and users win. It is a simple, cost effective, and reliable approach which can also provide a future upgrade path based on standards, so that lifecycle costs are minimized.

- 7 -

Methodology for Fully Redundant Ethernet Networks in Control Systems The premise for fully redundant Ethernet networks in control systems is to use devices with dual Ethernet ports that can emulate the behavior of a hub. If an I/O or control device is able to operate its dual network ports as a hub, this easily brings media redundancy down to the end-node. In its simplest form, a Redundant Media Path is formed with an Ethernet I/O device, such as an EtherStax unit, that is connected to two ports of the same redundant switch. This switch would use a Spanning Tree Protocol to manage redundancy. Figure 5: Simple Node Redundancy Connections (One RSTP Switch)

P1

P3

P5

P7

TX

TX

P2

P4

P6

P1

P3P4

P6P7

PWR

FDX/COL

P8

RX

LNK/ACT

RX

P2

P5

P8

R.M.

R.M.

PWR2

PWR1

EIS-408FX-M

RESET

FAULT

OR

NO

T R

ED

UN

DA

NT

PC

TO

SW

ITC

H

Acromag

STP/RSTPREDUNDANTSWITCHREQUIRED

HOST PC

TO NIC

BREAK 1 OR BREAK 2 WILL RECOVERCOMMUNICATION IN LESS THAN 15 SECONDS.

BREAK 1

BREAK 2

CONNECT TWOPATHS TO UNIT

USE AN ETHERNETSWITCH TO DISTRIBUTENODES

REMOTE HOST(w/ NIC INSTALLED)

REDUNDANT TO NODE

IN HUB MODE, THE ETHERSTAXREPEATS ANY MESSAGE ON APORT AT THE OPPOSITE PORT,TRIGGERING THE EXTERNALSWITCH TO SENSE THEREDUNDANT PATH, DISABLE IT,AND HOLD IT AS A BACKUP PATHSHOULD THE PRIMARY PATH FAIL.

A break in either path to the unit will recover communication on the opposite path in about 15 seconds or less. With RSTP switches, recovery times can be much faster, perhaps 1 or 2 seconds. Additional redundant paths can be added by simply installing a second redundant switch.

- 8 -

Figure 6: Simple Node Redundancy Connections (Two Switches)

P1 P2

P3P2

P6P5

P8R.M.

P3

P5

P7

P8

P4

P6

P1

P4

P7

FDX/COL

P1

FDX/COL

EIS-408FX-M

P3P2

P6P5

LNK/ACT P8

RXPWR1

R.M.

EIS-408FX-M

P2

LNK/ACT

RX

P3

P5

P7

P8

RX RESET

TX

P1

P4

P7

TX PWR

PWR2

P4

P6

RX

TX

TX

R.M.

FAULT

FAULT

PWR1

R.M.

RESET

PWR

PWR2

BREAK 2

BREAK 1

HOST PC

BRK 3


TO NIC

OR

CONNECT TWOPATHS TO UNIT

RE

DU

ND

AN

T N

ET

WO

RK

ME

DIA


BREAK 1 OR BREAK 2 WILL RECOVERCOMMUNICATION IN LESS THAN 15 SECONDS.

Acromag

USE AN ETHERNET SWITCHTO DISTRIBUTE NODES

Acromag

BRK 4

REDUNDANT TO NODE

BREAK 3 OR BREAK 4 WILL RECOVERCOMMUNICATION IN 1-2 SECONDS WITH RSTP.

REMOTE HOST(w/ NIC INSTALLED)

IN HUB MODE, THE ETHERSTAXREPEATS ANY MESSAGE ON APORT AT THE OPPOSITE PORT,TRIGGERING THE EXTERNALSWITCH TO SENSE THEREDUNDANT PATH, DISABLE IT,AND HOLD IT AS A BACKUP PATHSHOULD THE PRIMARY PATH FAIL.

Because of the hub functionality, redundancy to the I/O functions independently of the system topology and can accommodate mesh, ring and tree configurations.

- 9 -

Figure 7: RSTP Mesh Connections and Ring Redundancy Connections

P1

P3P4

P6P7

R.M.

P1

P3P4

P6P7

R.M.

P2

P3

FDX/COL

TX

LNK/ACT

RX

P2

P4

P3P2

P6P5

P8

R.M.

P2

P3

P5

P7

P8

RX

TX

TX

P1

P3P4

P6P7

R.M.

P2

P4

P6

P1

P3P4

P6P7

PWR

P2

P5

P8

P2

P5

P8

P5

P7

P8

RX

TX

P6

P1

P4

P7

PWR

FDX/COL

RX

LNK/ACT

P2

P5

P8

P2

P5

P8

R.M.

P2

P3

P5

RX

P7

TX

P8

TX

P2

P3

P5

RX

P7

TX

P8

TX

P2

P4

P6

RESET

PWR2

PWR1

PWR

P2

P4

P6

RESET

PWR2

PWR1

PWR

P2

P3

P5

P7

P8

P2

P4

P6

RX

TX

TX

EIS-408FX-M

RX

Acromag

EIS-408FX-M

FDX/COL

RX

LNK/ACT

Acromag

FAULT

R.M.

FAULT

R.M.

EIS-408FX-M

EIS-408FX-M

RX

EIS-408FX-M

FDX/COL

LNK/ACT

FDX/COL

LNK/ACT

R.M.

FAULT

PWR1

Acromag

RESET

PWR2

PWR1

PWR

RESET

R.M.

PWR2

PWR1

1-2s

RESET

PWR2

1-2s

t<=30s

FAULT

R.M.

FAULT

1-2s

1-2s

1-2s

1-2s

1-2s

1-2s

t<=30st<=15s

Acromag

Acromag

HOST PC

t<=15s

EtherStax I/O Connectedto Same Switch

STP/RSTP CONNECTED AS A MESH(At Least 3 Connections Between Each Switch and Neighboring Switches)

MEDIA FAILURE RECOVERY TIME 1-2s INSIDENETWORK, 15-30s TO ETHERSTAX.

EtherStax I/O ConnectedBetween Switches

P3P2

P6P5

R.M.P8

RX

TX

RX

TX

P1P2

P4P5

P7P8

R.M.

P2P1

P5P4

P8P7

P1

P4

P7

P3

P6

R.M.

P3

P6

R.M.

P1

FDX/COL

LNK/ACT

RX

P2

FAULT

PWR1

R.M.

EIS-408FX-M

P3

P5

P7

P8

P4

P6

RESET

FAULT

PWR2

PWR

P5

P7

P8

P6

RX RESET

FAULT

RXPWR2

R.M.

TX PWR

P3

P5

P7

RX

TX

P8

TX

P4

P6

RESET

PWR2

PWR

P1

FDX/COL

LNK/ACT

Acromag

P2

PWR1

P1

P3

FDX/COL

P2

P4

TX

LNK/ACT

PWR1

EIS-408FX-MEIS-408FX-M

Acromag Acromag

300ms

SWITCH

300ms300ms

HOST PC

THE ETHERSTAX ISTRANSPARENT TOTHE RING IN HUBMODE

SWITCH

ACROMAGEIS-408FX-M

DISABLED PATH(SWITCH BLOCKSCOMMUNICATIONVIA REDUNDANTPATH)

300ms

ACROMAGEIS-408FX-M

UNIT MUST BE INHUB/REPEATERMODE

ETHERSTAX UNIT

SWITCH

REDUNDANT MEDIARING CONNECTIONS

IF A PRIMARY PATH FAILS, THENRING WILL FAIL-OVER TO THEALTERNATE PATH IN LESS THAN300 MILLISECONDS.

ACROMAGEIS-408FX-M

- 10 -

Figure 8: Redundant Ring Network with a Wireless Link

P2

P3

P5

P7

P8

RX

TX

TX

P2

P4

P6

P2

FDX/CO L

LNK/AC T

RX

P2

P2P3

P5P6

P8

R.M.

FDX/CO L

RX

LNK/AC T

P3

P5

P7

P8

RX

TX

TX

P4

P6

P7

P4

P1

PWR

P1

P4P3

P7P6

R.M.

RESE T

R.M.

FAU LT

PWR1

EIS-408FX-M

Acromag

P2

P8

P5

PWR2

ETHERNET

RF RECEIVERFTRANSMIT

ETHERNETSERIAL

RFTRANSMITPOWER/STATUS

RESE T

PWR2

PWR1

PWR

EIS-408FX-M

SERIAL

POWER/STATUS

SIGNALSTRENGTH

RF RECEIVE

t <= 90s

FAU LT

R.M.

SIGNALSTRENGTH

RADIOMODEM

300ms

t <= 90s

EtherStax I/O

RINGSWITCH

Acromag

RINGSWITCH

300ms

HOST PC

Recovery to Copper Link in 300ms.Recovery to Wireless Link in 2-90s.

(dependent on radios)

2.4GHz802.11b

RADIOMODEM2.4GHz802.11b

The second port of the Ethernet I/O devices may also function as a backup path for other types of connection media, such as Ethernet-to-Wireless radio transmission, as shown above. You could just as easily replace the radio path with an Ethernet serial server, Ethernet-to-Cellular or Ethernet-to-Satellite Modem to form a backup path along some other media as well. In the case shown, the primary path is the hard-wired copper connections for “mission critical” operation and the wireless path is held in reserve as a backup.

Conclusion Which method you ultimately choose to manage redundant media should consider future upgradeability and expansion, the recovery time, its flexibility in wiring, and the amount of switch ports and cable that will be consumed.

- 11 -

STP RSTP RING TRUNKING Protocol Reference

IEEE 802.1D

IEEE 802.1W

Proprietary Proprietary

Recovery Time 30-60s 1-2s ≤ 300ms ≤10ms Wiring Flexibility Very

Flexible Very Flexible

Restrictive Very Restrictive

Switch Supplier Flexibility

High High Low Low

Amount of Cable Needed for Media Redundancy

High High Low Highest

# Ports Consumed High High Low Highest # of Switches Needed

High High Low Highest

With RSTP, the Recovery Time for a media failure between switches ranges from 1-2 seconds, while the recovery time to the Ethernet I/O may be longer depending on how the switch software parameters are configured (e.g. Max Age Time, Hello Time, Forward Delay Time, etc.). For redundancy applications where general recovery times of several seconds or longer are acceptable, then using switches with standard RSTP protocol offers the most flexibility and interoperability among switch vendors, although at a cost of using additional switches and cable. For control or I/O system applications where recovery times are more critical and costs may be sensitive, Ring Redundancy offers media recovery times of less than 300 milliseconds for any ring path. Further, when using Ring Redundancy combined with Ethernet I/O such as EtherStax, this methodology offers “end-node” redundancy at a faster rate and lowest cost based on the number of switches and cables needed. Trunking, or Link Aggregation as mentioned earlier, is a method that uses multiple paths between switches to achieve the greatest bandwidth and more redundancy levels. With recovery times of less than 10mS, it is currently the fastest media redundancy method. It is ideal for high-speed redundancy between servers and information systems. Since trunking is proprietary and costs more due to its restrictive nature, trunking is not considered a viable or practical redundancy protocol for end-node industrial i/o or control system applications. In conclusion, by integrating hub functionality into end devices, it is therefore possible to extend standard and open Ethernet redundancy methodologies all the way to the I/O. The end results allow users to utilize “off the shelf” Ethernet switch technology to provide the very high availability and reliability necessary to operate in an industrial plant floor environment. NOTE: Recovery times listed are estimates based on bench tests. Actual times may vary.

- 12 -

About the Author: Acromag is an international corporation that has been manufacturing and developing measurement and control products for more than 50 years. Acromag offers a complete line of industrial I/O products including process instruments, signal conditioners, distributed I/O systems, network communication devices, and data acquisition boards. For more information about Acromag products, call the Inside Sales Department at (248) 295-0880, Fax (248) 624-9234, or e-mail [email protected]. Write Acromag at P.O. Box 437, Wixom, MI 48393-7037 USA. Our web site address is http://www.acromag.com. BusWorks and EtherStax are registered trademarks of Acromag, Inc.

ethernet control systems using media redundancy

Documents

tx p7 rx

nic installed

osi reference model

single cable backbone

primary path fail

rapid spanning tree

plant floor network

spanning tree protocol