rce platform technology (rpt) gregg thayer ([email protected]) shelf installation, networking,...

24
RCE Platform Technology (RPT) Gregg Thayer ([email protected]) Shelf Installation, Networking, and Administration V3

Upload: jewel-long

Post on 29-Dec-2015

218 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: RCE Platform Technology (RPT) Gregg Thayer (jgt@slac.stanford.edu) Shelf Installation, Networking, and Administration V3

RCE Platform Technology (RPT)

Gregg Thayer ([email protected])

Shelf Installation, Networking, and Administration

V3

Page 2: RCE Platform Technology (RPT) Gregg Thayer (jgt@slac.stanford.edu) Shelf Installation, Networking, and Administration V3

2

Overview

• The COB is a cluster of networked compute elements

• ATCA allows these clusters to be densely networked

• The ATCA Shelf does not exist in isolation- We want the data to make it out

• The external network is an integral part of the COB system- They must be configured together

• The external network has the capacity to adversely affect the

performance of the internal Shelf network- The COB cannot absolutely protect itself from a poorly designed

external network

- Care needs to be taken to avoid problems

- Need to understand what is important in each installation

- This is hard, there is no one right answer!

• There may be different needs and options in a development

environment than a deployed system

• Shouldn’t forget that each Shelf Manager can/should be

connected to a network too- Separate external connection for management

• There is some tension between IP and ATCA with regard to

the network - IP tends to think of LANs as stable collections of nodes

- ATCA expects nodes to appear and disappear with little or no warning

- This can have operational implications

Page 3: RCE Platform Technology (RPT) Gregg Thayer (jgt@slac.stanford.edu) Shelf Installation, Networking, and Administration V3

3

Shelf Manager Networking

• The specifics of Shelf Manager

configuration are vendor specific- Read your Shelf Manager

documentation!

• Many allow various networking

options- Static or Dynamic IP Address

• Our suggestions- Use a static IP address for your

Shelf Manager

- Don’t place the Shelf Manager on

the same network as the RCEs

- Consider storing the Shelf

Manager IP information in the

Shelf FRU Information

Page 4: RCE Platform Technology (RPT) Gregg Thayer (jgt@slac.stanford.edu) Shelf Installation, Networking, and Administration V3

4

Shelf Networking Overview

When considering how to integrate the Shelf of COBs into an external network,

you should consider

• How much bandwidth out of the Shelf do you require?

- How many external network connections are required

• If more than one connection is to be used, the external network must be configured in such a way

as to prevent Layer 2 loops

• How will the Cluster Element IP addresses be assigned?

- All DPM RCE IP addresses are assigned via DHCP

- The DTMs do not use DHCP, see next slide

- The DHCP server can reside in ONE of the DTMs or external to the Shelf

- The internal DHCP server is enabled with a DIP switch on the COB

• This state is indicated by the Amber LED on the Faceplate

• Best practice: Isolate the Shelf LAN from the outside world

- Keep unintended external traffic from impacting the function of the clusters

- Keep unintended internal traffic from impacting the wider network

- RPT provides means of achieving this

Page 5: RCE Platform Technology (RPT) Gregg Thayer (jgt@slac.stanford.edu) Shelf Installation, Networking, and Administration V3

5

DHCP

• An RPT defined Record placed in the Shelf FRU Information defines the

range of IP addresses available to the Shelf

- The DTMs self-assign their IP address based on their physical slot number and

the IP address range found in the Shelf IP Info Record

- If the DHCP server is internal to the Shelf, the IP addresses it assigns are taken

from the range specified in the Shelf IP Info Record

- If the DHCP server is to be external, it is crucial that the IP addresses assigned to

the DPM RCEs are on the same network as the IP addresses of the DTM RCEs

• Best practice: Don’t use the RCE MAC addresses to assign IP addresses

- Retain the flexibility to move/replace boards without having to reconfigure the

DHCP server

• Don’t underestimate the administrative burden

• Why add an unnecessary step to swapping in a spare board?

- The actual IP addresses assigned are not so important

• The SDK provides tools which allow you to interact with the RCEs based on their location

Page 6: RCE Platform Technology (RPT) Gregg Thayer (jgt@slac.stanford.edu) Shelf Installation, Networking, and Administration V3

6

Development vs. Production

• The development environment is different than the production environment- It is very reasonable for the networks to be fundamentally different

• The development environment is dynamic- The production system might include many COBs in many Shelves

• Development probably starts with a single RCE

• As development continues RCEs and COBs may be added

• The production environment is stable- The network in the two cases can and probably should be different

• In production, there may be restrictions on how the you are allowed to configure the network- In Development there is usually more freedom to do whatever you need

• Example: DHCP- In a developments systems, COBs may be moving in and out of the Shelf

• Might prefer the DHCP server outside the Shelf (The host machine may be more stable)

- In production, COBs should be stable• Having DHCP inside the Shelf scales better than DHCP outside the Shelf

• Example: Should a single external network connection be 1G or 10G Ethernet?- It’s not all about bandwidth!

- 1G is cheaper and easier to find external networking hardware for

- 10G doesn’t require the Cluster Interconnect to perform rate matching between the internal Shelf network and

the External connection

Page 7: RCE Platform Technology (RPT) Gregg Thayer (jgt@slac.stanford.edu) Shelf Installation, Networking, and Administration V3

7

Shelf IP Information Record

rddev110:bin$ display_shelf_ip_info --shelf collins-sm

Shelf FRU Info previous commit timestamp is Wed Dec 31 16:00:00 1969

===================================================== Shelf IP Info----------------------------------------------------- SLAC Shelf IP Information (collins-sm)----------------------------------------------------- VLAN ID (Valid): 001 (1) Discard Untagged: 0 Discard Tagged: 1 Discard Boundary Viol: 0 Group Base: 192.168.209.1 Group End: 192.168.209.254 Subnet Mask: 255.255.255.0 Gateway: 0.0.0.0-----------------------------------------------------

• The Shelf IP Information Record is an OEM

record which is loaded into the Shelf FRU

Info

- Utilities for loading, displaying, and deleting

this record are included in the SDK

• Used to specify the pool of IP addresses

available to the onboard DHCP server

- Group Base – Group End

• Used to specify the DTM IP addresses

- DTM addresses are filled from the “top” of the

range following the rule

• IP address = (Group End) - (Physical Slot)

• Used to identify VLAN used by the shelf and

rules for discarding packets based on VLAN

ID

- Discard all VLAN tagged packets

- Discard all VLAN untagged packets

- Discard all packets whose VLAN tag doesn’t

match the shelf VLAN

Page 8: RCE Platform Technology (RPT) Gregg Thayer (jgt@slac.stanford.edu) Shelf Installation, Networking, and Administration V3

8

Setting the Shelf IP Information Record

Usage is: set_shelf_ip_info --shelf <shelf_ip> [OPTIONS]

Required arguments: -s,--shelf=SHELF_IP IP address of shelf with FRU info to set

Optional arguments: When specified w/o parameters, the optional arguments will assume their default values. When omitted, the values present in the Shelf FRU Information will be retained.

-i, --vlan=VLAN_ID VLAN ID (12-bits) -u, --untagged=DISCARD When DISCARD is 1, incoming untagged frames will be discarded. -t, --tagged=DISCARD When DISCARD is 1, incoming tagged frames will be discarded. -x, --boundary-violations=DISCARD When DISCARD is 1, incoming boundary violations are discarded. -b, --group-base=BASE_IP BASE_IP is the base address for the block of addresses available

to the shelf. Host ID of all ones or all zeros are not permitted.

-e, --group-end=END_IP END_IP is the end address for

the block of addresses available to the shelf. Host ID of all ones

or all zeros are not permitted. -m, --subnet-mask=SUBNET_MASK SUBNET_MASK is the subnet mask

for the shelf. It defines the network assigned to the shelf.

-g, --gateway=GATEWAY_IP GATEWAY_IP is the IP address

of the gateway -d, --dry-run When this option is used, the

nothing is written back to the shelf. -f, --force When this option is used, write to

the shelf even if there are no changes. When there is no record present in the shelf, this can be used w/o any other configuration options to create a default record

-v, --verbose=VERBOSITY VEBOSITY can range from 1 to 3.

Return Value: 0 if OK 1 if there is a problem with the arguments provided 2 if there are problems communicating with the shelf

• Command included in the SDK

set_shelf_ip_info is used to

write/modify the Shelf IP Information

Record

• group-base and group-end cannot be

the lowest or highest addresses in the

network

• The number of addresses between

group-base and group-end must be large

enough for each configuration- Internal DHCP

• All RCEs in the Shelf and any external

- External DHCP• All DTMs

• Won’t let you specify an inconsistent set

of IP parameters- All addresses must be on the same

network

- Consistent with the subnet mask

Page 9: RCE Platform Technology (RPT) Gregg Thayer (jgt@slac.stanford.edu) Shelf Installation, Networking, and Administration V3

9

Example Topology 1 (single external connection)

• In this configuration one SFP on one

COB is connected to an external

host

• This host is dual homed- Packets pulled from Shelf can

undergo processing/steering to the

outside world

- Don’t bridge the two interfaces!• Defeats the purpose of dual homing,

may as well put the Shelf on the wider

network

• Shelf network can reside on a VLAN

for extra protection against

unnecessary broadcast traffic and

flooding

Page 10: RCE Platform Technology (RPT) Gregg Thayer (jgt@slac.stanford.edu) Shelf Installation, Networking, and Administration V3

10

Example Topology 2 (multiple external connections)

• If more bandwidth out of the shelf is

required connect more COBs

• Most straightforward configuration is

one external host per external

connection- Again dual homed

- Now more critical that the two

interfaces are not bridged to avoid

Layer 2 Loops if the outside

interfaces reside on the same

network

• A switch could be substituted for the

set of hosts, but great care must be

used in configuring it to avoid Layer

2 Loops- For now, you’ll have to know what

you’re doing!

Page 11: RCE Platform Technology (RPT) Gregg Thayer (jgt@slac.stanford.edu) Shelf Installation, Networking, and Administration V3

Example:One external host

Page 12: RCE Platform Technology (RPT) Gregg Thayer (jgt@slac.stanford.edu) Shelf Installation, Networking, and Administration V3

12

Configuring an ASIS Shelf Manager

shmm500 login: rootPassword:# clia setlanconfig 1 3 “172.21.6.22”# clia setlanconfig 1 6 “255.255.255.0”# clia setlanconfig 1 12 “172.21.6.1”

• BEWARE! Configuring a Shelf Manager is vendor specific

- That said, most vendors base their systems on a Pigeon Point System solution, so

something very like this will probably work

- Read the documentation for your shelf

- We’ll use the Shelf Manager console port

• Set a static IP address for the Shelf Manager and store it in the Shelf FRU Information

• Set the shelf Address- Intended to indicate the physical location of the shelf

- Used by the RCE to identify its location

shmm500 login: rootPassword:# clia shelfaddress “darwin”

Page 13: RCE Platform Technology (RPT) Gregg Thayer (jgt@slac.stanford.edu) Shelf Installation, Networking, and Administration V3

13

Setting the Shelf IP Info Record

$ set_shelf_ip_info --shelf="collins-sm" --vlan=1 --untagged=0 --tagged=1 \--boundary-violations=0 --group-base=192.168.209.1 --group-end=192.168.209.254 \--subnet-mask=255.255.255.0 --gateway=0.0.0.0 –v

===================================================== New Shelf IP Info----------------------------------------------------- SLAC Shelf IP Information (collins-sm)----------------------------------------------------- VLAN ID (Valid): 001 (1) Discard Untagged: 0 Discard Tagged: 1 Discard Boundary Viol: 0 Group Base: 192.168.209.1 Group End: 192.168.209.254 Subnet Mask: 255.255.255.0 Gateway: 0.0.0.0-----------------------------------------------------Shelf FRU Info Lock timestamp is Fri Oct 1 08:45:33 1971Shelf FRU Info Commit timestamp is Fri Oct 1 08:45:41 1971

• Use the set_shelf_ip_info command from the SDK- IMPORTANT!!!! After updating the Shelf IP Record you will need to reboot the Shelf Manager

• To update the cached Shelf FRU Information

- IMPORTANT!!! After rebooting the Shelf Manager you should send a Cold Data Reset command to all the

COBs• To update the cached Shelf IP Record in the IPMC

Page 14: RCE Platform Technology (RPT) Gregg Thayer (jgt@slac.stanford.edu) Shelf Installation, Networking, and Administration V3

14

Configuring a COB to run a DHCP server

• To enable the DHCP server

on one COB set the DIP

switch #4 to OFF

- DIP switch is located near

the Zone 1 connector

• The Amber LED should light

immediately

• The DHCP server will be

started after the DTM is

rebooted

Page 15: RCE Platform Technology (RPT) Gregg Thayer (jgt@slac.stanford.edu) Shelf Installation, Networking, and Administration V3

15

Inserting/Extracting a COB

• The IPMC coordinates insertion and extraction with the user with the Blue LED and a switch

activated by the handle- Insertion

• As soon as the board is inserted (and management power is present) the Blue LED lights

• When the handle switch closes the Blue LED BLINKS- Means the IPMC has contacted the Shelf Manager and is awaiting permission to activate

• When the board activates, the Blue LED turns OFF

- Extraction• The user indicates a desire to remove the board by opening the handles (but not applying pressure)

• The IPMC requests permission from the Shelf Manager to deactivate- Causes the LED to BLINK

• When permission is granted, the IPMC removes payload power from the board- When complete, the Blue LED remains ON

• The Shelf Manager can also deactivate the board without user intervention, the Blue LED will

reflect that

• The board can also be activated and deactivated remotely via commands to the Shelf Manager- So long as the “insertion criterion” are also met (handle switch closed, etc.)

Page 16: RCE Platform Technology (RPT) Gregg Thayer (jgt@slac.stanford.edu) Shelf Installation, Networking, and Administration V3

16

Connecting the COB to the Network

• The Ethernet switch on the COB can be connected to the “lab network”

through the faceplate Ethernet ports or through the faceplate Ethernet

ports of another board in the Shelf- COBs are connected through the ATCA Zone 2 backplane

• On COBs with the current switch technology, only a single 10G and/or

1G SFP are functional- Designated by labels on the faceplate

• The Shelf Manager ensures through the ATCA E-keying mechanism that

all active COBs are notified of the presence of all other active COBs- If one COB has a faceplate Ethernet connection, all other COBs in that shelf

will have access to it by default

Page 17: RCE Platform Technology (RPT) Gregg Thayer (jgt@slac.stanford.edu) Shelf Installation, Networking, and Administration V3

17

COB Status

• The ATCA specified LEDs are Green, Red, and Amber- Green LED is lit when every present RCE is fully booted

- Red LED is lit when any present RCE is not fully booted

- Amber LED is lit when the COB is configured to host the DHCP server

• The two pushbuttons allow two different types of reset- Reboot will cause each RCE to reboot and reload its FPGA fabric

• The same as sending the command cob_rce_reset <shelf-manager>/<slot>

- Reset will reset the IPMC, clearing all state information• If RCEs are powered, they will be briefly turned off

• Moral equivalent of extracting and inserting board

• The same as sending the command cob_cold_data_reset <shelf-manager>/<slot>

- Both of these functions are also available via remote commands

Page 18: RCE Platform Technology (RPT) Gregg Thayer (jgt@slac.stanford.edu) Shelf Installation, Networking, and Administration V3

18

COB LCD Display

• The LCD display cycles every two seconds- If you’re standing in front of the Shelf, you can get state information

• Displays information for every occupied bay and present RCE

• COB Summary

• Hardware Version

• IPMC Switch

• IPMC Version

• ATCA Display

• FRU State

• Shelf Address/Physical Slot

• Bay Display

• Bay Name

• ID

• Voltage Status

• Power Draw

• RCE Display

• RCE Location

• Board/Junction Temps

• RCE State

Page 19: RCE Platform Technology (RPT) Gregg Thayer (jgt@slac.stanford.edu) Shelf Installation, Networking, and Administration V3

19

COB Faceplate USB

• There are two micro-USB ports on the COB faceplate

• Console- 2 USB devices, both 115200, 8N1, no flow control

- DTM/RCE0 serial console

- IPMC serial console

• JTAG- DTM JTAG

- Digalent device compatible with Xilinx tools

Page 20: RCE Platform Technology (RPT) Gregg Thayer (jgt@slac.stanford.edu) Shelf Installation, Networking, and Administration V3

20

IPMC Console

================================================================================| BAY |Pre|Ena|Vok| Voltage | Current | Power | Alloc | ID |--------------------------------------------------------------------------------| DPM0 | Y | Y | Y | 11.952V | 1346mA | 16.087W | 35.0W | 4d0000007ea94970 || DPM1 | Y | Y | Y | 12.012V | 1350mA | 16.216W | 35.0W | bf0000007ef52e70 || DPM2 | Y | Y | Y | 11.988V | 1366mA | 16.375W | 35.0W | eb0000007ea67670 || DPM3 | Y | Y | Y | 11.910V | 1373mA | 16.352W | 35.0W | 500000007e98a570 || DTM | Y | Y | Y | 11.976V | 676mA | 8.095W | 15.0W | b20000007ed1ae70 || RTM | N | N | | 11.958V | 3mA | 0.035W | | || CEN | Y | Y | Y | 12.006V | 1526mA | 18.321W | 24.0W | 2c0000007e959970 |================================================================================================================================================================| RCE |Ena|Rst|Rdy|Dne|Vok|BTemp|JTemp| RCE State |--------------------------------------------------------------------------------| DPM0: RCE0 | Y | N | Y | Y | Y | 33C | 47C | RUNNING || RCE2 | Y | N | Y | Y | Y | 36C | 48C | RUNNING || DPM1: RCE0 | Y | N | Y | Y | Y | 35C | 50C | RUNNING || RCE2 | Y | N | Y | Y | Y | 40C | 50C | RUNNING || DPM2: RCE0 | Y | N | Y | Y | Y | 33C | 51C | RUNNING || RCE2 | Y | N | Y | Y | Y | 38C | 47C | RUNNING || DPM3: RCE0 | Y | N | Y | Y | Y | 32C | 49C | RUNNING || RCE2 | Y | N | Y | Y | Y | 36C | 45C | RUNNING || DTM: RCE0 | Y | N | Y | Y | Y | 37C | 46C | RUNNING |================================================================================

• The Console intended to be

used for COB development- We don’t expect end users

to connect to it

• It displays things like state

changes and responses to

Shelf Manager commands

• Can also be configured to

periodically dump blocks of

summary information

• There is a more convenient

way to get this summary

information...

Page 21: RCE Platform Technology (RPT) Gregg Thayer (jgt@slac.stanford.edu) Shelf Installation, Networking, and Administration V3

21

cob_dump Command

$ cob_dump collins-sm –bay================================================================================| BAY |Pre|Ena|Vok |Voltage | Current | Power | Alloc | ID |================================================================================| collins-sm/1 |--------------------------------------------------------------------------------| DPM0 | Y | Y | Y | 12.084V | 893m | 10.791W | 35.0W | 120000007eb55a70 || DPM1 | Y | Y | Y | 11.952V | 880m | 10.517W | 35.0W | 250000007ed77470 || DPM2 | Y | Y | Y | 11.970V | 880m | 10.533W | 35.0W | 130000007eda4570 || DPM3 | Y | Y | Y | 11.994V | 876m | 10.506W | 35.0W | 8e0000007ece7b70 || DTM | Y | Y | Y | 12.036V | 673m | 8.100W | 15.0W | 7f0000007ea77570 || RTM | Y | Y | | 11.994V | 113m | 1.355W | 16.0W | 7a00000019665870 || CEN | Y | Y | Y | 12.054V | 2410m | 29.050W | 24.0W | 090000007e9ddb70 |--------------------------------------------------------------------------------| collins-sm/2 |--------------------------------------------------------------------------------| DPM0 | Y | Y | Y | 11.958V | 883m | 10.558W | 35.0W | 8b0000007e979670 || DPM1 | Y | Y | Y | 11.904V | 906m | 10.785W | 35.0W | 500000007e98a570 || DPM2 | Y | Y | Y | 11.886V | 890m | 10.578W | 35.0W | 160000007eb9d070 || DPM3 | Y | Y | Y | 11.898V | 870m | 10.351W | 35.0W | 4c0000007ed2bb70 || DTM | Y | Y | Y | 11.934V | 676m | 8.067W | 15.0W | 680000007ef2bf70 || RTM | Y | Y | | 11.856V | 110m | 1.304W | 16.0W | 7e00000047654c70 || CEN | Y | Y | Y | 11.868V | 2310m | 27.415W | 24.0W | 620000007e978070 |--------------------------------------------------------------------------------================================================================================

$ cob_dump collins-sm/1/2 –rce================================================================================| RCE |Ena|Rst|Rdy|Dne|Vok|BTemp|JTemp| RCE State |================================================================================| collins-sm/1 |--------------------------------------------------------------------------------| DPM2: RCE0 | Y | N | Y | Y | Y | 43 | 57 | 0x00 || RCE2 | Y | N | Y | Y | Y | 49 | 54 | 0x00 |--------------------------------------------------------------------------------================================================================================

• The status information visible

through the IPMC Console is

available with the cob_dump and

cob_dump_bsi commands

• These commands work by

communicating to the IPMC

controllers on the COBs targeted- Communication is through IPMI

bridged through the Shelf

Manager

• You can retrieve data on multiple

COBs/RCEs/Shelves with one

command

• Uses the same arguments as the

other cob_ipmc commands in the

SDK- Which has been covered in the

SDK tools talk

Page 22: RCE Platform Technology (RPT) Gregg Thayer (jgt@slac.stanford.edu) Shelf Installation, Networking, and Administration V3

22

Rear Transition Module

• There can be many different types of RTMs but they will all have some

common features

• They mirror the Green, Red, and Amber LEDs on the COB- Allows one to asses the state of the COB from the front or back of the shelf

• The RTM Blue LED is used to manage the hot swap of the RTM

specifically- The RTM can be extracted in the same manner as the COB

• Release the handle switch

• Wait for the Blue LED to be solid ON

• Extract RTM

- If the COB is deactivated, the RTM will also be deactivated

- If the RTM is deactivated, the COB need not be deactivated

• All other connection/indicators on the RTM are application specific

• There are a lot of very small pins on the RTM connector, before

inserting the RTM, always make sure to check for bent pins

Page 23: RCE Platform Technology (RPT) Gregg Thayer (jgt@slac.stanford.edu) Shelf Installation, Networking, and Administration V3

23

Summary

• Back-end networking is an integral part of a useful system

• There is no one right answer

• The development environment will very likely have different

needs and constraints than the production environment

• Some decisions you’ll have to make- How many connections to the back-end from each Shelf?

- What speed are they?

- Where will the DHCP server run?• Regardless, there needs to be an RPT defined Shelf IP Info record

added to the Shelf FRU Information

• The RCE on the DTM will always derive its IP address from this range,

and not use DHCP

Page 24: RCE Platform Technology (RPT) Gregg Thayer (jgt@slac.stanford.edu) Shelf Installation, Networking, and Administration V3