rce platform technology (rpt) gregg thayer ([email protected]) shelf installation, networking,...
TRANSCRIPT
RCE Platform Technology (RPT)
Gregg Thayer ([email protected])
Shelf Installation, Networking, and Administration
V3
2
Overview
• The COB is a cluster of networked compute elements
• ATCA allows these clusters to be densely networked
• The ATCA Shelf does not exist in isolation- We want the data to make it out
• The external network is an integral part of the COB system- They must be configured together
• The external network has the capacity to adversely affect the
performance of the internal Shelf network- The COB cannot absolutely protect itself from a poorly designed
external network
- Care needs to be taken to avoid problems
- Need to understand what is important in each installation
- This is hard, there is no one right answer!
• There may be different needs and options in a development
environment than a deployed system
• Shouldn’t forget that each Shelf Manager can/should be
connected to a network too- Separate external connection for management
• There is some tension between IP and ATCA with regard to
the network - IP tends to think of LANs as stable collections of nodes
- ATCA expects nodes to appear and disappear with little or no warning
- This can have operational implications
3
Shelf Manager Networking
• The specifics of Shelf Manager
configuration are vendor specific- Read your Shelf Manager
documentation!
• Many allow various networking
options- Static or Dynamic IP Address
• Our suggestions- Use a static IP address for your
Shelf Manager
- Don’t place the Shelf Manager on
the same network as the RCEs
- Consider storing the Shelf
Manager IP information in the
Shelf FRU Information
4
Shelf Networking Overview
When considering how to integrate the Shelf of COBs into an external network,
you should consider
• How much bandwidth out of the Shelf do you require?
- How many external network connections are required
• If more than one connection is to be used, the external network must be configured in such a way
as to prevent Layer 2 loops
• How will the Cluster Element IP addresses be assigned?
- All DPM RCE IP addresses are assigned via DHCP
- The DTMs do not use DHCP, see next slide
- The DHCP server can reside in ONE of the DTMs or external to the Shelf
- The internal DHCP server is enabled with a DIP switch on the COB
• This state is indicated by the Amber LED on the Faceplate
• Best practice: Isolate the Shelf LAN from the outside world
- Keep unintended external traffic from impacting the function of the clusters
- Keep unintended internal traffic from impacting the wider network
- RPT provides means of achieving this
5
DHCP
• An RPT defined Record placed in the Shelf FRU Information defines the
range of IP addresses available to the Shelf
- The DTMs self-assign their IP address based on their physical slot number and
the IP address range found in the Shelf IP Info Record
- If the DHCP server is internal to the Shelf, the IP addresses it assigns are taken
from the range specified in the Shelf IP Info Record
- If the DHCP server is to be external, it is crucial that the IP addresses assigned to
the DPM RCEs are on the same network as the IP addresses of the DTM RCEs
• Best practice: Don’t use the RCE MAC addresses to assign IP addresses
- Retain the flexibility to move/replace boards without having to reconfigure the
DHCP server
• Don’t underestimate the administrative burden
• Why add an unnecessary step to swapping in a spare board?
- The actual IP addresses assigned are not so important
• The SDK provides tools which allow you to interact with the RCEs based on their location
6
Development vs. Production
• The development environment is different than the production environment- It is very reasonable for the networks to be fundamentally different
• The development environment is dynamic- The production system might include many COBs in many Shelves
• Development probably starts with a single RCE
• As development continues RCEs and COBs may be added
• The production environment is stable- The network in the two cases can and probably should be different
• In production, there may be restrictions on how the you are allowed to configure the network- In Development there is usually more freedom to do whatever you need
• Example: DHCP- In a developments systems, COBs may be moving in and out of the Shelf
• Might prefer the DHCP server outside the Shelf (The host machine may be more stable)
- In production, COBs should be stable• Having DHCP inside the Shelf scales better than DHCP outside the Shelf
• Example: Should a single external network connection be 1G or 10G Ethernet?- It’s not all about bandwidth!
- 1G is cheaper and easier to find external networking hardware for
- 10G doesn’t require the Cluster Interconnect to perform rate matching between the internal Shelf network and
the External connection
7
Shelf IP Information Record
rddev110:bin$ display_shelf_ip_info --shelf collins-sm
Shelf FRU Info previous commit timestamp is Wed Dec 31 16:00:00 1969
===================================================== Shelf IP Info----------------------------------------------------- SLAC Shelf IP Information (collins-sm)----------------------------------------------------- VLAN ID (Valid): 001 (1) Discard Untagged: 0 Discard Tagged: 1 Discard Boundary Viol: 0 Group Base: 192.168.209.1 Group End: 192.168.209.254 Subnet Mask: 255.255.255.0 Gateway: 0.0.0.0-----------------------------------------------------
• The Shelf IP Information Record is an OEM
record which is loaded into the Shelf FRU
Info
- Utilities for loading, displaying, and deleting
this record are included in the SDK
• Used to specify the pool of IP addresses
available to the onboard DHCP server
- Group Base – Group End
• Used to specify the DTM IP addresses
- DTM addresses are filled from the “top” of the
range following the rule
• IP address = (Group End) - (Physical Slot)
• Used to identify VLAN used by the shelf and
rules for discarding packets based on VLAN
ID
- Discard all VLAN tagged packets
- Discard all VLAN untagged packets
- Discard all packets whose VLAN tag doesn’t
match the shelf VLAN
8
Setting the Shelf IP Information Record
Usage is: set_shelf_ip_info --shelf <shelf_ip> [OPTIONS]
Required arguments: -s,--shelf=SHELF_IP IP address of shelf with FRU info to set
Optional arguments: When specified w/o parameters, the optional arguments will assume their default values. When omitted, the values present in the Shelf FRU Information will be retained.
-i, --vlan=VLAN_ID VLAN ID (12-bits) -u, --untagged=DISCARD When DISCARD is 1, incoming untagged frames will be discarded. -t, --tagged=DISCARD When DISCARD is 1, incoming tagged frames will be discarded. -x, --boundary-violations=DISCARD When DISCARD is 1, incoming boundary violations are discarded. -b, --group-base=BASE_IP BASE_IP is the base address for the block of addresses available
to the shelf. Host ID of all ones or all zeros are not permitted.
-e, --group-end=END_IP END_IP is the end address for
the block of addresses available to the shelf. Host ID of all ones
or all zeros are not permitted. -m, --subnet-mask=SUBNET_MASK SUBNET_MASK is the subnet mask
for the shelf. It defines the network assigned to the shelf.
-g, --gateway=GATEWAY_IP GATEWAY_IP is the IP address
of the gateway -d, --dry-run When this option is used, the
nothing is written back to the shelf. -f, --force When this option is used, write to
the shelf even if there are no changes. When there is no record present in the shelf, this can be used w/o any other configuration options to create a default record
-v, --verbose=VERBOSITY VEBOSITY can range from 1 to 3.
Return Value: 0 if OK 1 if there is a problem with the arguments provided 2 if there are problems communicating with the shelf
• Command included in the SDK
set_shelf_ip_info is used to
write/modify the Shelf IP Information
Record
• group-base and group-end cannot be
the lowest or highest addresses in the
network
• The number of addresses between
group-base and group-end must be large
enough for each configuration- Internal DHCP
• All RCEs in the Shelf and any external
- External DHCP• All DTMs
• Won’t let you specify an inconsistent set
of IP parameters- All addresses must be on the same
network
- Consistent with the subnet mask
9
Example Topology 1 (single external connection)
• In this configuration one SFP on one
COB is connected to an external
host
• This host is dual homed- Packets pulled from Shelf can
undergo processing/steering to the
outside world
- Don’t bridge the two interfaces!• Defeats the purpose of dual homing,
may as well put the Shelf on the wider
network
• Shelf network can reside on a VLAN
for extra protection against
unnecessary broadcast traffic and
flooding
10
Example Topology 2 (multiple external connections)
• If more bandwidth out of the shelf is
required connect more COBs
• Most straightforward configuration is
one external host per external
connection- Again dual homed
- Now more critical that the two
interfaces are not bridged to avoid
Layer 2 Loops if the outside
interfaces reside on the same
network
• A switch could be substituted for the
set of hosts, but great care must be
used in configuring it to avoid Layer
2 Loops- For now, you’ll have to know what
you’re doing!
Example:One external host
12
Configuring an ASIS Shelf Manager
shmm500 login: rootPassword:# clia setlanconfig 1 3 “172.21.6.22”# clia setlanconfig 1 6 “255.255.255.0”# clia setlanconfig 1 12 “172.21.6.1”
• BEWARE! Configuring a Shelf Manager is vendor specific
- That said, most vendors base their systems on a Pigeon Point System solution, so
something very like this will probably work
- Read the documentation for your shelf
- We’ll use the Shelf Manager console port
• Set a static IP address for the Shelf Manager and store it in the Shelf FRU Information
• Set the shelf Address- Intended to indicate the physical location of the shelf
- Used by the RCE to identify its location
shmm500 login: rootPassword:# clia shelfaddress “darwin”
13
Setting the Shelf IP Info Record
$ set_shelf_ip_info --shelf="collins-sm" --vlan=1 --untagged=0 --tagged=1 \--boundary-violations=0 --group-base=192.168.209.1 --group-end=192.168.209.254 \--subnet-mask=255.255.255.0 --gateway=0.0.0.0 –v
===================================================== New Shelf IP Info----------------------------------------------------- SLAC Shelf IP Information (collins-sm)----------------------------------------------------- VLAN ID (Valid): 001 (1) Discard Untagged: 0 Discard Tagged: 1 Discard Boundary Viol: 0 Group Base: 192.168.209.1 Group End: 192.168.209.254 Subnet Mask: 255.255.255.0 Gateway: 0.0.0.0-----------------------------------------------------Shelf FRU Info Lock timestamp is Fri Oct 1 08:45:33 1971Shelf FRU Info Commit timestamp is Fri Oct 1 08:45:41 1971
• Use the set_shelf_ip_info command from the SDK- IMPORTANT!!!! After updating the Shelf IP Record you will need to reboot the Shelf Manager
• To update the cached Shelf FRU Information
- IMPORTANT!!! After rebooting the Shelf Manager you should send a Cold Data Reset command to all the
COBs• To update the cached Shelf IP Record in the IPMC
14
Configuring a COB to run a DHCP server
• To enable the DHCP server
on one COB set the DIP
switch #4 to OFF
- DIP switch is located near
the Zone 1 connector
• The Amber LED should light
immediately
• The DHCP server will be
started after the DTM is
rebooted
15
Inserting/Extracting a COB
• The IPMC coordinates insertion and extraction with the user with the Blue LED and a switch
activated by the handle- Insertion
• As soon as the board is inserted (and management power is present) the Blue LED lights
• When the handle switch closes the Blue LED BLINKS- Means the IPMC has contacted the Shelf Manager and is awaiting permission to activate
• When the board activates, the Blue LED turns OFF
- Extraction• The user indicates a desire to remove the board by opening the handles (but not applying pressure)
• The IPMC requests permission from the Shelf Manager to deactivate- Causes the LED to BLINK
• When permission is granted, the IPMC removes payload power from the board- When complete, the Blue LED remains ON
• The Shelf Manager can also deactivate the board without user intervention, the Blue LED will
reflect that
• The board can also be activated and deactivated remotely via commands to the Shelf Manager- So long as the “insertion criterion” are also met (handle switch closed, etc.)
16
Connecting the COB to the Network
• The Ethernet switch on the COB can be connected to the “lab network”
through the faceplate Ethernet ports or through the faceplate Ethernet
ports of another board in the Shelf- COBs are connected through the ATCA Zone 2 backplane
• On COBs with the current switch technology, only a single 10G and/or
1G SFP are functional- Designated by labels on the faceplate
• The Shelf Manager ensures through the ATCA E-keying mechanism that
all active COBs are notified of the presence of all other active COBs- If one COB has a faceplate Ethernet connection, all other COBs in that shelf
will have access to it by default
17
COB Status
• The ATCA specified LEDs are Green, Red, and Amber- Green LED is lit when every present RCE is fully booted
- Red LED is lit when any present RCE is not fully booted
- Amber LED is lit when the COB is configured to host the DHCP server
• The two pushbuttons allow two different types of reset- Reboot will cause each RCE to reboot and reload its FPGA fabric
• The same as sending the command cob_rce_reset <shelf-manager>/<slot>
- Reset will reset the IPMC, clearing all state information• If RCEs are powered, they will be briefly turned off
• Moral equivalent of extracting and inserting board
• The same as sending the command cob_cold_data_reset <shelf-manager>/<slot>
- Both of these functions are also available via remote commands
18
COB LCD Display
• The LCD display cycles every two seconds- If you’re standing in front of the Shelf, you can get state information
• Displays information for every occupied bay and present RCE
• COB Summary
• Hardware Version
• IPMC Switch
• IPMC Version
• ATCA Display
• FRU State
• Shelf Address/Physical Slot
• Bay Display
• Bay Name
• ID
• Voltage Status
• Power Draw
• RCE Display
• RCE Location
• Board/Junction Temps
• RCE State
19
COB Faceplate USB
• There are two micro-USB ports on the COB faceplate
• Console- 2 USB devices, both 115200, 8N1, no flow control
- DTM/RCE0 serial console
- IPMC serial console
• JTAG- DTM JTAG
- Digalent device compatible with Xilinx tools
20
IPMC Console
================================================================================| BAY |Pre|Ena|Vok| Voltage | Current | Power | Alloc | ID |--------------------------------------------------------------------------------| DPM0 | Y | Y | Y | 11.952V | 1346mA | 16.087W | 35.0W | 4d0000007ea94970 || DPM1 | Y | Y | Y | 12.012V | 1350mA | 16.216W | 35.0W | bf0000007ef52e70 || DPM2 | Y | Y | Y | 11.988V | 1366mA | 16.375W | 35.0W | eb0000007ea67670 || DPM3 | Y | Y | Y | 11.910V | 1373mA | 16.352W | 35.0W | 500000007e98a570 || DTM | Y | Y | Y | 11.976V | 676mA | 8.095W | 15.0W | b20000007ed1ae70 || RTM | N | N | | 11.958V | 3mA | 0.035W | | || CEN | Y | Y | Y | 12.006V | 1526mA | 18.321W | 24.0W | 2c0000007e959970 |================================================================================================================================================================| RCE |Ena|Rst|Rdy|Dne|Vok|BTemp|JTemp| RCE State |--------------------------------------------------------------------------------| DPM0: RCE0 | Y | N | Y | Y | Y | 33C | 47C | RUNNING || RCE2 | Y | N | Y | Y | Y | 36C | 48C | RUNNING || DPM1: RCE0 | Y | N | Y | Y | Y | 35C | 50C | RUNNING || RCE2 | Y | N | Y | Y | Y | 40C | 50C | RUNNING || DPM2: RCE0 | Y | N | Y | Y | Y | 33C | 51C | RUNNING || RCE2 | Y | N | Y | Y | Y | 38C | 47C | RUNNING || DPM3: RCE0 | Y | N | Y | Y | Y | 32C | 49C | RUNNING || RCE2 | Y | N | Y | Y | Y | 36C | 45C | RUNNING || DTM: RCE0 | Y | N | Y | Y | Y | 37C | 46C | RUNNING |================================================================================
• The Console intended to be
used for COB development- We don’t expect end users
to connect to it
• It displays things like state
changes and responses to
Shelf Manager commands
• Can also be configured to
periodically dump blocks of
summary information
• There is a more convenient
way to get this summary
information...
21
cob_dump Command
$ cob_dump collins-sm –bay================================================================================| BAY |Pre|Ena|Vok |Voltage | Current | Power | Alloc | ID |================================================================================| collins-sm/1 |--------------------------------------------------------------------------------| DPM0 | Y | Y | Y | 12.084V | 893m | 10.791W | 35.0W | 120000007eb55a70 || DPM1 | Y | Y | Y | 11.952V | 880m | 10.517W | 35.0W | 250000007ed77470 || DPM2 | Y | Y | Y | 11.970V | 880m | 10.533W | 35.0W | 130000007eda4570 || DPM3 | Y | Y | Y | 11.994V | 876m | 10.506W | 35.0W | 8e0000007ece7b70 || DTM | Y | Y | Y | 12.036V | 673m | 8.100W | 15.0W | 7f0000007ea77570 || RTM | Y | Y | | 11.994V | 113m | 1.355W | 16.0W | 7a00000019665870 || CEN | Y | Y | Y | 12.054V | 2410m | 29.050W | 24.0W | 090000007e9ddb70 |--------------------------------------------------------------------------------| collins-sm/2 |--------------------------------------------------------------------------------| DPM0 | Y | Y | Y | 11.958V | 883m | 10.558W | 35.0W | 8b0000007e979670 || DPM1 | Y | Y | Y | 11.904V | 906m | 10.785W | 35.0W | 500000007e98a570 || DPM2 | Y | Y | Y | 11.886V | 890m | 10.578W | 35.0W | 160000007eb9d070 || DPM3 | Y | Y | Y | 11.898V | 870m | 10.351W | 35.0W | 4c0000007ed2bb70 || DTM | Y | Y | Y | 11.934V | 676m | 8.067W | 15.0W | 680000007ef2bf70 || RTM | Y | Y | | 11.856V | 110m | 1.304W | 16.0W | 7e00000047654c70 || CEN | Y | Y | Y | 11.868V | 2310m | 27.415W | 24.0W | 620000007e978070 |--------------------------------------------------------------------------------================================================================================
$ cob_dump collins-sm/1/2 –rce================================================================================| RCE |Ena|Rst|Rdy|Dne|Vok|BTemp|JTemp| RCE State |================================================================================| collins-sm/1 |--------------------------------------------------------------------------------| DPM2: RCE0 | Y | N | Y | Y | Y | 43 | 57 | 0x00 || RCE2 | Y | N | Y | Y | Y | 49 | 54 | 0x00 |--------------------------------------------------------------------------------================================================================================
• The status information visible
through the IPMC Console is
available with the cob_dump and
cob_dump_bsi commands
• These commands work by
communicating to the IPMC
controllers on the COBs targeted- Communication is through IPMI
bridged through the Shelf
Manager
• You can retrieve data on multiple
COBs/RCEs/Shelves with one
command
• Uses the same arguments as the
other cob_ipmc commands in the
SDK- Which has been covered in the
SDK tools talk
22
Rear Transition Module
• There can be many different types of RTMs but they will all have some
common features
• They mirror the Green, Red, and Amber LEDs on the COB- Allows one to asses the state of the COB from the front or back of the shelf
• The RTM Blue LED is used to manage the hot swap of the RTM
specifically- The RTM can be extracted in the same manner as the COB
• Release the handle switch
• Wait for the Blue LED to be solid ON
• Extract RTM
- If the COB is deactivated, the RTM will also be deactivated
- If the RTM is deactivated, the COB need not be deactivated
• All other connection/indicators on the RTM are application specific
• There are a lot of very small pins on the RTM connector, before
inserting the RTM, always make sure to check for bent pins
23
Summary
• Back-end networking is an integral part of a useful system
• There is no one right answer
• The development environment will very likely have different
needs and constraints than the production environment
• Some decisions you’ll have to make- How many connections to the back-end from each Shelf?
- What speed are they?
- Where will the DHCP server run?• Regardless, there needs to be an RPT defined Shelf IP Info record
added to the Shelf FRU Information
• The RCE on the DTM will always derive its IP address from this range,
and not use DHCP