design overview template

47
1 *All brand and product names are trademarks or registered trademarks of their respective companies. Copyright© Intel Corporation 2002, All rights reserved. OpenSM OpenSM Design Overview Design Overview August 2002 August 2002

Upload: others

Post on 07-Dec-2021

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Design Overview Template

1*All brand and product names are trademarks or registered trademarks of their respective companies.Copyright© Intel Corporation 2002, All rights reserved.

OpenSMOpenSMDesign OverviewDesign Overview

August 2002August 2002

Page 2: Design Overview Template

2*All brand and product names are trademarks or registered trademarks of their respective companies.Copyright© Intel Corporation 2002, All rights reserved.

Infiniband ManagementInfiniband ManagementThe Infiniband architecture requires a “subnet The Infiniband architecture requires a “subnet manager” node to configure and maintain an manager” node to configure and maintain an Infiniband fabric.Infiniband fabric.The subnet manager actually consists of several The subnet manager actually consists of several modules. The term “subnet manager” is modules. The term “subnet manager” is commonly used to refer to the node that provides commonly used to refer to the node that provides both a both a Subnet Manager (SM)Subnet Manager (SM) and the and the Subnet Subnet Administrator (SA)Administrator (SA)..The SM configures subnet addresses and switch The SM configures subnet addresses and switch routing tables.routing tables.The SA serves as the subnet’s database, hosting The SA serves as the subnet’s database, hosting queries from other subnet nodes.queries from other subnet nodes.

Page 3: Design Overview Template

3*All brand and product names are trademarks or registered trademarks of their respective companies.Copyright© Intel Corporation 2002, All rights reserved.

OpenSM FeaturesOpenSM FeaturesOpen sourceOpen sourceObject oriented, extensible design implemented in C.Object oriented, extensible design implemented in C.Embedded documentation with Robodoc.Embedded documentation with Robodoc.Designed for portability to other platforms and Infiniband Designed for portability to other platforms and Infiniband interfaces. Current implementation runs in Linux userinterfaces. Current implementation runs in Linux user--space on top of the open source AL API.space on top of the open source AL API.Supports major SM features: multipathing (LMC > 0), Supports major SM features: multipathing (LMC > 0), partitions, multicast groups, SM failpartitions, multicast groups, SM fail--over.over.Guarantees optimal routing between any two endGuarantees optimal routing between any two end--points points regardless of subnet topology.regardless of subnet topology.Supports commonly used SA queries.Supports commonly used SA queries.Complete logging facilities to the MAD frame level.Complete logging facilities to the MAD frame level.

Page 4: Design Overview Template

4*All brand and product names are trademarks or registered trademarks of their respective companies.Copyright© Intel Corporation 2002, All rights reserved.

Unsupported FeaturesUnsupported FeaturesThe current developers do not plan to implement The current developers do not plan to implement the following features. These capabilities could the following features. These capabilities could be added by other interested developers.be added by other interested developers.

–– Switches with random forwarding tables.Switches with random forwarding tables.–– Subnet routers.Subnet routers.–– Complete set of SA queries.Complete set of SA queries.–– Virtual lanes.Virtual lanes.–– GUIGUI

Page 5: Design Overview Template

5*All brand and product names are trademarks or registered trademarks of their respective companies.Copyright© Intel Corporation 2002, All rights reserved.

The OpenSM Object ModelThe OpenSM Object Model

Page 6: Design Overview Template

6*All brand and product names are trademarks or registered trademarks of their respective companies.Copyright© Intel Corporation 2002, All rights reserved.

The OpenSM Object ModelThe OpenSM Object ModelOpenSM models most of its internal data OpenSM models most of its internal data structures on the physical components in an IB structures on the physical components in an IB fabric, such as nodes, switches, partitions, etc.fabric, such as nodes, switches, partitions, etc.OpenSM objects are linked together to model the OpenSM objects are linked together to model the physical relationships on the IB subnet. For physical relationships on the IB subnet. For example, a switch object contains port objects. example, a switch object contains port objects. Port objects point to other port objects.Port objects point to other port objects.Most OpenSM objects provide functions that hide Most OpenSM objects provide functions that hide the details of the object’s implementation.the details of the object’s implementation.

Page 7: Design Overview Template

7*All brand and product names are trademarks or registered trademarks of their respective companies.Copyright© Intel Corporation 2002, All rights reserved.

The Component LibraryThe Component LibraryThe Component Library is a portable, open source The Component Library is a portable, open source library of useful functions and “containers” library of useful functions and “containers” implemented in C.implemented in C.The Component Library provides lists, balanced The Component Library provides lists, balanced trees, variable sized arrays, etc.trees, variable sized arrays, etc.OpenSM makes heavy use of the component OpenSM makes heavy use of the component library. In the source, you will notice use of library. In the source, you will notice use of functions starting with: functions starting with: cl_. cl_. These are calls to These are calls to the Component Library.the Component Library.When in doubt, refer to the component library When in doubt, refer to the component library documentation!documentation!

Page 8: Design Overview Template

8*All brand and product names are trademarks or registered trademarks of their respective companies.Copyright© Intel Corporation 2002, All rights reserved.

OpenSM Objects: A Top Down ViewOpenSM Objects: A Top Down View

The “highest order” OpenSM object is the The “highest order” OpenSM object is the OpenSM object itself.OpenSM object itself.The OpenSM object spans exactly one IB subnet.The OpenSM object spans exactly one IB subnet.The OpenSM object is built from a variety of other The OpenSM object is built from a variety of other objects:objects:

–– One Subnet objectOne Subnet object–– One SM objectOne SM object–– One SA objectOne SA object–– A Log object & many other supporting objectsA Log object & many other supporting objects

The following slides will examine each of these The following slides will examine each of these components in detail.components in detail.

Page 9: Design Overview Template

9*All brand and product names are trademarks or registered trademarks of their respective companies.Copyright© Intel Corporation 2002, All rights reserved.

The Subnet ObjectThe Subnet ObjectName: osm_subn_tName: osm_subn_tDefined in: osm_subnet.hDefined in: osm_subnet.h

The subnet object models the physical The subnet object models the physical composition of the subnet.composition of the subnet.OpenSM builds the subnet object during OpenSM builds the subnet object during discovery and maintains it during subsequent discovery and maintains it during subsequent sweeps.sweeps.The subnet object is built from many other object The subnet object is built from many other object types, for example:types, for example:

–– Node Objects Node Objects –– represent physical nodesrepresent physical nodes–– Switch Objects Switch Objects –– represent physical switchesrepresent physical switches–– Router Objects Router Objects –– represent physical routersrepresent physical routers–– Partition Objects Partition Objects –– represent logical partitionsrepresent logical partitions–– Multicast Group Objects Multicast Group Objects –– represent multicast groups.represent multicast groups.

Page 10: Design Overview Template

10*All brand and product names are trademarks or registered trademarks of their respective companies.Copyright© Intel Corporation 2002, All rights reserved.

The Subnet ObjectThe Subnet Object

SSNNSS

SS

SS

NN

NN NNNN

NN

NN

NN

RR

RR

NN

Master

NodeN

S Switch

R Router

Port / Link

This view shows the internal object-to-object linkage that models the physical subnet.

Page 11: Design Overview Template

11*All brand and product names are trademarks or registered trademarks of their respective companies.Copyright© Intel Corporation 2002, All rights reserved.

Subnet Pointer TablesSubnet Pointer TablesThe subnet object contains tables of pointers of each type The subnet object contains tables of pointers of each type of object in the subnet.of object in the subnet.The pointer tables provide quick access to objects of a The pointer tables provide quick access to objects of a particular type, be it a switch, node port, partition, etc.particular type, be it a switch, node port, partition, etc.Where appropriate, two pointer tables are provided for each Where appropriate, two pointer tables are provided for each type of object:type of object:

–– LidTable LidTable –– pointers sorted by unicast LIDpointers sorted by unicast LID–– GuidTable GuidTable –– pointers sorted by GUIDpointers sorted by GUID

The pointer tables sorted by GUID are implemented using The pointer tables sorted by GUID are implemented using cl_map (redcl_map (red--black trees).black trees).The pointer tables sorted by LID are implemented using The pointer tables sorted by LID are implemented using cl_ptr_vector (variable size array of pointers).cl_ptr_vector (variable size array of pointers).

Page 12: Design Overview Template

12*All brand and product names are trademarks or registered trademarks of their respective companies.Copyright© Intel Corporation 2002, All rights reserved.

Pointer Table Example Pointer Table Example -- NodesNodes

SSNNSS

SS

SS

NN

NN NNNN

NN

NN

NN

RR

RR

NN

Master

…………332211

PtrPtrLidLid

…………0x7890x7890x4560x4560x1230x123

PtrPtrGUIDGUID

Page 13: Design Overview Template

13*All brand and product names are trademarks or registered trademarks of their respective companies.Copyright© Intel Corporation 2002, All rights reserved.

SSNNSS

SS

SS

NN

NN NNNN

NN

NN

NN

RR

RR

NN

Master

…………332211

PtrPtrLidLid

…………0x7890x7890x4560x4560x1230x123

PtrPtrGUIDGUID

LMC > 0 is not a special case!

How does LMC > 0 look?How does LMC > 0 look?

Page 14: Design Overview Template

14*All brand and product names are trademarks or registered trademarks of their respective companies.Copyright© Intel Corporation 2002, All rights reserved.

Node ObjectNode ObjectName: osm_node_tName: osm_node_tDefined in: osm_node.hDefined in: osm_node.h

A node object represents a node on the subnet.A node object represents a node on the subnet.A node may be a Host Channel Adapter (HCA), a A node may be a Host Channel Adapter (HCA), a Target Channel Adapter (TCA), a Switch or a Target Channel Adapter (TCA), a Switch or a Router.Router.A node object is built from other object types:A node object is built from other object types:

–– An array of Physical Port objectsAn array of Physical Port objects–– NodeInfo attributeNodeInfo attribute–– NodeDescription attributeNodeDescription attribute

Nodes “own” their corresponding Physical Port Nodes “own” their corresponding Physical Port objects. All other references to a Physical Port objects. All other references to a Physical Port point to a Physical Port inside a Node.point to a Physical Port inside a Node.

Page 15: Design Overview Template

15*All brand and product names are trademarks or registered trademarks of their respective companies.Copyright© Intel Corporation 2002, All rights reserved.

Node ObjectsNode Objects

Physical Port NPhysical Port N……

Physical Port 1Physical Port 1Physical Port 0Physical Port 0

NodeInfo NodeInfo AttributeAttribute

NodeDescription NodeDescription AttributeAttribute

Physical Port NPhysical Port N……

Physical Port 1Physical Port 1Physical Port 0Physical Port 0

NodeInfo NodeInfo AttributeAttribute

NodeDescription NodeDescription AttributeAttribute

Node ObjectNode Object

Node ObjectNode Object

Node Objects are linked to their neighbors though their Physical Port Objects.

Page 16: Design Overview Template

16*All brand and product names are trademarks or registered trademarks of their respective companies.Copyright© Intel Corporation 2002, All rights reserved.

OpenSM Port Object ModelOpenSM Port Object ModelThe object model for ports is somewhat complicated.The object model for ports is somewhat complicated.In switches, all ports share a common port GUID value, and In switches, all ports share a common port GUID value, and are thus logically grouped together. In all other node types, are thus logically grouped together. In all other node types, ports have a unique GUID value.ports have a unique GUID value.The OpenSM object model handles this by creating two The OpenSM object model handles this by creating two types of port objects: Physical Ports and Logical Ports.types of port objects: Physical Ports and Logical Ports.Physical Port objects are “owned” by their parent Node Physical Port objects are “owned” by their parent Node object, be it a switch, HCA, etc. and are oneobject, be it a switch, HCA, etc. and are one--forfor--one with one with physical ports on the subnet.physical ports on the subnet.Logical Port objects contain 1 or more Physical Port objects Logical Port objects contain 1 or more Physical Port objects that all share the same GUID value. Logical Port objects are that all share the same GUID value. Logical Port objects are oneone--toto--many with physical ports on a switch, and onemany with physical ports on a switch, and one--toto--one for physical ports on any other type of node.one for physical ports on any other type of node.

Page 17: Design Overview Template

17*All brand and product names are trademarks or registered trademarks of their respective companies.Copyright© Intel Corporation 2002, All rights reserved.

Physical Port ObjectsPhysical Port ObjectsName: osm_physp_tName: osm_physp_tDefined in: osm_port.hDefined in: osm_port.h

Physical Port objects represent physical ports on Physical Port objects represent physical ports on the subnet.the subnet.A Physical Port object points to its neighbor A Physical Port object points to its neighbor Physical Port object on the other end of the wire.Physical Port object on the other end of the wire.A Physical Port object also contains a pointer to A Physical Port object also contains a pointer to its parent node object.its parent node object.These two pointers allow functions to traverse the These two pointers allow functions to traverse the subnet object in a manner that mirrors the subnet object in a manner that mirrors the topology of the physical subnet.topology of the physical subnet.

Page 18: Design Overview Template

18*All brand and product names are trademarks or registered trademarks of their respective companies.Copyright© Intel Corporation 2002, All rights reserved.

Logical Port ObjectsLogical Port ObjectsName: osm_port_tName: osm_port_tDefined in: osm_port.hDefined in: osm_port.h

A Port object represents one or more physical A Port object represents one or more physical ports that share the same GUID.ports that share the same GUID.Port objects are “owned” by the Subnet object.Port objects are “owned” by the Subnet object.The Port object contains:The Port object contains:

–– An array of pointers to the corresponding Physical Port An array of pointers to the corresponding Physical Port objects.objects.

–– The GUID value of this port.The GUID value of this port.–– A pointer to the Node object that owns the ports.A pointer to the Node object that owns the ports.

Page 19: Design Overview Template

19*All brand and product names are trademarks or registered trademarks of their respective companies.Copyright© Intel Corporation 2002, All rights reserved.

Port Object DetailsPort Object DetailsPhysical Port ObjectsPhysical Port Objects

num_portsnum_ports

p_nodep_node

guidguidport_guidport_guidport_numport_num

p_remotep_remotep_nodep_node

port_infoport_info

Port ObjectPort Object

Port Objects are logical collections of Physical Port objects that share the same port GUID. Switches have many Physical Ports per GUID, while other nodes have only 1 Physical Port per GUID.

Page 20: Design Overview Template

20*All brand and product names are trademarks or registered trademarks of their respective companies.Copyright© Intel Corporation 2002, All rights reserved.

Switch ObjectSwitch ObjectName: osm_switch_tName: osm_switch_tDefined in: osm_switch.hDefined in: osm_switch.h

A switch object represents a switch on the A switch object represents a switch on the subnet.subnet.A switch object is built from other object types:A switch object is built from other object types:

–– SwitchInfo attributeSwitchInfo attribute–– Pointer to its parent node objectPointer to its parent node object–– Switch forwarding table objectSwitch forwarding table object–– Lid Matrix objectLid Matrix object

Page 21: Design Overview Template

21*All brand and product names are trademarks or registered trademarks of their respective companies.Copyright© Intel Corporation 2002, All rights reserved.

LID Matrix ObjectLID Matrix ObjectName: osm_lid_matrix_tName: osm_lid_matrix_tDefined in: osm_matrix.hDefined in: osm_matrix.h

The LID Matrix object contains the number of The LID Matrix object contains the number of hops to any LID value through any port in the hops to any LID value through any port in the parent switch.parent switch.The LID Matrix is structured like a two The LID Matrix is structured like a two dimensional array, with LID values on one axis, dimensional array, with LID values on one axis, and port numbers on the other.and port numbers on the other.The LID Matrix expands as new LIDs are added to The LID Matrix expands as new LIDs are added to the subnet.the subnet.The routing algorithms populate the hop count The routing algorithms populate the hop count values for each switch’s matrix.values for each switch’s matrix.Constructing a switch’s LID Matrix is a precursor Constructing a switch’s LID Matrix is a precursor step to building its forwarding table.step to building its forwarding table.

Page 22: Design Overview Template

22*All brand and product names are trademarks or registered trademarks of their respective companies.Copyright© Intel Corporation 2002, All rights reserved.

Partition ObjectsPartition ObjectsPartition objects model the logical grouping of physical Partition objects model the logical grouping of physical ports in an IB partition.ports in an IB partition.Partition objects contain general information about that Partition objects contain general information about that partition and a pointer table to port objects in the partition.partition and a pointer table to port objects in the partition.

0x9220x9220x7890x7890x2030x203

PtrPtrPort Port GUIDGUID NN

NN

NN

NN

NN

SS

Partition NPartition N

Page 23: Design Overview Template

23*All brand and product names are trademarks or registered trademarks of their respective companies.Copyright© Intel Corporation 2002, All rights reserved.

Partition Objects (continued)Partition Objects (continued)Pointers to partition objects are stored in a the Pointers to partition objects are stored in a the partition pointer table indexed by P_KEY value.partition pointer table indexed by P_KEY value.

0x7890x7890x4560x4560x1230x123

PtrPtrP_KEYP_KEY

Partition 0x789Partition 0x789

Partition 0x123Partition 0x123

Partition 0x456Partition 0x456

Page 24: Design Overview Template

24*All brand and product names are trademarks or registered trademarks of their respective companies.Copyright© Intel Corporation 2002, All rights reserved.

Multicast GroupsMulticast GroupsA multicast group is identified by its LID in the range 0xC000A multicast group is identified by its LID in the range 0xC000--0xFFFE, which is assigned to all ports in that multicast group.0xFFFE, which is assigned to all ports in that multicast group.Multicast group objects look a lot like partition objects: They Multicast group objects look a lot like partition objects: They have have a pointer table to ports in the multicast group.a pointer table to ports in the multicast group.

0x7890x7890x4560x4560x1230x123

PtrPtrPort Port GUIDGUID

NN

NN

NN

NN

NN

SS

Multicast Group NMulticast Group N

Page 25: Design Overview Template

25*All brand and product names are trademarks or registered trademarks of their respective companies.Copyright© Intel Corporation 2002, All rights reserved.

Multicast Groups (continued)Multicast Groups (continued)Pointers to multicast group objects are stored in a Pointers to multicast group objects are stored in a the multicast group object pointer table indexed the multicast group object pointer table indexed by MLID.by MLID.

0xE1000xE1000xD0B50xD0B50xC0010xC001

PtrPtrMLIDMLIDMulticast Group 0xD0B5Multicast Group 0xD0B5

Multicast Group 0xE100Multicast Group 0xE100

Multicast Group 0xC001Multicast Group 0xC001

Page 26: Design Overview Template

26*All brand and product names are trademarks or registered trademarks of their respective companies.Copyright© Intel Corporation 2002, All rights reserved.

Path ObjectsPath ObjectsSM/SA must provide the information in the SM/SA must provide the information in the PathRecord attribute:PathRecord attribute:

–– Path MTUPath MTU–– RateRate–– Packet lifetimePacket lifetime–– Preference value relative to other paths between the two Preference value relative to other paths between the two

nodes in question.nodes in question.

There are too many paths in the fabric to There are too many paths in the fabric to precompute all the path records. In a fabric with precompute all the path records. In a fabric with N nodes, the number of paths is:N nodes, the number of paths is:

N(NN(N--1)2 1)2 LMCLMC

which grows large quickly!which grows large quickly!

Page 27: Design Overview Template

27*All brand and product names are trademarks or registered trademarks of their respective companies.Copyright© Intel Corporation 2002, All rights reserved.

Handling PathsHandling PathsPath information must be determined on the fly by Path information must be determined on the fly by the SM/SA.the SM/SA.The SM/SA traverses the subnet object, including The SM/SA traverses the subnet object, including the local copies of the switch forwarding tables, the local copies of the switch forwarding tables, to accumulate path parameters in a path object.to accumulate path parameters in a path object.The path object is discarded once SA has The path object is discarded once SA has delivered the information to the requester.delivered the information to the requester.

Page 28: Design Overview Template

28*All brand and product names are trademarks or registered trademarks of their respective companies.Copyright© Intel Corporation 2002, All rights reserved.

Multicast Spanning TreesMulticast Spanning TreesConsider a Consider a multicast group multicast group that consists of that consists of nodes A, B, C.nodes A, B, C.The multicast The multicast spanning tree spanning tree for this group is for this group is shown in red.shown in red.The spanning The spanning tree covers 3 tree covers 3 nodes, 3 nodes, 3 switches and 10 switches and 10 Physical ports.Physical ports.

SSNNSS

SS

SS

NN

NN NNNN

NN

NN

NN

RR

RR

NN

MasterBB

AACC

Page 29: Design Overview Template

29*All brand and product names are trademarks or registered trademarks of their respective companies.Copyright© Intel Corporation 2002, All rights reserved.

Multicast LID RoutingMulticast LID RoutingMulticast LID routing through a subnet is Multicast LID routing through a subnet is configured as a single spanning tree, i.e. loops configured as a single spanning tree, i.e. loops are not allowed.are not allowed.Multicast forwarding tables in switches have Multicast forwarding tables in switches have limited size, so the spanning tree for each limited size, so the spanning tree for each multicast group should include only the minimum multicast group should include only the minimum number of switches.number of switches.Some multicast members can be “send only”Some multicast members can be “send only”Non members cannot send to the group.Non members cannot send to the group.

Page 30: Design Overview Template

30*All brand and product names are trademarks or registered trademarks of their respective companies.Copyright© Intel Corporation 2002, All rights reserved.

Multicast Spanning TreesMulticast Spanning TreesA single subnet contains many spanning trees, A single subnet contains many spanning trees, one per multicast group.one per multicast group.The subnet object models the physical The subnet object models the physical interconnections between subnet components, interconnections between subnet components, whereas spanning trees are logical structures.whereas spanning trees are logical structures.Normal unicast routes can be considered trivial Normal unicast routes can be considered trivial spanning trees.spanning trees.Do spanning trees need first class treatment? Do spanning trees need first class treatment? Can they just be implicit somehow?Can they just be implicit somehow?

Page 31: Design Overview Template

31*All brand and product names are trademarks or registered trademarks of their respective companies.Copyright© Intel Corporation 2002, All rights reserved.

Multicast Group CreationMulticast Group Creation

Start with the set of ports belonging to the group.Start with the set of ports belonging to the group.Build the optimal spanning tree by traversing the Build the optimal spanning tree by traversing the subnet object.subnet object.Convert the spanning tree into forwarding table Convert the spanning tree into forwarding table entries for the affected switches.entries for the affected switches.Configure the switches.Configure the switches.The same process works for both unicast and The same process works for both unicast and multicast routing.multicast routing.

Page 32: Design Overview Template

32*All brand and product names are trademarks or registered trademarks of their respective companies.Copyright© Intel Corporation 2002, All rights reserved.

OpenSM Methods & AlgorithmsOpenSM Methods & Algorithms

Page 33: Design Overview Template

33*All brand and product names are trademarks or registered trademarks of their respective companies.Copyright© Intel Corporation 2002, All rights reserved.

OpenSM Object MethodsOpenSM Object MethodsSince we’re using ‘C’, we don’t have the concept Since we’re using ‘C’, we don’t have the concept of a of a classclass with member functions.with member functions.The common workThe common work--around is to have pointers to around is to have pointers to functions in a ‘C’ functions in a ‘C’ struct. struct. Unfortunately, these Unfortunately, these functions are never made inline by the compiler.functions are never made inline by the compiler.The OpenSM approach will be to use a ‘C’ The OpenSM approach will be to use a ‘C’ structstructwith separately defined functions (and hence able with separately defined functions (and hence able to inline) that operate on that structure, as done in to inline) that operate on that structure, as done in the component library.the component library.This method offers higher performance at the This method offers higher performance at the expense of wordy function names.expense of wordy function names.

Page 34: Design Overview Template

34*All brand and product names are trademarks or registered trademarks of their respective companies.Copyright© Intel Corporation 2002, All rights reserved.

OpenSM Method StyleOpenSM Method StyleFollows the Component Library style.Follows the Component Library style.OpenSM methods use the following form:OpenSM methods use the following form:

osm_<object type>_rest_of_nameosm_<object type>_rest_of_name

For example:For example:ib_net16_tib_net16_t

osm_port_get_base_lid(osm_port_get_base_lid(IN const osm_port_t* const p_port );IN const osm_port_t* const p_port );

ib_net16_tib_net16_tosm_switch_get_base_lid(osm_switch_get_base_lid(

IN const osm_switch_t* const p_sw );IN const osm_switch_t* const p_sw );

Page 35: Design Overview Template

35*All brand and product names are trademarks or registered trademarks of their respective companies.Copyright© Intel Corporation 2002, All rights reserved.

Paths & Switch ForwardingPaths & Switch Forwarding

SSNNSS

SS

SS

NN

NN NNNN

NN

NN

NN

RR

RR

NN

MasterBB

AA

Consider the routes Consider the routes through the subnet through the subnet from A to B.from A to B.There are two There are two possible paths:possible paths:A A –– 1 1 –– 4 4 –– BBandandA A –– 1 1 –– 2 2 –– 3 3 –– 4 4 –– BBObviously, we’d Obviously, we’d like to take like to take advantage of the advantage of the shorter path!shorter path!

1

2

3

4

Page 36: Design Overview Template

36*All brand and product names are trademarks or registered trademarks of their respective companies.Copyright© Intel Corporation 2002, All rights reserved.

IB LID Routing BasicsIB LID Routing BasicsLeaf nodes are trivial.Leaf nodes are trivial.Switches with only 1 interSwitches with only 1 inter--switch link are also trivial to switch link are also trivial to handle.handle.Switches with multiple interSwitches with multiple inter--switch links to the same switch switch links to the same switch can have statically load balanced paths.can have statically load balanced paths.All the interesting action pertains to switches with interAll the interesting action pertains to switches with inter--switch links to more than one other switch.switch links to more than one other switch.Switches with a random forwarding table have a “default Switches with a random forwarding table have a “default route” like in Fibre Channel switching.route” like in Fibre Channel switching.Switches with a linear forwarding table do not have a default Switches with a linear forwarding table do not have a default route. Thus, a route for route. Thus, a route for everyevery LID in the subnet must be LID in the subnet must be programmed into the switch’s routing table.programmed into the switch’s routing table.

Page 37: Design Overview Template

37*All brand and product names are trademarks or registered trademarks of their respective companies.Copyright© Intel Corporation 2002, All rights reserved.

OpenSM Subnet Path StrategyOpenSM Subnet Path StrategyOpenSM hopOpenSM hop--optimizes subnet routing.optimizes subnet routing.

–– Routes to from any two nodes traverse the minimum Routes to from any two nodes traverse the minimum number of switches. number of switches.

–– Given a choice of equal hopGiven a choice of equal hop--count paths, OpenSM count paths, OpenSM performs static load balancing using a simple weighted performs static load balancing using a simple weighted roundround--robin scheme. Different weights are selected for robin scheme. Different weights are selected for 1x, 4x and 12x links.1x, 4x and 12x links.

LMC > 0 paths are also distributed using a LMC > 0 paths are also distributed using a weighted roundweighted round--robin scheme.robin scheme.

Page 38: Design Overview Template

38*All brand and product names are trademarks or registered trademarks of their respective companies.Copyright© Intel Corporation 2002, All rights reserved.

Handling Path RecordsHandling Path RecordsPath information must be determined on the fly by Path information must be determined on the fly by the SM/SA.the SM/SA.The SM/SA traverses the subnet object, including The SM/SA traverses the subnet object, including the local copies of the switch forwarding tables, the local copies of the switch forwarding tables, to accumulate path parameters.to accumulate path parameters.

Page 39: Design Overview Template

39*All brand and product names are trademarks or registered trademarks of their respective companies.Copyright© Intel Corporation 2002, All rights reserved.

OpenSM Subnet Configuration OverviewOpenSM Subnet Configuration Overview

The process is asynchronous. As response The process is asynchronous. As response MADs arrive, OpenSM processes them and may MADs arrive, OpenSM processes them and may initiate further probing.initiate further probing.OpenSM throttles DROpenSM throttles DR--MAD traffic so as to not MAD traffic so as to not overload the VL15 buffering capability of the overload the VL15 buffering capability of the switches.switches.The various parts of the discovery algorithm are The various parts of the discovery algorithm are divided into relatively simple Dispatcher divided into relatively simple Dispatcher “controllers”.“controllers”.Initial discovery and subsequent sweeps share Initial discovery and subsequent sweeps share common code where possible.common code where possible.

Page 40: Design Overview Template

40*All brand and product names are trademarks or registered trademarks of their respective companies.Copyright© Intel Corporation 2002, All rights reserved.

OpenSM Subnet ConfigurationOpenSM Subnet ConfigurationOpenSM performs a multiOpenSM performs a multi--stage subnet stage subnet configuration:configuration:1.1. Discovery & subnet inventoryDiscovery & subnet inventory2.2. LID assignmentLID assignment3.3. Configure each switch’s LID Matrix for local Configure each switch’s LID Matrix for local

leaf nodes.leaf nodes.4.4. Configure each switch’s LID Matrix for Configure each switch’s LID Matrix for

neighbor’s leaf nodes.neighbor’s leaf nodes.5.5. Configure each switch’s LID Matrix for Configure each switch’s LID Matrix for

successively more remote nodes.successively more remote nodes.6.6. Build and download switch forwarding tables.Build and download switch forwarding tables.7.7. Bring ports to ARMED state as necessary.Bring ports to ARMED state as necessary.8.8. Bring ports to ACTIVE state as necessary.Bring ports to ACTIVE state as necessary.

Page 41: Design Overview Template

41*All brand and product names are trademarks or registered trademarks of their respective companies.Copyright© Intel Corporation 2002, All rights reserved.

Subnet Discovery & InventorySubnet Discovery & InventoryObjects representing subnet components are Objects representing subnet components are created on the fly as OpenSM discovers each created on the fly as OpenSM discovers each component.component.The OpenSM architecture is based on a central The OpenSM architecture is based on a central asynchronous Dispatcher model.asynchronous Dispatcher model.‘Controllers’ around the Dispatcher perform the ‘Controllers’ around the Dispatcher perform the actual processing. The processing performed by actual processing. The processing performed by any one Controller is narrow in scope.any one Controller is narrow in scope.The Dispatcher uses its worker threads to call the The Dispatcher uses its worker threads to call the processing function of each Controller.processing function of each Controller.

Page 42: Design Overview Template

42*All brand and product names are trademarks or registered trademarks of their respective companies.Copyright© Intel Corporation 2002, All rights reserved.

Subnet Discovery & InventorySubnet Discovery & Inventory

AssignmentController

AssignmentController

SSNN SS SSSS

NN

NN NN NN

NNNN

NNRR

RR

NN

AttrController

AssignmentController

AssignmentController

Attribute RcvController

Attribute RcvController

Controller

Subnet ObjectSubnet Object

VL15 Outbound

Queue

Controller

SerializedAccess

SerializedAccess

Attribute RcvController

Attribute RcvController

Attribute RcvAttribute RcvController

cl_dispatcher

ibute RequestAttribute RequestController

cl_dispatcher

Inbound SMPInbound SMPController

Page 43: Design Overview Template

43*All brand and product names are trademarks or registered trademarks of their respective companies.Copyright© Intel Corporation 2002, All rights reserved.

LID AssignmentLID AssignmentOpenSM attempts to preserve existing LID OpenSM attempts to preserve existing LID assignments when possible.assignments when possible.Base LID assignments on the subnet may follow Base LID assignments on the subnet may follow predictable patterns in practice, but OpenSM predictable patterns in practice, but OpenSM treats every base LID as an arbitrary value.treats every base LID as an arbitrary value.For example:For example:

–– Switches are not assigned dedicated LID ranges.Switches are not assigned dedicated LID ranges.–– The base LIDs of leaf nodes on a switch may be The base LIDs of leaf nodes on a switch may be

discontiguousdiscontiguous

OpenSM assigns consecutive base LIDs when OpenSM assigns consecutive base LIDs when possible.possible.

–– Helps minimizes the number of switch forwarding table Helps minimizes the number of switch forwarding table blocks needed to hold all possible routes.blocks needed to hold all possible routes.

Page 44: Design Overview Template

44*All brand and product names are trademarks or registered trademarks of their respective companies.Copyright© Intel Corporation 2002, All rights reserved.

Multicast Group CreationMulticast Group Creation

Start with the set of ports belonging to the group.Start with the set of ports belonging to the group.Build the optimal spanning tree by traversing the Build the optimal spanning tree by traversing the subnet object.subnet object.Convert the spanning tree into forwarding table Convert the spanning tree into forwarding table entries for the affected switches.entries for the affected switches.Configure the switches.Configure the switches.The same process works for both unicast and The same process works for both unicast and multicast routing.multicast routing.

Page 45: Design Overview Template

45*All brand and product names are trademarks or registered trademarks of their respective companies.Copyright© Intel Corporation 2002, All rights reserved.

The Discovery ProcessThe Discovery ProcessDiscovery starts by requesting the NodeInfo Discovery starts by requesting the NodeInfo attribute of the node at “hop 0” which is the SM’s attribute of the node at “hop 0” which is the SM’s own HCA.own HCA.The OpenSM object’s “sweep” method initiates The OpenSM object’s “sweep” method initiates this NodeInfo Request.this NodeInfo Request.Once started at hop 0, the discovery process is a Once started at hop 0, the discovery process is a selfself--sustaining “chain reaction” the proceeds sustaining “chain reaction” the proceeds until all nodes and ports have been discovered.until all nodes and ports have been discovered.The GUID Tables are populated during the The GUID Tables are populated during the discovery process.discovery process.

Page 46: Design Overview Template

46*All brand and product names are trademarks or registered trademarks of their respective companies.Copyright© Intel Corporation 2002, All rights reserved.

Discovery CompletionDiscovery CompletionThe discovery process is multiThe discovery process is multi--threaded, thus threaded, thus nodes may be discovered in different orders from nodes may be discovered in different orders from one sweep to the next.one sweep to the next.Discovery is complete when the OpenSM Discovery is complete when the OpenSM statistics block indicates there are no outstanding statistics block indicates there are no outstanding QP0 MADs. The SM Mad Controller checks for QP0 MADs. The SM Mad Controller checks for this condition each time it retires a QP0 MAD to this condition each time it retires a QP0 MAD to the MAD pool.the MAD pool.Once all QP0 MADs are off the subnet, the SM Once all QP0 MADs are off the subnet, the SM Mad Controller posts a message to the Mad Controller posts a message to the Dispatcher. This message causes the State Dispatcher. This message causes the State Manager to begin processing the discovered Manager to begin processing the discovered nodes.nodes.

Page 47: Design Overview Template

47*All brand and product names are trademarks or registered trademarks of their respective companies.Copyright© Intel Corporation 2002, All rights reserved.

LID ConfigurationLID ConfigurationThe LID Manager walks the container of The LID Manager walks the container of discovered Port objects and compares their discovered Port objects and compares their reported LID values with the Subnet Object’s LID reported LID values with the Subnet Object’s LID table.table.The possible cases are:The possible cases are:

–– The Port reports the same LID range that OpenSM The Port reports the same LID range that OpenSM believes it owns.believes it owns.

–– The Port reports a LID range that overlaps the LID range The Port reports a LID range that overlaps the LID range owned by another Port.owned by another Port.

–– The port reports a LID range that OpenSM thought was The port reports a LID range that OpenSM thought was unoccupied.unoccupied.

–– The port does not yet have a LID range.The port does not yet have a LID range.