
White Paper

Broadcom Confidential January 29, 2018

1 Introduction

Modern mass-scale and hyper-scale data centers consist of complex networks of switches and servers. The need to reduce Opex is driving operators to find ways to improve network efficiency. Operators closely adhere to the ISO management model of fault, configuration, accounting, performance, and security management through automation enabled by network management software. Network controllers have to manage many functions, so simplifying the underlying software improves controller efficiency and reliability. To this end, Broadcom is offering table-based programming with a simple software interface.

This white paper explains how the logical table APIs in the SDKLT (Logical Table-Based Switch Development Kit) benefit the performance, scalability, and maintainability of networking software in a data center.

1.1 Challenges for Networking Software in Data Centers

The network control management layer manages normal configuration, day-to-day administration, and unexpected behavior. Network efficiency, fault management, and maintainability are important factors to consider when creating a data center software management system.

Network controllers in data centers, such as SDN controllers, have complex tasks and handle heavy workloads. See Figure 1, Typical Data Center Network View. Controller performance is critical because controllers regularly deal with network exceptions, orchestration, automation, equipment upgrades, and visibility. Based on the information updates the system gets from the network of switches, the network management system configures and controls the switching and routing nodes in its domain. As traffic patterns change, the network management controller runs algorithms to best manage the traffic and update the network.

Two critical functions for controllers are gathering the resource profiles of the participating switch nodes and deriving the information stored in the logical table view of the network. This information is algorithmically processed by the controller clusters if needed, and the nodes are updated at specified times. The ability to get the data reliably and quickly is critical.

Controllers constantly deal with unexpected behaviors, changing scenarios, policy changes, outages, and traffic overload. The network load is not constant: it varies with application usage and depends on application controllers, so load and traffic profiles change. Fault management relies on the ability to quickly move traffic away from defective nodes with the help of routing updates to the switches. Traffic re-routing in large data centers is occasionally required to mitigate outages caused by defective equipment. In such cases the response must be immediate.

Data Center Logical Table APIs Benefits of Logical Table APIs in the Modern Data Center


Broadcom Confidential LT-API-MDC--WP100


Figure 1: Typical Data Center Network View

1.2 How These Challenges are Met

Reliable software performance is measured by how quickly the updates can be passed onto the switch nodes from the time an event happens. This requires:

A minimum number of software layers in the system stack between the controller and the switch.

The ability to enable applications to quickly monitor, directly access, configure, and leverage the switch resources.

Reducing the networking management software layer and simplifying the APIs offer many benefits, including easy and reliable deployment, excellent performance, lower maintenance, and longer uptime. See Figure 2, A Typical Data Center Software Stack, and Figure 3, Network Management Stack using Broadcom's SDKLT.

Figure 2: A Typical Data Center Software Stack


The software development challenges in a data center mentioned above need to be addressed at several levels:

Collaboration amongst third-party vendors helps to create an overall solution that is robust and helps accelerate deployment.

Offering simple interfaces (APIs) helps with the development work and inter-op testing.

Interworking and compatibility issues can be solved by having an open development community for software in the data center.

Collaborators can work with each other's code and mitigate compatibility issues.

Figure 3: Network Management Stack using Broadcom's SDKLT

1.3 How Switch Software Plays in the Stack

The network management controller shown in Figure 2, A Typical Data Center Software Stack, interfaces with the local controller and network management middleware that communicates with the NOS control plane stack. Figure 3, Network Management Stack using Broadcom's SDKLT, illustrates a switch software interface that performs effectively. This means that it:

is simple to use.

supports fast updates.

has the ability to verify the updates.

is extensible.

is opaque to switch pipeline changes.

helps in fault isolation.

Table-based programming is the core of a data-driven programming model that brings simplicity to switch software APIs. Table-based APIs and logical tables help achieve the direct, fast access to resources that is needed to apply updates and verify them. Table-based programming is explained in the next section.

Even though functional APIs can achieve some of these capabilities, they are operationally heavy, hard to debug, and cannot offer the granularity and control needed by the network management system.


2 SDKLT Meets the Challenges

Figure 4: SDKLT Block Diagram

Broadcom’s new SDKLT responds to the needs of the next generation data centers. The SDKLT provides a simple, table-based method in which APIs and logical tables are used to program the switches, while offering direct access to chip resources. See Figure 5, High-Level Overview of Logical Table Interfaces. The SDKLT is a high-quality, maintainable switch development kit that can scale as new devices are added to the system. The logical table APIs can be batched and sequenced asynchronously or run in synchronous mode. See Figure 4, SDKLT Block Diagram. This new chip programming method lets developers batch bulk reads and writes efficiently into an atomic update, which improves programming performance. The logical tables represent physical tables and registers that are in the chip. The resource manager, logical table manager, and physical table manager work together to provide a simple but efficient switch development kit.

Through a data-driven approach, the logical table enables a robust Warmboot and ISSU scheme that employs a high-availability (HA) database. Because all operations are based on table programming, the HA database stores information on a play-by-play basis. The system manager can therefore replay the programming of the hardware after a restart or after moving to a new SDK.


Figure 5: High-Level Overview of Logical Table Interfaces

2.1 SDKLT Benefits, Logical Tables, and LT APIs in the Data Center

The SDKLT, which is based on logical table programming, has been developed to respond to the data center software challenges. Figure 6, Table-based Programming with Logical Tables, illustrates what a logical table looks like. It has fields, individual rows of entries, and multiple rows of entries. Programming each entry or field sequentially populates the table.

Figure 6: Table-based Programming with Logical Tables

The APIs used to program the tables are few and easy to use, and updates are quick and reliable. There are three classes of LT-APIs:

Entry LT-APIs: commit, add, update, lookup or delete the row and the fields.

Table LT-APIs: obtain information and traverse the table.

Transaction LT-APIs: set operation modes.

In the data center, performance and resource management are critical. When traffic profiles change, the layers of spine and leaf networks must be configured and monitored from a central network controller within a short window. The logical table (LT) architecture provides the ability to quickly push new profiles down to switches and verify them.


Here are some of the benefits of using SDKLT for Broadcom switches that match well with data center networking software:

Data-driven architecture offering table-driven programming.

Higher performance in many areas including Packet I/O and API performance.

Usage flexibility in choosing mode of API transactions.

End-user experience with simple and easy-to-use APIs.

Monitoring the resources and verification of the updates.

Open source, so it can be freely used by third-party collaborators in software development.

The SDKLT provides Logical Table APIs that program the logical tables simply and deliver much better performance than comparable traditional functional APIs.

2.2 SDKLT Features

This section details how the logical table APIs work with the data center software architecture. These features improve operational efficiency and reduce Opex.

Simple and Consistent Set of Logical Table APIs

Data center network configuration is complex. The simpler the interface, the better it is from a programming and task management perspective. The developers need to work with simple APIs that can be used repeatedly or in a structured manner.

SDKLT offers a simple set of APIs, which helps in the fast development of robust and bug-free code. The small set of APIs is based on five primitives: insert, lookup, update, delete, and traverse. These APIs work on logical tables, whose signatures are specified by the logical table definition. Logical table APIs can operate on an entry, a field, a table, or a transaction. Transactions can be set as batch for multiple entries or optionally as atomic. The logical table definition maps the physical table entries, the fields, and the tables themselves.
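The five primitives can be illustrated with a toy in-memory model. This is only a sketch of the table-based idea: the class `LogicalTable`, the table name `VLAN_DEMO`, and the field names are hypothetical and do not reproduce the SDKLT API or any real logical table definition.

```python
# Toy in-memory model of a logical table. All names here (LogicalTable,
# VLAN_DEMO, vlan_id, stg_id, mtu) are hypothetical, not the SDKLT API.

class LogicalTable:
    def __init__(self, name, key_fields, data_fields):
        self.name = name
        self.key_fields = tuple(key_fields)
        self.data_fields = tuple(data_fields)
        self._rows = {}  # key tuple -> dict of data fields

    def _key(self, fields):
        # Build the row key from the key fields supplied by the caller.
        return tuple(fields[k] for k in self.key_fields)

    def insert(self, **fields):
        key = self._key(fields)
        if key in self._rows:
            raise KeyError(f"entry {key} already exists in {self.name}")
        # Data fields that are not supplied default to 0.
        self._rows[key] = {f: fields.get(f, 0) for f in self.data_fields}

    def lookup(self, **key_fields):
        return dict(self._rows[self._key(key_fields)])

    def update(self, **fields):
        row = self._rows[self._key(fields)]  # KeyError if the entry is absent
        for f in self.data_fields:
            if f in fields:
                row[f] = fields[f]

    def delete(self, **key_fields):
        del self._rows[self._key(key_fields)]

    def traverse(self):
        # Walk all entries in key order, yielding (key fields, data fields).
        for key in sorted(self._rows):
            yield dict(zip(self.key_fields, key)), dict(self._rows[key])


vlan = LogicalTable("VLAN_DEMO", key_fields=["vlan_id"],
                    data_fields=["stg_id", "mtu"])
vlan.insert(vlan_id=10, stg_id=1, mtu=1500)
vlan.insert(vlan_id=20, stg_id=1)
vlan.update(vlan_id=20, mtu=9000)
vlan.delete(vlan_id=10)
remaining = list(vlan.traverse())
```

The same five calls apply unchanged to any table built this way, which is the consistency property the text attributes to logical table APIs.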

The logical tables implementing the network functions are designed for programming efficiency and are optimized for data center features. Developers can create features and program their applications to populate the logical tables through API usage. The common terminology of the logical tables between the hardware and software contributes to ease of implementation. See Figure 6, Table-based Programming with Logical Tables. Logical table APIs are a consistent set and can be applied in a similar manner across features and devices.

Logical tables are modeled in software and mask any physical table writes. There are also transactional capabilities in SDKLT that allow batching of the API invocations that access the logical tables. As a result, users will see useful responses in both table updates and lookups.

The simplicity of the APIs leads to other advantages including accurate auto-generated documentation that is well tested. These attributes make it easier to implement and deploy networking operating system software in a data center.

Resource Monitoring, Reservation, and Control

Data center network controllers need an up-to-date view of the device resources and the ability to control their usage. Resource monitoring is an important requirement for networking operating systems. The table-based programming method enables implementation of a resource manager that can interact with the users through call-backs. This resource manager can monitor resource usage and also allow users to limit and carve up the table resources to conserve physical memory space: a very important feature for a data center use case.


It is important to know the status of chip resources so that when new service requirements or parameters are pushed into the system, it is done with awareness of the existing resources. The system provides notification of the resources through callbacks, which is extremely beneficial in a data center operational environment. Resource managers also provide the capability to reserve resources.

As the logical table entries are pushed into the chip using any of the various operational modes, the physical table manager keeps track of the resource usage and flags if the resources are used beyond the limits set by the developer.

Resource monitoring through callbacks and resource reservation help data center managers tune their switch workload to optimize efficiency.
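The reservation and callback behavior described above can be sketched as follows. This is a minimal model under stated assumptions: `ResourceManager`, its method names, the callback signature, and the table name `L3_ROUTE_DEMO` are all invented for illustration, and the choice to refuse an over-limit allocation (rather than only flag it) is an assumption of the sketch.

```python
# Hypothetical sketch: ResourceManager, its methods, and the callback
# signature are invented for illustration and are not SDKLT interfaces.

class ResourceManager:
    def __init__(self):
        self._limits = {}     # table name -> maximum entries allowed
        self._used = {}       # table name -> entries currently in use
        self._callbacks = []  # each is called as cb(table, used, limit)

    def reserve(self, table, limit):
        """Set the number of entries a table may consume."""
        self._limits[table] = limit
        self._used.setdefault(table, 0)

    def on_limit_exceeded(self, callback):
        """Register a callback fired when an allocation exceeds the limit."""
        self._callbacks.append(callback)

    def allocate(self, table, count=1):
        """Account for new entries; notify and refuse if over the limit."""
        used = self._used[table] + count
        limit = self._limits[table]
        if used > limit:
            for cb in self._callbacks:
                cb(table, used, limit)
            return False
        self._used[table] = used
        return True

    def usage(self, table):
        return self._used[table], self._limits[table]


events = []
rm = ResourceManager()
rm.reserve("L3_ROUTE_DEMO", limit=2)
rm.on_limit_exceeded(lambda table, used, limit: events.append((table, used, limit)))
results = [rm.allocate("L3_ROUTE_DEMO") for _ in range(3)]
```

A controller could use such a callback to rebalance workloads before a table is exhausted instead of discovering the shortage from a failed update.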

High-Packet I/O Performance

In a data center, the latency between a control request and its response should be kept to a minimum. Packet I/O performance plays a critical role in shortening this latency. Most of the physical tables are modeled in software, so table operations to the physical memory are buffered, allowing faster writes that are posted into the buffer. The logical table manager can continue to insert more table entries while the physical table manager takes care of the slower writes to the actual device. See Figure 4, SDKLT Block Diagram. This is one of the factors that improves packet I/O operation.

Packet I/O can be run in two ways:

1. Core network packet DMA or I/O which is called CNET.

2. Kernel Network I/O drivers called KNET.

The user has the choice of using either method. KNET provides faster packet I/O since it directly involves the kernel packet I/O services.

High Performance

Data center network management handles many tasks. For operational efficiency, data center network management software should not be blocked, waiting for the results of one operation before moving on to the next. SDKLT capabilities enable such operational efficiency.

SDKLT provides features that can manage table entry operations in different ways. Table-entry operations can be combined into a single transaction, which can then be sequenced in several modes that help lower latency and CPU wait times. See Figure 7, Asynchronous Operations in SDKLT. These operation modes are listed below:

Asynchronous and synchronous operations

Simple batch or atomic operations

Asynchronous operation allows developers or users to submit an API operation and request a callback when the operation is complete. This is a very useful feature in a data center environment since it allows the network controller to move on to other tasks without waiting for completion of the operation.
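The submit-then-callback pattern can be sketched with Python's standard `concurrent.futures` module. The functions `program_entry` and `commit_async` are hypothetical stand-ins for a slow hardware write and an asynchronous commit; they are not SDKLT functions, and a thread pool is only one way to model the behavior.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def program_entry(table, entry):
    """Hypothetical stand-in for a slow write to the device."""
    time.sleep(0.01)
    table[entry["key"]] = entry["value"]
    return entry["key"]


def commit_async(executor, table, entry, on_done):
    """Submit the operation and return immediately.

    on_done is called with the entry key once the write has completed.
    """
    future = executor.submit(program_entry, table, entry)
    future.add_done_callback(lambda f: on_done(f.result()))
    return future


table = {}
completed = []
with ThreadPoolExecutor(max_workers=1) as executor:
    fut = commit_async(executor, table, {"key": 10, "value": "port-1"},
                       completed.append)
    other_work = sum(range(1000))  # the caller continues without waiting
# Leaving the with-block waits for any outstanding operations to finish.
```

The caller never blocks on the write itself; it learns the outcome through the callback, which matches the non-blocking behavior the text describes for network controllers.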


Figure 7: Asynchronous Operations in SDKLT

A set of asynchronous operations can be sequenced in a simple or atomic operation. An atomic operation in asynchronous or synchronous mode allows the table updates to be sent at one time. During an atomic operation, if any entry in the sequence fails to complete due to lack of resources, the operation fails and the transaction is rolled back (i.e., the tables return to their previous state). Users do not have to re-initialize the device to restore the previous state.
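The difference between an atomic and a simple batch transaction can be sketched with a small capacity-limited table. The names `Table`, `commit_atomic`, and `commit_batch` are hypothetical, the snapshot-and-restore rollback is just one simple way to model the behavior, and the assumption that a simple batch continues past a failed entry is not stated in the text.

```python
# Hypothetical model of atomic versus simple batch commits; not SDKLT code.

class CapacityError(Exception):
    pass


class Table:
    def __init__(self, capacity):
        self.capacity = capacity
        self.rows = {}

    def insert(self, key, value):
        # A new key needs a free slot; overwriting an existing key does not.
        if key not in self.rows and len(self.rows) >= self.capacity:
            raise CapacityError(f"no room for key {key}")
        self.rows[key] = value


def commit_atomic(table, entries):
    """Apply all entries or none: restore the snapshot on any failure."""
    snapshot = dict(table.rows)
    try:
        for key, value in entries:
            table.insert(key, value)
    except CapacityError:
        table.rows = snapshot  # roll back to the previous state
        return False
    return True


def commit_batch(table, entries):
    """Simple batch (assumed semantics): each entry succeeds or fails alone."""
    statuses = []
    for key, value in entries:
        try:
            table.insert(key, value)
            statuses.append(True)
        except CapacityError:
            statuses.append(False)
    return statuses


t = Table(capacity=2)
t.insert(1, "a")
ok = commit_atomic(t, [(2, "b"), (3, "c")])  # the second insert runs out of room
after_atomic = dict(t.rows)
statuses = commit_batch(t, [(2, "b"), (3, "c")])
```

After the failed atomic commit the table still holds only its original entry, so no re-initialization is needed, while the simple batch leaves the first entry applied and reports the second as failed.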

In a data center, table updates to switches happen frequently and are expected to complete quickly. Running the atomic or batch operations concurrently on multiple units is possible if carefully architected. This results in faster updates. Figure 8, Centralized Controller Connected to a Switch Network and Figure 9, Atomic and Non-Atomic Operation Across Network Switches offer examples of how controllers can push staggered updates to the switches in the network taking advantage of the batched transactions.

In a data center setting, the scheme of putting bulk writes into an atomic transaction enables an SDN controller to do a single update when it pushes a completely new logical table layout to the device.


Figure 8: Centralized Controller Connected to a Switch Network

Figure 9: Atomic and Non-Atomic Operation Across Network Switches

High Quality Test Coverage

The simplicity and consistency of the logical table APIs allow better controllability and observability while testing them. The logical table-based APIs make it much easier to automate the testing and run more test cases. SDKLT uses an automated framework for functional and performance validation of the LT-APIs, resulting in high-quality test coverage.


CLI Development and Debug-ability

One of the key features developers value is the debug capability in an SDK for bug tracing and error messages.

SDKLT provides extensive debug and online development capabilities with the following features:

Diag-shell access to all logical and physical tables. The new diag-shell has been designed to provide logical and physical table information. A list of active tables can be called out by wildcards, by index, and by key. Grep has been implemented in the shell to allow filtering of results.

The diag-shell, along with the C interpreter, enables developers to develop LT API entries in CLI mode and control a full switch. C-level interfaces can be run inside the shell to create scripts to aid development.

Complete and understandable error messages.

Fine-grained control of debug levels and verbosity across modules. The debug levels allow users to control how much information they need to see.

Action replay. Users can enable the recording of an API sequence to a dump file. This dump file can be replayed and used for debugging.

Event logging and event history.

Common error handlers with relevant debug information (backtrace, etc.).

Accurate Auto-generated Documentation

The documentation of the logical table APIs is auto-generated through Doxygen. The output is provided in a web-based HTML format. The documentation is searchable by table name, field, and table description. Logical table documentation in Doxygen is relevant to the device for these reasons:

Lists logical tables per feature

Lists logical table and its fields

Shows logical-to-physical table/register mappings

Warmboot and ISSU

Data centers run non-stop and thus have to be highly reliable. A network system failure on a switch node can be disastrous, so it is important to recover quickly without any disruption in the traffic. Recovering from a failure or being able to do an in-service software upgrade is a highly desired feature. The table-based programming methodology is part of the data-driven programming model of the SDKLT. The SDKLT supports Warmboot and ISSU. One of the benefits of the programming model is that all operation results and table states are stored in an HA database. The goal of the high availability feature is to keep the hardware state and software state aligned. NVM is usually reserved for the HA database.

If a crash or a planned shutdown occurs after a transaction is committed and acknowledged, the transaction is recovered from the HA database.

Physical table cache, transactions, committed operations, index table allocated entries, and in-memory tables are all kept in the HA memory (database).
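The recovery idea can be sketched with an append-only journal that is replayed after a restart. This is only an illustration of replaying committed table operations: a JSON-lines file stands in for the HA database, and `JournaledTable` and its methods are hypothetical names rather than SDKLT components.

```python
import json
import os
import tempfile


class JournaledTable:
    """Hypothetical table whose committed operations are journaled to a file.

    The journal file is a stand-in for the HA database described in the text.
    """

    def __init__(self, journal_path):
        self.journal_path = journal_path
        self.rows = {}

    def commit(self, op, key, value=None):
        # Record the operation durably before applying it to the table state.
        with open(self.journal_path, "a") as f:
            f.write(json.dumps({"op": op, "key": key, "value": value}) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self._apply(op, key, value)

    def _apply(self, op, key, value):
        if op == "delete":
            self.rows.pop(key, None)
        else:  # "insert" and "update" both store the value
            self.rows[key] = value

    @classmethod
    def recover(cls, journal_path):
        """Rebuild the table state by replaying every journaled operation."""
        table = cls(journal_path)
        if os.path.exists(journal_path):
            with open(journal_path) as f:
                for line in f:
                    rec = json.loads(line)
                    table._apply(rec["op"], rec["key"], rec["value"])
        return table


path = os.path.join(tempfile.mkdtemp(), "ha_journal.log")
t = JournaledTable(path)
t.commit("insert", "10.0.0.0/24", "port-1")
t.commit("insert", "10.0.1.0/24", "port-2")
t.commit("delete", "10.0.0.0/24")
del t  # simulate a crash or planned shutdown
recovered = JournaledTable.recover(path)
```

Because every committed operation is a table operation, replaying the journal in order is enough to bring the software state back in line with what was programmed before the restart.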

3 Conclusion

The SDKLT architecture is a revolutionary next-generation SDK architecture that enables data center developers to quickly deploy their switch products. This architecture provides high quality, ease of programming, performance, and the reliable ISSU and Warmboot that are essential for data centers. The logical table APIs are a product of table-based programming, which makes software applications data-driven. This is a significant step in creating reliable products that can be quickly deployed in the data center.


Broadcom, the pulse logo, Connecting everything, Avago Technologies, Avago, and the A logo are among the trademarks of Broadcom and/or its affiliates in the United States, certain other countries and/or the EU.

Copyright © 2018 by Broadcom. All Rights Reserved.

The term “Broadcom” refers to Broadcom Limited and/or its subsidiaries. For more information, please visit www.broadcom.com.

Broadcom reserves the right to make changes without further notice to any products or data herein to improve reliability, function, or design. Information furnished by Broadcom is believed to be accurate and reliable. However, Broadcom does not assume any liability arising out of the application or use of this information, nor the application or use of any product or circuit described herein, neither does it convey any license under its patent rights nor the rights of others.