
Chapter 8: Conventional Computer Hardware Architecture

Outline

The Traditional Software Router
Measures Of Speed
Fine-grain parallelism
Symmetric coarse-grain parallelism
Asymmetric coarse-grain parallelism
Special-purpose coprocessors
NICs with onboard processing
Smart NICs with onboard stacks
Cell switching
Data pipelines

The Traditional Software Router

The hardware architecture used with a software-based network system

The CPU handles all protocol processing tasks except for framing and onboard address recognition

[Figure: NIC1 and NIC2 each perform framing and address recognition; the standard CPU performs all other processing.]
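The division of labor in the figure can be sketched as a small model. This is a hypothetical illustration, not code from the slides: the names `NIC`, `SoftwareRouter`, `deliver`, and `process` are invented here, framing is assumed to have already happened (each `Frame` arrives whole), and `process` is a stand-in for all the protocol work the CPU must do. The point is that every frame the NIC accepts costs CPU time.

```python
from dataclasses import dataclass

BROADCAST = "ff:ff:ff:ff:ff:ff"

@dataclass
class Frame:
    dst_mac: str
    payload: bytes

@dataclass
class NIC:
    """Hypothetical NIC model: beyond framing, it performs only address recognition."""
    mac: str

    def recognize(self, frame: Frame) -> bool:
        # Accept frames addressed to this NIC or to the broadcast address.
        return frame.dst_mac in (self.mac, BROADCAST)

@dataclass
class SoftwareRouter:
    nics: list
    cpu_packets: int = 0  # frames that reached the CPU, each costing CPU time

    def deliver(self, nic_index: int, frame: Frame):
        """Called when a frame arrives on nics[nic_index]."""
        if not self.nics[nic_index].recognize(frame):
            return None  # discarded by the NIC; the CPU never sees it
        self.cpu_packets += 1
        return self.process(frame.payload)

    def process(self, payload: bytes) -> bytes:
        # Stand-in for "all other processing" done in software on the CPU.
        return payload
```

For example, a frame for another station is dropped on the NIC, while unicast and broadcast frames for this router both reach the CPU and increment `cpu_packets`.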

Two Measures Of Speed

Data rate (bits per second)
– Per-interface rate
– Aggregate rate

Packet rate (packets per second)
– Per-interface rate
– Aggregate rate

Processing Speed Matters For Two Reasons

Because a router must handle packets as fast as they arrive from a given network, the processing speed determines the maximum data rate of a network that can be attached to the router

Because a router must handle packets arriving from multiple networks, the processing speed limits the possible topologies in which the router can be used

Aggregate Data Rate

Total rate at which data can arrive at or leave a network system

The maximum aggregate data rate of a system is important because it limits the type and number of network connections the system can handle

Aggregate Packet Rate

For protocol processing tasks that have a fixed cost per packet, the number of packets processed is more important than the aggregate data rate

How many packets arrive per second over a network?
– Depends on the network’s throughput rate and the size of the packets

Digital Circuit Speeds

Technology   Data Rate   Packet Rate For        Packet Rate For
             (Gbps)      Small Packets (Kpps)   Large Packets (Kpps)
10Base-T      0.010         19.5                    0.8
100Base-T     0.100        195.3                    8.2
OC-3          0.156        303.8                   12.8
OC-12         0.622      1,214.8                   51.2
1000Base-T    1.000      1,953.1                   82.3
OC-48         2.488      4,860.0                  204.9
OC-192        9.953     19,440.0                  819.6
OC-768       39.813     77,760.0                3,278.4

Key concept: maximum packet rate occurs with minimum-size packets
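The packet rates in the table follow from dividing the data rate by the packet size in bits. The slides do not state the sizes; assuming 64-byte small packets and 1518-byte large packets, counted without preamble or inter-frame gap, reproduces the table's figures. The sketch below (function names are hypothetical) also computes an aggregate packet rate as the sum of per-interface rates.

```python
# Assumed sizes (not given in the slides): 64-byte minimum and 1518-byte
# maximum Ethernet frames, ignoring preamble and inter-frame gap.
SMALL_BYTES = 64
LARGE_BYTES = 1518

def packet_rate(bits_per_second: float, packet_bytes: int) -> float:
    """Packets per second one link delivers when every packet has this size."""
    return bits_per_second / (packet_bytes * 8)

def aggregate_packet_rate(interface_rates_bps, packet_bytes: int) -> float:
    """Aggregate rate: the sum of the per-interface packet rates."""
    return sum(packet_rate(r, packet_bytes) for r in interface_rates_bps)

OC48_BPS = 2.48832e9  # SONET OC-48 line rate
print(round(packet_rate(OC48_BPS, SMALL_BYTES) / 1e3, 1))  # Kpps, small packets
print(round(packet_rate(OC48_BPS, LARGE_BYTES) / 1e3, 1))  # Kpps, large packets
```

With these assumptions, an OC-48 link yields 4,860.0 Kpps for small packets and 204.9 Kpps for large ones, matching the table. A hypothetical router with four OC-48 interfaces would face an aggregate of 19.44 million small packets per second.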

Bar Chart Of Example Packet Rates

[Figure: bar chart of the example packet rates; gray areas show the rates for large packets.]

Packet Rate And Software Router Feasibility

The exact packet rate a software router can sustain depends on CPU speed, bus bandwidth, and memory latency, as well as on the amount of processing each packet requires

The amount of processing required depends on the packet content

Software running on a general-purpose processor is an insufficient architecture to handle high-speed networks because the aggregate packet rate exceeds the capabilities of current CPUs

Maximum per-packet processing time, in microseconds, for small and large packets on various technologies

Technology   Time Per Small Packet   Time Per Large Packet
             (μs)                    (μs)
10Base-T          51.20                 1,214.40
100Base-T          5.12                   121.44
OC-3               3.29                    78.09
OC-12              0.82                    19.52
1000Base-T         0.51                    12.14
OC-48              0.21                     4.88
OC-192             0.05                     1.22
OC-768             0.01                     0.31
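The times in this table are the reciprocals of the packet rates: the packet size in bits divided by the data rate. Multiplying that time by a CPU clock rate gives a rough cycle budget per packet. The sketch below is illustrative only; the 2 GHz clock is a hypothetical figure, and the 64-byte and 1518-byte sizes are assumptions that reproduce the table.

```python
def time_per_packet_us(bits_per_second: float, packet_bytes: int) -> float:
    """Microseconds between back-to-back packets of this size on one link."""
    return packet_bytes * 8 / bits_per_second * 1e6

def cycle_budget(cpu_hz: float, bits_per_second: float, packet_bytes: int) -> float:
    """CPU clock cycles available per packet if the CPU must keep up with the link."""
    return cpu_hz * packet_bytes * 8 / bits_per_second

# Hypothetical 2 GHz CPU handling 64-byte packets on one interface.
print(cycle_budget(2e9, 1e9, 64))               # 1000Base-T
print(round(cycle_budget(2e9, 9.95328e9, 64)))  # OC-192
```

Under these assumptions the CPU has 1024 cycles per packet on 1000Base-T but only about 103 on a single OC-192 interface, before accounting for bus and memory delays or for multiple interfaces. This is the arithmetic behind the claim that a general-purpose CPU cannot keep up with high-speed networks.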

Possible Ways To Solve The CPU Bottleneck

Fine-grain parallelism
Symmetric coarse-grain parallelism
Asymmetric coarse-grain parallelism
Special-purpose coprocessors
NICs with onboard processing
Smart NICs with onboard stacks
Cell switching
Data pipelines

Fine-Grain Parallelism (Instruction-Level Parallelism)

Multiple processing units within a CPU work together on individual instructions

Instruction-level parallelism does not achieve significantly higher performance for packet processing:
– Few packet processing functions are amenable to fine-grain optimization
– A program must spend time setting up the parallel instructions
– It improves only CPU performance (not I/O bandwidth)
– It is expensive

Symmetric Coarse-Grain Parallelism

Offers a set of N identical CPUs

Advantages
– Network system designers did not need to invent new symmetric multiprocessor hardware
– Vendors had already ported a conventional Unix operating system to their multiprocessor hardware, so the software environment was familiar

Processing Capability

Processing capability does not scale linearly as the number of processors increases:
– Most multiprocessor systems use a shared memory paradigm in which all processors share a kernel address space
– Packet processing software must coordinate access to shared data structures such as packet queues
– A multiprocessor architecture does not automatically increase the I/O bandwidth
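The coordination cost can be sketched with N identical workers that share one packet queue and one statistics structure. This is a hypothetical model (`smp_forward` and its statistics are invented for illustration): `queue.Queue` serializes access internally, and the shared counters need an explicit lock, so every worker contends for the same shared state. Note that CPython threads do not execute Python code truly in parallel, so the sketch shows the structure of a shared-memory design rather than a speedup.

```python
import queue
import threading

def smp_forward(packets, num_cpus=4):
    """Hypothetical symmetric design: identical workers share one packet queue."""
    shared_queue = queue.Queue()  # internally locked, shared by all workers
    for p in packets:
        shared_queue.put(p)
    stats = {"forwarded": 0, "bytes": 0}
    stats_lock = threading.Lock()  # protects the shared statistics

    def worker():
        while True:
            try:
                pkt = shared_queue.get_nowait()
            except queue.Empty:
                return  # queue drained; this worker has nothing left to do
            with stats_lock:  # every worker contends for the same lock
                stats["forwarded"] += 1
                stats["bytes"] += len(pkt)

    workers = [threading.Thread(target=worker) for _ in range(num_cpus)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return stats
```

Adding workers does not remove the serialization at the queue and the lock, which is one reason capability does not grow linearly with the number of processors.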

Asymmetric Coarse-Grain Parallelism

Uses multiple, heterogeneous processors that can operate simultaneously

The advantage arises from the ability to specialize
– Each processor in an asymmetric system can be optimized for a specific task

Drawbacks
– Each processor still needs general-purpose instructions
– Difficult to program
– May not perform well for a specific task or a specific protocol
– Expensive to design and build

Special-Purpose Coprocessors

Coprocessors: an architecture that contains a general-purpose CPU plus one or more special-purpose processors

Each coprocessor is designed to perform a specific function; all coprocessors operate under control of the CPU

The chief advantage lies in the freedom it gives a designer

A coprocessor can also be a small logic circuit that performs one operation; it does not need general-purpose instructions and does not need a fetch-execute cycle

Special-Purpose Coprocessors (con’t)

A coprocessor is a piece of hardware that operates under control of the CPU

A coprocessor need not be sophisticated; it only needs to perform one specific task

To optimize computation, move operations that account for the most CPU time from software into hardware
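Why target the operations that account for the most CPU time? Amdahl's law, which the slides do not mention, makes the reasoning quantitative: if a fraction f of per-packet time moves to hardware that runs that part s times faster, the overall speedup is 1 / ((1 - f) + f / s). The function name and the example fractions below are hypothetical.

```python
def offload_speedup(fraction: float, hw_speedup: float) -> float:
    """Amdahl's law: overall speedup when `fraction` of per-packet CPU time
    moves to a coprocessor that runs that part `hw_speedup` times faster."""
    if not 0.0 <= fraction <= 1.0 or hw_speedup <= 0:
        raise ValueError("fraction must be in [0, 1] and hw_speedup positive")
    return 1.0 / ((1.0 - fraction) + fraction / hw_speedup)

print(round(offload_speedup(0.5, 10), 3))   # hot operation: half the CPU time
print(round(offload_speedup(0.1, 1e9), 3))  # rare operation, nearly free hardware
```

Offloading an operation that takes half the time to a 10x coprocessor gives about 1.818x overall, while even an almost infinitely fast coprocessor for an operation that takes 10% of the time cannot exceed about 1.111x.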

ASIC Coprocessor Implementation

Application Specific Integrated Circuit (ASIC) refers to an integrated circuit (IC) that has been custom-designed for a specific need

The availability of ASIC technology is especially pertinent to coprocessors

Designers attempt to make a coprocessor general enough to work with many protocols

NICs With Onboard Processing

Many protocol processing tasks are I/O bound

An obvious optimization consists of moving such processing onto the NIC, e.g., IP checksum computation, packet encryption, or compression
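IP checksum computation is a good example of work worth moving onto a NIC: in software, the CPU must touch every 16-bit word of the data. The sketch below implements the standard one's-complement Internet checksum described in RFC 1071; the function name is illustrative, and a real NIC would compute the same value in hardware.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement Internet checksum (RFC 1071), computed in software."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add 16-bit big-endian words
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back into 16 bits
    return ~total & 0xFFFF

# IPv4 header with its checksum field set to zero.
header = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
print(hex(internet_checksum(header)))
```

For this header the result is `0xb861`; once that value is placed in the checksum field, recomputing over the whole header yields 0, which is how a receiver verifies it.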

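The chief advantage of onboard processing lies in reduced CPU load; in addition, a NIC only needs to handle packets from a single interface, so its processor must keep up with that interface's rate rather than the aggregate rate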

What components are used to create smart NICs?
– ASIC hardware: incorporates special-purpose chips into a NIC
– Embedded RISC hardware: contains an onboard RAM and an onboard ROM

An optimized system with smart NICs

[Figure: a standard CPU connected to Smart NIC1 and Smart NIC2; each smart NIC performs most layer 2 processing and some layer 3 processing, and the CPU performs all other processing.]

The NICs handle layers 2 and 3; the CPU only handles exceptions
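The split between common case and exceptions can be sketched as a fast-path function running on the NIC. Everything here is hypothetical: the `Packet` fields, the exact-match routing table (real routers use longest-prefix match), and the particular exceptions chosen (IP options, expiring TTL, missing route) are illustrative assumptions, not a specification of any NIC.

```python
from dataclasses import dataclass

@dataclass
class Packet:
    dst: str                   # destination IPv4 address, dotted quad
    ttl: int
    has_options: bool = False

def nic_fast_path(pkt: Packet, routes: dict):
    """Hypothetical smart-NIC logic: forward the common case, punt the rest.

    Returns ("forward", port) when the NIC handles the packet itself, or
    ("to_cpu", reason) for exceptions that need the general-purpose CPU.
    """
    if pkt.has_options:
        return ("to_cpu", "IP options")
    if pkt.ttl <= 1:
        return ("to_cpu", "TTL expired")  # e.g., so the CPU can send an ICMP error
    port = routes.get(pkt.dst)  # exact-match lookup, a simplification
    if port is None:
        return ("to_cpu", "no route")
    pkt.ttl -= 1  # common case: NIC updates the header and forwards
    return ("forward", port)
```

As long as most packets take the `"forward"` branch, CPU load scales with the exception rate rather than the aggregate packet rate.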

Smart NICs With Onboard Stacks

A RISC processor makes it possible to add more protocol processing functionality to a NIC

Constraints arise that limit the scalability of a system that uses smart NICs in a conventional computer: the data path between NICs becomes a bottleneck

In a traditional computer system, the data path includes the bus to which the NIC attaches and memory

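Cell Switching

Another way to avoid the data path bottleneck is to redesign protocols. Existing protocols use:
– Variable-size packets, which
  – allow a sender to choose a size up to the maximum
  – make hardware design more difficult
  – are not well-suited to applications, such as voice, that require bounded latency
– Absolute (connectionless) addressing, in which
  – each address is globally known
  – forwarding overhead arises from looking up these global addresses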

Cells And Connection-Oriented Addressing

Requires new protocols, new packet formats, and a connection-oriented paradigm

Fixed-size packets
– Allow fixed-size buffers
– Guarantee the time needed to transmit or receive a packet

Relative (connection-oriented) addressing
– Smaller address size
– The label on a packet changes at each switch
– Requires connection setup

Example: ATM
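Both ideas can be sketched together: a packet is segmented into fixed-size cells, and each switch rewrites a short per-link label using a table filled in at connection setup. The 48-byte payload matches ATM cells (48 bytes of payload plus a 5-byte header), but the zero padding, the `Cell` and `Switch` names, and the table layout are simplifications invented for this sketch; real ATM adaptation layers (e.g., AAL5) use their own trailer format.

```python
from dataclasses import dataclass

CELL_PAYLOAD = 48  # ATM cells carry 48 payload bytes (plus a 5-byte header)

@dataclass
class Cell:
    label: int      # connection identifier, meaningful only on one link
    payload: bytes  # always exactly CELL_PAYLOAD bytes

def segment(data: bytes, label: int) -> list:
    """Split a packet into fixed-size cells, zero-padding the last one (simplified)."""
    return [Cell(label, data[i:i + CELL_PAYLOAD].ljust(CELL_PAYLOAD, b"\x00"))
            for i in range(0, len(data), CELL_PAYLOAD)]

class Switch:
    def __init__(self):
        self.table = {}  # (in_port, in_label) -> (out_port, out_label)

    def setup(self, in_port, in_label, out_port, out_label):
        """Connection setup installs one entry in each switch along the path."""
        self.table[(in_port, in_label)] = (out_port, out_label)

    def forward(self, in_port, cell):
        """Small exact-match lookup, then rewrite the label for the next link."""
        out_port, out_label = self.table[(in_port, cell.label)]
        return out_port, Cell(out_label, cell.payload)
```

Because every cell has the same size and the lookup key is a short local label rather than a global address, both buffering and forwarding are simple enough to implement in hardware.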

Data Pipelines

Move each packet through a series of processors

Each processor handles some of the tasks

Assessment
– Well-suited to many protocol processing tasks
– Individual processors can be fast

Advantages
– Each stage is much less complex, so it can run faster
– All stages can operate at the same time

[Figure: a 5-stage data pipeline whose stages decode the datagram, look up the destination address, fragment the datagram, compute the outgoing checksum, and encapsulate the datagram.]
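A pipeline like the one in the figure can be sketched with one thread per stage and FIFO queues between stages, so each stage works on a different packet at the same time. The five stage functions are hypothetical toys: the route table is keyed by first octet, the 4-byte `MTU` is artificially small, and the "checksum" is a plain byte sum, not the Internet checksum. In CPython the threads only illustrate the structure; hardware stages would truly run in parallel.

```python
import queue
import threading

STOP = object()  # sentinel passed down the pipeline to shut each stage down

def run_stage(func, inq, outq):
    """One stage: take an item, process it, hand every result to the next stage."""
    while True:
        item = inq.get()
        if item is STOP:
            outq.put(STOP)
            return
        for result in func(item):  # a stage may emit zero, one, or many items
            outq.put(result)

def run_pipeline(stages, items):
    """Run each stage in its own thread, connected by FIFO queues."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [threading.Thread(target=run_stage, args=(f, queues[i], queues[i + 1]))
               for i, f in enumerate(stages)]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(STOP)
    results = []
    while (item := queues[-1].get()) is not STOP:
        results.append(item)
    for t in threads:
        t.join()
    return results

ROUTES = {"10": 1, "192": 2}  # hypothetical: output port keyed by first octet
MTU = 4                        # hypothetical payload limit per fragment, in bytes

def decode(raw):
    dst, payload = raw.split(b"|", 1)
    yield {"dst": dst.decode(), "payload": payload}

def lookup(pkt):
    pkt["port"] = ROUTES[pkt["dst"].split(".")[0]]
    yield pkt

def fragment(pkt):
    for off in range(0, len(pkt["payload"]), MTU):
        yield {**pkt, "payload": pkt["payload"][off:off + MTU], "offset": off}

def checksum(pkt):
    pkt["csum"] = sum(pkt["payload"]) & 0xFFFF  # toy sum, not the IP checksum
    yield pkt

def encapsulate(pkt):
    yield (pkt["port"], pkt["offset"], pkt["csum"], pkt["payload"])
```

Calling `run_pipeline([decode, lookup, fragment, checksum, encapsulate], [b"10.1.2.3|abcdefgh"])` produces two fragments for port 1, at offsets 0 and 4. Because each stage has a single thread and FIFO queues, packets leave in the order they entered.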
