chapter 8 hardware conventional computer hardware architecture
Chapter 8 Hardware
Conventional Computer Hardware Architecture
Outline
The Traditional Software Router
Measures Of Speed
Fine-grain parallelism
Symmetric coarse-grain parallelism
Asymmetric coarse-grain parallelism
Special-purpose coprocessors
NICs with onboard processing
Smart NICs with onboard stacks
Cell switching
Data pipelines
The Traditional Software Router
The hardware architecture used with a software-based network system
The CPU handles all protocol processing tasks except for framing and onboard address recognition
[Figure: a standard CPU performs all other processing; NIC1 and NIC2 each perform framing and address recognition]
Two Measures Of Speed
Data rate (bits per second)
– Per interface rate
– Aggregate rate
Packet rate (packets per second)
– Per interface rate
– Aggregate rate
Processing Speed Matters For Two Reasons
A router must be able to handle packets as they arrive from a given network; the processing speed therefore determines the maximum data rate of a network that can be attached to the router
A router must be able to handle packets arriving from multiple networks; the processing speed therefore limits the possible topologies with which the router can be used
Aggregate Data Rate
Total rate at which data can arrive or leave a network system
The maximum aggregate data rate of a system is important because it limits the type and number of network connections the system can handle
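As a rough illustration of the aggregate data rate, the sketch below sums the per-interface rates of one system. The function name aggregate_rate_gbps and the port mix are hypothetical; the rates are the nominal figures for 100Base-T and 1000Base-T.

```python
def aggregate_rate_gbps(interface_rates_gbps):
    """Total data rate when every interface runs at full speed."""
    return sum(interface_rates_gbps)

# Hypothetical router: four 100Base-T ports and one 1000Base-T uplink.
rates = [0.100, 0.100, 0.100, 0.100, 1.000]
total = aggregate_rate_gbps(rates)
print(f"aggregate data rate: {total:.3f} Gbps")
```

A system whose maximum aggregate rate is below this total cannot serve all five connections at full speed.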
Aggregate Packet Rate
For protocol processing tasks that have a fixed cost per packet, the number of packets processed is more important than the aggregate data rate
How many packets arrive per second over a network?
– Depends on the network’s throughput rate and the size of the packets
Digital Circuit Speeds
Technology    Network Data Rate   Packet Rate For Small   Packet Rate For Large
              (In Gbps)           Packets (In Kpps)       Packets (In Kpps)
10Base-T       0.010                   19.5                     0.8
100Base-T      0.100                  195.3                     8.2
OC-3           0.156                  303.8                    12.8
OC-12          0.622                1,214.8                    51.2
1000Base-T     1.000                1,953.1                    82.3
OC-48          2.488                4,860.0                   204.9
OC-192         9.953               19,440.0                   819.6
OC-768        39.813               77,760.0                 3,278.4
Key concept: maximum packet rate occurs with minimum-size packets
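The packet rates for each technology follow from dividing the data rate by the number of bits per packet. A minimal sketch is given here, assuming 64-byte small packets and 1518-byte large packets; these sizes are not stated on the slide and are chosen because they reproduce the 10Base-T and 1000Base-T rows. The function name packet_rate_kpps is hypothetical.

```python
def packet_rate_kpps(data_rate_gbps, packet_bytes):
    """Packets per second (in thousands) at a given line rate and packet size."""
    bits_per_packet = packet_bytes * 8
    return data_rate_gbps * 1e9 / bits_per_packet / 1e3

SMALL_BYTES = 64     # assumed minimum-size packet
LARGE_BYTES = 1518   # assumed maximum-size packet

for name, gbps in [("10Base-T", 0.010), ("1000Base-T", 1.000)]:
    small = packet_rate_kpps(gbps, SMALL_BYTES)
    large = packet_rate_kpps(gbps, LARGE_BYTES)
    print(f"{name}: {small:.1f} Kpps small, {large:.1f} Kpps large")
```

Because the packet size sits in the denominator, the smallest packets give the highest packet rate.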
Bar Chart Of Example Packet Rates
Gray areas show rates for large packets
Packet Rate And Software Router Feasibility
The exact packet rate a software router can sustain depends on CPU speed, bus bandwidth, and memory latency, as well as the amount of processing required per packet
The amount of processing required depends on the packet content
Software running on a general-purpose processor is an insufficient architecture to handle high-speed networks because the aggregate packet rate exceeds the capabilities of current CPUs
Maximum per-packet processing time in microseconds for small and large packets for various technologies
Technology    Time per Packet For        Time per Packet For
              Small Packets (In μs)      Large Packets (In μs)
10Base-T          51.20                     1,214.40
100Base-T          5.12                       121.44
OC-3               3.29                        78.09
OC-12              0.82                        19.52
1000Base-T         0.51                        12.14
OC-48              0.21                         4.88
OC-192             0.05                         1.22
OC-768             0.01                         0.31
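The time budget per packet is the packet length in bits divided by the data rate. The sketch below computes that budget and compares it with a per-packet software cost. The 64-byte packet size and the 2.0 μs cost are assumptions for illustration, and both function names are hypothetical.

```python
def time_per_packet_us(data_rate_gbps, packet_bytes):
    """Time between back-to-back packets, in microseconds."""
    bits_per_packet = packet_bytes * 8
    return bits_per_packet / (data_rate_gbps * 1e9) * 1e6

def cpu_keeps_up(data_rate_gbps, packet_bytes, cost_us):
    """True if the per-packet processing cost fits within the arrival interval."""
    return cost_us <= time_per_packet_us(data_rate_gbps, packet_bytes)

COST_US = 2.0  # hypothetical software cost per packet, in microseconds

print(cpu_keeps_up(0.100, 64, COST_US))   # 100Base-T, small packets
print(cpu_keeps_up(1.000, 64, COST_US))   # 1000Base-T, small packets
```

With this assumed cost the CPU keeps up with small packets on 100Base-T (5.12 μs available) but not on 1000Base-T (0.51 μs available).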
Possible Ways To Solve The CPU Bottleneck
Fine-grain parallelism
Symmetric coarse-grain parallelism
Asymmetric coarse-grain parallelism
Special-purpose coprocessors
NICs with onboard processing
Smart NICs with onboard stacks
Cell switching
Data pipelines
Fine-Grain Parallelism (Instruction-Level Parallelism)
Multiple CPUs work together
Instruction-level parallelism does not achieve significantly higher performance
– Few packet processing functions are amenable to fine-grain optimization
– A program must spend time setting up the parallel instructions
– Only improves CPU performance
– Expensive
Symmetric Coarse-Grain Parallelism
Offers a set of N identical CPUs
Advantages
– Network system designers did not need to invent new symmetric multiprocessor hardware
– Vendors had ported a conventional Unix operating system to their multiprocessor hardware
– Familiar
Processing Capability
Processing capability does not scale linearly as the number of processors increases
– Most multiprocessor systems use a shared memory paradigm where all processors share a kernel address space
– Packet processing software must coordinate access to data structures such as packet queues
– A multiprocessor architecture does not automatically increase the I/O bandwidth
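The coordination cost mentioned above can be sketched with several workers draining one shared packet queue. This is only an illustration: the names process_all and worker are hypothetical, the string operation stands in for protocol processing, and Python threads show the locking pattern rather than a real multiprocessor speedup.

```python
import threading
from collections import deque

def process_all(packets, num_workers):
    """Drain a shared packet queue with several worker threads."""
    shared_queue = deque(packets)
    queue_lock = threading.Lock()
    handled = []

    def worker():
        while True:
            # Every worker must take the lock before touching the queue.
            with queue_lock:
                if not shared_queue:
                    return
                packet = shared_queue.popleft()
            # Stand-in for protocol processing, done outside the lock.
            result = packet.upper()
            with queue_lock:
                handled.append(result)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return handled

results = process_all([f"pkt{i}" for i in range(100)], num_workers=4)
print(len(results))
```

Each packet is handled exactly once, but every dequeue serializes the workers on the lock, which is one reason capacity does not grow linearly with the number of processors.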
Asymmetric Coarse-Grain Parallelism
Uses multiple, heterogeneous processors that can operate simultaneously
The advantage arises from the ability to specialize
– Each processor in an asymmetric system can be optimized for a specific task
Drawbacks
– Need general-purpose instructions
– Difficult to program
– May not perform well for a specific task or a specific protocol
– Expensive to design and build
Special-Purpose Coprocessors
Coprocessors: an architecture that contains a general-purpose CPU plus one or more special-purpose processors
Each coprocessor is designed to perform a specific function; all coprocessors operate under control of the CPU
The chief advantage lies in the freedom it gives a designer
A coprocessor can also be a small logic circuit that performs one operation, does not need general-purpose instructions, and does not need a fetch-execute cycle
Special-Purpose Coprocessors (cont.)
A coprocessor is a piece of hardware that operates under control of the CPU
A coprocessor need not be sophisticated; it only needs to perform one specific task
To optimize computation, move operations that account for the most CPU time from software into hardware
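One way to apply this rule is to profile the software and pick the operation with the largest share of CPU time. The sketch below assumes a made-up per-packet profile; the operation names, the timings, and the function offload_candidate are all hypothetical.

```python
def offload_candidate(profile_us):
    """Return the operation that accounts for the most CPU time."""
    return max(profile_us, key=profile_us.get)

# Hypothetical per-packet CPU time in microseconds, for illustration only.
profile = {"checksum": 1.8, "route lookup": 0.9, "header parsing": 0.4}

hot = offload_candidate(profile)
remaining = sum(t for op, t in profile.items() if op != hot)
print(hot, round(remaining, 2))
```

With these numbers, moving the checksum into hardware would leave about 1.3 μs of software work per packet.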
ASIC Coprocessor Implementation
Application Specific Integrated Circuit (ASIC) refers to an integrated circuit (IC) that has been custom-designed for a specific need
The availability of ASIC technology is especially pertinent to coprocessors
Designers attempt to make the coprocessor general enough to work with many protocols
NICs With Onboard Processing
Many protocol processing tasks are I/O bound
An obvious optimization consists of moving processing onto the NIC
– Examples: IP checksum, packet encryption, or compression
The chief advantage of onboard processing lies in reduced CPU load; a NIC only needs to handle packets from a single interface
What components are used to create smart NICs?
– ASIC hardware: incorporates special-purpose chips into a NIC
– Embedded RISC hardware: contains an onboard RAM and an onboard ROM
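The IP checksum is a typical candidate for onboard processing because it touches every header and needs no other state. A minimal software sketch of the 16-bit one's-complement checksum is shown here; the function name internet_checksum and the sample header bytes are illustrative and not taken from the slides.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum as used in the IPv4 header."""
    if len(data) % 2:
        data += b"\x00"                              # pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]        # big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)     # fold carries back in
    return ~total & 0xFFFF

# Sample 20-byte IPv4 header with the checksum field (bytes 10-11) set to zero.
header = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
print(hex(internet_checksum(header)))
```

A NIC that computes this value in hardware removes one pass over each header from the CPU's workload.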
An optimized system with smart NICs
[Figure: Smart NIC1 and Smart NIC2 each perform most layer 2 processing and some layer 3 processing; a standard CPU performs all other processing]
NIC handles layers 2 and 3
CPU only handles exceptions
Smart NICs With Onboard Stacks
A RISC processor makes it possible to add more protocol processing functionality to a NIC
Constraints arise that limit the scalability of a system that uses smart NICs in a conventional computer: the data path between NICs becomes a bottleneck
In a traditional computer system, the data path includes the bus to which the NIC attaches and memory
Existing protocols
Variable-size packets
– Allow sender to choose a size up to the maximum
– Make hardware design more difficult
– Are not well-suited to applications like voice that require bounded latency
Globally known addresses
– Each address is globally known
– Arises from forwarding overhead
Redesign protocols
Cells And Connection-Oriented Addressing
Requires new protocols, new packet formats, and a connection-oriented paradigm
Fixed-size packets
– Allows fixed-size buffers
– Guaranteed time to transmit/receive
Relative (connection-oriented) addressing
– Smaller address size
– Label on packet changes at each switch
– Requires connection setup
Example: ATM
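Relative addressing can be sketched as a per-switch table that maps an incoming label to an outgoing port and a new label. The table contents, the port numbers, and the function switch_cell are hypothetical; the 48-byte payload matches the ATM cell payload size.

```python
CELL_PAYLOAD_BYTES = 48  # ATM cell payload size; every cell is the same size

# Hypothetical connection table, filled in at connection setup:
# (input port, incoming label) -> (output port, outgoing label)
connection_table = {
    (0, 17): (2, 42),
    (1, 17): (3, 5),
}

def switch_cell(in_port, label, payload):
    """Forward one fixed-size cell, rewriting its label."""
    if len(payload) != CELL_PAYLOAD_BYTES:
        raise ValueError("cells must be fixed size")
    out_port, out_label = connection_table[(in_port, label)]
    return out_port, out_label, payload

print(switch_cell(0, 17, bytes(CELL_PAYLOAD_BYTES))[:2])
```

The same label value 17 means different connections on different input ports, which is why labels can be much smaller than global addresses.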
Data Pipelines
Move each packet through a series of processors
Each processor handles some tasks
Assessment
– Well-suited to many protocol processing tasks
– Individual processors can be fast
Advantage
– Much less complex and runs faster
– All stages can operate at the same time
[Figure: 5-stage data pipeline. Stage labels: decode the datagram; look up the destination address; fragmentation; compute the outgoing checksum; encapsulate the datagram]
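A data pipeline can be sketched in software as a chain of stages joined by queues, with one thread per stage so that different packets occupy different stages at the same time. The stage names come from the figure labels, but their order here is an assumption, and each stage is a placeholder that only records its name. The helper names make_stage, run_stage, and run_pipeline are hypothetical.

```python
import queue
import threading

STOP = None  # sentinel passed down the pipeline to shut each stage down

def make_stage(name):
    """Build a placeholder stage that only records its name on the packet."""
    def stage(packet):
        return packet + [name]
    return stage

def run_stage(func, inbox, outbox):
    """Apply one stage to every packet from inbox and pass it on."""
    while True:
        item = inbox.get()
        if item is STOP:
            outbox.put(STOP)
            return
        outbox.put(func(item))

def run_pipeline(stages, packets):
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [
        threading.Thread(target=run_stage, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(stages)
    ]
    for t in threads:
        t.start()
    for p in packets:
        queues[0].put(p)
    queues[0].put(STOP)
    results = []
    while (item := queues[-1].get()) is not STOP:
        results.append(item)
    for t in threads:
        t.join()
    return results

STAGE_NAMES = ["decode", "lookup", "fragment", "checksum", "encapsulate"]
stages = [make_stage(n) for n in STAGE_NAMES]
out = run_pipeline(stages, [["pkt0"], ["pkt1"], ["pkt2"]])
print(out[0])
```

Because each stage has a single worker and FIFO queues, packets leave the pipeline in the order they entered.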
QUESTION?