l13 ixaxscalemicroengine sdk tutorial

Upload: misha-kornev

Post on 03-Jun-2018

227 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/12/2019 L13 IXAXscaleMicroengine SDK tutorial

    1/25

    ECE 526 Network

    Processing Systems DesignIXP XScale and Microengines

    Chapter 18 & 19: D. E. Comer

  • 8/12/2019 L13 IXAXscaleMicroengine SDK tutorial

    2/25

    Ning Weng ECE 526 2

    Overview Recalled

    Packet processing functions (forwarding, queuing)

    Traditional network processing systems (CPU + NICs)

    General network processor architecture and tradeoffs

    Intel IXP network processors overall architecture

    Focus on individual components of Intel IXP chip Control processor (slow path): XScale core

    Overall architecture

    Typical functions

    Processor features

    Packet processing processor (fast path): Microengines

    Architecture and features

    Differences to conventional processors

    Pipelining and multi-threading

  • 8/12/2019 L13 IXAXscaleMicroengine SDK tutorial

    3/25

    Ning Weng ECE 526 3

    Purpose of Control Processor

    Functions typically executed by embedded control proc: Bootstrapping

    Exception handling

    Higher-layer protocol processing

    Interactive debugging

    Diagnostics and logging

    Memory allocation

    Application programs (if needed)

    User interface and/or

    interface to the GPP Control of packet processors

    Other administrative functions

  • 8/12/2019 L13 IXAXscaleMicroengine SDK tutorial

    4/25

    Ning Weng ECE 526 4

    XScale Memory Architecture Memory architecture

    Uses 32-bit linear address space

    configurable endian mode

    Byte addressable

    Memory Mapping

    Allocation of address space (2^32) to different systemcomponents

    Accesses to memory is translated into access to component

    Needs to be carefully crafted

    XScale assumes byte addressable memory

    Underlying memory uses different size (SDRAM)

    How does this work?

    Support for Virtual Memory For demand paging to secondary storage

  • 8/12/2019 L13 IXAXscaleMicroengine SDK tutorial

    5/25

    Ning Weng ECE 526 5

    Shared Memory Address Issues

    Memory is shared between XScale and Microengines Same data, but different addresses

    What impact does this have? Pointers need to be translated

    Data structures with pointers can not be shared

  • 8/12/2019 L13 IXAXscaleMicroengine SDK tutorial

    6/25

    Ning Weng ECE 526 6

    Microengines

    Microengines are data-path packet processors IXP IXP 2400 have 8 Microengines

    Simpler than XScale

    Low level device

    as a micro-sequencer Optimized for

    packet processing

    More complex to use

    Often abbreviated as uE

  • 8/12/2019 L13 IXAXscaleMicroengine SDK tutorial

    7/25

    Ning Weng ECE 526 7

    uE Functions

    uEs handle ingress and egress packet processing: Packet ingress from physical layer hardware

    Checksum verification

    Header processing and classification

    Packet buffering in memory

    Table lookup and forwarding

    Header modification

    Checksum computation

    Packet egress to physical layer hardware

  • 8/12/2019 L13 IXAXscaleMicroengine SDK tutorial

    8/25

    Ning Weng ECE 526 8

    uE Architecture

    uE characteristics: Programmable microcontroller

    RISC design

    256 general-purpose registers

    512 transfer registers

    128 next neighbor registers

    Hardware support for 8 threads and context switching

    640 words of local memory

    Control of an Arithmetic and Logic Unit

    Direct access to various functional units

    A unit to compute a Cyclic Redundancy Check (CRC)

  • 8/12/2019 L13 IXAXscaleMicroengine SDK tutorial

    9/25

    Ning Weng ECE 526 9

    uE as Micro-sequencer Micro-sequencer does not contain native instructions for

    possible operations Instead of using instructions, uE invokes functional units to

    perform operations

    Control unit is much simpler

    Example 1: uE does not have ADD R2,R3 instruction

    Instead: ALU ADD R2, R3

    ALU indicates that ALU should be used

    ADD is a parameter to ALU

    Example 2: Memory access not by simple LOAD R2, 0xdeadbeef

    Instead: SRAM LOAD R2, 0xdeadbeef

    Altogether similar to normal processor, but more basic

  • 8/12/2019 L13 IXAXscaleMicroengine SDK tutorial

    10/25

  • 8/12/2019 L13 IXAXscaleMicroengine SDK tutorial

    11/25

    Ning Weng ECE 526 11

    uE Memories

    uEs: viewing memories differently than XScale does Does not map memories and I/O devices into a liner address

    space

    Does not view memories as a seamless, uniform repository

    uE ISA: requiring a separate instruction for each type ofmemory and I/O device

    SRAM[read, $$x, address1, address2]

    Programmer: required binding of data items to specific

    type of memory permanently.

  • 8/12/2019 L13 IXAXscaleMicroengine SDK tutorial

    12/25

    Ning Weng ECE 526 12

    Execution Pipeline

    What is pipeline? Why pipeline is employed? One instruction is executed per cycle if pipeline is proper

    designed

    uEs use five-stage or six-stage pipeline:

  • 8/12/2019 L13 IXAXscaleMicroengine SDK tutorial

    13/25

    Ning Weng ECE 526 13

    Pipelining

  • 8/12/2019 L13 IXAXscaleMicroengine SDK tutorial

    14/25

    Ning Weng ECE 526 14

    Pipelining Problems Possible sources of pipelining problems

    Data dependencies

    Control dependencies

    Resource dependencies

    Memory accesses

    How pipelining problem impact system performance

    How these impact can be removed or reduced Remove the sources so that no stall happened

    Hide the impact of pipelining stall

  • 8/12/2019 L13 IXAXscaleMicroengine SDK tutorial

    15/25

    Ning Weng ECE 526 15

    Pipeline Stalls K: ALU ADD R2, R1, R2

    K+1 ALU ADD R3, R2, R3

    Control dependencies, memory have even bigger impact

  • 8/12/2019 L13 IXAXscaleMicroengine SDK tutorial

    16/25

    Ning Weng ECE 526 16

    Threading Illustration

  • 8/12/2019 L13 IXAXscaleMicroengine SDK tutorial

    17/25

    Ning Weng ECE 526 17

    Hardware Threads uEs support 8 hardware thread contexts

    One thread can execute at any given time

    When stall occurs, uE can switch to other thread (if not stalled)

    Very low overhead for context switch Zero-cycle context switch

    Effectively can take around three cycles due to pipeline flush

    Switching rules If thread stalls, check if next is ready for processing

    Keep trying until ready thread is found

    If none is available, stall uE and wait for any thread to unblock

    Improves overall throughput

    Questions: Why not 16, 32 threads

    why not have 48 uEs with 1 thread?

  • 8/12/2019 L13 IXAXscaleMicroengine SDK tutorial

    18/25

    Ning Weng ECE 526 18

    Summary Control processor (slow path): XScale core

    Overall architecture

    Typical functions

    Processor features

    Packet processing processor (fast path): Microengines

    Architecture and features Differences to conventional processors

    Pipelining and multi-threading

  • 8/12/2019 L13 IXAXscaleMicroengine SDK tutorial

    19/25

    Ning Weng ECE 526 19

    Lab3 Brief Intel Reference Systems

    SDK Tutorial

    Lab 3

  • 8/12/2019 L13 IXAXscaleMicroengine SDK tutorial

    20/25

    Ning Weng ECE 526 20

    Intel Reference Systems Hardware Testbed

    IXP2400 network processors

    QDRM-SRAM, Flash ROM and other memories

    1G optical ethernet ports

    100M ethernet management port

    Serial interface

    PCI interfaces

    SDK (software development kit) Compiler

    Assembler, linker

    Simulator

    Reference codes

  • 8/12/2019 L13 IXAXscaleMicroengine SDK tutorial

    21/25

    Ning Weng ECE 526 21

    Lab3: Forwarding, Counting & Classification

    Goal: to explore the basic functionalities of the IXP2400 softwaredevelopment kit and Microengines.

    3 parts: Part I: collecting a number of workload statistics from the IXP SDK

    simulator. Follow steps of lab instruction.

    Part II: adding one counting block to count the number of packets.

    Part III: implementing a simple packet classification mechanism.

    Tools: All three parts require access to a machine that has the Intel

    SDK installed. If you want, you can also request an installation CDfor your own machine, check with TA.

  • 8/12/2019 L13 IXAXscaleMicroengine SDK tutorial

    22/25

    Ning Weng ECE 526 22

    Part I: Forwarding Simulation run an implementation of IP forwarding on the IXP2400

    simulator. All the code is provided to you.

    collect a set of workload statistics that are reported bythe simulator.

  • 8/12/2019 L13 IXAXscaleMicroengine SDK tutorial

    23/25

    Ning Weng ECE 526 23

    Part II: Forwarding and Counting

    modify above applications by adding counter block

    store how many packets are received.

  • 8/12/2019 L13 IXAXscaleMicroengine SDK tutorial

    24/25

  • 8/12/2019 L13 IXAXscaleMicroengine SDK tutorial

    25/25

    Ning Weng ECE 526 25

    How to do Lab3 Windows machine with SDK installed

    Download lab instructions and source code fromblackboard

    Start early.

    Very exciting lab. Due day

    Part I and Part II 10/13

    Part III 10/20