1 RAMP Implementation J. Wawrzynek

Post on 19-Dec-2015

1

RAMP Implementation

J. Wawrzynek

2

What are we going to build and when?

Platforms (hosts):

RDL supports multiple platforms: XUP, pure software, BEE2.

BEE2 will be the standard RAMP platform for the next ~1.5 years.

Early 2006: start to design the BEE3 platform (based on Xilinx Virtex 5); detailed design and prototypes in 2007.

3

BEE2 Platform Development

Chen Chang, Pierre Droz, Henry Chen, Andrew Schultz, Dan Burke, Bob Brodersen

[Figure: BEE2 module block diagram showing five 2VP70 FPGAs]

5 Virtex-II Pro 70 FPGAs

2.5M logic gate equivalents

20GB DRAM

20 10Gbps connections

10GigE/Infiniband inter-module connections

I/O, analog interfaces

4

BEE2 Module Design

FPGAs

DRAM

CompactFlash Card

10GigE ports

10/100 Enet

USB

DVI/HDMI

14x17 inch, 22-layer PC board

~$4K/module w/o FPGAs or DRAM

5

Current and Soon to Be Users

“If You Build it, They Will Come”

BWRC: ASIC/SOC emulation, Cognitive Radio Algorithm Exploration, PicoRadio simulation, LDPC simulation, EM Antenna Simulation

SSL, UC Radio Astronomy Lab: SETI, Allen Telescope Array

GSRC: Home Media Gateway

RAMP: UCB, Stanford, UW, UT Austin, CMU, MIT, Intel: Multiprocessor Emulation

Bob Conn / Research Triangle Inst.: Spice Circuit Simulation

Rob Rutenbar / CMU: Speech Recognition

Stanford BioInformatics Group: Biological signaling research

Chris Dick, Kees Vissers / Xilinx: Signal/Media Processing

ST Microelectronics

Widespread interest and dozens of other requests.

6

Lots of BEE2s

Hardware:

Module in “production” use, JPL Deep Space Network (Barstow, CA)

10 modules in test/bring-up

Currently allocated to BWRC, SSL, RAL, Xilinx

Working with SAE Materials to move from prototype to “turn-key” production and move production management away from BWRC

Production of another 25 modules underway for RAMP (and others)

Gateware/Software:

Linux port

Simulink/Xilinx-EDA integration for automatic compilation to bit-files

Several Radio Astronomy applications (with ADC interface) complete (spectrometers, correlators)

Board-support package close to release (test suites, docs, app notes)

First BEE2 users hands-on workshop January 17-19.

7

RAMP Gateware/Software:

Goals:

Successes in the short run

Build a long-lasting, flexible infrastructure

Three versions of RAMP with varying degrees of capability and flexibility. All start now, hitting milestones staggered in time.

RAMP Red: Transactional Memory, 2Q06

RAMP Blue: Cluster of MicroBlazes, 3Q06

RAMP White: Cache Coherent Shared Memory Multiprocessor (version 1.0 4Q06, others over '07)

All three versions will have HW debug support.

8

RAMP Red: Transactional Memory

Lead is Stanford Group (Christos Kozyrakis)

Already have XUP version with ring interconnect topology

Port to BEE2 module

8 hard PowerPC cores connected through the central control FPGA

Control FPGA also runs Linux and handles I/O (Ethernet, disks, etc.)

Later versions extend to multiple BEE2 modules using multiboard routers as they become available.

9

RAMP Red

[Figure: RAMP Red on a BEE2 module. Four FPGAs, each with two processors (P) with caches ($) and an arbiter (Arb) attached to DRAM, plus a central control FPGA.]

10

RAMP Blue

Led by Berkeley Group

Based on MicroBlaze from Xilinx

Already optimized for FPGAs

Gives us more than 2 cores per FPGA

Cluster message-passing style architecture is a popular and useful model: with a port of MPI, runs standard SPMD supercomputing applications (Linpack, etc.)

Can evolve into "internet in a box" system
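To illustrate the SPMD message-passing style mentioned above, here is a minimal sketch in plain Python. It is not MPI and not RAMP code: threads stand in for cores, per-rank queues stand in for the network, and the names `send`, `recv`, `program`, and `NUM_RANKS` are hypothetical. Every rank runs the same program and branches on its rank, which is the essence of the model.

```python
import queue
import threading

NUM_RANKS = 4
# One inbox per rank stands in for the network; a real system would use MPI.
inboxes = [queue.Queue() for _ in range(NUM_RANKS)]


def send(dest, message):
    """Deliver a message to the inbox of rank `dest`."""
    inboxes[dest].put(message)


def recv(rank):
    """Block until a message arrives for `rank`, then return it."""
    return inboxes[rank].get()


def program(rank, data, results):
    """The same program runs on every rank (SPMD); behavior depends on rank."""
    chunk = data[rank::NUM_RANKS]       # each rank takes a strided slice
    partial = sum(chunk)
    if rank == 0:
        total = partial
        for _ in range(NUM_RANKS - 1):  # collect partial sums from the others
            total += recv(0)
        results["total"] = total
    else:
        send(0, partial)


data = list(range(1, 101))
results = {}
threads = [threading.Thread(target=program, args=(r, data, results))
           for r in range(NUM_RANKS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results["total"])
```

Rank 0 acts as the root of a simple reduction; in an MPI port the explicit loop would typically be replaced by a collective operation.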

11

RAMP Blue

16 MicroBlaze cores

SP FP (double precision initially through emulation; later by modifying the core (microcode approach))

4 cores share one memory port (1-2GB)

Richly connected within single FPGA

Each core runs micro-Linux kernel (no VM)

[Figure: one RAMP Blue FPGA. MicroBlaze (MB) cores grouped around DRAM ports and connected to a 3D router.]
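The slide states that 4 cores share one memory port but does not say how requests are arbitrated. As one possibility, the sketch below models a round-robin arbiter, a common fair policy; the policy choice and the class name `RoundRobinArbiter` are assumptions for illustration, not part of the RAMP Blue design.

```python
class RoundRobinArbiter:
    """Grant one requester per cycle, rotating priority after each grant."""

    def __init__(self, num_cores=4):
        self.num_cores = num_cores
        self.last_granted = num_cores - 1  # so core 0 has priority first

    def grant(self, requests):
        """Pick a winner from `requests` (one bool per core), or None."""
        for offset in range(1, self.num_cores + 1):
            core = (self.last_granted + offset) % self.num_cores
            if requests[core]:
                self.last_granted = core
                return core
        return None


arbiter = RoundRobinArbiter(num_cores=4)
# Cores 0, 1 and 3 request the port on every cycle; core 2 stays idle.
requests = [True, True, False, True]
grants = [arbiter.grant(requests) for _ in range(6)]
print(grants)
```

Because priority rotates past the most recent winner, no requesting core waits more than three grants, and idle cores are skipped without wasting a cycle.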

12

RAMP Blue

Each FPGA in the system (not counting control FPGAs) resides as a node in a 3D torus

Target design size would be a 4x4x4 torus => 1024 processors
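The sizing above can be checked with a short sketch. It assumes nodes are addressed by (x, y, z) coordinates with wraparound links in each dimension, the usual definition of a 3D torus; the function names and addressing scheme are illustrative and not taken from the RAMP design.

```python
from itertools import product

DIMS = (4, 4, 4)          # target torus size from the slide
TARGET_PROCESSORS = 1024  # target processor count from the slide


def torus_nodes(dims):
    """Enumerate every (x, y, z) coordinate in the torus."""
    return list(product(*(range(d) for d in dims)))


def torus_neighbors(node, dims):
    """Return the nodes one hop away, with wraparound in each dimension."""
    neighbors = set()
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            coord = list(node)
            coord[axis] = (coord[axis] + step) % size
            neighbors.add(tuple(coord))
    neighbors.discard(node)  # only matters for dimensions of size 1
    return sorted(neighbors)


nodes = torus_nodes(DIMS)
cores_per_fpga = TARGET_PROCESSORS // len(nodes)

print(len(nodes))                        # number of FPGA nodes
print(cores_per_fpga)                    # cores each FPGA must host
print(torus_neighbors((0, 0, 0), DIMS))  # neighbors of the corner node
```

A 4x4x4 torus has 64 nodes, so reaching 1024 processors implies 16 cores per FPGA, and every node has six direct neighbors.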

Pieces needed:

Simple modifications to the MB core to put it in the RDL framework

Memory controller to arbitrate requests among 4 cores

3D router (1 per FPGA)

Port of MPI (software at first, later optimize in gateware)
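The slides list a 3D router but do not specify a routing algorithm. As a sketch of what such a router might compute, the code below uses dimension-order routing (correct x, then y, then z), taking the shorter way around each ring. Both the algorithm choice and the names `ring_step` and `dimension_order_route` are assumptions for illustration.

```python
def ring_step(src, dst, size):
    """Return -1, 0, or +1: the shorter direction around a ring of `size`."""
    if src == dst:
        return 0
    forward = (dst - src) % size  # hops needed going in the +1 direction
    return 1 if forward <= size - forward else -1


def dimension_order_route(src, dst, dims=(4, 4, 4)):
    """List the nodes visited from src to dst, correcting x, then y, then z."""
    path = [tuple(src)]
    current = list(src)
    for axis, size in enumerate(dims):
        while current[axis] != dst[axis]:
            step = ring_step(current[axis], dst[axis], size)
            current[axis] = (current[axis] + step) % size
            path.append(tuple(current))
    return path


route = dimension_order_route((0, 0, 0), (3, 2, 1))
print(route)
print(len(route) - 1)  # hop count
```

With wraparound, no dimension of a 4-node ring needs more than 2 hops, so the worst-case route in a 4x4x4 torus is 6 hops under this scheme.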

13

RAMP White

V1.0: PowerPC hard cores + cache coherency

V2.0: 32-bit soft cores

V3.0: 64-bit soft cores