RAMP Implementation
J. Wawrzynek
Post on 19-Dec-2015
2
What are we going to build and when?
Platforms (hosts):
  RDL supports multiple platforms: XUP, pure software, BEE2.
  BEE2 will be the standard RAMP platform for the next ~1.5 years.
  In early 2006, start designing the BEE3 platform (based on Xilinx Virtex 5), with detailed design and prototypes in 2007.
3
BEE2 Platform Development
Chen Chang, Pierre Droz, Henry Chen, Andrew Schultz, Dan Burke, Bob Brodersen
[Diagram: five 2VP70 FPGAs]
5 Virtex-II Pro 70 FPGAs
2.5M logic gate equivalents
20 GB DRAM
20 10 Gbps connections
10GigE/InfiniBand inter-module connections
I/O, analog interfaces
4
BEE2 Module Design
[Figure: board with labeled components: FPGAs, DRAM, CompactFlash card, 10GigE ports, 10/100 Ethernet, USB, DVI/HDMI]
14 x 17 inch, 22-layer PC board
~$4K/module without FPGAs or DRAM
5
“If You Build it, They Will Come”
Current and Soon to Be Users
BWRC: ASIC/SOC emulation, Cognitive Radio Algorithm Exploration, PicoRadio simulation, LDPC simulation, EM Antenna Simulation
SSL, UC Radio Astronomy Lab: SETI, Allen Telescope Array
GSRC: Home Media Gateway
RAMP (UCB, Stanford, UW, UT Austin, CMU, MIT, Intel): Multiprocessor Emulation
Bob Conn / Research Triangle Inst.: Spice Circuit Simulation
Rob Rutenbar / CMU: Speech Recognition
Stanford BioInformatics Group: Biological signaling research
Chris Dick, Kees Vissers / Xilinx: Signal/Media Processing
ST Microelectronics
Widespread interest and dozens of other requests.
6
Lots of BEE2s
Hardware:
  Module in “production” use at the JPL Deep Space Network (Barstow, CA)
  10 modules in test/bring-up
    Currently allocated to BWRC, SSL, RAL, Xilinx
  Working with SAE Materials to move from prototype to “turn-key” production and to move production management away from BWRC
  Production of another 25 modules underway for RAMP (and others)
Gateware/Software:
  Linux port
  Simulink/Xilinx-EDA integration for automatic compilation to bit-files
  Several Radio Astronomy applications (with ADC interface) complete (spectrometers, correlators)
  Board-support package close to release (test suites, docs, app notes)
First BEE2 users hands-on workshop: January 17-19.
7
RAMP Gateware/Software
Goals:
  Successes in the short run
  Build a long-lasting, flexible infrastructure
Three versions of RAMP with varying degrees of capability and flexibility. All start now, with milestones staggered in time:
  RAMP Red: Transactional Memory, 2Q06
  RAMP Blue: Cluster of MicroBlazes, 3Q06
  RAMP White: Cache Coherent Shared Memory Multiprocessor (version 1.0 4Q06, others over '07)
All three versions will have HW debug support.
8
RAMP Red: Transactional Memory
Lead is the Stanford group (Christos Kozyrakis)
Already have an XUP version with ring interconnect topology
Port to BEE2 module:
  8 hard PowerPC cores connected through the central control FPGA
  Control FPGA also runs Linux and handles I/O, Ethernet, disks, etc.
Later versions extend to multiple BEE2 modules using multiboard routers as they become available.
9
RAMP Red
[Diagram: four FPGAs, each with two processors (P), their caches ($), an arbiter (Arb), and DRAM, connected to a central control FPGA]
10
RAMP Blue
Led by the Berkeley group
Based on MicroBlaze from Xilinx
  Already optimized for FPGAs
  Gives us more than 2 cores per FPGA
Cluster message-passing style architecture is a popular and useful model:
  With a port of MPI, runs standard SPMD supercomputing applications (Linpack, etc.)
  Can evolve into an "internet in a box" system
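To make the SPMD message-passing model concrete, here is a minimal sketch in which every rank runs the same function and rank 0 collects partial sums. It is not MPI: Python threads and `queue.Queue` stand in for cores and message channels, and `NUM_RANKS`, `worker`, and `run` are hypothetical names chosen for illustration.

```python
import queue
import threading

NUM_RANKS = 4  # hypothetical number of cores; not taken from the slides


def worker(rank, inboxes, results):
    """Same program on every rank: compute a partial sum, then reduce to rank 0."""
    # Each rank sums its own strided slice of the integers 0..99.
    partial = sum(range(rank, 100, NUM_RANKS))
    if rank == 0:
        total = partial
        for _ in range(NUM_RANKS - 1):
            total += inboxes[0].get()  # blocking receive from any other rank
        results["total"] = total
    else:
        inboxes[0].put(partial)        # send the partial sum to rank 0


def run():
    """Start one thread per rank and return the reduced total."""
    inboxes = [queue.Queue() for _ in range(NUM_RANKS)]
    results = {}
    threads = [
        threading.Thread(target=worker, args=(rank, inboxes, results))
        for rank in range(NUM_RANKS)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results["total"]


print(run())  # sum of the integers 0..99
```

In a real MPI port the send and receive would map to the library's point-to-point or reduction calls; the structure of "one program, many ranks" stays the same.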
11
RAMP Blue
6 MicroBlaze cores
Single-precision FP (double precision initially through emulation, later by modifying the core (microcode approach))
4 cores share one memory port (1-2 GB)
Richly connected within a single FPGA
Each core runs a micro-Linux kernel (no VM)
[Diagram: MicroBlaze (MB) cores grouped around four DRAM memory ports, with a 3D router, within one FPGA]
12
RAMP Blue
Each FPGA in the system (not counting control FPGAs) resides as a node in a 3D torus
Target design size would be a 4x4x4 torus => 1024 processors
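As a rough illustration of the node arrangement, the sketch below enumerates the six wrap-around neighbors of a node in a 4x4x4 torus and checks the arithmetic behind the processor count. The function name `torus_neighbors` and the (x, y, z) coordinate scheme are assumptions for illustration, not part of the RAMP design; the cores-per-node figure is simply 1024 divided by the 64 nodes.

```python
from itertools import product

DIMS = (4, 4, 4)          # target torus size from the slide
TOTAL_PROCESSORS = 1024   # processor count stated on the slide


def torus_neighbors(node, dims=DIMS):
    """Return the six neighbors of `node` (x, y, z), with wrap-around links."""
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            coord = list(node)
            coord[axis] = (coord[axis] + step) % dims[axis]
            neighbors.append(tuple(coord))
    return neighbors


nodes = list(product(*(range(d) for d in DIMS)))
cores_per_node = TOTAL_PROCESSORS // len(nodes)

print(len(nodes))                  # number of FPGA nodes in the torus
print(cores_per_node)              # processors per node implied by the totals
print(torus_neighbors((0, 0, 3)))  # neighbors wrap around at the edges
```

The modulo on each coordinate is what distinguishes a torus from a plain mesh: a node on the edge still has six neighbors.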
Pieces needed:
  Simple modifications to the MB core to put it in the RDL framework
  Memory controller to arbitrate requests among 4 cores
  3D router (1 per FPGA)
  Port of MPI (software at first, later optimized in gateware)
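The memory controller item above can be sketched in software. The slides do not say which arbitration policy is planned, so round-robin is an assumption here, and `RoundRobinArbiter` is a hypothetical name; the sketch models one grant per cycle among 4 requesting cores.

```python
class RoundRobinArbiter:
    """Grant one requester per cycle, rotating priority after each grant."""

    def __init__(self, num_requesters=4):
        self.num_requesters = num_requesters
        # Start so that requester 0 has the highest priority on the first cycle.
        self.last_granted = num_requesters - 1

    def grant(self, requests):
        """`requests` holds one boolean per core; return the granted index or None."""
        for offset in range(1, self.num_requesters + 1):
            candidate = (self.last_granted + offset) % self.num_requesters
            if requests[candidate]:
                self.last_granted = candidate
                return candidate
        return None  # no core is requesting this cycle


arbiter = RoundRobinArbiter()
# Cores 0, 2 and 3 request the memory port on every cycle; core 1 is idle.
grants = [arbiter.grant([True, False, True, True]) for _ in range(6)]
print(grants)
```

Rotating the starting point after each grant keeps any one core from starving the others, which is the usual reason to prefer round-robin over fixed priority for a shared port.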