TRANSCRIPT
Slide 1
Starbridge Viva™
Starbridge Solutions to Supercomputing Problems
Reconfigurable Systems Summer Institute
Esmail Chitalwala
Starbridge Customer Support and Services
12th July 2005
Slide 2
Outline
Current problems faced by application designers:
– Code Development and Application Design
– Execution Environment
– Application Portability
– Application Speed-up and Performance
– Toolset
Solution:
– Current emphasis - Development environment, programming tools
– Concern - Application speed-up
– Future directions …
Slide 3
Code Development
Current HPC applications designed using ‘C’ and ‘C’-based languages that perform serial execution on processors.
Parallel computing languages and architectures, e.g. Unified Parallel C (UPC), MPI.
Languages designed for developing applications to run on single or multiple processors, clusters, supercomputers.
Slide 4
Viva™ - Graphical Interface
• Windows-based application
– Menu/Toolbar
– Window Panes
• Object oriented
– Drag and drop
– Connect the dots
• Abstraction
– High level (“black box”)
– Low level (bits)
Slide 5
Viva™ - Graphical Interface
Slide 6
Viva™ - “3D Development”
[Figure: design sheets stacked as Top Sheet, 2nd Level and 3rd Level, with axes labeled x,y and z]
Slide 7
Graphical Interface Advantages
Capture native parallelism
Tune algorithms for speed or space
Interactively debug code running in hardware
Slide 8
Execution Environment
Current generation of parallel computing applications based on single or multiple processors, clusters, supercomputers.
Next-generation processors place multiple cores on a single chip, allowing for parallel thread execution.
Significant overheads in processing and transfer of data.
Huge set-up costs in terms of space, time, power and money.
Slide 9
Execution Environment
Reconfigurable FPGA-based computers already allow the creation of parallel execution modules.
This could potentially allow the instantiation of multiple parallel execution modules depending on application scalability.
Lower overhead when communicating and transferring data between modules.
Significantly lower ownership, operation and maintenance costs.
Slide 10
Reconfigurable Computers
Hypercomputer®
– 8 - Virtex II - 6000 (6M gates)
– 1 - Virtex II - Router
– 1 - Virtex II - Cross Point Switch
– 1 - Virtex II - PCIX
– 36 GB RAM in 36 banks
[Figure: block diagram showing two Virtex II 6000 FPGAs and eight 0.5 GB DDR RAM banks]
Slide 11
When someone says “I want a programming language in which I need only say what I wish done,” give him a lollipop.
- Alan Perlis
Slide 12
Application Portability
No direct or straightforward path for application portability.
What might help:
– Using Viva, there is no need to know Verilog/VHDL to design for FPGA hardware
– Abundance of design and application libraries to easily build newer optimized scalable applications for FPGA execution
– Allows existing VHDL/Verilog cores to be ported into the development environment
– Allows code portability across different hardware platforms
Slide 13
Porting to Viva®
• Algorithm analysis
– Un-optimized
• Design considerations
– Parallelization
  • Internals
  • Multiple “pipes”
– Hardware efficiency
  • I/O
  • Memory
  • Data width
• Code/Test/Modify
Slide 14
Design Flow in Viva®
[Figure: design flow chart. START → Load x86 System Description → Design Sheet (.IDL)/Project (.IPG) algorithm implementation → Viva® synthesis → Functional Test and Simulation (YES/NO) → Load FPGA System Description → Viva® synthesis → Pass? (YES/NO) → Xilinx PAR → Timing, Area? (YES/NO) → END/RUN. Steps are grouped under Viva® and Xilinx.]
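The design flow on this slide is an iterative loop with three checkpoints. A minimal Python sketch of that loop follows. The function name and the four callables are hypothetical stand-ins for the tool steps, not part of any Viva or Xilinx API, and the sketch assumes that every NO branch returns to the design step, which the flattened chart does not confirm.

```python
from typing import Callable


def run_design_flow(
    functional_test: Callable[[], bool],   # x86 SD synthesis + simulation
    fpga_synthesis: Callable[[], bool],    # synthesis against the FPGA SD
    place_and_route: Callable[[], bool],   # vendor PAR, timing/area check
    revise_design: Callable[[], None],     # edit the design sheet/project
    max_iterations: int = 10,
) -> bool:
    """Repeat the flow until all three checkpoints pass or we give up."""
    for _ in range(max_iterations):
        # Checkpoint 1: functional test and simulation on the x86 SD.
        if not functional_test():
            revise_design()
            continue
        # Checkpoint 2: synthesis with the FPGA System Description loaded.
        if not fpga_synthesis():
            revise_design()
            continue
        # Checkpoint 3: place and route, then check timing and area.
        if not place_and_route():
            revise_design()
            continue
        return True  # corresponds to END/RUN on the chart
    return False  # iteration budget exhausted (assumption; not on the chart)
```

Ordering the checks this way keeps the cheap functional simulation ahead of the slower place-and-route step, which is the point the slide makes by starting with the x86 System Description.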
Slide 15
Viva®: Library and Composite Objects
Contained within CoreLib.
Composite objects consist of modules constructed using primitives, EDIF imports and other composite objects.
Objects can be polymorphic or mapped to a particular data set.
Contains modules with a host of functionality like logic gates, math operators, communication objects, memory modules and grammatical objects.
Slide 16
Simulation in X86 Environment
The x86 SD is used in the initial stages of design to test functionality.
Almost every object in CoreLib has an equivalent x86 SD for simulation.
Runs on the microprocessor and provides an accurate simulation of the design, ensuring successful place-and-route during synthesis.
Performs functional simulation of the design.
May not be cycle accurate.
Slide 17
Application Interface
Viva provides a widget-based interface to the application, whether you are simulating or executing on the hardware.
Slide 18
Execution using Hardware-specific System Description
Contains objects and system-level implementations mapped to specific components and primitives within the FPGA system.
All Library objects and components contain equivalent descriptions for each FPGA SD.
Different SDs can be created using Viva® for different FPGA-based systems from other vendors.
Slide 19
Viva™ Execution Environment
[Figure: block diagram with blocks labeled CoreLib, IIADL Editor, System Definition, EDIF, HDL, X86, Xilinx Tools, Behavioral, Communication, System, FPGA, System Description, Compiler]
Slide 20
Viva™ Execution Environment
[Figure: block diagram with blocks labeled CoreLib, IIADL Editor, EDIF, HDL, X86, Xilinx Tools, FPGA, System Description, Compiler; target: Hypercomputer HC-62]
Slide 21
Viva™ Execution Environment
[Figure: block diagram with blocks labeled CoreLib, IIADL Editor, EDIF, HDL, X86, Xilinx Tools, FPGA, System Description, Compiler; target: NASA RSC]
Slide 22
Viva™ Execution Environment
[Figure: block diagram with blocks labeled CoreLib, IIADL Editor, EDIF, HDL, X86, Xilinx Tools, FPGA, System Description, Compiler; target: SGI Athena]
Slide 23
Viva™ Execution Environment
[Figure: block diagram with blocks labeled CoreLib, IIADL Editor, EDIF, HDL, X86, Xilinx Tools, FPGA, System Description, Compiler; target: Nallatech]
Slide 24
Viva™ - COM/ActiveX Interface and ‘C’ API
• Provides link to/from host
– Data requests (e.g., File I/O) using COM or ‘C’ API (for HC-xx)
– Process “spawning” (e.g., multiple execution threads)
Slide 25
Viva Bridges to Existing Environments
EDIF Import & Export
[Figure: HDL code → EDIF → Import Process → Viva Primitive → Viva Design → Export Process → EDIF]
Slide 26
Application Speed-Up
[Figure: factors contributing to Speed-Up]
FPGA Clock Speed
– Design Complexity
– Operations
IO (Communication) Speed
– PCI/PCI-X
– PCI Express
– JTAG
– Proprietary / Non-standard IO
Parallelism within Algorithm
– Data dependency
– Loops/Iterations
Slide 27
Application speed-up
Factors affecting application speed-up can be split into three broad categories:
FPGA clock speed
IO Communication and bus speeds
Parallelism within the algorithm being implemented
Slide 28
FPGA Clock speed
FPGA clock speed directly relates to the speed of execution in hardware
Higher FPGA clock speeds require more stringent design rules, heavy use of pipelining and potentially more area on the FPGA
May increase synthesis and place and route time of applications
The maximum clock speed at which an application can be clocked depends to a large extent on the complexity of the application
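The link between design complexity, pipelining and clock speed on this slide can be illustrated with a small calculation. This is a textbook-style sketch, not Viva's timing model: the function names and the example delays are hypothetical, and the logic is assumed to split evenly across pipeline stages.

```python
def max_clock_mhz(critical_path_ns: float) -> float:
    """Highest clock (MHz) whose period still covers the slowest path."""
    return 1000.0 / critical_path_ns


def pipelined_clock_mhz(
    total_logic_ns: float,
    stages: int,
    register_overhead_ns: float = 0.0,
) -> float:
    """Clock after splitting the logic evenly into `stages` pipeline stages.

    Each stage carries 1/stages of the logic delay plus a fixed register
    overhead (assumed constant per stage).
    """
    stage_delay_ns = total_logic_ns / stages + register_overhead_ns
    return max_clock_mhz(stage_delay_ns)


# A hypothetical 20 ns combinational path limits the clock to 50 MHz.
unpipelined = max_clock_mhz(20.0)
# Four ideal stages of 5 ns each would allow 200 MHz.
pipelined = pipelined_clock_mhz(20.0, stages=4)
```

The extra registers between stages are where the "potentially more area" on the slide comes from, and a non-zero `register_overhead_ns` shows why the gain is less than proportional to the number of stages.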
Slide 29
FPGA Clock Speed
Viva allows the user to adjust the clock speed depending on the constraints and complexity of the algorithm being implemented
Viva allows for quick synthesis with a major portion of the time being spent in place and route
Objects and libraries created in Viva support high clock speeds, removing one more barrier for an application designer
Slide 30
IO Communication and Bus Speeds
IO bandwidth determines to a large extent the efficiency of the system
Could potentially affect the processing rate on the FPGA
A variety of protocols exist to facilitate IO communication between the host and the FPGA
Some are industry standards, e.g. PCI, PCI-X, PCI-Express, VME, JTAG, etc.
Others are non-standard or proprietary, employing innovative solutions to achieve high bandwidth
Using industry standard protocols allows easy upgrade and use of COTS components
Slide 31
IO Communication and Bus Speeds
The Hypercomputers use a standard PCI-X interface (66 MHz) to communicate with the host processors.
The Hypercomputer itself could be placed in a PCI slot within any standard desktop or server configuration.
Provides an easy path for migration from PCI to PCI-Express.
Presence of external IO pins allows for real-time data acquisition and processing using FPGAs.
Slide 32
IO Communication and Bus Speeds
Performance: HC-62:
Memory          76.0 GB/s
Interconnect    12.7 GB/s
Crosspoint      12.5 GB/s
Router          12.5 GB/s
External IO      8.5 GB/s
PCIX             200 MB/s
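To put the HC-62 figures in proportion, the sketch below computes transfer times and the ratio of each path to the PCI-X host link. The bandwidth values are the ones listed on this slide; the dictionary and function names are hypothetical, and decimal units (1 GB = 10^9 bytes) are an assumption.

```python
MB = 10**6  # decimal megabyte (assumption)
GB = 10**9  # decimal gigabyte (assumption)

# Bandwidths as listed on the slide, in bytes per second.
HC62_BANDWIDTH = {
    "Memory": 76.0 * GB,
    "Interconnect": 12.7 * GB,
    "Crosspoint": 12.5 * GB,
    "Router": 12.5 * GB,
    "External IO": 8.5 * GB,
    "PCIX": 200 * MB,
}


def transfer_seconds(num_bytes: float, link: str) -> float:
    """Ideal time to move `num_bytes` over `link`, ignoring protocol overhead."""
    return num_bytes / HC62_BANDWIDTH[link]


def ratio_to_pcix(link: str) -> float:
    """How many times faster `link` is than the PCI-X host interface."""
    return HC62_BANDWIDTH[link] / HC62_BANDWIDTH["PCIX"]


# Moving 1 GB to the board over PCI-X takes about 5 s in the ideal case,
# while on-board memory is 380 times faster than the host link.
host_load_s = transfer_seconds(1 * GB, "PCIX")
memory_vs_host = ratio_to_pcix("Memory")
```

On these numbers the host link is the narrowest path by a wide margin, so designs that keep data on the board between operations avoid the slowest transfer.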
Slide 33
Parallelism within algorithm being implemented
The advantage of reconfigurable hardware lies in the ability of the designer to unroll software loops and parallelize data-independent statements on the FPGA.
// Typical software loop
loop (1, 3) {
    statement 1;
    statement 2;
}
// Software loop unrolled
statement 1;
statement 2;
statement 1;
statement 2;
statement 1;
statement 2;
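The pseudocode above can be made concrete with a small runnable sketch. The bodies of `statement_1` and `statement_2` are hypothetical placeholders chosen only so the result can be checked; the point is that the rolled and unrolled forms compute the same value.

```python
def statement_1(x: int) -> int:
    """Placeholder for 'statement 1' (hypothetical operation)."""
    return x + 1


def statement_2(x: int) -> int:
    """Placeholder for 'statement 2' (hypothetical operation)."""
    return x * 2


def rolled(x: int) -> int:
    """Typical software loop: three iterations of both statements."""
    for _ in range(3):
        x = statement_1(x)
        x = statement_2(x)
    return x


def unrolled(x: int) -> int:
    """The same work with the loop written out explicitly."""
    x = statement_1(x)
    x = statement_2(x)
    x = statement_1(x)
    x = statement_2(x)
    x = statement_1(x)
    x = statement_2(x)
    return x
```

In software the unrolled form only removes loop overhead. On an FPGA each written-out statement can become its own piece of hardware, which is what makes the parallel arrangements on the following slides possible.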
Slide 34
Parallelism within algorithm being implemented
[Figure: two diagrams of three loop iterations, one with Statement 1 and Statement 2 side by side in each iteration, one with Statement 1 above Statement 2 in each iteration]
Case 1:
• Statement 1 and 2 are dependent
• Every iteration of the loop is dependent on the results from the previous one.
Case 2:
• Statement 1 and 2 are independent
• Every iteration of the loop is dependent on the results from the previous one.
Slide 35
Parallelism within algorithm being implemented
[Figure: Statement 1 and Statement 2 shown for each of three loop iterations]
Case 3:
• Statement 1 and 2 are independent
• Every iteration of the loop is independent from the results of the previous one.
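The three cases can be compared by counting how many sequential time steps the unrolled three-iteration loop would need. This is an idealized sketch: the function name is hypothetical, each statement is assumed to take one time step, and the FPGA is assumed to have enough area to instantiate every statement that can run in parallel.

```python
def time_steps(
    iterations: int,
    statements_independent: bool,
    iterations_independent: bool,
) -> int:
    """Sequential steps for a loop body of two one-step statements."""
    # Independent statements run side by side; dependent ones run in series.
    steps_per_iteration = 1 if statements_independent else 2
    if iterations_independent:
        # All iterations can be instantiated in parallel.
        return steps_per_iteration
    # Each iteration must wait for the previous one to finish.
    return iterations * steps_per_iteration


case_1 = time_steps(3, statements_independent=False, iterations_independent=False)
case_2 = time_steps(3, statements_independent=True, iterations_independent=False)
case_3 = time_steps(3, statements_independent=True, iterations_independent=True)
```

Under these assumptions Case 1 needs 6 steps, Case 2 needs 3 and Case 3 needs 1, so the available speed-up is set by the data dependencies in the algorithm and not by the hardware alone.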
Slide 36
Viva™ - Application Speed-up
• Smith-Waterman
o Pattern matching algorithm
o Multi-million gates (60-70M)
o Full HC-62 (10 FPGAs, 2 GB SDRAM)
o Compile time of 20 minutes
o 14.7 billion S-W steps/s
o 4 bits per character
o National Cancer Institute Tests
Data load, process, visualize, single data set
1M x 1M (Rat/Human): Starbridge approx. 5 min., NCI approx. 24 hours, 288x performance
167M x 47M (Human X/Y): Starbridge approx. 5.5 days, NCI N/A
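The 288x figure for the 1M x 1M test follows directly from the two run times quoted on this slide, as the short check below shows. The second function is a rough estimate only: it assumes that one S-W step corresponds to one cell of the comparison matrix and that 1M means 10^6 characters, neither of which the slide states.

```python
NCI_MINUTES = 24 * 60       # approx. 24 hours
STARBRIDGE_MINUTES = 5      # approx. 5 minutes

# Ratio of the two quoted wall-clock times.
speedup = NCI_MINUTES / STARBRIDGE_MINUTES


def compute_only_seconds(
    len_a: int,
    len_b: int,
    steps_per_second: float = 14.7e9,
) -> float:
    """Estimated pure compute time, assuming one S-W step per matrix cell."""
    return (len_a * len_b) / steps_per_second


# Roughly 68 s of computation under the stated assumptions.
rat_human_compute_s = compute_only_seconds(10**6, 10**6)
```

If the assumptions hold, computation accounts for a little over a minute of the approx. 5 minutes, which would be consistent with the quoted time also covering data load and visualization.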
Slide 37
Viva™ - Application Speed-up
• Traveling Salesman Problem (TSP)
o Multi-million gates (approx. 5.5M)
o Single HC-62 FPGA
o NASA Tests
Base: 3.2 GHz Xeon w/compiler optimization
65 city tour
Viva/FPGA: over 11x improvement
Slide 38
Future Direction
Take the best of both worlds:
Include a text-based programming interface to supplement the GUI
Include a Petri-net-based simulation environment for more accurate, fast and reliable simulation
Create support for team-based development of FPGA-based modules
Speed up place-and-route time by employing processors within a network
Slide 39
Star Bridge Systems, Inc.
Esmail [email protected]@starbridgesystems.com
“The computer is the first metamedium, and as such it has degrees of freedom for representation and expression never before encountered and as yet barely investigated.”
- Alan Kay