
Post on 13-Jan-2016


Slide 1

Starbridge Viva™

Starbridge Solutions to Supercomputing Problems

Reconfigurable Systems Summer Institute

Esmail Chitalwala

Starbridge Customer Support and Services

12th July 2005

Slide 2

Outline

Current problems faced by application designers:
– Code Development and Application Design
– Execution Environment
– Application Portability
– Application Speed-up and Performance
– Toolset

Solution:
– Current emphasis - Development environment, programming tools
– Concern - Application speed-up
– Future directions …

Slide 3

Code Development

Current HPC applications designed using ‘C’ and ‘C’-based languages that perform serial execution on processors.

Parallel computing languages and architectures, e.g. Unified Parallel C (UPC), MPI.

Languages designed for developing applications to run on single or multiple processors, clusters, supercomputers.

Slide 4

Viva™ - Graphical Interface

• Windows-based application
– Menu/Toolbar
– Window Panes

• Object oriented
– Drag and drop
– Connect the dots

• Abstraction
– High level (“black box”)
– Low level (bits)

Slide 5

Viva™ - Graphical Interface

Slide 6

Viva™ - “3D Development”

Top Sheet

2nd Level

3rd Level

x,y

z

Slide 7

Graphical Interface Advantages

• Capture native parallelism
• Tune algorithms for speed or space
• Interactively debug code running in hardware

Slide 8

Execution Environment

Current generation of parallel computing applications based on single or multiple processors, clusters, supercomputers.

Next-generation processors place multiple cores on a single chip, allowing parallel thread execution.

Significant overheads in processing and transfer of data.

Huge set-up costs in terms of space, time, power and money.

Slide 9

Execution Environment

Reconfigurable FPGA-based computers already allow the creation of parallel execution modules.

This could potentially allow the instantiation of multiple parallel execution modules depending on application scalability.

Lower overhead when communicating and transferring data between modules.

Significantly lower ownership, operation and maintenance costs.

Slide 10

Reconfigurable Computers

Hypercomputer®
– 8 x Virtex II 6000 (6M gates)
– 1 x Virtex II (Router)
– 1 x Virtex II (Cross Point Switch)
– 1 x Virtex II (PCIX)
– 36 Gig RAM in 36 banks

[Figure: block diagram showing two Virtex II 6000 FPGAs and eight 0.5 GB DDR RAM banks]

Slide 11

“When someone says ‘I want a programming language in which I need only say what I wish done,’ give him a lollipop.” - Alan Perlis

Slide 12

Application Portability

No direct or straightforward path for application portability.

What might help:
– Using Viva there is no need to know Verilog/VHDL to design for FPGA hardware
– Abundance of design and application libraries to easily build newer optimized scalable applications for FPGA execution
– Allows existing VHDL/Verilog cores to be ported into the development environment
– Allows code portability across different hardware platforms

Slide 13

Porting to Viva®

• Algorithm analysis
– Un-optimized

• Design considerations
– Parallelization: internals, multiple “pipes”
– Hardware efficiency: I/O, memory, data width

• Code/Test/Modify

Slide 14

Design Flow in Viva®

[Flowchart, boxes in order: START; Load x86 System Description; Design Sheet (.IDL)/Project (.IPG), Algorithm Implementation; Viva® synthesis; Functional Test and Simulation; Pass? (YES/NO); Load FPGA System Description; Viva® synthesis; Xilinx PAR; Timing, Area? (YES/NO); END/RUN. Lane labels: Viva®, Xilinx.]
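The iterative nature of this flow can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slide does not say where the NO branches lead, so the sketch assumes every failed check sends the design back for another revision. The function and callback names are hypothetical and are not part of Viva or the Xilinx tools.

```python
from typing import Callable, List


def run_design_flow(
    functional_test: Callable[[int], bool],
    meets_timing_and_area: Callable[[int], bool],
    max_revisions: int = 10,
) -> List[str]:
    """Walk the flow; any failed check triggers another design revision (assumption)."""
    log = ["START"]
    for revision in range(max_revisions):
        # Stage 1: functional simulation against the x86 system description.
        log.append(f"rev {revision}: synthesize against x86 system description")
        if not functional_test(revision):
            continue  # assumed NO branch: revise the design sheet
        # Stage 2: retarget to the FPGA system description, then place and route.
        log.append(f"rev {revision}: synthesize against FPGA system description")
        log.append(f"rev {revision}: place and route")
        if not meets_timing_and_area(revision):
            continue  # assumed NO branch: revise and try again
        log.append("END/RUN")
        return log
    raise RuntimeError("design did not converge within max_revisions")


# Hypothetical design: passes functional test from revision 1, timing from revision 2.
flow_log = run_design_flow(lambda rev: rev >= 1, lambda rev: rev >= 2)
for entry in flow_log:
    print(entry)
```

The point of the sketch is that the cheap x86 functional check gates the expensive place-and-route step, so most design errors are caught before any hardware compile is attempted.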

Slide 15

Viva®: Library and Composite Objects

Contained within CoreLib.

Composite objects consist of modules constructed using primitives, EDIF imports and other composite objects.

Objects can be polymorphic or mapped to a particular data set.

Contains modules with a host of functionality like logic gates, math operators, communication objects, memory modules and grammatical objects.

Slide 16

Simulation in X86 Environment

The x86 SD is used in the initial stages of design to test functionality.

Almost every object in CoreLib has an equivalent x86 SD for simulation.

Runs on the microprocessor and provides an accurate simulation of the design, ensuring successful place-and-route during synthesis.

Performs functional simulation of the design.

May not be cycle accurate.

Slide 17

Application Interface

Viva provides a widget-based interface to the application, whether you are simulating or executing on the hardware.

Slide 18

Execution using Hardware-specific System Description

Contains objects and system-level implementations mapped to specific components and primitives within the FPGA system.

All Library objects and components contain equivalent descriptions for each FPGA SD.

Different SDs can be created using Viva® for different FPGA-based systems from other vendors.

Slide 19

Viva™ Execution Environment

[Diagram components: CoreLib; IIADL Editor; System Definition; EDIF; HDL; X86; Xilinx Tools; Behavioral, Communication, System; FPGA; System Description; Compiler]

Slide 20

Viva™ Execution Environment

[Diagram components: CoreLib; IIADL Editor; EDIF; HDL; X86; Xilinx Tools; FPGA; System Description; Compiler. Target: Hypercomputer HC-62]

Slide 21

Viva™ Execution Environment

[Diagram components: CoreLib; IIADL Editor; EDIF; HDL; X86; Xilinx Tools; FPGA; System Description; Compiler. Target: NASA RSC]

Slide 22

Viva™ Execution Environment

[Diagram components: CoreLib; IIADL Editor; EDIF; HDL; X86; Xilinx Tools; FPGA; System Description; Compiler. Target: SGI Athena]

Slide 23

Viva™ Execution Environment

[Diagram components: CoreLib; IIADL Editor; EDIF; HDL; X86; Xilinx Tools; FPGA; System Description; Compiler. Target: Nallatech]

Slide 24

Viva™ - COM/ActiveX Interface and ‘C’ API

• Provides link to/from host
– Data requests (e.g., File I/O) using COM or ‘C’ API (for HC-xx)
– Process “spawning” (e.g., multiple execution threads)

Slide 25

Viva Bridges to Existing Environments

EDIF Import & Export

[Diagram: HDL code, EDIF, Import Process, Viva Primitive, Viva Design, Export Process, EDIF]

Slide 26

Application Speed-Up

[Diagram: factors contributing to Speed-Up]

• FPGA Clock Speed: Design Complexity, Operations
• IO (Communication) Speed: PCI/PCI-X, PCI Express, JTAG, Proprietary / Non-standard IO
• Parallelism within Algorithm: Data dependency, Loops/Iterations

Slide 27

Application speed-up

Factors affecting application speed-up can be split into three broad categories:

FPGA clock speed

IO Communication and bus speeds

Parallelism within the algorithm being implemented

Slide 28

FPGA Clock speed

FPGA clock speed directly relates to the speed of execution in hardware

Higher FPGA clock speeds require more stringent design rules, heavy use of pipelining and potentially more area on the FPGA

May increase synthesis and place and route time of applications

The maximum clock speed at which an application can be clocked depends to a large extent on the complexity of the application

Slide 29

FPGA Clock Speed

Viva allows the user to adjust the clock speed depending on the constraints and complexity of the algorithm being implemented

Viva allows for quick synthesis with a major portion of the time being spent in place and route

Objects and libraries created in Viva support high clock speeds, removing one more barrier for an application designer

Slide 30

IO Communication and Bus Speeds

• IO bandwidth determines to a large extent the efficiency of the system
• Could potentially affect the processing rate on the FPGA
• A variety of protocols exist to facilitate IO communication between the host and the FPGA
– Some are industry standards, e.g. PCI, PCI-X, PCI-Express, VME, JTAG, etc.
– Others are non-standard or proprietary, employing innovative solutions to achieve high bandwidth
• Using industry standard protocols allows easy upgrade and use of COTS components

Slide 31

IO Communication and Bus Speeds

The Hypercomputers use a standard PCI-X interface (66 MHz) to communicate with the host processors.

The Hypercomputer itself could be placed in a PCI slot within any standard desktop or server configuration.

Provides an easy path for migration from PCI to PCI-Express.

The presence of external IO pins allows for real-time data acquisition and processing using FPGAs.

Slide 32

IO Communication and Bus Speeds

Performance: HC-62

Memory: 76.0 GB/s
Interconnect: 12.7 GB/s
Crosspoint: 12.5 GB/s
Router: 12.5 GB/s
External IO: 8.5 GB/s
PCIX: 200 MB/s
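To put these figures side by side, the following Python sketch compares the quoted bandwidths and applies a simple bound: a design cannot sustain more items per second than its slowest link can deliver. The bound itself, the assumption that 1 GB/s equals 1000 MB/s, and the example kernel (100 million items/s at 4 bytes per item) are illustrative assumptions, not figures from the slides.

```python
# Bandwidths quoted for the HC-62, converted to MB/s (assuming 1 GB/s = 1000 MB/s).
HC62_BANDWIDTH_MB_S = {
    "Memory": 76_000.0,
    "Interconnect": 12_700.0,
    "Crosspoint": 12_500.0,
    "Router": 12_500.0,
    "External IO": 8_500.0,
    "PCI-X": 200.0,
}


def bandwidth_ratio(fast: str, slow: str) -> float:
    """How many times more bandwidth one path offers than another."""
    return HC62_BANDWIDTH_MB_S[fast] / HC62_BANDWIDTH_MB_S[slow]


def sustained_items_per_s(compute_items_per_s: float,
                          bytes_per_item: float,
                          link: str) -> float:
    """Upper bound on throughput: the slower of compute and data delivery."""
    link_items_per_s = HC62_BANDWIDTH_MB_S[link] * 1e6 / bytes_per_item
    return min(compute_items_per_s, link_items_per_s)


memory_vs_pcix = bandwidth_ratio("Memory", "PCI-X")

# Hypothetical kernel: 100 million items/s of compute, 4 bytes streamed per item.
via_pcix = sustained_items_per_s(100e6, 4, "PCI-X")
via_memory = sustained_items_per_s(100e6, 4, "Memory")

print(f"On-board memory offers {memory_vs_pcix:.0f}x the PCI-X bandwidth")
print(f"Streaming over PCI-X:  {via_pcix:.0f} items/s")
print(f"Streaming from memory: {via_memory:.0f} items/s")
```

Under these assumptions the hypothetical kernel is IO-bound when fed over PCI-X and compute-bound when fed from on-board memory, which is why keeping working data close to the FPGAs matters.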

Slide 33

Parallelism within algorithm being implemented

The advantage of reconfigurable hardware lies in the ability of the designer to unroll software loops and parallelize data-independent statements on the FPGA.

//Typical software loop
loop (1 , 3) {
    statement 1;
    statement 2;
}

//Software loop unrolled
statement 1;
statement 2;
statement 1;
statement 2;
statement 1;
statement 2;
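The same transformation can be shown with concrete statements. In this Python sketch the two statements (a doubling and an increment) are arbitrary stand-ins chosen for illustration; the point is only that the rolled and unrolled forms compute the same result, while the unrolled form exposes each operation as a separate unit that hardware could instantiate individually.

```python
def rolled(values):
    """Three iterations of a two-statement loop body."""
    out = []
    for i in range(3):
        doubled = values[i] * 2   # statement 1
        out.append(doubled + 1)   # statement 2
    return out


def unrolled(values):
    """The same work with the loop control removed."""
    d0 = values[0] * 2            # statement 1, iteration 1
    r0 = d0 + 1                   # statement 2, iteration 1
    d1 = values[1] * 2            # statement 1, iteration 2
    r1 = d1 + 1                   # statement 2, iteration 2
    d2 = values[2] * 2            # statement 1, iteration 3
    r2 = d2 + 1                   # statement 2, iteration 3
    return [r0, r1, r2]


print(rolled([1, 2, 3]))
print(unrolled([1, 2, 3]))
```

Here the iterations do not depend on one another, so the three copies of the body could run side by side; within each copy, statement 2 still has to wait for statement 1.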

Slide 34

Parallelism within algorithm being implemented

[Diagram: for three loop iterations, one layout shows Statement 1 and Statement 2 side by side in three successive rows; the other shows all six statements in a single sequential chain]

Case 1:
• Statement 1 and 2 are dependent
• Every iteration of the loop is dependent on the results from the previous one.

Case 2:
• Statement 1 and 2 are independent
• Every iteration of the loop is dependent on the results from the previous one.

Slide 35

Parallelism within algorithm being implemented

[Diagram: Statement 1 and Statement 2 of all three iterations shown side by side]

Case 3:
• Statement 1 and 2 are independent
• Every iteration of the loop is independent of the results of the previous one.
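The three cases can be compared with a small step-count model. The Python sketch below assumes that each statement takes one time step and that the FPGA has enough resources to run every independent statement at once; both are simplifying assumptions for illustration, and the function name is hypothetical.

```python
def parallel_steps(iterations: int,
                   statements_dependent: bool,
                   iterations_dependent: bool) -> int:
    """Time steps needed for a two-statement loop body under the stated assumptions."""
    # Dependent statements run back to back; independent ones share a step.
    steps_per_iteration = 2 if statements_dependent else 1
    # Dependent iterations run one after another; independent ones overlap fully.
    if iterations_dependent:
        return iterations * steps_per_iteration
    return steps_per_iteration


serial_steps = 3 * 2  # three iterations, two statements each, no parallelism

case1 = parallel_steps(3, statements_dependent=True, iterations_dependent=True)
case2 = parallel_steps(3, statements_dependent=False, iterations_dependent=True)
case3 = parallel_steps(3, statements_dependent=False, iterations_dependent=False)

for name, steps in (("Case 1", case1), ("Case 2", case2), ("Case 3", case3)):
    print(f"{name}: {steps} steps, {serial_steps / steps:.0f}x over serial")
```

In this model Case 1 gains nothing from the hardware, Case 2 halves the step count, and Case 3 collapses the whole loop into a single step, so the achievable speed-up is set by the data dependencies rather than by the number of statements.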

Slide 36

Viva™ - Application Speed-up

• Smith-Waterman
o Pattern matching algorithm
o Multi-million gates (60-70M)
o Full HC-62 (10 FPGAs, 2 GB SDRAM)
o Compile time of 20 minutes
o 14.7 billion S-W steps/s
o 4 bits per character
o National Cancer Institute Tests

Data load, process, visualize, single data set

1M x 1M (Rat/Human): Starbridge approx. 5 min.; NCI approx. 24 hours; 288 X Performance

167M x 47M (Human X/Y): Starbridge approx. 5.5 days; NCI N/A
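For readers unfamiliar with the algorithm, the sketch below is a plain software version of Smith-Waterman local alignment scoring in Python, together with the arithmetic behind the 288 X figure. It is not the Viva implementation. The scoring parameters (match +2, mismatch -1, linear gap -1) are illustrative assumptions, and counting one scoring-matrix cell update as one "S-W step" is an interpretation of the slide, not something it states.

```python
def smith_waterman_score(a: str, b: str,
                         match: int = 2, mismatch: int = -1, gap: int = -1):
    """Return (best local alignment score, number of cell updates)."""
    prev = [0] * (len(b) + 1)   # previous row of the scoring matrix
    best = 0
    cell_updates = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Each cell depends only on its left, upper and upper-left neighbours,
            # so cells along an anti-diagonal are independent of one another.
            curr[j] = max(0, diagonal, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
            cell_updates += 1
        prev = curr
    return best, cell_updates


def speedup(baseline_minutes: float, accelerated_minutes: float) -> float:
    """Ratio of baseline run time to accelerated run time."""
    return baseline_minutes / accelerated_minutes


score, updates = smith_waterman_score("ACGT", "ACGT")
print(score, updates)

# Quoted run times for the 1M x 1M test: approx. 24 hours versus approx. 5 minutes.
print(speedup(24 * 60, 5))
```

Comparing two sequences of lengths m and n takes m x n cell updates in this formulation, which is why the work grows so quickly with genome-scale inputs and why the independent cells along each anti-diagonal make the algorithm a good fit for parallel hardware.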

Slide 37

Viva™ - Application Speed-up

• Traveling Salesman Problem (TSP)
o Multi-million gates (approx. 5.5M)
o Single HC-62 FPGA
o NASA Tests

Base: 3.2GHz Xeon w/compiler optimization

65 city tour

Viva/FPGA: over 11x improvement

Slide 38

Future Direction

Take the best of both worlds: include a text-based programming interface to supplement the GUI

Include a Petri-net-based simulation environment for more accurate, fast and reliable simulation

Create support for team-based development of FPGA-based modules

Speed up place and route by employing processors within a network

Slide 39

Star Bridge Systems, Inc.

Esmail Chitalwala

echitalwala@starbridgesystems.com

support@starbridgesystems.com

“The computer is the first metamedium, and as such it has degrees of freedom for representation and expression never before encountered and as yet barely investigated.”

- Alan Kay
