“It’s Like Deja-Vu All Over Again”
José L. Muñoz, Acting Director/Senior Science Advisor
Office of CyberInfrastructure (OCI)
Reconfigurable Systems Summer Institute (RSSI) 2008
J. L. Muñoz [email protected]
National Science Foundation, Where Discoveries Begin
Office of CyberInfrastructure
Outline
DARPA: Adaptive Computing Systems
❖ What was the vision?
❖ What was the strategy?
❖ What did we do?
❖ What did we learn?
❖ What are we re-learning?
❖ Things that should be considered
❖ An Opportunity
1997 - 2001
DARPA: ACS GOALS
Sample ACS Challenge problem: ATR/1 cu. ft., 500X better
Performance benefits of Hardware with the flexibility of Software
Temporal re-use: Dynamic adaptation at runtime
Power/area efficiency
Defense testbeds: ACS Challenge Problems
Domain specific development environments
100X - 1000X Performance improvement over micro-processor based systems
Just In Time Hardware
Program Concept: Adaptive Computing Systems (ACS)
[Figure: program timeline, 1997 - 2001, with milestones at June 1997, December 1997, February 1998, January 1999, and May 1999]
Dynamic Young Community
Dynamic/Reconfigurable Capabilities Enabled
• In-Mission Alterations/Corrections
• Custom Efficiency without Custom Point Solutions
Supporting Development Environment and Tools to enable Application
• Khoros/MatLab Application
• Precompiled ACS elements
• Compiler, Mapper, Estimator
• C, VHDL/Verilog
• Place & Route
• Simulation
• Targets: CPU, FPGA, FPGA
Developing New FPGA Based Architectures and Products
• Board Level
• Device Level
ACS: Program Goals
❖ Rapid reconfiguration of logic / gate interconnection
• 1,000,000X reduction in reconfiguration time: msec => nsec
❖ “Just in time” use of logic to achieve space / power efficiency
• 100X - 1000X performance improvement over micro-processors
❖ Concurrent algorithm / software / hardware design
• Automatic mapping of 0.5M gates in 1/100th of the time
❖ Defense Challenge Problems
• ATR/1 ft³ (500X improvement) (Sandia)
• Sonar adaptive beamforming (Naval Undersea Warfare Center)
• IR ATR (Night Vision Laboratory)
• RF transient signal analysis (Los Alamos National Lab)
National Security Impact
❖ Address rapid implementation of application specific hardware, without development costs and development time of ASICs
❖ Provide processing adaptive to evolving mission requirements
❖ Enable real-time adaptation to in-mission, changing processing needs
❖ Efficient application implementation will enable size, weight, and power advances enabling more capable and versatile embedded applications
ACS: Challenges
❖ Problem Formulation
• Biggest gains will be achieved through new algorithm approaches. Problems must be re-formulated
❖ Difficult to Program
• Automated HW/SW co-design required for efficient design process, ease of programming
• Create technology abstractions for user environments
❖ Minimal Runtime Support
• Automate runtime reconfiguration to the data-path level
• Support dynamic (micro-second) reconfiguration
ACS: Challenges
❖ Technology Immature/Risky
• Develop ad-hoc vendor independent standards for technology sharing, defense insertion and interoperability
❖ Verification
• Develop techniques to rapidly verify that each dynamic mapping (HW/SW) will perform the intended function
❖ Building Block Inadequate
• Create configurable computing components (e.g. chips, modules, boards)
• Create new system-level heterogeneous architectures; system integration support
Key Challenge: “Compilation” Times
[Figure: “compilation” time on a log scale from 10^0 to 10^7 seconds, with markers at 1 hour, 1 day, 1 month, and 1 year, showing ranges for uproc/DSPs, ASICs, Configurable Logic (today), and Configurable Logic (future, if not addressed)]
4 orders of magnitude speedups are required in this area
Technical Approach/Challenges
Approach
❖ FPGA granularity
• fine grain - gate-level
• coarse grain - cluster-of-gates
• very coarse grain - device-level
❖ ACS software
• algorithm/specification to hardware mapping
• algorithm analysis
• runtime reconfiguration support
❖ Assessment and ease of use
• application of ACS technologies to DoD challenge problems
Technical Challenges
❖ Reconfiguration time, flexibility
❖ Development/design time and ease of use
❖ Intercept incoming missile within a compressed timeline
❖ Intercept to kill
Enabling Technologies | Current State of the Art | Program Goal
Advanced Architectures | Medium grain common logic blocks, 100 msec reconfiguration, low clock speeds | Range of granularities, real-time reconfiguration, architectural flexibility, competitive speeds
ACS Software Development | Weeks to months from algorithm development, disjoint tools, little reuse | Hours to days from algorithm development, integrated tools and libraries with analysis
Technical Accomplishment
2nd Gen FLIR: Utilize ACS/EHPC Technology to enable IR/ATR Processor
ATR allowed 2 VME slots within entire M1A2 electronics architecture
Insert ATR capability into existing M1A2 Architecture
3Qtr FY99 Demonstration
Approach
• Demonstration of ATR cueing for the M1A2 tank ATR mission utilizing ACS/Embedded technologies.
• Utilize processor testbed with mature ATR algorithms for processing of 2nd GEN FLIR image data.
• Work with PM-Abrams/TACOM/TARDEC & PM Night Vision on technology insertion plan.
Impact
• First demonstration of Army ATR cueing mission within physical requirements
• Enable 10 Hz ATR frame rate
• Addresses robustness of processor technology for future changes
• Path to retrofit current fielded & integrate within future Army ground platforms
[Figure: M1A2 Top Level Architecture]
Technical Accomplishment
• Optimized for DSP and Glue Logic
• Capable of internally storing 4 contexts
• Contexts can be switched in ONE clock cycle
- Data in flip-flops & LUTs can be shared between contexts
- Two data sharing schemes (global and public/private addressable)
- Context switch initiated either by internal logic or by external pins
- Background loading of contexts
- Active context can download inactive context(s)
[Figure: device layers labeled Static Memory Cells, User space (“User plane”, “CLB”), Memory Plane, Additional Memory]
Dynamically Reconfigurable (logic function implementations changed while processing is underway)
Multiple (multiple contexts stored internally on device)
Context Switching (replacing one logic function implementation in an FPGA with another)
Device (multiple contexts and FPGA within single device)
CSRC Effort: First CSRC Prototype Device Has Been Produced. World’s First!
LM - Sanders Context Switchable Reconfigurable Computer
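The context-switching behavior listed on this slide can be pictured with a small software model. The Python sketch below is only a toy under stated assumptions, not the CSRC design: the class name `ContextSwitchedDevice`, its methods, and the dictionary standing in for the globally shared flip-flop/LUT data are all hypothetical. It models four stored contexts, a switch that costs one clock cycle, and background loading that may only write an inactive slot.

```python
class ContextSwitchedDevice:
    """Toy model of a multi-context reconfigurable device (hypothetical API)."""

    def __init__(self, num_contexts=4):
        self.contexts = [None] * num_contexts  # configurations stored on the device
        self.active = 0                        # index of the context currently running
        self.shared = {}                       # stands in for data shared between contexts
        self.cycles = 0                        # clock cycles consumed so far

    def load(self, slot, function):
        """Background load: a populated active context may not be overwritten."""
        if slot == self.active and self.contexts[slot] is not None:
            raise ValueError("cannot overwrite the active context")
        self.contexts[slot] = function

    def switch(self, slot):
        """Make another stored context active; modeled as one clock cycle."""
        if self.contexts[slot] is None:
            raise ValueError("context not loaded")
        self.active = slot
        self.cycles += 1

    def step(self, value):
        """Run the active context for one cycle on one input value."""
        self.cycles += 1
        return self.contexts[self.active](value, self.shared)


def accumulate(x, shared):
    # Context 0: keep a running sum in the shared storage.
    shared["acc"] = shared.get("acc", 0) + x
    return shared["acc"]


def scale(x, shared):
    # Context 1: reuse the sum left behind by context 0.
    return x * shared.get("acc", 1)


dev = ContextSwitchedDevice()
dev.load(0, accumulate)
dev.load(1, scale)      # loaded while context 0 is the active one
dev.step(3)
dev.step(4)             # shared sum is now 7
dev.switch(1)           # one-cycle switch; shared data survives
result = dev.step(2)    # 2 * 7
```

The point of the sketch is the shared storage: the second context picks up state produced by the first without any reload, which is what makes a single-cycle switch useful.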
Commercial ACS Board Product
NEW IDEAS
• Hybrid FPGA/DSP co-processor architecture
• Design methodology for FPGA/DSP co-design
• Automatic target recognition (ATR) image processing kernels using hybrid FPGA/DSP architecture
WASPP
Completed Fall 1998
FUTURE
• Continued involvement in ACS community
• Evolution of new commercial products within ACS community and ACS applications
IMPACT
• Formal design methodology greatly improves productivity of application designers using hybrid FPGA/DSP systems
• High-performance ATR system development made easier and better
Annapolis Micro Systems
Commercial Transitions
ACE2
GOAL
To develop the enabling technologies that allow adaptive hardware to be easily utilized by application developers
DEVELOPING
• Hardware objects
• Application design environments
• Execution environments
NEW IDEAS
• Java bytecode, atomic and hardware objects
• Adaptive computing class library and API
• Automatic composition of library elements
- during application development
- during execution
• Multi-threaded, multi-user execution environment for adaptive computing
• Scheduling, instantiation and communication of functional elements
• Java-based application design
IMPACT
• Faster implementation of ACS designs
• Accelerate general purpose computing
• 10X increase in user productivity - reduced algorithm development and mapping time
2/3Qtr FY99 Demonstration
Two Primary Areas of Development
- Application Design (Abstractions)
- Execution Environment (Infrastructure)
Developed FPGA Card, available as commercial product, will use for demonstrations
ACS meets Java
Commercial ACS Java Products
Electronic Countermeasures Analysis (ECMA)
❖ Reduce a full frame to 1/2 VME Nest (approx. 90% size reduction)
❖ Exploit variable length fixed point signal processing
❖ ACS-ECMA can be reconfigured as the mission changes, adapting in real-time to mission scenario
❖ ACS-based ECMA will accommodate COTS upgrades as equipment and algorithm technology improves
❖ Improvements to SPY-1 ECMA that otherwise would not be possible will be demonstrated
❖ New ACS-ECMA architecture may transition to production radar upon Navy review, demonstration, and approval
❖ PMS-400 providing funding for ACS/ECMA effort
❖ Potential Navy interest in a follow-on demonstration
4QFY99 Installation and Demo
ACS technology can reduce size to one tenth or less for equivalent functionality and provide system versatility to support future modifications
BenchMarks
GOALS:
• Provide publicly available set of standard benchmarks for evaluating configurable computing systems
• Address the entire range of issues in benchmarking, including benchmark specification, procedures, metrics, and wide availability
Site: http://www.htc.honeywell.com/projects/acsbench
Honeywell POC: Sanjaya Kumar, Honeywell, Minneapolis, MN (612) 951-7107 [email protected]
Developed Configurable Computing Stressmarks
• Versatility - measures the ability to perform a varied computational sequence - Image compression (2D Wavelet Transform, Quantization, Run-length encoding, and Entropy encoding)
• Capacity - measures the usable reconfigurable capacity - Huffman encoding
• Timing Sensitivity - measures the ability to implement a time-critical application - CORDIC Algorithm
• Scalability - measures the ability to implement an application across multiple devices - Fast Fourier Transform
• Interfacing - measures the ability to operate within a heterogeneous architecture, interface to a general-purpose and/or application-specific processor - Constant False Alarm Rate
• CAD Benchmark - measures the ability to utilize/support an architecture - Boolean satisfiability
Benchmarks will be made available at PI meeting
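The Timing Sensitivity stressmark names the CORDIC algorithm. As a rough illustration of why it maps well to reconfigurable logic, here is a minimal rotation-mode sketch in Python. It is not the Honeywell benchmark code: the function name `cordic_sin_cos`, the 16-iteration default, and the use of floating point where hardware would use fixed-point shifts are assumptions made for illustration.

```python
import math


def cordic_sin_cos(theta, iterations=16):
    """Rotation-mode CORDIC for |theta| <= pi/2; returns (sin, cos)."""
    # Elementary rotation angles atan(2^-i); a hardware version would hold
    # these in a small lookup table.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]

    # Each micro-rotation stretches the vector; pre-divide by the total gain.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))

    x, y, z = 1.0 / gain, 0.0, theta
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0      # rotate toward zero residual angle
        shift = 2.0 ** -i                # a right shift by i bits in hardware
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * angles[i]
    return y, x


s, c = cordic_sin_cos(0.5)
```

Every iteration needs only a shift, an add or subtract, and a table lookup, so the loop can be unrolled into a fixed-latency pipeline, which is presumably why it serves as a time-critical test case.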
Development Environments
Rapid Development Cycle: Tools and Libraries in Place
Integrated C to ACS Development Environment
[Figure: tool flow with components Khoros/MatLab Application, Precompiled ACS components, CPU, FPGA, FPGA, Compiler, Mapper, Estimator, C, VHDL/Verilog, Place & Route, and Simulation, annotated with contributing organizations]
(U. Tenn, NWU, Colo St, USC-ISI)
(National Semi-Conductor, Gatefield, Uwash, Lucent)
(Rice, U. Cinn, FTL ..)
(Princeton U, Honeywell)
(Rice, U. Tenn., U. Cinn.)
(TSI Telesys, Annapolis MicroSystems)
Place & Route (Synopsis, USC ..)
Simulation (FTL, UTexas, ..)
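Yet Another View of the Vision
❖ y = a*x**2 + b*x + c
Sequential instruction stream:
load r1,x
mult r1,x
mult r1,a
load r2,x
mult r2,b
add r2,c
add r1,r2
store r1,y
[Figure: dataflow graph computing y from x, a, b, and c, with multiplier nodes producing x2, ax2, and bx and adder nodes producing y]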
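The contrast on this slide, a processor stepping through instructions versus hardware laid out as a dataflow graph, can be sketched in Python. This is only an illustration: the function names and the three-stage grouping are assumptions, and the slide's figure may schedule the operations differently.

```python
def poly_sequential(a, b, c, x):
    """Mirror the eight-instruction stream; returns (y, instructions issued)."""
    r1 = x        # load  r1,x
    r1 *= x       # mult  r1,x
    r1 *= a       # mult  r1,a
    r2 = x        # load  r2,x
    r2 *= b       # mult  r2,b
    r2 += c       # add   r2,c
    r1 += r2      # add   r1,r2
    y = r1        # store r1,y
    return y, 8


def poly_dataflow(a, b, c, x):
    """Group independent operations into stages; returns (y, stage count)."""
    # Stage 1: two multipliers work side by side.
    x2, bx = x * x, b * x
    # Stage 2: one multiplier and one adder work side by side.
    ax2, bx_c = a * x2, bx + c
    # Stage 3: final adder.
    y = ax2 + bx_c
    return y, 3


y_seq, steps = poly_sequential(2, 3, 4, 5)
y_df, stages = poly_dataflow(2, 3, 4, 5)
```

Both functions compute the same value. The difference is that the spatial version finishes in three dependent stages and, once pipelined, could accept a new x on every cycle.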
Development Environments
Match: ACS Development Environments
[Figure: MATCH testbed. A Development Environment (Sun and Windows) holding the Algorithm in MATLAB and the MATCH Compiler connects over Ethernet to a VME Chassis whose VME Bus hosts Motorola embedded boards (Power604), Force 5V (SPARC), Annapolis Wildchild (Xilinx 4010), and a Transtech DSP board (TI C40)]
NEW IDEAS
❖ Integrated fully automated compiler
❖ Automatic parallelism detection/mapping
❖ Novel algorithms for fast compilation
❖ Tools for interactive evaluation
❖ Application driven (e.g. STAP/ATR)
IMPACT
❖ Reduce code development times from weeks to hours
❖ Compile high-level language (MATLAB programs) directly to embedded, DSP, and reconfigurable logic
❖ Efficient execution (within a factor of 2-4 of manual approach)
❖ Satisfy throughput/latency constraints
❖ Satisfy code size constraints
❖ 10-100X faster in compile times
Libraries and Applications: Development of adaptive applications; Identification and implementation of libraries
Tools: Interactive tools; Faster compilation tools
Compiler: Basic MATLAB Compiler; Automated parallelism and mapping; Compiler Directives
Testbed: Development of hardware/software testbed
Risk Analysis
[Figure labels: Performance, Cost, Adaptability, Efficiency]
• Tough to program
• Minimal runtime support
• Technology immature/risky
• Verification
• Problem formulation
• Building blocks inadequate
Things to Consider
❖ Exploit uniqueness of reconfigurable computing
• bit manipulation
• variable precision arithmetic
• other?
❖ Libraries, libraries, libraries!!
❖ APIs
• CUDA, VSIPL, ??
❖ Graphical User Interfaces
• workflow representation
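Variable precision arithmetic is one of the unique strengths named above: reconfigurable logic can size each datapath to the bits a value actually needs. The Python sketch below shows the idea with signed fixed point. The helper names `quantize` and `to_float` and the chosen bit widths are assumptions for illustration, not part of any tool mentioned in the talk.

```python
def quantize(value, frac_bits, total_bits):
    """Round value to signed fixed point and return the integer code.

    frac_bits  -- number of bits to the right of the binary point
    total_bits -- full word width, including the sign bit
    """
    code = round(value * (1 << frac_bits))
    lowest = -(1 << (total_bits - 1))
    highest = (1 << (total_bits - 1)) - 1
    return max(lowest, min(highest, code))   # saturate instead of wrapping


def to_float(code, frac_bits):
    """Convert an integer code back to the real value it represents."""
    return code / (1 << frac_bits)


coefficient = 0.7071
narrow = quantize(coefficient, frac_bits=6, total_bits=8)     # code 45
wide = quantize(coefficient, frac_bits=10, total_bits=12)     # code 724
narrow_error = abs(coefficient - to_float(narrow, 6))
wide_error = abs(coefficient - to_float(wide, 10))
```

On a fixed-width processor both versions cost the same word. In configurable logic the 8-bit datapath is physically smaller than the 12-bit one, so precision can be traded directly against area and power.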
Things to Consider
❖ Reconfigurable logic may actually be in a “better place” now than it was 10 years ago
• increased acceptance of heterogeneous systems (e.g. RoadRunner)
- GPUs
• conventional components are no longer able to ride 2x clock speeds every 18-24 months
• multi-core is driving “parallel thinking”
• increased attention to “green computing”
❖ However, having a strong DoD interest was beneficial
• embedded systems and mission requirements drivers
Things to Consider
❖ Application acceleration is only beneficial if achieving it is widely available and “relatively easy”, i.e. the “cost” is low. Working against reconfigurable computing:
• long “compile” times
• unique skills: HW/SW expertise
• problem formulation for discovery
• debugging
• flops are cheap(er)
• PRODUCTIVITY MATTERS!
An Opportunity: Towards a Petascale Computing Environment for Science and Engineering
❖ NSF 08-573 (replaces NSF 05-625) Track 2D
• Due 28 Nov 2008
❖ $20M/4 awards
❖ An experimental high-performance computing system of innovative design. (Up to 5 years. Up to $12,000,000 in total budget to include development and/or acquisition, operations and maintenance, including user support. First-year budget not to exceed $4,000,000.)
• architectural design that is outside the mainstream of what is routinely available from computer vendors
Track 2D (cont’d)
❖ Additional information for experimental/innovative systems
• a lengthy development phase might be included
• explain why such a resource will expand the range of research projects that scientists and engineers can tackle and include some examples of science and engineering questions to which the system will be applied
• not necessary that the design of the proposed system be useful for all classes of computational science and engineering problems… but
Track 2D (cont’d)
❖ When finally deployed, the system should be integrated into the TeraGrid
❖ anticipated that the system, once deployed, will be an experimental TeraGrid resource, used by a smaller number of researchers than is typical for a large TeraGrid resource
❖ Up to $12,000,000 in total budget to include development and/or acquisition, operations and maintenance, including user support.
• First-year budget not to exceed $4,000,000.
❖ Benchmarks: NSF 0605
• Proposers are also required to provide projections for additional benchmarks of their own choosing that demonstrate the unique features of the system proposed.
Bottom Line: Lower the Bar
[Figure: FPGA Productivity]
Thank You