© 2006 Mercury Computer Systems, Inc.
The Cell Broadband Engine Processor
Hardware, Software, Performance and Applications
John Brickman
Director, Business Manager, Performance Computing Group
Aerospace & Electronic Systems Society
Cell Chip Lives in Two Worlds
• Game console chip market
  - Driven by “game physics” requirements, not just graphics
    - Compute intensive, vector processing, floating and fixed point
  - New consoles introduced every 5+ years, last about 10 years
    - PS3 unveiled May 2005, will launch November 2006, about 6 years after PS2
  - New chip architectures linked to console designs
    - Chip architecture unchanged during lifetime
    - Process shrinks targeted at lower cost and lower power
• High performance processor market
  - Evolving architecture with backwards compatibility
  - Piggy-back off largest volume processor platform that is leading in performance
    - With affordable architecture increments to address high performance needs
  - Previously desktop PC, now game console
• Cell roadmap addresses both game console and high performance markets
Mercury’s Relationship with IBM
In June 2005, Mercury announced a strategic alliance agreement with IBM offering Mercury special access to IBM expertise including the broadly publicized Cell technology.
Multicomputer-on-a-chip
Cell BE Processor Block Diagram
• Cell BE processor boasts nine processors on a single die
  - 1 Power® processor
  - 8 vector processors
• Computational performance
  - 205 GFLOPS @ 3.2 GHz
  - 410 GOPS @ 3.2 GHz
• A high-speed data ring connects everything
  - 205 GB/s maximum sustained bandwidth
• High performance chip interfaces
  - 25.6 GB/s XDR main memory bandwidth
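The 205 GFLOPS figure can be reproduced from numbers given elsewhere in the deck. The sketch below is illustrative only: it assumes the figure counts the eight vector processors alone, each issuing one 4-lane single precision multiply-add per cycle. The function name `peak_gflops` is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Peak single precision GFLOPS for a group of SIMD cores.
 * lanes:          32-bit lanes per vector register (4 for 128 bits)
 * flops_per_lane: 2 for a multiply-add (one multiply plus one add)
 * ghz:            clock rate, one vector instruction assumed per cycle */
double peak_gflops(int cores, int lanes, int flops_per_lane, double ghz)
{
    assert(cores > 0 && lanes > 0 && flops_per_lane > 0 && ghz > 0.0);
    return cores * lanes * flops_per_lane * ghz;
}

/* Prints the per-core and eight-core peaks at 3.2 GHz. */
void print_cell_peak(void)
{
    printf("one vector core: %.1f GFLOPS\n", peak_gflops(1, 4, 2, 3.2));
    printf("eight vector cores: %.1f GFLOPS\n", peak_gflops(8, 4, 2, 3.2));
}
```

Under these assumptions one vector core peaks at 25.6 GFLOPS and eight reach 204.8 GFLOPS, which rounds to the 205 GFLOPS quoted above.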
Synergistic Processing Element
• Standalone vector processor
  - 128 bit SIMD model
  - 128 registers, each 128 bits wide
    - AltiVec/VMX has only 32 registers, SSE3 only eight
• 256KB local store
  - Load/store instructions can access only local store
• Memory flow controller
  - DMA engine built into each SPE
  - SPE includes DMA instructions for explicitly moving data between local store and main memory
• Performance
  - Dual issue
  - Two- to sixteen-way SIMD
  - 25.6 GFLOPS (single precision), 51 GOPS (8 bit)
SPE 128 Bit SIMD Engine
• Operates on 128 bit vector registers
  - 2 x 64 bits (DP float)
  - 4 x 32 bits (SP float or integer)
  - 8 x 16 bits (integer)
  - 16 x 8 bits (integer)
• Example: Floating point multiply add
  - 4 x 32 bit fma instruction can complete eight floating point operations (FLOPS) every cycle
[Figure: fma vr, v1, v2, v3 on 128 bit registers. Each of the four 32 bit lanes of v1, v2 and v3 passes through a multiply (X) and an add (+) to produce the matching lane of vr.]
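The lane-wise behaviour of the multiply-add can be emulated in portable C. This is a minimal sketch, not SPE code: `vec4f`, `fma4` and `fma_array` are hypothetical names, and the operand roles (vr = v1 * v2 + v3) are an assumption, since the slide only shows a multiply and an add per lane.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4  /* four 32-bit floats fill one 128-bit register */

/* Hypothetical stand-in for a 128-bit vector register. */
typedef struct { float lane[LANES]; } vec4f;

/* Lane-wise multiply-add: vr = v1 * v2 + v3 (operand roles assumed).
 * Each lane performs one multiply and one add, so one call models 8 FLOPs. */
vec4f fma4(vec4f v1, vec4f v2, vec4f v3)
{
    vec4f vr;
    for (size_t i = 0; i < LANES; i++)
        vr.lane[i] = v1.lane[i] * v2.lane[i] + v3.lane[i];
    return vr;
}

/* Applies fma4 across arrays of n floats, four at a time.
 * Returns the number of floating point operations modelled (2 per element). */
size_t fma_array(const float *a, const float *b, const float *c, float *r, size_t n)
{
    assert(n % LANES == 0);  /* whole vectors only in this sketch */
    for (size_t i = 0; i < n; i += LANES) {
        vec4f v1, v2, v3;
        for (size_t k = 0; k < LANES; k++) {
            v1.lane[k] = a[i + k];
            v2.lane[k] = b[i + k];
            v3.lane[k] = c[i + k];
        }
        vec4f vr = fma4(v1, v2, v3);
        for (size_t k = 0; k < LANES; k++)
            r[i + k] = vr.lane[k];
    }
    return n * 2;
}
```

Each `fma4` call stands for one vector instruction, so eight operations per call matches the "eight floating point operations every cycle" stated above.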
Power® Processing Element
• 64-bit Power® core with complete AltiVec™/VMX
• High frequency
• Low power consumption
• Hardware multi-threading
• L2 is 512 KB
• Can use any SPE’s DMA engine
Altivec is a registered trademark of Freescale Semiconductor Corp.
Why is Cell So Fast?
• The SPE is a very fast, very lean core
  - SPE (3.2 GHz) is up to 3 times faster than the fastest Pentium core (3.6 GHz) when computing FFTs
  - That’s 24X better performance chip to chip
• Huge internal chip bandwidth
  - 205 GB/s sustained ring bandwidth
  - 25.6 GB/s main memory bandwidth
• High performance DMA
  - DMA can be fully overlapped with SPE computation
  - Software controlled DMAs can bring exactly the right data into local store at the right time
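Overlapping DMA with computation is usually arranged as double buffering. The sketch below shows only the control flow, under loud assumptions: `fetch_tile` is a hypothetical stand-in that uses a synchronous `memcpy`, whereas a real SPE would issue an asynchronous DMA and wait on it later, and `TILE` is an arbitrary size.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TILE 256  /* floats per tile; hypothetical size for a small local store */

/* Stand-in for a DMA "get" from main memory into a local buffer.
 * Real hardware would do this asynchronously; memcpy is synchronous. */
static void fetch_tile(float *local, const float *main_mem, size_t count)
{
    memcpy(local, main_mem, count * sizeof(float));
}

/* Sums n floats using two local buffers. The fetch for tile t+1 is issued
 * before the computation on tile t, which is where the overlap would occur. */
float sum_double_buffered(const float *main_mem, size_t n)
{
    assert(main_mem != NULL || n == 0);
    float buf[2][TILE];
    size_t tiles = (n + TILE - 1) / TILE;
    float total = 0.0f;
    int cur = 0;

    if (tiles == 0)
        return 0.0f;

    /* Prime the pipeline with the first tile. */
    fetch_tile(buf[cur], main_mem, n < TILE ? n : TILE);

    for (size_t t = 0; t < tiles; t++) {
        size_t count = (t + 1 == tiles) ? n - t * TILE : TILE;
        if (t + 1 < tiles) {
            size_t next = (t + 2 == tiles) ? n - (t + 1) * TILE : TILE;
            fetch_tile(buf[1 - cur], main_mem + (t + 1) * TILE, next);
        }
        for (size_t i = 0; i < count; i++)
            total += buf[cur][i];
        cur = 1 - cur;  /* swap the roles of the two buffers */
    }
    return total;
}
```

With a truly asynchronous fetch, the transfer time of tile t+1 hides behind the arithmetic on tile t, provided each tile's computation takes at least as long as its transfer.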
Mercury Cell Hardware Products
Mercury Cell Related Roadmap
[Timeline chart: 3Q 2006 through 2Q 2008]
Blades
• Dual Cell Based Blade: 2 BE, 2 Southbridges, 1GB XDR
• Dual Cell Based Blade 2: Single slot, 2 BE, 2 Comp. Chips, 4GB XDR+DDR2
• Dual Cell Based Blade 3: Single slot, 2 BE, 2 Comp. Chips, up to 32GB DDR2
1U Servers
• Dual Cell Based Server: 2 BE, 2 Southbridges, 1GB XDR
• Dual Cell Based Server 2: 2 BE, 2 Comp. Chips, 4GB XDR+DDR2
• Dual Cell Based Server 3: 2 BE, 2 Comp. Chips, up to 32GB DDR2
Embedded and Rugged
• PowerBlock™ 200 ½ ATR Concept: 1 BE, 1 Companion Chip, 4 GB DDR2, 1GB XDR
• Turismo Chassis Concept
• ATCA Blade Concept: 1 BE, 1 Companion Chip, 4 GB DDR2, 1GB XDR
• VITA 46 / 48 Concept
• PowerStream™ Concept
• CAB PCIe Add-In Card: 1 BE, 1 Companion Chip, 4 GB DDR2, 1GB XDR
Dual Cell Based Blade
• Flexible blade solution based on the Cell BE processor
  - Outstanding performance for HPC applications
  - Designed for distributed processing
  - Cell-optimized software available
  - About 11 TFLOPS in 5 feet of rack height
• Dual-width BladeCenter™ blade
• Two PCI Express x4 expansion slots
  - Initially supports only Infiniband cards
• Evaluation units available since December 2005
• Production October 2006
Dual Cell Based Blade Block Diagram
[Block diagram: two 3.2 GHz Cell processors, each with 512 MB XDR DRAM (25.6 GB/s) and its own power supply, linked to each other at 20 GB/s each way. Each processor connects at 2.5 GB/s each way to a southbridge, which provides GbE to the BladeCenter midplane connector and a PCI Express x4 slot for an Infiniband daughtercard. A serial port is also shown.]
Cell Blade Systems
Complete 19” rack-based systems
• 25U (42.75” high)
  - Up to 14 blades, 5.7 TFLOPS
• 42U (73.5”) chassis
  - Up to 28 blades, 11.5 TFLOPS
• Multi-rack systems scalable using Infiniband and GbE

Cell Technology Evaluation System
• Complete turn-key Cell HW & SW system
• 25U rack
• One Dual Cell-Based Blade
  - All components included to support expansion to 7 blade system
• MultiCore Plus SDK
  - One year subscription to production SW

[Figure: 25U 14-Blade System, front and rear views, showing monitor and keyboard, serial line concentrator, Xeon based Linux server, external GbE switch, BladeCenter chassis and power distribution.]
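The rack-level TFLOPS figures follow from the per-chip peak. The sketch below assumes two Cell chips per blade and counts only the eight vector processors per chip (8 x 25.6 = 204.8 GFLOPS); `rack_peak_tflops` is a hypothetical helper.

```c
#include <assert.h>

/* Peak TFLOPS for a rack of Cell blades.
 * gflops_per_chip: 204.8 when counting eight vector cores at 25.6 GFLOPS each. */
double rack_peak_tflops(int blades, int chips_per_blade, double gflops_per_chip)
{
    assert(blades >= 0 && chips_per_blade >= 0 && gflops_per_chip >= 0.0);
    return blades * chips_per_blade * gflops_per_chip / 1000.0;
}
```

Calling `rack_peak_tflops(14, 2, 204.8)` gives about 5.73 and `rack_peak_tflops(28, 2, 204.8)` about 11.47, consistent with the 5.7 and 11.5 TFLOPS quoted for the 25U and 42U systems.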
1U Dual-Cell Based Server
• Hardware
  - Dual Cell processors at 3.2 GHz
  - 1 GB of XDR DRAM
  - Integrated dual Gigabit Ethernet
  - Serial port
  - Dual full size PCI Express x4 slots
    - Initially supports only Infiniband cards
• Software
  - Toolchain
    - Native (PPE hosted)
    - Cross (x86 hosted)
  - GUI via X-Windows over GbE
    - No direct keyboard / video / mouse support
• Production Q1 2007
Cell Companion Chip
• Under design by IBM since May 2005
  - With significant design input from Mercury
• First parts began preliminary testing June 2006
• Second spin for production in December 2006
[Block diagram: companion chip with a Cell BE interface (5 GB/s), two GbE ports, UART, GPIO, PCI-X, mailbox, DMA engine, 405 PPC core, two PCIe 16x interfaces and two DDR2 667 MHz controllers.]
• Cell BE interface
  - 5 GB/s each way
  - Extends Cell global address space to PCIe, DDR2 etc.
  - Non-coherent (non-cached)
• Low latency, high capacity mailbox
• Multichannel, striding DMA engine
• DDR2 controllers
  - 5 GB/s each
  - Up to 4 GB each
• PCIe 16x interfaces
  - Each configurable: 8x, 4x, 2x and 1x
  - Endpoint or root complex
Dual Cell Based Blade 2
• Single slot blade
  - Up to twice the density
• Uses new companion chip
  - Up to 10x I/O bandwidth
• DDR2 I/O buffer memory
• Production available Q3 2007
[Block diagram: one-slot processor blade with two 3.2 GHz Cell processors, each with 1 GB XDR DRAM (25.6 GB/s) and its own power supply, linked at 20 GB/s each way. Each Cell connects at 5 GB/s each way to a companion chip with 2-8 GB DDR2, GbE and a PCIe 16x link. A BladeCenter H high speed daughtercard provides IB 4x and 2 PCIe 8x links. An optional one-slot I/O expansion blade carries PCIe x16 / PCI-X daughtercards.]
Dual Cell Based Blade 3 Concept
• Improved SPE double precision performance
• Expanded memory
  - DDR2 replaces XDR
• Production available Q1 2008
[Block diagram: one-slot processor blade with two 3.2 GHz Cell processors, each with 8-16 GB DDR2 (25.6 GB/s) and its own power supply, linked at 20 GB/s each way. Each Cell connects at 5 GB/s each way to a companion chip with 1-2 GB DDR2, GbE and a PCIe 16x link. A BladeCenter H high speed daughtercard provides 2 IB 4x and 2 PCIe 8x links. An optional one-slot I/O expansion blade carries PCIe / PCI-X x16 daughtercards.]
1U Dual-Cell Based Server 2
• 1U solution based on the companion chip
• Dual 3.2 GHz Cell processors
• Memory
  - 2 GB of XDR
  - 4-16 GB of DDR2
• I/O
  - Daughtercard site options under consideration
    - PCI-E and PCI-X customer options
  - Dual GigE
  - Dual IB 4x
• Production available Q3 2007
1U Dual-Cell Based Server 3 Concept
• 1U solution with enhanced memory capacity
• Dual 3.2 GHz Cell processors
• Memory
  - 16-32 GB of DDR2
  - Main memory is now DDR2 DIMMs
  - 1-2 GB of DDR2 per companion chip for IO buffering
• I/O
  - PCIe / PCI-X daughtercards
  - Dual GigE
  - Dual IB 4x
• Production available Q1 2008
Cell Accelerator Board
• PCI Express™ accelerator card compatible with high-end workstations
• More than 180 GFLOPS on a desktop
• 1 GB of XDR and 4 GB of DDR2
• Gigabit Ethernet on end bracket
• Internal prototype boards with FPGA bridge received July 2006
• Boards with the prototype bridge silicon received September 2006
• Volume production of boards Q1 2007
Cell Accelerator Board Block Diagram
[Block diagram: 2.8 GHz Cell processor with 1 GB XDR DRAM and a companion chip with 4 GB DDR2. Link labels: 22 GB/s (XDR DRAM) and 8 GB/s (companion chip).]
Software is the Key to Harnessing Cell Performance!
•Mercury’s MultiCore Plus SDK
Cell BE Processor Architecture
• Resembles distributed memory multiprocessor with explicit DMA over a fabric
Mercury Multi-DSP Board (1996)
Programming Cell: What’s Good and What’s Hard
SPE
  Good
  • No second guessing about cache replacement algorithm
  • Very deterministic pipeline
  • 128 registers mask pipeline latency very well
  • DMA has negligible impact on SPE local store bandwidth
  Hard
  • Burden on software to get code and data into local store
  • Local store is small compared to ring latency
  • Branch prediction is manual and very restricted
  • 128 byte alignment necessary for best performance
Ring and XDR
  Good
  • Generous ring bandwidth means topology is seldom an issue
  Hard
  • XDR bandwidth is a bottleneck
  • Cell chips linked in coherent mode increases latency
PPE
  Good
  • Standard Power® core
  Hard
  • Performance is modest
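The 128 byte alignment requirement listed under "Hard" can be met with a small allocation wrapper. This is a sketch in standard C11, not the MCF allocator: `alloc_dma_buffer`, `round_up_128` and `is_dma_aligned` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DMA_ALIGN 128  /* bytes; the alignment named on the slide */

/* Rounds size up to the next multiple of DMA_ALIGN. */
size_t round_up_128(size_t size)
{
    return (size + DMA_ALIGN - 1) / DMA_ALIGN * DMA_ALIGN;
}

/* Allocates a buffer whose address and length are multiples of 128 bytes.
 * C11 aligned_alloc expects the size to be a multiple of the alignment.
 * The caller releases the buffer with free(). */
void *alloc_dma_buffer(size_t size)
{
    assert(size > 0);
    return aligned_alloc(DMA_ALIGN, round_up_128(size));
}

/* Returns 1 if the pointer is 128 byte aligned, 0 otherwise. */
int is_dma_aligned(const void *p)
{
    return ((uintptr_t)p % DMA_ALIGN) == 0;
}
```

Padding the length as well as aligning the address keeps every transfer a whole number of 128 byte blocks, at the cost of up to 127 unused bytes per buffer.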
How Much Faster Is Cell?
Relative performance of Cell and leading general purpose processors
[Bar chart: Relative Performance (y axis, 0 to 60) for five benchmarks. Single precision complex FFTs: 1K point, 8K point, 64K point. Symmetric image filters: 15x15 16-bit, 15x15 8-bit. Series: Cell BE 3.2 GHz; Freescale 744x 975 MHz; Pentium 3.6 GHz 2MB L2; Opteron 2.4 GHz; PPC 970 2.0 GHz. Cell BE data labels, in benchmark order: 32, 56, 30, 13, 25. Freescale 744x is 1.0 throughout. The remaining data labels range from 0.9 to 2.1.]
• Performance relative to 1GHz Freescale 744x (i.e. Freescale = 1)
• In all cases, we are comparing Mercury optimized Cell algorithm implementations with the best available (Mercury or 3rd party) implementations on other processors
• Did not compare with dual core x86 processors
Goals for Programming Cell
• Achieve high performance: The only reason for choosing Cell
• Ease of programming: An important aspect of this is programmer portability
• Code portability
  - Important for large legacy code bases written in C/C++, Fortran
  - And new code developed for Cell should be portable to current and anticipated multiprocessor architectures
Linux OS
• Linux on Cell patches released by IBM Linux Technology Center
  - Kernel version 2.6.17
  - libspe version 1.1
  - Built and tested with Fedora Core 5 distribution
  - IBM LTC releases packages through Barcelona Supercomputing Center to official kernel website
    - www.bsc.es/projects/deepcomputing/linuxoncell/
• Mercury works closely with IBM Linux team on performance optimization
  - Linux now able to achieve maximum hardware performance possible on Dual Cell-Based Blade
  - NUMA support, PPE affinity, SPE affinity, 64KB and 16MB page support
• Mercury uses Terra Soft Solutions Y-HPC Distribution
  - Mercury contracted TSS to port Y-HPC to the Dual Cell Based Blade
  - Distributions are tested and supported on Mercury hardware
  - Mercury assists TSS with driver development
    - GbE, uDAPL, Infiniband
The MultiCore Plus SDK
• MultiCore Framework (MCF)
• Scientific Algorithm Library (SAL)
• MultiCore Plus IDE
• TATL
• SPEAK
Mercury Approach to Programming Cell
• Very pragmatic
  - Can’t wait for tools to mature
  - Develop our own tools when it makes sense
• Emphasis on explicitly programming the architecture rather than trying to hide it
  - When the tools are immature, this allows us to get maximum performance
• Achieve ease-of-use and portability through function offload model
  - Run legacy code on PPE
  - Offload compute intensive workload to SPEs
MultiCore Framework
• An API for programming heterogeneous multicores that contain explicit non-cached memory hierarchies
• Provides an abstract view of the hardware oriented toward computation of multidimensional data sets
• First implementation is for the Cell BE processor
MCF Abstractions
• Function offload model
  - Worker Teams: Allocate tasks to SPEs
  - Plug-ins: Dynamically load and unload functions from within worker programs
• Data movement
  - Distribution Objects: Defining how n-dimensional data is organized in memory
  - Tile Channels: Move data between SPEs and main memory
  - Re-org Channels: Move data among SPEs
  - Multibuffering: Overlap data movement and computation
• Miscellaneous
  - Barrier and semaphore synchronization
  - DMA-friendly memory allocator
  - DMA convenience functions
  - Performance profiling
MCF Distribution Objects
[Figure: Frame = one complete data set in main memory]
• Distribution Object parameters:
  - Number of dimensions
  - Frame size
  - Tile size and tile overlap
  - Array indexing order
  - Compound data type organization (e.g. split / interleaved)
  - Partitioning policy across workers, including partition overlap
[Figure: the frame divided into tiles. Tile = unit of work for an SPE]
MCF Partition Assignment
[Figure: the frame divided into partitions, one each for SPE 0, SPE 1 and SPE 2]
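One simple partitioning policy gives each worker a contiguous run of tiles. The sketch below illustrates that idea only. It is not the MCF implementation, whose policies are not detailed here, and `tile_range` and `partition_for_worker` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one worker's share of the frame. */
typedef struct {
    size_t first;  /* index of the first tile owned by the worker */
    size_t count;  /* number of consecutive tiles owned by the worker */
} tile_range;

/* Block partitioning: worker w of n_workers gets a contiguous run of tiles.
 * The first (n_tiles % n_workers) workers each take one extra tile. */
tile_range partition_for_worker(size_t n_tiles, size_t n_workers, size_t w)
{
    assert(n_workers > 0 && w < n_workers);
    size_t base = n_tiles / n_workers;
    size_t extra = n_tiles % n_workers;
    tile_range r;
    r.first = w * base + (w < extra ? w : extra);
    r.count = base + (w < extra ? 1 : 0);
    return r;
}
```

For 10 tiles and 3 workers this yields tiles 0-3 for worker 0, 4-6 for worker 1 and 7-9 for worker 2, so every tile has exactly one owner and no worker holds more than one tile above the minimum.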
MCF Tile Channels
[Figure: the partitions for SPE 0, SPE 1 and SPE 2, with a tile channel carrying tiles to the SPEs]
MCF Tile Channels
• Manager (PPE) generates data set and injects it into input tile channel
• Input tile channel subdivides data set into tiles
• Each worker (SPE) extracts tiles out of input tile channel...
• ...computes on input tiles to produce output tiles...
• ...and inserts them into output tile channel
• Output tile channel automatically puts tiles into correct location in output data set
• When output data set is complete, manager is notified and extracts data set
[Figure: manager connected to worker 1, worker 2 and worker 3 through the input tile channel and the output tile channel]
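The flow above can be mimicked in a few lines of sequential C to show where each tile ends up. This is a toy model under stated assumptions, not MCF code: workers are simulated by a round-robin loop rather than running in parallel, the "math" is a placeholder doubling, and `run_tile_flow`, `worker_compute` and `TILE_LEN` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TILE_LEN 4  /* elements per tile; hypothetical */

/* Worker step: produce an output tile from an input tile (placeholder math). */
static void worker_compute(const float *in_tile, float *out_tile, size_t len)
{
    for (size_t i = 0; i < len; i++)
        out_tile[i] = 2.0f * in_tile[i];
}

/* Cuts the input data set into tiles, hands tiles to workers round-robin,
 * and writes each output tile at the same offset in the output data set.
 * tiles_per_worker must have room for n_workers counters.
 * Returns the number of tiles processed. */
size_t run_tile_flow(const float *in_set, float *out_set, size_t n,
                     size_t n_workers, size_t *tiles_per_worker)
{
    assert(n % TILE_LEN == 0 && n_workers > 0);
    size_t n_tiles = n / TILE_LEN;

    for (size_t w = 0; w < n_workers; w++)
        tiles_per_worker[w] = 0;

    for (size_t t = 0; t < n_tiles; t++) {
        size_t w = t % n_workers;    /* worker that extracts tile t */
        size_t off = t * TILE_LEN;   /* position of tile t in the data set */
        worker_compute(in_set + off, out_set + off, TILE_LEN);
        tiles_per_worker[w]++;
    }
    return n_tiles;
}
```

Because each output tile is written at its own offset, the output data set is assembled correctly regardless of which worker handled which tile.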
MCF Manager Program

    int main(int argc, char **argv) {
        mcf_m_net_create();
        mcf_m_net_initialize();

        // Add worker tasks
        mcf_m_net_add_task();
        mcf_m_team_run_task();

        // Specify data organization
        mcf_m_tile_distribution_create_3d("in");
        mcf_m_tile_distribution_set_partition_overlap("in");
        mcf_m_tile_distribution_create_3d("out");

        // Create and connect to tile channels
        mcf_m_tile_channel_create("in");
        mcf_m_tile_channel_create("out");
        mcf_m_tile_channel_connect("in");
        mcf_m_tile_channel_connect("out");

        // Get empty source buffer
        mcf_m_tile_channel_get_buffer("in");

        // Fill it with data
        // fill input data here

        // Send it to workers
        mcf_m_tile_channel_put_buffer("in");

        // Wait for results from workers
        mcf_m_tile_channel_get_buffer("out");

        // process output data here
    }
MCF Worker Program
    mcf_w_main(int n_bytes, void *p_arg_ls) {
        // Create and connect to tile channels
        mcf_w_tile_channel_create("in");
        mcf_w_tile_channel_create("out");
        mcf_w_tile_channel_connect("in");
        mcf_w_tile_channel_connect("out");

        while (!mcf_w_tile_channel_is_end_of_channel("in")) {
            // Get full source buffer
            mcf_w_tile_channel_get_buffer("in");

            // Get empty destination buffer
            mcf_w_tile_channel_get_buffer("out");

            // Do math and fill destination buffer
            // Do math here

            // Put back empty source buffer
            mcf_w_tile_channel_put_buffer("in");

            // Put back full destination buffer
            mcf_w_tile_channel_put_buffer("out");
        }
    }
MCF Implementation
• Consists of
  - PPE library
  - SPE library and tiny executive (12 KB)
• Utilizes Cell Linux “libspe” support
  - But amortizes expensive system calls
  - Reduces overhead from milliseconds to microseconds
  - Provides faster and smaller footprint memory allocation library
• Based on Data Reorg standard
  - http://www.data-re.org
• Derived from existing Mercury technologies
  - Other Mercury RDMA-based middleware
  - DSP product experience with small footprint, non-cached architectures
SAL Primary Markets
• Radar
• Sonar
• Medical Imaging
• Signals Intelligence
• Defense Imaging
• Semiconductor Inspection
Scientific Algorithm Library
• SAL is a collection of optimized functions
  • Baseline
    • Arithmetic, data type conversions, data moves
  • DSP
    • FFTs, convolutions, correlation, filters, etc.
  • Linear Algebra
    • Linear systems, matrix decomposition, etc.
  • Parallel Algorithms (future)
    • High level algorithms on multiple cores
    • Invoked from application running on PPE
    • Automatically use one or more SPEs
    • Initial work done for 1D and 2D FFTs and fast convolutions
• PIXL – Image Processing Library
  • Edge detection, fixed point operations and analysis, filtering, manipulation, erosion, dilation, histogram, lookup tables, etc.
  • Work in this area depends on customer demand.
• PPE SAL based on AltiVec optimizations for G4 and G4A2
  • SAL C source code version also available
• SPE SAL is new implementation optimized for SPE architecture
  • Backwards compatibility with existing SAL API except in very rare cases
  • Some new APIs needed in order to extract best performance from SPE
  • Static and plug-in component versions for each function
Eclipse Framework
• Provides an open platform for creating an Integrated Development Environment (IDE)
• Eclipse Consortium manages continuous development of the tool
• Eclipse plug-ins extend the functionality of the framework
• Written in Java
• Compilers, debuggers, TATL, help files, etc. are all Eclipse plug-ins.
Mercury MultiCore Plus IDE
• PPE and SPE cross build support for
  • gcc/g++
  • XLC/C++
• Eclipse CDT (C/C++ Development Toolkit)
  • Syntax highlighting
  • Code completion
  • Content assistance
  • Makefile generation
  • Remote debugging of PPE and SPE applications
  • TATL plug-in
TATL™ Trace Analysis Tool
• Log events from PPE & SPE threads across multiple Cell chips
• Synchronized global timestamps
• Minimally intrusive in space and time
• Timeline trace and histogram viewers
• Structured log file for use in other tools
SPE Assembly Development Kit (SPE-ADK)
• The SPE architecture encourages “bare metal programmers”
  • Very deterministic architecture
  • Performance benefits from hand tuning the pipelines
• SPE-ADK dramatically improves bare metal productivity
• SPE-ADK consists of
  • Assembler preprocessor, optimizer and macro library
• Using SPE-ADK is similar to programming with SPE C extensions
  • But with more deterministic control of instruction scheduling and hardware resources
• SPE-ADK is a productized version of the internal development tool used by all Mercury SAL developers
SPE-ADK Features
• Alignment of instructions for the even and odd pipelines of the SPU
• Automatic insertion of nops and lnops, or instruction swapping, to maintain dual dispatch
• Alignment of loops to minimize instruction fetching overhead
• Register assignment. It automatically:
  • Finds symbolic register operands,
  • Assigns registers to symbols to minimize register usage,
  • Eliminates bugs from inconsistent register assignment.
• Mapping of register usage, both active line number extents per symbol, and active hardware registers per line
• Analysis of stall cycles due to register dependencies
• Optional C emulation for assembly development allows C-like debugging facilities
  • Hardware independence for assembly code,
  • Setting breakpoints at source line numbers,
  • Displaying source code rather than disassembling the object code,
  • Displaying register contents by symbol.
• Detection of errors to preclude bugs:
  • Inconsistent manual register assignment,
  • Write-only variables,
  • Uninitialized variables,
  • Updated but unused variables.
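The nop/lnop padding feature can be illustrated with a deliberately simplified model. The sketch below assumes that each instruction is tagged only by the pipeline it uses, that an even-pipeline instruction wants an even word slot and an odd-pipeline instruction an odd word slot, and that `nop` occupies the even pipeline while `lnop` occupies the odd one. It ignores register dependencies and instruction swapping, both of which the slide lists as part of the real tool. The function and constant names are hypothetical and do not describe how SPE-ADK is implemented.

```c
#include <assert.h>
#include <stddef.h>

/* Instruction tags: which pipeline an instruction uses, plus two pad markers. */
enum {
    EVEN_PIPE = 0,   /* instruction issues on the even pipeline */
    ODD_PIPE  = 1,   /* instruction issues on the odd pipeline */
    PAD_NOP   = -1,  /* inserted nop  (fills an even slot) */
    PAD_LNOP  = -2   /* inserted lnop (fills an odd slot) */
};

/*
 * Copies `pipes` into `out`, inserting a pad wherever an instruction would
 * otherwise land in a slot of the wrong parity. `out` must have room for
 * 2 * n entries. Returns the length of the padded stream.
 */
size_t pad_for_dual_issue(const int *pipes, size_t n, int *out)
{
    assert(pipes != NULL && out != NULL);
    size_t len = 0;
    for (size_t i = 0; i < n; ++i) {
        int slot = (int)(len % 2);               /* 0 = even slot, 1 = odd slot */
        if (pipes[i] != slot)                    /* wrong parity: pad this slot */
            out[len++] = (slot == 0) ? PAD_NOP : PAD_LNOP;
        out[len++] = pipes[i];                   /* instruction now sits in its slot */
    }
    return len;
}
```

For the input sequence even, even, odd, odd, this naive pass emits even, lnop, even, odd, nop, odd. An optimizer that is also allowed to swap independent instructions could reorder the same four instructions as even, odd, even, odd and need no padding at all, which is why swapping appears alongside padding in the feature list.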
Software Summary
• The Cell BE processor can achieve one to two orders of magnitude performance improvement over current general purpose processors
  • Lean SPE core saves space and power
  • And makes it easier for software to approach peak performance
• Cell is a distributed memory multiprocessor on a chip
  • Prior experience on these architectures translates easily to Cell
• But for most programmers, Cell is a new architecture
  • Successful adoption by programmers is Cell’s biggest challenge
  • And the history of other new processor architectures is not encouraging
• We need a range of tools that span the continuum from ease-of-use to high performance
Markets for Cell
• Aerospace and Defense
• Semiconductor
• Medical Imaging
• Oil and Gas
• Visualization
Sales & Marketing Progress for Cell
Very Active
• Semiconductor inspection – active sales engagements; prototypes sold
• Medical imaging – active sales engagements; prototypes sold
• Semiconductor lithography – active sales engagements; prototypes sold
• Defense signal & image processing – active sales engagements; prototypes sold
• Oil & Gas exploration – active sales engagements; prototypes sold
• Video transcoding – active sales engagements
Less Active for Mercury
• Financial modeling (IBM)
• Gaming
• Animation & rendering
• Defense simulation for training (specialized gaming)
Summary
• Mercury has been developing computing solutions for applications well suited for Cell technology for many years.
• Cell technology represents a significant performance breakthrough similar to historical programming models.
• Customers can leverage Cell technology through Mercury to achieve:
  • Unbiased assessment of risks and applicability of deploying Cell-based solutions.
  • Significant improvements in performance and bandwidth for certain applications compared to conventional processors
For More Information
(866) 627-6951 (US)
(978) 967-1401 (International)
E-mail: [email protected]
Web: www.mc.com/cell
Backup Slides
Semiconductor DFM Requirements
Moore’s Law Irrelevant
• Processing requirements of semiconductor industry are increasing at an even faster rate
• Driven by:
  • Increased feature density
  • Increased complexity of processing due to sub-wavelength physics
  • Tool specific features
[Chart, Year 1 to Year 4: Moore’s Law (4X) versus Processing Requirements (12X)]
Processing needs outpace mainstream computing as data rates and algorithm complexity increase
OPC/RET/DFM – The need for speed
CHALLENGES
• Reduce OPC cycle times from days/weeks to hours
• Simulation models that ensure a mask will work when printed
• Computing goes up by an order of magnitude at every design node (e.g. 65nm to 45nm)
• Resolution Enhancement Technologies (RET)
  • Optical Proximity Correction (OPC)
  • Phase Shift Masks (PSM)
  • Off-axis Illumination (OAI)
• Design for Manufacturing (DFM)
• WYSIWYG no more
Quotes from top chip designers:
“It takes 8 days with 500 nodes to do OPC on a single chip layer … and we need it to be 10 to 100 times faster”
“We have 10,000 blades to do RET”
Source: AMD
Cost of Ownership
• System sizes to do RET and Lithography simulation are expanding to the 1000s of 1U servers
• Dense racks of servers are expensive to maintain
  • Cost of electricity to power computers
  • Cost of capital infrastructure for electricity delivery
  • Cost of electricity to power HVAC systems
  • Cost of capital infrastructure for HVAC
  • Challenge of managing air flow
Cost of Ownership
• A single dual processor server
  • Consumes 250-400 Watts
  • Costs $100-200/year just to power (at $.05/kWh)
  • Comparable amounts for HVAC and capital costs
• A rack of 84 such servers
  • Costs $10K+ per year to power
  • Comparable amounts for HVAC and capital costs
• Operators of data centers now see power and cooling costs as more significant than cost of computing hardware
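The power figures on this slide can be reproduced with one line of arithmetic. The sketch below assumes the server draws its rated power continuously for a full 8760-hour year; the function name `annual_power_cost` is made up for illustration. Under that assumption, 250 to 400 W at $0.05/kWh works out to about $110 to $175 per server per year, and 84 servers to roughly $9.2K to $14.7K, in line with the slide's "$100-200" and "$10K+".

```c
#include <assert.h>

/*
 * Annual electricity cost in dollars for a load that draws `watts`
 * continuously, at a flat price of `dollars_per_kwh`.
 */
double annual_power_cost(double watts, double dollars_per_kwh)
{
    assert(watts >= 0.0 && dollars_per_kwh >= 0.0);
    const double hours_per_year = 24.0 * 365.0;        /* 8760 h, leap years ignored */
    double kwh_per_year = watts / 1000.0 * hours_per_year;
    return kwh_per_year * dollars_per_kwh;
}
```

Multiplying the per-server result by 84 gives the rack figure. HVAC and capital costs are not modeled here; the slide only describes them as comparable in size.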
Processing Efficiency
• The metric of performance per dollar must be expanded to include not just the cost of the hardware but also the lifetime cost of operating the computer system
• Performance/Watt, which used to just be a metric for the embedded and defense industry, is now important for commercial customers as well
Summary
• Cell processor technology provides:
  • Order-of-magnitude improvement in computing performance per processor for OPC/RET applications
  • Significant improvement in performance per Watt
  • Significant performance breakthrough for other critical computationally intensive applications
• The right software infrastructure is critical for:
  • Taking full advantage of specialized processing units
  • Partitioning application among heterogeneous group of processing cores
  • Parallelizing application among multiple processing nodes
• Cell can significantly improve OPC/RET turnaround time
Ray Tracing
• Mercury Computer Systems
• Visualization and Sciences Group
What is Ray Tracing?
Computer Graphics Rendering Technique which mathematically simulates rays of light
Capable of producing photo-realistic images
Used in a variety of markets
Automotive, aerospace and marine virtual prototyping
Architecture
Industrial Design
Digital Content Creation in film and video
Basic Technique
For each pixel in the screen, send out a ray of light from the viewpoint.
Test the ray against every object in the scene for intersection.
If the ray does not intersect any object, set the pixel to the background color.
If the ray does intersect an object, set the pixel to the color of the first (nearest) object it intersects.
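The per-pixel procedure above can be sketched in a few dozen lines of C. The sketch assumes the scene is a flat array of triangles, each carrying an integer color id, and it uses the Möller–Trumbore ray/triangle test, which is a standard technique chosen here for illustration and is not named in the slides. The types and function names (`vec3`, `triangle`, `hit_triangle`, `trace_pixel`) are likewise hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { double x, y, z; } vec3;
typedef struct { vec3 a, b, c; int color; } triangle;

static vec3 sub(vec3 u, vec3 v)
{
    vec3 r = { u.x - v.x, u.y - v.y, u.z - v.z };
    return r;
}

static vec3 cross(vec3 u, vec3 v)
{
    vec3 r = { u.y * v.z - u.z * v.y, u.z * v.x - u.x * v.z, u.x * v.y - u.y * v.x };
    return r;
}

static double dot(vec3 u, vec3 v) { return u.x * v.x + u.y * v.y + u.z * v.z; }

/* Moller-Trumbore test: returns 1 and stores the ray parameter t on a hit. */
static int hit_triangle(vec3 orig, vec3 dir, const triangle *tri, double *t_out)
{
    const double eps = 1e-12;
    vec3 e1 = sub(tri->b, tri->a);
    vec3 e2 = sub(tri->c, tri->a);
    vec3 p = cross(dir, e2);
    double det = dot(e1, p);
    if (det > -eps && det < eps) return 0;       /* ray parallel to the triangle */
    double inv = 1.0 / det;
    vec3 s = sub(orig, tri->a);
    double u = dot(s, p) * inv;                  /* first barycentric coordinate */
    if (u < 0.0 || u > 1.0) return 0;
    vec3 q = cross(s, e1);
    double v = dot(dir, q) * inv;                /* second barycentric coordinate */
    if (v < 0.0 || u + v > 1.0) return 0;
    double t = dot(e2, q) * inv;                 /* distance along the ray */
    if (t <= eps) return 0;                      /* hit lies behind the viewpoint */
    *t_out = t;
    return 1;
}

/* Tests one ray against every triangle and returns the nearest hit's color. */
int trace_pixel(vec3 eye, vec3 dir, const triangle *scene, size_t n, int background)
{
    assert(scene != NULL || n == 0);
    int color = background;
    int found = 0;
    double nearest = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double t;
        if (hit_triangle(eye, dir, &scene[i], &t) && (!found || t < nearest)) {
            nearest = t;
            found = 1;
            color = scene[i].color;
        }
    }
    return color;
}
```

A renderer would call `trace_pixel` once per pixel with a direction through that pixel. Because every ray visits every triangle, the cost grows with pixels times objects.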
More Advanced Technique
Characteristics of Ray Tracing
• Simulating the Physics of Light
  • Simulates light transport by following “photons”
  • Fully parallel: just as in nature
  • Demand-driven: start from the camera
  • Correctly orders rendering effects (per pixel !!)
  • Can account for all global effects
  • All effects are orthogonal to each other
  • Makes content design easy and fast
• Requires very large amount of CPU in order to be interactive
• Driven by intersection calculations
  • Every ray checked against all objects
  • Each secondary ray becomes a primary ray in a recursive algorithm
  • 800 x 600 screen, 3 light sources, 50 opaque objects requires 600 billion intersection tests!
Challenges Implementing on Cell
• In-order instruction access and SIMD
  • Must carefully optimize instructions to avoid stalls
  • Must parallelize code to take advantage of SIMD instructions
• Memory Access
  • DMA engines must move data into LS from XDR
  • Hiding latency requires overlapped I/O and processing (DMA read latency is a few hundred clock cycles)
  • Even more challenging for irregular data access
• Mapping to 8 SPEs
  • Mapping algorithm very important with Cell architecture
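Overlapping I/O with processing is commonly done with two local buffers used in ping-pong fashion: while one tile is being processed, the next is already on its way in. The sketch below shows only the control structure, in portable C. It assumes `memcpy` as a synchronous stand-in for an asynchronous DMA transfer, so it demonstrates the ordering of fetch and compute but gives no actual overlap. The names `fetch_tile`, `sum_double_buffered` and `TILE_ELEMS` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TILE_ELEMS 4   /* elements per tile; real local-store tiles would be larger */

/* Stand-in for an asynchronous DMA "get" from main memory into local store. */
static void fetch_tile(float *local, const float *remote, size_t count)
{
    memcpy(local, remote, count * sizeof *local);
}

/* Sums n_tiles * TILE_ELEMS floats from src using two alternating buffers. */
float sum_double_buffered(const float *src, size_t n_tiles)
{
    assert(src != NULL || n_tiles == 0);
    float buf[2][TILE_ELEMS];
    float total = 0.0f;
    int cur = 0;

    if (n_tiles == 0)
        return 0.0f;

    fetch_tile(buf[cur], src, TILE_ELEMS);               /* prime the first buffer */
    for (size_t t = 0; t < n_tiles; ++t) {
        int next = 1 - cur;
        if (t + 1 < n_tiles)                             /* start fetching tile t+1 */
            fetch_tile(buf[next], src + (t + 1) * TILE_ELEMS, TILE_ELEMS);
        /* With real DMA, wait here for the transfer into buf[cur] to finish. */
        for (size_t i = 0; i < TILE_ELEMS; ++i)          /* compute on tile t */
            total += buf[cur][i];
        cur = next;                                      /* swap buffer roles */
    }
    return total;
}
```

With true asynchronous transfers, the fetch of tile t+1 proceeds while the inner loop works on tile t, so latency is hidden whenever compute time per tile is at least as long as transfer time. Irregular access patterns are harder because the address of the next tile may not be known early enough to prefetch it.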
Linear Speed-up Across SPEs
Results
Frames per Second (Normalized to 2.4 GHz Opteron)
2.4 GHz x86 7.2 3.0 2.5
2.4 GHz SPE 7.4 (+3%) 2.6 (-13%) 1.9 (-24%)
2.4 GHz Cell 58.1 (8x) 20 (6.6x) 16.2 (6.4x)
2.4 GHz Dual Cell 110.9 (15.4x) 37.3 (12.4x) 30.6 (12.2x)
3.2 GHz Cell 67.8 (9.4x) 23.2 (7.7x) 18.9 (7.5x)
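The multipliers in parentheses follow from dividing each entry by the x86 entry in the same column, and the same numbers give a rough scaling check. The sketch below assumes that the "2.4 GHz Cell" row uses all 8 SPEs (taken from the "Mapping to 8 SPEs" slide, not stated in the table) and that the "2.4 GHz SPE" row is a single SPE. The function names are hypothetical.

```c
#include <assert.h>

/* Speed-up of a measured frame rate relative to a baseline frame rate. */
double speedup(double fps, double baseline_fps)
{
    assert(baseline_fps > 0.0);
    return fps / baseline_fps;
}

/* Fraction of ideal linear scaling achieved when going from 1 to n units. */
double parallel_efficiency(double fps_n_units, double fps_one_unit, int n_units)
{
    assert(fps_one_unit > 0.0 && n_units > 0);
    return fps_n_units / (fps_one_unit * (double)n_units);
}
```

For the first column, `speedup(58.1, 7.2)` is about 8.07, matching the "8x" shown, and `parallel_efficiency(58.1, 7.4, 8)` is about 0.98, which is consistent with the near-linear speed-up across SPEs claimed on the preceding slide.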
What is OpenRTRT from Mercury?
• Highly optimized ray tracing rendering engine
• Enables high-quality rendering at interactive frame rates
• Supports large model visualisation
• Complements GPU OpenGL-based rendering
  • Realism and rendering effects
  • Quality and accuracy
  • Capacity for large models
  • Performance scalability with multiple CPUs and clusters
OpenRTRT: Real-Time Ray Tracing
• Recognized as outstanding, breakthrough technology
  • Cutting edge research and dramatic optimizations achieved by U. Saarland and inTrace: cache & data layout optimization, parallelization (SIMD/SSE), multi-threading, distribution…
  • Interactive even on a PC, enough for preparation work for instance
• Scalable performance with multiple CPUs
  • Allows fully interactive visualization
  • Performance depends linearly on the number of pixels, rays and processors
  • Logarithmic in scene size (20 million triangles guaranteed)
• Available for Linux on x86, x86-64, and IA64, and for Windows 32
Background
2000 Start of research at the University of Saarland
2001 Presentation of the first scientific results
2002 Initial projects with the automotive industry. Simulation of Ray Tracing hardware
2003 Foundation of inTrace GmbH. Volkswagen AG as first customer (VR – Lab)
2004 New project visualization center at Wolfsburg based on Ray Tracing. First Ray Tracing hardware prototype
2005 Projects with basically all German car manufacturers: VW, Audi, BMW, DaimlerChrysler + Airbus, Boeing, … First design of fully programmable chip for Ray Tracing
2005 Exclusive agreement for worldwide distribution with Mercury Computer Systems