
  • Cell Broadband Engine

    Programming Handbook

    Including the PowerXCell 8i Processor

    Version 1.11

    May 12, 2008


    Copyright and Disclaimer

    © Copyright International Business Machines Corporation, Sony Computer Entertainment Incorporated, Toshiba Corporation 2006, 2008.

    All Rights Reserved

    Printed in the United States of America May 2008

    IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml

    Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both and is used under license therefrom.

    Intel is a registered trademark of Intel Corporation or its subsidiaries in the United States and other countries.

    Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.

    Linux is a trademark of Linus Torvalds in the United States, other countries, or both.

    UNIX is a registered trademark of The Open Group in the United States and other countries.

    Other company, product, and service names may be trademarks or service marks of others.

    All information contained in this document is subject to change without notice. The products described in this document are NOT intended for use in applications such as implantation, life support, or other hazardous uses where malfunction could result in death, bodily injury, or catastrophic property damage. The information contained in this document does not affect or change IBM product specifications or warranties. Nothing in this document shall operate as an express or implied license or indemnity under the intellectual property rights of IBM or third parties. All information contained in this document was obtained in specific environments, and is presented as an illustration. The results obtained in other operating environments may vary.

    THE INFORMATION CONTAINED IN THIS DOCUMENT IS PROVIDED ON AN “AS IS” BASIS. In no event will IBM be liable for damages arising directly or indirectly from any use of the information contained in this document.

    IBM Systems and Technology Group
    2070 Route 52, Bldg. 330
    Hopewell Junction, NY 12533-6351

    The IBM home page can be found at ibm.com®

    The IBM semiconductor solutions home page can be found at ibm.com/chips



    Contents

    List of Figures

    List of Tables

    Preface
        Related Publications
        Conventions and Notation
        Referencing Registers, Fields, and Bit Ranges
        Terminology
        Reserved Regions of Memory and Registers

    Revision Log

    1. Overview of CBEA Processors
        1.1 Background
            1.1.1 Motivation
            1.1.2 Power, Memory, and Frequency
            1.1.3 Scope of this Handbook
        1.2 Hardware Environment
            1.2.1 The Processor Elements
            1.2.2 Element Interconnect Bus
            1.2.3 Memory Interface Controller
            1.2.4 Cell Broadband Engine Interface Unit
        1.3 Programming Environment
            1.3.1 Instruction Sets
            1.3.2 Storage Domains and Interfaces
            1.3.3 Byte Ordering and Bit Numbering
            1.3.4 Runtime Environment

    2. PowerPC Processor Element
        2.1 PowerPC Processor Unit
        2.2 PowerPC Processor Storage Subsystem
        2.3 PPE Registers
        2.4 PowerPC Instructions
            2.4.1 Data Types
            2.4.2 Addressing Modes
            2.4.3 Instructions
        2.5 Vector/SIMD Multimedia Extension Instructions
            2.5.1 SIMD Vectorization
            2.5.2 Data Types
            2.5.3 Addressing Modes
            2.5.4 Instruction Types
            2.5.5 Instructions
            2.5.6 Graphics Rounding Mode


        2.6 Vector/SIMD Multimedia Extension C/C++ Language Intrinsics
            2.6.1 Vector Data Types
            2.6.2 Vector Literals
            2.6.3 Intrinsics

    3. Synergistic Processor Elements
        3.1 Synergistic Processor Unit
            3.1.1 Local Storage
            3.1.2 Register File
            3.1.3 Execution Units
            3.1.4 Floating-Point Support
        3.2 Memory Flow Controller
            3.2.1 Channels
            3.2.2 Mailboxes and Signalling
            3.2.3 MFC Commands and Command Queues
            3.2.4 Direct Memory Access Controller
            3.2.5 Synergistic Memory Management Unit
        3.3 SPU Instruction Set
            3.3.1 Data Types
            3.3.2 Instructions
        3.4 SPU C/C++ Language Intrinsics
            3.4.1 Vector Data Types
            3.4.2 Vector Literals
            3.4.3 Intrinsics

    4. Virtual Storage Environment
        4.1 Introduction
        4.2 PPE Memory Management
            4.2.1 Memory Management Unit
            4.2.2 Address-Translation Sequence
            4.2.3 Enabling Address Translation
            4.2.4 Effective-to-Real-Address Translation
            4.2.5 Segmentation
            4.2.6 Paging
            4.2.7 Translation Lookaside Buffer
            4.2.8 Real Addressing Mode
            4.2.9 Effective Addresses in 32-Bit Mode
        4.3 SPE Memory Management
            4.3.1 Synergistic Memory Management Unit
            4.3.2 Enabling Address Translation
            4.3.3 Segmentation
            4.3.4 Paging
            4.3.5 Translation Lookaside Buffer
            4.3.6 Real Addressing Mode
            4.3.7 Exception Handling and Storage Protection

    5. Memory Map
        5.1 Introduction
            5.1.1 Configuration-Ring Initialization


            5.1.2 Allocated Regions of Memory
            5.1.3 Reserved Regions of Memory
            5.1.4 The Guarded Attribute
        5.2 PPE Memory Map
            5.2.1 PPE Memory-Mapped Registers
            5.2.2 Predefined Real-Address Locations
        5.3 SPE Memory Map
            5.3.1 SPE Local-Storage Memory Map
            5.3.2 SPE Memory-Mapped Registers
        5.4 BEI Memory-Mapped Registers
            5.4.1 I/O

    6. Cache Management
        6.1 PPE Caches
            6.1.1 Configuration
            6.1.2 Overview of PPE Cache
            6.1.3 L1 Caches
            6.1.4 Branch History Table and Link Stack
            6.1.5 L2 Cache
            6.1.6 Instructions for Managing the L1 and L2 Caches
            6.1.7 Effective-to-Real-Address Translation Arrays
            6.1.8 Translation Lookaside Buffer
            6.1.9 Instruction-Prefetch Queue Management
            6.1.10 Load Subunit Management
        6.2 SPE Caches
            6.2.1 Translation Lookaside Buffer
            6.2.2 Atomic Unit and Cache
        6.3 Replacement Management Tables
            6.3.1 PPE TLB Replacement Management Table
            6.3.2 PPE L2 Replacement Management Table
            6.3.3 SPE TLB Replacement Management Table
        6.4 I/O Address-Translation Caches

    7. I/O Architecture
        7.1 Overview
            7.1.1 I/O Interfaces
            7.1.2 System Configurations
            7.1.3 I/O Addressing
        7.2 Data and Access Types
            7.2.1 Data Lengths and Alignments
            7.2.2 Atomic Accesses
        7.3 Registers and Data Structures
            7.3.1 IOCmd Configuration Register
            7.3.2 I/O Segment Table Origin Register
            7.3.3 I/O Segment Table
            7.3.4 I/O Page Table
            7.3.5 IOC Base Address Registers
            7.3.6 I/O Exception Status Register


        7.4 I/O Address Translation
            7.4.1 Translation Overview
            7.4.2 Translation Steps
        7.5 I/O Exceptions
            7.5.1 I/O Exception Causes
            7.5.2 I/O Exception Status Register
            7.5.3 I/O Exception Mask Register
            7.5.4 I/O-Exception Response
        7.6 I/O Address-Translation Caches
            7.6.1 IOST Cache
            7.6.2 IOPT Cache
        7.7 I/O Storage Model
            7.7.1 Memory Coherence
            7.7.2 Storage-Access Ordering
            7.7.3 I/O Accesses to Other I/O Units through an IOIF
            7.7.4 Examples

    8. Resource Allocation Management
        8.1 Introduction
        8.2 Requesters
            8.2.1 PPE and SPEs
            8.2.2 I/O
        8.3 Managed Resources
        8.4 Tokens
            8.4.1 Tokens Required for Single-CBEA-Processor Systems
            8.4.2 Operations Requiring No Token
            8.4.3 Tokens Required for Multi-CBEA-Processor Systems
        8.5 Token Manager
            8.5.1 Request Tracking
            8.5.2 Token Granting
            8.5.3 Unallocated RAG
            8.5.4 High-Priority Token Requests
            8.5.5 Memory Tokens
            8.5.6 I/O Tokens
            8.5.7 Unused Tokens
            8.5.8 Memory Banks, IOIF Allocation Rates, and Unused Tokens
            8.5.9 Token Request and Grant Example
            8.5.10 Allocation Percentages
            8.5.11 Efficient Determination of TKM Priority Register Values
            8.5.12 Feedback from Resources to Token Manager
        8.6 Configuration of PPE, SPEs, MIC, and IOC
            8.6.1 Configuration Register Summary
            8.6.2 SPE Address-Range Checking
        8.7 Changing Resource-Management Registers with MMIO Stores
            8.7.1 Changes to the RAID
            8.7.2 Changing a Requester’s Token-Request Enable
            8.7.3 Changing a Requester’s Address Map
            8.7.4 Changing a Requester’s Use of Multiple Tokens per Access


    8.7.5 Changing Feedback to the TKM .......................................................................................... 2368.7.6 Changing TKM Registers .................................................................................................... 236

    8.8 Latency Between Token Requests and Token Grants .................................................................. 2378.9 Hypervisor Interfaces .................................................................................................................... 237

    9. PPE Interrupts ......................................................................................................... 2399.1 Introduction ................................................................................................................................... 2399.2 Summary of Interrupt Architecture ................................................................................................ 2409.3 Interrupt Registers ......................................................................................................................... 2449.4 Interrupt Handling .......................................................................................................................... 2459.5 Interrupt Vectors and Definitions ................................................................................................... 246

    9.5.1 System Reset Interrupt (Selectable or x‘00..00000100’) ..................................................... 2489.5.2 Machine Check Interrupt (x‘00..00000200’) ......................................................................... 2499.5.3 Data Storage Interrupt (x‘00..00000300’) ............................................................................ 2519.5.4 Data Segment Interrupt (x‘00..00000380’) .......................................................................... 2529.5.5 Instruction Storage Interrupt (x‘00..00000400’) ................................................................... 2539.5.6 Instruction Segment Interrupt (x‘00..00000480’) ................................................................. 2549.5.7 External Interrupt (x‘00..00000500’) .................................................................................... 2549.5.8 Alignment Interrupt (x‘00..00000600’) ................................................................................. 2559.5.9 Program Interrupt (x‘00..00000700’) .................................................................................... 2569.5.10 Floating-Point Unavailable Interrupt (x‘00..00000800’) ..................................................... 2579.5.11 Decrementer Interrupt (x‘00..00000900’) ........................................................................... 2579.5.12 Hypervisor Decrementer Interrupt (x‘00..00000980’) ........................................................ 2589.5.13 System Call Interrupt (x‘00..00000C00’) ............................................................................ 2589.5.14 Trace Interrupt (x‘00..00000D00’) ...................................................................................... 2599.5.15 VXU Unavailable Interrupt (x‘00..00000F20’) .................................................................... 
2609.5.16 System Error Interrupt (x‘00..00001200’) .......................................................................... 2609.5.17 Maintenance Interrupt (x‘00..00001600’) ........................................................................... 2619.5.18 Thermal Management Interrupt (x‘00..00001800’) ............................................................ 263

    9.6 Direct External Interrupts .......... 265
    9.6.1 Interrupt Presentation .......... 265
    9.6.2 IIC Interrupt Registers .......... 266
    9.6.3 SPU and MFC Interrupts .......... 271
    9.6.4 Other External Interrupts .......... 272

    9.7 Mediated External Interrupts .......... 276
    9.7.1 Mediated External Interrupt Architecture .......... 276
    9.7.2 Mediated External Interrupt Implementation .......... 279

    9.8 SPU and MFC Interrupts Routed to the PPE .......... 280
    9.8.1 Interrupt Types and Classes .......... 280
    9.8.2 Interrupt Registers .......... 282
    9.8.3 Interrupt Definitions .......... 286
    9.8.4 Handling SPU and MFC Interrupts .......... 289

    9.9 Thread Targets for Interrupts .......... 291
    9.10 Interrupt Priorities .......... 291
    9.11 Interrupt Latencies .......... 293
    9.12 Machine State Register Settings Due to Interrupts .......... 293
    9.13 Interrupts and Hypervisor .......... 295
    9.14 Interrupts and Multithreading .......... 295

    9.15 Checkstop .......... 295
    9.16 Use of an External Interrupt Controller .......... 296
    9.17 Relationship Between CBEA Processor and PowerPC Interrupts .......... 296

    10. PPE Multithreading .......... 299
    10.1 Multithreading Guidelines .......... 299
    10.2 Thread Resources .......... 301

    10.2.1 Registers .......... 301
    10.2.2 Arrays, Queues, and Other Structures .......... 302
    10.2.3 Pipeline Sharing and Support for Multithreading .......... 303

    10.3 Thread States .......... 305
    10.3.1 Privilege States .......... 305
    10.3.2 Suspended or Enabled State .......... 306
    10.3.3 Blocked or Stalled State .......... 306

    10.4 Thread Control and Status Registers .......... 306
    10.4.1 Machine State Register (MSR) .......... 307
    10.4.2 Hardware Implementation Register 0 (HID0) .......... 308
    10.4.3 Logical Partition Control Register (LPCR) .......... 309
    10.4.4 Control Register (CTRL) .......... 310
    10.4.5 Thread Status Register Local and Remote (TSRL and TSRR) .......... 311
    10.4.6 Thread Switch Control Register (TSCR) .......... 312
    10.4.7 Thread Switch Time-Out Register (TTR) .......... 313

    10.5 Thread Priority .......... 313
    10.5.1 Thread-Priority Combinations .......... 313
    10.5.2 Choosing Useful Thread Priorities .......... 314
    10.5.3 Examples of Priority Combinations on Instruction Scheduling .......... 316

    10.6 Thread Control and Configuration .......... 319
    10.6.1 Resuming and Suspending Threads .......... 319
    10.6.2 Setting the Instruction-Dispatch Policy: Thread Priority and Temporary Stalling .......... 319
    10.6.3 Preventing Starvation: Forward-Progress Monitoring .......... 321
    10.6.4 Multithreading Operating-State Switch .......... 322

    10.7 Pipeline Events and Instruction Dispatch .......... 322
    10.7.1 Instruction-Dispatch Rules .......... 322
    10.7.2 Pipeline Events that Stall Instruction Dispatch .......... 323

    10.8 Suspending and Resuming Threads .......... 325
    10.8.1 Suspending a Thread .......... 325
    10.8.2 Resuming a Thread .......... 325
    10.8.3 Exception and Interrupt Interactions With a Suspended Thread .......... 327
    10.8.4 Thread Targets and Behavior for Interrupts .......... 328

    11. Logical Partitions and a Hypervisor .......... 331
    11.1 Introduction .......... 331

    11.1.1 The Hypervisor and the Operating Systems .......... 332
    11.1.2 Partitioning Resources .......... 332
    11.1.3 An Example Flowchart .......... 334

    11.2 PPE Logical-Partitioning Facilities .......... 336
    11.2.1 Enabling Hypervisor State .......... 336
    11.2.2 Hypervisor-State Registers .......... 336

    11.2.3 Controlling Real Memory .......... 337
    11.2.4 Controlling Interrupts and Environment .......... 343

    11.3 SPE Logical-Partitioning Facilities .......... 346
    11.3.1 Access Privilege .......... 346
    11.3.2 Memory-Management Facilities .......... 347
    11.3.3 Controlling Interrupts .......... 349
    11.3.4 Other SPE Management Facilities .......... 349

    11.4 I/O-Address Translation .......... 351
    11.4.1 IOC Memory Management Units .......... 351
    11.4.2 I/O Segment and Page Tables .......... 351

    11.5 Resource Allocation Management .......... 352
    11.5.1 Combining Logical Partitions with Resource Allocation .......... 352
    11.5.2 Resource Allocation Groups and the Token Manager .......... 352

    11.6 Power Management .......... 353
    11.6.1 Entering Low-Power States .......... 353
    11.6.2 Thread State Suspension and Resumption .......... 353

    11.7 Fault Isolation .......... 354
    11.8 Code Sample .......... 354

    11.8.1 Error Codes and Hypervisor-Call (hcall) Tokens .......... 354
    11.8.2 C Functions for PowerPC 64-bit ELF Hypervisor Call .......... 354

    12. SPE Context Switching .......... 357
    12.1 Introduction .......... 357
    12.2 Data Structures .......... 358

    12.2.1 Local Storage Context Save Area .......... 358
    12.2.2 Context Save Area .......... 358

    12.3 Overview of SPE Context-Switch Sequence .......... 358
    12.3.1 Save SPE Context .......... 360
    12.3.2 Restore SPE Context .......... 360

    12.4 Implementation Considerations .......... 362
    12.4.1 Locking .......... 362
    12.4.2 Watchdog Timers .......... 362
    12.4.3 Waiting for Events .......... 362
    12.4.4 PPE’s SPU Channel Access Facility .......... 362
    12.4.5 SPE Interrupts .......... 362
    12.4.6 Suspending the MFC DMA Queue .......... 363
    12.4.7 SPE Context-Save Sequence and Context-Restore Sequence Code .......... 363
    12.4.8 SPE Parameter Passing .......... 363
    12.4.9 Storage for SPE Context-Save Sequence and Context-Restore Sequence Code .......... 363
    12.4.10 Harvesting an SPE .......... 364
    12.4.11 Scheduling .......... 364
    12.4.12 Light-Weight SPE Context Save .......... 364

    12.5 Detailed Steps for SPE Context Switch .......... 365
    12.5.1 Context-Save Sequence .......... 365
    12.5.2 Context-Restore Sequence .......... 371

    12.6 Considerations for Hypervisors ................................................................................................... 379

    13. Time Base and Decrementers .......... 381
    13.1 Introduction .......... 381
    13.2 Time-Base Facility .......... 381

    13.2.1 Clock Domains .......... 381
    13.2.2 Time-Base Registers .......... 382
    13.2.3 Time-Base Frequency .......... 383
    13.2.4 Time-Base Sync Mode Controls .......... 384
    13.2.5 Reading and Writing the TB Register .......... 388
    13.2.6 Computing Time-of-Day .......... 389

    13.3 Decrementers .......... 389
    13.3.1 PPE Decrementers .......... 389
    13.3.2 SPE Decrementers .......... 390
    13.3.3 Using an SPU Decrementer to Monitor SPU Code Performance .......... 391

    14. Objects, Executables, and SPE Loading .......... 397
    14.1 Introduction .......... 397
    14.2 ELF Overview and Extensions .......... 398

    14.2.1 Overview .......... 398
    14.2.2 SPE-ELF Extensions .......... 399

    14.3 Runtime Initializations and Requirements .......... 401
    14.3.1 PPE Initial Machine State .......... 401
    14.3.2 SPE Initial Machine State for Linux .......... 405

    14.4 Linker Requirements .......... 407
    14.4.1 SPE Linker Requirements .......... 407
    14.4.2 PPE Linker Requirements .......... 408

    14.5 The CESOF Format .......... 408
    14.5.1 CESOF Overview .......... 409
    14.5.2 CESOF Use Convention of ELF .......... 409
    14.5.3 Embedding an SPE-ELF Executable in a PPE-ELF Object: The .spu.elf Section .......... 410
    14.5.4 The spe_program_handle Data Structure .......... 411
    14.5.5 The TOE: Accessing Symbol Values Defined in EA Space .......... 413
    14.5.6 Future Software Tool Chain Enhancements for CESOF .......... 417

    14.6 SPE Runtime Loader .......... 418
    14.6.1 Runtime Loader Overview .......... 418
    14.6.2 SPE Runtime Loader Requirements .......... 419
    14.6.3 Example SPE Runtime Loader Framework Definition .......... 421

    14.7 SPE Execution Environment .......... 427
    14.7.1 Signal Types for the SPE Stop-and-Signal Instruction .......... 427

    15. Power and Thermal Management .......... 429
    15.1 Power Management .......... 429

    15.1.1 Slow State .......... 430
    15.1.2 PPE Pause (0) State .......... 431
    15.1.3 SPU Pause State .......... 432
    15.1.4 MFC Pause State .......... 432

    15.2 Thermal Management .......... 432
    15.2.1 Thermal-Management Operation .......... 433
    15.2.2 Configuration-Ring Settings .......... 435
    15.2.3 Thermal Registers .......... 435

    15.2.4 Thermal Sensor Status Registers .......... 435
    15.2.5 Thermal Sensor Interrupt Registers .......... 436
    15.2.6 Dynamic Thermal-Management Registers .......... 438

    16. Performance Monitoring .......... 443
    16.1 How It Works .......... 444
    16.2 Events (Signals) .......... 444
    16.3 Performance Counters .......... 444
    16.4 Trace Array .......... 445

    17. SPE Channel and Related MMIO Interface .......... 447
    17.1 Introduction .......... 447

    17.1.1 An SPE’s Use of its Own Channels .......... 447
    17.1.2 Access to Channel Functions by the PPE and other SPEs .......... 448
    17.1.3 Channel Characteristics .......... 448
    17.1.4 Channel Summary .......... 449
    17.1.5 Channel Instructions .......... 452
    17.1.6 Channel Capacity and Blocking .......... 453

    17.2 SPU Event-Management Channels .......... 453
    17.3 SPU Signal-Notification Channels .......... 454
    17.4 SPU Decrementer .......... 454

    17.4.1 SPU Write Decrementer Channel .......... 454
    17.4.2 SPU Read Decrementer Channel .......... 455

    17.5 MFC Write Multisource Synchronization Request Channel .......... 455
    17.6 SPU Read Machine Status Channel .......... 456
    17.7 SPU Write State Save-and-Restore Channel .......... 456
    17.8 SPU Read State Save-and-Restore Channel .......... 457
    17.9 MFC Command Parameter Channels .......... 457

    17.9.1 MFC Local Storage Address Channel .......... 459
    17.9.2 MFC Effective Address High Channel .......... 460
    17.9.3 MFC Effective Address Low or List Address Channel .......... 460
    17.9.4 MFC Transfer Size or List Size Channel .......... 461
    17.9.5 MFC Command Tag Identification Channel .......... 462
    17.9.6 MFC Class ID and MFC Command Opcode Channel .......... 463

    17.10 MFC Tag-Group Management Channels .......... 463
    17.10.1 MFC Write Tag-Group Query Mask Channel .......... 464
    17.10.2 MFC Read Tag-Group Query Mask Channel .......... 464
    17.10.3 MFC Write Tag Status Update Request Channel .......... 464
    17.10.4 MFC Read Tag-Group Status Channel .......... 466
    17.10.5 MFC Read List Stall-and-Notify Tag Status Channel .......... 466
    17.10.6 MFC Write List Stall-and-Notify Tag Acknowledgment Channel .......... 467

    17.11 MFC Read Atomic Command Status Channel .......... 468
    17.12 SPU Mailbox Channels .......... 469

    18. SPE Events .......... 471
    18.1 Introduction .......... 471
    18.2 Events and Event-Management Channels .......... 472

    18.2.1 Event Conditions and Bit Definitions for Event-Management Channels ............................ 472

    18.2.2 Pending Event Register (Internal, SPE-Hidden) .......... 473
    18.2.3 SPU Read Event Status .......... 474
    18.2.4 SPU Write Event Mask .......... 475
    18.2.5 SPU Write Event Acknowledgment .......... 475
    18.2.6 SPU Read Event Mask .......... 476

    18.3 SPU Interrupt Facility .......... 476
    18.4 Interrupt Address Save-and-Restore Channels .......... 477

    18.4.1 SPU Read State Save-and-Restore .......... 477
    18.4.2 SPU Write State Save-and-Restore .......... 477
    18.4.3 Nested Interrupts Using SPU Write State Save-and-Restore .......... 477

    18.5 Event-Handling Protocols .......... 478
    18.5.1 Synchronous Event Handling Using Polling or Stalling .......... 478
    18.5.2 Asynchronous Event Handling Using Interrupts .......... 479
    18.5.3 Protecting Critical Sections from Interruption .......... 480

    18.6 Event-Specific Handling Guidelines .......... 481
    18.6.1 Protocol with Multiple Events Enabled .......... 481
    18.6.2 Procedure for Handling the Multisource Synchronization Event .......... 483
    18.6.3 Procedure for Handling the Privileged Attention Event .......... 484
    18.6.4 Procedure for Handling the Lock-Line Reservation Lost Event .......... 485
    18.6.5 Procedure for Handling the Signal-Notification 1 Available Event .......... 486
    18.6.6 Procedure for Handling the Signal-Notification 2 Available Event .......... 487
    18.6.7 Procedure for Handling the SPU Write Outbound Mailbox Available Event .......... 488
    18.6.8 Procedure for Handling the SPU Write Outbound Interrupt Mailbox Available Event .......... 489
    18.6.9 Procedure for Handling the SPU Decrementer Event .......... 489
    18.6.10 Procedure for Handling the SPU Read Inbound Mailbox Available Event .......... 491
    18.6.11 Procedure for Handling the MFC SPU Command Queue Available Event .......... 492
    18.6.12 Procedure for Handling the DMA List Command Stall-and-Notify Event .......... 492
    18.6.13 Procedure for Handling the Tag-Group Status Update Event .......... 494

    18.7 Developing a Basic Interrupt Handler .......... 495
    18.7.1 Basic Interrupt Protocol Features and Design .......... 495
    18.7.2 FLIH Design .......... 496
    18.7.3 SLIH Design and Registering SLIH Functions .......... 498
    18.7.4 Example Application Code .......... 500

    18.8 Nested Interrupt Handling .......... 501
    18.8.1 Nested Handler Design .......... 502
    18.8.2 FLIH Design for Nested Interrupts .......... 502

    18.9 Using a Dedicated Interrupt Stack .......... 504
    18.10 Sample Applications .......... 506

    18.10.1 SPU Decrementer Event ...... 506
    18.10.2 Tag-Group Status Update Event ...... 507
    18.10.3 DMA List Command Stall-and-Notify Event ...... 508
    18.10.4 MFC SPU Command Queue Available Event ...... 510
    18.10.5 SPU Read Inbound Mailbox Available Event ...... 511
    18.10.6 SPU Signal-Notification Available Event ...... 511
    18.10.7 Lock-Line Reservation Lost Event ...... 511
    18.10.8 Privileged Attention Event ...... 512

    19. DMA Transfers and Interprocessor Communication ...... 513
    19.1 Introduction ...... 513
    19.2 MFC Commands ...... 514

    19.2.1 DMA Commands ...... 516
    19.2.2 DMA List Commands ...... 518
    19.2.3 Synchronization Commands ...... 518
    19.2.4 Command Modifiers ...... 519
    19.2.5 Tag Groups ...... 519
    19.2.6 MFC Command Issue ...... 521
    19.2.7 Replacement Class ID and Transfer Class ID ...... 521
    19.2.8 DMA-Command Completion ...... 522

    19.3 PPE-Initiated DMA Transfers ...... 523
    19.3.1 MFC Command Issue ...... 523
    19.3.2 MFC Command-Queue Control Registers ...... 525
    19.3.3 DMA-Command Issue Status and Errors ...... 525

    19.4 SPE-Initiated DMA Transfers ...... 529
    19.4.1 MFC Command Issue ...... 530
    19.4.2 MFC Command-Queue Monitoring Channels ...... 531
    19.4.3 DMA Command Issue Status and Errors ...... 532
    19.4.4 DMA List Command Example ...... 536

    19.5 Performance Guidelines for MFC Commands ...... 539
    19.6 Mailboxes ...... 539

    19.6.1 Reading and Writing Mailboxes ...... 540
    19.6.2 Mailbox Blocking ...... 541
    19.6.3 Dealing with Anticipated Messages ...... 541
    19.6.4 Uses of Mailboxes ...... 541
    19.6.5 SPU Outbound Mailboxes ...... 542
    19.6.6 SPU Inbound Mailbox ...... 547

    19.7 Signal Notification ...... 551
    19.7.1 SPU Signalling Channels ...... 551
    19.7.2 Uses of Signaling ...... 552
    19.7.3 Mode Configuration ...... 552
    19.7.4 SPU Signal Notification 1 Channel ...... 553
    19.7.5 SPU Signal Notification 2 Channel ...... 553
    19.7.6 Sending Signals ...... 553
    19.7.7 Receiving Signals ...... 556
    19.7.8 Differences Between Mailboxes and Signal Notification ...... 559

    20. Shared-Storage Synchronization ...... 561
    20.1 Shared-Storage Ordering ...... 561

    20.1.1 Storage Model ...... 561
    20.1.2 PPE Ordering Instructions ...... 564
    20.1.3 SPU Ordering Instructions ...... 568
    20.1.4 MFC Ordering Mechanisms ...... 572
    20.1.5 MFC Multisource Synchronization Facility ...... 577
    20.1.6 Scenarios for Using Ordering Mechanisms ...... 584

    20.2 PPE Atomic Synchronization ...... 585
    20.2.1 Atomic Synchronization Instructions ...... 585

    20.2.2 PPE Synchronization Primitives ...... 587
    20.2.3 SPE Synchronization Primitives ...... 590

    20.3 SPE Atomic Synchronization ...... 597
    20.3.1 MFC Commands for Atomic Updates ...... 597
    20.3.2 The MFC Read Atomic Command Status Channel ...... 599
    20.3.3 Avoiding Livelocks ...... 599
    20.3.4 Synchronization Primitives ...... 601

    21. Parallel Programming ...... 609
    21.1 Challenges ...... 609
    21.2 Patterns of Parallel Programming ...... 609

    21.2.1 Terminology ...... 610
    21.2.2 Finding Parallelism ...... 611
    21.2.3 Strategies for Parallel Programming ...... 612

    21.3 Steps for Parallelizing a Program ...... 614
    21.3.1 Step 1: Understand the Problem ...... 614
    21.3.2 Step 2: Choose Programming Tools and Technology ...... 614
    21.3.3 Step 3: Develop High-Level Parallelization Strategy ...... 615
    21.3.4 Step 4: Develop Low-Level Parallelization Strategy ...... 615
    21.3.5 Step 5: Design Data Structures for Efficient Processing ...... 615
    21.3.6 Step 6: Iterate and Refine ...... 616
    21.3.7 Step 7: Fine-Tune ...... 616

    21.4 Levels of Parallelism in the CBEA Processors ...... 617
    21.4.1 SIMD Parallelization ...... 618
    21.4.2 Superscalar Parallelization ...... 618
    21.4.3 Hardware Multithreading ...... 618
    21.4.4 Multiple Execution Units ...... 618
    21.4.5 Multiple CBEA Processors ...... 619

    21.5 Tools for Parallelization ...... 620
    21.5.1 Language Extensions: Intrinsics and Directives ...... 620
    21.5.2 Compiler Support for Single Shared-Memory Abstraction ...... 621
    21.5.3 OpenMP Directives ...... 621
    21.5.4 Compiler-Controlled Software Cache ...... 623
    21.5.5 Compiler and Runtime Support for Code Partitioning ...... 626
    21.5.6 Thread Library ...... 627

    22. SIMD Programming ...... 629
    22.1 SIMD Basics ...... 629

    22.1.1 Converting Scalar Data to SIMD Data ...... 630
    22.1.2 Approaching SIMD Coding Methodically ...... 634
    22.1.3 Coding for Effective Auto-SIMDization ...... 645

    22.2 Auto-SIMDizing Compilers ...... 647
    22.2.1 Motivation and Challenges ...... 648
    22.2.2 Examples of Invalid and Valid SIMDization ...... 650

    22.3 SIMDization Framework for a Compiler ...... 654
    22.3.1 Phase 1: Basic-Block Aggregation ...... 656
    22.3.2 Phase 2: Short-Loop Aggregation ...... 656
    22.3.3 Phase 3: Loop-Level Aggregation ...... 657
    22.3.4 Phase 4: Alignment Devirtualization ...... 658

    22.3.5 Phase 5: Length Devirtualization ...... 663
    22.3.6 Phase 6: SIMD Code Generation and Instruction Scheduling ...... 664
    22.3.7 SIMDization Example: Multiple Sources of SIMD Parallelism ...... 665
    22.3.8 SIMDization Example: Multiple Data Lengths ...... 668
    22.3.9 Vector Operations and Mixed-Mode SIMDization ...... 673

    22.4 Other Compiler Optimizations ...... 674
    22.4.1 OpenMP ...... 674
    22.4.2 Subword Data Types ...... 674
    22.4.3 Backend Scheduling for SPEs ...... 675
    22.4.4 Interacting with Typical Optimizations ...... 676

    23. Vector/SIMD Multimedia Extension and SPU Programming ...... 679
    23.1 Architectural Differences ...... 679

    23.1.1 Registers ...... 680
    23.1.2 Data Types ...... 681
    23.1.3 Instruction-Set Differences ...... 682

    23.2 Porting SIMD Code from the PPE to the SPEs ...... 684
    23.2.1 Code-Mapping Considerations ...... 684
    23.2.2 Simple Macro Translation ...... 685
    23.2.3 Full Functional Mapping ...... 688
    23.2.4 Code-Portability Typedefs ...... 689
    23.2.5 Compiler-Target Definition ...... 689

    24. SPE Programming Tips ...... 691
    24.1 DMA Transfers ...... 691

    24.1.1 Initiating DMA Transfers from SPEs ...... 692
    24.1.2 Overlapping DMA Transfers and Computation ...... 692
    24.1.3 DMA Transfers and LS Accesses ...... 697

    24.2 SPU Pipelines and Dual-Issue Rules ...... 698
    24.3 Eliminating and Predicting Branches ...... 699

    24.3.1 Function-Inlining and Loop-Unrolling ...... 700
    24.3.2 Predication Using Select-Bits Instruction ...... 700
    24.3.3 Branch Hints ...... 701
    24.3.4 Program-Based Branch Prediction ...... 705
    24.3.5 Profile or Linguistic Branch-Prediction ...... 706
    24.3.6 Software Branch-Target Address Cache ...... 707
    24.3.7 Using Control Flow to Record Branch History ...... 708

    24.4 Loop Unrolling and Pipelining ...... 709
    24.5 Offset Pointers ...... 712
    24.6 Transformations and Table Lookups ...... 712

    24.6.1 The Shuffle-Bytes Instruction ...... 712
    24.6.2 Fast SIMD 8-Bit Table Lookups ...... 713

    24.7 Integer Multiplies ...... 716
    24.8 Scalar Code ...... 716

    24.8.1 Scalar Loads and Stores ...... 716
    24.8.2 Promoting Scalar Data Types to Vector Data Types ...... 718

    24.9 Unaligned Loads ...... 718

    Appendix A. PPE Instruction Set and Intrinsics ...... 723
    A.1 PowerPC Instruction Set ...... 723

    A.1.1 Data Types ...... 723
    A.1.2 PPE Instructions ...... 723
    A.1.3 Microcoded Instructions ...... 733

    A.2 PowerPC Extensions in the PPE ...... 740
    A.2.1 New PowerPC Instructions ...... 740
    A.2.2 Implementation-Dependent Interpretation of PowerPC Instructions ...... 743
    A.2.3 Optional PowerPC Instructions Implemented ...... 746
    A.2.4 PowerPC Instructions Not Implemented ...... 747
    A.2.5 Endian Support ...... 747

    A.3 Vector/SIMD Multimedia Extension Instructions ...... 748
    A.3.1 Data Types ...... 748
    A.3.2 Vector/SIMD Multimedia Extension Instructions ...... 748
    A.3.3 Graphics Rounding Mode ...... 752

    A.4 C/C++ Language Extensions (Intrinsics) for Vector/SIMD Multimedia Extensions ...... 754
    A.4.1 Vector Data Types ...... 754
    A.4.2 Vector Literals ...... 755
    A.4.3 Intrinsics ...... 756

    A.5 Issue Rules ...... 760
    A.6 Pipeline Stages ...... 762

    A.6.1 Instruction-Unit Pipeline ...... 762
    A.6.2 Vector/Scalar Unit Issue Queue ...... 764
    A.6.3 Stall and Flush Points ...... 765

    A.7 Compiler Optimizations ...... 767
    A.7.1 Instruction Arrangement ...... 767
    A.7.2 Avoiding Slow Instructions and Processor Modes ...... 767
    A.7.3 Avoiding Dependency Stalls and Flushes ...... 768
    A.7.4 General Recommendations ...... 770

    Appendix B. SPU Instruction Set and Intrinsics ...... 771
    B.1 SPU Instruction Set ...... 771

    B.1.1 Data Types ...... 771
    B.1.2 Instructions ...... 771
    B.1.3 Fetch and Issue Rules ...... 779
    B.1.4 Inline Prefetch and Instruction Runout ...... 783

    B.2 C/C++ Language Extensions (Intrinsics) for SPU Instructions ...... 784
    B.2.1 Vector Data Types ...... 784
    B.2.2 Vector Literals ...... 786
    B.2.3 Intrinsics ...... 787
    B.2.4 Inline Assembly ..........................