![Page 1: Programming on IBM Cell Triblade](https://reader036.vdocuments.us/reader036/viewer/2022062301/56814558550346895db228fe/html5/thumbnails/1.jpg)
Programming on IBM Cell Triblade
Jagan Jayaraj, Pei-Hung Lin, Mike Knox and Paul Woodward
University of Minnesota
April 1, 2009
![Page 2: Programming on IBM Cell Triblade](https://reader036.vdocuments.us/reader036/viewer/2022062301/56814558550346895db228fe/html5/thumbnails/2.jpg)
Rayleigh–Taylor instability
•An instability of the interface between two fluids of different densities, which occurs when the lighter fluid pushes the heavier fluid.
•The R-T instability simulation is implemented with the multifluid Piecewise-Parabolic Method (PPM).
•The program is written in Fortran.
![Page 3: Programming on IBM Cell Triblade](https://reader036.vdocuments.us/reader036/viewer/2022062301/56814558550346895db228fe/html5/thumbnails/3.jpg)
TriBlade
▫Two QS22 blades, each with 2 PowerXCell 8i CPUs
▫LS21 blade with two dual-core AMD Opterons
▫16GB memory for LS21 and 8GB memory for QS22
![Page 4: Programming on IBM Cell Triblade](https://reader036.vdocuments.us/reader036/viewer/2022062301/56814558550346895db228fe/html5/thumbnails/4.jpg)
![Page 5: Programming on IBM Cell Triblade](https://reader036.vdocuments.us/reader036/viewer/2022062301/56814558550346895db228fe/html5/thumbnails/5.jpg)
LCSE Cell Cluster
•6 Triblades
•4 QS22 Cell blades
•2 QS20 Cell blades
•4 AMD Quadcore Systems
![Page 6: Programming on IBM Cell Triblade](https://reader036.vdocuments.us/reader036/viewer/2022062301/56814558550346895db228fe/html5/thumbnails/6.jpg)
![Page 7: Programming on IBM Cell Triblade](https://reader036.vdocuments.us/reader036/viewer/2022062301/56814558550346895db228fe/html5/thumbnails/7.jpg)
Login instructions
•Account credentials should be in your email.
•Guest account: lcse / lcse$ncsa!
•Login steps:
▫SSH to frodo.lcse.umn.edu
▫Once logged in to frodo, SSH to an assigned Cell Processor host
 AMD – rra001a ~ rra006a
 Cell – rra001b / rra001c ~ rra006b / rra006c
![Page 8: Programming on IBM Cell Triblade](https://reader036.vdocuments.us/reader036/viewer/2022062301/56814558550346895db228fe/html5/thumbnails/8.jpg)
Software available
•Cell SDK 3.1
•OpenMPI 1.3
•DaCS Fortran bindings
•Compilers
▫AMD: gfortran, gcc 4.1.2
▫PPU: ppuxlf, ppu-gcc
▫SPU: spuxlf, spu-gcc
•Example code is available in /mnt/scratch/NCSA_Example
![Page 9: Programming on IBM Cell Triblade](https://reader036.vdocuments.us/reader036/viewer/2022062301/56814558550346895db228fe/html5/thumbnails/9.jpg)
Compilation and Execution
•On AMD node:
▫make ppm4f-x86
•On Cell node:
▫make ppm4f-ppu
•On AMD node:
▫./ppm4f-x86
![Page 10: Programming on IBM Cell Triblade](https://reader036.vdocuments.us/reader036/viewer/2022062301/56814558550346895db228fe/html5/thumbnails/10.jpg)
Triblade programming paradigm
•Three levels of parallelism:
▫within-Cell
▫within-node
▫node-to-node
•Compute-communication overlap:
▫DMA
▫DaCS
▫MPI
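Compute-communication overlap at the DMA level is commonly organized as double buffering: while one local buffer is being computed on, the next chunk of data is fetched into the other. The sketch below shows only that control structure in portable C. The names `fetch_chunk`, `compute_chunk`, `sum_double_buffered` and the value of `CHUNK` are hypothetical, and `memcpy` stands in for an asynchronous DMA transfer, so nothing actually overlaps here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK 4  /* elements per transfer; illustrative value only */

/* Stand-in for an asynchronous DMA "get". On an SPU the transfer would be
   started here and complete in the background; memcpy finishes at once. */
static void fetch_chunk(double *local, const double *remote, size_t n)
{
    memcpy(local, remote, n * sizeof *local);
}

/* Stand-in for the compute kernel: sums one chunk. */
static double compute_chunk(const double *local, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += local[i];
    return s;
}

/* Sum `total` elements of `remote` using two local buffers.
   While buf[cur] is being computed on, buf[nxt] is being filled. */
double sum_double_buffered(const double *remote, size_t total)
{
    double buf[2][CHUNK];
    size_t nchunks = total / CHUNK;
    double sum = 0.0;
    int cur = 0;

    assert(total % CHUNK == 0);  /* the sketch handles whole chunks only */
    if (nchunks == 0)
        return 0.0;

    fetch_chunk(buf[cur], remote, CHUNK);  /* prime the pipeline */
    for (size_t c = 0; c < nchunks; c++) {
        int nxt = 1 - cur;
        if (c + 1 < nchunks)  /* start fetching the next chunk */
            fetch_chunk(buf[nxt], remote + (c + 1) * CHUNK, CHUNK);
        /* real SPU code would wait here for the transfer into buf[cur] */
        sum += compute_chunk(buf[cur], CHUNK);
        cur = nxt;
    }
    return sum;
}
```

Calling `sum_double_buffered(a, 8)` on an array holding 1 through 8 processes two chunks and returns 36. The same pattern can in principle be repeated at the DaCS and MPI levels, with the buffers and transfer calls replaced accordingly.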
![Page 11: Programming on IBM Cell Triblade](https://reader036.vdocuments.us/reader036/viewer/2022062301/56814558550346895db228fe/html5/thumbnails/11.jpg)
Programming for IBM Cell Tri-blade
•Single code for Roadrunner and non-RR systems
◦Using lots of #ifdef, #if, #endif…
◦Using the preprocessor to generate three codes
•Minimize the manual translation for SPU code
◦Using a Fortran to Cell C translator, tedious portions of the SPU code can be translated.
•Fortran codes for PPU and AMD
◦Fortran binding programs for C intrinsic libraries
•Keep memory footprint small
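The single-source approach can be illustrated with a small C sketch in which the preprocessor selects one of three variants of the same file. The macro names `TARGET_AMD`, `TARGET_PPU` and `TARGET_SPU` and both functions are hypothetical and are not taken from the workshop code, which is Fortran; only the `#if`/`#endif` mechanism is the point.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical target macros: exactly one would be set on the compiler
   command line, e.g. -DTARGET_SPU. With none set, fall back to the AMD build. */
#if !defined(TARGET_AMD) && !defined(TARGET_PPU) && !defined(TARGET_SPU)
#define TARGET_AMD 1
#endif

/* Target-specific section: only one branch survives preprocessing. */
const char *target_role(void)
{
#if defined(TARGET_SPU)
    return "SPU: compute";
#elif defined(TARGET_PPU)
    return "PPU: transfer data, manage SPUs";
#else
    return "AMD: I/O, MPI, relay data to Cell";
#endif
}

/* Code shared by all three builds needs no guards. */
int is_compute_target(void)
{
    const char *role = target_role();
    assert(role != NULL);  /* every branch above returns a string literal */
    return strncmp(role, "SPU", 3) == 0;
}
```

Compiling the same file three times, once per macro, would yield the three codes. Keeping the shared logic outside the guards is what limits the amount of duplicated source.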
![Page 12: Programming on IBM Cell Triblade](https://reader036.vdocuments.us/reader036/viewer/2022062301/56814558550346895db228fe/html5/thumbnails/12.jpg)
Build flow (diagram):
•Single Source Code → Preprocessor → SPU Fortran code, PPU Fortran code, AMD Fortran code
•SPU Fortran code → Translation → SPU C code → SPU C Compiler → SPU Executable (embedded)
•PPU Fortran code → PPU Fortran Compiler → PPU Executable
•AMD Fortran code → GNU Fortran Compiler → AMD Executable
•Fortran Binding Programs
![Page 13: Programming on IBM Cell Triblade](https://reader036.vdocuments.us/reader036/viewer/2022062301/56814558550346895db228fe/html5/thumbnails/13.jpg)
Division of labor
▫Define jobs for AMD, PPU and SPU clearly
AMD: I/O, MPI, relay data to Cell…
PPU: Transfer data, manage SPUs
SPU: Just compute
![Page 14: Programming on IBM Cell Triblade](https://reader036.vdocuments.us/reader036/viewer/2022062301/56814558550346895db228fe/html5/thumbnails/14.jpg)
Items to take care of
▫Three codes for three different ISAs
▫Different endianness between PPU and AMD
 Need to do byte-swapping
▫64-bit/32-bit conversion
 SPU supports 32-bit addresses only, but DaCS requires 64-bit address mode
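Both conversions can be sketched in portable C. The first part reverses the byte order of 32-bit values, as needed when data moves between the big-endian PPU and the little-endian AMD side. The second part splits a 64-bit address into two 32-bit words and joins them again; the slides do not say how the address conversion is actually done, so that part is an assumption about one plausible ingredient. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == 4, "sketch assumes 32-bit floats");

/* Reverse the byte order of one 32-bit word. */
uint32_t swap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Byte-swap every float in a buffer in place. The bit pattern is copied
   through a uint32_t with memcpy so no float is reinterpreted by a cast. */
void swap_floats_in_place(float *buf, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t bits;
        memcpy(&bits, &buf[i], sizeof bits);
        bits = swap32(bits);
        memcpy(&buf[i], &bits, sizeof bits);
    }
}

/* Assumed address handling: carry a 64-bit address as two 32-bit halves. */
uint32_t addr_hi(uint64_t ea) { return (uint32_t)(ea >> 32); }
uint32_t addr_lo(uint64_t ea) { return (uint32_t)(ea & 0xFFFFFFFFu); }

uint64_t addr_join(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}
```

Swapping a buffer twice restores the original values, which makes a convenient self-check when wiring the swap into a data relay path.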
![Page 15: Programming on IBM Cell Triblade](https://reader036.vdocuments.us/reader036/viewer/2022062301/56814558550346895db228fe/html5/thumbnails/15.jpg)
Translator
•Fortran to C with Cell extensions
•Needs directives
•Built with ANTLR
•Handles:
▫Vector and scalar loops
▫DMAs (including list DMAs)
▫Variable declarations
▫Conditional vector moves
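A conditional vector move replaces a branch with a mask-and-select over all vector lanes. The portable C sketch below imitates that for four 32-bit lanes. It is a generic illustration of the idea, not the output of the translator, and the name `select_gt` and the constant `VLEN` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == 4, "sketch assumes 32-bit floats");

#define VLEN 4  /* four 32-bit lanes, as in a 128-bit vector register */

/* For each lane: out = (a > b) ? x : y, computed without branching.
   The comparison yields an all-ones or all-zeros mask, and the result
   is assembled from the bit patterns of x and y. */
void select_gt(float *out, const float *a, const float *b,
               const float *x, const float *y)
{
    for (int i = 0; i < VLEN; i++) {
        /* 0u - 1u wraps to 0xFFFFFFFF; 0u - 0u stays 0 */
        uint32_t mask = 0u - (uint32_t)(a[i] > b[i]);
        uint32_t xb, yb, rb;
        memcpy(&xb, &x[i], sizeof xb);
        memcpy(&yb, &y[i], sizeof yb);
        rb = (xb & mask) | (yb & ~mask);
        memcpy(&out[i], &rb, sizeof rb);
    }
}
```

On SIMD hardware the loop body would map to one compare and one select instruction over the whole vector, which is why conditionals written this way vectorize cleanly.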
![Page 16: Programming on IBM Cell Triblade](https://reader036.vdocuments.us/reader036/viewer/2022062301/56814558550346895db228fe/html5/thumbnails/16.jpg)
References
• Woodward, P. R., J. Jayaraj, P.-H. Lin, and P.-C. Yew, “Moving Scientific Codes to Multicore Microprocessor CPUs,” Computing in Science & Engineering, special issue on novel architectures, Nov. 2008, pp. 16-25. Also available at www.lcse.umn.edu/CiSE.
• Woodward, P. R., J. Jayaraj, P.-H. Lin, and D. Porter, “Programming Techniques for Moving Scientific Simulation Codes to Roadrunner,” tutorial given 3/12/08 at Los Alamos, link available at www.lanl.gov/roadrunner/rrtechnicalseminars2008.
• Woodward, P. R., J. Jayaraj, P.-H. Lin, and W. Dai, “First Experience of Compressible Gas Dynamics Simulation on the Los Alamos Roadrunner Machine,” submitted to Concurrency and Computation: Practice and Experience, preprint available at www.lcse.umn.edu/RR-docs.
• http://www.lcse.umn.edu/NCSA_Workshop/