CSE 291
High Performance Interconnect, Fall 2012
University of California, San Diego
Course Information
Instructor
• CK Cheng, [email protected], 858 534-6184
Schedule
• Lectures: 5:00-6:20PM, TTH, CSE 2217
Textbooks
• (H) High Speed Signal Propagation: Advanced Black Magic, Howard Johnson and Martin Graham
• (D) Digital Systems Engineering, William J. Dally and John W. Poulton
Content
• 1. Structure of Interconnect and Packaging
• 2. Electrical and Physical Scaling
• 3. Interconnect Modeling: Wire and Transmission Line Models
• 4. Interconnect Signaling
• 5. Transmitters and Receivers
• 6. Power Distribution Network
• 7. Clock Distribution
• 8. Extraction and Simulation
• 9. Thermal Issues
System Example: Blue Gene/L 2005
Overall View
• 8x8 Racks: 65,536 compute nodes
• 25 kW/Rack
• 2 Midplanes/Rack
• 16 Node cards/Midplane
• 16 Compute cards/Node card
• 2 PUs/Compute card
• 64x32x32 Torus
• 1.4 Gb/s differential link, 700 MHz clock
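The packaging counts above can be cross-checked against the node total and the torus dimensions. The following is a minimal sketch of that arithmetic; the variable and function names are illustrative and not from the slides.

```python
# Packaging hierarchy as listed on the slide, outermost level first.
hierarchy = [
    ("racks", 8 * 8),
    ("midplanes per rack", 2),
    ("node cards per midplane", 16),
    ("compute cards per node card", 16),
    ("PUs per compute card", 2),
]


def total_nodes(levels):
    """Multiply the fan-out at every packaging level."""
    total = 1
    for _name, count in levels:
        total *= count
    return total


# Torus dimensions as listed on the slide.
torus_dims = (64, 32, 32)
torus_nodes = torus_dims[0] * torus_dims[1] * torus_dims[2]

packaged_nodes = total_nodes(hierarchy)
print(packaged_nodes, torus_nodes)
```

Both products come out to 65,536, so the packaging hierarchy and the 64x32x32 torus describe the same number of nodes.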
System Example: Blue Gene/L 2005
Compute card
• 14.3 W/ASIC node: power density 10.4 W/cm²
• 206 mm x 55 mm/compute card
• 14 layers: 6 signal, 8 power
System Example: Blue Gene/L 2005
Air cooling
• 25 kW / (0.91 x 0.91 m²) ≈ 3 W/cm²
• Air displacement 1.4 m³/s, average velocity 6.7 m/s
• Fan speed is optimized individually
• Plenum: θ, β, EMI screen
• Elliptical vane
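The rack power density quoted above follows directly from the rack power and footprint. The sketch below reproduces that number and, as an added assumption not stated on the slide, estimates the effective flow cross-section from volumetric flow = average velocity x area.

```python
# Rack power and footprint from the slide.
rack_power_w = 25_000.0
rack_side_m = 0.91

footprint_cm2 = (rack_side_m * 100.0) ** 2        # 91 cm x 91 cm
power_density_w_cm2 = rack_power_w / footprint_cm2

# Airflow figures from the slide.
airflow_m3_s = 1.4
avg_velocity_m_s = 6.7

# Assumption: uniform flow, so area = volumetric flow / average velocity.
flow_area_m2 = airflow_m3_s / avg_velocity_m_s

print(f"power density  : {power_density_w_cm2:.2f} W/cm^2")
print(f"effective area : {flow_area_m2:.3f} m^2")
```

The effective area is only a rough figure under the uniform-flow assumption; the actual plenum geometry (θ, β) is not given in the slides.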
System Example: Blue Gene/L 2005
Clock
• Length matching, differential pairs with termination
Interconnect
• Pre-emphasis, on-chip termination
• Vdd/Vss noise: 185-100 ps delay
Midplane: reduce longest path between boards
• 18 layers
• 190-215 um width trace at 1.0 ounce copper for 100 ohm differential pairs
• 100 um width trace at 0.5 ounce copper for short wires
System Example: z196
z196 (2012): 45nm tech, released 9/2010
• 96 cores, 5.2 GHz, 770 GB Memory/node
• 3 kW/PU book, 4 PU books/backplane
MCM
• 1 MCM/PU book, 2 kA/MCM
• 6 PCs, 2 Cache/MCM
• 96 x 96 mm², 103 layers, 7,356 pins/MCM
System Example: z196
Water cooling option
• Humidity and atmospheric pressure -> dew point + 6°C
• 3.25 gallons/minute for each processor module
• Lower temperatures -> lower processor power consumption
• No refrigeration compressors
• Air conditioning of the room: energy reduced by a factor of 3
• Save 4 kW/4 PU books
System Example: z196
Power Distribution at ±5% tolerance
• Locate power conversion close to the chip
– DCA -> 40-48V
– Gearboxes -> 1.1V, 17 power domains
– Feedback control
• Redundancy N+2 (N=2); V, I, T sensing for failure detection
• Previous version: 600 W copper losses, 5 ounces metal plane
• Now 400 W (1/3 on copper, 2/3 on power conversion)
• Deep trench capacitor: 25 times density, 15 uF -> 5.2 GHz on chip
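The numbers above can be turned into a small back-of-the-envelope calculation. This is a sketch under stated assumptions: that the ±5% tolerance applies to the 1.1 V rail, and that the 2 kA/MCM figure from the earlier z196 slide is drawn entirely at 1.1 V. Neither is stated explicitly in the slides. It also shows why conversion is placed close to the chip: for the same delivered power and the same path resistance, resistive loss scales with the square of the current.

```python
# Assumption: the +/-5% tolerance is applied to the 1.1 V rail.
nominal_v = 1.1
tolerance = 0.05
v_min = nominal_v * (1.0 - tolerance)
v_max = nominal_v * (1.0 + tolerance)

# Loss split from the slide: 400 W total, 1/3 copper, 2/3 conversion.
total_loss_w = 400.0
copper_loss_w = total_loss_w / 3.0
conversion_loss_w = total_loss_w * 2.0 / 3.0

# Assumption: all 2 kA per MCM is delivered at 1.1 V.
mcm_current_a = 2000.0
mcm_power_w = mcm_current_a * nominal_v

# Same power carried at the 48 V distribution level needs far less current.
distribution_v = 48.0
distribution_current_a = mcm_power_w / distribution_v

# For a fixed path resistance R, loss = I^2 * R, so the ratio is (I_low / I_high)^2.
loss_ratio = (mcm_current_a / distribution_current_a) ** 2

print(f"rail window      : {v_min:.3f} V to {v_max:.3f} V")
print(f"copper loss      : {copper_loss_w:.1f} W")
print(f"conversion loss  : {conversion_loss_w:.1f} W")
print(f"I^2R loss ratio  : {loss_ratio:.0f}x (1.1 V vs 48 V over the same path)")
```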
System Example: z196
Power network impedance evaluation (10 mΩ)
• Set on-off sequence for clock tree to create stimulus pattern
• Measure voltage with probes
• Average 7,864 times, 2M samples for 2 ms interval at 1 GHz sample rate
• Z(f) = V(f)/I(f)
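The procedure above (repeat the stimulus, average the records, then divide spectra) can be sketched as follows. This is an illustration on synthetic data, not the measurement code used for the z196: the record length, the 10 MHz stimulus, the noise level, and all function names are assumptions, and a single-bin DFT stands in for a full FFT to keep the example dependency-free.

```python
import cmath
import math
import random


def dft_bin(samples, freq_hz, sample_rate_hz):
    """Complex amplitude of `samples` at one frequency (single-bin DFT)."""
    acc = 0j
    for k, x in enumerate(samples):
        acc += x * cmath.exp(-2j * math.pi * freq_hz * k / sample_rate_hz)
    return acc / len(samples)


def average_records(records):
    """Point-by-point average of repeated, time-aligned records."""
    count = len(records)
    return [sum(r[k] for r in records) / count for k in range(len(records[0]))]


def impedance(v_records, i_records, freq_hz, sample_rate_hz):
    """Z(f) = V(f) / I(f), computed on the averaged records."""
    v_f = dft_bin(average_records(v_records), freq_hz, sample_rate_hz)
    i_f = dft_bin(average_records(i_records), freq_hz, sample_rate_hz)
    return v_f / i_f


# Synthetic stand-in for the measurement (all values are assumptions).
sample_rate_hz = 1e9      # 1 GHz sample rate, as on the slide
stimulus_hz = 10e6        # hypothetical stimulus frequency
n_samples = 1000          # short record; the slide uses 2M samples
true_z_ohm = 0.010        # 10 mOhm network, purely resistive here
rng = random.Random(0)

i_records, v_records = [], []
for _ in range(16):       # the slide averages 7,864 repetitions
    current = [2.0 * math.sin(2 * math.pi * stimulus_hz * k / sample_rate_hz)
               for k in range(n_samples)]
    voltage = [true_z_ohm * c + rng.gauss(0.0, 0.001) for c in current]
    i_records.append(current)
    v_records.append(voltage)

z = impedance(v_records, i_records, stimulus_hz, sample_rate_hz)
print(f"|Z| at {stimulus_hz / 1e6:.0f} MHz: {abs(z) * 1e3:.2f} mOhm")
```

Averaging before the transform suppresses probe noise that is uncorrelated with the stimulus, which is why the repeated on-off pattern has to be time-aligned across records.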