The Alpha 21364 Network Architecture
By Shubhendu S. Mukherjee, Peter Bannon, Steven Lang, Aaron Spink, and David Webb
Compaq Computer Corporation
Presented by Luis Alfredo Campos

Alpha 21364 Goals
Support communication-intensive server applications
– High performance technical computing
– Database servers
– Web servers
– Telecommunication applications
Achieve:
– Extremely low latency
– Enormous bandwidth
– Support directory cache coherence
Improve:
– Reliability
– Availability

Overview
Alpha 21264 core with enhancements
Tightly-Coupled multiprocessor network
– Connects up to 128 processors
– Two-Dimensional torus network
Integrated L2 Cache
Integrated memory controller
Router
– Directory-Based CC
– Separate Virtual Channels
– Packet Classes

Network Packet Classes
Seven Packet Classes
– Request (3 Flits)
– Forward (3 Flits)
– Block Response (18 or 19 Flits)
– Non-Block Response (2 or 3 Flits)
– Write I/O (19 Flits)
– Read I/O (3 Flits)
– Special (1 or 3 Flits)
Flits Are 32 Bits of Data Plus 7 Bits of ECC
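
To make the packet sizes concrete, here is a minimal Python sketch that converts flit counts into bits on a link. The flit counts and the 32 + 7 bit flit layout come from the slide; the names `PACKET_FLITS` and `packet_bits` are hypothetical and chosen only for illustration.

```python
# Flit layout from the slide: 32 data bits plus 7 ECC bits.
DATA_BITS_PER_FLIT = 32
ECC_BITS_PER_FLIT = 7
FLIT_BITS = DATA_BITS_PER_FLIT + ECC_BITS_PER_FLIT

# Allowed packet lengths (in flits) for each class, as listed on the slide.
PACKET_FLITS = {
    "Request": (3,),
    "Forward": (3,),
    "Block Response": (18, 19),
    "Non-Block Response": (2, 3),
    "Write I/O": (19,),
    "Read I/O": (3,),
    "Special": (1, 3),
}


def packet_bits(packet_class: str, flits: int) -> int:
    """Total bits on the link (data plus ECC) for one packet."""
    if flits not in PACKET_FLITS[packet_class]:
        raise ValueError(f"{packet_class} packets cannot be {flits} flits long")
    return flits * FLIT_BITS


if __name__ == "__main__":
    for name, lengths in PACKET_FLITS.items():
        sizes = ", ".join(f"{n} flits = {packet_bits(name, n)} bits" for n in lengths)
        print(f"{name}: {sizes}")
```

Under these numbers a 3-flit Request occupies 117 bits on the link, of which 96 are data.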

Network Architecture
Two-dimensional torus
– Limited Support for Imperfect Tori
  – Allows Fault Remapping
Virtual Cut-Through Routing
– Buffer space for 316 packets

Adaptive Routing
Four Rectangles With Current and Destination At Diagonals
Packets route within the minimum rectangle
Maximize the bandwidth between source and destination
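
Because a torus wraps around in both dimensions, there are two ways to travel along each axis, which gives the four candidate rectangles. The Python sketch below shows one way to find the hop directions that stay inside the minimum rectangle. It is an illustration under stated assumptions, not the 21364 hardware logic: the function name `minimal_directions`, the `"+x"`/`"-x"`/`"+y"`/`"-y"` labels, and the choice to report both directions on a tie are all hypothetical.

```python
def minimal_directions(cur, dst, width, height):
    """Return the hop directions that keep a packet on a shortest path.

    cur and dst are (x, y) coordinates on a width-by-height 2D torus.
    A direction is listed only if it moves the packet closer to dst,
    i.e. it stays inside the minimum rectangle. When both ways around
    a ring are equally short, both are listed (an assumption made here).
    """
    def ring_dirs(a, b, size, plus, minus):
        forward = (b - a) % size   # hops needed going in the + direction
        backward = (a - b) % size  # hops needed going in the - direction
        if forward == 0:
            return []              # already aligned in this dimension
        if forward < backward:
            return [plus]
        if backward < forward:
            return [minus]
        return [plus, minus]

    (cx, cy), (dx, dy) = cur, dst
    return ring_dirs(cx, dx, width, "+x", "-x") + ring_dirs(cy, dy, height, "+y", "-y")


if __name__ == "__main__":
    # On a 4x4 torus, going from column 0 to column 3 is shorter via wraparound.
    print(minimal_directions((0, 0), (3, 1), 4, 4))
```

When the result holds two directions (one per dimension), an adaptive router is free to pick whichever output is less congested, which is where the extra bandwidth comes from.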

Avoiding Deadlocks in Adaptive Routing
"Adaptive routing will not deadlock a network as long as packets can drain via a deadlock-free path"
19 Virtual Channels
– 3 sets of virtual channels per packet class, except for the Special class (only one channel)
  – Adaptive, VC0, and VC1
– Adaptive Is First Choice
– VC0 and VC1 combination creates deadlock-free network
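
The count of 19 follows from the packet classes: six classes with three channels each, plus one channel for the Special class. A small Python sketch of that arithmetic is shown here; the names `ORDINARY_CLASSES`, `CHANNELS_PER_CLASS`, and `virtual_channels`, as well as the `"single"` label for the Special class channel, are hypothetical.

```python
# The six packet classes that each get three virtual channels.
ORDINARY_CLASSES = [
    "Request",
    "Forward",
    "Block Response",
    "Non-Block Response",
    "Write I/O",
    "Read I/O",
]

# Channel types named on the slide.
CHANNELS_PER_CLASS = ["Adaptive", "VC0", "VC1"]


def virtual_channels():
    """Enumerate (packet class, channel) pairs for the whole router."""
    channels = [(cls, ch) for cls in ORDINARY_CLASSES for ch in CHANNELS_PER_CLASS]
    # The Special class has only one channel; "single" is a placeholder label.
    channels.append(("Special", "single"))
    return channels


if __name__ == "__main__":
    print(len(virtual_channels()))  # 6 * 3 + 1 = 19
```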

Router Architecture
9 pipeline types
– Input and Output: Local, Interprocessor, and I/O
Pin to pin latency of 13 cycles
– Running at 1.2 GHz
Network Links run 33% slower
– Running at 0.8 GHz
– Synchronous with outgoing links
– Asynchronous with incoming links
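
The clock figures on this slide can be checked with a few lines of arithmetic. The Python sketch below uses only the numbers from the slide; the constant and function names are hypothetical.

```python
CORE_CLOCK_GHZ = 1.2      # router clock, from the slide
LINK_CLOCK_GHZ = 0.8      # network link clock, from the slide
PIN_TO_PIN_CYCLES = 13    # router latency in core cycles, from the slide


def cycles_to_ns(cycles: float, clock_ghz: float) -> float:
    """Convert a cycle count to nanoseconds (1 GHz means 1 cycle per ns)."""
    return cycles / clock_ghz


latency_ns = cycles_to_ns(PIN_TO_PIN_CYCLES, CORE_CLOCK_GHZ)
link_slowdown = 1 - LINK_CLOCK_GHZ / CORE_CLOCK_GHZ

if __name__ == "__main__":
    print(f"Pin-to-pin latency: {latency_ns:.2f} ns")
    print(f"Links run {link_slowdown:.0%} slower than the router")
```

This works out to roughly 10.8 ns of pin-to-pin latency, and confirms that 0.8 GHz is one third below 1.2 GHz.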

Arbitration
Needs to avoid central bottleneck
– 16 local arbiters
– 7 global arbiters
Least Recently Selected (LRS) Scheme
– Local Arbiters
  – Classes
  – Virtual Channel
– Global Arbiters
  – Input ports
Rotary Rule mode
– Priority to oldest packets
Coherence Dependence Priority (CDP) Rule mode
– Priority depending on class ordering
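
A least-recently-selected policy can be sketched in a few lines of Python. This is a generic software model of the idea, not the 21364 arbiter circuit: the class name `LeastRecentlySelectedArbiter` and its `grant` method are hypothetical, and the sketch does not model the Rotary Rule or CDP modes.

```python
class LeastRecentlySelectedArbiter:
    """Grant the requester that has gone longest without being selected."""

    def __init__(self, requesters):
        # Front of the list = least recently selected = highest priority.
        self._order = list(requesters)

    def grant(self, requesting):
        """Return the winning requester among `requesting`, or None if empty."""
        for candidate in self._order:
            if candidate in requesting:
                # Move the winner to the back so it has lowest priority next time.
                self._order.remove(candidate)
                self._order.append(candidate)
                return candidate
        return None


if __name__ == "__main__":
    arbiter = LeastRecentlySelectedArbiter(["port0", "port1", "port2"])
    print(arbiter.grant({"port0", "port1"}))  # port0 wins first
    print(arbiter.grant({"port0", "port1"}))  # then port1, since port0 just won
```

The same policy could be applied at each level named on the slide (packet classes and virtual channels in the local arbiters, input ports in the global arbiters), with one arbiter instance per decision point.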

Questions
How Is the 1.2 GHz Internal/800 MHz External Clock OK?
Why 2-d Torus?
– What Are the Limitations Imposed?