![Page 1: Presented by: Deniz Balkan](https://reader035.vdocuments.us/reader035/viewer/2022062314/56813025550346895d95addf/html5/thumbnails/1.jpg)
Defining Wakeup Width for Efficient Dynamic Scheduling
A. Aggarwal, O. Ergin – Binghamton University
M. Franklin – University of Maryland
Presented by: Deniz Balkan
![Page 2: Presented by: Deniz Balkan](https://reader035.vdocuments.us/reader035/viewer/2022062314/56813025550346895d95addf/html5/thumbnails/2.jpg)
Dynamic Scheduler
• Workings of a dynamic scheduler
– Wake up dependent instructions
– Select instructions from a pool of ready instructions
• Both these operations form a critical path
• Increase of a single cycle in this critical path impacts performance
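The wakeup and select steps listed above can be illustrated with a small behavioral model. This is only a sketch under simplifying assumptions: the names `Entry`, `wakeup`, `select`, and `cycle` are hypothetical, tags are plain integers, and selection is oldest-first with no functional-unit constraints.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Entry:
    age: int                                        # smaller value = older instruction
    dest: Optional[int]                             # result tag, or None (e.g. branch, store)
    waiting: Set[int] = field(default_factory=set)  # source tags not yet ready


def wakeup(queue, broadcast_tags):
    """Wakeup: clear every pending source tag that matches a broadcast tag."""
    for entry in queue:
        entry.waiting -= set(broadcast_tags)


def select(queue, issue_width):
    """Select: pick up to issue_width ready entries, oldest first."""
    ready = sorted((e for e in queue if not e.waiting), key=lambda e: e.age)
    return ready[:issue_width]


def cycle(queue, broadcast_tags, issue_width):
    """One scheduler cycle; returns issued entries and the tags they produce."""
    wakeup(queue, broadcast_tags)
    issued = select(queue, issue_width)
    for entry in issued:
        queue.remove(entry)
    new_tags = [e.dest for e in issued if e.dest is not None]
    return issued, new_tags
```

Because the tags returned by one call to `cycle` feed the wakeup of the next call, a chain of dependent instructions issues one per cycle in this model, which is why wakeup and select together bound the cycle time.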
![Page 3: Presented by: Deniz Balkan](https://reader035.vdocuments.us/reader035/viewer/2022062314/56813025550346895d95addf/html5/thumbnails/3.jpg)
Implications of a large Dynamic Scheduler
• Large dynamic scheduler has the potential to exploit more ILP
– Larger issue queue
– Larger issue width
• Implications
– Longer wire delays associated with driving register tags
– Longer wire delays in driving tag comparison results
– Longer select logic latency
• Overall increased scheduler latency, resulting in slower clock speed
![Page 4: Presented by: Deniz Balkan](https://reader035.vdocuments.us/reader035/viewer/2022062314/56813025550346895d95addf/html5/thumbnails/4.jpg)
Contributions of this paper
• Wakeup width definition – effective number of results used for instruction wakeup
– Usually equal to the issue width
• Reduced wakeup width dynamic scheduler
– Issue width remains the same
– Reduces instruction wakeup latency, energy consumption, and area
– Less than 2% reduction in IPC
![Page 5: Presented by: Deniz Balkan](https://reader035.vdocuments.us/reader035/viewer/2022062314/56813025550346895d95addf/html5/thumbnails/5.jpg)
Program Behavior Study
• Not all instructions produce a result
– Branch and store instructions account for about 30% of instructions
• Entire issue width of the processor not used in every cycle
• Average number of tags generated per cycle considerably less than the processor issue width
![Page 6: Presented by: Deniz Balkan](https://reader035.vdocuments.us/reader035/viewer/2022062314/56813025550346895d95addf/html5/thumbnails/6.jpg)
Tags generated in a cycle
• To generate more tags per cycle, a fetch, issue, and commit width of 12 was used
• Almost 50% of cycles have either 0 or 1 tag generated, even with a large issue width
• About 80% of the cycles have 3 or fewer tags generated
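Percentages like the ones above are cumulative fractions over a per-cycle tag count. A minimal sketch of that calculation follows; the function name `fraction_at_most` and the sample trace are made up for illustration and are not data from the study.

```python
from collections import Counter


def fraction_at_most(tags_per_cycle, k):
    """Fraction of cycles in which at most k tags were generated."""
    counts = Counter(tags_per_cycle)
    within = sum(n for tags, n in counts.items() if tags <= k)
    return within / len(tags_per_cycle)


# Hypothetical trace: number of tags generated in each of 8 cycles.
trace = [0, 1, 1, 2, 3, 5, 0, 4]
print(fraction_at_most(trace, 1))  # share of cycles with 0 or 1 tag
print(fraction_at_most(trace, 3))  # share of cycles with 3 or fewer tags
```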
![Page 7: Presented by: Deniz Balkan](https://reader035.vdocuments.us/reader035/viewer/2022062314/56813025550346895d95addf/html5/thumbnails/7.jpg)
Useful tags
• Not all the generated tags are immediately useful
– Branch mispredictions lead to tags generated along the wrong path, and to tags that are not immediately required
– Dependent instructions not present in issue queue or waiting for other operands
• Average number of useful tags in a cycle even less than the average number of tags generated in a cycle
![Page 8: Presented by: Deniz Balkan](https://reader035.vdocuments.us/reader035/viewer/2022062314/56813025550346895d95addf/html5/thumbnails/8.jpg)
Useful tags
Only about 50-60% of instructions produce a tag that is immediately required
![Page 9: Presented by: Deniz Balkan](https://reader035.vdocuments.us/reader035/viewer/2022062314/56813025550346895d95addf/html5/thumbnails/9.jpg)
Reduced Wakeup Width Dynamic Scheduler
• Wakeup width reduced while keeping the issue width intact
– Some tags may have to wait before waking up the dependent instructions
• Performance impact is not expected to be high
– Soon there will be cycles with fewer tags
– Waiting tags can use the available wakeup slots
– Delaying tags that are not immediately useful may have no performance impact
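The argument that waiting tags quickly find a free wakeup slot can be sketched with an idealized model. The assumptions here are not from the slides: tags wait in a single FIFO, any waiting tag can use any free slot, and `broadcast_schedule` is a hypothetical name. The actual hardware shares tag-lines between specific FUs, so this is only an approximation of the idea.

```python
from collections import deque


def broadcast_schedule(tags_per_cycle, wakeup_width):
    """Return, for each cycle, the tags actually broadcast when at most
    wakeup_width tags can be driven across the issue queue per cycle."""
    waiting = deque()
    schedule = []
    for produced in tags_per_cycle:
        waiting.extend(produced)
        slots = min(wakeup_width, len(waiting))
        schedule.append([waiting.popleft() for _ in range(slots)])
    while waiting:  # drain tags still waiting after the last input cycle
        slots = min(wakeup_width, len(waiting))
        schedule.append([waiting.popleft() for _ in range(slots)])
    return schedule
```

For example, if three tags are produced in one cycle and none in the next, a wakeup width of 2 delays only the third tag, and only by one cycle, since it uses the otherwise idle slot of the following cycle.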
![Page 10: Presented by: Deniz Balkan](https://reader035.vdocuments.us/reader035/viewer/2022062314/56813025550346895d95addf/html5/thumbnails/10.jpg)
Hardware Implementation – Conventional DS
• Select logic decides which instruction executes on which FU
• Register tags of issued instructions placed in tag-latches
• Enable signals controlled to enable the drivers that drive the tags across the instruction window
![Page 11: Presented by: Deniz Balkan](https://reader035.vdocuments.us/reader035/viewer/2022062314/56813025550346895d95addf/html5/thumbnails/11.jpg)
Hardware Implementation – RWW DS
• Wakeup width reduced to half the issue width
• Two tag latches/FUs share common tag-lines
• If both tag-latches hold tags, only one of them is driven, the other remains in the tag-latch
• To prevent overwriting, 1-bit indicator latch used to control the selection process
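The sharing of tag-lines between two tag-latches can be modeled behaviorally as below. This is a sketch, not the circuit: the class name `SharedTagLine` and its methods are hypothetical, and the policy of driving an already waiting tag first is an assumption, since the slides do not say which of the two tags is driven.

```python
class SharedTagLine:
    """Two FUs, each with its own tag-latch, sharing one set of tag-lines."""

    def __init__(self):
        self.latch = [None, None]        # tag held in each FU's tag-latch
        self.indicator = [False, False]  # True: latch holds a waiting tag

    def can_accept(self, fu):
        """A new tag may be latched only if no tag is waiting there."""
        return not self.indicator[fu]

    def load(self, fu, tag):
        if self.indicator[fu]:
            raise ValueError("would overwrite a waiting tag")
        self.latch[fu] = tag

    def drive(self):
        """Drive at most one latched tag per cycle; the other one waits."""
        held = [fu for fu in (0, 1) if self.latch[fu] is not None]
        if not held:
            return None
        # Assumed policy: a tag that has already waited is driven first.
        held.sort(key=lambda fu: not self.indicator[fu])
        winner = held[0]
        tag = self.latch[winner]
        self.latch[winner] = None
        self.indicator[winner] = False
        for fu in held[1:]:
            self.indicator[fu] = True    # mark the remaining tag as waiting
        return tag
```

In this model the indicator bit is what the selection logic would consult: while it is set, `can_accept` is false for that FU, so the waiting tag cannot be overwritten.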
![Page 12: Presented by: Deniz Balkan](https://reader035.vdocuments.us/reader035/viewer/2022062314/56813025550346895d95addf/html5/thumbnails/12.jpg)
FU arbiter
• Decides the instruction to be executed on the FU
• Conventional arbiter giving priority to the oldest instruction:
Grant1 = (NOT req0) AND req1 AND enable
• Arbiter with RWW dynamic scheduler, where “a” is the value of the indicator latch for the arbiter:
Grant1 = (NOT req0) AND (NOT a) AND req1 AND enable
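The two grant signals can be written as small boolean functions. This sketch assumes the usual priority-arbiter form, in which a younger request (`req1`) is granted only when the older one (`req0`) is absent, and assumes that a set indicator latch (`a`) blocks the grant so that a waiting tag is not overwritten. The function names are hypothetical.

```python
def grant1_conventional(req0, req1, enable):
    """Grant the younger request only if the older one is not requesting."""
    return bool((not req0) and req1 and enable)


def grant1_rww(req0, req1, enable, a):
    """Same as above, but blocked while the indicator latch 'a' is set."""
    return bool((not req0) and (not a) and req1 and enable)
```

With `a` cleared, the RWW arbiter behaves exactly like the conventional one; the indicator latch adds a single extra input to each grant term.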
![Page 13: Presented by: Deniz Balkan](https://reader035.vdocuments.us/reader035/viewer/2022062314/56813025550346895d95addf/html5/thumbnails/13.jpg)
Experimental Setup
• Simulator based on Simplescalar to collect the performance statistics
• Delay, energy, and area estimation from the actual VLSI layouts using SPICE, in a 0.18 micron 6 metal layer CMOS process (TSMC)
• Dynamic scheduler size – 128-entry issue queue, 6-way issue width
![Page 14: Presented by: Deniz Balkan](https://reader035.vdocuments.us/reader035/viewer/2022062314/56813025550346895d95addf/html5/thumbnails/14.jpg)
Performance Results
• Compared to I6W6 (Issue Width 6, Wakeup Width 6) configuration
– I6W3 has 15% lower wakeup logic latency
• IPC impact about 5% for I6W3
– Higher for high IPC FP benchmarks
– Significantly better than I3W3, with the same wakeup logic latency as I6W3
![Page 15: Presented by: Deniz Balkan](https://reader035.vdocuments.us/reader035/viewer/2022062314/56813025550346895d95addf/html5/thumbnails/15.jpg)
IPC of FP benchmarks with RWW
Reasons for IPC impact
• Instructions delayed due to waiting tags
• Issue slots wasted because of waiting tags
![Page 16: Presented by: Deniz Balkan](https://reader035.vdocuments.us/reader035/viewer/2022062314/56813025550346895d95addf/html5/thumbnails/16.jpg)
Reasons for IPC impact
• Delayed register tags have more impact than issue slot wastage
• As the wakeup width is reduced, the impact of delayed register tags increases dramatically
![Page 17: Presented by: Deniz Balkan](https://reader035.vdocuments.us/reader035/viewer/2022062314/56813025550346895d95addf/html5/thumbnails/17.jpg)
Area and Energy Results
• Activation statistics obtained through simulations, and the energy consumption values from our detailed layouts
– I6W3 reduced wakeup logic energy consumption by 10%
• Area of the CAM cells (tag part of the instruction window) reduces by about 30% for I6W3
![Page 18: Presented by: Deniz Balkan](https://reader035.vdocuments.us/reader035/viewer/2022062314/56813025550346895d95addf/html5/thumbnails/18.jpg)
Reduced Issue Slots Wastage (RWIS)
• Issue slots wasted because no instructions issued to FUs with already waiting tags
• Classified instructions into
– Tag-producing instructions
– Non-tag-producing instructions
• Can still issue non-tag-producing instructions to FUs with waiting tags without overwriting the tag value
• Type bit included with the instruction to control issue
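The effect of the type bit on issue can be sketched as an assignment of ready instructions to FUs. This is an illustrative model only: `assign_rwis` is a hypothetical name, `produces_tag` stands in for the type bit, and steering non-tag-producing instructions toward FUs with waiting tags first is an assumed heuristic, not something the slides specify.

```python
def assign_rwis(ready, indicators):
    """Assign ready instructions to FUs.

    ready: list of (name, produces_tag) pairs, oldest first.
    indicators: one bool per FU; True means a tag is waiting in its latch.
    Returns a dict mapping FU index to the instruction name issued there.
    """
    assignment = {}
    free = list(range(len(indicators)))
    for name, produces_tag in ready:
        if produces_tag:
            # A tag producer must not overwrite a waiting tag.
            candidates = [fu for fu in free if not indicators[fu]]
        else:
            # Assumed heuristic: use FUs with waiting tags first, so the
            # others stay available for tag-producing instructions.
            candidates = sorted(free, key=lambda fu: not indicators[fu])
        if candidates:
            fu = candidates[0]
            assignment[fu] = name
            free.remove(fu)
    return assignment
```

For instance, with `indicators = [True, False]` and ready instructions `[("add", True), ("store", False)]`, the add goes to FU 1 and the store still issues on FU 0, a slot that would otherwise be wasted.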
![Page 19: Presented by: Deniz Balkan](https://reader035.vdocuments.us/reader035/viewer/2022062314/56813025550346895d95addf/html5/thumbnails/19.jpg)
Reduced Tag Delays (RTD)
• Register tags delayed when multiple tag-producing instructions issued to the FUs sharing the tag-lines (FU-group)
• RTD limits the number of tag-producing instructions issued to an FU-group
– Waiting tags of the previous cycle used for this purpose
• Non-tag-producing instructions can still be issued to FUs with indicator bits set
![Page 20: Presented by: Deniz Balkan](https://reader035.vdocuments.us/reader035/viewer/2022062314/56813025550346895d95addf/html5/thumbnails/20.jpg)
Enhanced Performance
• RTD-1 (with a maximum of 1 waiting tag) is the most effective
• RWIS reduces the wastage of issue slots, RTD also reduces waiting register tags
• RTD-2 results in more instructions getting delayed (compared to RTD-1) due to waiting register tags
![Page 21: Presented by: Deniz Balkan](https://reader035.vdocuments.us/reader035/viewer/2022062314/56813025550346895d95addf/html5/thumbnails/21.jpg)
Conclusions
• Larger dynamic schedulers can exploit more ILP, thus increasing performance
• Larger dynamic scheduler results in longer scheduler latency
• Reduced wakeup width (RWW) dynamic scheduler exploits the property that the number of useful tags generated per cycle is significantly lower than the issue width
• Significant reduction in wakeup logic latency and dynamic scheduler area and energy consumption with minimal IPC impact