1 farhan mohamed ali (w2-1) jigar vora (w2-2) sonali kapoor (w2-3) avni jhunjhunwala (w2-4)...
Post on 19-Dec-2015
![Page 1: 1 Farhan Mohamed Ali (W2-1) Jigar Vora (W2-2) Sonali Kapoor (W2-3) Avni Jhunjhunwala (W2-4) Presentation 12 MAD MAC 525 26 th April, 2006 Short Final Presentation](https://reader036.vdocuments.us/reader036/viewer/2022062320/56649d405503460f94a19c5f/html5/thumbnails/1.jpg)
Farhan Mohamed Ali (W2-1)
Jigar Vora (W2-2)
Sonali Kapoor (W2-3)
Avni Jhunjhunwala (W2-4)
Presentation 12
MAD MAC 525
26th April, 2006
Short Final Presentation
W2
Project Objective: Design a crucial part of a GPU called the Multiply Accumulate (MAC) unit, which will revolutionize graphics.
Design Manager: Zack Menegakis
![Page 2: 1 Farhan Mohamed Ali (W2-1) Jigar Vora (W2-2) Sonali Kapoor (W2-3) Avni Jhunjhunwala (W2-4) Presentation 12 MAD MAC 525 26 th April, 2006 Short Final Presentation](https://reader036.vdocuments.us/reader036/viewer/2022062320/56649d405503460f94a19c5f/html5/thumbnails/2.jpg)
Agenda
• Marketing (Jigar)
• Project Description (Farhan)
• Algorithmic Description (Farhan)
• Design Process (Sonali)
• Floorplan Evolution (Sonali)
• Layout (Avni)
• Design Specifications (Avni)
• Conclusion (Jigar)
![Page 3: 1 Farhan Mohamed Ali (W2-1) Jigar Vora (W2-2) Sonali Kapoor (W2-3) Avni Jhunjhunwala (W2-4) Presentation 12 MAD MAC 525 26 th April, 2006 Short Final Presentation](https://reader036.vdocuments.us/reader036/viewer/2022062320/56649d405503460f94a19c5f/html5/thumbnails/3.jpg)
MARKETING
• Application of product: HDR rendering in gaming graphics
• Why HDR? Used in games like Far Cry
• Optimized for speed (chosen to suit the market)
• Competition: possible barriers to entry if we enter the market
![Page 4: 1 Farhan Mohamed Ali (W2-1) Jigar Vora (W2-2) Sonali Kapoor (W2-3) Avni Jhunjhunwala (W2-4) Presentation 12 MAD MAC 525 26 th April, 2006 Short Final Presentation](https://reader036.vdocuments.us/reader036/viewer/2022062320/56649d405503460f94a19c5f/html5/thumbnails/4.jpg)
MAD MAC and HDR
• What is HDR?
• Show animation explaining concept
![Page 5: 1 Farhan Mohamed Ali (W2-1) Jigar Vora (W2-2) Sonali Kapoor (W2-3) Avni Jhunjhunwala (W2-4) Presentation 12 MAD MAC 525 26 th April, 2006 Short Final Presentation](https://reader036.vdocuments.us/reader036/viewer/2022062320/56649d405503460f94a19c5f/html5/thumbnails/5.jpg)
MAD MAC and HDR
• MAD MAC accelerates FP16 blending to enable true HDR graphics
• What is HDR?
• HDR = High Dynamic Range
• Dynamic range is defined as the ratio of the largest value of a signal to the lowest measurable value
• Dynamic range of luminance in real-world scenes can be 100,000 : 1
• With HDR rendering, pixel intensities are allowed to extend beyond the [0..1] range of traditional graphics
• Nature isn’t clamped to [0..1] and neither should CG be
• In lay terms:
• Bright things can be really bright
• Dark things can be really dark
• And the details can be seen in both
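The clamping point above can be illustrated with a minimal Python sketch (not part of the original slides). The helper names `to_fp16`, `clamp01` and `blend`, and the sample intensities, are assumptions made for illustration; FP16 rounding relies on the half-precision format code `e` of the standard `struct` module.

```python
import struct

def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def clamp01(x):
    """Traditional fixed-range intensity: anything outside [0, 1] is lost."""
    return min(max(x, 0.0), 1.0)

def blend(src, alpha, dst):
    """Blend in multiply-accumulate form, src * alpha + dst (the A*B + C shape)."""
    return to_fp16(src * alpha + dst)

# Hypothetical scene intensities (illustrative values only).
sun, lamp = 50.0, 4.0

# Clamped: both highlights collapse to the same value, so detail is lost.
clamped = (clamp01(sun), clamp01(lamp))

# FP16: both highlights stay distinct.
hdr = (to_fp16(sun), to_fp16(lamp))

# Ratio of the largest to the smallest normal FP16 value.
FP16_MAX = 65504.0
FP16_MIN_NORMAL = 2.0 ** -14
fp16_dynamic_range = FP16_MAX / FP16_MIN_NORMAL

print(clamped, hdr, fp16_dynamic_range, blend(sun, 0.5, lamp))
```

Under these assumptions the normal FP16 range spans roughly 10^9 : 1, comfortably above the 100,000 : 1 quoted for real-world luminance.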
![Page 6: 1 Farhan Mohamed Ali (W2-1) Jigar Vora (W2-2) Sonali Kapoor (W2-3) Avni Jhunjhunwala (W2-4) Presentation 12 MAD MAC 525 26 th April, 2006 Short Final Presentation](https://reader036.vdocuments.us/reader036/viewer/2022062320/56649d405503460f94a19c5f/html5/thumbnails/6.jpg)
![Page 7: 1 Farhan Mohamed Ali (W2-1) Jigar Vora (W2-2) Sonali Kapoor (W2-3) Avni Jhunjhunwala (W2-4) Presentation 12 MAD MAC 525 26 th April, 2006 Short Final Presentation](https://reader036.vdocuments.us/reader036/viewer/2022062320/56649d405503460f94a19c5f/html5/thumbnails/7.jpg)
PROJECT DESCRIPTION
• Multiply Accumulate unit (MAC)
• Executes the function AB+C on 16-bit floating-point inputs. Inputs will be in OpenEXR format.
• Multiplies and adds in parallel to greatly speed up operation
• Rounding is performed only once, so accuracy is greater than with separate multiply and add operations.
• Also known as:
• Fused Multiply Add (FMA)
• Multiply Add (MAD/MADD) in graphics shader programs
• Many applications benefit from a fast FMA
• Graphics – HDR rendering, blending and shader ops
• DSPs – computing vector dot-products in digital filters
• Fast division, square root – eliminates extra hardware
• Available in many newer CPUs and DSPs because it’s so cool
• One ring (circuit) to rule them all!
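The accuracy benefit of rounding once can be shown with a small Python sketch (an illustration, not the team's code). It assumes the chosen operands are small enough that A*B+C is exact in double precision, so rounding that value to FP16 once models the fused result; the function names are hypothetical.

```python
import struct

def round_fp16(x):
    """Round a Python float to the nearest half-precision value (ties to even)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def mul_then_add(a, b, c):
    """Two roundings: the product is rounded, then the sum is rounded again."""
    return round_fp16(round_fp16(a * b) + c)

def fused_mac(a, b, c):
    """One rounding: a*b + c is formed exactly (assumed to fit in a double
    for these operands) and rounded to FP16 a single time."""
    return round_fp16(a * b + c)

# Operands chosen so that the low product bits matter after cancellation.
a = 1.0 + 2.0 ** -10
b = 1.0 + 2.0 ** -9
c = -1.0

separate = mul_then_add(a, b, c)   # the 2**-19 term is lost in the first rounding
fused = fused_mac(a, b, c)         # the 2**-19 term survives

print(separate, fused)
```

Here the separate multiply and add loses the lowest product bit before the subtraction exposes it, while the single-rounding version keeps it.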
![Page 8: 1 Farhan Mohamed Ali (W2-1) Jigar Vora (W2-2) Sonali Kapoor (W2-3) Avni Jhunjhunwala (W2-4) Presentation 12 MAD MAC 525 26 th April, 2006 Short Final Presentation](https://reader036.vdocuments.us/reader036/viewer/2022062320/56649d405503460f94a19c5f/html5/thumbnails/8.jpg)
ALGORITHMIC DESCRIPTION
• Step through entire process
• Multiplication and alignment occur concurrently; C is always aligned to A*B
• Results then pass through the adder, normalize, round, overflow checker and output register
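The flow described above (multiply and align, add, normalize, round, overflow check) can be sketched as a bit-level Python model. This is a simplified illustration under stated assumptions, not the team's Verilog: inputs are finite FP16 bit patterns, denormals are flushed to zero, an exact zero result is returned as +0, and alignment uses exact left shifts on Python integers where hardware would use a fixed-width shifter. All function names are hypothetical.

```python
def unpack_fp16(bits):
    """Split FP16 bits into (sign, integer significand, exponent of the LSB)."""
    sign = (bits >> 15) & 1
    exp = (bits >> 10) & 0x1F
    frac = bits & 0x3FF
    if exp == 0:                          # zero or denormal: treated as zero
        return sign, 0, -24
    return sign, 0x400 | frac, exp - 25   # value = sig * 2**(exp - 15 - 10)

def mac_fp16(a_bits, b_bits, c_bits):
    """Return the FP16 bit pattern of a*b + c with a single rounding."""
    sa, ma, ea = unpack_fp16(a_bits)
    sb, mb, eb = unpack_fp16(b_bits)
    sc, mc, ec = unpack_fp16(c_bits)

    # Multiplier and exponent calculation: 11 x 11 significand product.
    sp, mp, ep = sa ^ sb, ma * mb, ea + eb

    # Align C to the product (exact shifts onto a common exponent), then add.
    e0 = min(ep, ec)
    total = (-mp if sp else mp) << (ep - e0)
    total += (-mc if sc else mc) << (ec - e0)
    if total == 0:
        return 0x0000
    sign = 1 if total < 0 else 0
    mag = abs(total)

    # Normalize to an 11-bit significand, rounding to nearest even once.
    shift = mag.bit_length() - 11
    if shift > 0:
        q, rem = mag >> shift, mag & ((1 << shift) - 1)
        half = 1 << (shift - 1)
        if rem > half or (rem == half and q & 1):
            q += 1
        if q == 0x800:                    # rounding carried into a new bit
            q >>= 1
            shift += 1
    else:
        q = mag << -shift

    # Overflow check and output packing.
    exp = e0 + shift + 25
    if exp >= 31:
        return (sign << 15) | 0x7C00      # overflow to infinity
    if exp <= 0:
        return sign << 15                 # no denormal support: flush to zero
    return (sign << 15) | (exp << 10) | (q & 0x3FF)

# 2.0 * 3.0 + 1.0 = 7.0
print(hex(mac_fp16(0x4000, 0x4200, 0x3C00)))
```

Flushing denormals to zero mirrors the decision, stated elsewhere in the slides, not to implement denormal arithmetic.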
![Page 9: 1 Farhan Mohamed Ali (W2-1) Jigar Vora (W2-2) Sonali Kapoor (W2-3) Avni Jhunjhunwala (W2-4) Presentation 12 MAD MAC 525 26 th April, 2006 Short Final Presentation](https://reader036.vdocuments.us/reader036/viewer/2022062320/56649d405503460f94a19c5f/html5/thumbnails/9.jpg)
Block Diagram
[Figure: datapath block diagram. Labeled blocks: RegArray A, RegArray B, RegArray C (16-bit inputs); Multiplier; Exp Calc; Align; Adder/Subtractor; Control Logic & Sign Determine; Leading 0 Anticipator; Normalize; Round; Ovf Checker; RegY (16-bit output).]
![Page 10: 1 Farhan Mohamed Ali (W2-1) Jigar Vora (W2-2) Sonali Kapoor (W2-3) Avni Jhunjhunwala (W2-4) Presentation 12 MAD MAC 525 26 th April, 2006 Short Final Presentation](https://reader036.vdocuments.us/reader036/viewer/2022062320/56649d405503460f94a19c5f/html5/thumbnails/10.jpg)
IMPLEMENTATION
• Implementation of each module: how and why we chose a particular method, keeping in mind the goal of speed (multiplier, adder)
![Page 11: 1 Farhan Mohamed Ali (W2-1) Jigar Vora (W2-2) Sonali Kapoor (W2-3) Avni Jhunjhunwala (W2-4) Presentation 12 MAD MAC 525 26 th April, 2006 Short Final Presentation](https://reader036.vdocuments.us/reader036/viewer/2022062320/56649d405503460f94a19c5f/html5/thumbnails/11.jpg)
Design Decisions (contd.):
• Multiplier Implementation
– 11 x 11 Carry-Save Multiplier
– Reasons:
• Fast because it avoids a ripple carry in every stage
• Enables compact layout
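The carry-save idea can be sketched in Python as follows. This is a behavioral illustration only, assuming unsigned 11-bit operands; the function names are hypothetical and the slides do not describe the actual cell-level array.

```python
def carry_save_add(x, y, z):
    """3:2 compression applied bitwise: x + y + z == s + c.
    Each bit position is computed independently, so no carry ripples."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c

def carry_save_multiply(a, b, width=11):
    """Multiply two unsigned `width`-bit values by accumulating partial
    products in carry-save form, with one carry-propagate add at the end."""
    s, c = 0, 0
    for i in range(width):
        partial = (a << i) if (b >> i) & 1 else 0
        s, c = carry_save_add(s, c, partial)
    return s + c   # the only carry-propagating addition

print(carry_save_multiply(1025, 1026))
```

Each row only passes sum and carry vectors downward, which is why the delay per stage stays constant and only the final addition needs a fast carry-propagate adder.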
![Page 12: 1 Farhan Mohamed Ali (W2-1) Jigar Vora (W2-2) Sonali Kapoor (W2-3) Avni Jhunjhunwala (W2-4) Presentation 12 MAD MAC 525 26 th April, 2006 Short Final Presentation](https://reader036.vdocuments.us/reader036/viewer/2022062320/56649d405503460f94a19c5f/html5/thumbnails/12.jpg)
Design Process
• Verilog -> Schematic -> Layout
– Behavioral -> Structural Verilog
– Transistors/gates -> Full Schematic
– Gate/Component Layout -> Top Level
• Transistor count fluctuated from 20,200 to 12,800
• Major design decisions
– Decided against implementing denormal arithmetic because it would increase the complexity of the project beyond the scope of the class
– Rounding performed only once, at the end
– Picked nPass over Tgate in the normalize shifter
– Adder: variable-length carry select -> Han-Carlson binary tree adder
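For readers unfamiliar with the Han-Carlson structure, here is a bit-level Python sketch of such a prefix adder. It is an illustration under assumptions: the slides do not give the adder width or netlist, so the 16-bit default and the function name are hypothetical, and there is no carry-in.

```python
def han_carlson_add(a, b, n=16):
    """Add two unsigned n-bit values with a Han-Carlson prefix network.
    Returns (sum mod 2**n, carry_out)."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]   # generate bits
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]   # propagate bits
    G, P = g[:], p[:]

    # First level: each odd position absorbs its even neighbour.
    for i in range(1, n, 2):
        G[i] = g[i] | (p[i] & g[i - 1])
        P[i] = p[i] & p[i - 1]

    # Kogge-Stone style prefix levels on the odd positions only.
    d = 2
    while d < n:
        prev_g, prev_p = G[:], P[:]
        for i in range(1, n, 2):
            if i - d >= 1:
                G[i] = prev_g[i] | (prev_p[i] & prev_g[i - d])
                P[i] = prev_p[i] & prev_p[i - d]
        d *= 2

    # Last level: even positions take the carry from the odd bit below.
    for i in range(2, n, 2):
        G[i] = g[i] | (p[i] & G[i - 1])

    # G[i] is now the carry out of bit i; form the sum bits.
    total = p[0]
    for i in range(1, n):
        total |= (p[i] ^ G[i - 1]) << i
    return total, G[n - 1]

print(han_carlson_add(40000, 30000))
```

Compared with a full Kogge-Stone tree, running the prefix levels on odd bits only roughly halves the cells and wiring at the cost of one extra logic level.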
![Page 13: 1 Farhan Mohamed Ali (W2-1) Jigar Vora (W2-2) Sonali Kapoor (W2-3) Avni Jhunjhunwala (W2-4) Presentation 12 MAD MAC 525 26 th April, 2006 Short Final Presentation](https://reader036.vdocuments.us/reader036/viewer/2022062320/56649d405503460f94a19c5f/html5/thumbnails/13.jpg)
VERIFICATION OF DESIGN
Verilog Simulations (show outputs)
– Overview
– How/Why it works
– Behavioral/Structural
Explain why we couldn’t get a high-level simulator and how we tested our Verilog design.
![Page 14: 1 Farhan Mohamed Ali (W2-1) Jigar Vora (W2-2) Sonali Kapoor (W2-3) Avni Jhunjhunwala (W2-4) Presentation 12 MAD MAC 525 26 th April, 2006 Short Final Presentation](https://reader036.vdocuments.us/reader036/viewer/2022062320/56649d405503460f94a19c5f/html5/thumbnails/14.jpg)
SCHEMATICS
• Show schematics of major blocks: adder, multiplier, and top-level
• HOW WE VERIFIED: analog simulation
![Page 15: 1 Farhan Mohamed Ali (W2-1) Jigar Vora (W2-2) Sonali Kapoor (W2-3) Avni Jhunjhunwala (W2-4) Presentation 12 MAD MAC 525 26 th April, 2006 Short Final Presentation](https://reader036.vdocuments.us/reader036/viewer/2022062320/56649d405503460f94a19c5f/html5/thumbnails/15.jpg)
Top Level Schematic
![Page 16: 1 Farhan Mohamed Ali (W2-1) Jigar Vora (W2-2) Sonali Kapoor (W2-3) Avni Jhunjhunwala (W2-4) Presentation 12 MAD MAC 525 26 th April, 2006 Short Final Presentation](https://reader036.vdocuments.us/reader036/viewer/2022062320/56649d405503460f94a19c5f/html5/thumbnails/16.jpg)
Multiplier Schematic
![Page 17: 1 Farhan Mohamed Ali (W2-1) Jigar Vora (W2-2) Sonali Kapoor (W2-3) Avni Jhunjhunwala (W2-4) Presentation 12 MAD MAC 525 26 th April, 2006 Short Final Presentation](https://reader036.vdocuments.us/reader036/viewer/2022062320/56649d405503460f94a19c5f/html5/thumbnails/17.jpg)
Adder Schematic
![Page 18: 1 Farhan Mohamed Ali (W2-1) Jigar Vora (W2-2) Sonali Kapoor (W2-3) Avni Jhunjhunwala (W2-4) Presentation 12 MAD MAC 525 26 th April, 2006 Short Final Presentation](https://reader036.vdocuments.us/reader036/viewer/2022062320/56649d405503460f94a19c5f/html5/thumbnails/18.jpg)
FLOORPLAN EVOLUTION
• Initial floorplan
• How it evolved (with animation)- why and how we changed it
![Page 19: 1 Farhan Mohamed Ali (W2-1) Jigar Vora (W2-2) Sonali Kapoor (W2-3) Avni Jhunjhunwala (W2-4) Presentation 12 MAD MAC 525 26 th April, 2006 Short Final Presentation](https://reader036.vdocuments.us/reader036/viewer/2022062320/56649d405503460f94a19c5f/html5/thumbnails/19.jpg)
Main Floorplan
[Figure: floorplan. Labeled blocks: Reg A, Reg B, Reg C, Multiplier, ExpCalc, Align C, Pipeline Reg (x3), Adder, Ld Zero, Normalize, Round, Reg Y.]
![Page 20: 1 Farhan Mohamed Ali (W2-1) Jigar Vora (W2-2) Sonali Kapoor (W2-3) Avni Jhunjhunwala (W2-4) Presentation 12 MAD MAC 525 26 th April, 2006 Short Final Presentation](https://reader036.vdocuments.us/reader036/viewer/2022062320/56649d405503460f94a19c5f/html5/thumbnails/20.jpg)
Floorplan
![Page 21: 1 Farhan Mohamed Ali (W2-1) Jigar Vora (W2-2) Sonali Kapoor (W2-3) Avni Jhunjhunwala (W2-4) Presentation 12 MAD MAC 525 26 th April, 2006 Short Final Presentation](https://reader036.vdocuments.us/reader036/viewer/2022062320/56649d405503460f94a19c5f/html5/thumbnails/21.jpg)
Full Chip Layout
[Figure: full chip layout. Labeled regions: Exponent, Align, Zero, Adder, Multiplier, Normalize, Round, Ovf.]
![Page 22: 1 Farhan Mohamed Ali (W2-1) Jigar Vora (W2-2) Sonali Kapoor (W2-3) Avni Jhunjhunwala (W2-4) Presentation 12 MAD MAC 525 26 th April, 2006 Short Final Presentation](https://reader036.vdocuments.us/reader036/viewer/2022062320/56649d405503460f94a19c5f/html5/thumbnails/22.jpg)
Pipelining
• Initially planned 5-6 pipeline stages
• Reduced to 4 pipeline stages – made possible by implementing fast carry lookahead adders in critical path modules (adder and multiplier)
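The effect of the stage count on latency and throughput can be sketched with a toy Python model. It assumes one operation is issued per cycle and that a result is available once it reaches the last stage register; the 2.5 ns period corresponds to the 400 MHz clock estimated in the design specifications, and `run_pipeline` is a hypothetical name.

```python
def run_pipeline(operands, stages=4):
    """Push (a, b, c) triples through a linear pipeline, one per cycle.
    Returns (results, cycles) where cycles counts clock edges until the
    last result has reached the final stage register."""
    pipe = [None] * stages
    queue = list(operands)
    results = []
    cycles = 0
    while len(results) < len(operands):
        incoming = queue.pop(0) if queue else None
        pipe = [incoming] + pipe[:-1]      # every stage advances by one
        cycles += 1
        if pipe[-1] is not None:
            a, b, c = pipe[-1]
            results.append(a * b + c)      # the MAC function A*B + C
    return results, cycles

CLOCK_PERIOD_NS = 2.5                      # 400 MHz estimated clock

results, cycles = run_pipeline([(1, 1, 1), (2, 2, 2), (3, 3, 3)])
print(results, cycles, cycles * CLOCK_PERIOD_NS)
```

In this model N operations finish in N + 3 cycles with 4 stages, so fewer stages shorten latency while throughput stays at one result per cycle.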
![Page 23: 1 Farhan Mohamed Ali (W2-1) Jigar Vora (W2-2) Sonali Kapoor (W2-3) Avni Jhunjhunwala (W2-4) Presentation 12 MAD MAC 525 26 th April, 2006 Short Final Presentation](https://reader036.vdocuments.us/reader036/viewer/2022062320/56649d405503460f94a19c5f/html5/thumbnails/23.jpg)
Pipelining Stages
[Figure: pipeline stage diagram. Labeled blocks: Reg A, Reg B, Reg C, Multiplier, ExpCalc, Align C, Adder, Ld Zero, Normalize, Round, Overflow checker, Reg Y, Pipeline Reg (x5).]
![Page 24: 1 Farhan Mohamed Ali (W2-1) Jigar Vora (W2-2) Sonali Kapoor (W2-3) Avni Jhunjhunwala (W2-4) Presentation 12 MAD MAC 525 26 th April, 2006 Short Final Presentation](https://reader036.vdocuments.us/reader036/viewer/2022062320/56649d405503460f94a19c5f/html5/thumbnails/24.jpg)
LAYOUT
• Final Layout
• Layout of large blocks such as multiplier, adder and normalize
![Page 25: 1 Farhan Mohamed Ali (W2-1) Jigar Vora (W2-2) Sonali Kapoor (W2-3) Avni Jhunjhunwala (W2-4) Presentation 12 MAD MAC 525 26 th April, 2006 Short Final Presentation](https://reader036.vdocuments.us/reader036/viewer/2022062320/56649d405503460f94a19c5f/html5/thumbnails/25.jpg)
Layout Decisions
• 3 standard cell heights
• Uniform width vdd and ground rails
• Wider vdd and ground rails in power hungry modules
• Max of 8 flip flops per clock pulse generator
• Metal directionality
![Page 26: 1 Farhan Mohamed Ali (W2-1) Jigar Vora (W2-2) Sonali Kapoor (W2-3) Avni Jhunjhunwala (W2-4) Presentation 12 MAD MAC 525 26 th April, 2006 Short Final Presentation](https://reader036.vdocuments.us/reader036/viewer/2022062320/56649d405503460f94a19c5f/html5/thumbnails/26.jpg)
Multiplier Layout with pipelining
![Page 27: 1 Farhan Mohamed Ali (W2-1) Jigar Vora (W2-2) Sonali Kapoor (W2-3) Avni Jhunjhunwala (W2-4) Presentation 12 MAD MAC 525 26 th April, 2006 Short Final Presentation](https://reader036.vdocuments.us/reader036/viewer/2022062320/56649d405503460f94a19c5f/html5/thumbnails/27.jpg)
Adder Layout
![Page 28: 1 Farhan Mohamed Ali (W2-1) Jigar Vora (W2-2) Sonali Kapoor (W2-3) Avni Jhunjhunwala (W2-4) Presentation 12 MAD MAC 525 26 th April, 2006 Short Final Presentation](https://reader036.vdocuments.us/reader036/viewer/2022062320/56649d405503460f94a19c5f/html5/thumbnails/28.jpg)
Normalize Layout
![Page 29: 1 Farhan Mohamed Ali (W2-1) Jigar Vora (W2-2) Sonali Kapoor (W2-3) Avni Jhunjhunwala (W2-4) Presentation 12 MAD MAC 525 26 th April, 2006 Short Final Presentation](https://reader036.vdocuments.us/reader036/viewer/2022062320/56649d405503460f94a19c5f/html5/thumbnails/29.jpg)
FINAL LAYOUT
![Page 30: 1 Farhan Mohamed Ali (W2-1) Jigar Vora (W2-2) Sonali Kapoor (W2-3) Avni Jhunjhunwala (W2-4) Presentation 12 MAD MAC 525 26 th April, 2006 Short Final Presentation](https://reader036.vdocuments.us/reader036/viewer/2022062320/56649d405503460f94a19c5f/html5/thumbnails/30.jpg)
Design Specifications
• Worst-case delay = 2.25 ns
• Long buses are all buffered (not tested yet)
• Estimated clock speed = 400 MHz
• Height by width = 193.86 um x 301.545 um
• Area = 58,458 um^2
• Aspect ratio = 1:1.55
• Total transistor density = 0.22 transistors/um^2
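The figures above can be cross-checked with a few lines of Python arithmetic. The total transistor count of 12,730 is taken from the area table later in the deck; the variable names are illustrative.

```python
worst_case_delay_ns = 2.25
height_um, width_um = 193.86, 301.545
transistor_count = 12_730          # total from the area table

# Upper bound on clock frequency implied by the worst-case delay.
max_clock_mhz = 1e3 / worst_case_delay_ns

area_um2 = height_um * width_um
aspect_ratio = width_um / height_um
density = transistor_count / area_um2      # transistors per um^2

print(f"max clock  ~ {max_clock_mhz:.0f} MHz")
print(f"area       ~ {area_um2:,.0f} um^2")
print(f"aspect     ~ 1:{aspect_ratio:.3f}")
print(f"density    ~ {density:.2f} transistors/um^2")
```

A 2.25 ns worst-case delay bounds the clock at about 444 MHz, so the 400 MHz estimate leaves roughly 10% margin.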
![Page 31: 1 Farhan Mohamed Ali (W2-1) Jigar Vora (W2-2) Sonali Kapoor (W2-3) Avni Jhunjhunwala (W2-4) Presentation 12 MAD MAC 525 26 th April, 2006 Short Final Presentation](https://reader036.vdocuments.us/reader036/viewer/2022062320/56649d405503460f94a19c5f/html5/thumbnails/31.jpg)
Layout densities
• Active : 14.05%
• Poly : 9.25%
• Metal 1 : 33.89%
• Metal 2 : 18.00%
• Metal 3 : 14.99%
• Metal 4 : 6.29%
![Page 32: 1 Farhan Mohamed Ali (W2-1) Jigar Vora (W2-2) Sonali Kapoor (W2-3) Avni Jhunjhunwala (W2-4) Presentation 12 MAD MAC 525 26 th April, 2006 Short Final Presentation](https://reader036.vdocuments.us/reader036/viewer/2022062320/56649d405503460f94a19c5f/html5/thumbnails/32.jpg)
Layer Masks - Poly
![Page 33: 1 Farhan Mohamed Ali (W2-1) Jigar Vora (W2-2) Sonali Kapoor (W2-3) Avni Jhunjhunwala (W2-4) Presentation 12 MAD MAC 525 26 th April, 2006 Short Final Presentation](https://reader036.vdocuments.us/reader036/viewer/2022062320/56649d405503460f94a19c5f/html5/thumbnails/33.jpg)
Layer Masks – Metal 1
![Page 34: 1 Farhan Mohamed Ali (W2-1) Jigar Vora (W2-2) Sonali Kapoor (W2-3) Avni Jhunjhunwala (W2-4) Presentation 12 MAD MAC 525 26 th April, 2006 Short Final Presentation](https://reader036.vdocuments.us/reader036/viewer/2022062320/56649d405503460f94a19c5f/html5/thumbnails/34.jpg)
Layer Masks – Metal 2
![Page 35: 1 Farhan Mohamed Ali (W2-1) Jigar Vora (W2-2) Sonali Kapoor (W2-3) Avni Jhunjhunwala (W2-4) Presentation 12 MAD MAC 525 26 th April, 2006 Short Final Presentation](https://reader036.vdocuments.us/reader036/viewer/2022062320/56649d405503460f94a19c5f/html5/thumbnails/35.jpg)
Layer Masks – Metal 3
![Page 36: 1 Farhan Mohamed Ali (W2-1) Jigar Vora (W2-2) Sonali Kapoor (W2-3) Avni Jhunjhunwala (W2-4) Presentation 12 MAD MAC 525 26 th April, 2006 Short Final Presentation](https://reader036.vdocuments.us/reader036/viewer/2022062320/56649d405503460f94a19c5f/html5/thumbnails/36.jpg)
Layer Masks – Metal 4
![Page 37: 1 Farhan Mohamed Ali (W2-1) Jigar Vora (W2-2) Sonali Kapoor (W2-3) Avni Jhunjhunwala (W2-4) Presentation 12 MAD MAC 525 26 th April, 2006 Short Final Presentation](https://reader036.vdocuments.us/reader036/viewer/2022062320/56649d405503460f94a19c5f/html5/thumbnails/37.jpg)
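| Block | Schematic Power (mW, at 350 MHz) | Layout Power (mW) | Schematic Delay | Layout Delay |
|---|---|---|---|---|
| Multiplier | 2.97 | N/A | 3.38 ns | N/A |
| Multiplier w/ pipeline | ?? | ?? | 1.9 ns | 2.25 ns |
| Exponents | 1.608 | 2.21 | 1.01 ns | 1.2 ns |
| Align | 0.094 | 0.113 | 480 ps | 637 ps |
| Adder | 8.48 | 9.73 | 1.34 ns | 1.7 ns |
| Leading 0 | 0.232 | 0.857 | 506 ps | 551 ps |
| Normalize | 1.458 | 1.546 | 407 ps | 437 ps |
| Round | 0.631 | 1.21 | 864 ps | 986 ps |
| OvfCheck | 0.13 | 0.19 | 453 ps | 475 ps |
| Registers | ?? | ?? | 179 ps | 193 ps |
| Total | ?? | ?? | - | - |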
![Page 38: 1 Farhan Mohamed Ali (W2-1) Jigar Vora (W2-2) Sonali Kapoor (W2-3) Avni Jhunjhunwala (W2-4) Presentation 12 MAD MAC 525 26 th April, 2006 Short Final Presentation](https://reader036.vdocuments.us/reader036/viewer/2022062320/56649d405503460f94a19c5f/html5/thumbnails/38.jpg)
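| Block | Area (um^2) | Transistor Count | Transistor Density (transistors/um^2) |
|---|---|---|---|
| Multiplier w/ pipeline | 20,388 | 4,496 | 0.22 |
| Exponents | 5,163 | 738 | 0.14 |
| Align | 3,995 | 500 | 0.13 |
| Adder | 13,202 | 3,174 | 0.24 |
| Leading 0 | 1,253 | 364 | 0.29 |
| Normalize | 3,190 | 942 | 0.3 |
| Round | 1,802 | 494 | 0.28 |
| OvfCheck | 200 | 70 | 0.35 |
| Registers, etc | N/A | 1,948 | N/A |
| Total | 58,458 | 12,730 | 0.22 |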
![Page 39: 1 Farhan Mohamed Ali (W2-1) Jigar Vora (W2-2) Sonali Kapoor (W2-3) Avni Jhunjhunwala (W2-4) Presentation 12 MAD MAC 525 26 th April, 2006 Short Final Presentation](https://reader036.vdocuments.us/reader036/viewer/2022062320/56649d405503460f94a19c5f/html5/thumbnails/39.jpg)
Conclusion
• More marketing
• Summarize chip functionality
• Extending applications of chip
![Page 40: 1 Farhan Mohamed Ali (W2-1) Jigar Vora (W2-2) Sonali Kapoor (W2-3) Avni Jhunjhunwala (W2-4) Presentation 12 MAD MAC 525 26 th April, 2006 Short Final Presentation](https://reader036.vdocuments.us/reader036/viewer/2022062320/56649d405503460f94a19c5f/html5/thumbnails/40.jpg)
Comments?