BRIDGE FLOATING-POINT FUSED MULTIPLY-ADD DESIGN

BY ERIC QUINNELL, EARL E SWARTZLANDER JR., AND CARL LEMONDS

Joseph Schneider

February 23, 2010


FUSED MULTIPLY-ADD

Fused Multiply-Add (FMA) is a unit designed to perform (A x B) + C as a single instruction

Faster and more precise than two consecutive instructions on a standard multiplier and adder

Can perform standard addition and multiplication with appropriate constants
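The precision claim can be checked with a small software model. This is only a sketch: the function names are made up for illustration, and exact rational arithmetic stands in for the wide internal datapath of a hardware FMA, so that the result is rounded to double precision exactly once.

```python
from fractions import Fraction

def fused_multiply_add(a, b, c):
    """Reference model: compute a*b + c exactly, then round once to double."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)  # the only rounding step

def separate_multiply_add(a, b, c):
    """Two instructions: the product is rounded, then the sum is rounded."""
    return (a * b) + c

a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

# The exact product is 1 - 2**-60, which rounds to 1.0 as a double.
print(separate_multiply_add(a, b, c))            # 0.0
print(fused_multiply_add(a, b, c) == -2.0**-60)  # True

# Standard operations through the FMA with constant operands.
print(fused_multiply_add(a, 1.0, c) == a + c)    # True: addition, B = 1
print(fused_multiply_add(a, b, 0.0) == a * b)    # True: multiplication, C = 0
```

The two-instruction version loses the 2^-60 term when the product is rounded, while the fused version keeps it.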


FUSED MULTIPLY-ADD

Performing a standalone addition or multiplication on an FMA unit incurs greater latency than on a dedicated adder or multiplier

With only an FMA unit, an addition and a multiplication cannot be performed in parallel


FUSED MULTIPLY-ADD

Goal: design a bridge architecture that sits between the FADD and FMUL units

Reuse components to minimize area and power consumption

Allow both standard operations and the FMA functionality


FORMAT

Floating-point units all assume double-precision (64-bit) IEEE-754 standard format
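For reference, the double-precision format packs 1 sign bit, 11 exponent bits (bias 1023), and 52 fraction bits into 64 bits. The helper below is an illustrative sketch (the name `double_fields` is not from the slides) that pulls the three fields out of a Python float.

```python
import struct

def double_fields(x):
    """Split a double into its IEEE-754 sign, biased exponent, and fraction fields."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]  # reinterpret as a 64-bit integer
    sign = bits >> 63                      # 1 bit
    exponent = (bits >> 52) & 0x7FF        # 11 bits, biased by 1023
    fraction = bits & ((1 << 52) - 1)      # 52 bits, implicit leading 1 for normal numbers
    return sign, exponent, fraction

print(double_fields(1.0))    # (0, 1023, 0)
print(double_fields(-2.5))   # (1, 1024, 1125899906842624), i.e. -1.25 * 2**1
```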


BASIS OF COMPARISON

Compare adder standalone, multiplier standalone, FMA standalone, and the FMA bridge

Compared on basis of latency, area, and power


FMA ARCHITECTURE

(A x B) + C

A and B multiplied while C is aligned based on exponent difference

Carry-save adder implemented

Result is rounded only once, as opposed to the two roundings necessary for performing the equation in two operations
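The data flow described above can be sketched in software on integer significands. This is a behavioural model under stated assumptions, not the hardware design from the paper: the function names are hypothetical, Python's unbounded integers replace fixed-width registers, the steps run sequentially rather than in parallel, and special values (infinities, NaNs) are not handled.

```python
import math
from fractions import Fraction

def decompose(x):
    """Return integers (m, e) with x == m * 2**e, for a finite double x."""
    frac, exp = math.frexp(x)           # x == frac * 2**exp, 0.5 <= |frac| < 1
    return int(frac * 2**53), exp - 53  # scaling by 2**53 is exact

def carry_save_add(x, y, z):
    """3:2 compression: returns two words whose ordinary sum is x + y + z."""
    partial_sum = x ^ y ^ z
    carries = ((x & y) | (x & z) | (y & z)) << 1
    return partial_sum, carries

def fma_model(a, b, c):
    ma, ea = decompose(a)
    mb, eb = decompose(b)
    mc, ec = decompose(c)
    if mb < 0:                          # fold the sign of B into A
        ma, mb = -ma, -mb
    ep = ea + eb                        # exponent of the product
    e = min(ep, ec)                     # common exponent after alignment

    # Multiplier array: one shifted copy of A per set bit of B.
    rows = [(ma << i) << (ep - e) for i in range(mb.bit_length()) if (mb >> i) & 1]
    # Alignment: C is shifted by the exponent difference and joins the rows.
    rows.append(mc << (ec - e))

    # Carry-save reduction: no carry propagation until two words remain.
    while len(rows) > 2:
        x, y, z = rows.pop(), rows.pop(), rows.pop()
        rows.extend(carry_save_add(x, y, z))
    total = sum(rows)                   # single carry-propagate addition

    return float(Fraction(total) * Fraction(2) ** e)  # single rounding

print(fma_model(2.0, 3.0, 4.0))   # 10.0
```

The carry-save step is what keeps the addend cheap to include: C is just one more row in the reduction, and only one full-width carry-propagate addition and one rounding happen at the end.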


BRIDGE FMA

Follows the same architecture as the FMA, but reuses parts from the FADD and FMUL units where appropriate

From FMUL, uses the multiplier array

From FADD, uses the rounding unit

With this method, FADD and FMUL can be used individually or in parallel, while the FMA is used only when needed

Clock-gating used to ensure bridge is only powered when needed
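One way to picture the sharing and the clock-gating is a table of which blocks each instruction keeps active. The sketch below is illustrative only: the block names are hypothetical labels, and treating the FADD dual-path adders as gated off during an FMA is an assumption that the slides do not state.

```python
# Hypothetical block labels; only the sharing pattern comes from the slides.
ACTIVE_BLOCKS = {
    "FADD": {"fadd_dual_path", "fadd_round"},
    "FMUL": {"fmul_array", "fmul_round"},
    # FMA borrows the FMUL multiplier array and the FADD rounding unit.
    # Assumption: the FADD dual-path adders are idle during an FMA.
    "FMA": {"fmul_array", "bridge", "fadd_round"},
}

def clock_enables(instruction):
    """Map every block to True (clocked) or False (gated) for one instruction."""
    all_blocks = set().union(*ACTIVE_BLOCKS.values())
    active = ACTIVE_BLOCKS[instruction]
    return {block: block in active for block in sorted(all_blocks)}

def can_issue_together(first, second):
    """Two instructions can run in parallel only if they share no block."""
    return ACTIVE_BLOCKS[first].isdisjoint(ACTIVE_BLOCKS[second])

print(clock_enables("FADD")["bridge"])     # False: bridge is gated off
print(can_issue_together("FADD", "FMUL"))  # True
print(can_issue_together("FMA", "FMUL"))   # False: both need the multiplier array
```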


FMUL

Same as a standard unit, only with additional outputs from multiplier array leading to FMA

Round element shut down via clock-gating when performing an FMA operation


FADD

Uses the Farmwald dual-path FADD design: two paths are available, selected by the exponent difference of the inputs

The multiplexer that selects between the paths for the rounding unit now includes an option for the FMA input

In this manner, FMA uses FADD’s rounding unit
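The path selection and the widened multiplexer can be sketched as follows. The slides only say that the path depends on the exponent difference; the specific rule used here (a "close" path for effective subtraction with an exponent difference of at most 1, a "far" path otherwise) is the usual textbook description of dual-path adders and is an assumption, as are the function and path names.

```python
import math

def select_fadd_path(a, b):
    """Pick the adder path for two finite, nonzero doubles (assumed textbook rule)."""
    exp_a = math.frexp(a)[1]
    exp_b = math.frexp(b)[1]
    effective_subtraction = (a < 0) != (b < 0)
    if effective_subtraction and abs(exp_a - exp_b) <= 1:
        return "close"   # possible massive cancellation
    return "far"         # large alignment shift, little or no cancellation

def rounding_unit_input(select, far_result, close_result, fma_result):
    """Three-way multiplexer in front of the shared rounding unit."""
    return {"far": far_result, "close": close_result, "fma": fma_result}[select]

print(select_fadd_path(1.0, -1.5))     # close
print(select_fadd_path(1024.0, -1.0))  # far
print(rounding_unit_input("fma", "far sum", "close sum", "bridge sum"))  # bridge sum
```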


BRIDGE FMA

End result: the Bridge FMA hardware is essentially the original FMA hardware without the multiplier array and rounding unit


RESULTS

FMUL, FADD, FMA, and Bridge FMA all implemented in Verilog

Uses AMD 65-nm silicon-on-insulator design set


RESULTS

Bridge architecture 30%-70% faster than FMA architecture when performing FADD or FMUL instructions with significant savings in power consumption

Also allows for an FADD and FMUL instruction in parallel, further improving speed

12% performance gain when executing FMA instruction over consecutive operations on individual FADD and FMUL.


RESULTS

Takes 40% more area to include Bridge FMA with FADD and FMUL Unit

60% increase in power for FMA instruction over consecutive FADD and FMUL instructions in worst case conditions

Bridge FMA has increased latency and power compared with a standalone FMA unit