ARITH’26 | BOCCO Andrea | 11 June 2019
DYNAMIC PRECISION NUMERICS USING A VARIABLE-PRECISION
UNUM TYPE I HW COPROCESSOR
INTRODUCTION: STATE OF THE ART
➢ Variable Precision (VP) computing has been investigated as a way to improve
the convergence of algorithms, in both:
▪ Software (SW): GMP[2] and MPFR[3]
▪ Slow; they might not meet the requirements of high-speed applications
▪ Hardware (HW):
▪ Kulisch[4]: large fixed-point accumulator
▪ Schulte and Swartzlander[5]: mantissas divided into multiple words
➢ None of the previous works shows how to store VP Floating Point (FP)
numbers efficiently in main memory
▪ They support the IEEE 754 FP format in main memory
[1] IEEE754-2008 2008. IEEE Standard for Floating-Point Arithmetic. IEEE 754-2008 https://doi.org/10.1109/IEEESTD.2008.4610935
[2] Torbjörn Granlund and the GMP development team. 2012. GNU MP: The GNU Multiple Precision Arithmetic Library. https://gmplib.org/
[3] Laurent Fousse, et al. MPFR: A Multiple precision Binary Floating-point Library with Correct Rounding. https://doi.org/10.1145/1236463.1236468
[4] Ulrich Kulisch. 2013. Computer arithmetic and validity: Theory, implementation, and applications
[5] M. J. Schulte and E. E. Swartzlander. 2000. A family of variable precision interval arithmetic processors. https://doi.org/10.1109/12.859535
INTRODUCTION: MY WORK
Our previous work[6]: a VP FP hardware accelerator:
• Supports the UNUM type I format in
main memory
• Does computation internally with another
(hardware friendly) FP format
• Supports Interval Arithmetic (IA)
This work:
▪ Refines the UNUM type I FP format.
▪ Proposes a new VP FP architecture.
▪ Proposes a new programming model.
▪ Benchmarks our system.
[6] A. Bocco, Y. Durand, F. de Dinechin. 2019. SMURF: Scalar Multiple-precision UNUM RISC-V Floating-point Accelerator for Scientific Computing.
[Figure: RISC-V Rocket Chip block diagram. Labels: Rocket tile, FPU, LSU, L1 $, RAM, UNUM co-proc, RoCC, LSU, L1 $, Scratchpad, RAM; numbered callouts 1 to 5.]
OUTLINE
• Choice of the memory format: the UNUM type I
• Refinements on the UNUM type I FP format
• The adopted VP FP Architecture
• The programming model
• System benchmark: Gauss elimination solver
• Conclusions
CHOICE OF THE MEMORY FORMAT: THE UNUM TYPE I
We decided to use the UNUM type I FP format in main memory
• It is a self-descriptive FP format with 6 sub-fields,
3 more than conventional IEEE 754 FP numbers
• WHY?
• UNUM is a VP FP format
• It self-encodes the exponent and fraction field lengths
However, UNUM type I has some peculiarities to be fixed:
• How to organize UNUM arrays in main memory
• How to organize the UNUM fields in memory
[Figure: UNUM type I field layout]
s (sign) | e (exponent, es bits) | f (fraction, fs bits) | u (ubit) | es-1 (exponent size) | fs-1 (fraction size)
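As a rough sketch of what "self-descriptive" means here, the total bit length of a UNUM can be computed from its own size fields. The function names are hypothetical. The formula assumes the usual UNUM type I convention: an environment (ess, fss) fixes the widths of the two size fields, and the es-1 and fs-1 fields store the exponent and fraction sizes minus one.

```python
def unum_bits(es, fs, ess, fss):
    """Bit length of one UNUM: s + e + f + u + (es-1 field) + (fs-1 field).

    Assumption: environment (ess, fss) gives the widths of the two size
    fields, so 1 <= es <= 2**ess and 1 <= fs <= 2**fss.
    """
    if not (1 <= es <= 2 ** ess and 1 <= fs <= 2 ** fss):
        raise ValueError("es/fs out of range for this environment")
    return 1 + es + fs + 1 + ess + fss


def max_unum_bits(ess, fss):
    """Longest UNUM in environment (ess, fss): both e and f at full size."""
    return unum_bits(2 ** ess, 2 ** fss, ess, fss)


# Environment (4,8): 1 + 16 + 256 + 1 + 4 + 8 bits
print(max_unum_bits(4, 8))  # 286
```

Under these assumptions, the largest environment mentioned in this talk, (4,8), gives a 256-bit fraction and a 286-bit maximum encoding.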
REFINEMENTS ON THE UNUM TYPE I FP FORMAT:
- UNUM FIELD ORGANIZATION
For a UNUM/ubound that spans multiple addresses in main memory, it is
important to have the descriptor fields at the lower addresses.
➢ We have re-organized the order of the fields for UNUM and ubound
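[Figure: re-organized field order, from LSB to MSB]
UNUM (1): s u es-1 fs-1 | e | f
ubound (2): s u es-1 fs-1 (left) | s u es-1 fs-1 (right) | e (left) | e (right) | f (left) | f (right)
[Figure: p-bit-wide memory between addresses 00--00 and FF--FF, with U1 starting at @1’ and U2 at @2’; the words that follow each descriptor are marked "?".]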
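One plausible reading of this requirement is that a load unit can learn the full length of a value from the first word it fetches. The sketch below illustrates that idea only. The function names and the exact bit positions inside the descriptor are assumptions and are not taken from the slides.

```python
def pack_unum(s, u, es, fs, e, f, ess, fss):
    """Pack fields LSB-first in the order s, u, es-1, fs-1, e, f.

    Returns the packed integer and its bit length. The bit order inside
    the descriptor is an assumption made for illustration.
    """
    fields = [(s, 1), (u, 1), (es - 1, ess), (fs - 1, fss), (e, es), (f, fs)]
    word, shift = 0, 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError("field value does not fit its width")
        word |= value << shift
        shift += width
    return word, shift


def read_descriptor(first_word, ess, fss):
    """Recover s, u, es, fs and the total bit length from the low bits only."""
    s = first_word & 1
    u = (first_word >> 1) & 1
    es = ((first_word >> 2) & ((1 << ess) - 1)) + 1
    fs = ((first_word >> (2 + ess)) & ((1 << fss) - 1)) + 1
    return s, u, es, fs, 2 + ess + fss + es + fs


word, nbits = pack_unum(0, 1, 3, 5, 0b101, 0b10011, ess=2, fss=3)
# Only the lowest byte is needed to know how long the value is.
print(read_descriptor(word & 0xFF, ess=2, fss=3))  # (0, 1, 3, 5, 15)
```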
REFINEMENTS ON THE UNUM TYPE I FP FORMAT:
- UNUM ARRAY ORGANIZATION
Handling a two-element UNUM array in main memory with p-bit parallelism
[Figure: two UNUMs of different bit lengths in p-bit-wide memory (addresses 00--00 to FF--FF). U1 occupies two p-bit words (U1_0, U1_1) and U2 occupies three (U2_0, U2_1, U2_2). Layout 1 places U1 at @1’ and U2 at @2’. Layout 2 places U2 at @2’’ and adds the words of the result U3 = U1*U2 (U3_0, U3_1, U3_2), flagged with "!".]
Array support: guarantee an affine addressing scheme
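The difference between a compact layout and an affine one can be sketched in a few lines. This is only an illustration of the addressing arithmetic, with hypothetical function names. It uses the element lengths of the two-element example (2 and 3 words) and assumes that every slot of the affine layout is as large as the longest element.

```python
from itertools import accumulate


def compact_offsets(lengths):
    """Start offset (in p-bit words) of each element when packed back to back.

    Finding element i requires the lengths of all elements before it.
    """
    return [0] + list(accumulate(lengths))[:-1]


def affine_offset(index, slot_words):
    """Start offset when every element owns a fixed slot of slot_words words."""
    return index * slot_words


lengths = [2, 3]  # U1 takes 2 words, U2 takes 3 words
print(compact_offsets(lengths))  # [0, 2]

slot = max(lengths)  # assumption: slot sized for the longest element
print([affine_offset(i, slot) for i in range(len(lengths))])  # [0, 3]
```

With the affine scheme, the address of element i depends only on i and on the slot size, which is the property ordinary array indexing relies on.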
THE ADOPTED VP FP ARCHITECTURE
• 1 integer register file (iRF): 32 integer general-purpose registers
(GPRs) + pc, in the main processor.
• 1 g-bound register file (gRF): 32 entries, in the coprocessor.
• UNUMs/u-bounds are strictly considered as memory formats:
• Load operations: load UNUMs/u-bounds from main memory and convert them into internal g-bounds.
• Store operations: convert internal g-bounds (entries of the internal gRF) into u-bounds and store the latter in
main memory.
• The coprocessor’s internal parallelism is fixed to 64 bits
• Coprocessor’s status registers:
• DUE
• SUE
• MBB
• WGP
[Figure: RISC-V Rocket Chip block diagram with a "NEW!" callout. Labels: Rocket tile, FPU, LSU, L1 $, RAM, UNUM co-proc, RoCC, LSU, L1 $, Scratchpad, RAM; numbered callouts 1 to 5.]
THE MBB: MAXIMUM BYTE BUDGET
The UNUM format is variable length (up to a maximum length)
▪ It is impossible to have compact arrays with random access to their
elements
➢ We define the Maximum Byte Budget (MBB) as the maximum length
that a UNUM number can have in main memory
➢ The user can address VP FP numbers by specifying their length with byte
granularity.
[Figure: inside the LSU, g-bounds g0 to g4 pass through G2U to become u0 to u4, then through BMF to become u’0 to u’4, each bounded by MBB.]
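The effect of a byte budget can be sketched with simple bit counting. The helper names below are hypothetical, and the sketch deliberately ignores how the bounded memory format re-encodes a value. It only assumes that each element owns exactly MBB bytes, and that the sign, ubit, size fields and exponent are stored before the fraction.

```python
def element_address(base, index, mbb):
    """Byte address of element `index` when every element owns `mbb` bytes."""
    return base + index * mbb


def fraction_bits_that_fit(mbb, es, ess, fss):
    """Fraction bits left once s, u, both size fields and e are stored.

    Assumption: the value is limited to 8 * mbb bits and the fraction is
    the field that absorbs the limit.
    """
    budget = 8 * mbb
    overhead = 2 + ess + fss + es
    return max(0, min(2 ** fss, budget - overhead))


print(hex(element_address(0x1000, 3, 16)))   # 0x1030
print(fraction_bits_that_fit(16, 16, 4, 8))  # 98
print(fraction_bits_that_fit(64, 16, 4, 8))  # 256
```

In this reading, a 16-byte budget in environment (4,8) leaves 98 fraction bits for a value with a 16-bit exponent, while a 64-byte budget is large enough for the full 256-bit fraction.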
THE BMF: BOUNDED MEMORY FORMAT
[Figure: BMF encoding tables for two cases, MBB >= max unum length and MBB < max unum length. Columns: s, u, es-1, fs-1, e, f. Row groups 1a) to 9a) and 1b) to 7b). Encoded values shown: 0, x, -∞↓, +∞↓, "(-∞ left", "+∞) right", sNaN, qNaN. Other annotations: es_max, fs_max, ess’, fss’, ess’’, fss’’, bit length MBB*8, unused bits.]
THE COPROCESSOR PROGRAMMING MODEL
Our hardware is best suited for VP kernels which exploit three
different storage types:
• The external (main memory) storage
• The intermediate (L1 cache) storage
• The internal (register-level) storage
Example kernel, solving A·x = b iteratively:
k = 0
while convergence not reached do
  for i := 1:n do
    σ = 0
    for j := 1:n do
      if j ≠ i then
        σ += a_ij · x_j^(k)
      end
    end
    x_i^(k+1) = (1/a_ii) · (b_i − σ)
  end
  k = k + 1
end
[Figure: the system A·x = b mapped onto the hardware. Labels: Rocket tile, RISC-V, FPU, LSU, L1 $, RAM, UNUM co-proc, RoCC, LSU, Scratchpad; numbered callouts 1 to 3. A legend distinguishes the outermost, intermediate and innermost loops; σ is drawn in the UNUM co-proc.]
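The kernel on this slide is a Jacobi iteration. A minimal software sketch of it is given below, using Python's `decimal` module as a stand-in for variable precision. The function name, the `digits` parameter and the stopping rule are assumptions made for illustration and do not describe the coprocessor's instruction set.

```python
from decimal import Decimal, localcontext


def jacobi(a, b, digits=50, max_iter=1000):
    """Jacobi iteration for A x = b, carried out with `digits` decimal digits."""
    with localcontext() as ctx:
        ctx.prec = digits
        n = len(b)
        a = [[Decimal(v) for v in row] for row in a]
        b = [Decimal(v) for v in b]
        tol = Decimal(10) ** -(digits - 5)  # assumed stopping threshold
        x = [Decimal(0)] * n
        for k in range(max_iter):
            x_new = []
            for i in range(n):
                sigma = Decimal(0)  # innermost accumulator
                for j in range(n):
                    if j != i:
                        sigma += a[i][j] * x[j]
                x_new.append((b[i] - sigma) / a[i][i])
            if max(abs(p - q) for p, q in zip(x_new, x)) < tol:
                return x_new, k + 1
            x = x_new
    raise RuntimeError("Jacobi iteration did not converge")


# Diagonally dominant 2x2 system whose exact solution is x = (1, 2).
x, iterations = jacobi([[4, 1], [1, 3]], [6, 7])
print(x, iterations)
```

The accumulator `sigma` plays the role of the register-level value in the innermost loop, while `x` and `a` stand for the data that would live in the cache and in main memory.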
SYSTEM BENCHMARK: GAUSS ELIMINATION SOLVER
Our system, benchmarked with a Gauss elimination solver in both
UNUM (scalar) and ubound (interval) modes, showed:
• A gain of up to 65 decimal digits over IEEE double precision
• The result precision is constrained by the adopted precision in memory.
• Intervals do not always converge, but they are useful for estimating the
computational error (Ax-b).
• A speed-up of 4-10x with respect to the MPFR software library
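The link between working precision and the residual Ax-b can be sketched in software. The code below is not the benchmark used in the talk. It is a small Gauss elimination with partial pivoting in which Python's `decimal` module stands in for variable precision, and the function names and the test matrix are made up for illustration.

```python
from decimal import Decimal, localcontext


def gauss_solve(a, b, digits):
    """Solve A x = b by Gauss elimination with partial pivoting."""
    with localcontext() as ctx:
        ctx.prec = digits
        n = len(b)
        # Augmented matrix [A | b]
        m = [[Decimal(v) for v in row] + [Decimal(r)] for row, r in zip(a, b)]
        for col in range(n):
            pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
            if m[pivot][col] == 0:
                raise ZeroDivisionError("singular matrix")
            m[col], m[pivot] = m[pivot], m[col]
            for r in range(col + 1, n):
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
        x = [Decimal(0)] * n
        for i in reversed(range(n)):  # back substitution
            s = sum((m[i][j] * x[j] for j in range(i + 1, n)), Decimal(0))
            x[i] = (m[i][n] - s) / m[i][i]
        return x


def residual_norm(a, b, x, digits=200):
    """Largest |(A x - b)_i|, evaluated with a much higher precision."""
    with localcontext() as ctx:
        ctx.prec = digits
        return max(
            abs(sum((Decimal(v) * xi for v, xi in zip(row, x)), Decimal(0)) - Decimal(r))
            for row, r in zip(a, b)
        )


A = [[3, 1, 2], [1, 7, 1], [2, 1, 9]]
rhs = [1, 2, 3]
for digits in (16, 60):
    print(digits, residual_norm(A, rhs, gauss_solve(A, rhs, digits)))
```

Running the solver at 16 and at 60 digits and comparing the two residuals gives a rough picture of how the working precision bounds the accuracy of the result.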
CONCLUSIONS
This work proposes a Variable Precision (VP) Floating Point (FP) computing
system, based on RISC-V, for high performance computing servers as an
alternative to VP FP software routines.
• It supports UNUM/ubound format in main memory
• It supports several Unum Environments: from (1,1) to (4,8), up to 256 mantissa bits
• It supports a dedicated internal format in its Register File
• 32 intervals; Each interval endpoint can have up to 512 mantissa bits
• With the adopted memory format (BMF) it supports VP FP in main memory
• The user can set the memory footprint of data with byte granularity
• With the adopted programming model, it is possible to extend VP FP high
precision variables in main memory.
• The result precision can be significantly improved.
• Its FLOPS performance is better than that of software libraries (MPFR) and
stays within the same range as a regular fixed-precision IEEE FPU.
Leti, technology research institute
Commissariat à l’énergie atomique et aux énergies alternatives
Minatec Campus | 17 rue des Martyrs | 38054 Grenoble Cedex | France
www.leti.fr
THANK YOU FOR
YOUR ATTENTION!
Contacts:
Andrea BOCCO