a probabilistic self-annealing compute fabric based on 560...
TRANSCRIPT
2020 Symposia on VLSI Technology and Circuits
A Probabilistic Self-Annealing Compute Fabric Based on 560 Hexagonally Coupled
Ring Oscillators for Solving Combinatorial Optimization Problems
Ibrahim Ahmed1, Po-Wei Chiu1, and Chris H. Kim1
1 University of Minnesota
2020 Symposia on VLSI Technology and Circuits
Outline• Motivation and Prior Art• Circuit and Architecture Design• Mapping Graph Problems to Hardware• Measured Results• Conclusion
Slide 1CA3.3
2020 Symposia on VLSI Technology and Circuits
Outline• Motivation and Prior Art• Circuit and Architecture Design• Mapping Graph Problems to Hardware• Measured Results• Conclusion
Slide 2CA3.3
2020 Symposia on VLSI Technology and Circuits
Motivation
• Solution space and time increases rapidly with the number of variables
• Intractable to solve using traditional von Neumann computer
Slide 3CA3.3
0 200 400 600 800 1000Number of variables, n
10010201040106010801010010120
Solu
tion
time,
T
COP: Boolean satisfiability
𝑻 = 𝟏. 𝟑𝟎𝟕𝒏
R. Paturi, JACM, 2005 (UCSD)
2020 Symposia on VLSI Technology and Circuits
Motivation
Slide 4CA3.3
• Ising computation uses energy transfer between spins to reach global minima• A network of coupled oscillators can solve many NP-hard and NP-complete
problems, such as COPs
𝐻(𝑠) = −'𝐽𝑖𝑗 ⋅ 𝑠𝑖𝑠𝑗𝑖,𝑗
−'ℎ𝑖𝑠𝑖𝑖
𝐽89 𝐽78
𝐽56 𝐽45
𝐽23 𝐽12 𝐽34 𝐽14
𝐽69 𝐽58 𝐽47 Local minima
Global minimaH
amilt
onia
n H
(s)
Spin states
𝐽25
COP mapped to graph Network of Spin𝐽!" = Coupling between spins 𝑖 and 𝑗𝑠! = State of spin 𝑖, ℎ! = Local field to spin 𝑖𝐻(𝑠) = Hamiltonian of the system
A. Lucas, Front. Phys., 2014
2020 Symposia on VLSI Technology and Circuits
Prior Art
Slide 5CA3.3
Volume= 700 ft3
D-Wave
Quantum computer
GooglePros: Can theoretically solve a wide variety or problemsCons:1. Requires sophisticated hardware2. Very sensitive to noise3. Operating temperature -273.14°C
Power =25KW
Nano-magnet based
Special process
D. Pierangeli, PRL, 2019
P. Debashis, IEDM, 2016
LASER based
Pros: Can achieve coupling dynamicsCons:1. Requires special process with
fabrication challenges2. Limited practical demonstrations
2020 Symposia on VLSI Technology and Circuits
Prior Art
Slide 6CA3.3
M. Yamaoka, JSSC, 2016T. Takemoto, ISSCC, 2019
Pros: Can theoretically solve a wide variety or problemsCons:1. No coupling dynamics : computation
time and energy may be higher2. Search for better solution is time
dependent
ASIC implementation Breadboard and PCB implementation
T. Wang, arXiv, 2017
T. Wang, arXiv, 2019
Pros: Can achieve coupling dynamicsCons:Not practical for large number of programmable spins
CMOS integrated Ising computer has the best of both worlds
2020 Symposia on VLSI Technology and Circuits
Outline• Motivation and Prior Art• Circuit and Architecture Design• Mapping Graph Problems to Hardware• Measured Results• Conclusion
Slide 7CA3.3
2020 Symposia on VLSI Technology and Circuits
Choice of CMOS Spin
Slide 8
LC oscillator:• Pros: low phase noise, frequency
insensitive to PVT• Cons: very large area, narrow
frequency tuning range
Ring oscillator (ROSC):• Pros: small area, simple design,
wide frequency tuning range• Cons: higher phase noise, higher
sensitivity to PVT
Type of oscillator not important for solving Ising problems (e.g. mechanical, electrical, chemical, etc.)
464 µm
Inductor layout
RO
SC la
yout
448
µm
L
RC+-
LC oscillator Ring oscillator
CA3.3
2020 Symposia on VLSI Technology and Circuits Slide 9CA3.3
ROSC Coupled Using Digital Latches
1 2 3 4
1 2 3 4
EN1
EN2
Phase comparator
1 2 3 4
1 2 3 4
EN1
EN2
Phase comparator
a) Positive coupling b) Negative coupling
V1
V2
V1
V2 V1V2
V1V2
Negative coupling
Positive coupling
c) Phase comparator
Phase,𝜙1
Phase,𝜙2
𝑠 = 0
𝑠 = 1
ROSC1 ROSC1
ROSC2 ROSC2
Phase,𝜙1
Phase,𝜙2
• Any coupling medium that enables energy transfer may couple ROSCs• ROSC and digital latches are designed with global and local enable signals
2020 Symposia on VLSI Technology and Circuits
Choice of Architecture
Slide 10CA3.3
s2 = 0s3 = 0s4 = 1
s559 = 0s560 = 1
s1 = 1
Hexagonally coupled 28x20 ROSC array
Latch-based coupling
i-th cell and neighbors
Osc. 1Osc. 2Osc. 3Osc. 4
Osc. 560
Osc. 559
Osc. 5 s5 = 1
ni = number of coupled neighbors of i-th cell, Jij = coupling weight between cells i and j
ROSC waveform
Cell states (2560≈3.8×10168 combinations)
Isin
g H
amito
nian
(Esy
stem
)
Local minima
Global minima
Cell state𝐸𝑠𝑦𝑠𝑡𝑒𝑚 = −))𝐽𝑖𝑗 ∙ 𝑠𝑖 ∙ 𝑠𝑗
𝑛𝑖
𝑗=1
𝑁
𝑖=1
... ...
Proposed architecture ROSC waveform and unit cell states System energy of the Ising computer
• Hexagonal unit cell maximizes the number of neighbors in 2D plane• Latch based coupling between cells is digitally controlled
2020 Symposia on VLSI Technology and Circuits
ROSC and Latch Design
Slide 11
NAND gate
123
5 6
4
7
CLK
Ctrl
Ctrl
Ctrl
7
InverterSynchronizing Clock signal
G_REN
L_REN
Seven stage Ring oscillator
ROSC:• Seven stage ROSC• Global and local enable signals,
G_REN and L_REN, respectively, controls ROSC activation
• Measured frequency: 120MHz
Digital Latch:• Designed with back to back inverters• Global and local enable signals, G_LEN
and L_LEN, respectively, controls latch activation
Digital latch
IN2Va
Vb IN1
Va
Vb
Vg1
Vg2
Vg1
Vg2Va
Vg1Vg2
Vb
Latch Control signals
G_LEN
L_LEN
Va
G_LEN
CA3.3
2020 Symposia on VLSI Technology and Circuits
Modular Unit Cell: Circuit Blocks
Slide 12CA3.3
G_REN
L_REN
CLK
Ai<0>
Ai<4>
Aj<4> 6
2
3 2
D Q
c) Read block
Ai<4>
Qij
a) ROSC block
Aj<0>G_LEN
L_LEN<1:3>
3
b) Latch coupling block
Ai<0>
j ≠ i
D Q D Q D Q D Q
d) Scan block
L_LEN<1:3>L_REN
SCLK_ISCLK_O
DATA_I DATA_O
• Read block samples one of the neighboring ROSC• Scan block programs the graph: four program bit per cell, 2240 bit for the chip
2020 Symposia on VLSI Technology and Circuits
Timing Based Annealing Mechanism
Slide 13CA3.3
Time (µs)
RO
SC p
erio
d (n
s)
0.0 0.2 0.4 0.6 0.8 1.0 1.223
4
5
67
Stable states Stable states Stable states
G_LEN
• Annealing helps the chip to find a better local minima• We periodically turn off the coupling latches and let spin states to change• To reduce delay, we only annealed the chip three times for measurement
65nm CMOS, VDD 1.0V, 25°C
2020 Symposia on VLSI Technology and Circuits
Die Photo and Chip Summary
Slide 14
Application
Process
Architecture
Voltage
Peak powerPower per cell
Combinatorial optimization problems
65nm CMOSROSC, latch based
coupling, self-annealing1.0V
Core: 0.53mm2
23mW41µW
AreaChip: 1.44mm2
Unit cell: 0.00095mm2
28x20 hexagonally coupled ring oscillator
array
On-chip CLK
760µm
701µ
m
Read scans
28x20 array
Peripheral circuit
Die photo Full chip layout Chip summary1.2mm
1.2m
m
• 28x20=560 coupled oscillators (only limiting factor is chip area)
CA3.3
2020 Symposia on VLSI Technology and Circuits
Outline• Motivation and Prior Art• Circuit and Architecture Design• Mapping Graph Problems to Hardware• Measured Results• Conclusion
Slide 15CA3.3
2020 Symposia on VLSI Technology and Circuits
Mapping Graph Problems to Hardware
Slide 16
State 0State 1Edge
b) “Difficult” COP (measured)a) “Easy” COP (measured)
• Randomly generate spatial graph using an algorithm that can be directly mapped to the chip without complicated embedding algorithm
• “Easy” COP: 2-color graphs such as checker board pattern• “Difficult” COP: 4 or more color graphs
CA3.3
2020 Symposia on VLSI Technology and Circuits
Introduction to Max Cut Problem
Slide 17
σ1 1
1
2
23
12
2
2
1
3
σ6
σ5 σ2
σ3 σ4
• The problem of finding a maximum cut in a graph is known as the Max-Cut Problem
• Finding max-cut of a graph is an NP-hard problem
1
1
2
2 3
12
2
2
1
3
1
1
2
2 3
12
2
2
1
3
1
1
2
2 3
12
2
2
1
3
1
1
2
2 3
12
2
2
1
3
Cut size = 7 Cut size = 9
Cut size = 12 Cut size = 16 (max)
CA3.3
2020 Symposia on VLSI Technology and Circuits
Introduction to Max Cut Problem
Slide 18CA3.3
σ1 1
1
2
23
12
2
2
1
3
σ6
σ5 σ2
σ3 σ4 𝐻 = Hamiltonian of the system𝜎! = Spin status of magnet 𝑖 {+1 or -1}𝑤!" = weight between magnets 𝑖 and 𝑗
𝐻 𝜎 = −+",$
(−𝑤"$) 𝜎"𝜎$
= +%"&& '()*+
(−𝑤"$) + +,-./ '()*+
𝑤"$
= +%"&& '()*+
(−𝑤"$) + +",$
𝑤"$ − +%"&& '()*+
𝑤"$
=+-00
𝑤"$ − 2× +%"&& '()*+
𝑤"$
Cut size
• Ising Hamiltonian = [sum of all weights] – 2×[cut size]
2020 Symposia on VLSI Technology and Circuits
Outline• Motivation and Prior Art• Circuit and Architecture Design• Mapping Graph Problems to Hardware• Measured Results• Conclusion
Slide 19CA3.3
2020 Symposia on VLSI Technology and Circuits
Repeated Measurement of Same Graph
Slide 20CA3.3
Rows (1-26)
Col
umns
(1-1
8)
0.00.20.40.60.8EdgeVertex
Graph mapped to chip
Probability of cell state flip (100 iterations)
1.0
b)a)
0 20 40 60 80 1000
0.20.40.60.81.0
Iteration
Max
-cut
(nor
mal
ized
)
c)
Measured solution for same graph
Hamming distance (between iterations)0 0.2 0.4 0.6 0.8 1
05
10152025
Prob
abili
ty (%
)
d)
• Unit cells may change states between iterations while the final cut values remain similar : proving chip is exploring various local minima
2020 Symposia on VLSI Technology and Circuits
Max-Cut Results for Graph Size 15x15
Slide 21
150 different max-cut problems1M random solutions per problem
0 0.2 0.4 0.6 0.8 1.00
10
20
30
40
Prob
abili
ty (%
)
Normalized cut-value
Graph size: 15x15Search space: 215x15≈5.4x1067
• 150 difficult COPs are mapped and max-cut results are measured• Annealing is done only three times as compared to thousands of times in quantum computers
2020 Symposia on VLSI Technology and Circuits
Max-Cut Results for Different Graph Sizes
Slide 22CA3.3
0 0.2 0.4 0.6 0.8 10
10
20
30
40150 max-cut problems1M random solutions per problem
0 0.2 0.4 0.6 0.8 10
10
20
30
40
0 0.2 0.4 0.6 0.8 10
10
20
30
40
0 0.2 0.4 0.6 0.8 10
10
20
30
40
Prob
abili
ty (%
)Pr
obab
ility
(%)
Prob
abili
ty (%
)Pr
obab
ility
(%)
Normalized cut-value Normalized cut-value
Normalized cut-value Normalized cut-value
Graph size: 6x6Search space: 26x6≈6.9x1010
Graph size: 15x15Search space: 215x15≈5.4x1067
Graph size: 10x10Search space: 210x10≈1.3x1030
Graph size: 26x18Search space: 226x18≈7.6x10140
• Consistently better max-cut solution compared to 1 million random search
2020 Symposia on VLSI Technology and Circuits
Comparison with Prior Arts
Slide 23
PRL ’19 [1] IEDM ’19 [2] ISSCC ’19 [3] ISSCC ’20 [4] Quantum computer [5] This work
Architecture All to all All to all Near neighbor Near neighbor Near neighbor Near neighbor
Technology Photonics IMT devices CMOS 40nm CMOS 65nm Superconductor CMOS 65nm
Coupling Light modulation
IMT interaction Digital logic Digital logic Qubit
interaction ROSC interaction
Operating temperature
Room temperature
Room temperature
Room temperature
Room temperature -273.14°C Room
temperature
Total measured solution 16 1 10 1 Many 1000
Delay 1000 cycles 1ms 22µs 30 cycles - 1µs - 10µs (est.)
Peak power Not reported Not reported Not reported Not reported 25KW 23mW
Measured accuracy
>95% (87% cases)
97.6% (4 spins) 98.8% 100%
(Easy COP) - 98%-100% (Easy)82%-100% (Diff.)
CA3.3
[1] D. Pierangeli, PRL, 2019 [2] S. Dutta, IEDM, 2019 [3] T. Takemoto, ISSCC, 2019
[4] Y. Su, ISSCC, 2020 [5] Z. Bian, D-Wave Systems, 2010
2020 Symposia on VLSI Technology and Circuits
Outline• Motivation and Prior Art• Circuit and Architecture Design• Mapping Graph Problems to Hardware• Measured Results• Conclusion
Slide 24CA3.3
2020 Symposia on VLSI Technology and Circuits
Conclusion• First hardware demonstration of a true coupling based
integrated CMOS Ising computer
• Probabilistic exploration of various local minima
• Mapped and solved 1000 COPs in the chip with an accuracy
of 82%-100%
Slide 25CA3.3