DRISA: A DRAM-based Reconfigurable In-Situ Accelerator

DRISA: A DRAM-based Reconfigurable In-Situ Accelerator. Shuangchen Li, Dimin Niu, Krishna T. Malladi, Hongzhong Zheng, Bob Brennan, Yuan Xie. University of California, Santa Barbara; Memory Solutions Lab, Samsung Semiconductor Inc. Scalable and Energy-efficient Architecture Lab (SEAL), http://seal.ece.ucsb.edu/, SEAL@UCSB.

Post on 02-Aug-2020

Page 1: DRISA: A DRAM-based Reconfigurable In-Situ Accelerator

Scalable and Energy-efficient Architecture Lab (SEAL)

http://seal.ece.ucsb.edu/ SEAL@UCSB

DRISA: A DRAM-based Reconfigurable In-Situ Accelerator

Shuangchen Li, Dimin Niu, Krishna T. Malladi, Hongzhong Zheng, Bob Brennan, Yuan Xie

University of California, Santa Barbara

Memory Solutions Lab, Samsung Semiconductor Inc.

Page 6: Motivation and Observation

• Merging the computing resources and memory fabrics

– Memory-rich processor: low memory capacity

– Compute-capable memory: low performance

To have BOTH:

(1) Use DRAM technology

(2) Remove “sys-memory” constraints

Building an accelerator with DRAM technology

[Figure: log-log scatter plot of normalized on-chip memory capacity per area (y-axis, 1E+00 to 1E+03) versus normalized peak performance per area (x-axis, 1E+00 to 1E+04). Labeled points: Shidiannao (ASICs), Dadiannao, TITAN X (GPU), BufferedComp, NeuroCube, This Work. Labeled regions: Memory-rich Processor; Compute-capable Memory (PIM).]

Page 12: Key Ideas and Approaches

To have BOTH:

(1) Use DRAM technology

(2) Remove “sys-memory” constraints

Building an accelerator with DRAM technology

[Diagram labels: DRAM technology; Logic Incompatible; Simple Boolean logic operations; General Purpose; Reconfigurable; High Perf.; Improve Parallelism; Unblock Data Mov.; Optimize Activation; Multi-subarray active; Multi-bank active. Inset: Bitline, SA, Cells, NOR, SHIFT.]

Page 19: Architecture Overview

• DRAM modifications:

– Change decoders to controllers

– Change SA to support logic operations

– Add shifters

– Others: group/bank buffers help internal data transfer, bank/subarray reorganization, split cell array regions

[Figure: (a) Chip: groups, each containing banks. (b) Bank: bCtrl, subarrays, mats. (c) Subarray and mat: sCtrl, DRAM cells, SA that supports Boolean logic operations, shifter.]

Page 23: Make BL Be Able To Compute (1/2)

• Three solutions:

– 3T1C: natural NOR on BL

– 1T1C: add gates or adopt AMBIT’s methods

– 1T1C-adder: add full adders to BL

[Figure: three bitline circuits, each showing rows Rs, Rt, and Rr and an SA. 3T1C-NOR: cells with rWL, wWL, rBL, and wBL. 1T1C-NOR/MIX: either a latch and logic gate at the SA, or a pre-loaded row selecting AND or OR, with example bitline levels 0.3 and 0.6 that the SA resolves as <0.5 and >0.5. 1T1C-ADDER: latches and an n-bit adder at the SA.]
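The pre-load option on this slide can be illustrated with a small bitwise model. This is only a sketch under a stated assumption: that the pre-loaded row acts as the third input of a three-input majority, as in AMBIT's triple-row activation, so that a row of zeros yields AND and a row of ones yields OR. The function names and the 8-bit row width are hypothetical and not taken from the slides.

```python
MASK = 0xFF  # assumed row width of 8 bitlines, for illustration only


def majority(a: int, b: int, c: int) -> int:
    """Per-bitline majority of three rows (assumed charge-sharing behavior)."""
    return ((a & b) | (a & c) | (b & c)) & MASK


def row_and(a: int, b: int) -> int:
    """AND: third row pre-loaded with all zeros."""
    return majority(a, b, 0)


def row_or(a: int, b: int) -> int:
    """OR: third row pre-loaded with all ones."""
    return majority(a, b, MASK)


def row_nor(a: int, b: int) -> int:
    """NOR: the OR result, inverted."""
    return ~row_or(a, b) & MASK


if __name__ == "__main__":
    print(f"{row_and(0b1100, 0b1010):08b}")
    print(f"{row_or(0b1100, 0b1010):08b}")
    print(f"{row_nor(0b1100, 0b1010):08b}")
```

The model ignores analog effects such as variation and only captures the logical outcome per bitline.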

Page 34: Make BL Be Able To Compute (2/2)

• Example: selector

$R = (S == 1)\,?\,X : Y$

$R = S \cdot X + \tilde{S} \cdot Y$

NOR-only logic:

$\tilde{R} = \mathrm{NOR}(\mathrm{NOR}(\tilde{S}, \tilde{X}), \mathrm{NOR}(S, \tilde{Y}))$

Step-1: $\tilde{X} = \mathrm{NOR}(0, X)$

Step-2: $\tilde{Y} = \mathrm{NOR}(0, Y)$

Step-3: $\tilde{S} = \mathrm{NOR}(0, S)$

Step-4: $\mathrm{tmp1} = \mathrm{NOR}(\tilde{S}, \tilde{X})$

Step-5: $\mathrm{tmp2} = \mathrm{NOR}(S, \tilde{Y})$

Step-6: $\tilde{R} = \mathrm{NOR}(\mathrm{tmp1}, \mathrm{tmp2})$

Step-7: $R = \mathrm{NOR}(0, \tilde{R})$

[Figure: cell rows on the bitline, filled in as the steps proceed: X, Y, S, !X, !Y, !S, !(!X+!S), !(!Y+S), !R, R.]
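The seven NOR steps of the selector example can be checked with a short bitwise model. This is a minimal sketch, assuming each row is modeled as an integer whose bits are the bitlines; the function names and the default width of 8 are illustrative and not part of the slides.

```python
def nor(a: int, b: int, width: int = 8) -> int:
    """Bitwise NOR of two rows, each modeled as a width-bit integer."""
    mask = (1 << width) - 1
    return ~(a | b) & mask


def select(s: int, x: int, y: int, width: int = 8) -> int:
    """Per-bitline R = S ? X : Y, using only NOR (Step-1 to Step-7)."""
    not_x = nor(0, x, width)          # Step-1
    not_y = nor(0, y, width)          # Step-2
    not_s = nor(0, s, width)          # Step-3
    tmp1 = nor(not_s, not_x, width)   # Step-4: equals S AND X
    tmp2 = nor(s, not_y, width)       # Step-5: equals (NOT S) AND Y
    not_r = nor(tmp1, tmp2, width)    # Step-6
    return nor(0, not_r, width)       # Step-7


if __name__ == "__main__":
    s, x, y = 0b11110000, 0b10101010, 0b01010101
    print(f"{select(s, x, y):08b}")  # prints 10100101
```

Every bitline evaluates the same sequence at once, so the cost is seven row-wide operations regardless of how many bits the row holds.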

Page 39: Shifters (1/2)

• Why include shifters:

– E.g., carry-in propagation

[Figure: adder example across bitlines. One bitline holds X0, Y0, and Cin0 and produces S0 and Cout0; the neighboring bitline holds X1 and Y1 and receives Cin1.]
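The carry-in example suggests why a shift is needed: a carry produced on one bitline must reach the neighboring bitline before the next bit can be added. The sketch below models this with NOR-built AND and XOR plus a one-position shift. It is a functional illustration only, not the cycle-level procedure from the paper; the 8-bit lane width and all helper names are assumptions.

```python
MASK = 0xFF  # one 8-bit lane; the width is an assumption for this sketch


def nor(a: int, b: int) -> int:
    """Row-wide NOR, the primitive assumed to be available on the bitlines."""
    return ~(a | b) & MASK


def bit_and(a: int, b: int) -> int:
    """AND built from NOR: NOR(NOT a, NOT b)."""
    return nor(nor(0, a), nor(0, b))


def bit_xor(a: int, b: int) -> int:
    """XOR built from NOR: NOR(a AND b, NOR(a, b))."""
    return nor(bit_and(a, b), nor(a, b))


def shift_left(a: int) -> int:
    """Move every bit to the neighboring bitline; the top bit falls off."""
    return (a << 1) & MASK


def add(x: int, y: int) -> int:
    """Add two lane values by repeatedly shifting carries to the next bitline."""
    total, carry = x & MASK, y & MASK
    while carry:
        total, carry = bit_xor(total, carry), shift_left(bit_and(total, carry))
    return total


if __name__ == "__main__":
    print(add(23, 42))  # prints 65
```

Without `shift_left`, each bitline could only combine values in its own column, so no carry could ever move from bit 0 to bit 1.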

Page 43: Shifters (2/2)

• Multiple hierarchies:

– Intra-lane: bit shift inside an 8-bit lane

– Inter-lane: array element shift

– Forwarding: access any element in the array

[Figure: a row divided into virtual lanes (INT8).]
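The three shifter levels can be contrasted with a small model in which a row is a list of INT8 lanes. This is a sketch under assumptions: the shift direction, the zero fill, and the modeling of forwarding as a lane-to-lane copy are illustrative choices, and the function names are hypothetical.

```python
from typing import List

LANE_MASK = 0xFF  # each virtual lane holds one INT8 element


def intra_lane_shift(row: List[int], k: int) -> List[int]:
    """Shift bits left by k inside every lane; bits never cross a lane boundary."""
    return [(v << k) & LANE_MASK for v in row]


def inter_lane_shift(row: List[int], k: int, fill: int = 0) -> List[int]:
    """Move whole elements k lanes toward higher indices; vacated lanes get fill."""
    k = max(0, min(k, len(row)))
    return [fill] * k + row[: len(row) - k]


def forward(row: List[int], src: int, dst: int) -> List[int]:
    """Copy the element in lane src to lane dst (assumed model of forwarding)."""
    out = list(row)
    out[dst] = row[src]
    return out


if __name__ == "__main__":
    row = [0x81, 0x01, 0x10, 0x7F]
    print(intra_lane_shift(row, 1))
    print(inter_lane_shift(row, 1))
    print(forward(row, 3, 0))
```

Intra-lane shifts serve bit-level arithmetic such as carry movement, while inter-lane shifts and forwarding move whole array elements.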

Page 45: Putting Compute-capable BLs and Shifters Together

• Observations:

– CSA (carry-save adder) is preferred: reduction works fine

– Affordable MUL: one operand must be at most 2 bits wide

[Figure, first plot: cycles (0 to 40) versus operand bit length (2, 4, 8, 16) for CSA and FA. Second plot: cycles (log scale, 1 to 1000) versus operand-1 bit length (1, 2, 4, 8, 16), one series per operand-2 bit length (2, 4, 8, 16).]
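One plausible reading of "reduction works fine" is that a carry-save adder turns three operands into a sum word and a carry word without propagating carries, so a long reduction needs only one carry-propagating addition at the very end. The sketch below shows that idea in software; the function names and the 16-bit width are assumptions, and it does not model cycle counts.

```python
from typing import Iterable, Tuple


def csa(a: int, b: int, c: int, mask: int) -> Tuple[int, int]:
    """3:2 compression: per-bit sum and majority carry, with no carry chain."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1  # carries move one bit position up
    return s & mask, carry & mask


def reduce_sum(values: Iterable[int], width: int = 16) -> int:
    """Sum many operands modulo 2**width using CSA steps and one final add."""
    mask = (1 << width) - 1
    terms = [v & mask for v in values]
    while len(terms) > 2:
        a, b, c = terms.pop(), terms.pop(), terms.pop()
        terms.extend(csa(a, b, c, mask))  # three terms in, two terms out
    return sum(terms) & mask  # the only carry-propagating addition


if __name__ == "__main__":
    print(reduce_sum(range(1, 11)))  # prints 55
```

Each `csa` call uses a single one-position shift, whereas a ripple-style full-adder addition needs a shift per bit.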

Page 51: Optimizations for high performance

• Adopting commodity DRAM:

– 13 cycles for 8-bit CSA

– tRC (46 ns)

[Diagram labels: DRAM technology; Logic Incompatible; Simple Boolean logic + Serially run; General Purpose; Reconfigurable; High Perf.; Improve Parallelism; Unblock Data Mov.; Optimize Activation.]

[Figure: log-log scatter plot of normalized on-chip memory capacity per area (y-axis, 1E+00 to 1E+03) versus normalized peak performance per area (x-axis, 1E+00 to 1E+04), with regions labeled Memory-rich Processor and Compute-capable Memory (PIM) and markers labeled un-optimized and Target.]
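The two numbers on this slide combine into a rough latency estimate. The sketch below assumes that, with commodity DRAM timing, each of the 13 cycles costs one full row cycle tRC; that assumption and the names used are mine, not stated on the slide.

```python
T_RC_NS = 46          # row cycle time from the slide, in nanoseconds
CSA_CYCLES_8BIT = 13  # cycles for one 8-bit CSA operation, from the slide


def latency_ns(cycles: int, cycle_ns: int = T_RC_NS) -> int:
    """Latency of a serial sequence of row operations, one tRC per cycle (assumed)."""
    return cycles * cycle_ns


if __name__ == "__main__":
    print(latency_ns(CSA_CYCLES_8BIT))  # prints 598
```

At roughly 0.6 microseconds per 8-bit addition on a given bitline group, performance has to come from doing many such operations in parallel, which matches the slide's emphasis on parallelism and activation.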

Page 52: Experiment Setup

• DRISA circuit simulator:

– Heavily modified CACTI

– Digital circuit (controller, logic gates)

• From Design Compiler synthesis

• Scaled to DRAM process with 20% perf. overhead and 80% area overhead (ISCAS’99)

• DRISA performance simulator:

– A behavior-level simulator

– Including a mapping optimization framework

[Figure: simulation flow. Circuit Simulator [Design Compiler + CACTI-3DD], with labels: device parameter, design options, circuits, latency/cycles, power/ops, area, leakage. Performance Simulator [In-house], with labels: NN topology, mapping scheme, design options, # mat/subarray/bank, speed, power.]

Page 60: Binary weight, 8-bit activation CNN inference case study

• 3T1C is not good

– The lowest area overhead

– Large memory cells

• 1T1C-adder is not the best

– The best peak performance

– Low effective performance

• 1T1C-mixed is the best solution

[Figure: Perf/Area (fr./s/mm2, log scale, 1E-02 to 1E+02) for AlexNet, vgg-16, vgg-19, resnet-152, and GM, with x-axis ticks 1, 8, and 64 within each group. Series: 3T1C, 1T1C-nor, 1T1C-mixed, 1T1C-adder, GPU-INT.]
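A binary-weight workload fits hardware that has select and add operations but no cheap multiplier, because a 1-bit weight turns each multiply into a selection. The sketch below assumes weights encoded as 0 or 1, where the weight bit selects between the activation and zero; a +1/-1 encoding would instead select between the activation and its negation. The encoding and the function name are assumptions, not details from the slides.

```python
from typing import Sequence


def binary_dot(activations: Sequence[int], weight_bits: Sequence[int]) -> int:
    """Dot product of 8-bit activations with 1-bit weights, using select and add only."""
    if len(activations) != len(weight_bits):
        raise ValueError("activations and weight_bits must have the same length")
    acc = 0
    for a, w in zip(activations, weight_bits):
        if not 0 <= a <= 255:
            raise ValueError("activation out of 8-bit range")
        acc += a if w else 0  # selector in place of a multiplier
    return acc


if __name__ == "__main__":
    print(binary_dot([10, 20, 30, 40], [1, 0, 1, 1]))  # prints 80
```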

Page 61: More in the paper

• Microarchitectures of BL-logic operations and shifter

• Interface design

• Optimizations for high performance

• Impact of variation

• CNN mapping and optimizations

• Detailed experiment setup and more results

Page 62: Summary

• In-situ computing: building an accelerator with DRAM technology

– DRAM for large memory capacity

– BL-computing logic design + shifter for general-purpose instructions

– Optimized for high computing performance

• Experiments on binary CNN acceleration:

– Perf. per area: 8.8x vs. ASIC, 7.7x vs. GPU

– Energy efficiency per area: 1.2x vs. ASIC, 15x vs. GPU

[Diagram labels: Multi-subarray active; Multi-bank active; Bitline, SA, Cells, NOR, SHIFT.]

Page 63: Questions?

http://seal.ece.ucsb.edu/ SEAL@UCSB

DRISA: A DRAM-based Reconfigurable In-Situ Accelerator

Shuangchen Li, Dimin Niu, Krishna T. Malladi, Hongzhong Zheng, Bob Brennan, Yuan Xie

University of California, Santa Barbara

Memory Solutions Lab, Samsung Semiconductor Inc.

Questions?