memory driven architecture: flipping the inequality computing vs. memory - technion … ·...

37
Uri Weiser Professor of Engineering Technion Memory driven architecture: flipping the inequality computing vs. memory 1 1 The talk covers research done by: Prof. Y. Etsion, Dr. Z. Guz, Prof. I. Keidar, Prof. A. Kolodny, S. Kvatinsky, Prof I. Keslassy, T. Zidenberg, Prof. A. Mendelson, Y. Nacson, Prof E. Friedman, Prof. U. Weiser

Upload: others

Post on 05-Jun-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Memory driven architecture: flipping the inequality computing vs. memory - Technion … · 2015-11-19 · Uri Weiser Professor of Engineering Technion Memory driven architecture:

Uri WeiserProfessor of Engineering Technion

Memory driven architecture:

flipping the inequality computing vs. memory

11

The talk covers research done by: Prof. Y. Etsion, Dr. Z. Guz, Prof. I. Keidar, Prof. A. Kolodny, S. Kvatinsky, Prof I. Keslassy, T. Zidenberg, Prof. A. Mendelson, Y. Nacson, Prof E. Friedman, Prof. U. Weiser

Page 2: Memory driven architecture: flipping the inequality computing vs. memory - Technion … · 2015-11-19 · Uri Weiser Professor of Engineering Technion Memory driven architecture:

“The large energy consumption associated with the ever increasing

internet use and the lack of efficient renewable energy sources to support it”

*Energy problems in data-com systems

*Energy problems in computers:

from systems to the chip level

*Advanced solar energy harvesting

Scent of Solutions?

This conference’s message

Page 3: Memory driven architecture: flipping the inequality computing vs. memory - Technion … · 2015-11-19 · Uri Weiser Professor of Engineering Technion Memory driven architecture:

The Trend Our Customers Expect

3From:

Page 4: Memory driven architecture: flipping the inequality computing vs. memory - Technion … · 2015-11-19 · Uri Weiser Professor of Engineering Technion Memory driven architecture:

The Trend Our Customers Expect

4From:

Page 5: Memory driven architecture: flipping the inequality computing vs. memory - Technion … · 2015-11-19 · Uri Weiser Professor of Engineering Technion Memory driven architecture:

Outline

The trends

The implications

The opportunities

Heterogeneous systems – some thoughts

Memristor Memory Intensive Architecture (MIA)

Energy: Optimal resource allocation in a

Heterogeneous system

How to start to think about Memory Intensive

Architecture

5

Page 6: Memory driven architecture: flipping the inequality computing vs. memory - Technion … · 2015-11-19 · Uri Weiser Professor of Engineering Technion Memory driven architecture:

The Trends

6

Page 7: Memory driven architecture: flipping the inequality computing vs. memory - Technion … · 2015-11-19 · Uri Weiser Professor of Engineering Technion Memory driven architecture:

Process Technology: Minimum Feature Size

Source: Intel, SIA Technology RoadmapSIA: Semiconductor Industry Association

0.01

Feature Size

(microns)

0.1

1

10

’68 ’71 ’76 ’80 ’84 ’88 ’92 ’96 ’00 ’04 ’08

IntelSIA

’14

130nm90nm

65nm45nm

32nm

180nm

22nm

7

14nm22nm

Page 8: Memory driven architecture: flipping the inequality computing vs. memory - Technion … · 2015-11-19 · Uri Weiser Professor of Engineering Technion Memory driven architecture:

Putting It All Together !

!!

!!

8

Page 9: Memory driven architecture: flipping the inequality computing vs. memory - Technion … · 2015-11-19 · Uri Weiser Professor of Engineering Technion Memory driven architecture:

The Trend

Where are we going?

The power wall

9

Page 10: Memory driven architecture: flipping the inequality computing vs. memory - Technion … · 2015-11-19 · Uri Weiser Professor of Engineering Technion Memory driven architecture:

Microarchitecture

VLSI Microarchitecture has been influenced by

concepts that have been around for a long time

We hit a power wall

Solutions

Top down – improve performance/power or

Throughput/power Heterogeneous Architecture

Bottom up – new devices ? Memory resistive devices?

10

Page 11: Memory driven architecture: flipping the inequality computing vs. memory - Technion … · 2015-11-19 · Uri Weiser Professor of Engineering Technion Memory driven architecture:

Hetero vs. Memory Intensive

Heterogeneous Architecture

For a while no major breakthrough in CPU technology

But the main reason is the POWER wall and energy/task

Accelerators to the rescue

Memory Intensive Architecture

Either a huge amount of memory cells close to logic, or

Logic cells close to lots of memory

Does it imply Symmetric processing?

11

Page 12: Memory driven architecture: flipping the inequality computing vs. memory - Technion … · 2015-11-19 · Uri Weiser Professor of Engineering Technion Memory driven architecture:

Flying machines - are they all the same?

Heterogeneous Systems

12

Page 13: Memory driven architecture: flipping the inequality computing vs. memory - Technion … · 2015-11-19 · Uri Weiser Professor of Engineering Technion Memory driven architecture:

Heterogeneous Computing:

Application Specific AcceleratorsPerformance/power

Apps range

Continue performance trend using Heterogeneous computing to

bypass current technological hurdles

Accelerators

13

Page 14: Memory driven architecture: flipping the inequality computing vs. memory - Technion … · 2015-11-19 · Uri Weiser Professor of Engineering Technion Memory driven architecture:

Heterogeneous Computing

Pe

rfo

rman

ces/

Po

we

r

General Purpose

Accelerator

14

Page 15: Memory driven architecture: flipping the inequality computing vs. memory - Technion … · 2015-11-19 · Uri Weiser Professor of Engineering Technion Memory driven architecture:

Heterogeneous Systems’

Environment

Environment with limited resources

Need to optimize system’s targets within

resource constrains

Resources may be:- Power, energy, area, space, $

System's targets may be:- Performance, power, energy, area, space, $

15

Page 16: Memory driven architecture: flipping the inequality computing vs. memory - Technion … · 2015-11-19 · Uri Weiser Professor of Engineering Technion Memory driven architecture:

Heterogeneous Computing

Heterogeneous system design under resource

constrainthow to divide resources (e.g. area, power, energy) to achieve maximum

system’s output (e.g. performance, throughput)

Accelerator target (an example): Minimize execution time under Area constraint

𝑎1𝑎2

𝑎3

𝑎𝑛

𝑎4

𝑨 =

𝒊=𝟏

𝒊=𝒏

𝒂𝒊

t2 t3 tnt1

time

ti = execution time of an application’s section (run on a reference computing system)

Example:

16

Page 17: Memory driven architecture: flipping the inequality computing vs. memory - Technion … · 2015-11-19 · Uri Weiser Professor of Engineering Technion Memory driven architecture:

MultiAmdahl:

t1* F1(a1)+ t2* F2(a2) + + tn* Fn(an)

a4

𝑎1

𝑎2

𝑎3

𝑎𝑛

t2 t3 tnt1

F1(a1) F2(a2) Fn(an)

T =

A = a1 + a2 + a3 + … + an

Target: Minimize T under a constraint A

17

Page 18: Memory driven architecture: flipping the inequality computing vs. memory - Technion … · 2015-11-19 · Uri Weiser Professor of Engineering Technion Memory driven architecture:

MultiAmdahl:

Optimization using Lagrange

multipliersMinimize execution time (T)

under an Area (a) constraint

t2 t3 tnt1

F1(a1) F2(a2) Fn(an)

18

tj F’j(aj) = ti F’i(ai)

F’= derivation of the accelerator function

ai = Area of the i-th accelerator

ti = Execution time on reference computer

Page 19: Memory driven architecture: flipping the inequality computing vs. memory - Technion … · 2015-11-19 · Uri Weiser Professor of Engineering Technion Memory driven architecture:

MultiAmdahl Framework

Applying known techniques* to

new environments

Can be used during system’s

definition and/or dynamically to

tune system

* Gossen’s second law (1854), Marginal utility, Marginal rate of substitution (Finance)

19

Page 20: Memory driven architecture: flipping the inequality computing vs. memory - Technion … · 2015-11-19 · Uri Weiser Professor of Engineering Technion Memory driven architecture:

Example: CPU vs. Accelerators

Future GP CPU size vs. transistor budget growth

Test case: 4 accelerators and GP (big) CPU

Applications: evenly distributed benchmarks mix w/ 10% sequential code

Heterogeneous Insight:

In an increased-transistor-budget-environment,

General Purpose (big) CPU importance will grow 20

Page 21: Memory driven architecture: flipping the inequality computing vs. memory - Technion … · 2015-11-19 · Uri Weiser Professor of Engineering Technion Memory driven architecture:

Example: CPU vs. Accelerators

GP CPU size vs. power budget

Test case: 4 accelerators and GP (big) CPU

Applications: evenly distributed benchmarks mix w/ 10% sequential code

21

Heterogeneous Insight:

In a decreased-power-budget-environment,

Accelerators importance will grow

Page 22: Memory driven architecture: flipping the inequality computing vs. memory - Technion … · 2015-11-19 · Uri Weiser Professor of Engineering Technion Memory driven architecture:

Environment Changes

Is it time for a change in implementation?

Throughput became an essential Microprocessor

target

Data footprint became bigger

Multi-Core systems are everywhere

more performance = more memory usage

Memory pressure is increasing

Significant CPU die power (>30%) is consumed by IO (access to out-of-die memory)

22

Page 23: Memory driven architecture: flipping the inequality computing vs. memory - Technion … · 2015-11-19 · Uri Weiser Professor of Engineering Technion Memory driven architecture:

Bottom up approach:

New device - Memristor?

23

Page 24: Memory driven architecture: flipping the inequality computing vs. memory - Technion … · 2015-11-19 · Uri Weiser Professor of Engineering Technion Memory driven architecture:

What is a Memristor?

2-terminal resistive nonvolatile device

Device’s resistivity depends on past

electrical current

Device is constructed of 2 metal layers with

oxide in between (e.g. TiO2)

Can be implemented in Multi (physical) layer memory

RON

ROFF

Voltage [V]

Cu

rren

t [m

A]

24

Jul 30, 2013

Panasonic Starts World's First Mass Production of ReRAM Mounted Microcomputers[1] ReRAM (Resistive Random Access Memory)

A type of non-volatile memory which records "0" and "1" digital information by generating large resistance changes with a pulsed voltage applied to a thin-film metal oxide.

The simple structure of the metal oxide sandwiched by electrodes makes the manufacturing process easier and provides excellent low power-consumption and high-speed rewriting characteristics.

Page 25: Memory driven architecture: flipping the inequality computing vs. memory - Technion … · 2015-11-19 · Uri Weiser Professor of Engineering Technion Memory driven architecture:

• Theoretical idea by Chua

in 1971

• Implemented today by

Hewlett Packard

SK Hynix, HRL Labs

• Memory products by

Pannasonic

Memristor

50

nmArray of 17 oxygen-depleted titanium dioxide

memristors (HP Labs)

25

Page 26: Memory driven architecture: flipping the inequality computing vs. memory - Technion … · 2015-11-19 · Uri Weiser Professor of Engineering Technion Memory driven architecture:

Memristor Microarchitecture “Vision”

Layers of memory cells above logic

Does this new structure open the possibility

for new Microarchitecture?

26

Page 27: Memory driven architecture: flipping the inequality computing vs. memory - Technion … · 2015-11-19 · Uri Weiser Professor of Engineering Technion Memory driven architecture:

Memristors to the Rescue?

Huge amount of memory cells

Very close to logic

Non volatileNo need for power to keep alive

~ transistor size

Fast

No leakage

27

Page 28: Memory driven architecture: flipping the inequality computing vs. memory - Technion … · 2015-11-19 · Uri Weiser Professor of Engineering Technion Memory driven architecture:

Sea of Memory Cells Impact - Conventional vs. Out of the box

Enhance Multithreading architecture (Graphics like)

Increase on-die prediction structures

Instruction queues

Back to LUT (look-up-tables) implementations

New caches (e.g. NAHALAL, MC vs. MT, Cache specific content)

Non-Register Architecture (memory-to-memory operations)

Continues Flow Multithreading (improved SoE MT)

Instruction reuse (memoization)

Computation at the memory level*

* Ref Dr. Avidan Akerib General manager NeoMagic 28

??

?

?

Page 29: Memory driven architecture: flipping the inequality computing vs. memory - Technion … · 2015-11-19 · Uri Weiser Professor of Engineering Technion Memory driven architecture:

Memory Intensive

Architectures

Bandwidth demonsBandwidth demons

Traversing on a constant-throughput-line ?

? increase on-die-memory (e.g. cache, new ideas)

The trend

Bandwidth

Th

rou

gh

pu

t/B

an

dw

idth

TP3

TP1

TP2

TP4

Throughput and Bandwidth*

*Influenced by ISCA 1995 paper: Performance Evaluation of the PowerPC 620 Microarchitecture; (graph: frequency vs. performance/frequency)

Chip boundary

Bandwidth

to out-of Chip

devices

energy waste

Throughput

engine

29

Page 30: Memory driven architecture: flipping the inequality computing vs. memory - Technion … · 2015-11-19 · Uri Weiser Professor of Engineering Technion Memory driven architecture:

Thread A

Thread B

Fetch

Execute

Write back

Cache miss!!!

30

Switch on Event MultithreadingExample- processor pipeline

Thread C

Pip

elin

e s

tag

es

Page 31: Memory driven architecture: flipping the inequality computing vs. memory - Technion … · 2015-11-19 · Uri Weiser Professor of Engineering Technion Memory driven architecture:

Continuous Flow MT (CFMT)Example – processor’s pipeline

SOE deficiencies

Instructions flush beyond the “event instruction”

waste of energy

performance degradation

Can we use Memristor to reduce thread switch penalty

(bubbles)?

Yes

do not flush, store the thread-pipe-state in

memristors (Multistage Pipeline Register)

31

Page 32: Memory driven architecture: flipping the inequality computing vs. memory - Technion … · 2015-11-19 · Uri Weiser Professor of Engineering Technion Memory driven architecture:

Continues Flow MT (CFMT)Example – processor’s pipeline

Th

rea

d A

Pip

eli

ne

re

gis

ter

R/W

R/W

R/W

R/W

R/W

Fetch

Execute

Write back

R/W

Multistate

Pipeline

Register (MPR)

Th

rea

d B

Pip

eli

ne

re

gis

ter

32

Pip

elin

e s

tages

Page 33: Memory driven architecture: flipping the inequality computing vs. memory - Technion … · 2015-11-19 · Uri Weiser Professor of Engineering Technion Memory driven architecture:

MPR

MPR

MPR

MPR

MPR

MPR

Continuous Flow MT (CFMT)Example – processor’s pipeline Thread A

Thread BFetch

Execute

Write back

Cache miss!!!

MP

R=

Multis

tate

Pip

elin

eR

egis

ter

33

Thread C

Pip

elin

e s

tag

es

Page 34: Memory driven architecture: flipping the inequality computing vs. memory - Technion … · 2015-11-19 · Uri Weiser Professor of Engineering Technion Memory driven architecture:

CFMT Initial Simulation (preliminary) (ARM like Microarch V7, lbm from Spec CPU 2006

CFMT for multiple cycle events? not sure yet…

SoE; no CFMT

CFMT (Mem and MCE)

34

CFMT (mem only)

IPC(Performance)

# of threads

Page 35: Memory driven architecture: flipping the inequality computing vs. memory - Technion … · 2015-11-19 · Uri Weiser Professor of Engineering Technion Memory driven architecture:

Memory Intensive ArchitectureLooking Forward

• Large on die memory may save energy and change

the way we architect our computational machines

Reduction in Data-Transfer

Opportunity for dramatic improvement in

Performance/Power or Throughput/Power

Performance improvement (@same power) => energy

reduction

Reduction of static/leakage power

Energy saving in reactive systems

(0 memory energy when no operation)

NEW!!!

35

Page 36: Memory driven architecture: flipping the inequality computing vs. memory - Technion … · 2015-11-19 · Uri Weiser Professor of Engineering Technion Memory driven architecture:

Summary

Saving energy via optimal Heterogeneous

system

The introduction of on-die huge memory

should alter the way we design

computational machines for low energy

consumption

36

Page 37: Memory driven architecture: flipping the inequality computing vs. memory - Technion … · 2015-11-19 · Uri Weiser Professor of Engineering Technion Memory driven architecture:

Thank You

37