Accelerators Ran Ginosar Avinoam Kolodny Yuval Cassuto Koby Crammer Shmuel Wimer Dani Lichinski


Post on 25-Feb-2016



Page 1: Accelerators

Accelerators

Ran Ginosar
Avinoam Kolodny
Yuval Cassuto
Koby Crammer
Shmuel Wimer
Dani Lichinski

Page 2: Accelerators

(research)(motivation) questions

• We love accelerators, but…
• What accelerators?
• What workload? What “killer applications”?
• Why study / develop them?
• Who needs them?
• What architecture(s)?
• What goals are we seeking to fulfill?
– In addition to winning ICRI-CI research grants

Page 3: Accelerators

Why accelerators?

• Semiconductor industry sells $300B/year (10% INTC)
– 1M high-profit chips/day
  • $100/chip, $100M/day. Mostly CPU.
  • 10% of revenues. 100-1000% gross profit
– 90M low-cost chips/day
  • $10/chip, $900M/day. 50% gross profit
• Growth < 10%
• In the year 2023?
– Need to expand into another rich industry
• Store-and-compute accelerators will be the driver

Page 4: Accelerators

• Which industry is
– Rich
  • Much richer than semiconductors
– Under-utilized
– Begs for progress (and can pay for it)
– Critical, will not disappear
• Video? Entertainment? Communication?

Page 5: Accelerators

Health Care

• $2.5 Trillion in US alone
– Already 10x the entire global semiconductor industry
– $4.5T by 2020
– Global is probably 3X, $15T by 2020
• Key challenge:
– Today: imprecise, statistics-based diagnosis and treatment
– Develop into a more efficient, more successful discipline by combining science & computing

Page 6: Accelerators

Future health care is computerized (store and compute)

• Medical/health data about 10B people
– Genomics, proteomics (5 GB/person)
– Health & medical record (1 GB/person)
– Continuously accumulating sensor readings (4 GB/person)
  • Medical, environmental, food & drugs
• Monitor and process all individuals
– Machine learning
– Predict and alert medical conditions
– Individualize drugs, diets, treatments

Page 7: Accelerators

Storage required

• 10 GB/person
• 10B people
• 10^20 bytes (100 ExaBytes, 100 Mega-TeraBytes)
– 100 million of today’s 1 TByte disks. 100+ data centers
– 500 MegaWatts to store, read and write
  • $350 Million / year
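The storage figures on this slide can be checked with a few lines of arithmetic. The sketch below uses only the slide's own inputs. The per-disk power and the electricity price are not stated in the slides; they are values implied by dividing the slide's totals, assuming the $350 Million / year is the electricity bill for the 500 MegaWatts.

```python
# Back-of-envelope check of the "Storage required" slide.
# All inputs are the slide's figures; the last two results are derived
# (implied) values, not numbers stated in the slides.
BYTES_PER_PERSON = 10 * 10**9        # 10 GB/person
PEOPLE = 10 * 10**9                  # 10B people
DISK_BYTES = 10**12                  # one 1 TByte disk
STORAGE_POWER_W = 500 * 10**6        # 500 MegaWatts
COST_PER_YEAR_USD = 350 * 10**6      # $350 Million / year
HOURS_PER_YEAR = 365 * 24

total_bytes = BYTES_PER_PERSON * PEOPLE            # total data volume
disks = total_bytes // DISK_BYTES                  # number of 1 TByte disks
watts_per_disk = STORAGE_POWER_W / disks           # implied power per disk
kwh_per_year = STORAGE_POWER_W / 1000 * HOURS_PER_YEAR
usd_per_kwh = COST_PER_YEAR_USD / kwh_per_year     # implied electricity price

print(f"total bytes : {total_bytes:.0e}")
print(f"disks       : {disks:,}")
print(f"W per disk  : {watts_per_disk}")
print(f"$ per kWh   : {usd_per_kwh:.3f}")
```

Under that assumption the slide's totals correspond to about 5 W per disk and roughly 8 cents per kWh.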

Page 8: Accelerators

Computing required

• Run through 50% of data each day
• Perform 10 op / byte
• 10^21 op/day = 10^16 op/sec
– Only 10M cores of 1 GOPS each
– 100 data centers
• Power: only 10 MegaWatt
– 2% of storage power
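The compute estimate can be reproduced the same way. This is a rough sketch using the slide's inputs plus the 10^20 bytes and 500 MegaWatts from the storage slide. The exact product comes out slightly below the slide's order-of-magnitude figures, and the per-core power is an implied value, not one given in the slides.

```python
# Back-of-envelope check of the "Computing required" slide.
TOTAL_BYTES = 10**20             # from the storage slide
FRACTION_PER_DAY = 0.5           # run through 50% of data each day
OPS_PER_BYTE = 10                # 10 op / byte
CORE_OPS_PER_SEC = 10**9         # a 1 GOPS core
SECONDS_PER_DAY = 24 * 60 * 60
COMPUTE_POWER_W = 10 * 10**6     # 10 MegaWatt
STORAGE_POWER_W = 500 * 10**6    # 500 MegaWatts, from the storage slide
SLIDE_CORES = 10 * 10**6         # "10M cores"

ops_per_day = TOTAL_BYTES * FRACTION_PER_DAY * OPS_PER_BYTE
ops_per_sec = ops_per_day / SECONDS_PER_DAY
cores_needed = ops_per_sec / CORE_OPS_PER_SEC
watts_per_core = COMPUTE_POWER_W / SLIDE_CORES     # implied, not on the slide
power_ratio = COMPUTE_POWER_W / STORAGE_POWER_W

print(f"op/day         : {ops_per_day:.1e}")
print(f"op/sec         : {ops_per_sec:.1e}")
print(f"1 GOPS cores   : {cores_needed:.1e}")
print(f"W per core     : {watts_per_core}")
print(f"compute/storage: {power_ratio:.0%}")
```

The exact values (about 5 x 10^20 op/day and 6 x 10^15 op/sec) round up to the slide's 10^21 and 10^16, so 10M cores of 1 GOPS each leaves some headroom.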

Page 9: Accelerators

Solution: move computing closer to data

• The HMC (Hybrid Memory Cube) industry has already taken the first step
• 100,000 TSV vertical interconnects

Page 10: Accelerators
Page 11: Accelerators
Page 12: Accelerators

Not yet there

• Wish to get closer: stack memory on top of the CPU?
• NO. Too hot
– CPU operates above 100ºC
– DRAM is useless above 85ºC
• Solution
– Dispose of the CPU
– Create 3D low-power (low temperature), uniform-power-density, high-performance store & compute machine

Page 13: Accelerators

Store & Compute

• 1 Tbyte / chip in 2020
– Combined DRAM + NVM
• Accelerators
– 1000 cores “many-core”
  • MIMD
  • Associative Processors
  • SIMD
• Internal + external networks

[Figure: 2D Accelerator (NOC, p-m NOC) and 3D Accelerator (NVM layers, DRAM+SRAM)]

Page 14: Accelerators

Challenges

• Need 100M chips
• Max 0.1 W / chip
– Total 10 MWatt
– 100-1000 data centers

[Figure: 2D Accelerator (NOC, p-m NOC, NVM); labels: 5 mm, 20 mm, 20 mm, 500 chips, 50 Watt]
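The chip-level budget on this slide is consistent with the earlier totals, as the following sketch shows. The inputs are the slide's figures plus the 1 Tbyte / chip from the "Store & Compute" slide. Reading "500 chips, 50 Watt" as one group of chips, and the resulting count of groups, is an interpretation of the figure labels, not something the slides state.

```python
# Consistency check of the "Challenges" slide, in integer milliwatts
# so that 0.1 W per chip is represented exactly.
CHIPS = 100_000_000          # need 100M chips
MAX_MW_PER_CHIP = 100        # 0.1 W / chip, expressed in milliwatts
BYTES_PER_CHIP = 10**12      # 1 Tbyte / chip (Store & Compute slide)
CHIPS_PER_GROUP = 500        # "500 chips" figure label (interpretation)

total_power_megawatts = CHIPS * MAX_MW_PER_CHIP / 10**9       # mW -> MW
group_power_watts = CHIPS_PER_GROUP * MAX_MW_PER_CHIP / 1000  # mW -> W
groups = CHIPS // CHIPS_PER_GROUP                             # derived count
total_bytes = CHIPS * BYTES_PER_CHIP

print(f"total power : {total_power_megawatts} MW")
print(f"group power : {group_power_watts} W")
print(f"groups      : {groups:,}")
print(f"capacity    : {total_bytes:.0e} bytes")
```

The 100M chips at 1 Tbyte each give the same 10^20 bytes as the storage estimate, and the 10 MWatt total matches the compute power budget.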

Page 15: Accelerators

More challenges

• Understand workload
• Understand algorithms
• Architect the store & compute accelerators
• Low low low power
• High (data-intensive) performance

Page 16: Accelerators

Approaches

1. Associative processors
– Classic store & compute
– Uniform power distribution
– Massive parallelism
– Very low power
2. Orthogonal access SIMD processors
– Sequential and parallel access
– Mitigate data-movement bottleneck
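The slides do not describe how an associative processor operates, so the following is only a toy illustration of the general textbook idea: every memory row compares itself against a broadcast key, and a subsequent write updates all matching (tagged) rows at once. The class and method names are hypothetical, and the Python loops merely simulate what the hardware would do in parallel across rows.

```python
# Toy model of associative (content-addressable) processing.
# AssociativeArray, compare and write are illustrative names, not an API
# from the slides or from any library.
class AssociativeArray:
    def __init__(self, rows):
        self.rows = list(rows)                  # one integer word per row
        self.tags = [False] * len(self.rows)    # one tag bit per row

    def compare(self, key, mask):
        """Tag every row whose masked bits equal the masked key."""
        self.tags = [(row & mask) == (key & mask) for row in self.rows]
        return sum(self.tags)                   # number of matching rows

    def write(self, value, mask):
        """In every tagged row, overwrite the masked bits with value."""
        self.rows = [
            (row & ~mask) | (value & mask) if tagged else row
            for row, tagged in zip(self.rows, self.tags)
        ]


memory = AssociativeArray([0b0001, 0b0101, 0b0011, 0b0100])
matches = memory.compare(key=0b01, mask=0b11)   # rows whose low two bits are 01
memory.write(value=0b1000, mask=0b1000)         # set bit 3 in the tagged rows
print(matches, memory.rows)
```

In this model the data never leaves the memory array: compare and write both act where the rows are stored, which is the sense in which the slide calls the approach "classic store & compute".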

Page 17: Accelerators

Approaches

3. Average case computing
– ALU that runs faster than worst case
– And dissipates less power than worst case
– Enables low power just-in-time architecture
4. Personalized vision/graphics for personal mobile devices
– Inspires workload understanding
5. Memristive processors and resistive memories
– Presented by Yuval Cassuto