Embedded Computer Architecture 5KK73 MPSoC Platforms Part2: Cell Bart Mesman and Henk Corporaal

Post on 11-Jan-2016

Page 1: Embedded Computer Architecture 5KK73 MPSoC Platforms

Embedded Computer Architecture 5KK73

MPSoC Platforms

Part 2: Cell

Bart Mesman and Henk Corporaal

The Complexity Crisis

I have always wished that my computer would be as easy to use as my telephone. My wish has come true. I no longer know how to use my telephone.

--Bjarne Stroustrup

04/21/23 2

The Software Crisis

The first SW crisis

Time Frame: ’60s and ’70s
• Problem: Assembly Language Programming
– Computers could handle larger, more complex programs
• Needed to get Abstraction and Portability without losing Performance
• Solution:
– High-level languages for von Neumann machines: FORTRAN and C

The second SW crisis

Time Frame: ’80s and ’90s
• Problem: Inability to build and maintain complex and robust applications requiring multi-million lines of code developed by hundreds of programmers
– Computers could handle larger, more complex programs
• Needed to get Composability and Maintainability
– High performance was not an issue: left for Moore’s Law

Solution

• Object Oriented Programming
– C++, C# and Java
• Also…
– Better tools
  • Component libraries, Purify
– Better software engineering methodology
  • Design patterns, specification, testing, code reviews

Today: Programmers are Oblivious to Processors

• Solid boundary between Hardware and Software
• Programmers don’t have to know anything about the processor
– High-level languages abstract away the processors
  • Ex: Java bytecode is machine independent
– Moore’s law does not require the programmers to know anything about the processors to get good speedups
• Programs are oblivious of the processor -> work on all processors
– A program written in the ’70s using C still works and is much faster today
• This abstraction provides a lot of freedom for the programmers

The third crisis: Powered by PlayStation

Contents

• Hammer your head against 4 walls
– Or: Why Multi-Processor
• Cell Architecture
• Programming and porting
– plus case-study

Moore’s Law

Single Processor SPECint Performance

What’s stopping them?

• General-purpose uni-cores have stopped historic performance scaling
– Power consumption
– Wire delays
– DRAM access latency
– Diminishing returns of more instruction-level parallelism

Power density

Power Efficiency (Watts/Spec)

1 clock cycle wire range

Global wiring delay becomes dominant over gate delay

[Figure: Gate delay vs. wire delay. Horizontal axis: technology (micron), from 0.5 down to 0.1. Vertical axis: delay in ps, 0 to 400. Series: wire delay (ps/mm) and gate delay (ps).]

Memory

[Figure: Processor-memory performance gap ("Moore’s Law") [Patterson]. Horizontal axis: time, 1980 to 2005. Vertical axis: performance, 1 to 1000 (logarithmic). µProc (CPU): 55%/year. DRAM: 7%/year. Processor-memory performance gap: grows 50%/year.]
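
As a rough check on the figure's numbers, the two growth rates quoted on the slide (55% per year for processors, 7% per year for DRAM) can be compounded directly. The 20-year horizon is an assumption chosen only to match the span of the time axis; this is back-of-the-envelope arithmetic, not data read off the plot.

```latex
\[
\frac{1 + 0.55}{1 + 0.07} = \frac{1.55}{1.07} \approx 1.45
\qquad\text{and}\qquad
1.45^{20} \approx 1.7 \times 10^{3}
\]
```

Under these assumptions the gap widens by roughly 45% per year, close to (though slightly below) the 50% per year quoted on the slide, and reaches about three orders of magnitude over two decades.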

Now what?

• Latest research drained

• Tried every trick in the book

So: We’re fresh out of ideas

Multi-processor is all that’s left!

Low power through parallelism

• Sequential Processor
– Switching capacitance $C$
– Frequency $f$
– Voltage $V$
– $P = f C V^2$
• Parallel Processor (two times the number of units)
– Switching capacitance $2C$
– Frequency $f/2$
– Voltage $V' < V$
– $P = \frac{f}{2} \cdot 2C \cdot V'^2 = f C V'^2$
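
The saving comes entirely from the lower supply voltage, which the halved clock frequency makes possible. Dividing the two power expressions makes this explicit. The value $V' = 0.7\,V$ below is an assumed, illustrative figure and does not come from the slide.

```latex
\begin{align*}
P_{\text{seq}} &= f\,C\,V^{2} \\
P_{\text{par}} &= \frac{f}{2}\cdot 2C \cdot V'^{2} = f\,C\,V'^{2} \\
\frac{P_{\text{par}}}{P_{\text{seq}}} &= \left(\frac{V'}{V}\right)^{2}
  \quad\Rightarrow\quad
  V' = 0.7\,V \ \text{gives}\ \frac{P_{\text{par}}}{P_{\text{seq}}} = 0.49
\end{align*}
```

With that assumed voltage, two units at half the clock rate would deliver the same throughput for about half the power.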

Architecture methods

Powerful Instructions (1)

MD-technique
• Multiple data operands per operation
• SIMD: Single Instruction Multiple Data

Vector instruction:

for (i = 0; i < 64; i++) c[i] = a[i] + 5*b[i];

c = a + 5*b

Assembly:

set   vl,64
ldv   v1,0(r2)
mulvi v2,v1,5
ldv   v1,0(r1)
addv  v3,v1,v2
stv   v3,0(r3)
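
To make the idea concrete, the loop above can be written in plain C both as a scalar loop and in a strip-mined form, where each outer iteration covers the elements that a single SIMD instruction would process. This is only a sketch. The function names and the lane count `LANES` are assumptions for illustration, and the inner loop stands in for what vector hardware would do in one operation.

```c
#include <stddef.h>

#define N 64     /* vector length, matching "set vl,64" on the slide */
#define LANES 4  /* assumed SIMD width, chosen only for illustration */

/* Scalar reference: one element per loop iteration. */
void vec_scalar(const int *a, const int *b, int *c)
{
    for (size_t i = 0; i < N; i++)
        c[i] = a[i] + 5 * b[i];
}

/* Strip-mined version: each outer iteration handles LANES adjacent
 * elements, which is the work one SIMD instruction would perform. */
void vec_chunked(const int *a, const int *b, int *c)
{
    for (size_t i = 0; i < N; i += LANES)
        for (size_t lane = 0; lane < LANES; lane++)
            c[i + lane] = a[i + lane] + 5 * b[i + lane];
}
```

Both functions compute the same result. The second only exposes the grouping that a vectorizing compiler or hand-written SIMD code would exploit.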

Architecture methods

Powerful Instructions (1)

• Sub-word parallelism
– SIMD on restricted scale:
– Used for Multi-media instructions
– Motivation: use a powerful 64-bit ALU as 4 x 16-bit ALUs
• Examples
– MMX, SUN-VIS, HP MAX-2, AMD-K7/Athlon 3DNow!, Trimedia II
– Example: $\sum_{i=1}^{4} |a_i - b_i|$
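
A sum of absolute differences over four 16-bit values packed into one 64-bit word can be sketched in portable C as follows. The helper names `pack4x16` and `sad4x16` are hypothetical. Real multimedia instruction sets perform the four lane operations in a single instruction, whereas this version walks the lanes in a loop purely to show the data layout.

```c
#include <stdint.h>

/* Pack four 16-bit values into one 64-bit word, lane 0 in the low bits. */
uint64_t pack4x16(const uint16_t v[4])
{
    return (uint64_t)v[0] | ((uint64_t)v[1] << 16) |
           ((uint64_t)v[2] << 32) | ((uint64_t)v[3] << 48);
}

/* Sum of absolute differences over the four 16-bit lanes of a and b.
 * Sub-word hardware would do the lane-wise work in one instruction;
 * this portable version extracts and processes each lane in turn. */
uint32_t sad4x16(uint64_t a, uint64_t b)
{
    uint32_t sum = 0;
    for (int lane = 0; lane < 4; lane++) {
        uint16_t x = (uint16_t)(a >> (16 * lane));
        uint16_t y = (uint16_t)(b >> (16 * lane));
        sum += (x > y) ? (uint32_t)(x - y) : (uint32_t)(y - x);
    }
    return sum;
}
```

The accumulator is wider than a lane because four 16-bit differences can add up to more than 16 bits.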

MPSoC Issues

• Homogeneous vs Heterogeneous
• Shared memory vs local memory
• Topology
• Communication (Bus vs. Network)
• Granularity (many small vs few large)
• Mapping
– Automatic vs manual parallelization
– TLP vs DLP
– Parallel vs Pipelined

Multi-core

Cell

What can it do?

Cell/B.E. - the history

• Sony/Toshiba/IBM consortium
– Austin, TX – March 2001
– Initial investment: $400,000,000
• Official name: STI Cell Broadband Engine
– Also goes by Cell BE, STI Cell, Cell
• In production for:
– PlayStation 3 from Sony
– Mercury’s blades

Cell blade

Cell/B.E. – the architecture

• 1 x PPE 64-bit PowerPC
– L1: 32 KB I$ + 32 KB D$
– L2: 512 KB
• 8 x SPE cores:
– Local store: 256 KB
– 128 x 128 bit vector registers
• Hybrid memory model:
– PPE: Rd/Wr
– SPEs: Asynchronous DMA
• EIB: 205 GB/s sustained aggregate bandwidth
• Processor-to-memory bandwidth: 25.6 GB/s
• Processor-to-processor: 20 GB/s in each direction
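
To get a feel for these figures, one can estimate how long it would take to fill one SPE's local store from main memory. The estimate assumes that a single transfer achieves the full 25.6 GB/s, that 1 KB is 1024 bytes and that 1 GB is $10^9$ bytes. None of these assumptions is stated on the slide, so the result is a lower bound at best.

```latex
\[
t_{\text{fill}} = \frac{256 \times 1024\ \text{B}}{25.6 \times 10^{9}\ \text{B/s}}
               = \frac{262\,144}{25.6 \times 10^{9}}\ \text{s}
               \approx 10.24\ \mu\text{s}
\]
```

A transfer of that length is long compared with a single instruction, which suggests why the SPEs fetch data by asynchronous DMA rather than waiting on ordinary loads.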

Cell chip

SPE

SPE

SPE pipeline

Communication

8 parallel transactions

C++ on Cell

1. Send the code of the function to be run on SPE
2. Send address to fetch the data
3. DMA data in LS from the main memory
4. Run the code on the SPE
5. DMA data out of LS to the main memory
6. Signal the PPE that the SPE has finished the function
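
The six steps can be mimicked on an ordinary machine to show the control flow. The sketch below is a simulation under stated assumptions and is not Cell SDK code: `struct fake_spe`, `offload`, `double_all` and `LS_WORDS` are all hypothetical names, and `memcpy` stands in for the DMA transfers, which on real hardware are asynchronous.

```c
#include <stddef.h>
#include <string.h>

#define LS_WORDS 1024  /* hypothetical local-store capacity, in ints */

typedef void (*kernel_fn)(int *ls, size_t n);

/* Hypothetical stand-in for one SPE: a private local store,
 * the function it has been given, and a completion flag. */
struct fake_spe {
    int       local_store[LS_WORDS];
    kernel_fn code;
    int       done;
};

/* Example kernel: doubles every element held in the local store. */
void double_all(int *ls, size_t n)
{
    for (size_t i = 0; i < n; i++)
        ls[i] *= 2;
}

/* Walks through the six steps for one buffer. Returns 0 on success,
 * or -1 if the buffer does not fit in the simulated local store. */
int offload(struct fake_spe *spe, kernel_fn fn, int *main_mem, size_t n)
{
    if (n > LS_WORDS)
        return -1;
    spe->done = 0;
    spe->code = fn;                                  /* 1: send the code    */
    int *src = main_mem;                             /* 2: send the address */
    memcpy(spe->local_store, src, n * sizeof *src);  /* 3: "DMA" data in    */
    spe->code(spe->local_store, n);                  /* 4: run on the SPE   */
    memcpy(src, spe->local_store, n * sizeof *src);  /* 5: "DMA" data out   */
    spe->done = 1;                                   /* 6: signal the PPE   */
    return 0;
}
```

A caller playing the role of the PPE would invoke `offload(&spe, double_all, data, n)` and then inspect `spe.done`. On a real Cell the PPE and SPE run concurrently, so steps 3 to 5 would overlap with other work instead of running back to back.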

Conclusions

• Multi-processors inevitable
• Huge performance increase, but…
• Hell to program
– Got to be an architecture expert
– Portability?