Japanese 2nd generation Dynamically Reconfigurable Processors. ERSA2009 Invited Speech. Hideharu Amano, Keio Univ.

Page 1: Japanese 2 nd  generation Dynamically Reconfigurable Processors

Japanese 2nd generation   Dynamically Reconfigurable Processors

ERSA2009

Invited Speech

Hideharu Amano

Keio Univ.

Page 2: Japanese 2 nd  generation Dynamically Reconfigurable Processors

Commercial Products using Dynamically Reconfigurable Processors

• SONY PMW EX-1/3 Professional camcorder: NEC electronics' STP engine
• Panasonic's Professional camcorder: DFabric
• Multifunction Printers: IP Flex's DAPDNA-2
• SONY PSP: VME (Virtual Mobile Engine)

Page 3: Japanese 2 nd  generation Dynamically Reconfigurable Processors

Short history of Dynamically Reconfigurable Processors

[Timeline figure, 1990 to 2005, starting from two roots: "FPGA with Dynamic Reconfiguration" and "Processor with Reconfigurable Instructions". Period labels: "The 1st Generation", "The 2nd Generation", "A lot of commercial systems".]

Systems shown: MPLD (Fujitsu), WASMII (Keio), Time Multiplexed FPGA (Xilinx), DRL (NEC), DISC (Brigham Young Univ.), GARP (UCB), CHIMAERA (Northwestern Univ.), PipeRench (CMU), Xpp (PACT), CS2112 (Chameleon), DRP (NEC elec.), DAPDNA/2 (IPFlex), DFabric (Elixent), Kilocore (Rapport), X-bridge (NEC elec.), DAPDNA/IMX (IPFlex), S-5 and S-6 (Stretch), FE-GA (Hitachi).

Page 4: Japanese 2 nd  generation Dynamically Reconfigurable Processors

Product            Vendor            Context          Data (bits)  PE
D-Fabrix           Panasonic         Deliver          4            Homo
Xpp                PACT              Deliver          24           Homo
S5/S6 engine       Stretch           Deliver          4/8          Hetero
CS2112             Chameleon         Multi-C (8)      16/32        Homo
DAPDNA-2           IPFlex            Multi-C (4)      32           Hetero
DRP-1              NEC electronics   Multi-C (16)     8            Homo
STP-engine         NEC electronics   Multi-C (32)     8            Homo
Kilocore           Rapport           Multi-C          8            Homo
ADRES              IMEC              Multi-C (32)     16           Homo
FE-GA              Hitachi           Multi-C          16           Hetero
For Car-tuners     SANYO             Multi-C (4)      24           Homo
FlexSword (SAKE)   Toshiba           Multi-C (4/16)   16           Homo
Cluster            Fujitsu           Multi-C          16           Hetero

Context: "Deliver" means configuration delivery from on-chip memory; "Multi-C (n)" means multicontext with n contexts. PE: homogeneous or heterogeneous.

Most Japanese semiconductor companies have their own projects!

Page 5: Japanese 2 nd  generation Dynamically Reconfigurable Processors

Outline

• Why Dynamically Reconfigurable Processors?
– A solution to recent SoC problems.

• What is a Dynamically Reconfigurable Processor?
– Coarse Grain Structure
– Dynamic Reconfiguration
– C-level programming

• What are the main advantages/limitations?
– Comparison with other architectures
– Low power consumption

• The 2nd generation examples

Page 6: Japanese 2 nd  generation Dynamically Reconfigurable Processors

Why Dynamically Reconfigurable Processors?

A solution to problems on SoC (System-on-a-Chip)

[Figure: SoC block diagram with CPU, Memory, Application Specific Hardware, and I/O.]

SoC (System-on-a-Chip): the brain in various IT products, e.g. cellular phones, network controllers, mobile terminals, video cameras, car electronics…

Problem!
• The performance depends on the Application Specific Hardware.
• Various new techniques keep coming up.
• The design/mask cost of leading-edge semiconductor processes has increased greatly.

A powerful but flexible, low power/cost off-load engine is required!

Page 7: Japanese 2 nd  generation Dynamically Reconfigurable Processors

How about using common FPGAs?

[Figure: SoC block diagram with CPU, Memory, Application Specific Hardware, I/O, and a Common FPGA.]

Common FPGAs are flexible. Xilinx's FPGAs with PowerPC (e.g. Virtex-4/FX) are widely used. Of course, Altera's are also popular.

But
• A System on a Programmable Device tends to be too expensive and too power consuming for most consumer products.
• These drawbacks come from the static, fine-grain architecture.

Page 8: Japanese 2 nd  generation Dynamically Reconfigurable Processors

What is a Dynamically Reconfigurable Processor?

[Figure: SoC block diagram with CPU, Memory, Application Specific Hardware, I/O, and a Dynamically Reconfigurable Processor.]

Flexible Accelerators in SoCs
• Coarse Grain Structure → High performance
• Dynamic reconfiguration → High area efficiency
• C-level programming → Easy to design

Page 10: Japanese 2 nd  generation Dynamically Reconfigurable Processors

1. Coarse Grain Structure

An example of PE array

[Figure: PE array of MuCCRA-1 by Keio Univ. (ASSCC2007). A grid of PEs, each paired with an SE, together with MEM modules (also paired with SEs) and MULT modules.]

Island style like FPGAs. Various types of array structures are used.

Page 11: Japanese 2 nd  generation Dynamically Reconfigurable Processors

An example of PE (Processing Element)

[Figure: PE of MuCCRA-1. Blocks: SMU, ALU, and RFile, fed through input selectors (smuasel, smubsel, aluasel, alubsel, alucsel, rfsel, rfcsel) and controlled by configuration fields such as cnst, aluconf, rfwe, rfaddra, and rfaddrb. Each connection carries 24-bit data and a 2-bit carry.]

ALU: Add/Sub/Mult/CMP
SMU: Shift/Mask/Constant
RFile: Register Files
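The PE described above can be approximated by a small behavioural model. This is only a minimal sketch: the 24-bit width and the operation classes come from the slide, while the configuration keys (`alu_op`, `smu_op`, `cnst`, `rf_write`), the register count, the equality semantics of `cmp`, and the omission of the 2-bit carry are assumptions made for illustration.

```python
MASK24 = (1 << 24) - 1  # 24-bit data path, as stated on the slide


class PE:
    """Toy model of a processing element: ALU, SMU and a register file."""

    def __init__(self, num_regs=8):  # register count is an assumption
        self.rfile = [0] * num_regs

    def alu(self, op, a, b):
        # Arithmetic results wrap to 24 bits; the 2-bit carry is not modelled.
        if op == "add":
            return (a + b) & MASK24
        if op == "sub":
            return (a - b) & MASK24
        if op == "mult":
            return (a * b) & MASK24
        if op == "cmp":  # assumed to be an equality test returning 0 or 1
            return int(a == b)
        raise ValueError(op)

    def smu(self, op, a, b):
        if op == "shl":
            return (a << b) & MASK24
        if op == "shr":
            return (a & MASK24) >> b
        if op == "mask":
            return a & b & MASK24
        if op == "const":
            return b & MASK24
        raise ValueError(op)

    def step(self, config, ina, inb):
        """Apply one configuration word and return (alu_out, smu_out)."""
        alu_out = self.alu(config["alu_op"], ina, inb)
        smu_out = self.smu(config["smu_op"], ina, config.get("cnst", 0))
        if "rf_write" in config:
            self.rfile[config["rf_write"]] = alu_out
        return alu_out, smu_out


pe = PE()
config = {"alu_op": "add", "smu_op": "shl", "cnst": 4, "rf_write": 0}
result = pe.step(config, 0xFFFFFF, 2)
```

Changing `config` changes what the same hardware computes, which is the property the following slides build on.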

Page 12: Japanese 2 nd  generation Dynamically Reconfigurable Processors

2. Dynamic Reconfiguration

• The operations of PEs and interconnections are defined by configuration data stored in the configuration memory, as in FPGAs.

• Changing configuration data dynamically → The data path for various applications can be switched quickly.

• How are configuration data changed?
– High-speed delivery from the central configuration memory.
– Multicontext dynamic reconfiguration → One-clock dynamic reconfiguration

Page 13: Japanese 2 nd  generation Dynamically Reconfigurable Processors

Quick delivery of instructions/configuration from on-chip memory

[Figure: On-Chip Memory delivering configuration data to PE/SE.]

• Delivery takes tens of microseconds
• PACT Xpp
• Panasonic (Elixent's) DFabric

Dynamic reconfiguration is done mainly for task switching.

Page 14: Japanese 2 nd  generation Dynamically Reconfigurable Processors

Multicontext Function

[Figure: SRAM slots 1, 2, …, n hold contexts; a multiplexer selects one of them for each PE/SE. Input data passes through the PE/SEs to output data.]

A number of configuration memory slots are provided.

They can be switched in one clock cycle

→ The hardware structure is changed in one clock cycle

→ Hardware context switching

Page 15: Japanese 2 nd  generation Dynamically Reconfigurable Processors

Practical implementation of multicontext structure

[Figure labels: Context Pointer, Context Memory, PE or Switching Element.]
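The context-pointer scheme can be sketched as follows. This is a minimal sketch, not a description of any particular chip: the class name `MulticontextPE`, the use of Python functions as stand-ins for configuration words, and the example schedule are all assumptions.

```python
import operator


class MulticontextPE:
    """A PE whose function is selected each cycle by a context pointer."""

    def __init__(self, contexts):
        # One configuration per context-memory slot (hypothetical encoding:
        # a Python callable stands in for the configuration word).
        self.context_memory = list(contexts)

    def execute(self, context_pointer, a, b):
        # Selecting a slot is a memory read, so it can happen every clock.
        op = self.context_memory[context_pointer]
        return op(a, b)


pe = MulticontextPE([operator.add, operator.sub, operator.mul])

schedule = [0, 2, 1]  # context pointer value for each clock cycle
value = 3
trace = []
for pointer in schedule:
    value = pe.execute(pointer, value, 2)
    trace.append(value)
```

No configuration data is moved at switch time; only the pointer changes, which is why a switch fits in one clock cycle.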

Page 16: Japanese 2 nd  generation Dynamically Reconfigurable Processors

3. C-level programming

• The programming environment is a mixture of a traditional C compiler and an FPGA design tool.

• The C code is divided into data flow and control.

• The assignment of contexts, PEs, and memory modules can be done automatically.

• Place-and-route sometimes takes a long time, as in FPGA design.

• Programming is easy only if the data to be processed can be mapped onto the memory modules.

Page 17: Japanese 2 nd  generation Dynamically Reconfigurable Processors

Example: DRP Compiler (NEC)

• Compiling C source code into DRP object code

Behavioral Description Language (BDL)

• High level synthesis: generates finite state machines (FSMs) and associated datapath planes
– The ASIC behavioral design tool Cyber is modified and used.

• Mapper: maps FSMs and datapath planes to the STC and PEs, respectively

• Place & Router: physically places the PEs and memories and routes the interconnections between them

[Flow: C Source Code → High Level Synthesis → FSM / Datapath → Technology Mapper → Place & Router → Code Generation → Object Code]
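To illustrate what "FSM plus datapath planes" means, the sketch below hand-splits a small loop (a sum of squares) into a state machine and one datapath function per state. It is not output of the DRP compiler; the state names, the register dictionary, and the function `run` are hypothetical and only mimic the division into control and data flow.

```python
def run(data):
    """Sum of squares, written as an FSM that selects a datapath plane."""

    # Datapath planes: one pure function per context, acting on registers.
    def plane_init(regs):
        return {**regs, "i": 0, "acc": 0}

    def plane_mac(regs):
        x = regs["mem"][regs["i"]]
        return {**regs, "acc": regs["acc"] + x * x, "i": regs["i"] + 1}

    planes = {"INIT": plane_init, "LOOP": plane_mac}

    # Control: the FSM picks the next state (and so the next context)
    # from flags computed by the datapath.
    def next_state(state, regs):
        if state == "INIT":
            return "LOOP" if regs["mem"] else "DONE"
        if state == "LOOP":
            return "LOOP" if regs["i"] < len(regs["mem"]) else "DONE"
        return "DONE"

    regs = {"mem": list(data)}
    state = "INIT"
    cycles = 0
    while state != "DONE":
        regs = planes[state](regs)
        state = next_state(state, regs)
        cycles += 1
    return regs["acc"], cycles


total, cycles = run([1, 2, 3])
```

In the flow above, the role of `next_state` would fall to the state transition controller and each plane would be placed and routed onto PEs.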

Page 19: Japanese 2 nd  generation Dynamically Reconfigurable Processors

Dynamically Reconfigurable Processors vs. other architectures

vs. Multi-core/Many-core architectures
– No instruction fetch/cache mechanism
– Less flexible but much smaller area → 16 PEs in 1.5 mm × 1.5 mm at 90 nm (MuCCRA-2)

vs. SIMD (Single Instruction Stream, Multiple Data Streams)
– The operations and interconnections can be customized for each PE and SE. → Efficient for complicated algorithms.
– The number of instructions/contexts is small

vs. VLIW (Very Long Instruction Word)
– A larger degree of parallelism can be utilized. → Higher performance can be obtained.
– The number of instructions/contexts is small

Page 20: Japanese 2 nd  generation Dynamically Reconfigurable Processors

MuCCRA-2 Floor Plan

• ASPLA's 90 nm process
• 2.5 mm × 2.5 mm (core: 1.5 mm × 1.5 mm)

The total PE array is smaller than one PE of recent multi/many-core processors.

Page 22: Japanese 2 nd  generation Dynamically Reconfigurable Processors

[Figure: Granularity vs. Num. of Cores vs. Num. of HW-contexts. Axes: granularity of core (4 bit, 8 bit, 16 bit, 32 bit), number of cores (10, 100, 1000), number of HW contexts (1, 3, 8, 16, 32, Many). Plotted: Common Processor, VLIW, Multi-Core processor, DAPDNA-2, DRP, DRL, CS2112, Xbridge, FE-GA, DFabric, Xpp, FPGA, and FPGA extension, with a region labeled "Dynamically Reconfigurable Processors".]

Page 23: Japanese 2 nd  generation Dynamically Reconfigurable Processors

Main Advantage: Low power consumption

Why low power?

1. No redundant hardware
– There are no instruction fetch mechanisms, caches, TLBs, etc. → Of course, it cannot be a general purpose engine, but it is enough for an accelerator.
– A bare datapath works only for computation.

2. Parallel execution with a number of PEs
– A much lower clock frequency can be used to achieve the same performance as other architectures.
– The main problem is leakage power, but it can be suppressed by power gating techniques.

10X more energy efficient than DSPs. 5-50X compared with FPGAs. Sometimes similar to hardwired logic.
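A back-of-the-envelope view of point 2, using the standard CMOS dynamic power model rather than anything stated on the slide. The symbols are assumptions introduced here: activity factor α, switched capacitance C, supply voltage V, clock frequency f, number of PEs N, and voltage scaling factor k.

```latex
P_{\mathrm{dyn}} = \alpha C V^{2} f
\qquad
P_{\mathrm{par}} = N \cdot \alpha C \left(\frac{V}{k}\right)^{2} \frac{f}{N}
                 = \frac{P_{\mathrm{dyn}}}{k^{2}}
```

Running N PEs at f/N keeps throughput constant only for perfectly parallel work, and at an unchanged voltage (k = 1) the dynamic power is unchanged as well. The saving appears when the lower frequency permits a lower supply voltage V/k. Leakage, which the slide names as the main remaining problem, is not modeled here.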

Page 24: Japanese 2 nd  generation Dynamically Reconfigurable Processors

[Figure: bar chart of energy consumption (nJ, axis from 0 to 7000) for DCT, Viterbi, and SHA-1 on MuCCRA, FPGA, and ASIC.]

The comparison uses 0.18 µm implementations.

Page 25: Japanese 2 nd  generation Dynamically Reconfigurable Processors

The main limitations as an accelerator in SoCs

• The data must be stored in the memory modules placed around the PE array.
– If the data exceeds the memory capacity, it is hard to handle.

• If more contexts are required than the context memory can hold, the operational speed is much degraded.
– The virtual hardware mechanism is provided, but it has certain limitations.

• The performance is not much improved for problems without parallelism.
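The second limitation can be made concrete with a toy cost model. This is a minimal sketch under stated assumptions: the function `run_cycles`, the LRU replacement policy, and the 100-cycle reload cost are hypothetical and are not taken from any of the processors discussed.

```python
def run_cycles(context_sequence, slots, reload_cycles):
    """Count cycles when only `slots` contexts fit on chip (LRU assumed)."""
    resident = []  # least recently used context first
    cycles = 0
    for ctx in context_sequence:
        if ctx in resident:
            resident.remove(ctx)
        else:
            cycles += reload_cycles  # fetch from the central memory
            if len(resident) == slots:
                resident.pop(0)  # evict the least recently used context
        resident.append(ctx)
        cycles += 1  # one cycle of useful execution
    return cycles


loop = [0, 1, 2, 3] * 10  # a task that cycles through four contexts
fits = run_cycles(loop, slots=4, reload_cycles=100)
thrashes = run_cycles(loop, slots=3, reload_cycles=100)
```

With four slots only the four initial loads are paid; with three slots every switch in this access pattern misses, so the run becomes roughly nine times longer under these assumed costs.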

Page 27: Japanese 2 nd  generation Dynamically Reconfigurable Processors

Dynamically Reconfigurable Processors: The 2nd generation

• Customized for a specific target application area
– SANYO car tuner → Tuner
– Fujitsu → Wireless communication
– Toshiba SAKE → Multi-media
– NEC electronics X-bridge → Multi-media

• Multi-core structure with small PE arrays rather than a big array
– Cooperation with various types of cores

• Integrated design environment

• Low power design → The main advantage!

Page 28: Japanese 2 nd  generation Dynamically Reconfigurable Processors

X-bridge: NEC electronics (2008)

[Block diagram labels: MIPS CPU with I-C and D-C, JTAG, UART (×2), CSIG, PIO, INTC, General Port 8b × 4, Work RAM (1 kB), PCIexp HB/EP (1-lane) (×2), Periph I/F, 10/100 Ether MAC, PCI Host/Target, DDR2 SDRAM CTR, several DMA and SPL blocks, Nconnect, and the STP Engine, attached to a 64-bit on-chip bus (266 MHz) and a 64-bit memory switch (266 MHz).]

STP Engine: Dynamically Reconfigurable Core, 256 PEs (8 bit), 32 contexts
• Provides the virtual hardware mechanism
• The DMA controller hides the communication overhead

From the invited talk at Design Gaia 2008

Page 29: Japanese 2 nd  generation Dynamically Reconfigurable Processors

Mixture of SIMD and DRP units: Toshiba's FlexSword

[Block diagram ("Our Architecture") labels: Host Processor, Host I/F, System Memory, Write, Control, Code Buffer (Code RAM), I/O Buffer (Data RAM), Inter-Unit Buffer (Data Registers), Formatter0, Formatter1, AUX0, AUX1, SIMD Units, code and data paths.]

• Dynamically Reconfigurable Units (independently controlled)
• Optimized for stream processing

From FPT2007 Tutorial session

Page 30: Japanese 2 nd  generation Dynamically Reconfigurable Processors

The Architecture (Formatter)

[Block diagram labels: data A, data B, Cfg Mem, Shuffle, 16-bit ALU × 8 PE, PE w/o Shuffle, Xbar In, Xbar Out, Cfg Controller, Code Mem, valid, ID; bus widths 128, 128, 19, 64.]

Simple hardware
• Pipeline registers only
• No intra-PE data transfer
• PE: 4 cfgs, Xbar: 16 cfgs
• ALU, shift & absolute ops only

Xbar In: Formatter0 only
Xbar Out: Formatter1 only

Suitable for butterfly operations

From FPT2007 Tutorial session

Page 31: Japanese 2 nd  generation Dynamically Reconfigurable Processors

SANYO's Car tuner DRP

[Figure: an ALU array of 4 rows × 6 ALUs with In, Out, and Feedback paths, plus a main memory, a sequencer, and a command memory.]

Page 32: Japanese 2 nd  generation Dynamically Reconfigurable Processors

[Figure: the 4 × 6 ALU array with labels L1 to L4, and a schedule in which steps of four threads (Th1-1, Th1-2, … for thread 1, and likewise Th2, Th3, Th4) are interleaved over L1 to L4.]

Pipelined execution of 4 threads

Page 33: Japanese 2 nd  generation Dynamically Reconfigurable Processors

Fine carrier frequency offset estimation/correction

[Figure: three mappings onto Cluster0 to Cluster6.]

a) Fine carrier frequency offset estimation for LT1
b) Fine carrier frequency offset estimation for LT2
c) Fine carrier frequency offset correction for SIGNAL and DATA

[Figure labels: I and Q inputs, LT1, LT2, self-correlation, phase offset calculation (DIV, ATAN), Reg in Cluster0, correction offset calculation in phase, polar, complex multiply (Cluster2), data out control & clip (Cluster3), Cluster6 (through), I and Q outputs to FFT.]

Page 34: Japanese 2 nd  generation Dynamically Reconfigurable Processors

Hitachi's FE-GA

[Block diagram: a Computational Cell Array (24 ALU cells and 8 MLT cells), a Crossbar Network, 10 Load/Store Cells (LS), Local Memory (10 MEM blocks), a Configuration Manager, a Sequence Manager, a Bus Interface, an I/O port, and an Interrupt/DMA request line.]

Page 35: Japanese 2 nd  generation Dynamically Reconfigurable Processors

Heterogeneous Multi-Core using FE-GA

[Block diagram: CPU0 to CPU3 (SH-4) and DRP0 to DRP3 (FE-GA). The expanded CPU and DRP blocks show LPM, FVR, LDM, DSM, DTU, and a Network Interface. An On-Chip CSM has its own Network Interface.]

The codes are generated by a parallelizing compiler and standard APIs.

Page 36: Japanese 2 nd  generation Dynamically Reconfigurable Processors

Summary

• The 2nd generation Dynamically Reconfigurable Processors are going to be embedded into consumer electronics products.

• The main advantage is low power consumption.

• The main limitation is data memory → limited to a kind of stream computing.

• Especially active in Japan
– Major Japanese consumer electronics companies are all trying to develop such systems.

Page 37: Japanese 2 nd  generation Dynamically Reconfigurable Processors

Yes. Japanese Culture Loves Dynamic Reconfiguration!

Thank you! A part of our own project will be presented in the later sessions.

Page 38: Japanese 2 nd  generation Dynamically Reconfigurable Processors
Page 39: Japanese 2 nd  generation Dynamically Reconfigurable Processors

PE architecture

• Simple structure
• Executes up to 4 instructions in parallel

[Block diagram labels: From/To upper, lower, left, and right Cell; Input Switch; Delay Adjustment; ALU (Arithmetic-1, Logical, Flow Control); SFT (Shift); THR (Data Control); Transfer Register (TREG); Output Switch; Configuration Register (×4); control bus; "1-bit x 48-bit x 4 w/ valid bit".]

Page 40: Japanese 2 nd  generation Dynamically Reconfigurable Processors

DRP Programming

[Figure: data input and data output through a stack of contexts.]

1. Context switching
2. Parallel processing in a context
3. Sequential execution in a context

3-dimensional flexibility. The functional optimizer works efficiently. Efficient C-level programming.

The context is controlled by a state machine.

Page 41: Japanese 2 nd  generation Dynamically Reconfigurable Processors

Time multiplexed execution

[Figure: Target hardware mapped onto smaller Real hardware.]

• A single task can be executed with multiple contexts.
• The area becomes 1/n, but the performance also becomes 1/n.

Page 42: Japanese 2 nd  generation Dynamically Reconfigurable Processors

Time multiplexed execution

[Figure: Target Hardware and Real Hardware.]

Most of the hardware works only part of the time.

→ Area efficiency is improved!
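The two time-multiplexing slides can be summarized with an idealized model. This is a minimal sketch: the function `time_multiplex` and the parameter `active_fraction` (the share of the target hardware that is busy in any one cycle) are assumptions, and dependencies, context-switch overhead, and memory limits are ignored.

```python
from fractions import Fraction


def time_multiplex(n, active_fraction=Fraction(1)):
    """Return (area, slowdown, efficiency) relative to the target hardware.

    n               -- number of contexts the target hardware is folded into
    active_fraction -- share of the target hardware busy per cycle (assumed)
    """
    area = Fraction(1, n)
    # Work per target cycle is active_fraction; real capacity per cycle is 1/n.
    slowdown = max(Fraction(1), n * active_fraction)
    efficiency = 1 / (area * slowdown)  # throughput per unit area
    return area, slowdown, efficiency


fully_busy = time_multiplex(4)
mostly_idle = time_multiplex(4, Fraction(1, 4))
```

When all of the target hardware is busy, folding it into four contexts costs a factor of four in time and area efficiency is unchanged. When only a quarter is busy at a time, the folded version keeps the same speed at a quarter of the area.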

Page 43: Japanese 2 nd  generation Dynamically Reconfigurable Processors

A wide research field of reconfigurable architectures

• Two major extremes of multiple-core architectures:
– FPGAs
   • Fine-grained multiple-core architectures with a huge number of cores
   • Basically static: 1 hardware context
– Many-core processors
   • Very coarse-grained multiple-core architectures
   • Fully programmable: infinite number of hardware contexts

Between them: a WIDE RESEARCH FIELD

Page 44: Japanese 2 nd  generation Dynamically Reconfigurable Processors

Our environment for architectural exploration

MuCCRA array design environment [FPL07]

[Flow diagram labels: Architecture parameters, Template Library, DRPA Verilog-HDL Generator, Verilog HDL description, CMOS standard cell library, Logic Synthesis (Synopsys Design Compiler), Netlist, Placement and Routing (Synopsys Astro), Netlist, GDSII, Timing Analysis (Synopsys PrimeTime), Application Programs, Retargetable Compiler Black Diamond, Test Bench and Test Vector, RTL/Net/Chip simulation (Cadence NC-Verilog).]

Page 45: Japanese 2 nd  generation Dynamically Reconfigurable Processors

Extremely Low Power Design

• Now the major benefit of Dynamically Reconfigurable Processors
– 1/8 to 1/10 compared with a DSP [ASSCC07]
– The main reason why SONY uses VME (Virtual Mobile Engine) in the PSP (PlayStation Portable) and X-bridge in professional video systems.

• Applying traditional techniques/Reducing the overhead of context switching [FPL08]
– Operand isolation is quite effective

• Context oriented voltage control [Schweizer:FPT07]
• Fine-grained power gating [FPT08Poster]
• Dual Vth

Page 46: Japanese 2 nd  generation Dynamically Reconfigurable Processors

Network on Chips for reconfigurable systems

• For inner-core connection
– Island style/direct interconnection
– A new style of interconnection?

• For inter-core connection
– A network similar to those of many-core systems may be used?

• Three dimensional/Wireless
– A new possibility

Page 47: Japanese 2 nd  generation Dynamically Reconfigurable Processors

3 Dimensional wireless connected dies: MuCCRA-Cube

• A plane corresponds to an array like MuCCRA-2 (4 × 4 PEs)
• 4 planes are connected with a very high speed inductive wireless interconnection (3 Gbit/sec per channel)
• Planes are connected in the flipped direction
• 16 channels are provided in the 3-D direction

[Figure labels: Direction of planes, Channels]
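The channel figures above imply an aggregate vertical bandwidth, worked out below. This is only arithmetic on the two numbers from the slide, assuming all 16 channels transfer simultaneously at the full 3 Gbit/sec; the slide does not say whether that holds, so the result is an upper bound under that assumption.

```python
CHANNELS = 16          # channels in the 3-D direction (from the slide)
GBIT_PER_CHANNEL = 3   # Gbit/sec per channel (from the slide)

# Assumption: every channel runs at full rate at the same time.
aggregate_gbit_per_sec = CHANNELS * GBIT_PER_CHANNEL
aggregate_gbyte_per_sec = aggregate_gbit_per_sec / 8
```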

Page 48: Japanese 2 nd  generation Dynamically Reconfigurable Processors

MuCCRA-Cube Prototype

• STARC/ASPLA 90 nm
• 2.5 mm × 5 mm die
• Verilog-HDL is used for design
• Synthesis: Synopsys Design Compiler 2006.06-SP2
• Place & Route: Synopsys Astro 2007.03-SP3
• Simulation: Cadence Verilog-XL 5.7

[Figure labels: DATA MEM, TCC, PE/SE, CSC, Transceiver (Data), Transceiver (CLK)]

Page 49: Japanese 2 nd  generation Dynamically Reconfigurable Processors

Summary

• There is a wide field for architectural exploration between FPGAs and many-core processors

• Keywords
– Application Configurable
– Low power techniques
– Interconnection networks including three dimensional/wireless
– Integrated design environment

Page 50: Japanese 2 nd  generation Dynamically Reconfigurable Processors

IMEC ADRES

[Block diagram labels: Reconfigurable Array View, VLIW View, an array of FUs (most paired with an RF), a shared RF, Instruction Fetch, Instruction Dispatch, Instruction Decode, Data Cache.]

Page 51: Japanese 2 nd  generation Dynamically Reconfigurable Processors

Rapport Kilocore

[Block diagram: a Fabric of 16 PEs × 16 PEs organized as Stripes, each a row of PEs with an Interconnect, plus a Configuration Controller, an Input Controller, and an Output Controller. Bus widths shown: 128 bits, 128 bits, 672 bits, 32 bits.]

Page 52: Japanese 2 nd  generation Dynamically Reconfigurable Processors

Stretch S5 engine

[Block diagram labels: MMU, Inst Cache, Data Cache, Inst Unit, Load/Store Unit, FP Unit (FR, FPU), Integer Unit (AR, ALU), Extension Unit (WR, ISEF).]

Page 53: Japanese 2 nd  generation Dynamically Reconfigurable Processors

Screen Shot of Context Menu

Implementation

Page 54: Japanese 2 nd  generation Dynamically Reconfigurable Processors

Screen Shot of Code Menu

User Function

Library Function

Implementation

Page 55: Japanese 2 nd  generation Dynamically Reconfigurable Processors

Screen Shot of Pointer

Input

Output

Implementation

Page 56: Japanese 2 nd  generation Dynamically Reconfigurable Processors

Screen Shot of MuCCRA-1

Memory Modules

Multiply Modules

Switching Element

PE

Implementation

Page 57: Japanese 2 nd  generation Dynamically Reconfigurable Processors

Screen Shot of MuCCRA-2

PE

Switching Element

Memory Modules

Implementation

Page 58: Japanese 2 nd  generation Dynamically Reconfigurable Processors

DRP Tile structure

[Block diagram: an array of PEs with a State Transition Controller, 8 HMEMs, 16 VMEMs, and 4 VMEM ctrl blocks.]

• HMEM: 1 port, 8 bit × 8K entries
• VMEM: 2 port, 8 bit × 256 entries

Page 59: Japanese 2 nd  generation Dynamically Reconfigurable Processors

Task and context control in MuCCRA [FPL08]

• Context control
– Multicontext switching with a Context Pointer

• Task control
– Multiple tasks, each consisting of multiple contexts, are loaded from the centralized memory
– A Virtual Hardware Mechanism

[Figure labels: CSC (Context Switching Controller), TCC (Task Configuration Controller), Configuration Data Memory with Target Tasks A, B, C, D, Context Memory (entries 0, 1, 2, 3, …, 63), Context Pointer, Configuration Data (Contexts), Control Signals, MuCCRA PE Array, PE (SMU, ALU, RFile).]
