Implementing RISC Multi Core Processor Using HLS Language – BLUESPEC Final Presentation Liam Wigdor Advisor Mony Orbach Shirel Josef Semesterial Winter 2013 Department of Electrical Engineeri Electronics Computers Communications Technion Israel Institute of Technology

Upload: ashley-goodman

Post on 05-Jan-2016

Implementing RISC Multi Core Processor Using HLS Language – BLUESPEC
Final Presentation

Liam Wigdor, Shirel Josef
Advisor: Mony Orbach

Semesterial Winter 2013

Department of Electrical Engineering
Electronics, Computers, Communications
Technion – Israel Institute of Technology

AGENDA

• Introduction

• BlueSpec Development Environment

• Project’s Goals

• Project’s Requirements

• Design Overview

• Design Stage 1 – Instruction Memory

• Design Stage 2 – Data Memory

• Design Stage 3 – MultiCore

• The Scalable Processor

• Benchmarks & Results

• Summary & Conclusion

Introduction

• The future of single core is gloomy

• Multi cores can be used for parallel computing

• Multi cores may be used as specific accelerators as well as general-purpose cores.

Ecclesiastes 4:9-12 - Two are better than one

BlueSpec Development Environment

• High-level language for hardware description

• Rules – describe dynamic behavior
– Atomic; each rule fires at most once per cycle
– Can run concurrently if they do not conflict
– Scheduled automatically by BlueSpec

• Module – the same as an object in an object-oriented language.

• Interface – a module is manipulated through the methods of its interface. An interface can be used by the parent module only!
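
The rule properties listed on this slide can be illustrated with a toy Python model. This is not BlueSpec code, and it is not how the BlueSpec compiler builds its schedule; `Rule`, `step`, and the register names `a` and `b` are hypothetical names made up for the sketch. It assumes that rules earlier in the list are more urgent and that two rules conflict when they write the same register.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List

State = Dict[str, int]

@dataclass
class Rule:
    name: str
    guard: Callable[[State], bool]    # the rule may fire only when this is true
    action: Callable[[State], State]  # returns register updates; reads the old state
    writes: FrozenSet[str]            # registers the action updates

def step(state: State, rules: List[Rule]) -> List[str]:
    """Simulate one clock cycle and return the names of the rules that fired."""
    fired, claimed, updates = [], set(), {}
    for rule in rules:                # list order stands in for urgency
        if rule.guard(state) and not (rule.writes & claimed):
            updates.update(rule.action(state))
            claimed |= rule.writes
            fired.append(rule.name)
    state.update(updates)             # all updates become visible at the end of the cycle
    return fired

rules = [
    Rule("inc_a", lambda s: True, lambda s: {"a": s["a"] + 1}, frozenset({"a"})),
    Rule("inc_b", lambda s: s["b"] < 1, lambda s: {"b": s["b"] + 1}, frozenset({"b"})),
    Rule("clear_a", lambda s: True, lambda s: {"a": 0}, frozenset({"a"})),
]
state = {"a": 0, "b": 0}
first_cycle = step(state, rules)   # inc_a and inc_b fire; clear_a conflicts with inc_a
second_cycle = step(state, rules)  # inc_b's guard is now false, so only inc_a fires
```

In this model `clear_a` never fires because the more urgent `inc_a` always claims register `a` first, which is the same kind of starvation the later multi-core slides run into.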

Project’s Goals

• Main Goal:
– Implement a RISC multi-core processor using BlueSpec.
– Evaluate and analyze the performance of the multi-core design compared to a single core.

• Derived Goals:
– Learn the BlueSpec principles, syntax and working environment.
– Understand and use a single-core RISC processor to implement a multi-core processor.
– Validate the design at the BlueSpec level using simple benchmark programs, and evaluate its performance against a single core.

Project’s Requirements

• Scalable Architecture: The architecture does not depend on the number of cores.

• Shared data memory

[Figure: Core 1 and Core 2 connected to a Shared Data Memory; configurations shown: single core, dual core, quad core]

Baseline Processor – Single Core

The SMIPS BlueSpec code was taken from 046004 - Architecting and Implementing Microprocessors in BlueSpec

• 2-stage pipeline
• Data and instruction memories as sub-modules
• Includes a naïve branch predictor

Design Overview

• To achieve the project’s goals, our design consisted of 3 stages:
– Stage 1 – Instruction memory
– Stage 2 – Data memory
– Stage 3 – Multi core

Stage 1 – Instruction Memory Motivation

• Each core executes different instructions.

• This can’t be achieved with I.Mem as a sub-module of the CPU.

• Solution: Draw the I.Mem out to the same hierarchy level as the CPU module.

[Figure: Core 1 with D Mem and I Mem]

Stage 1 – Instruction Memory Implementation Methods

• Modules use the get/put interface (CPU as client, memory as server).

• The connect_reqs and connect_resps rules use the CPU and I.Mem interfaces to connect the requests and the responses.

[Diagram: inside the test bench, the CPU (f_out and f_in FIFOs; get_request.get, put_response.put) is linked to the I mem (f_in and f_out FIFOs; put_request.put, get_response.get) by the rules connect_reqs and connect_resps]
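
The client/server wiring on this slide can be sketched as a small Python model. It is only an illustration under assumptions, not the project's BlueSpec code: the classes `Cpu` and `IMem`, the `process` method, and the three-instruction program are hypothetical, while the method and rule names mirror the labels in the diagram.

```python
from collections import deque

class Cpu:
    """Client side: produces requests and consumes responses."""
    def __init__(self):
        self.f_out = deque()  # outgoing memory requests
        self.f_in = deque()   # incoming memory responses

    def get_request(self):            # models get_request.get
        return self.f_out.popleft()

    def put_response(self, resp):     # models put_response.put
        self.f_in.append(resp)

class IMem:
    """Server side: consumes requests and produces responses."""
    def __init__(self, program):
        self.program = program
        self.f_in = deque()
        self.f_out = deque()

    def put_request(self, addr):      # models put_request.put
        self.f_in.append(addr)

    def get_response(self):           # models get_response.get
        return self.f_out.popleft()

    def process(self):
        """Hypothetical internal rule: look up the requested address."""
        if self.f_in:
            self.f_out.append(self.program[self.f_in.popleft()])

def connect_reqs(cpu, imem):
    """Move one request from the CPU to the instruction memory."""
    if cpu.f_out:
        imem.put_request(cpu.get_request())

def connect_resps(cpu, imem):
    """Move one response from the instruction memory to the CPU."""
    if imem.f_out:
        cpu.put_response(imem.get_response())

cpu, imem = Cpu(), IMem(program=["addi", "lw", "sw"])
cpu.f_out.append(1)       # the CPU requests the instruction at address 1
connect_reqs(cpu, imem)
imem.process()
connect_resps(cpu, imem)
fetched = cpu.f_in[0]
```

Because the connection logic lives outside both modules, the same two functions can wire any CPU to any memory, which is what makes the later multi-core stage possible.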

Stage 1 – Instruction Memory Latency problem – cycle 1

• Problem: Instruction fetch latency is 5 cycles

• Cycle 1:
– A CPU rule enqueues the PC address, as a memory request, into the CPU’s f_out.

Stage 1 – Instruction Memory Latency problem – cycle 2

• Problem: Instruction fetch latency is 5 cycles

• Cycle 2:
– connect_reqs dequeues the request from the CPU’s f_out and enqueues it into the I.Mem’s f_in FIFO.

Stage 1 – Instruction Memory Latency problem – cycle 3

• Problem: Instruction fetch latency is 5 cycles

• Cycle 3:
– The I.Mem dequeues the request from f_in, processes it and enqueues the response into f_out.

Stage 1 – Instruction Memory Latency problem – cycle 4

• Problem: Instruction fetch latency is 5 cycles

• Cycle 4:
– connect_resps dequeues the response from the I.Mem’s f_out and enqueues it into the CPU’s f_in FIFO.

Stage 1 – Instruction Memory Latency problem – cycle 5

• Problem: Instruction fetch latency is 5 cycles

• Cycle 5:
– A CPU rule dequeues the response from f_in and processes it.

Stage 1 – Instruction Memory Solution – Overview

• Solution: Use bypass FIFOs for f_in and f_out instead of regular FIFOs, allowing enqueue and dequeue in the same cycle. New latency: 1 cycle.

• doFetch executes after the response arrives.
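
The difference between the two FIFO types can be reproduced with a small cycle-counting model in Python. It is a sketch under assumptions rather than the project's code: the `Fifo` class and `fetch_latency` function are hypothetical, a regular FIFO is modeled as making an item visible only in the cycle after it was enqueued, and the five rules are assumed to be evaluated in dataflow order within a cycle.

```python
class Fifo:
    """FIFO whose items remember the cycle in which they were enqueued."""
    def __init__(self, bypass):
        self.bypass = bypass
        self.items = []  # list of (value, cycle_enqueued)

    def enq(self, value, cycle):
        self.items.append((value, cycle))

    def can_deq(self, cycle):
        if not self.items:
            return False
        _, enq_cycle = self.items[0]
        # A bypass FIFO lets an item be dequeued in the cycle it was enqueued.
        return self.bypass or enq_cycle < cycle

    def deq(self):
        return self.items.pop(0)[0]

def fetch_latency(bypass):
    """Return the cycle in which the CPU consumes the fetched instruction."""
    cpu_out, mem_in, mem_out, cpu_in = (Fifo(bypass) for _ in range(4))
    program = {0: "instr@0"}
    requested = False
    cycle = 0
    while True:
        cycle += 1
        if not requested:              # CPU rule: issue the fetch request
            cpu_out.enq(0, cycle)
            requested = True
        if cpu_out.can_deq(cycle):     # connect_reqs
            mem_in.enq(cpu_out.deq(), cycle)
        if mem_in.can_deq(cycle):      # I.Mem processes the request
            mem_out.enq(program[mem_in.deq()], cycle)
        if mem_out.can_deq(cycle):     # connect_resps
            cpu_in.enq(mem_out.deq(), cycle)
        if cpu_in.can_deq(cycle):      # CPU rule: consume the response
            cpu_in.deq()
            return cycle

regular_latency = fetch_latency(bypass=False)
bypass_latency = fetch_latency(bypass=True)
```

With regular FIFOs each of the four hops costs a cycle and the response is consumed in cycle 5; with bypass FIFOs the whole chain completes in the cycle in which the request was issued.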

Stage 2 – Data Memory Motivation

• All cores access the same data memory to achieve parallelism.

• This can’t be achieved with D.Mem as a sub-module of the CPU.

• Solution: Draw the D.Mem out to the same hierarchy level as the CPU module.

[Figure: Core 1 with D Mem and I Mem]

Stage 2 – Data Memory Implementation method

• Modules use the get/put interface (CPU as client, memory as server).

• The dconnect_reqs and dconnect_resps rules use the CPU and D.Mem interfaces to connect the requests and the responses.

Stage 2 – Data Memory Latency Problem

• A rule can fire only once per cycle.

• doExecute both initiates the memory operation and processes the response, and it cannot fire twice in the same cycle. As a result, load latency cannot be 1 cycle even when using bypass FIFOs.

Stage 2 – Data Memory Suggested solution

• Solution:
– Add a memory stage to the pipeline data path, requesting data in the execute stage and receiving it in the memory stage.

• This solution was not implemented, as we focused on creating the multi-core processor.

Stage 3 – Multi-Core processor

• Connect multiple cores to their instruction memories and to the shared data memory.

• A higher-hierarchy module must be created to establish these connections.

[Figure: the single-core design (Core 1, D Mem, I Mem) next to the dual-core design (Core 1 with I Mem 1 and Core 2 with I Mem 2, both connected to a Shared Data Memory)]

Stage 3 – Multi-Core processor Implementation method

• Connections are established using dconnect_reqs and dconnect_resps between each core and the same data memory.

Stage 3 – Multi-Core processor Issue 1 – Scheduling

• Issue 1: D.Mem has only one port. How can memory access be scheduled?

• Solution: BlueSpec automatically schedules the execution of the design’s rules, giving priority to the lower-numbered core.
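
The effect of this scheduling choice can be pictured as a fixed-priority arbiter. The Python function below is a hypothetical illustration of the observable behavior only (the lowest-numbered requesting core gets the single port); it does not describe how the BlueSpec compiler arrives at that order.

```python
def arbitrate(requests):
    """Return the index of the lowest-numbered requesting core, or None.

    `requests[i]` is True when core i wants the single D.Mem port this cycle.
    """
    for core_id, wants_port in enumerate(requests):
        if wants_port:
            return core_id
    return None

both_request = arbitrate([True, True])          # core 0 wins, core 1 must wait
only_core1 = arbitrate([False, True, False])    # core 1 gets the port
nobody = arbitrate([False, False])              # the port stays idle
```

A fixed priority is simple but can starve higher-numbered cores when the lower-numbered ones issue memory requests in every cycle. In BlueSpec itself the urgency of conflicting rules can be stated explicitly with the `descending_urgency` attribute; the slides do not say whether the authors used it.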

Stage 3 – Multi-Core processor Issue 2 – Response Path

• Issue 2:
– Connection rules constantly try to fire.
– We need to ensure that the response goes to the CPU that accessed the memory and not to another core.

Stage 3 – Multi-Core processor Issue 3 – Performance

• Issue 3:
– When simulating the processor, the 2 cores were unable to operate together, resulting in poor performance.

Stage 3 – Multi-Core processor Issue 3 – Debugging

– Using the BlueSpec tools, we observed that dconnect_resps_core2 was blocked by dconnect_resps_core1.
– Therefore, the execute stage of core2 was blocked whenever core1 operated.

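Stage 3 – Multi-Core processor Issues 2, 3 – Solution

– Originally, the get_response interface in D.Mem dequeued the response (f_out.deq).
– Because of f_out.deq, only one core could obtain a response; all other cores were blocked because the D.Mem f_out FIFO was empty.
– The get_response interface was changed to read the response with f_out.first, without dequeuing it.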

Stage 3 – Multi-Core processor Change in D.mem – step 1

– Step 1: sendMessage enqueues the response that was prepared in the previous cycle.

[Diagram: D mem with its f_in and f_out FIFOs and internal rules sendMessage and dMemoryResponse; rule dconnect_reqs drives put_request.put and rule dconnect_resps reads get_response.get]

Stage 3 – Multi-Core processor Change in D.mem – step 2

– Step 2: the connection rules use fifo.first (f_out is not dequeued), and a new request arrives.

Stage 3 – Multi-Core processor Change in D.mem – step 3

– Step 3: dMemoryResponse prepares the new response and dequeues the response that was sent at the beginning of the cycle.
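
The three steps can be put together in a toy Python model of the shared data memory. Everything here is a sketch under assumptions: the class `SharedDMem`, the `prepared` register, and in particular the `core_id` tag carried by each request and response are hypothetical, since the slides only say that every core can receive the response and validate that it owns it. Only loads are modeled.

```python
class SharedDMem:
    def __init__(self, size):
        self.mem = [0] * size
        self.f_in = []        # pending requests: (core_id, addr)
        self.f_out = []       # response visible to all cores during this cycle
        self.prepared = None  # response computed in the previous cycle

    def send_message(self):
        """Step 1: publish the response prepared in the previous cycle."""
        if self.prepared is not None:
            self.f_out.append(self.prepared)
            self.prepared = None

    def first(self):
        """Step 2: let a connection rule look at the response without dequeuing."""
        return self.f_out[0] if self.f_out else None

    def d_memory_response(self):
        """Step 3: retire the published response and prepare the next one."""
        if self.f_out:
            self.f_out.pop(0)
        if self.f_in:
            core_id, addr = self.f_in.pop(0)
            self.prepared = (core_id, self.mem[addr])

def dconnect_resps(core_id, dmem):
    """Return the data only if the visible response belongs to this core."""
    resp = dmem.first()
    if resp is not None and resp[0] == core_id:
        return resp[1]
    return None

dmem = SharedDMem(size=8)
dmem.mem[3] = 42
dmem.f_in.append((1, 3))   # core 1 loads address 3

# Cycle A: nothing to publish yet; the request becomes a prepared response.
dmem.send_message()
dmem.d_memory_response()

# Cycle B: the response is visible to both cores, but only core 1 accepts it.
dmem.send_message()
seen_by_core0 = dconnect_resps(0, dmem)
seen_by_core1 = dconnect_resps(1, dmem)
dmem.d_memory_response()
```

Since no connection rule dequeues f_out, the rules of different cores no longer compete for it, and the memory itself clears the response at the end of the cycle.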

Stage 3 – Multi-Core processor Parallel execution

• Two cores execute instructions simultaneously, sharing the same data memory.

The Scalable Processor

• 3 easy steps are required to add cores:
– Step 1: Create a new instruction memory.
– Step 2: Connect the cores to the data and instruction memories.
– Step 3: Add a monitoring mechanism for each core.

• The architecture is independent of the number of cores.

Benchmark 1 – Description

• Benchmark 1 – pure computational program
– No memory instructions
– Pure parallelism due to no blocking

Benchmark 1 – Results

• Benchmark 1 – pure computational program
– Results:
– With no memory instructions, all cores work independently and simultaneously.
– The results match the concept of multi-core: 8 cores can do the same “job” as 1 core in 1/8 of the time.

Benchmark 2 – Description

• Benchmark 2 – Short Image Processing
– Input: 32x32 binary image
– Output: inverted image
– Using memory instructions

Benchmark 2 – Example

• Benchmark 2 – Short Image Processing
– Image processing result:

Benchmark 2 – Results

• Benchmark 2 – Short Image Processing
– Results:
– 2 cores doubled the performance.
– With 4 and 8 cores the improvement declined, as can be predicted by the rule of diminishing marginal productivity.
– The gap between memory instructions was large enough for 2 cores to operate with a phase difference, allowing each core to access the memory without blocking the other.

Benchmark 3/4 – Description

• Benchmark 3/4 – Pure memory accessing program
– Mostly SW instructions or LW instructions
– SW is a “fire and forget” instruction, whereas a load instruction waits for a response

Benchmarks 3/4 – Results

• Benchmark 3/4 – Pure memory accessing program
– Results:
– A single core spends some cycles on computation, during which the memory is idle.
– With multiple cores, some cores execute computation instructions while others execute memory instructions, which maximizes memory utilization.

Benchmark 5 – Description

• Benchmark 5 – Long Image Processing
– Input: 32x32 binary image
– Output: inverted image
– Using memory instructions
– However, the processing part takes longer than in benchmark 2.
– Motivation: a larger gap between memory instructions.

Benchmark 5 – Results

• Benchmark 5 – Long Image Processing
– Results:
– As predicted in benchmark 2, a larger gap between memory instructions resulted in greater performance for the quad core.
– The larger the gap, the more cores can operate in different phases, so they are not blocked by other cores’ memory accesses.
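
The relation between the gap and the useful number of cores can be reproduced with a small contention model in Python. It is a hypothetical sketch, not the benchmark code and not the measured results: each core is assumed to repeat `gap` compute cycles followed by one memory access, the memory has a single port, and the lowest-numbered requesting core wins it. The function name `run` and the workload of 8 iterations are made up for the example.

```python
def run(cores, gap, total_iters):
    """Cycles needed for `cores` cores to finish `total_iters` iterations.

    Each iteration is `gap` compute cycles followed by one memory access.
    The work is assumed to split evenly between the cores.
    """
    remaining = [total_iters // cores] * cores  # iterations left per core
    compute_left = [gap] * cores                # compute cycles left in this iteration
    cycles = 0
    while any(r > 0 for r in remaining):
        cycles += 1
        port_free = True
        for c in range(cores):                  # lower-numbered cores go first
            if remaining[c] == 0:
                continue
            if compute_left[c] > 0:
                compute_left[c] -= 1            # computing, no memory needed
            elif port_free:
                port_free = False               # this core takes the single port
                remaining[c] -= 1
                compute_left[c] = gap
            # otherwise the core stalls until the port is free
    return cycles

short_gap = [run(n, gap=1, total_iters=8) for n in (1, 2, 4)]
long_gap = [run(n, gap=3, total_iters=8) for n in (1, 2, 4)]
```

In this model a gap of 1 gives 16, 9 and 9 cycles for 1, 2 and 4 cores, so the extra two cores add nothing, while a gap of 3 gives 32, 17 and 11 cycles, so four cores still help. The single memory port bounds the speedup at roughly gap + 1.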

Summary & Conclusion

• The design included 3 stages:
– Stage 1 – Instruction memory
– Stage 2 – Data memory
– Stage 3 – Multi core

• The scalability and shared data memory requirements were achieved.

• Multi-core increases data memory utilization (shown in benchmarks 3/4).

Summary & Conclusion

• The number of cores should be chosen with regard to the executed program.

• Using a multi-core processor can enhance performance, but beyond a certain number of cores, adding more cores will not result in better performance.

Summary & Conclusion - BlueSpec

• Pros:
– High abstraction level of design – easier to focus on the goal.
– Automatic scheduling of module interactions.
– High-level language – more human readable.

• Cons:
– Hard to optimize – understanding the automatic scheduling mechanism takes time.
– Scheduling errors and warnings are hard to decipher.
– Lack of a “knowledge base”.

Summary & Conclusion - FAQ

• Problem: Each core executes the same instructions.
• Solution: Draw the I.Mem out to the same hierarchy level as the CPU module.

• Problem: Client/server interface latency is 5 cycles.
• Solution: Use bypass FIFOs instead of regular FIFOs.

• Problem: Load instruction latency cannot be 1 cycle even when using bypass FIFOs.
• Solution: Add a memory stage to the pipeline data path, requesting data in the execute stage and receiving it in the memory stage. (Not implemented)

Summary & Conclusion - FAQ

• Problem: D.Mem has only one port. How can memory access be scheduled?
• Solution: BlueSpec automatically schedules the execution of the design’s rules, giving priority to the lower-numbered core.

• Problem: We need to ensure that the response goes to the CPU that accessed the memory and not to another core.
• Solution: Change the interface so that every core can receive the response and validate that it owns it.

Future Projects Possibilities:

• What’s next: MultiCore 2.0
– Verify the design on hardware.
– Add a memory stage to reduce load latency: send the request in the execute stage and receive the response in the memory stage.
– Implement a cache to reduce memory accesses.
– Implement a multi-port data memory.
– Design a mechanism for memory coherence.

As BlueSpec’s alluring advertisement says: