Post on 08-Jan-2018


Mid presentation: Part A Project

By: Netanel Yamin & Shahar Zuta

Advisor: Moshe Porian

Dual semester project, November 2012

Contents

Project overview
Project goals
Requirements
Architecture
Micro architecture
Problems & solutions
Conclusions
Testability
Methodology
Schedule

Algorithm overview

[Figure: INPUT FILE (literal items only) -> LZRW3 COMPRESSOR -> OUTPUT FILE (groups of items, literal/copy)]

A copy item consists of two bytes that represent from 3 to 18 bytes. A literal item consists of one byte that represents itself.
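As a rough illustration of how two bytes can stand for 3 to 18 bytes, the sketch below packs a copy item. The bit layout (a 12-bit hash table index plus a 4-bit field holding length minus 3) is an assumption made for this example, as are the function names; the slides only state the item size and the length range.

```python
def pack_copy_item(index, length):
    """Pack a copy item into two bytes.

    Assumed layout (not stated on the slides): the 12-bit table index is
    split across both bytes, and the low nibble of the first byte holds
    length - 3, so the 16 nibble values cover lengths 3..18.
    """
    if not 0 <= index <= 0xFFF:
        raise ValueError("index must fit in 12 bits")
    if not 3 <= length <= 18:
        raise ValueError("length must be between 3 and 18")
    first = ((index & 0xF00) >> 4) | (length - 3)
    second = index & 0xFF
    return bytes([first, second])


def unpack_copy_item(item):
    """Recover (index, length) from a two-byte copy item."""
    first, second = item[0], item[1]
    index = ((first & 0xF0) << 4) | second
    length = (first & 0x0F) + 3
    return index, length
```

A literal item needs no packing: it is the input byte itself.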

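Mechanism demonstration

[Figure: the input file "Expression_compression" is read in groups of three bytes. The hash function maps each group to an index of the hash table (indexes 0 to 4095, shown as XXX, YYY, ZZZ, UUU), and the group's offset is stored at that index. "Exp" (offset 0) is stored at UUU and "res" (offset 3) at XXX. Output: "Exp" (L.I), "res" (L.I).]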

NOTE: The next 3 bytes should be "x p r", then "p r e", and only then "r e s". We did not demonstrate all the actions, for simplicity.

"L.I" stands for "Literal Item".

Mechanism demonstration (continued)

[Figure: "sio" (offset 6) and "n_c" (offset 9) are hashed to free entries (ZZZ, YYY) and their offsets are stored there. Stored offsets so far: 0, 3, 6, 9. Output: "Exp" (L.I), "res" (L.I), "sio" (L.I), "n_c" (L.I).]

Mechanism demonstration (continued)

[Figure: "omp" (offset 12) is hashed and its offset is stored. Stored offsets so far: 0, 3, 6, 9, 12. Output: "Exp" (L.I), "res" (L.I), "sio" (L.I), "n_c" (L.I), "omp" (L.I).]

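Mechanism demonstration (continued)

[Figure: the second "res" (offset 15) hashes to XXX, which already holds offset 3. The three bytes at offset 3 match, so the output is a copy item (C.I) that refers to XXX, with a length of 3 plus the following matching bytes, and the entry at XXX is updated from 3 to 15. Output: "Exp" (L.I), "res" (L.I), "sio" (L.I), "n_c" (L.I), "omp" (L.I), C.I (XXX).]

"C.I" stands for "Copy Item".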

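[Flowchart:
START
Hash 3 bytes and read Hash table[index].
Is the index an empty field?
  yes: enter the offset, write a literal item to the O.F., move forward 1 byte.
  no: get the offset, then check: same 3 bytes?
    no: enter the offset, write a literal item to the O.F., move forward 1 byte.
    yes: while there are more same bytes, Length++; write a copy item to the O.F., move forward 3 + Length bytes.]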
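The flow above can be sketched in software as follows. This is a minimal model under stated assumptions, not the project's VHDL: items are returned as Python tuples rather than packed bytes, a copy item carries the matched offset directly (instead of a table index) so that it is easy to expand, and the names `hash3`, `compress_to_items` and `expand_items` are hypothetical. Here `length` counts the whole match (3 to 18), which corresponds to 3 + Length in the flowchart.

```python
MAX_COPY_LENGTH = 18  # a copy item represents from 3 to 18 bytes


def hash3(a, b, c):
    """Hash three bytes to a table index in the range 0..4095."""
    return ((40543 * ((a << 8) ^ (b << 4) ^ c)) >> 4) & 0xFFF


def compress_to_items(data):
    """Return a list of ("L", byte) and ("C", offset, length) items."""
    table = [None] * 4096          # None marks an empty field
    items = []
    pos = 0
    while pos < len(data):
        if len(data) - pos < 3:    # fewer than 3 bytes left: literals only
            items.append(("L", data[pos]))
            pos += 1
            continue
        index = hash3(data[pos], data[pos + 1], data[pos + 2])
        candidate = table[index]   # get offset (may be empty)
        table[index] = pos         # enter offset
        if candidate is not None and data[candidate:candidate + 3] == data[pos:pos + 3]:
            length = 3
            while (length < MAX_COPY_LENGTH and pos + length < len(data)
                   and data[candidate + length] == data[pos + length]):
                length += 1        # more same bytes
            items.append(("C", candidate, length))
            pos += length          # move forward over the whole match
        else:
            items.append(("L", data[pos]))
            pos += 1               # move forward 1 byte
    return items


def expand_items(items):
    """Rebuild the original bytes from the item list."""
    out = bytearray()
    for item in items:
        if item[0] == "L":
            out.append(item[1])
        else:
            _, offset, length = item
            for k in range(length):
                out.append(out[offset + k])
    return bytes(out)
```

Expanding the items of any input with `expand_items` should give back the input, which is a convenient sanity check for the loop.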

Project Goals

Implementation of LZRW3 data compression algorithm

Implementing strong debugging capabilities via GUI

Requirements

VHDL implementation
DE2 development board that features an Altera Cyclone II FPGA
FPGA – Host communication via UART protocol
Use internal memory on FPGA, no interface to external memory
Adapted to data templates of 2 Kbyte to 32 Kbyte
High performance: data transfer of 1 Gbps

Requirements

VHDL implementation
XUPV5 development board that features a Xilinx Virtex-5 FPGA
FPGA – Host communication via UART protocol
Use internal memory on FPGA, no interface to external memory
Adapted to data templates of 2 Kbyte to 32 Kbyte
High performance: data transfer of 1 Gbps

Architecture

[Figure: GUI <-> UART <-> Xilinx Virtex-5 on the XUPV505 board. Rx path: UART -> input block memory -> LZRW3 compressor core. Tx path: LZRW3 compressor core -> compressed file memory -> UART.]


LZRW3 COMPRESSOR CORE

Signals: clk, reset, Lzrw3_go, Lzrw3_mode, data_input_byte(7..0), data_input_valid, data_input_taken, End_of_file, Lzrw3_busy, Lzrw3_done, Lzrw3_output_group_size(4..0), data_output_bytes(13..0), data_output_valid, data_output_taken, data_output_last.

STAGE 1 – three-byte buffer

Block: 3 BYTES BUFFER

Signals: clk, reset, enable, New_byte(7..0), Newer_byte(7..0), Mid_byte(7..0), Older_byte(7..0).

STAGE 2- hash function

enable

HASH FUNCTION

middle_byte(7..0)

clk

Table_index(11..0)

older_byte(7..0)

Newer_byte(7..0)

reset

TABLE INDEX = (((40543*(((*(PTR))<<8)^((*((PTR)+1))<<4)^(*((PTR)+2))))>>4) & 0xFFF)

PTR points to the first byte. TABLE INDEX range: 0 to 4095.

Bit alignment of the three bytes a, b, c before the multiplication (bit 15 on the left):

a << 8 : aaaa aaaa 0000 0000
b << 4 : 0000 bbbb bbbb 0000
c      : 0000 0000 cccc cccc
XOR    : aaaa a^b  b^c  cccc

The top nibble is a(7..4), the next is a(3..0) ^ b(7..4), then b(3..0) ^ c(7..4), and the bottom nibble is c(3..0).
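The table index formula can be checked quickly in software. The sketch below is a direct transcription of the expression into Python; the function names `mix` and `table_index` and the argument names are chosen for this example only.

```python
def mix(first, second, third):
    """XOR the three bytes after shifting them by 8, 4 and 0 bits."""
    return (first << 8) ^ (second << 4) ^ third


def table_index(first, second, third):
    """Multiply the mixed value by 40543, drop 4 bits, keep 12 bits."""
    return ((40543 * mix(first, second, third)) >> 4) & 0xFFF


# Example: hash the first three bytes of "Expression_compression".
example_index = table_index(ord("E"), ord("x"), ord("p"))
```

Because the mixed value is at most 16 bits wide and 40543 is below 2^16, the product always fits in 32 bits, so the result is the same as in a 32-bit unsigned C implementation.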

STAGE 2- RTL view

STAGE 3 – hash table

Signals: clk, reset, clear, enable, Table_index(0..11), Current_offset(19..0), Offset(19..0), Data_out_valid. An offset counter generates Current_offset.

[Figure: the hash table has 4096 rows, labeled 21 bits wide, with valid bits. The INDEX (for example 000011010110) is the ADDRESS, the offset counter value is DATA_IN, and the outputs are Offset and Data_out_valid.]

STAGE 4 – input file memory

Stage 4 implementation

The input file memory should supply three bytes at the same time.

How do we choose the bank when a byte arrives?

Bank # = current_offset % 3
Address in bank = current_offset / 3 (integer division)

SOLUTION

Instead of counting in stage 3 and dividing in stage 4, we increment the offset by one only after three clock cycles.

In this configuration we expand the offset by 2 bits (tagging) to select the bank that the data is written into.

The hash table size is now 4096 x (19+2) bits, for example:

1001010101001110011 10
(19 bits)           (2 bits)
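A small software model can show why counting this way avoids the division. In the sketch below the address advances once every three bytes and a 2-bit tag cycles through 0, 1, 2; the class and function names are hypothetical, and treating the tag as the bank number is an assumption based on the description above.

```python
class OffsetCounter:
    """Counts bytes as (address, tag) instead of a single offset."""

    def __init__(self):
        self.address = 0   # address inside a bank (19 bits in the design)
        self.tag = 0       # 2-bit tag, assumed here to be the bank number

    def current(self):
        return self.address, self.tag

    def advance(self):
        """Call once per incoming byte."""
        if self.tag == 2:
            self.tag = 0
            self.address += 1   # the address moves only every third byte
        else:
            self.tag += 1


def pack_entry(address, tag):
    """Pack a hash table entry as the 19-bit address followed by the tag."""
    return (address << 2) | tag


def counter_matches_division(n_bytes):
    """Check the counter against offset // 3 and offset % 3."""
    counter = OffsetCounter()
    for offset in range(n_bytes):
        if counter.current() != (offset // 3, offset % 3):
            return False
        counter.advance()
    return True
```

The model gives the same pair as dividing the plain offset by 3 and taking the remainder, without a divider in either stage.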

Solution costs (memory units)

Memory usage at stage 3, from Synplify Pro: 20 bit x 4096 = 81920 bit = 80 Kbit, which occupies 3 RAM blocks of 36 Kbit (108 Kbit).

LUT usage: same as before.

Back to stage 4

[Figure: stage 4 block diagram. The input file memory banks (bank 0, 1, 2 addresses) feed a comparator. Surrounding units and signals: counter, offset, TAG, Offset_tag, Offset_valid, INDEX, Addresses alignment, Tentative next address, Tentative_tag, Tentative_taken, Continue, Comprison_valid, Compare_success, Compare_success_P, Item_length_p, Older_byte_P, clk.]

The TAG indicates the byte order across the banks.


Problem (1): In stage 4, we first implemented the counter of successful comparisons inside the comparator, which is an asynchronous process. It passed simulation but was not synthesizable.

Solution (1): We changed the architecture so that the counter is implemented in a synchronous unit. It receives a signal from the asynchronous comparator when a comparison succeeds and responds accordingly.

Problem (2): In stage 4, to compare the current three bytes in the pipe with three bytes from the RAM, we need to read three consecutive bytes from different addresses within one clock period.

Solution (2): We split the single memory into three RAM banks that hold consecutive addresses, so reading three consecutive bytes takes one byte from each bank.

Problem (3): In stage 4, the current pipe bytes reach the comparator in their arrival order, but the three bytes read from the banks are not necessarily in the right order.

Reading configurations

1. Same addresses
2. Different address
3. Different address #2

Solution (3): We used the TAG, which represents the addresses of the read bytes, to determine which read byte is compared with which pipe byte.
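The bank read and the TAG-based reordering can be modeled together in a few lines. This is a sketch under assumptions: byte number n of the input is taken to live in bank n % 3 at row n // 3, the tag is taken to be the bank of the oldest byte, and the names `split_into_banks` and `read_three` are hypothetical. The three tag values correspond to the three reading configurations (tag 0 reads the same row from every bank, tags 1 and 2 read the next row from one or two banks).

```python
def split_into_banks(data):
    """Distribute consecutive bytes over three banks in round-robin order."""
    banks = [bytearray(), bytearray(), bytearray()]
    for offset, byte in enumerate(data):
        banks[offset % 3].append(byte)
    return banks


def read_three(banks, address, tag):
    """Read three consecutive bytes with one access per bank.

    `address` is the row of the oldest byte and `tag` is its bank.
    Banks numbered below the tag hold their byte one row further on.
    """
    raw = []
    for bank_no in range(3):
        row = address + 1 if bank_no < tag else address
        raw.append(banks[bank_no][row])
    # The raw bytes come out in bank order; rotate by the tag so that
    # the oldest byte is first, matching the order of the pipe bytes.
    return bytes(raw[(tag + k) % 3] for k in range(3))
```

For example, with the bytes 0, 1, 2, 3, ... stored in the banks, `read_three(banks, 0, 1)` reads 3, 1, 2 from banks 0, 1, 2 and returns them reordered as 1, 2, 3.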

Problem (4): In stage 4, the RAM banks need the next read address before the end of the current clock cycle.

Solution (4): We created two units that hold the two possible next addresses (the tentative address unit and the address align unit).

Conclusions

Writing code for synthesis is different from writing code for simulation.
In an asynchronous implementation, all the signals need to be in the sensitivity list.
Reset should not pass through any logic.
Think hardware when writing VHDL code for synthesis.
Keep it simple to achieve more flexibility.

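Testability

[Figure: a random input generator (or an input file) feeds the same three bytes A, B, C to the synthesizable hash function block and to the unsynthesizable simulation function. The two results are compared with an assert, and the comparison is reported to the console.]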
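The same testing idea can be tried out in software before writing the VHDL testbench. In the sketch below, `structural_index` builds the mixed value nibble by nibble, the way hardware wiring would, and `reference_index` uses the plain formula; both names, the seed and the number of trials are arbitrary choices for this example.

```python
import random


def reference_index(a, b, c):
    """Plain formula, playing the role of the simulation function."""
    return ((40543 * ((a << 8) ^ (b << 4) ^ c)) >> 4) & 0xFFF


def structural_index(a, b, c):
    """Same hash, with the XOR stage built nibble by nibble."""
    n3 = a >> 4                   # a(7..4)
    n2 = (a & 0xF) ^ (b >> 4)     # a(3..0) ^ b(7..4)
    n1 = (b & 0xF) ^ (c >> 4)     # b(3..0) ^ c(7..4)
    n0 = c & 0xF                  # c(3..0)
    mixed = (n3 << 12) | (n2 << 8) | (n1 << 4) | n0
    return ((40543 * mixed) >> 4) & 0xFFF


def run_random_check(trials=1000, seed=0):
    """Feed random byte triples to both versions and assert they agree."""
    rng = random.Random(seed)
    for _ in range(trials):
        a, b, c = (rng.randrange(256) for _ in range(3))
        assert structural_index(a, b, c) == reference_index(a, b, c), (a, b, c)
    return trials
```

Calling `run_random_check()` raises an `AssertionError` with the failing triple if the two versions ever disagree, which mirrors the assert-and-report step of the testbench.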

Methodology

Stage data flow review.
Writing VHDL code.
Writing VHDL testbench.
Code review and debugging.
Synthesis check with Synplify: check RTL view, check CLK constraints.
Commit SVN folders and update data flow if needed.
Next stage data flow review.
Simulation & debugging.

Schedule 1/2

Date | Goals
24/4/2012 – 1/5/2012 | Project characterization & algorithm interpreting
2/5/2012 | Characterization presentation
2/5/2012 – 16/5/2012 | Full characterization of all blocks
17/5/2012 – 1/7/2012 | System blocks VHDL design
1/7/2012 – 27/7/2012 | Work on project paused for exams
29/7/2012 – 11/11/2012 | System blocks VHDL design (cont.); writing a simulation testbench for every unit

Schedule 2/2

Date | Goals
12/11/2012 | Mid presentation
13/11/2012 – 19/12/2012 | System blocks VHDL design (cont.); writing a simulation testbench for every unit
20/1/2012 | Part A final: core simulation vs. golden model
21/1/2012 – 15/2/2012 | Assemble all units and FPGA synthesis
16/2/2012 – 28/2/2012 | GUI implementation
1/3/2012 – 10/3/2012 | Final overall tests & debug
11/3/2012 – 31/3/2012 | Editing and finishing project portfolio
1/4/2012 | Final presentation
