2.07.06 Technion Digital Lab Project

Performance evaluation of Virtex-II-Pro embedded solution of Xilinx

Students: Tsimerman Igor, Firdman Leonid
Supervisors: Rivkin Ina, Bergman Alexander



Agenda
• Project goals
• Abstract
• Project resources
• System overview
• System implementation & results
• Results summary
• Possible improvements

Project goals
• Creation, integration & testing of Xilinx's PLB Master core using the standard IPIF.
• Comparison between hardware (FPGA) and software (PowerPC based) implementations.
• Estimation of the performance level of the Virtex II Pro embedded solution on a real digital design example.

Abstract

[Figure: test platform. Labels: HOST, Virtex II Pro, PowerPC 405, Monitored BUS.]

Some designs require an external logic analyzer for testability purposes. The Virtex II Pro may have the capability to serve as a programmable on-chip logic analyzer.
To achieve modularity and unification of the design, it is preferable to build the design around one standard bus.
Either the PowerPC or a hardware IP may serve as the analyzing unit within the Virtex II Pro, so their performance must be evaluated for this task on the same standard bus (PLB).

Project resources

Virtex II Pro XC2VP30 – FF896
– ~30K ASIC gates
– 136 18x18-bit multipliers
– 2448 Kb of BRAM (18 Kb in each block)
– 2 PowerPC 405 CPU cores (up to 300 MHz each)
– 8 DCM (digital clock manager) units
– 8 RocketIO transceivers (MGTs)

System Overview
System block diagram

[Figure: system block diagram. Labels: Virtex II Pro, Generator, Master core, PLB, PowerPC 405, OCM, SW EN block, SW_EN, Reset block, SYS_RST, DCM 0 (CLKDV), Timer (on OPB), Non critical intr, Counter 0, Counter 1, Random event, Random event (reference).]

• The Generator creates a massive stream of random and sequential data patterns.
• The PowerPC and the Master core perform the same logical data-analysis function, and their results are compared with each other at the end of the test.
• Event counters count the random events from the Generator (reference) and from the Master core.

System Overview
Start sequence

[Figure: the system block diagram with the start sequence marked as steps 1 to 4.]

System Overview
Stop sequence

[Figure: the system block diagram with the stop sequence marked as steps 1 and 2.]

System Overview
Random Pattern

Random event on PLB = a data word that is not the previous data word incremented by one.

[Figure: the Master core and the PowerPC 405 reading the data stream on the PLB, with the random events marked.]

Example stream: 344 345 586 587 588 13 14 15 55 55 …
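
The definition of a random event can be expressed as a short behavioural model. This is a sketch in Python, not the project's code: the function name `count_random_events`, the 32-bit wrap-around, and the choice not to count the first word (it has no predecessor) are all assumptions.

```python
WORD_MASK = 0xFFFFFFFF  # assumption: 32-bit data words, as read from the PLB


def count_random_events(stream):
    """Count words that are not (previous word + 1), modulo 2**32.

    The first word has no predecessor, so it is not counted (assumption).
    """
    events = 0
    previous = None
    for word in stream:
        if previous is not None and word != ((previous + 1) & WORD_MASK):
            events += 1
        previous = word
    return events


sample = [344, 345, 586, 587, 588, 13, 14, 15, 55, 55]
# 586, 13, 55 and the repeated 55 each break the +1 sequence.
print(count_random_events(sample))  # prints 4
```

Under this reading, a repeated word (the second 55) also counts as a random event, because it is not the previous word plus one.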

System Overview
Generator Block Diagram

[Figure: Generator block diagram. Labels: Pseudo-Random Generator, Random delay, Counter, Count to PLB, Controlled CLK, Din (32 bit), Load, Max (5 bit), Synchronized Random event.]

System Overview
Pseudo-Random Generator

• The placement and number of XORs may vary.
• Random cycle is 2^32 in length.
• Initial pattern is constant.

[Figure: Pseudo-Random Generator. A shift register with bits 0 to 31 and XOR feedback, shifted right each clock.]
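
A shift register with XOR feedback of this kind can be modelled in a few lines. The sketch below is illustrative only: the tap positions are hypothetical (the slide notes that the placement and number of XORs may vary), and the function names are not taken from the project. Two properties are worth keeping in mind: with XOR-only feedback an all-zero register stays at zero, so the constant initial pattern has to be non-zero, and the longest possible cycle of an n-bit register of this type is 2^n − 1 states.

```python
def lfsr_step(state, taps, width=32):
    """Shift the register right by one bit; the XOR of the tapped bits
    becomes the new most significant bit."""
    feedback = 0
    for tap in taps:
        feedback ^= (state >> tap) & 1
    return (state >> 1) | (feedback << (width - 1))


def lfsr_period(seed, taps, width):
    """Number of clocks until the register returns to the seed value.

    Assumes bit 0 is among the taps, which makes every step reversible,
    so the sequence always comes back to a non-zero seed.
    """
    state = lfsr_step(seed, taps, width)
    clocks = 1
    while state != seed:
        state = lfsr_step(state, taps, width)
        clocks += 1
    return clocks


# A small 4-bit register, so that the full cycle can be checked by hand.
print(lfsr_period(0b0001, taps=(0, 1), width=4))  # prints 15, i.e. 2**4 - 1
```

The same `lfsr_step` works for the 32-bit case by passing `width=32` and a suitable tap set; which taps give the longest cycle depends on the feedback polynomial chosen.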

System Overview
ModelSim simulation results

[Figure: ModelSim simulation waveform.]

System Overview
Chip Scope results

[Figures: three Chip Scope captures.]

System implementation & Results
Power PC

• Code and data in OCM
• 32-bit data read
• PowerPC freq. = 300 MHz, PLB freq. = 100 MHz
• The results were displayed at the end of the test via UART (Tera-Pro)

Code example: [code listing not reproduced in the transcript]

System implementation & Results
Power PC – Chip Scope results

Note: all Chip Scope results are measured at:
• PLB sys_clk freq: 100 MHz
• Generator freq: 100/12 MHz (8.33 MHz)

[Figures: Chip Scope captures.]

• 20 sys_clocks between PPC read requests.
• Max freq: ~5 MHz

System implementation & Results
Power PC – Statistics results

[Chart: PowerPC. Y axis: normal average of random events (0 to 3.5). X axis: Generator's freq. [MHz]: 50, 33.33, 25, 20, 16.66, 14.3, 12.5, 11.11, 10, 9.1, 8.33, 7.7, 7.1, 6.6, 6.25.]
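
The generator frequencies on the horizontal axis match the 100 MHz sys_clk divided by the integers 2 to 16 (the Chip Scope note gives the 100/12 case explicitly). The sketch below reproduces the list under that assumption; the divider range is inferred from the axis values and is not stated on the slides.

```python
SYS_CLK_MHZ = 100.0  # PLB sys_clk, from the Chip Scope note


def generator_freqs_mhz(dividers=range(2, 17)):
    """Generator frequency for each integer clock divider (assumed range)."""
    return [SYS_CLK_MHZ / n for n in dividers]


for n, freq in zip(range(2, 17), generator_freqs_mhz()):
    print(f"sys_clk / {n:2d} = {freq:5.2f} MHz")
```

The axis labels on the chart are these values rounded or truncated to one or two decimals (for example 100/7 ≈ 14.29 appears as 14.3).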

System implementation & Results
Master core

• Single transaction configuration
• Connected through standard IPIF 2.01a
• Performs a data analysis operation similar to the PPC operation

Code example: [code listing not reproduced in the transcript]

System implementation & Results
Master core – Chip Scope results

Note: all Chip Scope results are measured at:
• PLB sys_clk freq: 100 MHz
• Generator freq: 100/12 MHz (8.33 MHz)

[Figures: Chip Scope captures.]

• 24 sys_clocks between Master core read requests.
• Max freq: ~4.16 MHz

System implementation & Results
Master core – Statistics results

[Chart: Master. Y axis: average of random events (0 to 12000000). X axis: Generator's freq. [MHz]: 50, 33.33, 25, 20, 16.66, 14.3, 12.5, 11.11, 10, 9.1, 8.33, 7.7, 7.1, 6.6, 6.25.]

System implementation & Results
PPC & Master core – Chip Scope results

[Figure: Chip Scope capture.]

PPC:
• 18-22 sys_clocks between PPC read requests.
• Average max freq: ~5 MHz
Master:
• 24 sys_clocks between Master core read requests.
• Max freq: ~4.16 MHz

System implementation & Results
PPC & Master core – Statistics results

[Chart: two series, PowerPC and PPC&Master. Y axis: normal average of random events (0 to 4). X axis: Generator's freq. [MHz]: 50, 33.33, 25, 20, 16.66, 14.3, 12.5, 11.11, 10, 9.1, 8.33, 7.7, 7.1, 6.6, 6.25.]

Note: the statistics results cover PPC transactions only.

Results Summary

PPC:
• 20 system clocks between PPC read requests.
• Max freq: ~5 MHz

Master:
• 24 system clocks between Master core read requests.
• Max freq: ~4.16 MHz

PPC & Master:
• PPC: 18-22 system clocks between PPC read requests. Average max freq: ~5 MHz
• Master: 24 system clocks between Master core read requests. Max freq: ~4.16 MHz
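
The maximum frequencies quoted here follow directly from the number of sys_clocks between consecutive read requests at the 100 MHz PLB clock. A minimal sketch of that arithmetic (the function name is illustrative):

```python
def max_read_freq_mhz(sys_clk_mhz, clocks_between_reads):
    """Highest sustainable read rate: one read every N sys_clk cycles."""
    return sys_clk_mhz / clocks_between_reads


print(round(max_read_freq_mhz(100, 20), 2))  # PPC: prints 5.0
print(round(max_read_freq_mhz(100, 24), 2))  # Master core: prints 4.17
```

The slides quote the Master core figure truncated to ~4.16 MHz; the exact value is 100/24 ≈ 4.167 MHz.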

Results Summary
Power PC – Alternative test implementation

The previous results are valid only for the specific design tested. For example, additional statements in the PPC design will add delay between PPC read requests and therefore lower the read frequency.

PPC: 28 sys_clocks between PPC read requests. Max freq: ~3.57 MHz
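
With the PowerPC at 300 MHz and the PLB at 100 MHz, each sys_clock spans three CPU cycles, so the extra delay of the alternative test can also be expressed in CPU cycles. A sketch of that conversion (names are illustrative; the baseline of 20 sys_clocks is the figure measured for the original test):

```python
CPU_MHZ = 300  # PowerPC clock from the test setup
PLB_MHZ = 100  # PLB sys_clk


def sys_clocks_to_cpu_cycles(sys_clocks):
    """Convert a delay in PLB sys_clocks to PowerPC clock cycles."""
    return sys_clocks * CPU_MHZ // PLB_MHZ


extra_sys_clocks = 28 - 20  # alternative test vs. original test
print(sys_clocks_to_cpu_cycles(extra_sys_clocks))  # prints 24
print(round(PLB_MHZ / 28, 2))                      # prints 3.57
```

So the 8 additional sys_clocks correspond to a budget of 24 CPU cycles per loop iteration, which gives a feel for how few extra statements are needed to cause the drop.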

Results Summary
The conclusion:

In the current configuration, Master core throughput is lower than the PPC's.

Possible reasons:
– PLB protocol limitations in single read transactions (Max freq: sys_clk / 6 = 16.66 MHz)
– IPIF latency time
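
To put the two possible reasons in proportion, the measured read periods can be compared with the 6-clock single-read bound quoted for the PLB protocol. The sketch below only restates numbers from the slides; the function name is illustrative.

```python
PLB_MIN_CLOCKS = 6  # single-read limit from the slide: sys_clk / 6


def share_of_plb_limit(clocks_between_reads):
    """Fraction of the single-read protocol limit that is actually reached."""
    return PLB_MIN_CLOCKS / clocks_between_reads


print(f"PPC:    {share_of_plb_limit(20):.0%}")  # prints "PPC:    30%"
print(f"Master: {share_of_plb_limit(24):.0%}")  # prints "Master: 25%"
```

Both measured rates sit well below the 16.66 MHz bound, so the protocol limit on its own does not account for the whole gap.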

Results Summary
PLB protocol limitations

[Figure]

Results Summary
IPIF latency time

In this example, the Master core initiates a single read transaction from a slave on the PLB through the IPIF. There are 23 clocks between the Master's request and valid data on the BUS2IP_DATA signals.

Possible improvements

PPC:
• Code optimization (assembly level)
• Avoid single read transactions to the same address whenever possible
• Use burst mode and cache transactions

Master:
• Avoid using the standard Xilinx IPIF when using a Master core on the PLB (design an embedded interface to the PLB instead)
• Use FIFO and burst mode transactions
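
To see why burst transactions could help, consider an idealised model in which a burst pays the request latency once and then returns one word per sys_clock. Both the one-word-per-clock assumption and the burst lengths are hypothetical; only the 23-clock latency and the 100 MHz clock come from the measurements reported earlier.

```python
SYS_CLK_MHZ = 100
REQUEST_LATENCY = 23  # clocks from request to valid data (measured through IPIF)


def burst_read_rate_mhz(burst_len, latency=REQUEST_LATENCY):
    """Millions of words read per second if each burst costs
    latency + burst_len clocks.

    Idealised: assumes one word per clock once the data starts arriving.
    """
    return SYS_CLK_MHZ * burst_len / (latency + burst_len)


for burst_len in (1, 4, 16):  # hypothetical burst lengths
    print(burst_len, round(burst_read_rate_mhz(burst_len), 2))
```

With a burst length of 1 the model reduces to one word every 24 clocks, which is the single-transaction period measured for the Master core; longer bursts spread the same latency over more words. Real PLB and IPIF behaviour would need to be measured before relying on these numbers.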

That's it!!!