2.07.06 Technion
Digital Lab Project
Performance evaluation of Virtex-II-Pro embedded solution of Xilinx
Students: Tsimerman Igor, Firdman Leonid
Supervisors: Rivkin Ina, Bergman Alexander
Agenda
• Project goals
• Abstract
• Project resources
• System overview
• System implementation & results
• Results summary
• Possible improvements
Project goals
• Creation, integration and testing of Xilinx's PLB Master core using the standard IPIF.
• Comparison between the hardware (FPGA) and software (PowerPC based) implementations.
• Estimation of the performance level of the Virtex II Pro embedded solution on a real digital design example.
Abstract
[Figure: Test platform with HOST, Virtex II Pro, PowerPC 405 and monitored bus.]
Some designs require an external logic analyzer for testability purposes. The Virtex II Pro may have the capabilities to serve as a programmable on-chip logic analyzer.
To achieve modularity and a unified design, it is preferable to build the design around one standard bus.
Either the PowerPC or a hardware IP may serve as the analyzing unit within the Virtex II Pro, so their performance must be evaluated for this task on the same standard bus (PLB).
Project resources
Virtex II Pro XC2VP30 – FF896
– ~30K logic cells
– 136 18x18-bit multipliers
– 2448 Kb of BRAM (18 Kb in each block)
– 2 PowerPC 405 CPU cores (up to 300 MHz each)
– 8 DCM (digital clock manager) units
– 8 RocketIO transceivers (MGTs)
System Overview
System block diagram
[Figure: system block diagram of the Virtex II Pro design. Blocks: Generator, Master core, PowerPC 405, OCM, PLB, Timer (on OPB), SW EN block, Reset block, DCM 0, Counter 0, Counter 1. Signals: Random event, Random event (reference), SW_EN, SYS_RST, CLKDV, non-critical interrupt.]
The Generator creates a massive stream of random and sequential data patterns.
The PowerPC and the Master core perform the same logical data analysis function, and their results are compared with each other at the end of the test.
Event counters count the random events from the Generator (reference) and from the Master core.
System Overview
Start sequence
[Figure: system block diagram annotated with the four numbered steps of the start sequence.]
System Overview
Stop sequence
[Figure: system block diagram annotated with the two numbered steps of the stop sequence.]
System Overview
Random Pattern
A random event on the PLB is a data word that is not equal to the previous data word incremented by one.
Example stream: 344 345 586 587 588 13 14 15 55 55 …
[Figure: the stream on the PLB, observed by the Master core and the PowerPC 405, with the random events marked.]
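The definition above can be illustrated with a short sketch. It is only an illustration in Python, not the code used in the project: the function name `find_random_events`, the 32-bit wrap-around and the choice not to count the first word (it has no predecessor) are assumptions.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit data words with wrap-around

def find_random_events(words):
    """Return every word that is not (previous word + 1), modulo 2**32."""
    events = []
    for prev, cur in zip(words, words[1:]):
        if cur != ((prev + 1) & MASK32):
            events.append(cur)
    return events

# Example stream taken from the slide.
stream = [344, 345, 586, 587, 588, 13, 14, 15, 55, 55]
events = find_random_events(stream)
print(events, len(events))
```

For the example stream this flags 586, 13 and both 55s. The repeated 55 also counts as an event under this definition, because it is not 55 + 1.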
System Overview
Generator Block Diagram
[Figure: Generator block diagram. Blocks: Pseudo-Random Generator, Random delay, Counter (count to PLB). Labels: Controlled CLK, Din (32 bit), Load, Max (5 bit), Synchronized Random event.]
System Overview
Pseudo-Random Generator
[Figure: 32-bit shift register (bits 0 to 31) with XOR feedback, shifted right on each clock.]
The placement and number of XORs may vary.
The random cycle is 2^32 in length (2^32 - 1 states for a maximal-length XOR feedback, since the all-zero state repeats forever).
The initial pattern is constant.
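A shift-right register with XOR feedback of this kind can be sketched as follows. The tap positions (32, 22, 2, 1), the seed value and the function names are assumptions chosen for illustration; the slides only state that the placement and number of XORs may vary.

```python
def lfsr_step(state, width=32, taps=(32, 22, 2, 1)):
    """Advance a right-shifting Fibonacci LFSR by one clock.

    Tap t (1-based, polynomial notation) reads bit (width - t).
    The default taps are an assumed, commonly tabulated maximal-length choice.
    """
    feedback = 0
    for t in taps:
        feedback ^= (state >> (width - t)) & 1
    # Shift right and insert the feedback bit at the top.
    return (state >> 1) | (feedback << (width - 1))

def lfsr_period(seed, width, taps):
    """Count clocks until the register returns to the seed (small widths only)."""
    state, clocks = lfsr_step(seed, width, taps), 1
    while state != seed:
        state = lfsr_step(state, width, taps)
        clocks += 1
    return clocks

SEED = 0x00000001  # hypothetical constant initial pattern, must be non-zero
state = SEED
for _ in range(4):
    state = lfsr_step(state)
    print(f"{state:08x}")
```

A 4-bit register with taps (4, 3) cycles through all 15 non-zero states, which is the same 2^n - 1 behaviour scaled down. The all-zero state maps to itself, which is why the constant initial pattern must be non-zero.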
System Overview
ModelSim simulation results
[Figure: ModelSim simulation waveforms.]
Chip Scope results
[Figures: Chip Scope captures.]
System implementation & Results
Power PC
Code and data in OCM
32 bit data read
Power PC freq. = 300MHz, PLB freq. = 100MHz
The results were displayed at the end of the test via UART (Tera-Pro)
[Figure: PPC code example.]
System implementation & Results
Power PC – Chip Scope results
Note: All Chip Scope results are measured at:
PLB sys_clk freq: 100MHz
Generator freq: 100/12 (8.33MHz)
[Figures: Chip Scope captures for the Power PC test.]
20 sys_clocks between PPC read requests.
Max freq: ~5MHz
System implementation & Results
Power PC – Statistics results
[Figure: normal average of random events for the PowerPC (y axis, 0 to 3.5) versus the Generator's frequency in MHz (x axis, from 50 down to 6.25).]
System implementation & Results
Master core
Single transaction configuration
Connected through standard IPIF 2.01a
Performs a data analysis operation similar to the PPC operation
[Figure: Master core code example.]
System implementation & Results
Master core – Chip Scope results
Note: All Chip Scope results are measured at:
PLB sys_clk freq: 100MHz
Generator freq: 100/12 (8.33MHz)
[Figures: Chip Scope captures for the Master core test.]
24 sys_clocks between Master core read requests.
Max freq: ~4.16MHz
System implementation & Results
Master core – Statistics results
[Figure: average of random events for the Master core (y axis, 0 to 12,000,000) versus the Generator's frequency in MHz (x axis, from 50 down to 6.25).]
System implementation & Results
PPC & Master core – Chip Scope results
[Figure: Chip Scope capture with the PPC and the Master core running together.]
PPC: 18-22 sys_clocks between PPC read requests. Average max freq: ~5MHz
Master: 24 sys_clocks between Master core read requests. Max freq: ~4.16MHz
System implementation & Results
PPC & Master core – Statistics results
[Figure: normal average of random events (y axis, 0 to 4) versus the Generator's frequency in MHz (x axis, from 50 down to 6.25), with two series: PowerPC and PPC&Master.]
Note: the statistics results refer to PPC transactions only.
Results Summary
PPC: 20 system clocks between PPC read requests. Max freq: ~5MHz
Master: 24 system clocks between Master core read requests. Max freq: ~4.16MHz
PPC & Master:
PPC: 18-22 system clocks between PPC read requests. Average max freq: ~5MHz
Master: 24 system clocks between Master core read requests. Max freq: ~4.16MHz
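The maximum read frequencies in this summary follow from the 100MHz PLB sys_clk divided by the number of clocks between read requests. A minimal sketch of that arithmetic, with illustrative function and variable names that are not taken from the project:

```python
SYS_CLK_MHZ = 100.0  # PLB sys_clk used for the Chip Scope measurements

def max_read_freq_mhz(clocks_between_reads, sys_clk_mhz=SYS_CLK_MHZ):
    """One read every N sys_clk cycles gives a read rate of sys_clk / N."""
    return sys_clk_mhz / clocks_between_reads

measurements = {
    "PPC alone": 20,
    "Master alone": 24,
    "PPC with Master, fewest clocks": 18,
    "PPC with Master, most clocks": 22,
}
for name, clocks in measurements.items():
    print(f"{name}: {clocks} clocks -> {max_read_freq_mhz(clocks):.2f} MHz")
```

The 18-22 clock range corresponds to roughly 4.5 to 5.6 MHz, which is consistent with the average of ~5MHz quoted for the PPC. The 24-clock case works out to 4.1667 MHz, which the slides quote as ~4.16MHz.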
Results Summary
Power PC – Alternative test implementation
The previous results are valid only for this particular design. For example, additional statements in the PPC code will cause additional delay between PPC read requests and therefore a lower read frequency.
[Figure: alternative PPC test implementation.]
PPC: 28 sys_clocks between PPC read requests. Max freq: ~3.57MHz
Results Summary
The conclusion:
In the current configuration, the Master core throughput is lower than the PPC's.
Possible reasons:
– PLB protocol limitations in single read transactions (Max freq: sys_clk / 6 = 16.66MHz)
– IPIF latency time
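To put the measured figures next to the single read limit quoted above (one read per 6 sys_clk cycles), the following sketch computes the bound and how far each measurement is from it. It assumes the 100MHz sys_clk and the 20 and 24 clock measurements reported in the slides; all names are illustrative.

```python
SYS_CLK_MHZ = 100.0
PLB_SINGLE_READ_CLOCKS = 6  # from the slide: max freq = sys_clk / 6

def protocol_bound_mhz(sys_clk_mhz=SYS_CLK_MHZ):
    """Upper bound on the single read rate imposed by the PLB protocol."""
    return sys_clk_mhz / PLB_SINGLE_READ_CLOCKS

def extra_clocks(measured_clocks):
    """Clocks per read spent beyond the 6-clock protocol minimum."""
    return measured_clocks - PLB_SINGLE_READ_CLOCKS

def fraction_of_bound(measured_clocks):
    """Achieved read rate as a fraction of the protocol bound."""
    return PLB_SINGLE_READ_CLOCKS / measured_clocks

print(f"bound: {protocol_bound_mhz():.2f} MHz")
for name, clocks in (("PPC", 20), ("Master core", 24)):
    print(f"{name}: +{extra_clocks(clocks)} clocks, "
          f"{fraction_of_bound(clocks):.0%} of the bound")
```

Under these assumptions the PPC reaches 30% and the Master core 25% of the protocol bound, so in both cases most of each read period is spent outside the protocol minimum.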
Results Summary
PLB protocol limitations
[Figure: PLB protocol limitations.]
IPIF latency time
In this example, the Master core initiates a single read transaction from a slave on the PLB through the IPIF. There are 23 clocks between the Master's request and valid data on the BUS2IP_DATA signals.
Possible improvements
PPC:
• Code optimization (assembly level).
• Avoid single read transactions to the same address whenever possible.
• Use burst mode and cache transactions.
Master:
• Avoid using the standard Xilinx IPIF when using a Master core on the PLB (design an embedded interface with the PLB instead).
• Use FIFO and burst mode transactions.