PROCStar III Performance Characterization

PROCStar III Performance Characterization, Final Presentation. Instructor: Ina Rivkin. Performed by: Idan Steinberg, Evgeni Riaboy. Semestrial Project, Winter 2010.


Post on 03-Jan-2016


PROCStar III Performance Characterization
Final Presentation

Instructor: Ina Rivkin
Performed by: Idan Steinberg, Evgeni Riaboy
Semestrial Project, Winter 2010

Project Overview

With the introduction of a new FPGA-based board, we devised a series of tests to determine the device's maximum practical performance, so that students who use these boards in future projects can plan an optimal design based on the measured results.

All the tests are intended to determine the maximum frequency that still ensures correct results, which is why we check all the data for correctness.

Gidel PROCStar III Stratix III 260E

4 Altera Stratix III 260E FPGAs, with 256 MB on-chip memory
8-lane PCIe host interface
8 DDR2 banks, with 2*2GB on the first FPGA and 1*2GB on each of the other FPGAs
~2MB FPGA internal RAM
255K logic elements (per FPGA)

The PROCStar III Processing Unit

Project Goals

Testing the PROCStar III board for:
Maximum frequency of reading/writing between the FPGA and the memory banks.
Maximum communication speed between FPGAs, both on their adjacent connection and on their shared bus.
Highest possible performance of the internal logic in add, subtract, multiply, divide and sqrt configurations.

Test 1: External Memories Transfer Rate:

In stage 1, as preparation for the test, we wrote a constant (X1E) to the memory in FIFO configuration from the PCIe.

In stage 2, we read the data from the FPGA and compared it to the written data, in order to determine the maximum frequency between the memory bank and the FPGA.
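The two stages can be mimicked in software. The following is only a minimal sketch: the function names are hypothetical, the constant is assumed to be hex 1E, and `read_at` stands in for whatever actually reads the bank at a given clock frequency.

```python
from typing import Callable, Iterable, List, Optional

PATTERN = 0x1E  # assumption: the slide's constant "X1E" read as hex 1E


def fill_memory(words: int, value: int = PATTERN) -> List[int]:
    """Stage 1: write the same constant to every word of the bank."""
    return [value] * words


def count_mismatches(read_back: List[int], expected: int = PATTERN) -> int:
    """Stage 2: compare every word read back with the written constant."""
    return sum(1 for word in read_back if word != expected)


def first_failing_frequency(
    read_at: Callable[[int], List[int]],
    frequencies_mhz: Iterable[int],
    expected: int = PATTERN,
) -> Optional[int]:
    """Return the lowest tested frequency that shows any mismatch, else None."""
    for freq in sorted(frequencies_mhz):
        if count_mismatches(read_at(freq), expected) > 0:
            return freq
    return None


if __name__ == "__main__":
    bank = fill_memory(1024)
    # Simulated memory that always returns correct data, at any frequency.
    print(first_failing_frequency(lambda freq: bank, range(50, 401, 50)))
```

Returning `None` when no frequency fails keeps "no failure found in the tested range" distinct from a real failure point.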

[Diagram: Memory Bank and FPGA on the PROCStar III]

Test 1: Results:

We ran the test over the full spectrum of frequencies (0 to 400 MHz) and were unable to find a failing frequency for the memory access operation. After examining the system in SignalTap, we concluded that a memory access is always 16 cycles long, regardless of the frequency.
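A constant latency in cycles means the latency in time shrinks as the clock rises. A short worked calculation, assuming the 16-cycle figure holds across the whole range (the function name is illustrative):

```python
READ_LATENCY_CYCLES = 16  # read latency reported on the slide


def read_latency_ns(frequency_mhz: float, cycles: int = READ_LATENCY_CYCLES) -> float:
    """Convert a latency in clock cycles to nanoseconds at a given clock."""
    if frequency_mhz <= 0:
        raise ValueError("frequency must be positive")
    # One cycle lasts 1000 / f_MHz nanoseconds.
    return cycles * 1000.0 / frequency_mhz


for freq in (100, 200, 400):
    print(f"{freq} MHz -> {read_latency_ns(freq):.0f} ns")
```

At 400 MHz this gives 40 ns per read access.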

Frequency    Read latency
400 MHz      16 cycles

Tests 2, 3: FPGA Communication:

We tested the communication between the FPGAs, both on their adjacent connection and on their shared bus.

Test 2: FPGA Communication:

We built a state machine that creates 4 different outputs on FPGA2, transferred the data to FPGA1 over their direct connection, and checked the data for correctness while running the data channel at increasing frequencies.

[Diagram: FPGA 1 and FPGA 2 on the PROCStar III]

Test 2: Results:

The FSM on both FPGA1 and FPGA2:

Test 2: Results:

For data 00, 01, 10, 11 (lsb), we found the failure frequency to be 247 MHz.

For data 00.. and 11.., which is the worst case because all bits change between transfers, we found the failure frequency to be 65 MHz.

For a vector 10101010100 and the 000 vector, we found the failure frequency to be 133 MHz.
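One way to see why the patterns behave differently is to count how many bits toggle between consecutive transfers. The sketch below uses an illustrative 8-bit word and pattern definitions that are our own reading of the three cases, not the exact 100-bit vectors used in the test.

```python
from typing import List


def toggles(a: str, b: str) -> int:
    """Number of bit positions that differ between two equal-length words."""
    if len(a) != len(b):
        raise ValueError("words must have equal length")
    return sum(x != y for x, y in zip(a, b))


def toggles_per_transfer(sequence: List[str]) -> List[int]:
    """Toggle count for each consecutive transfer, wrapping to the start."""
    n = len(sequence)
    return [toggles(sequence[i], sequence[(i + 1) % n]) for i in range(n)]


WIDTH = 8  # illustrative width only

counter = [format(v, f"0{WIDTH}b") for v in range(4)]  # 00, 01, 10, 11 in the LSBs
all_flip = ["0" * WIDTH, "1" * WIDTH]                  # every bit changes
alternating = ["10" * (WIDTH // 2), "0" * WIDTH]       # 1010.. against 000..

for name, seq in (("counter", counter), ("all_flip", all_flip), ("alternating", alternating)):
    print(name, toggles_per_transfer(seq))
```

The toggle counts (at most 2, half the bits, all the bits) rank the patterns in the same order as the measured failure frequencies of 247, 133 and 65 MHz.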

Test 2: Results:

Data length: 100 bit

Data                      Frequency
00, 01, 10, 11 (lsb)      247 MHz
00.. and 11..             65 MHz
10101010100 and 000       133 MHz

Test 3: FPGA Communication:

We built a state machine that creates 4 different outputs on FPGA2, transferred the data to FPGA1, 3 and 4 over their shared bus, and checked the data for correctness while running the data channel at increasing frequencies.

[Diagram: FPGA 1, FPGA 2, FPGA 3 and FPGA 4 on the PROCStar III, connected by the bus]

Test 3: Results:

For data 00, 01, 10, 11 (lsb), we found the failure frequency to be 89 MHz for FPGA4, 97 MHz for FPGA1, and 101 MHz for FPGA3.

For data 00.. and 11.., which is the worst case because all bits change between transfers, we found the failure frequency to be 72 MHz for FPGA4, 73 MHz for FPGA1, and 76 MHz for FPGA3.

For a vector 10101010100 and the 000 vector, we found the failure frequency to be 87 MHz for FPGA4, 98 MHz for FPGA1, and 96 MHz for FPGA3.
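If one bus clock must serve every receiver, the receiver with the lowest failure frequency sets the limit. A small sketch using the figures quoted in the text above (the dictionary layout and function name are our own):

```python
from typing import Dict, Tuple

# Failure frequencies in MHz per receiving FPGA, as quoted in the Test 3 text.
FAILURE_MHZ: Dict[str, Dict[str, int]] = {
    "00,01,10,11 (lsb)": {"FPGA1": 97, "FPGA3": 101, "FPGA4": 89},
    "00.. and 11..": {"FPGA1": 73, "FPGA3": 76, "FPGA4": 72},
    "1010.. and 000..": {"FPGA1": 98, "FPGA3": 96, "FPGA4": 87},
}


def limiting_receiver(per_fpga: Dict[str, int]) -> Tuple[str, int]:
    """Return the receiver that fails first and its failure frequency."""
    name = min(per_fpga, key=per_fpga.get)
    return name, per_fpga[name]


for pattern, per_fpga in FAILURE_MHZ.items():
    name, mhz = limiting_receiver(per_fpga)
    print(f"{pattern}: limited by {name} at {mhz} MHz")
```

With these numbers FPGA4 is the limiting receiver for all three patterns.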

Test 3: Results:

Data length: 37 bit

Data                      FPGA1      FPGA3      FPGA4
00, 01, 10, 11 (lsb)      97 MHz     101 MHz    89 MHz
00.. and 11..             73 MHz     76 MHz     72 MHz
10101010100 and 000       98 MHz     96 MHz     97 MHz

Test 4: Internal Functions Testing:

We built 2 slice units for the fixed point: one with alu1, including a DSP unit, and one with alu2, without a DSP unit.

[Diagram: ALU1 slice unit. Labels: FSM, ALU1, comparator (= ???), 18 bit per operation, Data error per operation, Test done per operation]

[Diagram: ALU2 slice unit. Labels: FSM, ALU2, comparator (= ???), 18 bit per operation, Data error per operation, Test done per operation]
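The slice-unit idea (an FSM supplies operands, the ALU computes, a comparator flags an error per operation) can be modelled in a few lines. This is only a behavioural sketch under our own assumptions: the test vectors, the wrap-around to 18 bits and all names are hypothetical, and divide and sqrt are left out for brevity.

```python
from typing import Callable, Dict, List, Tuple

MASK_18 = (1 << 18) - 1  # results are kept to 18 bits, as in the diagrams

Vector = Tuple[str, int, int, int]  # (operation, a, b, expected result)


def alu(op: str, a: int, b: int) -> int:
    """Reference fixed-point ALU; results wrap to 18 bits (an assumption)."""
    if op == "add":
        return (a + b) & MASK_18
    if op == "sub":
        return (a - b) & MASK_18
    if op == "mul":
        return (a * b) & MASK_18
    raise ValueError(f"unsupported operation: {op}")


# Hypothetical constants the FSM would drive, with precomputed expected values.
VECTORS: List[Vector] = [("add", 5, 7, 12), ("sub", 7, 5, 2), ("mul", 3, 4, 12)]


def run_slice(unit: Callable[[str, int, int], int], vectors: List[Vector]) -> Dict[str, bool]:
    """Comparator: one data-error flag per operation (True means mismatch)."""
    return {op: unit(op, a, b) != expected for op, a, b, expected in vectors}


def faulty_alu(op: str, a: int, b: int) -> int:
    """Models a unit whose multiplier misses timing and flips its lowest bit."""
    result = alu(op, a, b)
    return result ^ 1 if op == "mul" else result


print(run_slice(alu, VECTORS))
print(run_slice(faulty_alu, VECTORS))
```

Keeping one error flag per operation mirrors the "Data error per operation" signal, so a failing multiplier does not hide a passing adder.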

Test 4: Internal Functions Testing:

We built another slice unit, in the floating-point configuration, to check that the comparator doesn't affect the performance:

[Diagram: floating-point slice unit. Labels: FSM, ALU, comparators (= ???), 18 bit per operation, Data error per operation, Test done per operation, Comparator testing (from fsm), Comparator testing (from alu)]

Test 4: Internal Functions Testing:

Floating-point ALU implementation: all the units are from the Altera arithmetic library.

[Diagram: floating-point ALU. Labels: From fsm, constant, ALTFP_ADD, ALTFP_SUB, ALTFP_MULT, ALTFP_DIV, ALTFP_SQRT]

Test 4: Internal Functions Testing:

Fixed-point ALU implementation: the LPM_DIVIDE, ALTSQRT and half_dsp_block units are from the Altera arithmetic library.

[Diagram: fixed-point ALU. Labels: From fsm, constant, add, sub, LPM_DIVIDE, ALTSQRT, half_dsp_block, Mul_le, x, +, FF]

Test 4: Internal Functions Testing:

Fixed point: frequency at which one of the units fails for each configuration:

Number of units    6          120        328
add                400 MHz    192 MHz    92 MHz
sub                400 MHz    192 MHz    92 MHz
mul_le             372 MHz    192 MHz    92 MHz
div                400 MHz    192 MHz    92 MHz
sqrt               400 MHz    192 MHz    92 MHz

Number of units    3          60         128
mul_dsp            400 MHz    215 MHz    92 MHz

logic util         7%         23%        56%
dsp util           3%         52%        100%
pipe               001020

Test 4: Internal Functions Testing:

Floating point: frequency at which one of the units fails for each configuration:

Number of units    3          20         38
add                400 MHz    400 MHz    168 MHz
sub                400 MHz    170 MHz    160 MHz
mul                400 MHz    330 MHz    168 MHz
div                400 MHz    310 MHz    168 MHz
sqrt               400 MHz    170 MHz    160 MHz

logic util         8%         34%        61%
dsp util           8%         52%        99%
pipe               14 14 11 14 28

Conclusions

Our goal was to determine the maximum performance of the PROCStar III 260 board for different operations performed on and between the FPGAs.

The most significant conclusion is that the results are completely temperature dependent and differ greatly when the system gets hot. The edge point for normal performance is 62 °C; above this temperature, performance degrades sharply.

The memory testing led us to conclude that memory access can be performed at any frequency the system supports (read access from the memory is always 16 cycles long).

The communication testing between FPGAs, both on adjacent channels and on the bus shared by all of them, led us to conclude that the results are completely data dependent, as can be seen from the results.

The mathematical operations testing was the most complicated. We tried many system configurations before choosing to implement the test using a unit duplicated several times. To make sure that our testing was performed correctly and that the comparator unit does not affect the results, we tested it in several ways.

First, we built a design containing solely the comparator, duplicated 200 times, and made sure that it never fails at any frequency.

Then, to be sure that the physical implementation of the design does not affect the comparison, we created another comparator in each slice unit and connected identical inputs to it (for both the FSM and the ALU unit outputs). These units never failed at any frequency, showing that in each design the comparator does not affect the results.
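As a rough illustration of the trade-off in the Test 4 fixed-point results, the unit count and the failure frequency can be combined into an upper bound on aggregate throughput. This sketch rests on assumptions that the slides do not state: the fixed-point unit counts are read as 6, 120 and 328, and each unit is taken to complete one operation per clock cycle.

```python
from typing import List, Tuple

# (number of units, failure frequency in MHz) for the fixed-point "add" row,
# reading the unit counts as 6, 120 and 328 (an assumption).
FIXED_POINT_ADD: List[Tuple[int, int]] = [(6, 400), (120, 192), (328, 92)]


def aggregate_mops(units: int, frequency_mhz: int) -> int:
    """Upper bound in million operations per second, one op per unit per cycle."""
    return units * frequency_mhz


for units, mhz in FIXED_POINT_ADD:
    print(f"{units:>3} units at {mhz} MHz -> {aggregate_mops(units, mhz)} Mop/s")
```

Under these assumptions, more units still give more total throughput even though each unit runs at a lower clock.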