
TigerSHARC CLU
Exploration of XCORRS for Take-Home Quiz 4
BIAWPQHI -- 13 April – start of class

M. Smith,

University of Calgary, Canada

[email protected]


Ideal -- Take Home Quiz

- Develop tests for complex correlation
  - Time and functionality
- Evaluate on
  - “C++” – in default and optimized mode (especially optimized)
  - Your optimized complex assembly code in complex correlation in SISD and SIMD modes
  - XCORRS in complex correlation in SISD and SIMD modes


Reasonable -- Take Home Quiz

Code and report
- Develop Functionality and Time tests for real FIR -- based on Lab. 3
  - Use on optimized C++ and your SISD and SIMD FIR
- Develop Functionality and Time tests for real correlation -- based on Lab. 3 / 4
  - Use on optimized C++ and your SISD and SIMD correlation
- Work out (theory) speed changes expected on your SISD and SIMD code if you went to complex data
  - Use as template for expected changes in optimized C++
- Develop Functionality and Time tests for complex FIR
  - Use on optimized C++
- Develop Functionality and Time tests for complex correlation
  - Use on optimized C++ and your SISD and SIMD XCORRS only
- Report on whether changes in C++ code speed work the way you expect
  - Use these figures to scale for FIR and correlation to complex data
- Report on relative speeds
  - “C++” – in default and optimized mode (especially optimized)
  - Your optimized complex assembly code in complex correlation in SISD and SIMD modes
  - XCORRS in complex correlation in SISD and SIMD modes


Mark assignment

- My tests and C++ are available on the web
- If you use my tests, then you must say so, and 10% of marks are deducted
- If you use my C++ code, then you must say so, and 10% of marks are deducted
- If you use my C++ code and my tests, then you must say so, and 20% of marks are deducted


Speed comparison – Part 1

Real FIR
- float / int values[ ], params[ ]
- Loop: sum = sum + values * params
  - 2 memory fetches
  - 1 add and 1 mult per loop cycle – done in ½ cycle in theory
- Time N / 2 + overhead
- Determine overhead by measuring with and without the loop-sum

Complex FIR
- CMPX float / int values[ ], params[ ]
- Loop: many common factors with FFT – Hint for final?
  - sum = sum + values * params
  - Real sum = v.re * p.re - v.im * p.im
  - Imag sum = v.re * p.im + v.im * p.re
  - 8 memory fetches
  - 3 add / sub and 4 mult per loop
- Time ??? + overhead


Speed comparison – Part 2

- Speed in theory without doing anything special
- Any special way to store complex values to speed up memory access?
- Do we need to do 8 memory fetches
  - On the Blackfin?
  - In the TigerSHARC?
- Expected optimal speed?
  - Time ??? + overhead

Complex FIR
- CMPX float / int values[ ], params[ ]
- Loop: many common factors with FFT – Hint for final?
  - sum = sum + values * params
  - Real sum = v.re * p.re - v.im * p.im
  - Imag sum = v.re * p.im + v.im * p.re
  - 8 memory fetches
  - 3 add / sub and 4 mult per loop
- Time ??? + overhead
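The complex loop body listed above can be written out as a small portable C++ sketch. The `Cmplx` struct and the `complexMac` name are assumptions made for illustration (the slides use CMPX / CMPLX without defining them), and no pm / dm placement is shown.

```cpp
#include <cstddef>

// Hypothetical complex type; the slides do not define CMPX / CMPLX.
struct Cmplx {
    float re;
    float im;
};

// One pass of the complex FIR loop: sum = sum + values[i] * params[i].
Cmplx complexMac(const Cmplx *values, const Cmplx *params, std::size_t size) {
    Cmplx sum = {0.0f, 0.0f};
    for (std::size_t i = 0; i < size; i++) {
        const Cmplx v = values[i];   // read v.re and v.im once
        const Cmplx p = params[i];   // read p.re and p.im once
        sum.re += v.re * p.re - v.im * p.im;   // real part
        sum.im += v.re * p.im + v.im * p.re;   // imaginary part
    }
    return sum;
}
```

Holding `v` and `p` in locals means each of the four operands is read once per tap in the source code, which is one way to approach the question of whether all 8 fetches are needed. What the compiler actually emits still has to be checked on the target.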


Speed comparison – Part 3?

Do these speed calculations scale the same way for complex correlation as for complex FIR?

Do a theory calculation and then compare result for debug and optimized C++ code to validate – within 25% of predicted changes is probably more than reasonable for a back-of-envelope calculation

Use scaling factor on your real FIR and correlation functions
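The 25% check described above can be expressed as a pair of helpers. This is only a sketch: the names `scalingFactor` and `withinTolerance` are hypothetical, and the times passed in are whatever loop times you measured after subtracting overhead.

```cpp
#include <cmath>

// Measured slow-down when going from real to complex data.
double scalingFactor(double complexLoopTime, double realLoopTime) {
    return complexLoopTime / realLoopTime;
}

// True when `measured` lies within the fractional `tolerance` of `predicted`.
bool withinTolerance(double predicted, double measured, double tolerance = 0.25) {
    return std::fabs(measured - predicted) <= tolerance * std::fabs(predicted);
}
```

A report could then state the predicted factor from the theory calculation, the factor returned by `scalingFactor`, and whether `withinTolerance` accepts the pair.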


Tests for following functions needed
When convert from float to int?

void ConvertReal2Complex(float *, CMPX32 *, int size)
- Make Complex = Real + j0

bool ConvertC32_2_C8(CMPX32 *, CMPX8 *, int size)
- Take bottom 8 bits of complex 32
- Return false if overflows
- Complex 8 is padded (packed) 2 complex values into 32 bits -- in int format

bool ConvertC32_2_C1(CMPX32 *, CMPX1 *, int size)
- Take bottom 1 bit of complex 32
- Return false if overflows, or if not in +-1 +-j1 format
- Complex 1 is padded (packed) 16 complex values into 32 bits -- in int format

void ConvertC8_2_C32(CMPX8 *, CMPX32 *, int size) needed? YES

void ConvertC1_2_C32(CMPX1 *, CMPX32 *, int size) needed?


Tests for following functions needed

float RealFIR(float *vals, float *params, int size, bool overhead);

CMPLX ComplexFIR(CMPLX *vals, CMPLX *params, int size, bool overhead);
- vals in dm and params in pm

void RealCorrs(float *vals, int size1, float *params, int size2, float *result, int *size3, bool overhead);

void ComplexCorrs(CMPLX *vals, int size1, CMPLX *params, int size2, CMPLX *result, int *size3, bool overhead);

void XCORRS(CMPLX *vals, int size1, CMPLX *params, int size2, CMPLX *result, int *size3, bool overhead, int version);
- version is 0 -- works, = 1 SISD, = 2 SIMD


Some hints

void XCORRS(CMPLX *vals, int size1, CMPLX *params, int size2, CMPLX *result, int *size3, bool overhead, int version) {
    // Pseudocode outline
    bool ConvertC32_2_C8(CMPX32 *, dm CMPX8 *, int size1)
    bool ConvertC32_2_C1(CMPX32 *, pm CMPX1 *, int size2)
    *size3 = size1 - size2
    for result = 1 to size3
        result[ ] = 0;
    if (!overhead) XCORRS(dm CMPX8 *, pm CMPX1 *, dm? Result, size1, size2, size3, whichversion)
}


Some Hints

void ComplexCorrs(CMPLX *vals, int size1, CMPLX *params, int size2, CMPLX *result, int *size3, bool overhead) {
    if (overhead) return;

    *size3 = size1 - size2;

    for (int loop = 0; loop < *size3; loop++) {
        result[loop] = ComplexFIR(vals, params, size2, overhead);
        vals++;
    }
}


Some decisions

- Complex 32 – first decision
  - Store real in dm space and imaginary in pm space?
- Complex8 in dm space, Complex1 in pm space
- Doing everything with static pm variables
- Using dm variables on stack, in an attempt to avoid running out of memory
- Try with satellite of size 2048 and PRN data of size 1024, but suspect there may not be enough room when doing this with Complex 32, so may have to test on smaller sizes for comparison
  - I ended up generating the same data as for the xcorrs( ) shown last Friday – size 48 = 16 * 3. Decided that if I could handle that (3 times round the xcorrs loop) then that was a fair enough test


Some Tests developed 1

TEST(ConvertReal2CMPLX32, D_TEST) {
    TEST_LEVEL(1);

    #define TEST_SIZE 8
    float values[TEST_SIZE] = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0};
    float zeros[TEST_SIZE] = {0, 0, 0, 0, 0, 0, 0, 0};

    ConvertReal2Complex(values, C32Real, C32Imag, TEST_SIZE);
    ARRAYS_EQUAL(values, C32Real, TEST_SIZE);
    ARRAYS_EQUAL(zeros, C32Imag, TEST_SIZE);
}


Test for padded data – C8 format

#define TEST_SIZE 8
pm float imag1[TEST_SIZE] = {0x04, 0x14, -0x8, -0x18, 0x24, 0x34, 0x44, 0x54};
float real1[TEST_SIZE] = {0x08, 0x18, -1, -2, 0x28, 0x38, 0x48, 0x58};

TEST(ConvertToCMPLX8, D_TEST) {
    TEST_LEVEL(1);

    unsigned int result[4] = {0x14180408, 0xE8FEF8FF, 0x34382428, 0x54584448};
    CHECK(!ConvertC32_2_C8(real1, imag1, DATAC8, 1));
    CHECK(ConvertC32_2_C8(real1, imag1, DATAC8, TEST_SIZE));

    ARRAYS_EQUAL(DATAC8, result, TEST_SIZE / 2);
}


Test for padded data C1 format

#define LONGER_SIZE 32
pm float imag2[LONGER_SIZE] = {1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
float real2[LONGER_SIZE] = {1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
pm float imag4[LONGER_SIZE];
float real4[LONGER_SIZE];

TEST(ConvertCMPLX1, D_TEST) {
    TEST_LEVEL(1);
    unsigned int result1[2] = {0x00000000, 0x00000000};
    unsigned int result2[2] = {0xFFFFFFFF, 0xFFFFFFFF};

    CHECK(!ConvertC32_2_C1(real1, imag1, PRNC1, 1));
    CHECK(!ConvertC32_2_C1(real1, imag1, PRNC1, TEST_SIZE));
    CHECK(!ConvertC32_2_C1(real2, imag2, PRNC1, 1));
    CHECK(ConvertC32_2_C1(real2, imag2, PRNC1, LONGER_SIZE));
    ARRAYS_EQUAL(PRNC1, result1, LONGER_SIZE / 16);

    for (int i = 0; i < LONGER_SIZE; i++) {
        real4[i] = -1 * real2[i];
        imag4[i] = -1 * imag2[i];
    }
    CHECK(ConvertC32_2_C1(real4, imag4, PRNC1, LONGER_SIZE));
    ARRAYS_EQUAL(PRNC1, result2, LONGER_SIZE / 16);
}
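The slides test ConvertC32_2_C1 but do not show it, so here is one possible packing routine that would satisfy the checks above. It is a sketch under stated assumptions: the name `packC1` is hypothetical, +1 is encoded as bit 0 and -1 as bit 1 (which matches the expected 0x00000000 and 0xFFFFFFFF words), and the position of each real / imaginary bit inside the word is a guess that the CLU documentation would need to confirm.

```cpp
#include <cstdint>

// Sketch only: pack 16 complex samples of the form +-1 +-j1 into each 32-bit word.
// Assumed encoding: +1 -> bit 0, -1 -> bit 1; real in the even bit and
// imaginary in the odd bit of each 2-bit pair, lowest pair first.
bool packC1(const float *inR, const float *inI, std::uint32_t *out, int size) {
    if (size <= 0 || size % 16 != 0) return false;   // whole words only
    for (int i = 0; i < size; i++) {                 // validate before writing
        if (inR[i] != 1.0f && inR[i] != -1.0f) return false;
        if (inI[i] != 1.0f && inI[i] != -1.0f) return false;
    }
    for (int word = 0; word < size / 16; word++) {
        std::uint32_t packed = 0;
        for (int k = 0; k < 16; k++) {
            const int i = word * 16 + k;
            if (inR[i] < 0) packed |= (std::uint32_t{1} << (2 * k));
            if (inI[i] < 0) packed |= (std::uint32_t{1} << (2 * k + 1));
        }
        out[word] = packed;
    }
    return true;
}
```

Validating every sample before writing any output keeps the destination untouched when the function returns false, which makes a failed conversion easier to spot in a test.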


RealFIR

#define TEST_SIZE 8
pm float params[TEST_SIZE] = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0};

TEST(RealFIR, D_TEST) {
    TEST_LEVEL(1);
    float impulse[TEST_SIZE];
    float results[TEST_SIZE];

    for (int i = 0; i < TEST_SIZE; i++) {
        for (int j = 0; j < TEST_SIZE; j++)    // Set to zero
            impulse[j] = 0;
        impulse[i] = 1;
        results[i] = RealFIR(impulse, params, TEST_SIZE, false);
    }
    ARRAYS_EQUAL(results, params, TEST_SIZE);
}


Complex FIR tests (3 of them)
To see if I got both Real and Imag correct

pm float resultsI[TEST_SIZE];

TEST(ComplexFIR, D_TEST) {
    TEST_LEVEL(1);
    float impulse[TEST_SIZE];
    float resultsR[TEST_SIZE];
    float zeros[TEST_SIZE] = {0, 0, 0, 0, 0, 0, 0, 0};

    for (int i = 0; i < TEST_SIZE; i++) {
        for (int j = 0; j < TEST_SIZE; j++)    // Set to zero
            impulse[j] = 0;
        impulse[i] = 1;

        for (int j = 0; j < TEST_SIZE; j++) {
            C32Real[j] = impulse[j];  C32Imag[j] = 0;
            C32Real1[j] = params[j];  C32Imag1[j] = 0;
        }

        ComplexFIR(C32Real, C32Imag, C32Real1, C32Imag1, &resultsR[i], &resultsI[i], TEST_SIZE, false);
    }
    ARRAYS_EQUAL(resultsR, params, TEST_SIZE);
    ARRAYS_EQUAL(resultsI, zeros, TEST_SIZE);
}


Real Correlation

pm float PRN32I[TEST_SIZE] = {1, -1, 1, -1, 1, 0, 0, 0};

TEST(RealCorrelation, D_TEST) {
    TEST_LEVEL(1);
    float data[TEST_SIZE * 2] = {0, 0, 0, 0, 1, -1, 1, -1, 1, 0, 0, 0, 0, 0, 0, 0};
    float result[TEST_SIZE];
    int Iresult[TEST_SIZE];
    int size3;

    RealCorrs(data, 2 * TEST_SIZE, PRN32I, TEST_SIZE, result, &size3, false);

    CHECK(size3 == TEST_SIZE);
    for (int j = 0; j < TEST_SIZE; j++)
        Iresult[j] = result[j];
    CHECK(MaximumLocation(Iresult, TEST_SIZE) == 4);
}
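The test above relies on MaximumLocation, which the slides use without showing. A minimal version, assuming it returns the index of the first largest element, could look like this. The empty-input return value of -1 is also an assumption.

```cpp
// Assumed behaviour: index of the first largest element, or -1 if size <= 0.
int MaximumLocation(const int *values, int size) {
    if (size <= 0) return -1;
    int best = 0;
    for (int i = 1; i < size; i++) {
        if (values[i] > values[best]) best = i;   // strict '>' keeps the first maximum
    }
    return best;
}
```

Returning the first maximum matters for the later tests on periodic data, where the same peak value repeats every three samples.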


Complex Correlation -- Simple Test

pm float dataI[TEST_SIZE * 2] = {0, 0, 0, 0, 1.0, -1, 1, -1, 1, 0, 0, 0, 0, 0, 0, 0};
pm float resI[TEST_SIZE];

TEST(ComplexCorrelation, D_TEST) {
    TEST_LEVEL(1);
    float dataR[TEST_SIZE * 2] = {0.0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
    float resR[TEST_SIZE];
    int Iresult[TEST_SIZE];
    float parR[TEST_SIZE] = {0, 0, 0, 0, 0, 0, 0, 0};
    int size3;

    ComplexCorrs(dataR, dataI, TEST_SIZE * 2, parR, PRN32I, TEST_SIZE, resR, resI, &size3, false);

    CHECK(size3 == TEST_SIZE);
    for (int j = 0; j < TEST_SIZE; j++) {
        Iresult[j] = abs(resR[j]);
    }
    CHECK(MaximumLocation(Iresult, TEST_SIZE) == 4);
}


Complex Correlation -- related to results from last lecture

for (int i = 0; i < 96; i += 3) {
    satXCORRSR[i] = -1; satXCORRSR[i+1] = 1; satXCORRSR[i+2] = 1;
    satXCORRSI[i] = 0;  satXCORRSI[i+1] = 0; satXCORRSI[i+2] = 0;
}
for (int i = 0; i < 48; i += 3) {
    prnXCORRSR[i] = -1; prnXCORRSR[i+1] = 1; prnXCORRSR[i+2] = 1;
    prnXCORRSI[i] = -1; prnXCORRSI[i+1] = 1; prnXCORRSI[i+2] = 1;
}
ComplexCorrs(satXCORRSR, satXCORRSI, 96, prnXCORRSR, prnXCORRSI, 48, resXCORRSR, resXCORRSI, &size3, false);

CHECK(size3 == 48);
for (int j = 0; j < 48; j++) { Iresult[j] = abs(resXCORRSR[j]); }
for (int j = 1; j < 45; j += 3) {
    CHECK(resXCORRSR[j-1] == 48);
    CHECK(resXCORRSR[j] == -16);
    CHECK(resXCORRSR[j+1] == -16);
    CHECK(MaximumLocation(Iresult + j, 48 - j) == 2);
}
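The expected values 48, -16, -16 in the checks above can be reproduced with a few lines of portable C++. This is a sketch with hypothetical helper names (`corrRealPart`, `repeatingPattern`), not the code under test. Because the satellite imaginary part is zero, the real part of each product reduces to satR * prnR: at an aligned lag all 48 products are +1, and at the other two lags each group of three products sums to -1, giving 16 * (-1).

```cpp
#include <cstddef>
#include <vector>

// Real part of the sum over k of sat[lag + k] * prn[k], using separate
// real and imaginary arrays as in the tests above.
float corrRealPart(const std::vector<float> &satR, const std::vector<float> &satI,
                   const std::vector<float> &prnR, const std::vector<float> &prnI,
                   std::size_t lag) {
    float sum = 0.0f;
    for (std::size_t k = 0; k < prnR.size(); k++) {
        sum += satR[lag + k] * prnR[k] - satI[lag + k] * prnI[k];
    }
    return sum;
}

// Builds the repeating -1, 1, 1 pattern used for the satellite and PRN data.
std::vector<float> repeatingPattern(std::size_t length) {
    std::vector<float> out(length);
    for (std::size_t i = 0; i < length; i++) {
        out[i] = (i % 3 == 0) ? -1.0f : 1.0f;
    }
    return out;
}
```

Calling `corrRealPart` with a 96-sample pattern, a zero imaginary array, and 48-sample PRN arrays at lags 0, 1 and 2 gives the three values the CHECK lines look for.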


Complex Correlation ASM -- related to results from last lecture

for (int i = 0; i < 96; i += 3) {
    satXCORRSR[i] = -1; satXCORRSR[i+1] = 1; satXCORRSR[i+2] = 1;
    satXCORRSI[i] = 0;  satXCORRSI[i+1] = 0; satXCORRSI[i+2] = 0;
}
for (int i = 0; i < 48; i += 3) {
    prnXCORRSR[i] = -1; prnXCORRSR[i+1] = 1; prnXCORRSR[i+2] = 1;
    prnXCORRSI[i] = -1; prnXCORRSI[i+1] = 1; prnXCORRSI[i+2] = 1;
}
ComplexCorrsASM(satXCORRSR, satXCORRSI, 96, prnXCORRSR, prnXCORRSI, 48, resXCORRSR, resXCORRSI, &size3, false);

CHECK(size3 == 48);
for (int j = 0; j < 48; j++) { Iresult[j] = abs(resXCORRSR[j]); }
for (int j = 1; j < 45; j += 3) {
    CHECK(resXCORRSR[j-1] == 48);
    CHECK(resXCORRSR[j] == -16);
    CHECK(resXCORRSR[j+1] == -16);
    CHECK(MaximumLocation(Iresult + j, 48 - j) == 2);
}


bool ConvertC32_2_C8(float *inR, pm float *inI, unsigned int *C8, int size) {
    float *holdR = inR;
    pm float *holdI = inI;

    for (int i = 0; i < size; i++) {
        if ((*inR > 127) || (*inR < -128)) return false;
        if ((*inI > 127) || (*inI < -128)) return false;
        inR++; inI++;
    }

    // Not going to bother with things that don't fit
    if (size & 1) return false;

    inR = holdR; inI = holdI;
    for (int half = 0; half < size; half += 2) {
        unsigned int first  = ((int) *inR++) & 0xFF;
        unsigned int second = ((int) *inI++) & 0xFF;
        unsigned int third  = ((int) *inR++) & 0xFF;
        unsigned int fourth = ((int) *inI++) & 0xFF;
        *C8++ = ((((((fourth << 8) + third) << 8) + second) << 8) + first);
    }
    return true;
}


C8 -> C32 and C16 -> C32

float UINT8ToFloat(unsigned int value) {
    if (value & 0x80) {
        value = value | 0xFFFFFF00;
        return ((int) value);
    }
    else return value;
}

void ConvertC8_2_C32(unsigned int *C8, float *inR, pm float *inI, int size) {
    for (int i = 0; i < size; i += 2) {
        unsigned int value = *C8++;
        *inR++ = UINT8ToFloat(value & 0xFF);
        value >>= 8;
        *inI++ = UINT8ToFloat(value & 0xFF);
        value >>= 8;
        *inR++ = UINT8ToFloat(value & 0xFF);
        value >>= 8;
        *inI++ = UINT8ToFloat(value & 0xFF);
    }
}
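The byte layout used by the C8 conversions can be checked one word at a time with a portable sketch. The helper names `packC8Word` and `unpackC8Byte` are hypothetical; the byte order (real 0, imaginary 0, real 1, imaginary 1, lowest byte first) follows the conversion code in these slides, and inputs are assumed to have passed the -128..127 range check already.

```cpp
#include <cstdint>

// Pack two complex samples (r0 + j i0, r1 + j i1) into one 32-bit word,
// lowest byte first: r0, i0, r1, i1.
std::uint32_t packC8Word(int r0, int i0, int r1, int i1) {
    return  (static_cast<std::uint32_t>(r0) & 0xFFu)
         | ((static_cast<std::uint32_t>(i0) & 0xFFu) << 8)
         | ((static_cast<std::uint32_t>(r1) & 0xFFu) << 16)
         | ((static_cast<std::uint32_t>(i1) & 0xFFu) << 24);
}

// Recover byte `index` (0..3) of a packed word as a sign-extended float.
float unpackC8Byte(std::uint32_t word, int index) {
    const std::uint32_t byte = (word >> (8 * index)) & 0xFFu;
    // Subtracting 256 when bit 7 is set has the same effect as sign extension.
    const int value = (byte & 0x80u) ? static_cast<int>(byte) - 256
                                     : static_cast<int>(byte);
    return static_cast<float>(value);
}
```

With the C8 test data, `packC8Word(0x08, 0x04, 0x18, 0x14)` gives 0x14180408 and `packC8Word(-1, -8, -2, -0x18)` gives 0xE8FEF8FF, the first two expected words in that test.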


FIR filters

float RealFIR(float *values, pm float *params, int size, bool overhead) {
    if (overhead) return 0.0;
    float sum = 0;
    for (int i = 0; i < size; i++)
        sum += *values++ * *params++;
    return sum;
}

pm float sumI = 0;

void ComplexFIR(float *valR, pm float *valI, float *parR, pm float *parI,
                float *resultR, pm float *resultI, int size, bool overhead) {
    if (overhead) { *resultR = *resultI = 0; return; }

    float sumR = 0;
    sumI = 0;    // Was a static variable

    for (int i = 0; i < size; i++) {
        sumR += *valR * *parR - *valI * *parI;
        sumI += *valR * *parI + *valI * *parR;
        valR++; valI++; parR++; parI++;
    }
    *resultR = sumR;
    *resultI = sumI;
    return;
}
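The `overhead` flag in these functions exists so that the call can be timed with and without the loop. A host-side sketch of that measurement is shown below. It assumes `std::chrono::steady_clock` as the timer (on the TigerSHARC you would read the cycle counter instead, which is not shown), drops the pm qualifier so it compiles with a standard compiler, and uses the hypothetical helper name `timeRealFIR`.

```cpp
#include <chrono>

// Portable copy of the real FIR (no pm qualifier).
float RealFIR(const float *values, const float *params, int size, bool overhead) {
    if (overhead) return 0.0f;
    float sum = 0.0f;
    for (int i = 0; i < size; i++) sum += values[i] * params[i];
    return sum;
}

// Nanoseconds taken by `repeats` calls. Writing through the volatile `sink`
// keeps the compiler from discarding the calls.
long long timeRealFIR(const float *values, const float *params, int size,
                      bool overhead, int repeats, volatile float *sink) {
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; r++) {
        *sink = RealFIR(values, params, size, overhead);
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

The loop cost is then estimated as the time with `overhead = false` minus the time with `overhead = true`, for the same `size` and `repeats`.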


Correlation

void RealCorrs(float *vals, int size1, pm float *params, int size2, float *result, int *size3, bool overhead) {
    if (overhead) return;
    *size3 = size1 - size2;
    for (int j = 0; j < *size3; j++)    // one result per lag; size3 equals size2 in the tests
        *result++ = RealFIR(vals++, params, size2, overhead);
}

void ComplexCorrs(float *valR, pm float *valI, int size1, float *parR, pm float *parI, int size2, float *resR, pm float *resI, int *size3, bool overhead) {
    if (overhead) return;

    *size3 = size1 - size2;

    for (int j = 0; j < *size3; j++)    // one result per lag; size3 equals size2 in the tests
        ComplexFIR(valR++, valI++, parR, parI, &resR[j], &resI[j], size2, false);
}
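The correlation above is a sliding dot product, one FIR evaluation per lag. A portable sketch using `std::vector` makes the output length explicit. The name `realCorrelation` is hypothetical and the pm / dm placement is not modelled.

```cpp
#include <cstddef>
#include <vector>

// result[lag] = sum over k of vals[lag + k] * params[k].
// Produces vals.size() - params.size() outputs, matching *size3 in the slides.
std::vector<float> realCorrelation(const std::vector<float> &vals,
                                   const std::vector<float> &params) {
    if (params.empty() || vals.size() < params.size()) return {};
    const std::size_t size3 = vals.size() - params.size();
    std::vector<float> result(size3, 0.0f);
    for (std::size_t lag = 0; lag < size3; lag++) {
        for (std::size_t k = 0; k < params.size(); k++) {
            result[lag] += vals[lag + k] * params[k];
        }
    }
    return result;
}
```

Fed the 16-sample data and 8-sample PRN from the real correlation test, this returns 8 results with the largest at lag 4, where the PRN lines up with its copy inside the data.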


Correlation XCORRS

extern "C" void xcorrsfunc(unsigned int *C8, pm unsigned int *C1, unsigned int *C16, int size);

void ComplexXCORRS(float *valR, pm float *valI, int size1, float *parR, pm float *parI, int size2, float *resR, pm float *resI, int *size3, bool overhead) {
    ConvertC32_2_C8(valR, valI, DATAC8, size1);
    *PRNC1 = 0x0;    // Need to shift the PRN to location C15
    ConvertC32_2_C1(parR, parI, PRNC1 + 1, size2);
    *size3 = size1 - size2;
    if (!overhead) xcorrsfunc(DATAC8, PRNC1, RESULTC16, *size3);
    ConvertC16_2_C32(RESULTC16, resR, resI, *size3);
}


XCORRS – same code as before
except – need to transfer results out

    // Shift out the values in TR registers into results
    xR3:0 = TR3:0;;
    Q[J6 += 4] = xR3:0;;
    xR3:0 = TR7:4;;
    Q[J6 += 4] = xR3:0;;
    xR3:0 = TR11:8;;
    Q[J6 += 4] = xR3:0;;
    xR3:0 = TR15:12;;
    Q[J6 += 4] = xR3:0;;
    IF NLC0E, JUMP OUTERLOOP;;


Need to get inpars and go round more than 16 times

    J0 = zeros;;                // Clear the THR registers the hard way
    R3:0 = Q[J0 += 4];;
    THR3:0 = R3:0;;
    R7:4 = R3:0;;
//  K0 = prn;;
    J2 = J4;;                   // satellite_data;;

    LC0 = 3;;
OUTERLOOP:
    K0 = J5;;
    J2 = J4;;
    J4 = J4 + 8;                // Increment by 8 and not 16

    // REST OF CODE UNCHANGED

    // Load THR with PRN code
    R1:0 = L[K0 += 2];;
    THR1:0 = R1:0;;
    R1:0 = L[K0 += 2];;
    THR3:2 = R1:0;;


Test results