Squish-DSP
Application of a Project Management Tool to manage low-level DSP processor resources
M. Smith, University of Calgary, Canada
smithmr@ucalgary.ca
Series of Talks and Workshops
CACHE-DSP – Talk on a simple process tool to identify cache conflicts in DSP code.
SQUISH-DSP – Talk on using a project management tool to automate identification of parallel DSP processor instructions.
SHARC Ecology 101 – Workshop showing how to systematically write parallel 2106X code.
SHARC Ecology 201 – Workshop on SQUISH-DSP and CACHE-DSP tools.
Scope of Talk
Overview of hand optimization of code
Paradigm shift in microprocessor resource scheduling
Project Management Tool Application
Translating ‘microprocessor’ language into a ‘business’ format
Examples and limitations
Better optimization from VisualDSP code
Future directions
Standard “C” code
void Convert(float *temperature, int N) {
    int count;
    for (count = 0; count < N; count++) {
        *temperature = (*temperature) * 9 / 5 + 32;
        temperature++;
    }
}
2106X-style load/store “C” code
void Convert(register float *temperature, register int N) {
    register int count;
    register float *pt = temperature;       // Ireg <- Dreg
    register float scratch;
    for (count = 0; count < N; count++) {
        scratch = *pt;
        scratch = scratch * (9.0f / 5.0f);  // float literals: integer 9 / 5 would be 1
        scratch = scratch + 32;             // Order of Ops
        *pt = scratch;
        pt++;
    }
}
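One detail worth checking in a load/store rewrite like this: in C, a scale factor written as (9 / 5) is evaluated in integer arithmetic and yields 1, not 1.8. The sketch below only illustrates that point; the two function names are hypothetical and do not come from the talk.

```c
/* Scale factor written with integer literals:
 * 9 / 5 is integer division, so the factor is 1. */
float scale_int_literals(float x) {
    return x * (9 / 5);
}

/* Scale factor written with float literals:
 * 9.0f / 5.0f is approximately 1.8. */
float scale_float_literals(float x) {
    return x * (9.0f / 5.0f);
}
```

Calling both with 100.0f shows the difference: the first returns the input unchanged, the second returns a value close to 180.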
Check on required register use
#define count scratchR1
#define pt scratchDMpt
#define scratchF2 F2
LCNTR = INPAR2, DO LOOP_END UNTIL LCE;
    scratchF2 = dm(pt, zeroDM);           // Any special requirements here on F2??
// INPAR1 (R4) is dead -- can reuse
#define constantF4 F4                     // Must be float
    constantF4 = 1.8;
    scratchF2 = scratchF2 * constantF4;   // Fn = F(0,1,2 or 3) * F(4,5,6 or 7)
#define F0_32 F0                          // Must be float
    F0_32 = 32.0;
    scratchF2 = scratchF2 + F0_32;        // Fm = F(8, 9, 10 or 11) + F(12, 13, 14 or 15)
LOOP_END: dm(pt, plus1DM) = scratchF2;
Resource Chart -- Basic code
ADDER             MULTIPLIER        DM ACCESS             PM ACCESS
_Convert:
    pt = INPAR1;
    F12_32 = 32.0       // bring constants outside the loop
    F4_1_8 = 1.8
    LCNTR = INPAR2, DO LOOP_END UNTIL LCE;
                                    F2 = dm(pt, ZERODM)
                  F8 = F2 * F4_1_8
F2 = F8 + F12_32
LOOP_END:                           dm(pt, PLUS1DM) = F2
    5 magic lines of “C”
Time = 4 + N * 4 + 5 (+ 5 to do the call)
Unroll the loop -- 5 times here
ADDER             MULTIPLIER        DM ACCESS
                                    F2 = dm(pt, ZERODM)   R1
                  F8 = F2 * F4_1_8                        M1
F2 = F8 + F12_32                                          A1
                                    dm(pt, PLUS1DM) = F2  W1
                                    F2 = dm(pt, ZERODM)   R2
                  F8 = F2 * F4_1_8                        M2
F2 = F8 + F12_32                                          A2
                                    dm(pt, PLUS1DM) = F2  W2
                                    F2 = dm(pt, ZERODM)   R3
                  F8 = F2 * F4_1_8                        M3
F2 = F8 + F12_32                                          A3
                                    dm(pt, PLUS1DM) = F2  W3
                                    F2 = dm(pt, ZERODM)   R4
                  F8 = F2 * F4_1_8                        M4
F2 = F8 + F12_32                                          A4
                                    dm(pt, PLUS1DM) = F2  W4
                                    F2 = dm(pt, ZERODM)   R5
                  F8 = F2 * F4_1_8                        M5
F2 = F8 + F12_32                                          A5
                                    dm(pt, PLUS1DM) = F2  W5
Parallelism causes Register/Resource Conflicts
ADDER             MULTIPLIER        DM ACCESS             SRC             DEST
                                    F2 = dm(pt, ZERODM)   Decode(Mem)     Writeback(F2)
                  F8 = F2 * F4_1_8                        Decode(F2,F4)   Writeback(F8)
F2 = F8 + F12_32                                          Decode(F8,F12)  Writeback(F2)
                                    dm(pt, PLUS1DM) = F2  Decode(F2)      Writeback(Mem)
                                    F2 = dm(pt, ZERODM)   Decode(Mem)     Writeback(F2)
                  F8 = F2 * F4_1_8                        Decode(F2,F4)   Writeback(F8)
F2 = F8 + F12_32                                          Decode(F8,F12)  Writeback(F2)
                                    dm(pt, PLUS1DM) = F2  Decode(F2)      Writeback(Mem)
[Chart annotations: the original figure tracks which of F2 and F8 are in use in each slot and marks with NO the slots where an instruction from the second pass cannot be moved up alongside the first pass.]
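The SRC/DEST bookkeeping in this chart can be sketched as a small check on register bitmasks. This is a minimal sketch under stated assumptions, not the tool's actual logic: the type RegUse, the macro REG, and the function can_share_slot are hypothetical names, and the sketch assumes every unit reads its source registers before any result is written back within one time slot.

```c
#include <stdint.h>

/* Register usage of one instruction: bit n set means register Fn. */
typedef struct {
    uint32_t src;   /* registers read at decode */
    uint32_t dest;  /* registers written at writeback */
} RegUse;

#define REG(n) (UINT32_C(1) << (n))

/* Returns 1 if 'later' may be pulled into the same time slot as 'earlier'.
 * Assumption: sources are read before results are written, so a register
 * that 'earlier' reads may be overwritten by 'later' in the same slot.
 * Blocked cases:
 *   - 'later' reads a register that 'earlier' is still producing
 *   - both instructions write the same register */
int can_share_slot(RegUse earlier, RegUse later) {
    uint32_t read_after_write  = earlier.dest & later.src;
    uint32_t write_after_write = earlier.dest & later.dest;
    return (read_after_write | write_after_write) == 0;
}
```

With the registers from the chart, the add (F2 = F8 + F12_32) and the next read (F2 = dm(...)) both write F2, so they cannot share a slot. Sending the sum to a different register removes that particular conflict.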
Unroll the loop a bit more
ADDER             MULTIPLIER        DM ACCESS
                                    F2 = dm(pt, ZERODM)   R1
                  F8 = F2 * F4_1_8  F2 = dm(pt, ZERODM)   M1, R2
F9 = F8 + F12_32  F8 = F2 * F4_1_8                        A1, M2
F9 = F8 + F12_32                    dm(pt, PLUS1DM) = F9  W1, A2
                                    dm(pt, PLUS1DM) = F9  W2
                                    F2 = dm(pt, ZERODM)   R3
                  F8 = F2 * F4_1_8  F2 = dm(pt, ZERODM)   M3, R4
F9 = F8 + F12_32  F8 = F2 * F4_1_8  F2 = dm(pt, ZERODM)   A3, M4, R5
F9 = F8 + F12_32  F8 = F2 * F4_1_8  dm(pt, PLUS1DM) = F9  W3, A4, M5
F9 = F8 + F12_32                    dm(pt, PLUS1DM) = F9  W4, A5
                                    dm(pt, PLUS1DM) = F9  W5
                                    F2 = dm(pt, ZERODM)   R6
                  F8 = F2 * F4_1_8  F2 = dm(pt, ZERODM)   M6, R7
F9 = F8 + F12_32  F8 = F2 * F4_1_8  F2 = dm(pt, ZERODM)   A6, M7, R8
F9 = F8 + F12_32  F8 = F2 * F4_1_8  dm(pt, PLUS1DM) = F9  W6, A7, M8
F9 = F8 + F12_32                    dm(pt, PLUS1DM) = F9  W7, A8
                                    dm(pt, PLUS1DM) = F9  W8
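The overlap of read, multiply, add and write stages from different passes can be modelled in plain C. The sketch below is an idealized model, not a transcription of the chart: it assumes a read and a write can happen in the same pass, whereas the chart has a single DM ACCESS column and never puts both in one row. The function name convert_pipelined and the variables f2, f8 and f9 (named after the registers) are illustrative.

```c
#include <stddef.h>

/* Software-pipelined model of the Convert loop.
 * In pass c the four stages work on four different elements:
 * write element c-3, add for c-2, multiply for c-1, read c. */
void convert_pipelined(float *t, size_t n) {
    float f2 = 0.0f;  /* value read from memory   */
    float f8 = 0.0f;  /* product                  */
    float f9 = 0.0f;  /* sum, ready to be stored  */

    /* Stages are listed oldest element first so that each one uses the
     * value produced in the previous pass, mimicking units that read
     * their inputs before any result of the same slot is written. */
    for (size_t c = 0; c < n + 3; c++) {
        if (c >= 3)              t[c - 3] = f9;       /* W stage */
        if (c >= 2 && c - 2 < n) f9 = f8 + 32.0f;     /* A stage */
        if (c >= 1 && c - 1 < n) f8 = f2 * 1.8f;      /* M stage */
        if (c < n)               f2 = t[c];           /* R stage */
    }
}
```

The first three passes fill the pipeline and the last three drain it, which corresponds to the partly empty rows at the top and bottom of the chart.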
Final code version
ADDER             MULTIPLIER        DM ACCESS
_Convert:
    Modify(CTOPofSTACK, -1);
    dm(FP, -2) = R9;
    pt = INPAR1;
    F12_32 = 32.0       // bring constants outside the loop
    F4_1_8 = 1.8
                                    F2 = dm(pt, ZERODM)   R1
                  F8 = F2 * F4_1_8  F2 = dm(pt, ZERODM)   M1, R2
F9 = F8 + F12_32  F8 = F2 * F4_1_8                        A1, M2
F9 = F8 + F12_32                    dm(pt, PLUS1DM) = F9  W1, A2
                                    dm(pt, PLUS1DM) = F9  W2
    LCNTR = (N-2)/3, DO LOOP_END UNTIL LCE;
                                    F2 = dm(pt, ZERODM)   R3
                  F8 = F2 * F4_1_8  F2 = dm(pt, ZERODM)   M3, R4
F9 = F8 + F12_32  F8 = F2 * F4_1_8  F2 = dm(pt, ZERODM)   A3, M4, R5
F9 = F8 + F12_32  F8 = F2 * F4_1_8  dm(pt, PLUS1DM) = F9  W3, A4, M5
F9 = F8 + F12_32                    dm(pt, PLUS1DM) = F9  W4, A5
LOOP_END:                           dm(pt, PLUS1DM) = F9  W5
    R9 = dm(FP, -2);
    5 magic lines of C
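A rough way to compare this version with the basic loop is to count chart rows. The sketch below uses the basic-code formula from the earlier resource chart (4 + N * 4 + 5, call overhead excluded) and, as an assumption not stated in the talk, counts one cycle per row of the final chart: 5 setup lines, 5 prologue rows, the LCNTR line, 6 rows per loop pass, the R9 restore and the 5 return lines. Both function names are hypothetical.

```c
/* Basic loop: 4 setup instructions, 4 per element, 5 return lines. */
unsigned basic_cycles(unsigned n) {
    return 4u + 4u * n + 5u;
}

/* Final version, counted from the chart rows (assumed one cycle each).
 * The loop handles 3 elements per pass after a 2-element prologue,
 * so n must have the form 2 + 3k; returns 0 for any other n. */
unsigned final_cycles(unsigned n) {
    if (n < 2u || (n - 2u) % 3u != 0u) {
        return 0u;
    }
    unsigned passes = (n - 2u) / 3u;
    return 5u      /* stack adjust, save R9, pt, two constants */
         + 5u      /* prologue rows R1 .. W2 */
         + 1u      /* LCNTR line */
         + 6u * passes
         + 1u      /* restore R9 */
         + 5u;     /* return lines */
}
```

Under these assumptions, N = 32 gives 137 cycles for the basic loop and 77 for the final version, roughly 2 cycles per element instead of 4.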
Real Life is not made up of ‘short loops’
Probably using DSP-intelligent compiler as a starting point
Longer loops -- more tasks to make parallel
Many different opportunities for task ordering
Complicated resource management and register dependency issues
Need a tool to help get the product ‘out the door’
Business Management Tool
One evening went looking for a ‘tree’ program to manage the scheduling of microprocessor resources.
In frustration, decided to take the 2106X tasks and put them into Microsoft Project.
By mistake, found that I had developed a very useful microprocessor management tool, especially with the MS Project GUI!
Question -- how to get it to function in a systematic manner?
MS Project -- 21XXX processor
Requires a paradigm shift
Business project concept -- one person can’t be doing two tasks in the same time slot.
Becomes: one data bus can’t be transferring two data items at the same time.
Handled by identifying the ‘processor resources’ needed to complete each ‘basic task’.
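The "one person, one task per time slot" rule can be sketched as a bitmask per time slot. This is only an illustration of the idea: the resource names and the function claim_resource are hypothetical, not part of Squish-DSP or Microsoft Project.

```c
/* Processor resources, one bit each (illustrative set). */
enum {
    RES_ADDER      = 1 << 0,
    RES_MULTIPLIER = 1 << 1,
    RES_DM_BUS     = 1 << 2,
    RES_PM_BUS     = 1 << 3
};

/* Try to place a basic task that needs 'resource' into a time slot.
 * *slot_mask holds the resources already claimed in that slot.
 * Returns 1 and claims the resource if it was free, 0 if it is busy. */
int claim_resource(unsigned *slot_mask, unsigned resource) {
    if (*slot_mask & resource) {
        return 0;   /* e.g. the data bus is already moving an item */
    }
    *slot_mask |= resource;
    return 1;
}
```

Starting from an empty slot, a DM bus transfer and a multiply can both be claimed, but a second DM bus transfer in the same slot is refused.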
MS Project -- 21XXX processor
Business project concept.
If you delay building a wall (Task A), then you must delay painting it (Task B).
HOWEVER
If you build the wall earlier, you could paint it earlier, but you don’t have to.
Might make more sense to delay Task B so that Task C can be done earlier, since doing Task C allows Task D to be completed in parallel with Task B, so that the whole project is finished earlier.
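The dependency half of this reasoning (a task cannot start before its predecessors finish) can be sketched as a forward pass over the task list. This is a minimal sketch that ignores resource limits entirely; the Task structure and the function earliest_starts are hypothetical, and tasks are assumed to be listed so that every predecessor appears before the task that depends on it.

```c
#include <stddef.h>

#define MAX_PREDS 4

typedef struct {
    int    duration;           /* length of the task in time slots */
    size_t n_preds;            /* number of predecessors */
    size_t preds[MAX_PREDS];   /* indices of tasks that must finish first */
} Task;

/* Forward pass: start[i] is the latest finish time among the
 * predecessors of task i. Returns the finish time of the project. */
int earliest_starts(const Task *tasks, size_t n, int *start) {
    int project_end = 0;
    for (size_t i = 0; i < n; i++) {
        int s = 0;
        for (size_t p = 0; p < tasks[i].n_preds; p++) {
            size_t j = tasks[i].preds[p];
            int finish = start[j] + tasks[j].duration;
            if (finish > s) {
                s = finish;
            }
        }
        start[i] = s;
        if (s + tasks[i].duration > project_end) {
            project_end = s + tasks[i].duration;
        }
    }
    return project_end;
}
```

These are only the earliest legal start times. As the slide notes, a task may still be scheduled later than its earliest start when that frees a resource for another task.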
Simple Example
1) F6 = dm(I4, M4);
10) F1 = F2 * F4, F8 = F8 + F12, F12 = pm(I12, M12);
16) F5 = F3 * F6, F8 = F8 + F12, F12 = pm(I12, M12);
Might be able to move Task 1 in parallel with any instruction 2 through 15, BUT not in parallel with 16
If Task 10 moves earlier, so can Task 16, BUT not before Task 10
In Task 10, ‘F12=….’ can be made parallel with ‘F6=….’, BUT Task 10 ‘F8=….’ can’t!
SquishDSP -- parser
1) F6 = dm(I4, M4);
10) F1 = F2 * F4, F8 = F8 + F12, F12 = pm(I12, M12);
16) F5 = F3 * F6, F8 = F8 + F12, F12 = pm(I12, M12);
Task 16 split into 3 atomic tasks
F12 = pm(I12, M12) -- PMBUS resource, must come after ‘F12=…’ from Task 10, and after ‘F8=…’ in current Task
F8 = F8 + F12 -- ALU resource, must come after ‘F8=…’ and ‘F12=…’ from Task 10
F5 = F3 * F6 -- MULTIPLIER resource, must come after ‘F6=…’ from Task 1
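The splitting step can be sketched with simple text matching. This is a hedged illustration, not the SquishDSP parser: the names Unit, classify_atomic_task and split_instruction are hypothetical, only lower-case dm( and pm( are recognized, and register dependencies are not extracted.

```c
#include <string.h>

typedef enum { UNIT_UNKNOWN, UNIT_ALU, UNIT_MULTIPLIER,
               UNIT_DMBUS, UNIT_PMBUS } Unit;

/* Guess the resource of one atomic task from its text. */
Unit classify_atomic_task(const char *text) {
    if (strstr(text, "pm(")) return UNIT_PMBUS;
    if (strstr(text, "dm(")) return UNIT_DMBUS;
    if (strchr(text, '*'))   return UNIT_MULTIPLIER;
    if (strchr(text, '+') || strchr(text, '-')) return UNIT_ALU;
    return UNIT_UNKNOWN;
}

/* Split one instruction at commas that are not inside parentheses and
 * classify each piece. Stops at ';' or end of string.
 * Writes at most max_out results and returns how many were written. */
size_t split_instruction(const char *instr, Unit *out, size_t max_out) {
    char piece[64];
    size_t len = 0, count = 0;
    int depth = 0;
    for (const char *p = instr; ; p++) {
        char ch = *p;
        int at_end = (ch == '\0' || ch == ';');
        if (ch == '(') depth++;
        if (ch == ')') depth--;
        if (at_end || (ch == ',' && depth == 0)) {
            piece[len] = '\0';
            if (len > 0 && count < max_out) {
                out[count++] = classify_atomic_task(piece);
            }
            len = 0;
            if (at_end) break;
        } else if (len + 1 < sizeof piece) {
            piece[len++] = ch;   /* over-long pieces are truncated */
        }
    }
    return count;
}
```

Applied to Task 16, the sketch yields three atomic tasks on the MULTIPLIER, ALU and PMBUS resources, in the order they are written in the instruction.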
Preparation for Microsoft Project
.asm code broken up into sub-tasks with intra and inter dependencies recognized
Reformatted as Microsoft Project text file
Rescheduled within Microsoft Project, either automatically or using the GUI
Reformatted as .asm code with increased parallelism
Example GUI screen capture
[Screen capture callouts: INSTR. broken into ATOMIC TASKS; ATOMIC TASKS showing RESOURCE and DEPENDENCIES; ATOMIC TASKS with RESOURCE CONFLICTS]
Advantages and Limitations
Current version intended to handle the inner critical loop of algorithm
Not handling ‘Cache’ conflicts
Not optimized for instructions in delay slots in jumps and conditional jumps
Not optimized for multiple DAG delays
e.g. I4 = …. ; DM(I4, M2) = ; I5 =…
Moving to ‘task profile management’ macros with Primavera PV3 Tool
Conclusion
SquishDSP is a prototype scheduling tool to identify and reschedule microprocessor resource operations in parallel
Already useful in current form for ‘inner DSP loops’
Microsoft Project used for concept work but Primavera PV3 tool offers more long term promise
Acknowledgements
Financial support of Natural Sciences and Engineering Research Council (NSERC) of Canada and University of Calgary
Financial support from Analog Devices. Dr. Mike Smith is ADI University Professor 2001/2002
Future financial support from Alberta Provincial Government through Alberta Software Engineering Research Consortium (ASERC)