04/18/23 TigerSHARC processor, M. Smith, ECE, University of Calgary, Canada
Concepts tackled
Introduction to the capabilities of the TigerSHARC ADSP-TS201 processor
– Warning: you have the TS201S instruction manual and the TS101S hardware manual. At the moment the TS201S hardware manual is only available on the web.
Processor Architecture
3 128-bit data buses
2 integer ALUs
2 computational blocks
– ALU (float and integer)
– SHIFTER
– MULTIPLIER
– COMMUNICATIONS CLU
Integer ALU
Except for NO multiplier capability, essentially a “processor” unit with the capabilities of a 68K or MIPS processor
Intended more as a DAG (data address generator), but can do integer math.
X and Y Register File
X – 32 locations
Y – 32 locations
Holds “bit patterns”
Those bit patterns can be “floating point number” bit patterns OR “integer number” bit patterns BUT NOT BOTH AT THE SAME TIME
10% of marks lost in final and midterm will be associated with not understanding this issue. (30% of time wasted in labs too)
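The “bit pattern” point can be illustrated off-target. The following is a minimal Python sketch, not TigerSHARC code; it assumes 32-bit IEEE-754 single precision, and the helper names float_to_bits and bits_to_float are made up for this illustration. It shows that a register-sized pattern has no type of its own: the instruction applied to it decides whether it is read as a float or as an integer.

```python
import struct

def float_to_bits(value):
    """Return the 32-bit pattern that encodes `value` as an IEEE-754 single."""
    return struct.unpack("<I", struct.pack("<f", value))[0]

def bits_to_float(bits):
    """Interpret a 32-bit pattern as an IEEE-754 single."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

# The pattern that means 1.0 as a float ...
pattern = float_to_bits(1.0)
print(hex(pattern))   # 0x3f800000
# ... means something quite different when read as an integer.
print(pattern)        # 1065353216

# Applying an INTEGER add to two FLOAT patterns sums the raw bits,
# so the result is not the float 2.0.
integer_sum = (pattern + pattern) & 0xFFFFFFFF
print(bits_to_float(integer_sum) == 2.0)   # False
```

The same mistake on the real processor would be using an integer instruction on registers that currently hold floating point patterns, or the reverse.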
X and Y ALU
Can handle floating point and integer operations by taking “bit patterns” from the register file and operating on them
Very flexible
10% of marks lost in final and midterm will be associated with not understanding this functionality. (30% of time wasted in labs too)
SHIFTER
Can handle integer operations by taking “bit patterns” from the register file and operating on them
Very flexible
10% of marks lost in final and midterm will be associated with not understanding this functionality.
MULTIPLIER
Multiplies integer and floating point “bit patterns” from the register file
Very flexible
10% of marks lost in final and midterm will be associated with not understanding this functionality. (15% of time wasted in labs too)
CLU
VERY, VERY FANCY
COMPLEX ARITHMETIC (2, 8 and 16 bits)
TRELLIS, VITERBI etc. capability
Excellent individual projects, also Q9 on final exam (D-I-Yourself)
J and K DATA BUSES
VERY FANCY
32-bit accesses
64-bit accesses
128-bit accesses
256-bit loads possible (128 to 4 X registers and 128 to 4 Y registers)
Some special issues when loading QUAD values (4 at the same time) that are offset; these are handled with the DAB (data alignment buffer)
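To see why an offset QUAD load is a special case, here is a small Python sketch. It is an illustration only and rests on two assumptions not stated on the slide: memory is addressed in 32-bit words, and a quad access is “aligned” when its word address is a multiple of 4. The function names are hypothetical. An offset quad straddles two aligned quads, which is the situation the DAB is there to handle.

```python
WORDS_PER_QUAD = 4  # assumption: a quad is four 32-bit words

def is_quad_aligned(word_address):
    """True when the address sits on a 4-word boundary."""
    return word_address % WORDS_PER_QUAD == 0

def aligned_quads_touched(word_address):
    """Start addresses of the aligned quads that a 4-word read
    beginning at `word_address` has to fetch from."""
    first = word_address - word_address % WORDS_PER_QUAD
    if is_quad_aligned(word_address):
        return [first]
    return [first, first + WORDS_PER_QUAD]

print(aligned_quads_touched(8))    # [8]      one aligned fetch is enough
print(aligned_quads_touched(10))   # [8, 12]  offset load needs two fetches
```

The sketch only counts which aligned blocks are involved; it does not model how the hardware buffers or merges them.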
I-BUS DATA BUS
VERY FANCY
VLIW – very long and variable length instruction word
32-bit accesses done with IAB
64-bit accesses done with IAB
96-bit accesses done with IAB
128-bit accesses done with IAB
BTB (Branch target buffer) assists with many pipeline issues
10% of marks lost in final and midterm will be associated with not understanding this functionality.
Pipeline issues -- Normal instruction
Terminology – C++ compiler often inserts comments about “bubbles”
Instr 2 needs the result from Instr 1
But the Instr 1 result is not available till the end of the pipeline, so Instr 2 stalls
Not entirely clear on the explanation; it seems 1 cycle out by my model
STALL
BUBBLE
Once the STALL is broken, then a BUBBLE (virtual NOP?) is inserted into the instruction stream
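The stall and bubble idea can be played with in a toy model. The Python sketch below is NOT the TigerSHARC pipeline: it assumes one instruction issues per cycle and that a result can be read `latency` cycles after its instruction issues. The default latency of 2 and the function names are assumptions picked for illustration. It only shows how a data dependency turns into empty issue slots.

```python
def issue_cycles(instructions, latency=2):
    """instructions: list of (dest, [sources]) in program order.
    Returns the cycle in which each instruction issues."""
    ready = {}    # register name -> first cycle its new value can be read
    cycles = []
    next_free = 0
    for dest, sources in instructions:
        # Wait for the issue slot and for every source register.
        start = max([ready.get(s, 0) for s in sources] + [next_free])
        cycles.append(start)
        ready[dest] = start + latency
        next_free = start + 1
    return cycles

def bubbles(cycles):
    """Count the empty issue slots between consecutive instructions."""
    return sum(b - a - 1 for a, b in zip(cycles, cycles[1:]))

dependent = [("R1", ["R2", "R3"]), ("R4", ["R5", "R1"])]    # Instr 2 needs R1
independent = [("R1", ["R2", "R3"]), ("R4", ["R5", "R6"])]

print(issue_cycles(dependent), bubbles(issue_cycles(dependent)))       # [0, 2] 1
print(issue_cycles(independent), bubbles(issue_cycles(independent)))   # [0, 1] 0
```

Changing `latency` changes the number of bubbles, which is one way to test a guess about a stall that seems “1 cycle out”.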
Many types of coding used in this course
Compiler “debug” mode – inefficient code
Compiler “release” mode – more efficient code (inter-procedural optimization, general parallel instructions).
– Use the .s output as a starting point for optimizing assembly code and for learning about instructions and optimizing techniques
Custom Assembler “SISD” “debug mode” – not highly optimized, but no general inefficiencies. Lab 1 and Lab 2. Generally quizzable.
Custom Assembler “SISD” and “SIMD” “release modes” – coded in a way that we “avoid” probable stalls, rather than completely understanding them. Lab 2 and Lab 3. Somewhat quizzable.
Custom Assembler “SISD”, “SIMD” and “MIMD” “highly optimized mode”. Understanding the concepts matters, but it is very difficult to put actual questions into a quiz (too time consuming). Probably demonstrable in final lab. Lab 4 – individual assignments. Need to know “what to worry about”
Dual processor mode – all of the above. Probably demonstrable in final lab. Lab 4 – individual assignments. Need to know “what to worry about”
Predicted jump – BTB hit
NO DELAYS
Predicted jump – BTB miss
BIG DELAYS
Non predicted branches XY – big loss
Non predicted branches JK – less loss
Pipeline issues – Predicted – not taken
Pipeline issues -- many
Predicted branch not taken
R1 = R2 + R3;;
R0 = [J1 += J5];;
R4 = R5 + R1;;
Conflict on J-bus
Data dependencies
Sort of acts like
R1 = R2 + R3;;
R0 = [J1];;
stall
J1 = J1 + J5;;
R4 = R5 + R1;;
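The “sort of acts like” claim can be checked with a small register-level model. This Python sketch is an illustration only: it treats R0 = [J1 += J5] as a load from the address in J1 followed by the update of J1, as the split form on the slide suggests, and it uses the same final add (R4 = R5 + R1) in both versions so the results can be compared. The function names, register values and memory contents are invented. It models results only, not timing, so the stall itself does not appear.

```python
def load_post_modify(regs, mem, dest, base, step):
    """One combined step: read memory at regs[base], then bump regs[base]."""
    regs[dest] = mem[regs[base]]
    regs[base] = regs[base] + regs[step]

def run_combined(start, mem):
    r = dict(start)
    r["R1"] = r["R2"] + r["R3"]
    load_post_modify(r, mem, "R0", "J1", "J5")   # R0 = [J1 += J5]
    r["R4"] = r["R5"] + r["R1"]
    return r

def run_split(start, mem):
    r = dict(start)
    r["R1"] = r["R2"] + r["R3"]
    r["R0"] = mem[r["J1"]]                       # R0 = [J1]
    r["J1"] = r["J1"] + r["J5"]                  # J1 = J1 + J5
    r["R4"] = r["R5"] + r["R1"]
    return r

start = {"R0": 0, "R1": 0, "R2": 2, "R3": 3, "R4": 0, "R5": 5, "J1": 100, "J5": 4}
mem = {100: 42, 104: 99}

print(run_combined(start, mem) == run_split(start, mem))   # True
print(run_combined(start, mem)["R0"], run_combined(start, mem)["J1"])   # 42 104
```

Both forms leave the registers in the same state; the difference on the real processor is in how many cycles they take.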
Concepts tackled
Introduction to the capabilities of the TigerSHARC ADSP-TS201 processor
– Warning: you have the TS201S instruction manual and the TS101S hardware manual. At the moment the TS201S hardware manual is only available on the web.