Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

Upload: octavia-cannon

Post on 28-Dec-2015


TRANSCRIPT

Page 1: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

Chapter 2: TMS320C6000 Architectural Overview

DSP Lecture 02

Page 2: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

Learning Objectives

• Describe the C6000 CPU architecture.
• Introduce some basic instructions.
• Describe the C6000 memory map.
• Provide an overview of the peripherals.
• Performance
• Software development

Page 3: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

General DSP System Block Diagram

[Block diagram: Central Processing Unit, Internal Memory, Internal Buses, External Memory, Peripherals]

Page 4: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

Implementation of Sum of Products (SOP)

It has been shown in Chapter 1 that SOP is the key element for most DSP algorithms.

So let's write the code for this algorithm and at the same time discover the C6000 architecture.

$Y = \sum_{n=1}^{N} a_n \cdot x_n = a_1 x_1 + a_2 x_2 + \dots + a_N x_N$

Two basic operations are required for this algorithm:

(1) Multiplication
(2) Addition

Therefore two basic instructions are required.
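As a reference point before moving to assembly, the sum of products can be sketched in a few lines of Python. This is only an illustrative model: the function name `sum_of_products` and the use of Python lists for the coefficients and samples are assumptions, not part of the lecture.

```python
def sum_of_products(a, x):
    """Return Y = a[0]*x[0] + a[1]*x[1] + ... for equal-length sequences."""
    if len(a) != len(x):
        raise ValueError("a and x must have the same length")
    y = 0
    for a_n, x_n in zip(a, x):
        # One multiplication and one addition per term, the two basic
        # operations the slide identifies.
        y += a_n * x_n
    return y
```

For example, `sum_of_products([1, 2, 3], [4, 5, 6])` evaluates 1*4 + 2*5 + 3*6.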

Page 5: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

Implementation of Sum of Products (SOP)

So let's implement the SOP algorithm!

The implementation in this module will be done in assembly.

Page 6: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

Multiply (MPY)

The multiplication of a1 by x1 is done in assembly by the following instruction:

        MPY   a1, x1, Y

This instruction is performed by a multiplier unit that is called ".M".

Page 7: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

Multiply (.M unit)

$Y = \sum_{n=1}^{40} a_n \cdot x_n$

The .M unit performs multiplications in hardware.

        MPY   .M   a1, x1, Y

Note: 16-bit by 16-bit multiplier provides a 32-bit result.
32-bit by 32-bit multiplier provides a 64-bit result.
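The note on result widths can be checked with a little arithmetic. The sketch below assumes signed two's-complement operands (the helper name `fits_signed` is hypothetical) and looks at the largest-magnitude product, which comes from multiplying the most negative value by itself.

```python
def fits_signed(value, bits):
    """True if value is representable as a signed two's-complement number of `bits` bits."""
    return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1

# Most negative signed 16-bit and 32-bit operands.
INT16_MIN = -(1 << 15)
INT32_MIN = -(1 << 31)

# Worst-case products: (-2^15)^2 = 2^30 and (-2^31)^2 = 2^62.
worst_16x16 = INT16_MIN * INT16_MIN
worst_32x32 = INT32_MIN * INT32_MIN
```

The worst-case 16-bit by 16-bit product needs 32 bits (31 are not enough), and the worst-case 32-bit by 32-bit product needs 64 bits.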

Page 8: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

Addition (.?)

        MPY   .M   a1, x1, prod
        ADD   .?   Y, prod, Y

[Diagram: the .M unit and a second, not yet named unit (.?)]

Page 9: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

Add (.L unit)

        MPY   .M   a1, x1, prod
        ADD   .L   Y, prod, Y

RISC processors such as the C6000 use registers to hold the operands, so let's change this code.

[Diagram: the .M and .L units]

Page 10: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

Register File - A

        MPY   .M   a1, x1, prod
        ADD   .L   Y, prod, Y

Let us correct this by replacing a1, x1, prod and Y by registers, as shown in the diagram.

[Diagram: Register File A (A0 to A15, 32 bits wide) holding a1, x1, prod and Y, connected to the .M and .L units]

Page 11: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

Specifying Register Names

        MPY   .M   A0, A1, A3
        ADD   .L   A4, A3, A4

The registers A0, A1, A3 and A4 contain the values to be used by the instructions.

[Diagram: Register File A (A0 to A15, 32 bits wide) holding a1, x1, prod and Y, connected to the .M and .L units]

Page 12: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

Specifying Register Names

        MPY   .M   A0, A1, A3
        ADD   .L   A4, A3, A4

Register File A contains 16 registers (A0 to A15), which are 32 bits wide.

Page 13: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

Data loading

Q: How do we load the operands into the registers?

[Diagram: Register File A (A0 to A15, 32 bits wide) holding a1, x1, prod and Y, connected to the .M and .L units]

Page 14: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

Load Unit ".D"

Q: How do we load the operands into the registers?

A: The operands are loaded into the registers from the memory using the .D unit.

[Diagram: the .D unit connects Register File A to Data Memory]

Page 15: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

Load Unit ".D"

It is worth noting at this stage that the only way to access memory is through the .D unit.

[Diagram: the .D unit connects Register File A to Data Memory]

Page 16: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

Load Instruction

Q: Which instruction(s) can be used for loading operands from the memory to the registers?

[Diagram: the .D unit connects Register File A to Data Memory]

Page 17: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

Load Instructions (LDB, LDH, LDW, LDDW)

Q: Which instruction(s) can be used for loading operands from the memory to the registers?

A: The load instructions.

[Diagram: the .D unit connects Register File A to Data Memory]

Page 18: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

Using the Load Instructions

Before using the load unit you have to be aware that this processor is byte addressable, which means that each byte is represented by a unique address.

Also, the addresses are 32 bits wide.

[Diagram: data memory drawn 16 bits wide, with addresses 00000000, 00000002, 00000004, 00000006, 00000008, ... up to FFFFFFFF]

Page 19: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

Using the Load Instructions

The syntax for the load instruction is:

        LD    *Rn, Rm

where Rn is a register that contains the address of the operand to be loaded and Rm is the destination register.

[Diagram: data memory drawn 16 bits wide, holding a1, x1, prod and Y]

Page 20: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

Using the Load Instructions

The syntax for the load instruction is:

        LD    *Rn, Rm

The question now is how many bytes are going to be loaded into the destination register?

Page 21: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

Using the Load Instructions

The syntax for the load instruction is:

        LD    *Rn, Rm

The answer is that it depends on the instruction you choose:

• LDB: loads one byte (8-bit)
• LDH: loads a half word (16-bit)
• LDW: loads a word (32-bit)
• LDDW: loads a double word (64-bit)

Note: LD on its own does not exist.

Page 22: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

Using the Load Instructions

The syntax for the load instruction is:

        LD    *Rn, Rm

Data memory (16 bits per row, byte 0 then byte 1):

Address     Data
00000000    0xB  0xA
00000002    0xD  0xC
00000004    0x1  0x2
00000006    0x3  0x4
00000008    0x5  0x6
0000000A    0x7  0x8
...
FFFFFFFF

Example:

If we assume that A5 = 0x4 then:

(1) LDB  *A5, A7     ; gives A7 = 0x00000001
(2) LDH  *A5, A7     ; gives A7 = 0x00000201
(3) LDW  *A5, A7     ; gives A7 = 0x04030201
(4) LDDW *A5, A7:A6  ; gives A7:A6 = 0x0807060504030201
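The four results in the example follow from byte addressing with the lowest-addressed byte placed in the least significant position. The sketch below models that in Python; the `memory` dictionary and the `load` helper are hypothetical names, and only the bytes at addresses 0x4 to 0xB (the ones the example reads) are filled in. The values here are all positive, so the sketch does not model any sign extension the real instructions may perform.

```python
# One entry per byte address, taken from the rows starting at address 0x4.
memory = {
    0x4: 0x01, 0x5: 0x02,
    0x6: 0x03, 0x7: 0x04,
    0x8: 0x05, 0x9: 0x06,
    0xA: 0x07, 0xB: 0x08,
}

def load(mem, address, num_bytes):
    """Assemble num_bytes consecutive bytes, lowest address in the lowest bits."""
    value = 0
    for offset in range(num_bytes):
        value |= mem[address + offset] << (8 * offset)
    return value

a5 = 0x4
ldb_result = load(memory, a5, 1)    # models LDB
ldh_result = load(memory, a5, 2)    # models LDH
ldw_result = load(memory, a5, 4)    # models LDW
lddw_result = load(memory, a5, 8)   # models LDDW (A7:A6 as one 64-bit value)
```

Each result matches the value listed for the corresponding instruction in the example.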

Page 23: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

Using the Load Instructions

Example, with A5 = 0x4 (step 1 highlighted):

(1) LDB  *A5, A7     ; gives A7 = 0x00000001

Page 24: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

Using the Load Instructions

Example, with A5 = 0x4 (step 2 highlighted):

(2) LDH  *A5, A7     ; gives A7 = 0x00000201

Page 25: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

Using the Load Instructions

Example, with A5 = 0x4 (step 3 highlighted):

(3) LDW  *A5, A7     ; gives A7 = 0x04030201

Page 26: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

Using the Load Instructions

Example, with A5 = 0x4 (step 4 highlighted):

(4) LDDW *A5, A7:A6  ; gives A7:A6 = 0x0807060504030201

Page 27: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

Using the Load Instructions

The syntax for the load instruction is:

        LD    *Rn, Rm

Question:

If data can only be accessed by the load instruction and the .D unit, how can we load the register pointer Rn in the first place?

Page 28: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

Loading the Pointer Rn

• The instruction MVKL will allow a move of a 16-bit constant into a register as shown below:

        MVKL  .?   a, A5

('a' is a constant or label)

• How many bits represent a full address?

32 bits

• So why does the instruction not allow a 32-bit move?

All instructions are 32 bits wide (see instruction opcode), so a 32-bit constant cannot fit in the instruction alongside the opcode and register fields.

Page 29: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

Loading the Pointer Rn

• To solve this problem another instruction is available: MVKH

e.g.    MVKH  .?   a, A5

('a' is a constant or label)

[Diagram: the constant a is split into an upper half ah and a lower half al; MVKH writes ah into the upper half of A5 and leaves the lower half (x) unchanged]

• Finally, to move the 32-bit address to a register we can use:

        MVKL  a, A5
        MVKH  a, A5

Page 30: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

Loading the Pointer Rn

• Always use MVKL then MVKH. Look at the following examples, both starting from A5 = 0x87654321:

Example 1

        MVKL  0x1234FABC, A5    ; A5 = 0xFFFFFABC (sign extension)
        MVKH  0x1234FABC, A5    ; A5 = 0x1234FABC ; OK

Example 2

        MVKH  0x1234FABC, A5    ; A5 = 0x12344321
        MVKL  0x1234FABC, A5    ; A5 = 0xFFFFFABC ; Wrong
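The ordering rule can be reproduced with a small Python model of the two instructions as the slides describe them: MVKL writes the sign-extended lower 16 bits of the constant, and MVKH replaces only the upper 16 bits of the register. The function names `mvkl` and `mvkh` are illustrative stand-ins, not a real API.

```python
def mvkl(constant):
    """Model MVKL: take the low 16 bits of the constant and sign-extend to 32 bits."""
    low = constant & 0xFFFF
    if low & 0x8000:              # bit 15 set, so the upper half becomes all ones
        return 0xFFFF0000 | low
    return low

def mvkh(constant, register):
    """Model MVKH: replace the upper 16 bits, keep the register's lower 16 bits."""
    return (constant & 0xFFFF0000) | (register & 0x0000FFFF)

START = 0x87654321
CONST = 0x1234FABC

# Example 1: MVKL then MVKH gives the full constant.
good = mvkh(CONST, mvkl(CONST))

# Example 2: MVKH then MVKL, the sign extension overwrites the upper half.
after_mvkh = mvkh(CONST, START)
bad = mvkl(CONST)
```

Because MVKL overwrites the whole register, running it second discards whatever MVKH wrote, which is why the slide insists on the MVKL, MVKH order.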

Page 31: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

LDH, MVKL and MVKH

        MVKL       pt1, A5
        MVKH       pt1, A5
        MVKL       pt2, A6
        MVKH       pt2, A6
        LDH   .D   *A5, A0
        LDH   .D   *A6, A1
        MPY   .M   A0, A1, A3
        ADD   .L   A4, A3, A4

pt1 and pt2 point to some locations in the data memory.

[Diagram: Register File A (A0 to A15, 32 bits wide) holding a, x, prod and Y, with the .M, .L and .D units and Data Memory]

Page 32: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

Creating a loop

        MVKL       pt1, A5
        MVKH       pt1, A5
        MVKL       pt2, A6
        MVKH       pt2, A6
        LDH   .D   *A5, A0
        LDH   .D   *A6, A1
        MPY   .M   A0, A1, A3
        ADD   .L   A4, A3, A4

So far we have implemented the SOP for one tap only, i.e.

Y = a1 * x1

So let's create a loop so that we can implement the SOP for N taps.

Page 33: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

Creating a loop

With the C6000 processors there are no dedicated instructions such as block repeat. The loop is created using the B instruction.

Page 34: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

What are the steps for creating a loop?

1. Create a label to branch to.
2. Add a branch instruction, B.
3. Create a loop counter.
4. Add an instruction to decrement the loop counter.
5. Make the branch conditional based on the value in the loop counter.

Page 35: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

1. Create a label to branch to

        MVKL       pt1, A5
        MVKH       pt1, A5
        MVKL       pt2, A6
        MVKH       pt2, A6
loop    LDH   .D   *A5, A0
        LDH   .D   *A6, A1
        MPY   .M   A0, A1, A3
        ADD   .L   A4, A3, A4

Page 36: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

2. Add a branch instruction, B.

        MVKL       pt1, A5
        MVKH       pt1, A5
        MVKL       pt2, A6
        MVKH       pt2, A6
loop    LDH   .D   *A5, A0
        LDH   .D   *A6, A1
        MPY   .M   A0, A1, A3
        ADD   .L   A4, A3, A4
        B     .?   loop

Page 37: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

Which unit is used by the B instruction?

        MVKL       pt1, A5
        MVKH       pt1, A5
        MVKL       pt2, A6
        MVKH       pt2, A6
loop    LDH   .D   *A5, A0
        LDH   .D   *A6, A1
        MPY   .M   A0, A1, A3
        ADD   .L   A4, A3, A4
        B     .?   loop

[Diagram: Register File A with the .M, .L and .D units and Data Memory; a new .S unit is added]

Page 38: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

Which unit is used by the B instruction?

        MVKL  .S   pt1, A5
        MVKH  .S   pt1, A5
        MVKL  .S   pt2, A6
        MVKH  .S   pt2, A6
loop    LDH   .D   *A5, A0
        LDH   .D   *A6, A1
        MPY   .M   A0, A1, A3
        ADD   .L   A4, A3, A4
        B     .S   loop

[Diagram: Register File A with the .S, .M, .L and .D units and Data Memory]

Page 39: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

3. Create a loop counter.

        MVKL  .S   pt1, A5
        MVKH  .S   pt1, A5
        MVKL  .S   pt2, A6
        MVKH  .S   pt2, A6
        MVKL  .S   count, B0
loop    LDH   .D   *A5, A0
        LDH   .D   *A6, A1
        MPY   .M   A0, A1, A3
        ADD   .L   A4, A3, A4
        B     .S   loop

B registers will be introduced later.

[Diagram: Register File A with the .S, .M, .L and .D units and Data Memory]

Page 40: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

4. Decrement the loop counter

        MVKL  .S   pt1, A5
        MVKH  .S   pt1, A5
        MVKL  .S   pt2, A6
        MVKH  .S   pt2, A6
        MVKL  .S   count, B0
loop    LDH   .D   *A5, A0
        LDH   .D   *A6, A1
        MPY   .M   A0, A1, A3
        ADD   .L   A4, A3, A4
        SUB   .S   B0, 1, B0
        B     .S   loop

[Diagram: Register File A with the .S, .M, .L and .D units and Data Memory]

Page 41: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

5. Make the branch conditional based on the value in the loop counter

• What is the syntax for making an instruction conditional?

[condition] Instruction Label

e.g.

[B1]    B     loop

(1) The condition can be one of the following registers: A1, A2, B0, B1, B2.
(2) Any instruction can be conditional.

Page 42: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

5. Make the branch conditional based on the value in the loop counter

• The condition can be inverted by adding the exclamation symbol "!" as follows:

[!condition] Instruction Label

e.g.

[!B0]   B     loop    ; branch if B0 = 0
[B0]    B     loop    ; branch if B0 != 0
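The effect of `[B0]` and `[!B0]` on a counted loop can be sketched in Python. The helpers `should_execute` and `count_loop_passes` are hypothetical names; the loop models only the decrement and the conditional branch, and assumes the counter starts at 1 or more.

```python
def should_execute(register_value, inverted=False):
    """[reg] executes when reg != 0; [!reg] executes when reg == 0."""
    if inverted:
        return register_value == 0
    return register_value != 0

def count_loop_passes(count):
    """Count how many times the loop body runs for a given initial B0."""
    if count < 1:
        raise ValueError("count must be at least 1")
    b0 = count
    passes = 0
    while True:
        passes += 1                  # loop body runs once per pass
        b0 -= 1                      # models SUB B0, 1, B0
        if not should_execute(b0):   # models [B0] B loop
            break
    return passes
```

With the counter decremented before the conditional branch, the body runs exactly `count` times.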

Page 43: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

5. Make the branch conditional

        MVKL  .S2  pt1, A5
        MVKH  .S2  pt1, A5
        MVKL  .S2  pt2, A6
        MVKH  .S2  pt2, A6
        MVKL  .S2  count, B0
loop    LDH   .D   *A5, A0
        LDH   .D   *A6, A1
        MPY   .M   A0, A1, A3
        ADD   .L   A4, A3, A4
        SUB   .S   B0, 1, B0
[B0]    B     .S   loop

[Diagram: Register File A with the .S, .M, .L and .D units and Data Memory]

Page 44: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

More on the Branch Instruction (1)

With this processor all the instructions are encoded in 32 bits.

Therefore the label must have a dynamic range of less than 32 bits, as the instruction B itself has to be coded.

Case 1: B .S1 label    Relative branch.
        Label limited to +/- 2^20 offset.

[Instruction format, 32 bits: B opcode | 21-bit relative address]
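The +/- 2^20 limit is what a 21-bit signed field can represent. A minimal check, assuming a two's-complement field and leaving the unit of the offset unspecified (the slide does not state it), could look like this; `fits_relative_branch` is a hypothetical helper name.

```python
def fits_relative_branch(offset, field_bits=21):
    """True if offset fits in a signed two's-complement field of field_bits bits."""
    limit = 1 << (field_bits - 1)    # 2^20 for a 21-bit field
    return -limit <= offset < limit
```

Targets outside this range need the register form of the branch.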

Page 45: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

More on the Branch Instruction (2)

By specifying a register as an operand instead of a label, it is possible to have an absolute branch.

This will allow a dynamic range of 2^32.

Case 2: B .S2 register    Absolute branch.
        Operates on .S2 ONLY!

[Instruction format, 32 bits: B opcode | 5-bit register code]

Page 46: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

Testing the code

        MVKL  .S2  pt1, A5
        MVKH  .S2  pt1, A5
        MVKL  .S2  pt2, A6
        MVKH  .S2  pt2, A6
        MVKL  .S2  count, B0
loop    LDH   .D   *A5, A0
        LDH   .D   *A6, A1
        MPY   .M   A0, A1, A3
        ADD   .L   A4, A3, A4
        SUB   .S   B0, 1, B0
[B0]    B     .S   loop

This code performs the following operations:

a0*x0 + a0*x0 + a0*x0 + ... + a0*x0

However, we would like to perform:

a0*x0 + a1*x1 + a2*x2 + ... + aN*xN

Page 47: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

Modifying the pointers

The solution is to modify the pointers A5 and A6.

        MVKL  .S2  pt1, A5
        MVKH  .S2  pt1, A5
        MVKL  .S2  pt2, A6
        MVKH  .S2  pt2, A6
        MVKL  .S2  count, B0
loop    LDH   .D   *A5, A0
        LDH   .D   *A6, A1
        MPY   .M   A0, A1, A3
        ADD   .L   A4, A3, A4
        SUB   .S   B0, 1, B0
[B0]    B     .S   loop

Page 48: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

Indexing Pointers

Description       Syntax        Pointer modified
Pointer           *R            No

R can be any register.

In this case the pointers are used but not modified.

Page 49: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

Indexing Pointers

Description       Syntax        Pointer modified
Pointer           *R            No
+ Pre-offset      *+R[disp]     No
- Pre-offset      *-R[disp]     No

[disp] specifies the number of elements; the element size is DW (64-bit), W (32-bit), H (16-bit), or B (8-bit).

disp = R or 5-bit constant. R can be any register.

In this case the pointers are modified BEFORE being used and RESTORED to their previous values.

Page 50: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

Indexing Pointers

Description       Syntax        Pointer modified
Pointer           *R            No
+ Pre-offset      *+R[disp]     No
- Pre-offset      *-R[disp]     No
Pre-increment     *++R[disp]    Yes
Pre-decrement     *--R[disp]    Yes

In this case the pointers are modified BEFORE being used and NOT RESTORED to their previous values.

Page 51: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

Indexing Pointers

Description       Syntax        Pointer modified
Pointer           *R            No
+ Pre-offset      *+R[disp]     No
- Pre-offset      *-R[disp]     No
Pre-increment     *++R[disp]    Yes
Pre-decrement     *--R[disp]    Yes
Post-increment    *R++[disp]    Yes
Post-decrement    *R--[disp]    Yes

In this case the pointers are modified AFTER being used and NOT RESTORED to their previous values.

Page 52: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

Indexing Pointers

Description       Syntax        Pointer modified
Pointer           *R            No
+ Pre-offset      *+R[disp]     No
- Pre-offset      *-R[disp]     No
Pre-increment     *++R[disp]    Yes
Pre-decrement     *--R[disp]    Yes
Post-increment    *R++[disp]    Yes
Post-decrement    *R--[disp]    Yes

[disp] specifies the number of elements; the element size is DW, W, H, or B.
disp = R or 5-bit constant. R can be any register.
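The seven addressing forms can be summarised as a function that returns the address actually accessed and the pointer value left behind. This is a sketch under assumptions: the function name `address_mode`, the string keys, and the default of one element when no displacement is written are all illustrative choices, and the displacement is scaled by the element size as the note describes.

```python
ELEMENT_SIZE = {"B": 1, "H": 2, "W": 4, "DW": 8}   # bytes per element

def address_mode(mode, pointer, disp=1, size="H"):
    """Return (address_used, pointer_after) for one of the seven syntaxes."""
    step = disp * ELEMENT_SIZE[size]     # displacement counted in elements
    table = {
        "*R":         (pointer,        pointer),         # not modified
        "*+R[disp]":  (pointer + step, pointer),         # offset, pointer kept
        "*-R[disp]":  (pointer - step, pointer),
        "*++R[disp]": (pointer + step, pointer + step),  # modified before use
        "*--R[disp]": (pointer - step, pointer - step),
        "*R++[disp]": (pointer,        pointer + step),  # modified after use
        "*R--[disp]": (pointer,        pointer - step),
    }
    return table[mode]
```

For instance, a half-word post-increment starting at address 0x100 accesses 0x100 and leaves the pointer at 0x102, which is the form the SOP loop needs to step through its arrays.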

Page 53: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

Modify and testing the code

        MVKL  .S2  pt1, A5
        MVKH  .S2  pt1, A5
        MVKL  .S2  pt2, A6
        MVKH  .S2  pt2, A6
        MVKL  .S2  count, B0
loop    LDH   .D   *A5++, A0
        LDH   .D   *A6++, A1
        MPY   .M   A0, A1, A3
        ADD   .L   A4, A3, A4
        SUB   .S   B0, 1, B0
[B0]    B     .S   loop

This code now performs the following operations:

a0*x0 + a1*x1 + a2*x2 + ... + aN*xN

Page 54: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

Store the final result

        MVKL  .S2  pt1, A5
        MVKH  .S2  pt1, A5
        MVKL  .S2  pt2, A6
        MVKH  .S2  pt2, A6
        MVKL  .S2  count, B0
loop    LDH   .D   *A5++, A0
        LDH   .D   *A6++, A1
        MPY   .M   A0, A1, A3
        ADD   .L   A4, A3, A4
        SUB   .S   B0, 1, B0
[B0]    B     .S   loop
        STH   .D   A4, *A7

This code now performs the following operations:

a0*x0 + a1*x1 + a2*x2 + ... + aN*xN

Page 55: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

Store the final result

The pointer A7 has not been initialized.

        MVKL  .S2  pt1, A5
        MVKH  .S2  pt1, A5
        MVKL  .S2  pt2, A6
        MVKH  .S2  pt2, A6
        MVKL  .S2  count, B0
loop    LDH   .D   *A5++, A0
        LDH   .D   *A6++, A1
        MPY   .M   A0, A1, A3
        ADD   .L   A4, A3, A4
        SUB   .S   B0, 1, B0
[B0]    B     .S   loop
        STH   .D   A4, *A7

Page 56: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

56

Store the final result

The pointer A7 is now initialized.

        MVKL  .S2  pt1, A5
        MVKH  .S2  pt1, A5
        MVKL  .S2  pt2, A6
        MVKH  .S2  pt2, A6
        MVKL  .S2  pt3, A7
        MVKH  .S2  pt3, A7
        MVKL  .S2  count, B0
loop    LDH   .D   *A5++, A0
        LDH   .D   *A6++, A1
        MPY   .M   A0, A1, A3
        ADD   .L   A4, A3, A4
        SUB   .S   B0, 1, B0
 [B0]   B     .S   loop
        STH   .D   A4, *A7

Page 57: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

57

What is the initial value of A4?

A4 is used as an accumulator, so it needs to be reset to zero.

        MVKL  .S2  pt1, A5
        MVKH  .S2  pt1, A5
        MVKL  .S2  pt2, A6
        MVKH  .S2  pt2, A6
        MVKL  .S2  pt3, A7
        MVKH  .S2  pt3, A7
        MVKL  .S2  count, B0
        ZERO  .L   A4
loop    LDH   .D   *A5++, A0
        LDH   .D   *A6++, A1
        MPY   .M   A0, A1, A3
        ADD   .L   A4, A3, A4
        SUB   .S   B0, 1, B0
 [B0]   B     .S   loop
        STH   .D   A4, *A7
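The finished loop can be cross-checked against a plain C model. The sketch below is only an illustration: the function name `sum_of_products` and its parameters are hypothetical, it assumes `count` is at least 1, and it ignores the load and branch delay slots of the real pipeline.

```c
#include <stddef.h>
#include <stdint.h>

/* C model of the assembly loop (illustrative names, not from the slides).
 * a, x : 16-bit input arrays (the data behind pointers A5 and A6)
 * y    : 16-bit result location (the data behind pointer A7) */
void sum_of_products(const int16_t *a, const int16_t *x, size_t count,
                     int16_t *y)
{
    uint32_t acc = 0;                        /* ZERO A4                    */
    while (count != 0) {                     /* [B0] B loop                */
        int32_t coeff  = *a++;               /* LDH *A5++, A0              */
        int32_t sample = *x++;               /* LDH *A6++, A1              */
        int32_t prod   = coeff * sample;     /* MPY A0, A1, A3             */
        acc += (uint32_t)prod;               /* ADD A4, A3, A4 (wraps)     */
        count--;                             /* SUB B0, 1, B0              */
    }
    *y = (int16_t)(uint16_t)(acc & 0xFFFFu); /* STH stores the low 16 bits */
}
```

Note that STH keeps only the low half of the 32-bit accumulator, so a sum that does not fit in 16 bits is truncated; the later slides switch to STW for that reason.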

Page 58: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

58

How can we add more processing power to this processor?

[Figure: functional units .S1, .M1, .L1 and .D1 connected to Register File A (A0 to A15, 32 bits wide) and to data memory]

Increasing the processing power!

Page 59: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

59

Increasing the processing power!

(1) Increase the clock frequency.

(2) Increase the number of processing units.

[Figure: functional units .S1, .M1, .L1 and .D1 connected to Register File A (A0 to A15, 32 bits wide) and to data memory]

Page 60: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

60

To increase the processing power, this processor has two sides (A and B, or 1 and 2).

[Figure: data memory connected to two sides. Side 1: units .S1, .M1, .L1, .D1 with Register File A (A0 to A15, 32 bits wide). Side 2: units .S2, .M2, .L2, .D2 with Register File B (B0 to B15, 32 bits wide)]

Page 61: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

61

Can the two sides exchange operands in order to increase performance?

[Figure: data memory connected to two sides. Side 1: units .S1, .M1, .L1, .D1 with Register File A (A0 to A15, 32 bits wide). Side 2: units .S2, .M2, .L2, .D2 with Register File B (B0 to B15, 32 bits wide)]

Page 62: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

62

The answer is YES, but there are limitations.

To exchange operands between the two sides, some cross paths or links are required.

What is a cross path?

A cross path links one side of the CPU to the other. There are two types of cross paths:

• Data cross paths.
• Address cross paths.

Page 63: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

63

Data Cross Paths

Data cross paths can also be referred to as register file cross paths.

These cross paths allow operands from one side to be used by the other side.

There are only two cross paths:

• One path, 1X, conveys data from side B to side A.
• One path, 2X, conveys data from side A to side B.

Page 64: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

64

TMS320C67x Data-Path

Page 65: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

65

Data Cross Paths

Data cross paths only apply to the .L, .S and .M units.

The data cross paths are very useful; however, there are some limitations in their use.

Page 66: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

66

Data Cross Path Limitations

[Figure: register files A and B linked by cross paths 1X and 2X; the side-A units .L1, .M1 and .S1 take two <src> operands and write one <dst>]

(1) The destination register must be on the same side as the unit.

(2) Source registers: up to one cross path per execute packet per side.

Execute packet: a group of instructions that execute simultaneously.

Page 67: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

67

Data Cross Path Limitations

[Figure: register files A and B linked by cross paths 1X and 2X; the side-A units .L1, .M1 and .S1 take two <src> operands and write one <dst>]

e.g.:
        ADD  .L1x  A0,A1,B2
        MPY  .M1x  A0,B6,A9
        SUB  .S1x  A8,B2,A8
   ||   ADD  .L1x  A0,B0,A2

|| means that the SUB and ADD belong to the same execute packet and therefore execute simultaneously.

Page 68: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

68

Data Cross Path Limitations

[Figure: register files A and B linked by cross paths 1X and 2X; the side-A units .L1, .M1 and .S1 take two <src> operands and write one <dst>]

e.g.:
        ADD  .L1x  A0,A1,B2
        MPY  .M1x  A0,B6,A9
        SUB  .S1x  A8,B2,A8
   ||   ADD  .L1x  A0,B0,A2

NOT VALID!
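The two data cross path rules can be turned into a small checker. The following is a minimal sketch, assuming the rules are applied literally as stated on the slides; the types `Side` and `Instr` and the function `execute_packet_is_valid` are hypothetical names, not part of any TI tool. The transcript does not preserve which instructions the slide marks as invalid, so the verdicts come from the rules alone.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { SIDE_A = 0, SIDE_B = 1 } Side;

/* One instruction, reduced to the sides involved (illustrative model). */
typedef struct {
    Side unit;  /* side of the functional unit: .L1/.S1/.M1 -> SIDE_A */
    Side src1;  /* register file of the first source operand          */
    Side src2;  /* register file of the second source operand         */
    Side dst;   /* register file of the destination register          */
} Instr;

/* Returns true if the instructions, issued together as one execute
 * packet, respect both data cross path rules. */
bool execute_packet_is_valid(const Instr *packet, size_t n)
{
    int cross_reads[2] = {0, 0};  /* cross path operands used per side */

    for (size_t i = 0; i < n; i++) {
        const Instr *in = &packet[i];
        /* Rule 1: destination must be on the same side as the unit. */
        if (in->dst != in->unit)
            return false;
        /* Count source operands fetched from the opposite register file. */
        cross_reads[in->unit] += (in->src1 != in->unit);
        cross_reads[in->unit] += (in->src2 != in->unit);
    }
    /* Rule 2: at most one cross path per execute packet per side. */
    return cross_reads[SIDE_A] <= 1 && cross_reads[SIDE_B] <= 1;
}
```

Under this model `ADD .L1x A0,A1,B2` fails rule 1, `MPY .M1x A0,B6,A9` passes, and the parallel pair `SUB .S1x A8,B2,A8 || ADD .L1x A0,B0,A2` fails rule 2 because both instructions need the 1X path in the same execute packet.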

Page 69: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

69

Data Cross Paths for both sides

[Figure: register files A and B linked by cross paths 1X and 2X. The side-A units .L1, .M1 and .S1 and the side-B units .L2, .M2 and .S2 each take two <src> operands and write one <dst>]

Page 70: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

70

Address cross paths

[Figure: unit .D1 takes its address (Addr) from register file A and transfers Data]

        LDW.D1T1  *A0,A5
        STW.D1T1  A5,*A0

(1) The pointer must be on the same side as the unit.

Page 71: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

71

Load or store to either side

[Figure: unit .D1 uses pointer *A0 from register file A; Data1 goes to A5 in register file A and Data2 goes to B5 in register file B]

DA1 = T1
DA2 = T2

        LDW.D1T1  *A0,A5
        LDW.D1T2  *A0,B5

Page 72: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

72

Standard Parallel Loads

[Figure: unit .D1 uses pointer *A0 and loads A5 in register file A; unit .D2 uses pointer *B0 and loads B5 in register file B]

        LDW.D1T1  *A0,A5
   ||   LDW.D2T2  *B0,B5

DA1 = T1
DA2 = T2

Page 73: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

73

Parallel Load/Store using address cross paths

[Figure: unit .D1 uses pointer *A0 and loads B5 in register file B; unit .D2 uses pointer *B0 and stores A5 from register file A]

        LDW.D1T2  *A0,B5
   ||   STW.D2T1  A5,*B0

DA1 = T1
DA2 = T2

Page 74: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

74

Fill the blanks ... Does this work?

[Figure: unit .D1 with pointer *A0 and unit .D2 with pointer *B0, register files A and B]

        LDW.D1__  *A0,B5
   ||   STW.D2__  B6,*B0

DA1 = T1
DA2 = T2

Page 75: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

75

Not Allowed! Parallel accesses: both cross or neither cross

[Figure: unit .D1 with pointer *A0 and unit .D2 with pointer *B0; both accesses transfer data to or from register file B (B5 and B6)]

        LDW.D1T2  *A0,B5
   ||   STW.D2T2  B6,*B0

DA2 = T2
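The address cross path rules from the last few slides can be expressed as a short check. This is a sketch under the assumption that a load or store is fully described by three sides (unit, pointer, data register); `MemAccess` and both function names are hypothetical.

```c
#include <stdbool.h>

typedef enum { SIDE_A = 0, SIDE_B = 1 } Side;

/* One load or store, reduced to the sides involved (illustrative model). */
typedef struct {
    Side unit;     /* .D1 -> SIDE_A, .D2 -> SIDE_B                  */
    Side pointer;  /* register file that holds the address register */
    Side data;     /* register file of the data: T1 -> A, T2 -> B   */
} MemAccess;

/* Rule: the pointer must be on the same side as the .D unit. */
bool access_is_valid(MemAccess m)
{
    return m.pointer == m.unit;
}

/* Rule for two accesses in one execute packet:
 * both use the cross path or neither does. */
bool parallel_accesses_are_valid(MemAccess first, MemAccess second)
{
    if (!access_is_valid(first) || !access_is_valid(second))
        return false;
    if (first.unit == second.unit)       /* one .D unit, one access */
        return false;
    bool first_crosses  = (first.data  != first.unit);
    bool second_crosses = (second.data != second.unit);
    return first_crosses == second_crosses;
}
```

With this model, `LDW.D1T1 *A0,A5 || LDW.D2T2 *B0,B5` (neither crosses) and `LDW.D1T2 *A0,B5 || STW.D2T1 A5,*B0` (both cross) are accepted, while `LDW.D1T2 *A0,B5 || STW.D2T2 B6,*B0` is rejected because both accesses would need the T2 data path.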

Page 76: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

76

Conditions Don’t Use Cross Paths

• If a conditional register comes from the opposite side, it does NOT use a data or address cross-path.

• Examples:

  [B2]  ADD  .L1  A2,A0,A4
  [A1]  LDW  .D2  *B0,B5

Page 77: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

77

‘C62x Data-Path Summary

CPU Reference Guide: Full CPU Datapath (Pg 2-2)

Page 78: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

78

‘C67x Data-Path Summary

Page 79: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

79

Cross Paths - Summary

• Data
  – Destination register on same side as unit.
  – Source registers: up to one cross path per execute packet per side.
  – Use “x” to indicate cross-path.

• Address
  – Pointer must be on same side as unit.
  – Data can be transferred to/from either side.
  – Parallel accesses: both cross or neither cross.

• Conditionals don’t use cross paths.

Page 80: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

80

Code Review (using side A only)

$Y = \sum_{n=1}^{40} a_n * x_n$

        MVK   .S1   40, A2       ; A2 = 40, loop count
loop:   LDH   .D1   *A5++, A0    ; A0 = a(n)
        LDH   .D1   *A6++, A1    ; A1 = x(n)
        MPY   .M1   A0, A1, A3   ; A3 = a(n) * x(n)
        ADD   .L1   A3, A4, A4   ; Y = Y + A3
        SUB   .L1   A2, 1, A2    ; decrement loop count
 [A2]   B     .S1   loop         ; if A2 != 0, branch
        STH   .D1   A4, *A7      ; *A7 = Y

Note: Assume that A4 was previously cleared and the pointers are initialised.

Page 81: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

104

Dual Resources: Twice as Nice

[Figure: Register File A (A0 to A15, or A31) with units .M1, .L1, .S1, .D1 and Register File B (B0 to B15, or B31) with units .M2, .L2, .S2, .D2, each 32 bits wide. The side-A registers are labelled cn, xn, prd, sum, cnt and the pointers *c, *x, *y]

Our final view of the sum of products example...

Page 82: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

105

Optional - Resource Specific Coding

$y = \sum_{n=1}^{40} c_n * x_n$

        MVK   .S1   40, A2
loop:   LDH   .D1   *A5++, A0
        LDH   .D1   *A6++, A1
        MPY   .M1   A0, A1, A3
        ADD   .L1   A4, A3, A4
        SUB   .S1   A2, 1, A2
 [A2]   B     .S1   loop
        STW   .D1   A4, *A7

[Figure: Register File A (A0 to A15, or A31, 32 bits wide) with units .M1, .L1, .S1, .D1; the registers are labelled cn, xn, prd, sum, cnt and the pointers *c, *x, *y]

It’s easier to use symbols rather than register names, but you can use either method.

Page 83: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

106

TMS320C6000 Instruction Set

Page 84: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

107

'C6000 System Block Diagram

[Figure: the CPU, with units .D1, .M1, .L1, .S1 on Register Set A and units .D2, .M2, .L2, .S2 on Register Set B, connected through the internal buses to on-chip memory, the peripherals and external memory]

To summarize each unit’s instructions ...

Page 85: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

108

‘C62x RISC-like instruction set

.S Unit: ADD, ADDK, ADD2, AND, B, CLR, EXT, MV, MVC, MVK, MVKH, NEG, NOT, OR, SET, SHL, SHR, SSHL, SUB, SUB2, XOR, ZERO

.L Unit: ABS, ADD, AND, CMPEQ, CMPGT, CMPLT, LMBD, MV, NEG, NORM, NOT, OR, SADD, SAT, SSUB, SUB, SUBC, XOR, ZERO

.M Unit: MPY, MPYH, MPYLH, MPYHL, SMPY, SMPYH

.D Unit: ADD, ADDAB (B/H/W), LDB (B/H/W), MV, NEG, STB (B/H/W), SUB, SUBAB (B/H/W), ZERO

No Unit Used: IDLE, NOP

Page 86: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

109

'C62x RISC-Like Instruction Set (by category)

Arithmetic: ABS, ADD, ADDA, ADDK, ADD2, MPY, MPYH, NEG, SMPY, SMPYH, SADD, SAT, SSUB, SUB, SUBA, SUBC, SUB2, ZERO

Logical: AND, CMPEQ, CMPGT, CMPLT, NOT, OR, SHL, SHR, SSHL, XOR

Data Mgmt: LDB/H/W, MV, MVC, MVK, MVKL, MVKH, MVKLH, STB/H/W

Bit Mgmt: CLR, EXT, LMBD, NORM, SET

Program Ctrl: B, IDLE, NOP

Note: Refer to the 'C6000 CPU Reference Guide for more details.

Page 87: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

110

‘C67x: Superset of Fixed-Point (by units)

.S Unit: ADD, ADDK, ADD2, AND, B, CLR, EXT, MV, MVC, MVK, MVKH, NEG, NOT, OR, SET, SHL, SHR, SSHL, SUB, SUB2, XOR, ZERO
.S Unit, floating-point additions: ABSSP, ABSDP, CMPGTSP, CMPEQSP, CMPLTSP, CMPGTDP, CMPEQDP, CMPLTDP, RCPSP, RCPDP, RSQRSP, RSQRDP, SPDP

.L Unit: ABS, ADD, AND, CMPEQ, CMPGT, CMPLT, LMBD, MV, NEG, NORM, NOT, OR, SADD, SAT, SSUB, SUB, SUBC, XOR, ZERO
.L Unit, floating-point additions: ADDSP, ADDDP, SUBSP, SUBDP, INTSP, INTDP, SPINT, DPINT, SPTRUNC, DPTRUNC, DPSP

.M Unit: MPY, MPYH, MPYLH, MPYHL, SMPY, SMPYH
.M Unit, additions: MPYSP, MPYDP, MPYI, MPYID

.D Unit: ADD, ADDAB (B/H/W), LDB (B/H/W), LDDW, MV, NEG, STB (B/H/W), SUB, SUBAB (B/H/W), ZERO

No Unit Required: IDLE, NOP

Page 88: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

111

'C64x: Superset of ‘C62x Instruction Set

.L Unit
  Data Pack/Un: PACK2, PACKH2, PACKLH2, PACKHL2, PACKH4, PACKL4, UNPKHU4, UNPKLU4, SWAP2/4
  Dual/Quad Arith: ABS2, ADD2, ADD4, MAX, MIN, SUB2, SUB4, SUBABS4
  Bitwise Logical: ANDN
  Shift & Merge: SHLMB, SHRMB
  Load Constant: MVK (5-bit)

.M Unit
  Bit Operations: BITC4, BITR, DEAL, SHFL
  Move: MVD
  Average: AVG2, AVG4
  Shifts: ROTL, SSHVL, SSHVR
  Multiplies: MPYHI, MPYLI, MPYHIR, MPYLIR, MPY2, SMPY2, DOTP2, DOTPN2, DOTPRSU2, DOTPNRSU2, DOTPU4, DOTPSU4, GMPY4, XPND2/4

.D Unit
  Mem Access: LDDW, LDNW, LDNDW, STDW, STNW, STNDW
  Load Constant: MVK (5-bit)
  Dual Arithmetic: ADD2, SUB2
  Bitwise Logical: AND, ANDN, OR, XOR
  Address Calc.: ADDAD

.S Unit
  Data Pack/Un: PACK2, PACKH2, PACKLH2, PACKHL2, UNPKHU4, UNPKLU4, SWAP2, SPACK2, SPACKU4
  Dual/Quad Arith: SADD2, SADDUS2, SADD4
  Bitwise Logical: ANDN
  Shifts & Merge: SHR2, SHRU2, SHLMB, SHRMB
  Compares: CMPEQ2, CMPEQ4, CMPGT2, CMPGT4
  Branches/PC: BDEC, BPOS, BNOP, ADDKPC

Page 89: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

112

Sample C62x Compiler Benchmarks

TI C62x™ Compiler Performance Release 4.0: Execution time in µs @ 300 MHz, versus hand-coded assembly, based on cycle count

Algorithm | Used in | Asm cycles | Asm time (µs) | C cycles (Rel 4.0) | C time (µs) | % efficiency vs hand coded
Block Mean Square Error (MSE of a 20 column image matrix) | Motion compensation of image data | 348 | 1.16 | 402 | 1.34 | 87%
Codebook Search | CELP based voice coders | 977 | 3.26 | 961 | 3.20 | 100%
Vector Max (40 element input vector) | Search algorithms | 61 | 0.20 | 59 | 0.20 | 100%
All-zero FIR Filter (40 samples, 10 coefficients) | VSELP based voice coders | 238 | 0.79 | 280 | 0.93 | 85%
Minimum Error Search (table size = 2304) | Search algorithms | 1185 | 3.95 | 1318 | 4.39 | 90%
IIR Filter (16 coefficients) | Filter | 43 | 0.14 | 38 | 0.13 | 100%
IIR, cascaded biquads (10 cascaded biquads, Direct Form II) | Filter | 70 | 0.23 | 75 | 0.25 | 93%
MAC (two 40 sample vectors) | VSELP based voice coders | 61 | 0.20 | 58 | 0.19 | 100%
Vector Sum (two 44 sample vectors) | | 51 | 0.17 | 47 | 0.16 | 100%
MSE (MSE between two 256 element vectors) | Mean sq. error computation in vector quantizer | 279 | 0.93 | 274 | 0.91 | 100%

Completely natural C code (non ’C6000 specific). Code available at: http://www.ti.com/sc/c6000compiler
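The time and efficiency columns of the benchmark table follow from the cycle counts. The sketch below shows the arithmetic; the function names are hypothetical, and capping the efficiency at 100% when the compiled code is faster than the hand-coded assembly is an assumption inferred from the table rows, not something the slide states.

```c
/* Execution time in microseconds for a cycle count at a clock in MHz. */
double cycles_to_us(unsigned cycles, double clock_mhz)
{
    return cycles / clock_mhz;
}

/* Compiler efficiency relative to hand-coded assembly, in percent.
 * Assumption: values above 100 are reported as 100, as the table
 * appears to do for rows where the C version needs fewer cycles. */
double efficiency_percent(unsigned asm_cycles, unsigned c_cycles)
{
    double pct = 100.0 * asm_cycles / c_cycles;
    return pct > 100.0 ? 100.0 : pct;
}
```

For the first row, `cycles_to_us(348, 300.0)` gives about 1.16 µs and `efficiency_percent(348, 402)` gives about 86.6, which the table rounds to 87%.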

Page 90: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

113

Sample Imaging & Telecom Benchmarks

DSP & Image Processing Kernels | C62x cycles | C64x cycles | Cycle improvement C64:C62 | 720 MHz C64x vs 300 MHz C62x | Unit
Reed Solomon Decode: Syndrome Accumulation, (204,188,8) packet | 1680 | 470 | 3.5x | 8.4x | cycles/packet
Viterbi Decode (GSM) (16 states) | 38.25 | 14* | 2.7x | 6.5x | cycles/output
FFT - Radix 4 - Complex (size = N log (N)) (16-bit) | 12.7 | 6.0 | 2.1x | 5x | cycles/data
Polyphase Filter - Image Scaling (8-bit) | 0.77 | 0.33 | 2.3x | 5.5x | cycles/output/filter tap
Correlation - 3x3 (8-bit) | 4.5 | 1.28 | 3.5x | 8.4x | cycles/pixel
Median Filter - 3x3 (8-bit) | 9.0 | 2.1 | 4.3x | 10.3x | cycles/pixel
Motion Estimation - 8x8 MAD (8-bit) | 0.953 | 0.126 | 7.6x | 18.2x | cycles/pixel

* Includes traceback

Page 92: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

116

TMS320C6000 Memory

Page 93: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

117

Memory size per device

Devices | Internal | EMIFA | EMIFB
C6201, C6701, C6204, C6205 | P = 64 kB, D = 64 kB | 52M Bytes (32-bits wide) | N/A
C6202 | P = 256 kB, D = 128 kB | 52M Bytes (32-bits wide) | N/A
C6203 | P = 384 kB, D = 512 kB | 52M Bytes (32-bits wide) | N/A
C6211, C6711 | L1P = 4 kB, L1D = 4 kB, L2 = 64 kB | 128M Bytes (32-bits wide) | N/A
C6712 | L1P = 4 kB, L1D = 4 kB, L2 = 64 kB | 64M Bytes (16-bits wide) | N/A
C6713 | L1P = 4 kB, L1D = 4 kB, L2 = 256 kB | 128M Bytes (32-bits wide) | N/A
C6411, DM642 | L1P = 16 kB, L1D = 16 kB, L2 = 256 kB | 128M Bytes (32-bits wide) | N/A
C6414, C6415, C6416 | L1P = 16 kB, L1D = 16 kB, L2 = 1 MB | 256M Bytes (64-bits wide) | 64M Bytes (16-bits wide)

Page 94: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

118

Internal Memory Summary

Devices | Internal (L2) | External
C6211, C6711, C6713 | 64 kB | 512M (32-bit wide)
C6712 | 256 kB | 512M (16-bit wide)

Devices | Internal (L2) | External
C6414, C6415, C6416 | 1 MB | A: 1GB (64-bit), B: 256kB (16-bit)
DM642 | 256 kB | 1GB (64-bit)
C6411 | 256 kB | 256MB (32-bit)

LINK: TMS320C6000 DSP Generation

Page 95: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

Performance

Making use of Parallelism

Page 96: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

120

Given this simple loop …

$y = \sum_{n=1}^{40} c_n * x_n$

short mac(short *c, short *x, int count) {
  for (i=0; i < count; i++) {
    sum += c[i] * x[i];
  }
  …

        MVK   .S1   40, cnt
loop:
        LDH   .D1   *cp++, c
        LDH   .D1   *xp++, x
        MPY   .M1   c, x, prod
        ADD   .L1   y, prod, y
        SUB   .L1   cnt, 1, cnt
 [cnt]  B     .S1   loop
        STW   .D    y, *yp

[Figure: side-A registers named c, x, prod, y, cnt and pointers *cp, *xp, *yp, with units .M1, .L1, .S1, .D1]

How many of these instructions can we get in parallel?

Page 97: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

121

C62x Intense Parallelism

Given this C code:

short mac(short *c, short *x, int count) {
  for (i=0; i < count; i++) {
    sum += c[i] * x[i];
  }
  …

The C62x compiler can achieve two sum-of-products per cycle:

L2:     ; PIPED LOOP PROLOG
           LDW   .D1   *A4++,A3
||         LDW   .D2   *B6++,B7

           LDW   .D1   *A4++,A3
||         LDW   .D2   *B6++,B7

   [B0]    B     .S1   L3
||         LDW   .D1   *A4++,A3
||         LDW   .D2   *B6++,B7

   [B0]    B     .S1   L3
||         LDW   .D1   *A4++,A3
||         LDW   .D2   *B6++,B7

   [B0]    B     .S1   L3
||         LDW   .D1   *A4++,A3
||         LDW   .D2   *B6++,B7

           MPY   .M2   B7,A3,B4
||         MPYH  .M1   B7,A3,A5
|| [B0]    B     .S1   L3
||         LDW   .D1   *A4++,A3
||         LDW   .D2   *B6++,B7

           MPY   .M2   B7,A3,B4
||         MPYH  .M1   B7,A3,A5
|| [B0]    B     .S1   L3
||         LDW   .D1   *A4++,A3
||         LDW   .D2   *B6++,B7
;** -----------------------*
L3:     ; PIPED LOOP KERNEL
           ADD   .L2   B4,B5,B5
||         ADD   .L1   A5,A0,A0
||         MPY   .M2   B7,A3,B4
||         MPYH  .M1   B7,A3,A5
|| [B0]    B     .S1   L3
|| [B0]    SUB   .S2   B0,1,B0
||         LDW   .D1   *A4++,A3
||         LDW   .D2   *B6++,B7
;** -----------------------*

What about the ‘C67x?
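The kernel reaches two sum-of-products per cycle because each LDW fetches two 16-bit elements at once, and MPY and MPYH multiply the low and high halves in parallel into two separate accumulators. The following C model is a sketch of that idea only; `pack2`, `mpy_low`, `mpy_high` and `mac_packed` are hypothetical names, element 0 is assumed to sit in the low half of each word, and the sums are assumed to fit in 32 bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Interpret the low 16 bits of v as a signed 16-bit value. */
static int32_t as_s16(uint32_t v)
{
    v &= 0xFFFFu;
    return v >= 0x8000u ? (int32_t)v - 0x10000 : (int32_t)v;
}

/* Pack two 16-bit values into one 32-bit word (helper for building data). */
static uint32_t pack2(int16_t high, int16_t low)
{
    return ((uint32_t)(uint16_t)high << 16) | (uint16_t)low;
}

/* Model of MPY: signed product of the two low halves. */
static int32_t mpy_low(uint32_t a, uint32_t b)
{
    return as_s16(a) * as_s16(b);
}

/* Model of MPYH: signed product of the two high halves. */
static int32_t mpy_high(uint32_t a, uint32_t b)
{
    return as_s16(a >> 16) * as_s16(b >> 16);
}

/* Two multiply-accumulates per loaded word pair, as in the piped kernel. */
int32_t mac_packed(const uint32_t *c, const uint32_t *x, size_t words)
{
    int32_t sum_even = 0;  /* accumulates the MPY products  */
    int32_t sum_odd  = 0;  /* accumulates the MPYH products */
    for (size_t i = 0; i < words; i++) {
        sum_even += mpy_low(c[i], x[i]);
        sum_odd  += mpy_high(c[i], x[i]);
    }
    return sum_even + sum_odd;  /* the partial sums are combined after the loop */
}
```

Keeping two independent accumulators is what lets the two ADD instructions run in the same execute packet; a single accumulator would serialize them.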

Page 98: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

122

C67x MAC using Natural C

float mac(float *c, float *x, int count)
{
  int i; float sum = 0;

  for (i=0; i < count; i++) {
    sum += c[i] * x[i];
  }
  …

The C67x compiler gets two 32-bit floating-point sum-of-products per iteration:

;** --------------------------------------------------*
LOOP:   ; PIPED LOOP KERNEL
           LDDW   .D1   *A4++,A7:A6
||         LDDW   .D2   *B4++,B7:B6
||         MPYSP  .M1X  A6,B6,A5
||         MPYSP  .M2X  A7,B7,B5
||         ADDSP  .L1   A5,A8,A8
||         ADDSP  .L2   B5,B8,B8
|| [A1]    B      .S2   LOOP
|| [A1]    SUB    .S1   A1,1,A1
;** --------------------------------------------------*

[Figure: memory and the controller/decoder connected to units .M1, .L1, .D1, .S1 with registers A0 to A15 and units .M2, .L2, .D2, .S2 with registers B0 to B15]

Can the 'C64x do better?

Page 99: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

123

C64x gets four MACs using DOTP2

short mac(short *c, short *x, int count)
{
  int i; short sum = 0;

  for (i=0; i < count; i++) {
    sum += c[i] * x[i];
  }
  …

;** --------------------------------------------------*
; PIPED LOOP KERNEL
LOOP:
           ADD    .L2    B8,B6,B6
||         ADD    .L1    A6,A7,A7
||         DOTP2  .M2X   B4,A4,B8
||         DOTP2  .M1X   B5,A5,A6
|| [ B0]   B      .S1    LOOP
|| [ B0]   SUB    .S2    B0,-1,B0
||         LDDW   .D2T2  *B7++,B5:B4
||         LDDW   .D1T1  *A3++,A5:A4
;** --------------------------------------------------*

[Figure: DOTP2 multiplies the packed 16-bit halves of A5 (m1, m0) by those of B5 (n1, n0) and writes m1*n1 + m0*n0 to A6; the ADD then accumulates A6 into the running sum in A7]

How many multiplies can the ‘C6x perform?
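The DOTP2 operation drawn on this slide, m1*n1 + m0*n0 on packed 16-bit halves, can be modelled in a few lines of C. This is a sketch, not TI's intrinsic: `dotp2_model`, `pack2` and `mac_dotp2` are hypothetical names, element 0 is assumed to sit in the low half of each word, the word count is assumed to be even (one LDDW pair per iteration), and the overflow corner case where all four halves are -32768 is not modelled.

```c
#include <stddef.h>
#include <stdint.h>

/* Interpret the low 16 bits of v as a signed 16-bit value. */
static int32_t as_s16(uint32_t v)
{
    v &= 0xFFFFu;
    return v >= 0x8000u ? (int32_t)v - 0x10000 : (int32_t)v;
}

/* Pack two 16-bit values into one 32-bit word (helper for building data). */
static uint32_t pack2(int16_t high, int16_t low)
{
    return ((uint32_t)(uint16_t)high << 16) | (uint16_t)low;
}

/* Model of DOTP2: m1*n1 + m0*n0 on the packed signed halves. */
int32_t dotp2_model(uint32_t m, uint32_t n)
{
    return as_s16(m >> 16) * as_s16(n >> 16) + as_s16(m) * as_s16(n);
}

/* Four MACs per iteration: two DOTP2 operations on a double-word pair. */
int32_t mac_dotp2(const uint32_t *c, const uint32_t *x, size_t words)
{
    int32_t sum_a = 0;  /* running sum kept on side A */
    int32_t sum_b = 0;  /* running sum kept on side B */
    for (size_t i = 0; i + 1 < words; i += 2) {
        sum_a += dotp2_model(c[i], x[i]);          /* DOTP2 on .M1 */
        sum_b += dotp2_model(c[i + 1], x[i + 1]);  /* DOTP2 on .M2 */
    }
    return sum_a + sum_b;
}
```

Each loop iteration of the model corresponds to one cycle of the piped kernel: two double-word loads feed two DOTP2 instructions, giving four 16-bit multiply-accumulates.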

Page 100: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

124

MMACs

How many 16-bit MMACs (millions of MACs per second) can the 'C6201 perform?

400 MMACs (two .M units x 200 MHz)

How about 16x16 MMACs on the ‘C64x devices?

  2 .M units
x 2 16-bit MACs (per .M unit / per cycle)
x 720 MHz
----------------
  2880 MMACs

How many 8-bit MMACs on the ‘C64x?

5760 MMACs (on 8-bit data)
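The three figures are the same product with different inputs. A minimal sketch of the arithmetic, with a hypothetical function name; the value of four 8-bit MACs per .M unit per cycle is inferred from 5760 / (2 x 720) and is not stated on the slide.

```c
/* Peak MMACs = .M units x MACs per unit per cycle x clock in MHz. */
unsigned long peak_mmacs(unsigned m_units, unsigned macs_per_unit_per_cycle,
                         unsigned clock_mhz)
{
    return (unsigned long)m_units * macs_per_unit_per_cycle * clock_mhz;
}
```

For example, `peak_mmacs(2, 1, 200)` gives 400 for the 'C6201, `peak_mmacs(2, 2, 720)` gives 2880 for 16-bit data on the 'C64x, and `peak_mmacs(2, 4, 720)` gives 5760 for 8-bit data. These are peak rates that assume both .M units are busy on every cycle.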

Page 101: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

125

How Do We Get Such High Parallelism?

Compiler and Assembly Optimizer use a technique called Software Pipelining

Software pipelining enables high performance (esp. on DSP-type loops)

Key point: Tools do all the work!

What is software pipelining? Let's look at a simple example ...

Page 102: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

126

Tools Use Software Pipelining

Here’s a simple example to demonstrate ...

        LDH
   ||   LDH
        MPY
        ADD

How many cycles would it take to perform this loop 5 times?

5 x 3 = 15 cycles

Our functional units could be used like ...

Page 103: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

127

Without Software Pipelining

[Chart: functional-unit usage per cycle, one column per unit (.D1, .D2, .M1, .M2, .L1, .L2, .S1, .S2)]

Cycle   Instructions
1       ldh  ldh
2       mpy
3       add
4       ldh  ldh
5       mpy
6       add
7       ldh  ldh

In seven cycles, we’re almost half-way done ...

Page 104: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

128

With Software Pipelining

[Chart: functional-unit usage per cycle, one column per unit (.D1, .D2, .M1, .M2, .L1, .L2, .S1, .S2)]

Cycle   Instructions
1       ldh  ldh
2       ldh  ldh  mpy
3       ldh  ldh  mpy  add
4       ldh  ldh  mpy  add
5       ldh  ldh  mpy  add
6       mpy  add
7       add

Completes in only 7 cycles.

It takes 1/2 the time! How does this translate to code?
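The cycle counts on these two slides follow a simple pattern. The sketch below assumes the simplified model used on the slides: every stage (load, multiply, add) takes one cycle, a new iteration can start every cycle, and real load and multiply delay slots are ignored. The function names are hypothetical.

```c
/* Cycles when each iteration finishes before the next one starts. */
unsigned cycles_without_pipelining(unsigned iterations, unsigned stages)
{
    return iterations * stages;
}

/* Cycles when a new iteration starts every cycle: the first iteration
 * needs `stages` cycles, each further iteration adds one more. */
unsigned cycles_with_pipelining(unsigned iterations, unsigned stages)
{
    if (iterations == 0)
        return 0;
    return iterations + stages - 1;
}
```

For 5 iterations of the 3-stage loop this gives 15 cycles without software pipelining and 7 cycles with it, matching the slides. As the iteration count grows, the pipelined cost approaches one cycle per iteration.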

Page 105: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

129

S/W Pipelining Translated to Code

Cycle   Instructions
1       ldh  ldh
2       ldh  ldh  mpy
3       ldh  ldh  mpy  add
4       ldh  ldh  mpy  add
5       ldh  ldh  mpy  add
6       mpy  add
7       add

c1:     LDH
   ||   LDH

c2:     MPY
   ||   LDH
   ||   LDH

c3:     ADD
   ||   MPY
   ||   LDH
   ||   LDH

Page 106: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

DSK

Code Composer Studio

Page 107: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

131

C6416 DSK

Diagnostic Utility included with DSK ...

Page 108: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

132

C6416 DSK

Diagnostic Utility included with DSK ...

Page 109: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

133

DSK’s Diagnostic Utility

CCS Overview ...

Test/Diagnose DSK hardware

Verify USB emulation link

Use Advanced tests to facilitate debugging

Reset DSK hardware

Page 110: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

134

Code Composer Studio

DSK’s Code Composer Studio includes:

• Integrated Edit / Debug GUI
• Code Generation Tools: Compiler, Asm Opto, Asm and Link, producing the .out file
• Standard Runtime Libraries
• BIOS: real-time kernel and real-time analysis (DSP/BIOS Libraries, DSP/BIOS Config Tool)
• Debug targets: SIM (simulator), DSK, EVM, third party boards, and a DSP board through XDS

CCS is Project centric ...

Page 111: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

135

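Code Generation

[Figure: code generation flow. The editor produces .c / .cpp, .sa and .asm source files. The Compiler and the Asm Optimizer produce .asm files, the assembler (Asm) produces .obj files, and the Linker, controlled by Link.cmd, produces the .out and .map files]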

Page 112: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

136

What is a Project?

A project (.PJT) file contains:

• References to files: source, libraries, linker, etc.
• Project settings: compiler options, DSP/BIOS, linking, etc.

The project menu ...

Page 113: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

137

Project Menu

Access via the pull-down menu or by right-clicking the .pjt file in the project explorer window.

Hint: Create and open projects from the Project menu, not the File menu.

Build Options... Next slide

Page 114: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

138

Build Options

-g -q -fr"c:\modem\Debug" -mv6700

Eight Categories of Compiler options

The most common Compiler Options are ...

Page 115: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

140

Compiler’s Build Options

Nearly one hundred compiler options are available to tune your code's performance, size, etc. The following table lists the most common options:

Options      Description
-mv6700      Generate ‘C67x code (‘C62x is default)
-mv6400      Generate 'C64x code
-fr <dir>    Directory for object/output files
-fs <dir>    Directory for assembly files

debug options
-q           Quiet mode (display less info while compiling)
-g           Enables src-level symbolic debugging
-s           Interlist C statements into assembly listing

In Chapter 4 we will examine the options which enable the compiler’s optimizer.

And, the Config Tool ...

Page 116: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

142

DSP/BIOS Configuration Tool

Simplifies system design by:

• Automatically including the appropriate runtime support libraries
• Automatically handling interrupt vectors and system reset
• Handling system memory configuration (builds CMD file)
• Generating 5 files when the CDB file is saved: C file, Asm file, 2 header files and a linker command (.cmd) file

More to be discussed later …

Page 117: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

143

‘C6000 C Data Types

Type                 Size      Representation
char, signed char    8 bits    ASCII
unsigned char        8 bits    ASCII
short                16 bits   2’s complement
unsigned short       16 bits   binary
int, signed int      32 bits   2’s complement
unsigned int         32 bits   binary
long, signed long    40 bits   2’s complement
unsigned long        40 bits   binary
enum                 32 bits   2’s complement
float                32 bits   IEEE 32-bit
double               64 bits   IEEE 64-bit
long double          64 bits   IEEE 64-bit
pointers             32 bits   binary

Page 118: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

146

GEL Scripting

GEL: General Extension Language

• C style syntax
• Large number of debugger commands as GEL functions
• Write your own functions
• Create GEL menu items

Page 119: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

147

CCS Scripting

Debug using VB Script or Perl. Using CCS Scripting, a simple script can:

• Start CCS
• Load a file
• Read/write memory
• Set/clear breakpoints
• Run, and perform other basic testing functions

Page 120: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

148

TCONF Scripting (CDB)

• A textual way to configure CDB files
• Runs on both PC and Unix
• Create #include type files (.tci)
• More flexible than Config Tool

Tconf Script (.tcf): hello_dsk62cfg.tcf

utils.loadPlatform("dsk6211");     /* load DSK6211 platform into TCOM */
utils.getProgObjs(prog);           /* make all prog objects JavaScript global vars */
LOG_system.bufLen = 128;           /* set buffer length of LOG_system to 128 */
utils.importFile("hello");         /* import portable application script */
prog.gen();                        /* generate cfg files (and CDB file) */

Tconf Include File (.tci): hello.tci

var trace = LOG.create("trace");   /* create a new user log, named trace */
trace.bufLen = 32;                 /* initialize its length to 32 (words) */

Your Application: hello.c

#include <log.h>

extern LOG_Obj trace; /* created in hello.tci */

int main() {
  LOG_printf(&trace, "Hello World!\n");
  return (0);
}

Page 121: Chapter 2 TMS320C6000 Architectural Overview DSP Lecture 02

Chapter 2

TMS320C6000 Architectural Overview

- End -