Sonics, Inc. - Complaint for Patent Infringement


BRYAN WILSON (CA SBN 138842)
KIMBERLY N. VAN VOORHIS (CA SBN 197486)
MICHAEL J. KRYSTON (CA SBN 260770)
MORRISON & FOERSTER LLP
755 Page Mill Road
Palo Alto, California 94304-1018
Telephone: 650.813.5600
Facsimile: 650.494.0792
E-Mail: [email protected]
E-Mail: [email protected]
E-Mail: [email protected]

Attorneys for Plaintiff SONICS, INC.

UNITED STATES DISTRICT COURT

NORTHERN DISTRICT OF CALIFORNIA

SAN JOSE DIVISION

SONICS, INC., a Delaware corporation, Plaintiff,

v.

ARTERIS, INC., a Delaware corporation, Defendant.

Case No. ___

COMPLAINT FOR PATENT INFRINGEMENT

DEMAND FOR JURY TRIAL

COMPLAINT FOR PATENT INFRINGEMENT — CASE NO. pa-1493661


Plaintiff Sonics, Incorporated (“Sonics”) alleges as follows:

PARTIES

1. Sonics is a corporation organized under the laws of Delaware with its principal

place of business at 890 North McCarthy Blvd, Suite 200, Milpitas, California 95035. Sonics is a

leading provider of intelligent interconnect solutions, known as “Network-on-Chip” or NoC, that

manage the on-chip communications on System-on-Chip semiconductors (“SoCs”).

2. On information and belief, defendant Arteris Holdings, Inc. (“Arteris”) is

organized under the laws of Delaware, with its principal place of business at 111 W. Evelyn

Avenue, Suite 101, Sunnyvale, California, 94086. On information and belief, Arteris provides

Network-on-Chip technology and tools, and directs those products to various customers,

including those located in the Northern District of California. These customers use Arteris’s

technology and tools, and incorporate Arteris’s technology and tools into products that they sell,

and offer for sale in the Northern District of California and elsewhere in the United States.

JURISDICTION

3. This is an action for patent infringement arising under the patent laws of the

United States of America, 35 U.S.C. Section 1, et seq., including 35 U.S.C. Section 271. This

Court has subject matter jurisdiction pursuant to 28 U.S.C. Sections 1331 and 1338(a) in that this

is a civil action arising out of the patent laws of the United States of America.

VENUE

4. Venue in the Northern District of California is proper pursuant to 28 U.S.C.

Sections 1391(b)-(c) and 1400(b).

5. This Court has personal jurisdiction over Arteris. Arteris’s headquarters is within

this district, and Arteris has conducted and does conduct business within the State of California

and within this judicial district.

6. Arteris makes, distributes, offers for sale or license, sells or licenses, and

advertises its products and services in the United States, the State of California, and the Northern

District of California.


INTRADISTRICT ASSIGNMENT

7. This is an Intellectual Property Action to be assigned on a district-wide basis

pursuant to Civil Local Rule 3-2(c). Sonics and Arteris are both located in Santa Clara County,

making San Jose an appropriate division.

FIRST CAUSE OF ACTION (Patent Infringement of U.S. Patent No. 6,182,183)

8. Sonics incorporates by reference paragraphs 1 through 7 of this Complaint and

realleges them as though fully set forth herein.

9. On January 30, 2001, the United States Patent and Trademark Office issued U.S.

Patent No. 6,182,183, entitled “Communications system and method with multilevel connection

identification” (the “’183 patent”). Sonics is the assignee of the ’183 patent and continues to hold

all rights and interest in the ’183 patent. A true and correct copy of the ’183 patent is attached as

Exhibit A to this Complaint.

10. Arteris has directly infringed and continues to directly infringe the ’183 patent by

their manufacture, use, sale, importation and/or offer for sale during the term of the ’183 patent

products, technology and tools including, but not limited to, Arteris FlexNoC™, Arteris

FlexWay™, and Danube Network on a Chip Intellectual Property Library and earlier versions of

these products by Arteris. Sonics anticipates that additional infringing products or methods will

be found and will duly accuse such products and methods as discovery progresses. Arteris’s

infringement is literal and/or under the doctrine of equivalents.

11. Arteris has indirectly infringed and continues to indirectly infringe the ’183 patent

by inducing the purchasers, licensees, and users of their products to infringe the ’183 patent by

using products, technology and tools that include, but are not limited to, Arteris FlexNoC™,

Arteris FlexWay™, and Danube Network on a Chip Intellectual Property Library and earlier

versions of these products by Arteris.

12. Arteris has indirectly infringed and continues to indirectly infringe the ’183 patent

by contributing to direct infringement by the purchasers, licensees, and users of their products,

technology and tools that include, but are not limited to, Arteris FlexNoC™, Arteris FlexWay™,


and Danube Network on a Chip Intellectual Property Library and earlier versions of these

products by Arteris.

13. Arteris’s acts of infringement have been and continue to be willful, deliberate, and

in reckless disregard of Sonics’s patent rights. Arteris is thus liable to Sonics for infringement of

the ’183 patent pursuant to 35 U.S.C. § 271.

14. As a consequence of Arteris’s infringement, Sonics is entitled to recover damages

adequate to compensate it for the infringement complained of herein, but in no event less than a

reasonable royalty.

15. Arteris’s infringement has irreparably injured and will continue to irreparably

injure Sonics, unless and until such infringement is enjoined by this Court.

SECOND CAUSE OF ACTION (Patent Infringement of U.S. Patent No. 7,266,786)

16. Sonics incorporates by reference paragraphs 1 through 7 of this Complaint and

realleges them as though fully set forth herein.

17. On September 4, 2007, the United States Patent and Trademark Office issued U.S.

Patent No. 7,266,786, entitled “Method and apparatus for configurable address mapping and

protection architecture and hardware for on-chip systems” (the “’786 patent”). Sonics is the

assignee of the ’786 patent and continues to hold all rights and interest in the ’786 patent. A true

and correct copy of the ’786 patent is attached as Exhibit B to this Complaint.

18. Arteris has directly infringed and continues to directly infringe the ’786 patent by

their manufacture, use, sale, importation and/or offer for sale during the term of the ’786 patent

products, technology and tools including, but not limited to, Arteris FlexNoC™, Arteris

FlexWay™, and Danube Network on a Chip Intellectual Property Library and earlier versions of

these products by Arteris. Sonics anticipates that additional infringing products or methods will

be found and will duly accuse such products and methods as discovery progresses. Arteris’s

infringement is literal and/or under the doctrine of equivalents.

19. Arteris has indirectly infringed and continues to indirectly infringe the ’786 patent

by inducing the purchasers, licensees, and users of their products to infringe the ’786 patent by


using products, technology and tools that include, but are not limited to, Arteris FlexNoC™,

Arteris FlexWay™, and Arteris Network on a Chip Intellectual Property Library and earlier

versions of these products by Arteris.

20. Arteris has indirectly infringed and continues to indirectly infringe the ’786 patent

by contributing to direct infringement by the purchasers, licensees, and users of their products,

technology and tools that include, but are not limited to, Arteris FlexNoC™, Arteris FlexWay™,

and Danube Network on a Chip Intellectual Property Library and earlier versions of these

products by Arteris.

21. Arteris’s acts of infringement have been and continue to be willful, deliberate, and

in reckless disregard of Sonics’s patent rights. Arteris is thus liable to Sonics for infringement of

the ’786 patent pursuant to 35 U.S.C. § 271.

22. As a consequence of Arteris’s infringement, Sonics is entitled to recover damages

adequate to compensate it for the infringement complained of herein, but in no event less than a

reasonable royalty.

23. Arteris’s infringement has irreparably injured and will continue to irreparably

injure Sonics, unless and until such infringement is enjoined by this Court.

THIRD CAUSE OF ACTION (Patent Infringement of U.S. Patent No. 7,277,975)

24. Sonics incorporates by reference paragraphs 1 through 7 of this Complaint and

realleges them as though fully set forth herein.

25. On October 2, 2007, the United States Patent and Trademark Office issued U.S.

Patent No. 7,277,975, entitled “Methods and apparatuses for decoupling a request from one or

more solicited responses” (the “’975 patent”). Sonics is the assignee of the ’975 patent and

continues to hold all rights and interest in the ’975 patent. A true and correct copy of the ’975

patent is attached as Exhibit C to this Complaint.

26. Arteris has directly infringed and continues to directly infringe the ’975 patent by

their manufacture, use, sale, importation and/or offer for sale during the term of the ’975 patent

products, technology and tools including, but not limited to, Arteris FlexNoC™, Arteris


FlexWay™, and Danube Network on a Chip Intellectual Property Library and earlier versions of

these products by Arteris. Sonics anticipates that additional infringing products or methods will

be found and will duly accuse such products and methods as discovery progresses. Arteris’s

infringement is literal and/or under the doctrine of equivalents.

27. Arteris has indirectly infringed and continues to indirectly infringe the ’975 patent

by inducing the purchasers, licensees, and users of their products to infringe the ’975 patent by

using products, technology and tools that include, but are not limited to, Arteris FlexNoC™,

Arteris FlexWay™, and Danube Network on a Chip Intellectual Property Library and earlier

versions of these products by Arteris.

28. Arteris has indirectly infringed and continues to indirectly infringe the ’975 patent

by contributing to direct infringement by the purchasers, licensees, and users of their products,

technology and tools that include, but are not limited to, Arteris FlexNoC™, Arteris FlexWay™,

and Danube Network on a Chip Intellectual Property Library and earlier versions of these

products by Arteris.

29. Arteris’s acts of infringement have been and continue to be willful, deliberate, and

in reckless disregard of Sonics’s patent rights. Arteris is thus liable to Sonics for infringement of

the ’975 patent pursuant to 35 U.S.C. § 271.

30. As a consequence of Arteris’s infringement, Sonics is entitled to recover damages

adequate to compensate it for the infringement complained of herein, but in no event less than a

reasonable royalty.

31. Arteris’s infringement has irreparably injured and will continue to irreparably

injure Sonics, unless and until such infringement is enjoined by this Court.

FOURTH CAUSE OF ACTION (Patent Infringement of U.S. Patent No. 6,961,834)

32. Sonics incorporates by reference paragraphs 1 through 7 of this Complaint and

realleges them as though fully set forth herein.

33. On November 1, 2005, the United States Patent and Trademark Office issued U.S.

Patent No. 6,961,834, entitled “Method and apparatus for scheduling of requests to dynamic


random access memory device” (the “’834 patent”). Sonics is the assignee of the ’834 patent and

continues to hold all rights and interest in the ’834 patent. A true and correct copy of the ’834

patent is attached as Exhibit D to this Complaint.

34. Arteris has directly infringed and continues to directly infringe the ’834 patent by

their manufacture, use, sale, importation and/or offer for sale during the term of the ’834 patent

products, technology and tools including, but not limited to, Arteris FlexNoC™, Arteris

FlexWay™, FlexArtist™, FlexExplorer™, and FlexVerifier™, and Danube Network on a Chip

Intellectual Property Library and earlier versions of these products by Arteris. Sonics anticipates

that additional infringing products or methods will be found and will duly accuse such products

and methods as discovery progresses. Arteris’s infringement is literal and/or under the doctrine

of equivalents.

35. Arteris has indirectly infringed and continues to indirectly infringe the ’834 patent

by inducing the purchasers, licensees, and users of their products to infringe the ’834 patent by

using products, technology and tools that include, but are not limited to, Arteris FlexNoC™,

Arteris FlexWay™, FlexArtist™, FlexExplorer™, and FlexVerifier™, and Danube Network on a

Chip Intellectual Property Library and earlier versions of these products by Arteris.

36. Arteris has indirectly infringed and continues to indirectly infringe the ’834 patent

by contributing to direct infringement by the purchasers, licensees, and users of their products,

technology and tools that include, but are not limited to, Arteris FlexNoC™, Arteris FlexWay™,

FlexArtist™, FlexExplorer™, and FlexVerifier™, and Danube Network on a Chip Intellectual

Property Library and earlier versions of these products by Arteris.

37. Arteris’s acts of infringement have been and continue to be willful, deliberate, and

in reckless disregard of Sonics’s patent rights. Arteris is thus liable to Sonics for infringement of

the ’834 patent pursuant to 35 U.S.C. § 271.

38. As a consequence of Arteris’s infringement, Sonics is entitled to recover damages

adequate to compensate it for the infringement complained of herein, but in no event less than a

reasonable royalty.


39. Arteris’s infringement has irreparably injured and will continue to irreparably

injure Sonics, unless and until such infringement is enjoined by this Court.

FIFTH CAUSE OF ACTION (Patent Infringement of U.S. Patent No. 7,191,273)

40. Sonics incorporates by reference paragraphs 1 through 7 of this Complaint and

realleges them as though fully set forth herein.

41. On March 13, 2007, the United States Patent and Trademark Office issued U.S.

Patent No. 7,191,273, entitled “Method and apparatus for scheduling a resource to meet quality-

of-service restrictions” (the “’273 patent”). Sonics is the assignee of the ’273 patent and

continues to hold all rights and interest in the ’273 patent. A true and correct copy of the ’273

patent is attached as Exhibit E to this Complaint.

42. Arteris has directly infringed and continues to directly infringe the ’273 patent by

their manufacture, use, sale, importation and/or offer for sale during the term of the ’273 patent

products, technology and tools including, but not limited to, Arteris FlexNoC™, Arteris

FlexWay™, FlexArtist™, FlexExplorer™, and FlexVerifier™, and Danube Network on a Chip

Intellectual Property Library and earlier versions of these products by Arteris. Sonics anticipates

that additional infringing products or methods will be found and will duly accuse such products

and methods as discovery progresses. Arteris’s infringement is literal and/or under the doctrine

of equivalents.

43. Arteris has indirectly infringed and continues to indirectly infringe the ’273 patent

by inducing the purchasers, licensees, and users of their products to infringe the ’273 patent by

using products, technology and tools that include, but are not limited to, Arteris FlexNoC™,

Arteris FlexWay™, FlexArtist™, FlexExplorer™, and FlexVerifier™, and Danube Network on a

Chip Intellectual Property Library and earlier versions of these products by Arteris.

44. Arteris has indirectly infringed and continues to indirectly infringe the ’273 patent

by contributing to direct infringement by the purchasers, licensees, and users of their products,

technology and tools that include, but are not limited to, Arteris FlexNoC™, Arteris FlexWay™,


FlexArtist™, FlexExplorer™, and FlexVerifier™, and Danube Network on a Chip Intellectual

Property Library and earlier versions of these products by Arteris.

45. Arteris’s acts of infringement have been and continue to be willful, deliberate, and

in reckless disregard of Sonics’s patent rights. Arteris is thus liable to Sonics for infringement of

the ’273 patent pursuant to 35 U.S.C. § 271.

46. As a consequence of Arteris’s infringement, Sonics is entitled to recover damages

adequate to compensate it for the infringement complained of herein, but in no event less than a

reasonable royalty.

47. Arteris’s infringement has irreparably injured and will continue to irreparably

injure Sonics, unless and until such infringement is enjoined by this Court.

SIXTH CAUSE OF ACTION (Patent Infringement of U.S. Patent No. 6,816,814)

48. Sonics incorporates by reference paragraphs 1 through 7 of this Complaint and

realleges them as though fully set forth herein.

49. On November 9, 2004, the United States Patent and Trademark Office issued U.S.

Patent No. 6,816,814, entitled “Method and apparatus for decomposing and verifying

configurable hardware” (the “’814 patent”). Sonics is the assignee of the ’814 patent and

continues to hold all rights and interest in and to the ’814 patent. A true and correct copy of the

’814 patent is attached as Exhibit F to this Complaint.

50. Arteris has directly infringed and continues to directly infringe the ’814 patent by

their manufacture, use, sale, importation and/or offer for sale during the term of the ’814 patent

products, technology and tools including, but not limited to, Arteris NoCcompiler™, Arteris

NoCverifier™, and FlexArtist™, and FlexExplorer™, and Danube Network on a Chip Intellectual

Property Library and earlier versions of these products by Arteris. Sonics anticipates that

additional infringing products or methods will be found and will duly accuse such products and

methods as discovery progresses. Arteris’s infringement is literal and/or under the doctrine of

equivalents.


51. Arteris has indirectly infringed and continues to indirectly infringe the ’814 patent

by inducing the purchasers, licensees, and users of their products to infringe the ’814 patent by

using products, technology and tools that include, but are not limited to, Arteris NoCcompiler™,

Arteris NoCverifier™, and FlexArtist™, and FlexExplorer™, and Danube Network on a Chip

Intellectual Property Library and earlier versions of these products by Arteris.

52. Arteris has indirectly infringed and continues to indirectly infringe the ’814 patent

by contributing to direct infringement by the purchasers, licensees, and users of their products,

technology and tools that include, but are not limited to, Arteris NoCcompiler™, Arteris

NoCverifier™, and FlexArtist™, and FlexExplorer™, and Danube Network on a Chip

Intellectual Property Library and earlier versions of these products by Arteris.

53. Arteris’s acts of infringement have been and continue to be willful, deliberate, and

in reckless disregard of Sonics’s patent rights. Arteris is thus liable to Sonics for infringement of

the ’814 patent pursuant to 35 U.S.C. § 271.

54. As a consequence of Arteris’s infringement, Sonics is entitled to recover damages

adequate to compensate it for the infringement complained of herein, but in no event less than a

reasonable royalty.

55. Arteris’s infringement has irreparably injured and will continue to irreparably

injure Sonics, unless and until such infringement is enjoined by this Court.

SEVENTH CAUSE OF ACTION (Patent Infringement of U.S. Patent No. 7,299,155)

56. Sonics incorporates by reference paragraphs 1 through 7 of this Complaint and

realleges them as though fully set forth herein.

57. On November 20, 2007, the United States Patent and Trademark Office issued

U.S. Patent No. 7,299,155, entitled “Method and apparatus for decomposing and verifying

configurable hardware” (the “’155 patent”). Sonics is the assignee of the ’155 patent and

continues to hold all rights and interest in and to the ’155 patent. A true and correct copy of the

’155 patent is attached as Exhibit G to this Complaint.


58. Arteris has directly infringed and continues to directly infringe the ’155 patent by

their manufacture, use, sale, importation and/or offer for sale during the term of the ’155 patent

products, technology and tools including, but not limited to, Arteris NoCcompiler™, Arteris

NoCverifier™, and FlexArtist™, and FlexExplorer™, and Danube Network on a Chip

Intellectual Property Library and earlier versions of these products by Arteris. Sonics anticipates

that additional infringing products or methods will be found and will duly accuse such products

and methods as discovery progresses. Arteris’s infringement is literal and/or under the doctrine

of equivalents.

59. Arteris has indirectly infringed and continues to indirectly infringe the ’155 patent

by inducing the purchasers, licensees, and users of their products to infringe the ’155 patent by

using products, technology and tools that include, but are not limited to, Arteris NoCcompiler™,

Arteris NoCverifier™, and FlexArtist™, and FlexExplorer™, and Danube Network on a Chip

Intellectual Property Library and earlier versions of these products by Arteris.

60. Arteris has indirectly infringed and continues to indirectly infringe the ’155 patent

by contributing to direct infringement by the purchasers, licensees, and users of their products,

technology and tools that include, but are not limited to, Arteris NoCcompiler™, Arteris

NoCverifier™, and FlexArtist™, and FlexExplorer™, and Danube Network on a Chip

Intellectual Property Library and earlier versions of these products by Arteris.

61. Arteris’s acts of infringement have been and continue to be willful, deliberate, and

in reckless disregard of Sonics’s patent rights. Arteris is thus liable to Sonics for infringement of

the ’155 patent pursuant to 35 U.S.C. § 271.

62. As a consequence of Arteris’s infringement, Sonics is entitled to recover damages

adequate to compensate it for the infringement complained of herein, but in no event less than a

reasonable royalty.

63. Arteris’s infringement has irreparably injured and will continue to irreparably

injure Sonics, unless and until such infringement is enjoined by this Court.

PRAYER FOR RELIEF

WHEREFORE, Sonics prays for relief as follows:

1. A judgment that the ’183, ’786, ’975, ’834, ’273, ’814 and ’155 patents are valid and enforceable;

2. A judgment that Arteris is infringing and/or has infringed, and has contributed to and induced infringement of, the ’183, ’786, ’975, ’834, ’273, ’814 and ’155 patents, and that such infringement is willful and deliberate;

3. A permanent injunction pursuant to 35 U.S.C. § 283 that enjoins Arteris and its affiliates, subsidiaries, officers, directors, employees, agents, representatives, licensees, successors, assigns and all those acting for it and on its behalf, or acting in concert with them, from further infringement of the ’183, ’786, ’975, ’834, ’273, ’814 and ’155 patents;

4. An award of compensatory damages to Sonics, including but not limited to lost profits, but in no event less than a reasonable royalty;

5. That such damages be trebled for the willful, deliberate, and intentional infringement by Arteris as alleged herein in accordance with 35 U.S.C. § 284;

6. That Sonics be awarded interest on the damages so computed;

7. An award of costs and attorneys’ fees pursuant to 35 U.S.C. Section 285, or as otherwise permitted by law; and

8. For such other and further relief as Sonics may be entitled to as a matter of law or that the Court deems just and proper.

DEMAND FOR JURY TRIAL

Pursuant to Federal Rule of Civil Procedure 38(b), Sonics demands a trial by jury of all issues triable in this action.

Dated: November 1, 2011

MORRISON & FOERSTER LLP

By: Bryan Wilson

Attorneys for Plaintiff SONICS, INC.

CERTIFICATION OF INTERESTED ENTITIES OR PERSONS

Pursuant to Civil L.R. 3-16, the undersigned certifies that the following listed persons, associations of persons, firms, partnerships, corporations (including parent corporations) or other entities (i) have a financial interest in the subject matter in controversy or in a party to the proceeding, or (ii) have a non-financial interest in that subject matter or in a party that could be substantially affected by the outcome of this proceeding:

InveStar Capital

Partner Ventures

Dated: November 1, 2011

MORRISON & FOERSTER LLP

By:

Attorneys for Plaintiff SONICS, INC.


Exhibit A

US 6,182,183 B1

(12) United States Patent: Wingard et al.

(54) COMMUNICATIONS SYSTEM AND METHOD WITH MULTILEVEL CONNECTION IDENTIFICATION

(75) Inventors: Drew E. Wingard, San Carlos; Geert Paul Rosseel, Menlo Park; Jay S. Tomlinson, San Jose; Lisa A. Robinson, Boulder Creek, all of CA (US)

(73) Assignee: Sonics, Inc., Mountain View, CA (US)

(*) Notice: Under 35 U.S.C. 154(b), the term of this patent shall be extended for 0 days.

(21) Appl. No.: 09/191,291

(22) Filed: Nov. 13, 1998

(10) Patent No.: US 6,182,183 B1

(45) Date of Patent: Jan. 30, 2001

(51) Int. Cl.: G06F 13/00

(52) U.S. Cl.: 710/129; 710/1; 710/268

(58) Field of Search: 710/100, 101, 106, 110, 126, 129, 268, 1, 9

(56) References Cited

U.S. PATENT DOCUMENTS

5,274,783    12/1993   House et al.   395/325
5,729,529 *  3/1998    Martinsson     370/235
5,748,914 *  5/1998    Barth et al.   710/105
5,878,045 *  3/1999    Timbs          370/466

OTHER PUBLICATIONS

International Search Report, PCT/US99/26901, Apr. 6, 2000, 1 pg.

* cited by examiner

Primary Examiner: Aria Etienne

(74) Attorney, Agent, or Firm: Blakely, Sokoloff, Taylor & Zafman LLP

(57) ABSTRACT

A communication system including at least two functional blocks, wherein a first functional block communicates with a second functional block by establishing a connection, wherein a connection is a logical state in which data may pass between the first functional block and the second functional block. One embodiment includes a bus coupled to each of the functional blocks and configured to carry a plurality of signals. The plurality of signals includes a connection identifier that indicates a particular connection that a data transfer is part of, and a thread identifier that indicates a transaction stream that the data transfer is part of.

25 Claims, 10 Drawing Sheets

112 126 l----c;0-D-R-AM----,

106

127 1----\_:0-EE-P-RO-M--,

108

115 I m I "------1 '-- 11\4

~---------------IE __ I ____ 1o_2 ____ 12_4~ '~~ D

I A

I I I B I ASIC I c I

I

'----! 104

-

I

I B I FPGA 110

Page 16: Sonics, Inc. - Complaint for Patent Infringement

A

B

c

115

-1

~-

11~

1Wl

1121

1 12

2 11

5

I I

I I

I D

11

5 I

123

l ~r

I E

l

102

l A

l

.--

I

I I

I B

l

ASIC

I

c l

~

'-- 10

4

112 IZ

O ~ 0

DRAM

11

5 10

6

127

.....-

-.--

---\_0

EEPRO

M

108

F

114

.___

_ \

124

128

~ ([

~

l ~

f---

U

I 1

I FP

GA

110

11 125

100

I

FIG

. 1

d • \Jl

• ~

~

......

~ =

......

~

~ ? ~

~=

N c c '"""'

'JJ.

=­ ~ ~ .....

'"""'

0 ......, '"""'

c e rJ'l

-..a-..

1--"

0

0

J-J

1--"

0

0

~

~

1--"

Page 17: Sonics, Inc. - Complaint for Patent Infringement

U.S. Patent

&;I

-

Jan.30,2001

a: 0 I-

~~I a: a I-CO

-co a: ~

-o Cf)zw

~~8~1 -o~frlco ooo ~(._)

~I • •

:::.::: (._)

0 _J

co :;;: • 0 _J • u..

~ ~ 0

I '-' _J o '-' :::.:::

I

(/) a: w I-(/)

C!J w a: Zol 0..--I-CO ~ a: =» C!J u.. z 0 (._)

I I

\ N) 0 L!)

Sheet 2 of 10 US 6,182,183 B1

,....----1NI8I ~I loo CO

-u..

-

a: L.U N -z 0 a: :r: (._) z >-(/)

T-I I

[FIG. 3 (Sheet 3 of 10): block diagram of target interface module 900 — clock block, data flow block 906, address/command decode 908, state registers and state control block 916, synchronizer, connections to bus and target functional block 918]

[FIG. 4 (Sheet 4 of 10): initiator functional block 1002 connected to initiator interface module 1004, target interface module 1006 connected to target functional block 1008, point-to-point interconnects 1010, shared communications bus 1012]

[FIG. 5 (Sheet 5 of 10): block diagram of a shared communications bus connected to entities A-E]

[FIG. 6 (Sheet 6 of 10): timing diagram — signals CLOCK 320, RESET 322, CMD 324, CONNID 326, ADDR 328, DATA 330, RESP 332; signal groups 340, 342]
[FIG. 7 (Sheet 6 of 10): timing diagram — signals CLOCK 420, RESET 422, CMD 424, CONNID 426, ADDR 428, DATA 430 (E-DATAE0, E-DATAE1), RESP 432 (A-BUSY)]

[FIG. 8 (Sheet 7 of 10): timing diagram — signals CLOCK 520, RESET 522, CMD 524, CONNID 526, ADDR 528 (E-ADDRE0, D-ADDRD0, E-ADDRE1, D-ADDRD1, E-ADDRE2, D-ADDRD2), DATA 530 (E-DATAE0, B-DATAB0, E-DATAE1, B-DATAB1, E-DATAE2), RESP 532 (A-DVA, B-DVA)]
[FIG. 9 (Sheet 7 of 10): timing diagram — signals CLOCK 620, RESET 622, CMD 624, CONNID 626, ADDR 628, DATA 630 (E-DATAE0, B-DATAB0, E-DATAE1), RESP 632 (A-BUSY, A-DVA)]

[FIG. 10 (Sheet 8 of 10): block diagram of part of a computer system; signal labels 720-732 only partially legible]

[FIG. 11 (Sheet 9 of 10): point-to-point communications bus 1010 between MASTER 1102 and SLAVE 1104. Legible signals: Clock, Cmd[3], Addr[N], DataOut[N], ReqAccept, Resp[3], DataIn[N], ReqAccept, Width[3], Burst[4], InterruptOut (I), ErrorOut (I), InterruptIn (I), ErrorIn (I), Reset (I), ReqThreadID[4] (A), RespThreadID[4] (A), ReqThreadBusy[16] (AI), RespThreadBusy[16] (AI), ReqConnID[8] (AI)]
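The signal names and widths legible in FIG. 11 suggest the field sizes used in this embodiment (3-bit Cmd and Resp, 4-bit thread identifiers, 8-bit ReqConnID). A small sketch that encodes those widths; the dict table and helper function are illustrative assumptions, not the patent's implementation:

```python
# Field widths as read from FIG. 11 (bracketed numbers in the drawing).
FIELD_WIDTHS = {
    "Cmd": 3, "Resp": 3, "Width": 3, "Burst": 4,
    "ReqThreadID": 4, "RespThreadID": 4, "ReqConnID": 8,
}

def fits(field, value):
    """Return True if value is representable in the field's bit width."""
    return 0 <= value < (1 << FIELD_WIDTHS[field])

assert fits("ReqConnID", 255)       # 8-bit CONNID: values 0..255
assert not fits("ReqThreadID", 16)  # 4-bit thread ID: values 0..15
```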

[FIG. 12 (Sheet 10 of 10): block diagram — DRAM 1213, DRAM controller 1208, functional blocks 1202, interface modules 1204, interconnects 1210, buses 1206 and 1212]

US 6,182,183 B1

COMMUNICATIONS SYSTEM AND METHOD WITH MULTILEVEL CONNECTION IDENTIFICATION

FIELD OF THE INVENTION

The present invention relates to a communication system to couple computing sub-systems.

BACKGROUND OF THE INVENTION

Electronic computing and communications systems continue to include greater numbers of features and to increase in complexity. At the same time, electronic computing and communications systems decrease in physical size and cost per function. Rapid advances in semiconductor technology, such as four-layer deep-sub-micron complementary metal-oxide semiconductor (CMOS) technology, have enabled true "system-on-a-chip" designs. These complex designs may incorporate, for example, one or more processor cores, a digital signal processing (DSP) core, several communications interfaces, and graphics support in application-specific logic. In some systems, one or several of these extremely complex chips must communicate with each other and with other system components. Significant new challenges arise in the integration, verification and testing of such systems because efficient communication must take place between sub-systems on a single complex chip as well as between chips on a system board. One benefit to having an efficient and flexible method for communication between sub-systems and chips is that system components can be reused in other systems with a minimum of redesign.

One challenge in the integration, verification and testing of modern electronic systems stems from the fact that modern electronic systems in many application areas have functionality, cost and form-factor requirements that mandate the sharing of resources, such as memory, among multiple functional blocks, where functional blocks can be any entity that interfaces to a communication system. In such systems, the functional blocks typically possess different performance characteristics and requirements, and the communications system and shared resources must simultaneously satisfy the total requirements. Key requirements of typical functional blocks are bandwidth and latency constraints that can vary over several orders of magnitude between functional blocks. In order to simultaneously satisfy constraints that vary so widely, communications systems must provide high degrees of predictability.

Traditional approaches to the design of communications systems for modern, complex computer systems have various strengths and weaknesses. An essential aspect of such approaches is the communications interface that various sub-systems present to one another. One approach is to define customized point-to-point interfaces between a sub-system and each peer with which it must communicate. This customized approach offers protocol simplicity, guaranteed performance, and isolation from dependencies on unrelated sub-systems. Customized interfaces, however, are by their nature inflexible. The addition of a new sub-system with a different interface requires design rework.

A second approach is to define a system using standardized interfaces. Many standardized interfaces are based on pre-established computer bus protocols. The use of computer buses allows flexibility in system design, since as many different functional blocks may be connected together as required by the system, as long as the bus has sufficient performance. It is also necessary to allocate access to the bus among various sub-systems. In the case of computer buses, resource allocation is typically referred to as arbitration.

One disadvantage of computer buses is that each sub-system or component connected to the bus is constrained to use the protocol of the bus. In some cases, this limits the performance of the sub-system. For example, a sub-system may be capable of handling multiple transaction streams simultaneously, but the bus protocol is not capable of fully supporting concurrent operations. In the case of a sub-system handling multiple transaction streams where each transaction stream has ordering constraints, it is necessary for the sub-system to identify each increment of data received or transmitted with a certain part of a certain data stream to distinguish between streams and to preserve order within a stream. This includes identifying a sub-system that is a source of a data transmission. Conventionally, such identification is limited to a non-configurable hardware identifier that is generated by a particular sub-system or component.

Current bus systems provide limited capability to preserve order in one transaction stream by supporting "split transactions" in which data from one transaction may be interleaved with data from another transaction in the same stream. In such a bus, data is tagged as belonging to one stream of data, so that it can be identified even if it arrives out of order. This requires the receiving sub-system to decode an arriving address to extract the identification information.

Current bus systems do not support true concurrency of operations for a sub-system that can process multiple streams of transactions over a single interconnect, such as a memory controller that handles access to a single dynamic random access memory (DRAM) for several clients of the DRAM. A DRAM controller may require information related to a source of an access request, a priority of an access request, ordering requirements, etc. Current communication systems do not provide for such information to be transmitted with data without placing an additional burden on the sub-system to adapt to the existing protocol.

In order for many sub-systems to operate in conventional systems using all of their capabilities, additional knowledge must be designed into the sub-systems to provide communication over existing communication systems. This makes sub-systems more expensive and less flexible in the event the sub-system is later required to communicate with new sub-systems or components. Existing communication approaches thus do not meet the requirements of today's large, complex electronics systems. Therefore, it is desirable for a communications system and mechanism to allow sub-systems of a large, complex electronics system to inter-operate efficiently regardless of their varying performance characteristics and requirements.

SUMMARY OF THE INVENTION

One embodiment of the present invention includes a shared communications bus for providing flexible communication capability between electronic sub-systems. One embodiment includes a protocol that allows for identification of data transmissions at different levels of detail as required by a particular sub-system without additional knowledge being designed into the sub-system.
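The "different levels of detail" idea above can be sketched as a mapping from a wide, global connection identifier down to a narrow, local thread identifier at an interface module. The first-come slot-assignment policy below is an illustrative assumption, not the patent's mechanism:

```python
class ThreadIdAllocator:
    """Map global connection IDs to a small pool of local thread IDs,
    so a functional block needing less detail sees only a narrow ID."""

    def __init__(self, num_threads):
        self.num_threads = num_threads
        self.mapping = {}  # conn_id -> local thread id

    def local_id(self, conn_id):
        """Return the local thread id for a global connection id,
        allocating a free slot on first use."""
        if conn_id not in self.mapping:
            if len(self.mapping) >= self.num_threads:
                raise RuntimeError("no free thread slots")
            self.mapping[conn_id] = len(self.mapping)
        return self.mapping[conn_id]

alloc = ThreadIdAllocator(num_threads=4)
assert alloc.local_id(0x2A) == 0  # first connection gets thread 0
assert alloc.local_id(0x17) == 1  # second connection gets thread 1
assert alloc.local_id(0x2A) == 0  # same connection, same thread id
```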

One embodiment of the invention includes several functional blocks, including at least one initiator functional block and one target functional block. Some initiator functional blocks may also function as target functional blocks. In one embodiment, the initiator functional block is coupled to an initiator interface module and the target functional block is coupled to a target interface module. The initiator functional


block and the target functional block communicate to their respective interface modules and the interface modules communicate with each other. The initiator functional block communicates with the target functional block by establishing a connection, wherein a connection is a logical state in which data may pass between the initiator functional block and the target functional block.

One embodiment also includes a bus configured to carry multiple signals, wherein the signals include a connection identifier signal that indicates a particular connection that a data transfer between an initiator functional block and a target functional block is part of. The connection identifier includes information about the connection, such as which functional block is the source of a transmission, a priority of a transfer request, and transfer ordering information. One embodiment also includes a thread identifier, which provides a subset of the information provided by the connection identifier. In one embodiment, the thread identifier is an identifier of local scope that identifies transfers between an interface module and a connected functional block, where in some embodiments, an interface module connects a functional block to a shared communications bus.

The connection identifier is an identifier of global scope that transfers information between interface modules or between functional blocks through their interface modules. Some functional blocks may require all the information provided by the connection identifier, while other functional blocks may require only the subset of information provided by the thread identifier.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of one embodiment of a complex electronics system according to the present invention.
FIG. 2 is an embodiment of an interface module.
FIG. 3 is an embodiment of an interface module.
FIG. 4 is an embodiment of a communications bus.
FIG. 5 is a timing diagram showing pipelined write transfers.
FIG. 6 is a timing diagram showing rejection of a first pipelined write transfer and a successful second write transfer.
FIG. 7 is a timing diagram showing interleaving of pipelined read and write transfers.
FIG. 8 is a timing diagram showing interleaved connections to a single target.
FIG. 9 is a timing diagram showing interleaved connections from a single initiator.
FIG. 10 is a block diagram of one embodiment of part of a computer system.
FIG. 11 is one embodiment of a communications bus.
FIG. 12 is a block diagram of one embodiment of part of a computer system.

DETAILED DESCRIPTION

The present invention is a communications system and method for allowing multiple functional blocks or sub-systems of a complex electronics system to communicate with each other through a shared communications resource, such as a shared communications bus. In one embodiment, a communications protocol allows multiple functional blocks on a single semiconductor device to communicate with each other. In another embodiment, the communications protocol may be used to allow multiple functional blocks on different semiconductor devices to communicate with each other through a shared off-chip communications resource, such as a bus.

In one embodiment, the present invention is a pipelined communications bus with separate command, address, and data wires. Alternative embodiments include a pipelined communications bus with multiplexed address, data, and control signals. The former embodiment offers higher performance and simpler control than the latter embodiment at the expense of extra wires. The former embodiment may be more appropriate for on-chip communications, where wires are relatively less expensive and performance requirements are usually higher. The latter embodiment offers higher per-wire transfer efficiency, because it shares the same wires among address and data transfers. The latter embodiment may be more appropriate for chip-to-chip communications between semiconductor devices, because package pins and board traces increase the per-signal cost, while total required communications performance is usually lower.

FIG. 1 is a block diagram of a complex electronics system 100. Shared communications bus 112 connects sub-systems 102, 104, 106, 108, and 110. Sub-systems are typically functional blocks including an interface module for interfacing to a shared bus. Sub-systems may themselves include one or more functional blocks and may or may not include an integrated or physically separate interface module. In one embodiment, the sub-systems connected by communications bus 112 are separate integrated circuit chips. Sub-system 104 is an application specific integrated circuit (ASIC) which, as is known, is an integrated circuit designed to perform a particular function. Sub-system 106 is a dynamic random access memory (DRAM). Sub-system 108 is an erasable, programmable, read only memory (EPROM). Sub-system 110 is a field programmable gate array (FPGA). Sub-system 102 is a fully custom integrated circuit designed specifically to operate in system 100. Other embodiments may contain additional sub-systems of the same types as shown, or other types not shown. Other embodiments may also include fewer sub-systems than the sub-systems shown in system 100. Integrated circuit 102 includes sub-systems 102A, 102B, 102C, 102D and 102E. ASIC 104 includes functional blocks 104A, 104B and 104C. FPGA 110 includes functional blocks 110A and 110B. A functional block may be a particular block of logic that performs a particular function. A functional block may also be a memory component on an integrated circuit.

System 100 is an example of a system that may consist of one or more integrated circuits or chips. A functional block may be a logic block on an integrated circuit such as, for example, functional block 102E, or a functional block may also be an integrated circuit such as fully custom integrated circuit 102 that implements a single logic function.

Shared communications bus 112 provides a shared communications bus between sub-systems of system 100. Shared communications bus 114 provides a shared communications bus between sub-systems or functional blocks on a single integrated circuit. Some of the functional blocks shown are connected to interface modules through which they send and receive signals to and from shared communications bus 112 or shared communications bus 114. Interconnect 115 is a local point-to-point interconnect for connecting interface modules to functional blocks.

Interface modules 120-127 are connected to various functional blocks as shown. In this embodiment, interface modules 120, 122, 123 and 124 are physically separated from their connected functional block (A, B, C, E and F,


respectively). Interface modules 121, and 125-128 are essentially part of their respective functional blocks or sub-systems. Some functional blocks, such as 102D, do not require a dedicated interface module. The arrangement of sub-systems, functional blocks and interface modules is flexible and is determined by the system designer.

In one embodiment there are four fundamental types of functional blocks. The four fundamental types are initiator, target, bridge, and snooping blocks. A typical target is a memory device; a typical initiator is a central processing unit (CPU). A typical bridge might connect shared communications buses 112 and 114. Functional blocks all communicate with one another via shared communications bus 112 or shared communications bus 114 and the protocol of one embodiment. Initiator and target functional blocks may communicate with a shared communications bus through interface modules. An initiator functional block may communicate with a shared communications bus through an initiator interface module and a target functional block may communicate with a shared communications bus through a target interface module.

An initiator interface module issues and receives read and write requests to and from functional blocks other than the one with which it is associated. In one embodiment, an initiator interface module is typically connected to a CPU, a digital signal processing (DSP) core, or a direct memory access (DMA) engine.

FIG. 2 is a block diagram of an embodiment of an initiator interface module 800. Initiator interface module 800 includes clock generator 802, data flow block 806, arbitrator block 804, address/command decode block 808, configuration registers 810, and synchronizer 812. Initiator interface module 800 is connected to a shared communications bus 814 and to an initiator functional block 816. In one embodiment, shared communications bus 814 is a shared communications bus that connects sub-systems, as bus 112 does in FIG. 1.

Clock generator 802 is used to perform clock division when initiator functional block 816 runs synchronously with respect to shared communications bus 814 but at a different frequency. When initiator functional block 816 runs asynchronously with respect to communications bus 814, clock generator 802 is not used, but synchronizer 812 is used.

Arbitrator block 804 performs arbitration for access to shared communications bus 814. In one embodiment, a multi-level arbitration scheme is used wherein arbitrator module 804 includes logic circuits that manage pre-allocated bandwidth aspects of first level arbitration and also logic that manages second level arbitration. Data flow block 806 includes data flow first-in first-out (FIFO) buffers between shared communications bus 814 and initiator functional block 816, in addition to control logic associated with managing a transaction between shared communications bus 814 and initiator functional block 816. The FIFO buffers stage both the address and data bits transferred between shared communications bus 814 and initiator functional block 816. In one embodiment, shared communications bus 814 implements a memory mapped protocol. Specific details of an underlying computer bus protocol are not significant to the invention, provided that the underlying computer bus protocol supports some operation concurrency. A preferred embodiment of a bus protocol for use with the present invention is one that supports retry transactions or split transactions, because these protocols provide a mechanism to deliver operation concurrency by interrupting a multi-cycle transaction to allow transfers belonging to other unrelated transactions to take place. These protocols allow for higher transfer efficiencies because independent transactions may use the bus while an initiator waits for a long latency target to return data that has been previously requested by an initiator.

Address/command decode block 808 decodes an address on shared communications bus 814 to determine if a write is to be performed to registers associated with initiator functional block 816. Address/command decode block 808 also decodes incoming commands. Configuration registers 810 store bits that determine the state of module 800, including bandwidth allocation and client address base. One register 810 stores an identification (ID) which is a set of bits uniquely identifying initiator functional block 816.

FIG. 3 is a block diagram of an embodiment of a target interface module 900. Target interface module 900 is connected to shared communications bus 914 and to target functional block 918. Target interface module 900 includes clock generator 902, data flow block 906, address/command decode block 908, synchronizer 912, and state registers in state control block 916. Blocks of target interface module 900 that are named similarly to blocks of initiator module 800 function in substantially the same way as explained with respect to initiator block 800. State registers and state control block 916 include registers that store, for example, client address base and an identifier for target functional block 918.

In one embodiment, an initiator functional block such as initiator functional block 816 may also act as a target functional block in that it has the capability to respond to signals from other functional blocks or sub-systems as well as to initiate actions by sending signals to other functional blocks or sub-systems.

FIG. 4 is a block diagram of a part of a computer system 1000 according to one embodiment. FIG. 4 is useful in illustrating multilevel connection identification. System 1000 includes initiator functional block 1002, which is connected to initiator interface module 1004 by interconnect 1010. Initiator interface module 1004 is connected to target interface module 1006 by shared communications bus 1012. Target interface module 1006 is connected to target functional block 1008 by an interconnect 1010. Typically, shared communications bus 1012 is analogous to shared communications bus 112 of FIG. 1 or to shared communications bus 114 of FIG. 1. Interconnects 1010 are typically analogous to interconnect 115 of FIG. 1 in that they connect functional blocks to interface modules and are point-to-point, rather than shared, interconnects. Interconnects 1010 are typically physically shorter than shared communications bus 1012 because of their local nature. As will be explained more fully below, system 1000 uses two different levels of connection identification depending upon the requirements of a particular functional block. "Global" connection identification information is sent on shared communications bus 1012, while "local" connection information, or thread identification information, is sent in interconnects 1010.

FIG. 5 is a block diagram of one embodiment of a shared communications bus 1012. Shared communications bus 1012 is shown connected to entities A, B, C, D and E, which may be interface modules, functional blocks, or a combination of both. Shared communications bus 1012 is composed of a set of wires. Data wires 230 provide direct, high efficiency transport of data traffic between functional blocks on shared communications bus 1012. In one embodiment, shared communications bus 1012 supports a bus protocol that is a framed, time division multiplexed, fully pipelined,


fixed latency communication protocol using separate address, data and connection identification wires. The bus protocol supports fine-grained interleaving of transfers to enable high operation concurrency, and uses retry transactions to efficiently implement read transactions from target devices with long or variable latency. Details of the arbitration method used to access shared communications bus 1012 are not required to understand the present invention. The delay from when an initiator functional block drives the command and address until the target functional block drives the response is known as the latency of shared communications bus 1012. The bus protocol supports arbitration among many initiator functional blocks and target functional blocks for access to the bus. In the embodiment shown, arbitration for access to shared communications bus 1012 is performed by an initiator interface module, such as module 1004 of FIG. 4. In other embodiments, arbitration is performed by functional blocks directly, or by a combination of interface modules and functional blocks. In one embodiment, a bus grant lasts for one pipelined bus cycle. The protocol does not forbid a single functional block from becoming a bus owner for consecutive bus cycles, but does require that the functional block successfully win arbitration on consecutive cycles to earn the right.

Shared communications bus 1012 includes separate address, data, and control wires. Other embodiments may include multiplexed address, data, and control signals that share a wire or wires. Such an embodiment would provide high per-wire transfer efficiency because wires are shared among address and data transfers. A non-multiplexed embodiment of shared communications bus 1012 may be more appropriate for communication between functional blocks on a single integrated circuit chip because wires are relatively inexpensive and performance requirements are usually higher on a single integrated circuit chip.

Clock line 220 is a global signal wire that provides a time reference signal to which all other shared communications bus 1012 signals are synchronized. Reset line 222 is a global signal wire that forces each connected functional block into a default state from which system configuration may begin. Command line 224 carries a multi-bit signal driven by an initiator bus owner. In various embodiments, the multi-bit command signal may convey various types of information. For example, a command signal may indicate a transfer type, information regarding duration of a connection, and expected initiator and target behavior during the connection. In one embodiment, the command signal includes one or more bits indicating the beginning and end of a connection. In one embodiment, for example, one bit may indicate the status of a connection. If the bit is zero, the current transfer is the final transfer in the connection. After the receipt of a zero connection status bit, the next receipt of a connection status bit that is a logic one indicates that the transfer is the first in a newly opened connection. Each subsequently received one connection status bit then indicates that the connection is still open.
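The connection-status bit described above (a zero marks the final transfer of a connection; the next one after a zero opens a new connection) can be decoded as follows. The function and list representation are illustrative, not from the patent:

```python
def segment_connections(status_bits):
    """Group transfer indices into connections: each connection ends at
    a 0 status bit; a trailing group without a 0 is still open."""
    connections, current = [], []
    for i, bit in enumerate(status_bits):
        current.append(i)
        if bit == 0:
            connections.append(current)  # 0 closes the connection
            current = []
    if current:
        connections.append(current)      # connection still open at end
    return connections

# Transfers 0-2 form one connection, 3-4 the next; transfer 5 opens a
# connection that has not yet been closed.
assert segment_connections([1, 1, 0, 1, 0, 1]) == [[0, 1, 2], [3, 4], [5]]
```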

Supported transfer types in this embodiment include, but are not limited to, read and write transfers. Address lines 228 carry a multi-bit signal driven by an initiator bus owner to specify the address of the object to be read or written during the current transfer. Response lines 232 carry a multi-bit signal driven by a target to indicate the status of the current transfer. Supported responses include, but are not limited to, the following responses. A NULL response indicates that the current transfer is to be aborted, presumably because the address does not select any target. A data valid and accepted (DVA) response indicates, in the case of a read, that the target is returning requested data on data lines 230. In the case of a write, a DVA response indicates that the target is accepting the provided data from data lines 230. A BUSY response indicates that the selected target has a resource conflict and cannot service the current request. In this case an initiator should reattempt the transfer again later. A RETRY response indicates that the selected target could not deliver the requested read data in time, but promises to do so at a later time. In this case an initiator must reattempt the transfer at a later time.

Connection identifier (CONNID) lines 226 carry a multi-bit signal driven by an initiator bus owner to indicate which connection the current transfer is part of. A connection is a logical state, established by an initiator, in which data may pass between the initiator and an associated target. The CONNID typically transmits information including the identity of the functional block initiating the transfer and ordering information regarding an order in which the transfer must be processed. In one embodiment, the information conveyed by the CONNID includes information regarding the priority of the transfer with respect to other transfers. In one embodiment the CONNID is an eight-bit code. An initiator interface module sends a unique CONNID along with an initial address transfer of a connection. Later transfers associated with this connection (for example, data transfers) also provide the CONNID value so both sender and receiver (as well as any device monitoring transfers on shared communications bus 1012) can unambiguously identify transfers associated with the connection. One advantage of using a CONNID is that transfers belonging to different transactions can be interleaved arbitrarily between multiple devices on a per cycle basis. In one embodiment, shared communications bus 1012 implements a fully pipelined protocol that requires strict control over transaction ordering in order to guarantee proper system operation. Without the use of a CONNID, ordering constraints within a particular transaction may be violated because transfers associated with a particular connection are not identified.
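As a sketch of the interleaving property described above, a receiver can demultiplex per-cycle interleaved transfers by CONNID while preserving order within each connection. The tuple encoding `(connid, payload)` is an illustrative assumption:

```python
def demux_by_connid(bus_transfers):
    """Split an interleaved transfer sequence into per-connection
    sequences, preserving order within each connection."""
    per_conn = {}
    for connid, payload in bus_transfers:
        per_conn.setdefault(connid, []).append(payload)
    return per_conn

# Two connections interleaved cycle by cycle on the shared bus: each
# receiver still observes its own connection's transfers in order.
bus = [(1, "a0"), (2, "b0"), (1, "a1"), (2, "b1"), (1, "a2")]
assert demux_by_connid(bus) == {1: ["a0", "a1", "a2"], 2: ["b0", "b1"]}
```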

Because a first command may be rejected by a BUSY response while a later command is already in flight, it is essential to provide mechanisms that allow full control over which commands complete. If such control is not present, ambiguous system behavior can result. For instance, if a single initiator interface module issues a sequence of dependent read and write commands, a BUSY response to one of the commands could result in later commands returning the wrong data. One solution to such problems is to avoid overlapping dependent commands. This solution, however, increases the latency of every dependent command in order to ensure proper results. The present invention uses a CONNID signal, in part, to allow overlapping of dependent commands. Therefore, use of a CONNID improves system performance and efficiency. Another benefit of the CONNID of the present invention is that communication system predictability is enhanced because it allows a shared functional block to respond to requests based upon quality of service guarantees that may vary between connections. For example, data requested to operate a computer display cannot tolerate unpredictable delay because delay causes the display to flicker. Therefore, the CONNID may be used to prioritize data requests from a display controller so that requests from the display controller to a common resource are serviced before other requests. The present invention also allows for flexible reconfiguration of the CONNID to retune system performance.

FIG. 6 is a timing diagram of a pipelined write transaction consisting of two write transfers on shared communications


US 6,182,183 B1

bus 1012. Reference may also be made to FIG. 5. A single pipelined bus transfer, as shown in FIG. 6, includes an arbitration cycle (not shown), followed by a command/address/CONNID (CMD 324/ADDR 328/CONNID 326) cycle (referred to as a request, or REQ cycle), and completed by a DATA 330/RESP 342 cycle (referred to as a response, or RESP cycle). In one embodiment, the number of cycles between a REQ cycle and a RESP cycle is chosen at system implementation time based upon the operating frequency and module latencies to optimize system performance. The REQ-RESP latency, in one embodiment, is two cycles and is labeled above the DATA 330 signal line on FIG. 6. Therefore, a complete transfer time includes four shared communications bus 1012 cycles: arbitration, request, delay, and response.

Two transfers are shown in FIG. 6. On cycle 1, initiator E drives REQ fields 340 to request a WRITE transfer to address ADDRE0. This process is referred to as issuing the transfer request. In one embodiment, a single target is selected to receive the write data by decoding an external address portion of ADDRE0. On cycle 3 (a REQ-RESP latency later), initiator E drives write data DATAE0 on the DATA wires; simultaneously, the selected target A drives RESP wires 342 with the DVA code, indicating that A accepts the write data. By the end of cycle 3, target A has acquired the write data, and initiator E detects that target A was able to accept the write data; the transfer has thus completed successfully.

Meanwhile (i.e., still in cycle 3), initiator E issues a pipelined WRITE transfer (address ADDRE1) to target A. The write data and target response for this transfer both occur on cycle 5, where the transfer completes successfully. Proper operation of many systems and sub-systems relies on the proper ordering of related transfers. Thus, proper system operation may require that the cycle 3 WRITE complete after the cycle 1 WRITE transfer. In FIG. 6, the CONNID field conveys crucial information about the origin of the transfer that can be used to enforce proper ordering. In a preferred embodiment of ordering restrictions, the initiator and target collaborate to ensure proper ordering, even during pipelined transfers. This is important because transfer pipelining reduces the total latency of a set of transfers (perhaps a single transaction), thus improving system performance (by reducing latency and increasing usable bandwidth).

According to the algorithm of one embodiment:

1. An initiator may issue a transfer Y: a) if transfer Y is the oldest, non-issued, non-retired transfer among the set of transfer requests it has with matching CONNID, or b) if all of the older non-retired transfers with matching CONNID are currently issued to the same target as transfer Y. If issued under this provision, transfer Y is considered pipelined with the older non-retired transfers.

2. A target that responds to a transfer X in such a way that the initiator might not retire the transfer must respond BUSY to all later transfers with the same CONNID as transfer X that are pipelined with X.

Note that an older transfer Y that is issued after a newer transfer X with matching CONNID is not considered pipelined with X, even if Y issues before X completes. This situation is illustrated in FIG. 7. If target A has a resource conflict that temporarily prevents it from accepting DATAE0 associated with the WRITE ADDRE0 from cycle 1, then A responds BUSY. Step 2 of the foregoing algorithm requires that A also reject (using BUSY) any other pipelined transfers from the same CONNID (in this case, CONNID 1), since the initiator cannot possibly know about the resource conflict until after the REQ-RESP latency has passed. Thus, target A must BUSY the WRITE ADDRE1 that is issued in cycle 3, because it has the same CONNID and was issued before the initiator could interpret the BUSY response to the first write transfer, and is thus a pipelined transfer. Furthermore, the second attempt (issued in cycle 4) of the WRITE ADDRE0 transfer is allowed to complete because it is not a pipelined transfer, even though it overlaps the cycle 3 WRITE ADDRE1 transfer.

Note that target A determines that the cycle 4 write is not pipelined with any earlier transfers because of when it occurs and which CONNID it presents, and not because of either the CMD or the ADDR values. Step 1 of the algorithm guarantees that an initiator will only issue a transfer that is the oldest non-issued, non-retired transfer within a given connection. Thus, once the first WRITE ADDRE0 receives the BUSY response in cycle 3, it is no longer issued, and so it becomes the only CONNID=1 transfer eligible for issue. It is therefore impossible for a properly operating initiator to issue a pipelined transfer in cycle 4, given that an initial cycle 1 transfer received a BUSY response and the REQ-RESP latency is two cycles.

One embodiment of the initiator maintains a time-ordered queue consisting of the desired transfers within a given CONNID. Each transfer is marked as non-issued and non-retired as it is entered into the queue. It is further marked as pipelined if the immediately older entry in the queue is non-retired and addresses the same target; otherwise, the new transfer is marked non-pipelined. Each time a transfer issues it is marked as issued. When a transfer completes (i.e., when the RESP cycle is finished) the transfer is marked non-issued. If the transfer completes successfully, it is marked as retired and may be deleted from the queue. If the transfer does not complete successfully, it will typically be re-attempted, and thus can go back into arbitration for re-issue. If the transfer does not complete successfully and it will not be re-attempted, then it should not be marked as retired until the next transfer, if it exists, is not marked as issued. This restriction prevents the initiator logic from issuing out of order. As the oldest non-retired transfer issues, it is marked as issued. This allows the second-oldest non-retired transfer, if it is marked as pipelined, to arbitrate to issue until the older transfer completes (and is thus marked as non-issued).
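The initiator-side issue rules (steps 1a and 1b of the algorithm) can be sketched for the transfers of a single CONNID kept in a time-ordered queue. Class and attribute names are illustrative, not taken from the patent:

```python
# Minimal sketch of initiator issue eligibility for one CONNID's
# time-ordered transfer queue. Names are assumptions for illustration.

class Transfer:
    def __init__(self, target):
        self.target = target      # target selected by this transfer
        self.issued = False       # currently issued (in flight)
        self.retired = False      # completed successfully

class InitiatorQueue:
    def __init__(self):
        self.queue = []           # oldest transfer first

    def add(self, transfer):
        self.queue.append(transfer)

    def may_issue(self, y):
        """Return (allowed, pipelined) for candidate transfer y."""
        if y.issued or y.retired:
            return (False, False)
        older = [t for t in self.queue[:self.queue.index(y)]
                 if not t.retired]
        # Step 1a: y may issue only if it is the oldest non-issued,
        # non-retired transfer, i.e. every older live transfer is issued.
        if all(t.issued for t in older):
            # Step 1b: if y issues behind older outstanding transfers to
            # the same target, y is considered pipelined with them.
            pipelined = bool(older) and all(t.target == y.target
                                            for t in older)
            return (True, pipelined)
        return (False, False)
```

In this sketch a second write to the same target becomes issuable, and pipelined, while the first write is still in flight, matching the cycle 1 / cycle 3 writes of FIG. 6.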

An embodiment of the target implementation maintains a time-ordered queue whose depth matches the REQ-RESP latency. The queue operates off the bus clock: the oldest entry in the queue is retired on each bus cycle; simultaneously, a new entry is added to the queue on each bus cycle. The CONNID from the current REQ phase is copied into the new queue entry. In addition, if the current REQ phase contains a valid transfer that selects the target (via the External Address), then "first" and "busy" fields in the new queue entry may be set; otherwise, the first and busy bits are cleared. The first bit will be set if the current transfer will receive a BUSY response (due to a resource conflict) and no earlier transfer in the queue has the same CONNID and has its first bit set. The first bit implies that the current transfer is the first of a set of potentially pipelined transfers that will need to be BUSY'd to enforce ordering. The busy bit is set if either the target has a resource conflict or one of the earlier transfers in the queue has the same CONNID and has the first bit set. This logic enforces the REQ-RESP pipeline latency, ensuring that the target accepts no pipelined


transfers until the initiator can react to the BUSY response to the transfer marked first.
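This target-side bookkeeping can be sketched as a small clocked model. The field names and the method interface are assumptions; the two-entry queue depth matches the two-cycle REQ-RESP latency of FIG. 6:

```python
# Sketch of the target's first/busy queue, whose depth equals the
# REQ-RESP latency. Names are illustrative, not from the patent.
from collections import deque

class TargetQueue:
    def __init__(self, req_resp_latency=2):
        # maxlen retires the oldest entry automatically on each append,
        # mirroring "the oldest entry in the queue is retired on each
        # bus cycle" while a new entry is added.
        self.queue = deque(maxlen=req_resp_latency)

    def clock(self, connid, selects_target, has_conflict):
        """Advance one bus cycle; return True if this REQ gets BUSY."""
        # BUSY is forced if an earlier in-flight transfer with the same
        # CONNID was marked "first" (it began a pipelined set).
        pipelined_block = any(
            e["connid"] == connid and e["first"] for e in self.queue)
        busy = selects_target and (has_conflict or pipelined_block)
        first = selects_target and has_conflict and not pipelined_block
        self.queue.append({"connid": connid, "first": first, "busy": busy})
        return busy
```

Replaying the FIG. 7 scenario against this sketch: the cycle 1 write hits a conflict and is marked first, the unrelated CONNID 2 read of cycle 2 is accepted, the pipelined cycle 3 write with CONNID 1 is forced BUSY, and the cycle 4 re-attempt is accepted once the first-marked entry has aged out of the queue.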

Application of the algorithm to the initiators and targets in the communication system provides the ability to pipeline transfers (which increases per-connection bandwidth and reduces total transaction latency) while maintaining transaction ordering. The algorithm therefore facilitates high per-connection performance. The fundamental interleaved structure of the pipelined bus allows for high system performance, because multiple logical transactions may overlap one another, thus allowing sustained system bandwidth that exceeds the peak per-connection bandwidths. For instance, FIG. 8 demonstrates a system configuration in which initiator E needs to transfer data to target A on every other bus cycle, while initiator D requests data from target B on every other bus cycle. Since the communication system supports fine interleaving (per bus cycle), the transactions are composed of individual transfers that issue at the natural data rate of the functional blocks; this reduces buffering requirements in the functional blocks, and thus reduces system cost. The total system bandwidth in this example is twice the peak bandwidth of any of the functional blocks, and thus high system performance is realized.

The present invention adds additional system-level improvements in the area of efficiency and predictability. First, the connection identifier allows the target to be selective in which requests it must reject to preserve in-order operation. The system need only guarantee ordering among transfers with the same CONNID, so the target must reject (using BUSY) only pipelined transfers. This means that the target may accept transfers presented with other CONNID values even while rejecting a particular CONNID. This situation is presented in FIG. 9, which adds an interleaved read transfer from initiator D to the pipelined write transfer of FIG. 7. All four transfers in FIG. 9 select target A, and A has a resource conflict that prevents successful completion of the WRITE ADDRE0 that issues in cycle 1. While the rejection of the first write prevents A from accepting any other transfers from CONNID 1 until cycle 4, A may accept the unrelated READ ADDRD0 request of cycle 2 if A has sufficient resources. Thus, overall system efficiency is increased, since fewer bus cycles are wasted (as would be the case if target A could not distinguish between connections).

Second, in one embodiment the connection identifier allows the target to choose which requests it rejects. The target may associate meanings such as transfer priority with the CONNID values, and therefore decide which requests to act upon based upon a combination of the CONNID value and the internal state of the target. For instance, a target might have separate queues for storing transfer requests of different priorities. Referring to FIG. 9, the target might have a queue for low-priority requests (which present with an odd CONNID) and a queue for high-priority requests (which present with an even CONNID). Thus, the CONNID 1 WRITE ADDRE0 request of cycle 1 would be rejected if the low-priority queue were full, whereas the CONNID 2 READ ADDRD0 transfer could be completed successfully based upon available high-priority queue resources. Such differences in transfer priorities are very common in highly-integrated electronic systems, and the ability for the target to deliver higher quality of service to higher-priority transfer requests adds significantly to the overall predictability of the system.
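The odd/even priority scheme just described can be sketched as follows. The queue-capacity model and method names are illustrative assumptions, not part of the patent:

```python
# Sketch of a target with separate bounded queues for low-priority
# (odd CONNID) and high-priority (even CONNID) requests. Queue sizes
# and names are arbitrary assumptions for illustration.
class PriorityTarget:
    def __init__(self, low_slots=1, high_slots=1):
        self.slots = {"low": low_slots, "high": high_slots}

    def request(self, connid):
        """Accept (True) or BUSY (False) based on the CONNID's queue."""
        cls = "high" if connid % 2 == 0 else "low"
        if self.slots[cls] == 0:
            return False          # corresponding queue full -> BUSY
        self.slots[cls] -= 1      # occupy a slot in that queue
        return True
```

With the low-priority queue full, the CONNID 1 write is rejected while the CONNID 2 read still completes, as in the FIG. 9 discussion above.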

As FIG. 9 implies, the algorithm described above allows a target to actively satisfy transfer requests from multiple CONNID values at the same time. Thus, there may be multiple logical transactions in flight to and/or from the same target, provided that they have separate CONNID values. Thus, the present invention supports multiple connections per target functional block.

Additionally, an initiator may require the ability to present multiple transactions to the communications system at the same time. Such a capability is very useful for initiators such as direct memory access (DMA) devices, which transfer data between two targets. In such an application, the DMA initiator would present a read transaction using a first CONNID to a first target that is the source of the data, and furthermore present a write transaction using a second CONNID to a second target that is the data destination. At the transfer level, the read and write transfers could be interleaved. This reduces the amount of data storage in the DMA initiator, thus reducing system cost. Such an arrangement is shown in FIG. 10, where initiator E interleaves pipelined read transfers from target A with pipelined write transfers to target B. Thus, the present invention supports multiple connections per initiator functional block.

The control structures required to support implementation of the present invention, as described above with respect to the algorithm, are simple and require much less area than the data buffering area associated with traditional protocols that do not provide efficient fine interleaving of transfers. Thus, the present invention minimizes communication system area and complexity, while delivering high performance and flexibility.

Finally, the CONNID values that are associated with particular initiator transactions should typically be chosen to provide useful information such as transfer priorities but also to minimize implementation cost. It is useful to choose the specific CONNID values at system design time, so the values can be guaranteed to be unique and can be ordered to simplify comparison and other operations. Furthermore, it is frequently useful to be able to change the CONNID values during operation of the communications system so as to alter the performance and predictability aspects of the system. Preferred implementations of the present invention enable flexible system configuration by storing the CONNID values in ROM or RAM resources of the functional blocks, so they may be readily re-configured at either system build time or system run time.

FIG. 11 shows an interconnect 1010, which is a point-to-point interconnect as shown in FIG. 4. Interconnect 1010 includes additional signals as compared to the protocol described with reference to FIG. 5. As will be explained below, some of the additional signals are particularly useful as signals sent over point-to-point interconnects such as interconnects 1010. The protocol of interconnect 1010 controls point-to-point transfers between a master entity 1102 and a slave entity 1104 over a dedicated (non-shared) interconnect. Referring to FIG. 4, a master entity may be, for example, initiator functional block 1002 or target interface module 1006. A slave entity may be, for example, initiator interface module 1004 or target functional block 1008.

Signals shown in FIG. 11 are labeled with signal names. In addition, some signal names are followed by a notation or notations in parentheses or brackets. The notations are as follows:

(I) The signal is optional and is independently configurable.

(A) The signal must be configured together with signals having similar notations.

(A1) The signal is independently configurable if (A) interface modules exist.

[#] Maximum signal width.


The clock signal is the clock of a connected functional block. The command (Cmd) signal indicates the type of transfer on the interconnect. Commands can be issued independent of data. The address (Addr) signal is typically an indication of a particular resource that an initiator functional block wishes to access. Request Accept (ReqAccept) is a handshake signal whereby slave 1104 allows master 1102 to release Cmd, Addr and DataOut from one transfer and reuse them for another transfer. If slave 1104 is busy and cannot participate in a requested transfer, master 1102 must continue to present Cmd, Addr and DataOut. DataOut is data sent from a master to a slave, typically in a write transfer. DataIn typically carries read data.

Response (Resp) and DataIn are signals sent from slave 1104 to master 1102. Resp indicates that a transfer request that was received by slave 1104 has been serviced. Response Accept (RespAccept) is a handshake signal used to indicate whether the master allows the slave to release Resp and DataIn.

Signals Clock, Cmd, Addr, DataOut, ReqAccept, Resp, DataIn, and RespAccept, in one embodiment, make up a basic set of interface module signals. For some functional blocks, the basic set may be adequate for communication purposes.

In other embodiments, some or all of the remaining signals of bus 1012 may be used. In one embodiment, Width is a three-bit signal that indicates a width of a transfer and is useful in a connection that includes transfers of variable width. Burst is a multibit signal that allows individual commands to be associated within a connection. Burst provides an indication of the nature of future transfers, such as how many there will be and any address patterns to be expected. Burst has a standard end marker. Some bits of the Burst field are reserved for user-defined fields, so that a connection may be ignorant of some specific protocol details within a connection.

Interrupt and error signals are an important part of most computer systems. Interrupt and error signals generated by initiator or target functional blocks are shown, but the description of their functionality is dependent upon the nature of a particular functional block and is not important to understanding the invention.

Request Thread Identifier (ReqThreadID), in one embodiment, is a four-bit signal that provides the thread number associated with a current transaction intended for slave 1104. All commands executed with a particular thread ID must execute in order with respect to one another, but they may execute out of order with respect to commands from other threads. Response Thread Identifier (RespThreadID) provides a thread number associated with a current response. Because responses in a thread may return out of order with respect to other threads, RespThreadID is necessary to identify which thread's command is being responded to. In one embodiment, ReqThreadID and RespThreadID are optional signals, but if one is used, both must be used.

Request Thread Busy (ReqThreadBusy) allows the slave to indicate to the master that it cannot take any new requests associated with certain threads. In one embodiment, the ReqThreadBusy signal is a vector having one signal per thread, and an asserted signal indicates that the associated thread is busy.

Response Thread Busy (RespThreadBusy) allows the master to indicate to the slave that it cannot take any responses (e.g., on reads) associated with certain threads. The RespThreadBusy signal is a vector having one signal per thread, and an asserted signal indicates that the associated thread is busy.

Request Connection Identifier (ReqConnID) provides the CONNID associated with the current transaction intended for the slave. CONNIDs provide a mechanism by which a system entity may associate particular transactions with the system entity. One use of the CONNID is in establishing request priority among various initiators. Another use is in associating actions or data transfers with initiator identity rather than the address presented with the transaction request.

The embodiment of FIG. 11 provides end-to-end connection identification with CONNID as well as point-to-point, or more local, identification with Thread ID. A Thread ID is an identifier of local scope that simply identifies transfers between the interface module and its connected functional block. In contrast, the CONNID is an identifier of global scope that identifies transfers between two interface modules (and, if required, their connected functional blocks).

A Thread ID should be small enough to directly index tables within the connected interface module and functional block. In contrast, there are usually more CONNIDs in a system than any one interface module is prepared to simultaneously accept. Using a CONNID in place of a Thread ID requires expensive matching logic in the interface module to associate a returned CONNID with specific requests or buffer entries.

Using a networking analogy, the Thread ID is a level-2 (data link layer) concept, whereas the CONNID is more like a level-3 (transport/session layer) concept. Some functional blocks only operate at level-2, so it is undesirable to burden the functional block or its interface module with the expense of dealing with level-3 resources. Alternatively, some functional blocks need the features of level-3 connections, so in this case it is practical to pass the CONNID through to the functional block.

Referring to FIG. 4, a CONNID is required to be unique when transferred between interface modules 1004 and 1006 on shared communications bus 1012. The CONNID may be sent over a local interconnect, such as interconnect 1010. In many cases, however, it is much more efficient to use only a Thread ID between a functional block and its interface module. For example, initiator functional block 1002 may not require all the information provided by the CONNID. Also, in some systems, multiple identical initiator functional blocks 1002 may exist with the same CONNID, so that a particular target functional block 1008 receiving a transfer will not know which connection it is actually part of unless logic in initiator interface module 1004 translates the "local" CONNID to a unique "global" CONNID. The design and implementation of such a translation functionality in an interface module is complicated and expensive. In such cases, the CONNID may be sent between interface modules over shared communications bus 1012 while the Thread ID is sent between a functional block and an interface module.

In the case of an initiator functional block, a one-to-one static correspondence may exist between Thread ID and CONNID. For example, if the Thread ID is "1", a single CONNID is mapped for a particular interface module, solving the problem of multiple, identical functional blocks.

In the case of a target functional block, there is a one-to-one dynamic correspondence between a Thread ID and a CONNID. If a target functional block supports two simultaneous threads, the target interface module acquires the CONNID of an open connection and associates it with a thread as needed. For example, a target interface module receives a CONNID of "7", and then maps CONNID 7 to thread "0". Thereafter, all transfers with CONNID 7 are associated with thread 0 until connection 7 is closed.
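A minimal sketch of this dynamic CONNID-to-thread association, assuming a two-thread target interface module; class and method names are hypothetical:

```python
# Sketch of a target interface module that acquires the CONNID of an
# open connection and binds it to a free local thread, releasing the
# thread when the connection closes. Names are illustrative only.
class TargetInterfaceModule:
    def __init__(self, num_threads=2):
        self.thread_of = {}                  # open CONNID -> thread id
        self.free = list(range(num_threads)) # unbound thread ids

    def thread_for(self, connid):
        """Map an open connection's CONNID to a local thread id."""
        if connid not in self.thread_of:
            self.thread_of[connid] = self.free.pop(0)  # acquire a thread
        return self.thread_of[connid]

    def close(self, connid):
        """Close the connection and release its thread for reuse."""
        self.free.append(self.thread_of.pop(connid))
```

This mirrors the example above: CONNID 7 binds to thread 0 and every CONNID 7 transfer maps to thread 0 until the connection is closed, after which the thread is free for another connection.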


Referring to FIG. 12 for an example of the use of a Thread ID, consider a series of identical direct memory access (DMA) engines in a system. In FIG. 12, elements 1202 are identical DMA engines, each connected to an initiator interface module 1204. Initiator interface modules 1204 are connected to shared communications bus 1212. Target interface module 1206 is also connected to shared communications bus 1212 and transmits data from bus 1212 to DRAM controller 1208, which is a target functional block. Target interface module 1206 is connected to DRAM controller 1208 by interconnect 1214. DRAM controller 1208 controls access to DRAM 1213.

A DMA engine is an example of an initiator functional block that also functions as a target functional block. When the DMA engine is programmed by software, it acts as a target. Thereafter, the DMA engine is an initiator. Because a DMA engine performs both read and write operations, two connections can be associated with a single DMA engine. If some buffering is available in the DMA engine, read and write operations may be decoupled so that both types of operations can be performed concurrently. A read may occur from a long-latency storage device, which requires the read data to be buffered on the DMA engine before a write operation writes the data. In one embodiment, each of DMA engines 1202 uses a Thread ID to identify the read stream and a different Thread ID to identify the write stream. The DMA engine does not require more information, such as what other functional block participates in a transaction. Therefore, a CONNID is not required to be sent from the DMA engine 1202 to a connected interface module 1204. Mapping of a Thread ID to a CONNID occurs in the interface module 1204.

In one embodiment, each initiator interface module 1204 maps a unique CONNID to each of two Thread IDs from a connected DMA engine 1202. Each of DMA engines 1202 uses a single bit, for example the Thread ID of FIG. 11, to distinguish between its two threads. For each transfer over shared communications bus 1212, a unique CONNID is sent to target interface module 1206. The CONNID may include priority information, for example, assigning high priority to requests for graphics data. The high-priority graphics data request is immediately serviced by DRAM controller 1208 while lower-priority requests may be required to wait.
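The per-module mapping just described, where every interface module turns its DMA engine's two Thread IDs (read stream and write stream) into system-unique CONNIDs, can be sketched as follows. The particular CONNID numbering is an assumption chosen only to make the values unique:

```python
# Sketch of static Thread-ID-to-CONNID maps for a row of identical DMA
# engines, one map per initiator interface module. The numbering scheme
# (2*engine for reads, 2*engine+1 for writes) is a hypothetical choice;
# the patent only requires that CONNIDs on the shared bus be unique.
def build_connid_maps(num_dma_engines):
    """Give every (engine, thread) pair a system-unique CONNID."""
    maps = []
    for engine in range(num_dma_engines):
        maps.append({0: 2 * engine,        # thread 0: read stream
                     1: 2 * engine + 1})   # thread 1: write stream
    return maps
```

Because the mapping lives in the interface module, identical DMA engines can reuse the same two local Thread IDs while still presenting distinct connections on the shared bus.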

Because intelligence is designed into the interface modules and the communications protocols, less intelligence is required of functional blocks such as the DRAM controller 1208 and the DMA engines 1202. This has the advantage of making functional blocks more portable or reusable as systems evolve. For example, a DMA engine used for a high-priority application may be switched with a DMA engine used for a lower-priority application simply by changing their respective connected interface modules.

In one embodiment, target and initiator interface modules are programmed at the transistor level so that their precise function, including their CONNID assignment, is fixed at power-up. In another embodiment, the design of interface modules is in RAM so that the interface module is a reprogrammable resource. In this case, the interface module is reprogrammed, including reassignment of CONNIDs, by software.

The present invention has been described in terms of specific embodiments. For example, embodiments of the present invention have been shown as systems of particular configurations, including communications buses using particular protocols. One of ordinary skill in the art will recognize that modifications may be made without departing from the spirit and scope of the invention as set forth in the claims. For example, the present invention may be used in systems employing shared communications structures other than buses, such as rings, cross-bars, or meshes.

What is claimed is:

1. A communication system comprising:

at least two functional blocks, wherein an initiator functional block of the at least two functional blocks sends transfer requests to a target functional block of the at least two functional blocks, said target functional block responding to the transfer requests, by establishing a connection, wherein a connection is a logical state in which data may pass between the first functional block and the second functional block;

a communication medium configured to carry a plurality of signals, wherein the plurality of signals comprises a connection identifier that identifies a particular connection that a data transfer is part of;

an initiator interface module coupled to the initiator functional block and to the communication medium to transfer data between the initiator functional block and the communication medium, said initiator interface module mapping the connection identifier to a thread identifier that indicates a transaction stream that the data transfer is part of, the thread identifier communicated between the initiator interface module and the initiator functional block;

a target interface module coupled to the target functional block and to the communication medium to transfer data between the target functional block and the communication medium, said target interface module mapping the connection identifier to a thread identifier that indicates a transaction stream that the data transfer is part of, the thread identifier communicated between the target interface module and the target functional block;

the connection identifier sent with a transfer request from the initiator interface module to the target interface module and sent with data transfers between the target interface module and the initiator interface module.

2. The communication system of claim 1, further comprising at least one bus, each bus coupling an interface module to its associated functional block, the bus comprising a plurality of signal lines, wherein the thread identifier is communicated across at least one of the plurality of signal lines.

3. The communication system of claim 1, wherein the thread identifier is sent from the target interface module to the target functional block and from the initiator interface module to the initiator functional block.

4. A communication system comprising:

at least two functional blocks, wherein a first functional

block communicates with a second functional block by establishing a connection, wherein a connection is a logical state in which data may pass between the first functional block and the second functional block; and

a communication medium configured to carry a plurality of signals between interface modules;

an initiator functional block configured to send transfer requests;

an initiator interface module coupled to the initiator functional block and to the communication medium;

a target functional block that responds to transfer requests;

a target interface module coupled to the target functional block and to the communication medium;

a connection identifier configured to be sent with a transfer request from the initiator interface module to


the target interface module, the connection identifier comprising a multi-bit value that encodes information including a transfer priority, a transfer order, and a functional block that originated the transfer, wherein the connection identifier is one of a plurality of connection identifiers associated with the initiator functional block and is mapped to a thread identifier by the initiator interface module.

5. The communication system of claim 4, wherein the connection identifier is one of a plurality of connection identifiers associated with a target functional block that supports simultaneous connections, and wherein the target functional block acquires a connection identifier of an open connection and maps the connection identifier to a thread identifier.

18 allowing an initiator functional block to issue a first

transfer "Y" if the transfer "Y" is an oldest, non-issued, non-retired transfer among a set of transfer requests with a same connection identifier as the transfer "Y"; and

the target functional block giving a BUSY response to every later transfer that is pipelined with a transfer "X" and has a same connection identifier as the transfer "X" if the target functional block gives a busy response to the transfer "X" so that an initiator initiating the transfer"X" may not retire the transfer "X";

wherein a transfer "Y" that is issued after a transfer "X" is older than the transfer "X", and has a same connec­tion identifier as the transfer "X" is considered not pipelined with the transaction "X".

10. The method of claim 9, further comprising the step of allowing the initiator functional block to issue the transfer "Y" if every non-retired transfer with the same connection identifier is older than the transfer "Y" and is currently

6. The communication system as set forth in claim 4, 15

wherein the thread identifier is configured to be sent from the target interface module to the target functional block and from the initiator interface module to the initiator functional block, and the connection identifier is configured to be sent from the target interface module to the target functional block and from the initiator interface module to the initiator functional block.

20 issued to a same target functional block as the transfer "Y". 11. The method of claim 9, wherein if the transfer "Y" is

issued, the transfer "Y" is considered pipelined with the older, non-retired transfers. 7. The method as set forth in claim 4, further comprising

a thread identifier configured to be communicated across the communication medium, the thread identifier indicating that a transaction stream that the data transfer is part of;

wherein the connection identifier is mapped from a thread identifier by the initiator interface module.

12. The method of claim 11, wherein a target functional 25 block determines whether a transfer is a pipelined transfer

based upon when the transfer occurs and upon a connection identifier associated with the transfer.

8. A communication system comprising: 30 at least two functional blocks, wherein a first functional

block communicates with a second functional block by establishing a connection, wherein a connection is a logical state in which data may pass between the first functional block and the second functional block; and 35

a communication medium coupled to interface modules and configured to carry a first plurality of signals between modules;

at least one bus, each bus coupling an interface module to

13. The method of claim 9, further comprising the steps of:

an initiator functional block maintaining a time-ordered queue of desired transfers with a same connection identifier;

the initiating functional block marking a transfer as non­issued and non-retired as it is entered into the queue.

14. The method of claim 13, further comprising the steps of:

if a next oldest entry is non-retired and addresses a same target functional block, marking the transfer as pipe­lined; else

marking the transfer as non-pipelined. 15. The method of claim 14, further comprising the step

of, when a transfer issues, marking the transfer as issued. 16. The method of claim 15, further comprising the step

of, when a transfer is completed, marking the transfer as non-issued.

17. The method of claim 16, further comprising the step of, if the transfer is successfully completed, marking the transfer as retired;-and deleting the transfer from the queue.

18. The method of claim 17, further comprising the step of, if the transfer is not successfully completed, re-attempting the transfer.

its associated functional block, the bus comprising a 40

plurality of signal lines, wherein the plurality of signal lines comprises a thread identifier (ID) that indicates a transaction stream that the data transfer is part of, a request thread ID signal that indicates a thread number associated with a current transaction intended for a 45

target functional block, a response thread ID signal that indicates a thread that a transfer from the target func­tional block is part of, a request thread busy signal that indicates to an initiator functional block that the target functional block cannot receive new requests associ- 50

ated with certain threads, and a response thread busy signal that indicates that the initiator functional block cannot receive any new responses from the target functional block that are associated with certain threads.

19. The method of claim 9, further comprising the step of the target functional block maintaining a time-ordered queue

55 having a depth that is a number of bus clock cycles between a request for a transfer and a response to the request. 9. A method for communicating between a plurality of

functional blocks in a computer system, the method com­prising the steps of:

establishing a plurality of connection identifiers, wherein each connection identifier associates a particular data 60

transfer with a particular connection, wherein a con­nection is a logical state in which data may pass between an initiator functional block of a plurality of functional blocks and a target functional block of the plurality of functional blocks, and wherein a connec- 65

tion is established when a particular data transfer is initiated;

20. The method of claim 19, further comprising the steps of:

on each cycle of the bus clock, retiring an oldest entry in the time-ordered queue; and

on each cycle of the bus clock, adding a new entry to the time-ordered queue, including a connection identifier associated with a current request for a transfer.

21. The method of claim 20, further comprising the steps of:

if a current request for a transfer contain s a valid transfer that selects the target functional block, allowing a

Page 35: Sonics, Inc. - Complaint for Patent Infringement

US 6,182,183 B1 19

FIRST bit and a BUSY bit of an entry in the time­ordered queue to be set, wherein a set FIRST bit implies that an associated transfer is a first transfer of a set of

20 24. The method of claim 23, further comprising the step

of using a connection identifier to enforce ordering among transfers.

potentially pipelined transfers; else clearing the FIRST bit and the BUSY bit.

25. The method of claim 24, further comprising the step 5 of:

22. The method of claim 21, further comprising the step of setting the FIRST bit if:

no transfer in the time-ordered queue is earlier than a current transfer, has a same connection identifier as the current transfer and has an associated FIRST bit set; 10

and

the current transfer will receive a BUSY response due to a resource conflict.

23. The method of claim 21, further comprising the step 15

of setting the BUSY bit if:

the target functional block has a resource conflict; or

an earlier transfer in the time-ordered queue has an associated FIRST bit set and has a same connection identifier as a current transfer.

in response to a first request for a data transfer issued in a first bus cycle, the target functional block setting a BUSY bit in a first time-ordered queue entry, wherein a first connection identifier is associated with the first request; and

in response to a second request for a data transfer in a next bus cycle subsequent to the first bus cycle, the target functional block clearing a BUSY bit in a second time-ordered queue entry and performing an action in connection with executing the data transfer requested in the second request.

* * * * *
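One possible software reading of the initiator-side queue discipline recited in claims 13-18 (a transfer is marked non-issued and non-retired when it enters a per-connection time-ordered queue, marked pipelined when the next-oldest entry targets the same block, marked issued when it issues, and retired and removed on successful completion) is sketched below. This is an illustration only, not the claimed hardware method; `TransferEntry` and `InitiatorQueue` are invented names.

```python
# Illustrative sketch of the bookkeeping in claims 13-18; names are hypothetical.
from dataclasses import dataclass

@dataclass
class TransferEntry:
    target: str
    issued: bool = False
    retired: bool = False
    pipelined: bool = False

class InitiatorQueue:
    """Time-ordered queue of desired transfers sharing one connection identifier."""
    def __init__(self):
        self.entries = []

    def enqueue(self, target):
        # Claim 13: mark the transfer non-issued and non-retired as it enters.
        e = TransferEntry(target)
        # Claim 14: pipelined if the next-oldest entry is non-retired and
        # addresses the same target functional block; else non-pipelined.
        e.pipelined = bool(self.entries) and not self.entries[-1].retired \
            and self.entries[-1].target == target
        self.entries.append(e)
        return e

    def issue_oldest(self):
        # Claim 9: issue the oldest non-issued, non-retired transfer.
        for e in self.entries:
            if not e.issued and not e.retired:
                e.issued = True   # claim 15: mark as issued
                return e
        return None

    def complete(self, e, success):
        e.issued = False          # claim 16: completed -> non-issued
        if success:               # claim 17: retire and delete from the queue
            e.retired = True
            self.entries.remove(e)
        # claim 18: on failure the transfer stays queued for re-attempt

q = InitiatorQueue()
a = q.enqueue("dram")
b = q.enqueue("dram")
assert b.pipelined and not a.pipelined
q.complete(q.issue_oldest(), success=True)
assert q.issue_oldest() is b
```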


UNITED STATES PATENT AND TRADEMARK OFFICE

CERTIFICATE OF CORRECTION

PATENT NO. : 6,182,183 B1
DATED : January 30, 2001
INVENTOR(S) : Wingard et al.

Page 1 of 1

It is certified that error appears in the above-identified patent and that said Letters Patent is hereby corrected as shown below:

Column 13,
Line 18, delete "Data1n", insert -- DataIn --.

Column 16,
Line 11, delete "comection", insert -- connection --.
Line 31, delete "raping", insert -- mapping --.

Column 17,
Line 23, delete "method", insert -- communication system --.
Lines 23-24, delete "further comprising a", insert -- wherein the --.
Line 24, insert -- is -- before "configured".
Line 25, delete "that".
Line 26, delete "the" and insert -- a --.

Column 18,
Line 19, before "and", delete "-".

Signed and Sealed this

Twenty-sixth Day of November, 2002

Attest:

JAMES E. ROGAN
Attesting Officer
Director of the United States Patent and Trademark Office


Exhibit B


(12) United States Patent
Chou et al.

(54) METHOD AND APPARATUS FOR CONFIGURABLE ADDRESS MAPPING AND PROTECTION ARCHITECTURE AND HARDWARE FOR ON-CHIP SYSTEMS

(75) Inventors: Chien-Chun Chou, San Jose, CA (US); Jay Scott Tomlinson, San Jose, CA (US); Wolf-Dietrich Weber, San Jose, CA (US); Drew Eric Wingard, San Carlos, CA (US); Sricharan Kasetti, Palo Alto, CA (US)

(73) Assignee: Sonics, Inc., Mountain View, CA (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.

(21) Appl. No.: 10/288,973

(22) Filed: Nov. 5, 2002

(65) Prior Publication Data

US 2004/0088566 A1 May 6, 2004

(51) Int. Cl.
G06F 17/50 (2006.01)
G06F 17/00 (2006.01)

(52) U.S. Cl. ............................................... 716/1; 716/8
(58) Field of Classification Search .................... 716/1, 716/7, 8, 14, 16; 707/104.1
See application file for complete search history.

[Front-page drawing: segment and region registers for an example configuration with DataWidth: 16, AddrWidth: 20, NumSegments: 2, NumRegions: 5, NumProtectionKeys: 2 (3 region registers for segment 0, 2 region registers for segment 1).]

(10) Patent No.: US 7,266,786 B2
(45) Date of Patent: Sep. 4, 2007

(56) References Cited

U.S. PATENT DOCUMENTS

4,665,395 A 5/1987 Van Ness
5,948,089 A 9/1999 Wingard et al.
6,006,022 A * 12/1999 Rhim et al. .................... 716/1
6,023,565 A * 2/2000 Lawman et al. ............... 716/1
6,182,183 B1 1/2001 Wingard et al.
6,209,123 B1 * 3/2001 Maziasz et al. ............... 716/14
6,330,225 B1 12/2001 Weber et al.
6,367,058 B1 * 4/2002 Heile ............................ 716/7
6,516,456 B1 * 2/2003 Garnett et al. ................. 716/8
6,543,043 B1 * 4/2003 Wang et al. .................. 716/14
6,725,313 B1 4/2004 Wingard et al.

2003/0200520 A1 * 10/2003 Huggins et al. .............. 716/16
2004/0177186 A1 9/2004 Wingard et al.

* cited by examiner

Primary Examiner - Sam Rimell
(74) Attorney, Agent, or Firm - Blakely, Sokoloff, Taylor & Zafman LLP

(57) ABSTRACT

A method and apparatus of a configurable address mapping and protection architecture and hardware for on-chip systems have been described.

41 Claims, 7 Drawing Sheets



U.S. Patent Sep. 4, 2007 Sheet 1 of 7 US 7,266,786 B2

[FIG. 1: network environment 100 with Network 102 connecting Server 104-1 through Server 104-S and Client 108-1 through Client 108-C.]


[Sheet 2 of 7 - FIG. 2: computer system 200 with CPU 204, ROM 206, RAM 208, and Storage 210 on bus 202, plus Display 220, Audio 222, Keyboard 224, Pointer 226, Misc 228, and Comm 230 (229, 232).]


[Sheet 3 of 7 - FIG. 3: Processing Unit 1 (302-1) through Processing Unit N (302-N) and Service Module 0 (318-0) through Service Module M (318-M); a Request 304 carrying a Destination Address and Protection ID enters the Address Mapping and Protection Module 306 (Address Map 308, Protection Key Map 310), and the Request-Delivery Module 314 delivers Request 316 to the service modules.]


[Sheet 4 of 7 - FIG. 4: flow for generating optimized address mapping and protection hardware.

Configurable Address Mapping and Protection Architecture (423): definition of the address map; definition of the protection key map; definition of the set of configuration parameters; definition of the specification language.

Product Designer Starts (401) -> (1) Understand the Address Mapping and Protection Architecture Options (403) -> (2) Decide the configuration of the address mapping and protection module with the minimum HW based on product requirements (405) -> (3) Create the design specification using the language provided by the architecture (407), yielding the Minimum Specification of the address mapping and protection module (425) -> (4) Post process the "minimum specification" and generate the netlist (427) for the optimized address mapping and protection hardware module (409) -> Other processing for full chip (411).

Product Requirements and Use Models (443): the number of service modules; the address width and data width of each service module; the number of different address regions; the type of an address region: (a) never change, (b) invisible to SW, (c) changeable by SW.]


[Sheet 5 of 7 - FIG. 5: hardware implementation of the configurable address mapping and protection architecture. The Address Tag of a Request 501 (Segment Address 501a, Region Address 501b, offset inside a Region 501c) is matched against the Segment Registers 503 (SEGMENT(0), SEGMENT(1)) and against the matched Segment Register's Region Registers 505a-505c (REGION(0) through REGION(4)), producing a Matched Segment ID 509, a Single Match signal 511, and Service Module Information 513. The Protection ID Tag of a Request 515 is checked against the Protection Registers 517 (PROTECTION KEY(0), PROTECTION KEY(1)); the Match Protection Key ID 519 feeds the Security Check Okay signal 521.]


[Sheet 6 of 7 - FIG. 6: example read-only (RO), read-write (RW), and not-accessible (NA) configuration parameters for DataWidth: 16, AddrWidth: 20, NumSegments: 2, NumRegions: 5, NumProtectionKeys: 2. Segment registers SEGMENT(0) and SEGMENT(1) carry SegmentSize (RO) and SegmentBase (RW) fields; region registers REGION(0) through REGION(4) carry RAS, RegionDataWidth, RPTID, RPKRN, RegionEnable, RegionSize, and RegionBase fields, with 3 region registers for segment 0 and 2 region registers for segment 1.]


[Sheet 7 of 7 - FIG. 7: example not-accessible (NA) configuration parameters for DataWidth: 16, AddrWidth: 20, NumSegments: 2, NumRegions: 4, NumProtectionKeys: 2. The same segment and region register fields as FIG. 6 (SegmentSize, SegmentBase, RAS, RegionData, Width, RPTID, RPKRN, RegionEnable, RegionSize, RegionBase) are shown marked (NA), with 2 region registers for segment 0 and 2 region registers for segment 1. RegionDataWidth encodes 0 for 1 byte, 1 for 2 bytes, 2 for 4 bytes, 3 for 8 bytes.]


US 7,266,786 B2

METHOD AND APPARATUS FOR CONFIGURABLE ADDRESS MAPPING AND PROTECTION ARCHITECTURE AND HARDWARE FOR ON-CHIP SYSTEMS

FIELD OF THE INVENTION

The present invention pertains to on-chip systems. More particularly, the present invention relates to a method and apparatus for a configurable address mapping and protection architecture and hardware for on-chip systems.

BACKGROUND OF THE INVENTION

The operational model for most computer and/or on-chip systems involves the sending of requests from one or more processing units to one or more service modules in the system. Upon receiving a request (i.e., an instruction) from a processing unit, a service module completes the task as requested. Then, there may be responses to be returned from the service module back to the processing unit. It is also very common to have a component in the system to act as both a processing unit and a service module.

Many different ways may be used to deliver requests and responses between processing units and servicing modules. One of the most frequently used methods, for delivering the requests, is by addressing (plus, protection checking). For instance, a request is tagged with a "destination address" and a "source protection identification (ID)". The destination address tells where the service module(s) is (are) located, and/or how to deliver the request to the service module(s). The source protection ID identifies the processing unit and is used to determine whether the service module(s) should execute the request, or whether the request can be delivered to the service module(s), thus providing access to the service module selectively depending on source identity. Usually, the number of transistors (often referred to as "gates") and the resulting gate size (and thus area) of the hardware module (on, for example, an integrated circuit) devoted to address decoding and protection ID checking are comparatively large. Additional circuitry, which consumes more power, may also be needed in order to make this decoding and checking hardware dynamic (i.e., configurable) during operation. For a wireless device, especially, where the demand for a smaller chip die size and a lower power consumption is high, a large and power-consuming address decoding and protection-checking module is unacceptable. This presents problems.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention is illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements and in which:

FIG. 1 illustrates a network environment in which the method and apparatus of the present invention may be implemented;

FIG. 2 is a block diagram of a computer system;

FIG. 3 illustrates one embodiment of the invention showing in a block diagram form an on-chip system with N processing units, M service modules, an address mapping and protection module, and a request-delivery module;

FIG. 4 illustrates one embodiment of the invention showing in a flowchart form the process in generating optimized address mapping and protection hardware;

FIG. 5 illustrates one embodiment of the invention showing in block diagram form a hardware implementation for the configurable address mapping and protection architecture;

FIG. 6 illustrates one embodiment of the invention showing in table form, a definition of some possible read-only, read-write, or not-accessible configuration parameters; and

FIG. 7 illustrates one embodiment of the invention showing in table form, a definition of some possible not-accessible configuration parameters.

DETAILED DESCRIPTION

A method and apparatus for a configurable address mapping and protection architecture and hardware for on-chip systems are described.

FIG. 1 illustrates a network environment 100 in which the techniques described may be applied. The network environment 100 has a network 102 that connects S servers 104-1 through 104-S, and C clients 108-1 through 108-C. More details are described below.

FIG. 2 illustrates a computer system 200 in block diagram form, which may be representative of any of the clients and/or servers shown in FIG. 1. More details are described below.

The term IP as used in this document denotes Intellectual Property. The term IP may be used by itself, or may be used with other terms such as core, to denote a design having a functionality. For example, an IP core, or IP for short, may consist of circuitry, buses, communication links, a microprocessor, etc. Additionally, IP may be implemented in a variety of ways, and may be fabricated on an integrated circuit, etc. The term flooding is used to denote a communication in which an incoming packet is duplicated and sent out on every outgoing pathway throughout most of a chip, system, etc.

In this disclosure, a method and apparatus for a configurable address mapping and protection architecture and hardware for on-chip systems are described. In one embodiment of the invention, circuitry for providing the necessary address mapping and protection functionality is provided in hardware. In another embodiment the invention allows a product designer to configure the address mapping and protection module at design time, such that only the minimum specified mapping and protection is implemented into the hardware. Thus, the final gate size and power consumption of the address mapping and protection hardware module is determined by the specification of the product. Moreover, the address width and data word width for each of the service modules may also be considered and used to minimize the number of signal wires to/from the service modules. This may result in hardware that is not over designed and may more easily meet the gate count and power consumption requirements of a product.

The disclosed invention "configurable address mapping and protection architecture and hardware for on-chip systems" may provide:

1. A centralized, configurable address mapping and protection architecture for an on-chip system.

2. A set of configuration parameters that may lead to overall gate size reduction, power consumption reduction, and/or the elimination of unnecessary signal wires for a final address mapping and protection hardware module.
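The "set of configuration parameters" in item 2 is the kind of minimum specification that FIG. 6 captions with DataWidth: 16, AddrWidth: 20, NumSegments: 2, NumRegions: 5, NumProtectionKeys: 2. A hypothetical sketch of such a parameter set follows; the class name and the validation rules are assumptions for illustration, not taken from the patent:

```python
# Illustrative sketch only: a "minimum specification" of the configuration
# parameters named in the patent's FIG. 6 example. Names are hypothetical.
from dataclasses import dataclass

@dataclass(frozen=True)
class MinimumSpec:
    data_width: int          # data word width of a service module, in bits
    addr_width: int          # address width, in bits
    num_segments: int        # S address segments
    num_regions: int         # R address regions
    num_protection_keys: int # K protection key registers

    def validate(self):
        # Assumed sanity rules: every count is positive and each segment
        # owns at least one region (regions are grouped into segments).
        assert self.data_width > 0 and self.addr_width > 0
        assert self.num_segments >= 1
        assert self.num_regions >= self.num_segments
        assert self.num_protection_keys >= 1

spec = MinimumSpec(data_width=16, addr_width=20, num_segments=2,
                   num_regions=5, num_protection_keys=2)
spec.validate()
```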

Page 47: Sonics, Inc. - Complaint for Patent Infringement

US 7,266,786 B2 3

3. The ability for a designer to configure the address mapping and protection hardware module at design time using a specification language and achieve the goal of producing a minimized hardware implementa­tion.

FIG. 4 shows the processing flow that can be used in generating optimized address mapping and protection hard­ware. First, a product designer must understand the config­urable architecture (403). The architecture (423) includes the definition of the address mapping scheme, the definition of the protection scheme, the definition of the set of con­figuration parameters, and the definition of the specification language. Then, the designer can design the address map­ping and protection module with minimum hardware based upon the product requirements (40S). Based on the product requirements and user models, the designer should be able to decide, for instance, the number of service modules in the system, the number of address regions for a service module, and how the information of an address region is going to be used (443).

Next the designer needs to specify the address mapping and protection hardware module using the provided speci­fication language (407 and 42S). At the end (409), a post­processing tool, which takes the design specification as input, is used and generates an optimized hardware gate­level netlist ( 427) for the address mapping and protection hardware.

FIG. 3 shows a block diagram of one embodiment of the invention having an on-chip system with N processing units at the top (302-0, 302-1, through 302-N), and M service modules at the bottom (318-0, 318-1, through 318-M). In the middle of the system is an address mapping and protection module 306, showing the address mapping 308 and protec­tion key map 310 followed by a request-delivery module 314. Note that FIG. 3 only shows the request delivery side of the system (i.e., the request side and does not show the return side). In one embodiment of the invention, the use model of the system is the following:

A request is sent from a processing unit (such as 320-N)

4 may be up to S segments in the system. To send a request to a specific service module, a processing unit needs to tag the request with an address that is within an address region of the module. The information about an address region is kept in a region register stored in the address map hardware. Information about an address segment is kept in a segment register, which is also stored in the address map hardware.

There may be up to K different protection keys in the system; each protection key is kept in one protection key

10 register, which resides in the protection key map hardware. Each address region is associated with at least one protection key, and each processing unit is assigned with one or more protection ID. When a request is sent from a processing unit, the request is tagged with a protection ID (such as SlS

15 shown in FIG. S) of the processing unit. After the request's destination address is decoded and a service module's address region is identified, the protection ID is checked against those protection keys associated with that address region to see whether it is safe to forward the request to the

20 service module. FIG. S shows one embodiment of the invention having a

hardware implementation of the address mapping and pro­tection architecture mentioned above. In FIG. S, a request is shown with its destination address tag and protection ID tag

to the address mapping and protection module (306).

In the address mapping and protection module 306, the "destination address" and "source protection ID" are extracted out of the request (such as that illustrated at 304). The address is decoded and compared against the address map (308) to find out where the service module is and how to deliver the request to the module. The given protection ID is checked against the protection key map (310) to determine whether the request should be delivered to the service module because the protection allows it. Note that other possible fields inside a request, not shown in FIG. 3, are "request type field", "data field", "data type field" (e.g., to indicate that it is a burst data stream), and "user provided request information" (e.g., the user can use this field to provide a proprietary sub-request type).

Next, the request (such as that illustrated at 312 and 316) is sent to the service module (such as 318-2) by the request-delivery module (314).

One embodiment of the invention having a configurable address mapping and protection architecture using segmentation and address regions for the on-chip system mentioned above is described here. Conceptually, the address space for the entire on-chip system may be divided into R address regions. Each service module in the system may have multiple address regions. In order to reduce the complexity of matching to 1 of the R address regions, multiple address regions may be grouped into one address segment. There

(such as illustrated in FIG. 3 at 304). The destination address contains three parts: a segment (base) address part 501a, a region (base) address part 501b, and the offset within the region 501c. The address mapping hardware contains the S segment registers (503) and the R region registers (505a, b, and c, for example; and there can be more region pages as shown in FIG. 3). Each address segment may contain multiple address regions; this is illustrated by having one page of region registers associated with each segment register (pages 505a, 505b, and 505c are associated with, for example, segment register 0, segment register 1, and segment register 2, respectively). FIG. 5 also shows that the address segment 0 has five region registers (505a).

Moreover, the segment address of a request is used to match one of the segment registers (509), and the region address is used to match one or more of the region registers kept in the address map. Combining the two matching signals, in a normal case, one single match happens (511). Note that a duplicate match and a failed match may be detected, if desirable, as errors. If a single match occurs, the protection key register number associated with the region is returned (519), and the routing information for the targeted service module is also returned (513). The protection key register number (519) is used to filter out unrelated matches coming out of the protection key map module (517). A positive security okay signal (521) indicates that the request can be delivered to the servicing module.

Table 1 shows a summary of the combinations and the results of a normal matching case (i.e., a single address match is identified and the security check is also okay), and of the error cases. When a single match is identified, information (saved in the matched region register) about the destination service module is forwarded to the downstream modules. For instance:

The protection key ID is forwarded to the "Security Check Okay" circuit in order to complete the security check.

The destination service module's data word width and physical target ID (the physical target ID contains the physical location information of the service module) are forwarded to the request-delivery module.
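The matching flow just described (a single segment match, a single region match within the matched segment, then protection key filtering) can be sketched in software. The following Python model is illustrative only; the dictionary layout, field names, and example values are assumptions for this sketch, not structures taken from the patent.

```python
# Illustrative software model of the address-decode and protection-check
# flow: segment match -> region match -> protection key check.
def decode_request(addr, protection_id, segments, protection_keys):
    """Return (routing target, security okay) for a request, or raise on a
    duplicate/failed address match (the error cases of Table 1)."""
    seg_matches = [s for s in segments
                   if s["base"] <= addr < s["base"] + s["size"]]
    if len(seg_matches) != 1:
        raise ValueError("segment address map error")   # failed/double match
    seg = seg_matches[0]
    reg_matches = [r for r in seg["regions"] if r["enable"]
                   and r["base"] <= addr < r["base"] + r["size"]]
    if len(reg_matches) != 1:
        raise ValueError("region address map error")    # failed/double match
    region = reg_matches[0]
    key = protection_keys[region["key_num"]]            # RPKRN selects the key register
    ok = bool((key >> protection_id) & 1)               # bit N allows protection ID N
    return region["target"], ok                         # routing info + security okay

# Hypothetical one-segment, one-region map for demonstration.
segments = [{"base": 0x00000, "size": 0x10000, "regions": [
    {"base": 0x00100, "size": 256, "enable": True, "key_num": 0, "target": 1},
]}]
module, ok = decode_request(0x00140, 1, segments, protection_keys=[0x007B])
```

In hardware the segment and region comparisons happen in parallel against all registers at once; the sequential scan above is only a functional model of the same decision.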

Page 48: Sonics, Inc. - Complaint for Patent Infringement

US 7,266,786 B2 5

TABLE 1

Matching and checking   Normal Case         Error Case 1        Error Case 2        Error Case 3
on fields

Request's segment       Single match        Single match        Single match        Double match
(base) address

Request's region        Single match on     Single match on     Double match on     Don't care
(base) address          the region          the region          the region
                        register page of    register page of    register page of
                        the matched         the matched         the matched
                        segment             segment             segment

Request's               Okay                Not okay            Don't care          Don't care
protection ID

Results                 Information about   Protection          Error found in      Error found in
                        the destination     violation           the region          the segment
                        service module is                       address map         address map
                        generated and
                        forwarded to the
                        request-delivery
                        module


A set of configuration parameters and registers (plus register fields in each type of register) are also identified for the architecture mentioned above such that a designer can adjust them in order to build the address mapping and protection module with minimum hardware. The final goal is to reduce the hardware module's gate size, power consumption, and signal wires. The following lists the configuration parameters, their definitions, and how they can affect hardware:

DataWidth: This parameter represents the data word size of a request. Different data word sizes can be allowed for requests coming from different processing units. However, only a single data width parameter is used here; thus, requests coming from the processing units have the same data word size. Setting this parameter to only the needed data word size can save gates and wires in the hardware module.

AddrWidth: This parameter represents the address tag width for the on-chip system, i.e., the dimension of the address mapping and protection module. Setting this parameter to only the needed address width can save a great number of gates and wires in the hardware module.

NumSegments: This parameter indicates how many segment registers can exist in the system and is used to remove un-needed segment registers.

For each segment register:

SegmentSize register field: This field tells the size of a segment; it can be used to reduce the number of bits for a segment register.

SegmentBase register field: This field indicates the segment base address of an address segment.

NumRegions: This parameter indicates how many region registers can exist in the system and is used to remove un-needed region registers.

For each region register:

RegionSize register field: This field tells the size of an address region; it can be used to reduce the number of bits for a region register.

RegionBase register field: This field indicates the region base address of an address region.

RegionProtectionKeyRegisterNum (RPKRN) register field: This field tells which protection key register is to be used by the security checking logic when a single match occurs on this address region. Multiple register fields of this type can exist; however, only one is used here.

RegionDataWidth register field: This field tells the data word width of the service module that links to an address region. It can be used to trim data bus wires, if possible, connecting to the service module. It can also be used to indicate whether data packing or unpacking is needed; packing or unpacking may be needed when the data word size of a request's source processing unit is different from the data word size of the request's destination service module.

RegionPhysicalTargetID (RPTID) register field: This field describes the physical linkage between an address region and a service module. This physical linkage can be, for example: (1) hardware routing information to be passed on to the request-delivery module in order to deliver a request to the service module; or (2) a hardware signal bit position such that, when the request-delivery module asserts the signal, a request is sent to the service module.

RegionAddressSpace (RAS) register field: This field allows an address region of a service module to be further partitioned.

RegionEnable register field: This field indicates whether or not this region register is used for the current design, or indicates whether the region is currently available.

NumProtectionKeys: This parameter indicates how many protection key registers can exist in the system and is used to remove un-needed protection key registers. In addition, it can also save bits in each of the region registers, where a protection key number is stored.

For each protection key register:

ProtectionKeyBitVector register field: This bit vector tells which protection IDs are allowed to access the service modules that are linked by region registers pointing to this protection key register. A 1 bit in position N indicates that a request tagged with protection ID N is okay to access the request's destination service module.

NumProtectionIDs: This parameter indicates how many different protection IDs can exist in the system and is used to remove un-needed protection key bits in the ProtectionKeyBitVector register field.

Endianess: This parameter tells whether big-endian or little-endian byte order is applied in the architecture; it determines the address byte location and the data byte sequence coming out of data packing/unpacking.

Moreover, the register fields of each of the registers can also be specified to be one of the following three usage types so that a minimum logic design can be applied to construct the hardware to save area and power:

Non-Accessible (NA) Register Field: The register field is hardwired to a power-on value and cannot be read or written.

Read-Only (RO) Register Field: The register field is hardwired to a power-on value and needs to be software visible (read-only) during operation. In this case, extra gates are needed in order to allow software read access to the register field.

Read-Write (RW) Register Field: The register field can be read and written by software dynamically. For this type of register field, extra circuitry (for example, in the form of flip-flops and gates) is needed in order to allow software changes.
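As a rough illustration of why the usage types matter for area, the first-order model below charges one flip-flop per RW bit and none for NA or RO bits, since hardwired fields need no state (RO fields still cost read gates, which this sketch ignores). The cost model and field widths are assumptions for illustration, not figures from the patent.

```python
def rw_flop_count(fields):
    """First-order flop estimate.
    fields: iterable of (width_in_bits, usage), usage in {'NA', 'RO', 'RW'}.
    Only RW fields need flip-flops; NA and RO fields are hardwired."""
    return sum(width for width, usage in fields if usage == "RW")

# A 16-bit RW bit vector costs 16 flops; declaring the same field NA or RO
# removes all of them.
rw_flop_count([(16, "RW"), (20, "RO"), (4, "NA")])  # -> 16
```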


Additionally, each register field can also be specified as an "exporting constant" (EC) register field such that the netlist portion of the register field is exported to the top level of the final netlist. This makes the power-on value of a register field easier to modify manually, as needed by a product, late in the full-chip generation process. For instance, the ProtectionKeyBitVector register field of each of the protection key registers can be declared as an "exporting constant" field; this allows a final protection key map to be put into the chip late in the product generation process.

FIG. 6 and FIG. 7 show two almost identical address mapping and protection configurations. The register fields in FIG. 6 are of RO, RW, or NA type; for FIG. 7, however, all register fields are of NA type and, in addition, the un-used region register 1 is removed. From a first-order estimation, the number of flip-flops (also referred to as flops) saved in FIG. 7 is 82; that is, no flops are used in the address map (308) and the protection key map (310) shown in the center of FIG. 3. Also note that, if (1) the address width of the hardware module is reduced from 20 to 17, and (2) the number of protection IDs is scaled down to 8, additional buffer register bits and signal wires can be saved versus the FIG. 6 case.

As mentioned previously, a specification language needs to be provided so that a designer may easily specify a minimum design for a product. The following lists, in one embodiment of the invention, an example specification as shown in FIG. 6:

Address Mapping and Protection Module {
    DataWidth: 16
    AddrWidth: 20
    Endianess: little
    NumSegments: 2
    NumRegions: 5
    NumProtectionKeys: 2
    NumProtectionKeyIDs: 16
    SEGMENT(0) {
        SegmentSize: 64KB { access RO }
        SegmentBase: 0x00000 { access RW }
    }
    SEGMENT(1) {
        SegmentSize: 64KB { access RO }
        SegmentBase: 0x10000 { access RW }
    }
    REGION(0) { Inside SEGMENT(0)
        RegionSize: 256B { access RO }
        RegionBase: 0x00100 { access RW }
        RegionProtectionKeyRegisterNum: 0 { access RW }
        RegionDataWidth: 2B { access NA }
        RegionPhysicalTargetID: link to ServiceModule 1 using "targetselect pin 1" { access NA }
        RegionAddressSpace: 0 { access RO }
        RegionEnable: Yes { access RW }
    }
    REGION(1) { Inside SEGMENT(0)
        RegionSize: 4KB { access RO }
        RegionBase: 0x01000 { access RW }
        RegionProtectionKeyRegisterNum: 1 { access RW }
        RegionDataWidth: 2B { access NA }
        RegionPhysicalTargetID: link to ServiceModule 1 using "targetselect pin 1" { access NA }
        RegionAddressSpace: 1 { access RO }
        RegionEnable: No { access RW }
    }
    REGION(2) { Inside SEGMENT(0)
        RegionSize: 16B { access RO }
        RegionBase: 0x00000 { access RW }
        RegionProtectionKeyRegisterNum: 0 { access RW }
        RegionDataWidth: 4B { access NA }
        RegionPhysicalTargetID: link to ServiceModule 0 using "targetselect pin 0" { access NA }
        RegionAddressSpace: 0 { access RO }
        RegionEnable: Yes { access RW }
    }
    REGION(3) { Inside SEGMENT(1)
        RegionSize: 4KB { access RO }
        RegionBase: 0x10000 { access RW }
        RegionProtectionKeyRegisterNum: 1 { access RW }
        RegionDataWidth: 1B { access NA }
        RegionPhysicalTargetID: link to ServiceModule 2 using "targetselect pin 2" { access NA }
        RegionAddressSpace: 0 { access RO }
        RegionEnable: Yes { access RW }
    }
    REGION(4) { Inside SEGMENT(1)
        RegionSize: 4KB { access RO }
        RegionBase: 0x11000 { access RW }
        RegionProtectionKeyRegisterNum: 0 { access RW }
        RegionDataWidth: 4B { access NA }
        RegionPhysicalTargetID: link to ServiceModule 3 using "targetselect pin 3" { access NA }
        RegionAddressSpace: 0 { access RO }
        RegionEnable: Yes { access RW }
    }
    PROTECTIONKEY(0) {
        ProtectionKeyBitVector: 0x007B { access RW and EC }
    }
    PROTECTIONKEY(1) {
        ProtectionKeyBitVector: 0x0085 { access RW and EC }
    }
}

For the above example (also shown in FIG. 6), there are 2 segments and 5 address regions; the address region 1 is disabled at initialization time (i.e., the region register's RegionEnable field is set to "No"), but can be re-configured at run-time because the field is read/writable. There are two protection key registers and each has a 16-bit bit vector. The example also specifies the following at power-on:


The request address width is 20 bits and the data word size is 16 bits.

There are four service modules: ServiceModule 0, 1, 2, and 3.

Address regions 0, 1, and 2 exist in the address segment 0 and are based at addresses 0x00100, 0x01000, and 0x00000, and are of size 256 bytes, 4K bytes, and 16 bytes, respectively. The region register 1 is not enabled at the current time but can be used as a future addition.

Address regions 3 and 4 exist in the address segment 1 and are based at addresses 0x10000 and 0x11000, respectively; both are 4K bytes in size.

Requests coming from processing units using Protection ID 0, 1, 3, 4, 5, and 6 (the ProtectionKeyBitVector of 0x007B) can go to ServiceModule 0, 1, and 3, depending on the request address. Requests coming from processing units using Protection ID 0, 2, and 7 (the ProtectionKeyBitVector of 0x0085) can go to ServiceModule 2, if the request address falls into the address region 3.
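The two power-on ProtectionKeyBitVector values can be decoded with a few lines of Python (the helper name is ours, not the patent's); a set bit in position N means protection ID N is allowed:

```python
def allowed_ids(bit_vector, num_ids=16):
    """List the protection IDs whose bits are set in a key's bit vector."""
    return [n for n in range(num_ids) if (bit_vector >> n) & 1]

allowed_ids(0x007B)  # -> [0, 1, 3, 4, 5, 6]
allowed_ids(0x0085)  # -> [0, 2, 7]
```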

For the design shown in FIG. 7, its specification looks like the following:

Address Mapping and Protection Module {
    DataWidth: 16
    AddrWidth: 20
    Endianess: little


    NumSegments: 2
    NumRegions: 4
    NumProtectionKeys: 2
    NumProtectionKeyIDs: 16
    SEGMENT(0) {
        SegmentSize: 64KB { access NA }
        SegmentBase: 0x00000 { access NA }
    }
    SEGMENT(1) {
        SegmentSize: 64KB { access NA }
        SegmentBase: 0x10000 { access NA }
    }
    REGION(0) { Inside SEGMENT(0)
        RegionSize: 256B { access NA }
        RegionBase: 0x00100 { access NA }
        RegionProtectionKeyRegisterNum: 0 { access NA }
        RegionDataWidth: 2B { access NA }
        RegionPhysicalTargetID: link to ServiceModule 1 using "targetselect pin 1" { access NA }
        RegionAddressSpace: 0 { access NA }
        RegionEnable: Yes { access NA }
    }
    REGION(2) { Inside SEGMENT(0)
        RegionSize: 16B { access NA }
        RegionBase: 0x00000 { access NA }
        RegionProtectionKeyRegisterNum: 0 { access NA }
        RegionDataWidth: 4B { access NA }
        RegionPhysicalTargetID: link to ServiceModule 0 using "targetselect pin 0" { access NA }
        RegionAddressSpace: 0 { access NA }
        RegionEnable: Yes { access NA }
    }
    REGION(3) { Inside SEGMENT(1)
        RegionSize: 4KB { access NA }
        RegionBase: 0x10000 { access NA }
        RegionProtectionKeyRegisterNum: 1 { access NA }
        RegionDataWidth: 1B { access NA }
        RegionPhysicalTargetID: link to ServiceModule 2 using "targetselect pin 2" { access NA }
        RegionAddressSpace: 0 { access NA }
        RegionEnable: Yes { access NA }
    }
    REGION(4) { Inside SEGMENT(1)
        RegionSize: 4KB { access NA }
        RegionBase: 0x11000 { access NA }
        RegionProtectionKeyRegisterNum: 0 { access NA }
        RegionDataWidth: 4B { access NA }
        RegionPhysicalTargetID: link to ServiceModule 3 using "targetselect pin 3" { access NA }
        RegionAddressSpace: 0 { access NA }
        RegionEnable: Yes { access NA }
    }
    PROTECTIONKEY(0) {
        ProtectionKeyBitVector: 0x007B { access NA and EC }
    }
    PROTECTIONKEY(1) {
        ProtectionKeyBitVector: 0x0085 { access NA and EC }
    }
}

As mentioned in the description, and as shown in FIG. 4, at one of the final stages, a post-processing tool, which takes the specified design (such as the specification text shown above) as input, is used to generate an optimized hardware netlist for the address mapping and protection hardware.

Thus, what has been disclosed is a method and apparatus of configurable address mapping and protection hardware for on-chip systems.

Referring back to FIG. 1, FIG. 1 illustrates a network environment 100 in which the techniques described may be applied. The network environment 100 has a network 102 that connects S servers 104-1 through 104-S, and C clients 108-1 through 108-C. As shown, several systems in the form of S servers 104-1 through 104-S and C clients 108-1 through 108-C are connected to each other via a network 102, which may be, for example, an on-chip communication network. Note that alternatively the network 102 might be or include one or more of: inter-chip communications, an optical network, the Internet, a Local Area Network (LAN), a Wide Area Network (WAN), a satellite link, a fiber network, a cable network, or a combination of these and/or others. The servers may represent, for example: a master device on a chip; a memory; an intellectual property core, such as a microprocessor, communications interface, etc.; a disk storage system; and/or computing resources. Likewise, the clients may have computing, storage, and viewing capabilities. The method and apparatus described herein may be applied to essentially any type of communicating means or device, whether local or remote, such as a LAN, a WAN, a system bus, an on-chip bus, etc. It is to be further appreciated that the use of the terms client and server is for clarity in specifying who initiates a communication (the client) and who responds (the server). No hierarchy is implied unless explicitly stated. Both functions may be in a single communicating device, in which case the client-server and server-client relationship may be viewed as peer-to-peer. Thus, if two devices such as 108-1 and 104-S can both initiate and respond to communications, their communication may be viewed as peer-to-peer. Likewise, communications between 104-1 and 104-S, and between 108-1 and 108-C, may be viewed as peer-to-peer if each such communicating device is capable of initiation of and response to communication.

Referring back to FIG. 2, FIG. 2 illustrates a system 200 in block diagram form, which may be representative of any of the clients and/or servers shown in FIG. 1. The block diagram is a high-level conceptual representation and may be implemented in a variety of ways and by various architectures. Bus system 202 interconnects a Central Processing Unit (CPU) 204, Read Only Memory (ROM) 206, Random Access Memory (RAM) 208, storage 210, display 220, audio 222, keyboard 224, pointer 226, miscellaneous input/output (I/O) devices 228, and communications 230. The bus system 202 may be, for example, one or more of such buses as an on-chip bus, a system bus, Peripheral Component Interconnect (PCI), Advanced Graphics Port (AGP), Small Computer System Interface (SCSI), Institute of Electrical and Electronics Engineers (IEEE) standard number 1394 (FireWire), Universal Serial Bus (USB), etc. The CPU 204 may be a single, multiple, or even a distributed computing resource. Storage 210 may be Compact Disc (CD), Digital Versatile Disk (DVD), hard disks (HD), optical disks, tape, flash, memory sticks, video recorders, etc. Display 220 might be, for example, a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), a projection system, a Television (TV), etc. Note that depending upon the actual implementation of the system, the system may include some, all, more, or a rearrangement of the components in the block diagram. For example, an on-chip communications system on an integrated circuit may lack a display 220, keyboard 224, and pointer 226. As another example, a thin client might consist of a wireless hand-held device that lacks, for example, a traditional keyboard. Thus, many variations on the system of FIG. 2 are possible.

For purposes of discussing and understanding the invention, it is to be understood that various terms are used by those knowledgeable in the art to describe techniques and approaches. Furthermore, in the description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be evident, however, to one of ordinary skill in the art that the present invention may be practiced without these specific details. In some instances, well-known structures and devices are shown in block diagram form, rather than in detail, in order to avoid obscuring the present invention. These embodiments are described in sufficient detail to enable those of ordinary skill in the art to practice the invention, and it is to be understood that other embodiments may be utilized and that logical, mechanical, electrical, and other changes may be made without departing from the scope of the present invention.

Some portions of the description may be presented in terms of algorithms and symbolic representations of operations on, for example, data bits within a computer memory. These algorithmic descriptions and representations are the means used by those of ordinary skill in the data processing arts to most effectively convey the substance of their work to others of ordinary skill in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of acts leading to a desired result. The acts are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical, magnetic, or optical signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.

It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the discussion, it is appreciated that throughout the description, discussions utilizing terms such as "processing" or "computing" or "calculating" or "determining" or "communicating" or "displaying" or the like can refer to the actions and processes of a computer system, or an electronic device, that manipulates and transforms data represented as physical (electronic) quantities within the electronic device or computer system's registers and memories into other data similarly represented as physical quantities within the electronic device and/or computer system memories or registers or other such information storage, transmission, or display devices.

It is to be understood that various terms and techniques are used by those knowledgeable in the art to describe communications, protocols, applications, implementations, mechanisms, etc. One such technique is the description of an implementation of a technique in terms of an algorithm or mathematical expression. That is, while the technique may be, for example, implemented as executing code on a computer, the expression of that technique may be more aptly and succinctly conveyed and communicated as a formula, algorithm, or mathematical expression. Thus, one of ordinary skill in the art would recognize a block denoting A+B=C as an additive function whose implementation in hardware and/or software would take two inputs (A and B) and produce a summation output (C). Thus, the use of a formula, algorithm, or mathematical expression as a description is to be understood as having a physical embodiment in at least hardware and/or software (such as a computer system in which the techniques of the present invention may be practiced as well as implemented as an embodiment).

The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Various general purpose systems may be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatus to perform the required method. For example, any of the methods according to the present invention can be implemented in hard-wired circuitry, by programming a general-purpose processor, or by any combination of hardware and software. One of ordinary skill in the art will immediately appreciate that the invention can be practiced with computer system configurations other than those described, including hand-held devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, digital signal processing (DSP) devices, set top boxes, network PCs, minicomputers, mainframe computers, and the like. The invention can also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. This communications network is not limited by size, and may range from, for example, on-chip communications to WANs such as the Internet.

The methods of the invention may be implemented using computer software. If written in a programming language conforming to a recognized standard, sequences of instructions designed to implement the methods can be compiled for execution on a variety of hardware platforms and for interface to a variety of operating systems. It will be appreciated that a variety of programming languages may be used to implement the teachings of the invention as described herein. Furthermore, it is common in the art to speak of software, in one form or another (e.g., program, procedure, application, driver, ...), as taking an action or causing a result. Such expressions are merely a shorthand way of saying that execution of the software by a computer causes the processor of the computer to perform an action or produce a result.

The present invention can be implemented by an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may comprise a general-purpose computer, selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, hard disks, optical disks, compact disk-read only memories (CD-ROMs), digital versatile disks (DVDs), and magneto-optical disks; read-only memories (ROMs); random access memories (RAMs); electrically programmable read-only memories (EPROMs); electrically erasable programmable read-only memories (EEPROMs); FLASH memories; magnetic or optical cards; etc.; or any type of media suitable for storing electronic instructions, either local to the computer or remote to the computer.

A machine-readable medium is understood to include any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer). For example, a machine-readable medium includes read only memory (ROM); random access memory (RAM); magnetic disk storage media; optical storage media; flash memory devices; electrical, optical, acoustical, or other forms of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.); etc.

Thus, a method and apparatus for a configurable address mapping and protection architecture and hardware for on-chip systems have been described.

What is claimed is: 1. A method comprising: inputting configuration parameters that identifY a first

address location to a service module;

Page 52: Sonics, Inc. - Complaint for Patent Infringement

US 7,266,786 B2 13

inputting protection parameters, wherein the protection parameters are selected from the group consisting of a number of different protection keys that can exist for modules in a device and a number of different protec­tion IDs that can exist for modules in the device; and

14 14. The apparatus of claim 13, further comprising: a first region from the number of regions, the first region

having a field containing routing information and a position of a service module connected to the region.

15. The apparatus of claim 14, wherein the routing information comprises hardware routing information passed to a request-delivery module in order to deliver a request to the service module.

determining whether a request from a first module in the device should be passed on to the service module and executed based upon the inputted protection parameters and the first address location of the service module.

2. The method of claim 1 wherein said configuration parameters are selected from the group consisting of address width, number of segments, segment size, segment base, and number of regions.

16. The apparatus of claim 14, wherein the position 10 comprises a hardware signal bit position.

3. The method of claim 2 further comprising for each region of said number of regions a field selected from the 15

group consisting of region address size, region base address, region protection key register number, region enable, region address space, width of a service module connected to said region, and physical linkage information of a service module connected to said region. 20

17. The apparatus of claim 15, wherein if a request­delivery module asserts a signal at the hardware signal bit position a request is sent to the service module.

18. The apparatus of claim 12, further comprising: means for generating a database for configuring circuitry,

wherein the configuring circuitry may be a register that is configured by specifYing the register as an export constant so that in the database a netlist portion of the register is exported to a top level of a netlist.

19. The apparatus of claim 18, wherein the configuring further comprises configuring the netlist in time after the configuring of the register.

4. The method of claim 3 wherein said physical linkage information of a service module connected to said region further comprises information selected from the group con­sisting of routing information, and a position.

5. The method of claim 4 wherein said routing informa­tion comprises hardware routing information passed to a request-delivery module in order to deliver a request to said service module.

20. The apparatus of claim 18, wherein the register may be selected from the group consisting of not-accessible

25 registers, read-only registers. read-write registers.

6. The method of claim 4 wherein said position comprises 30

a hardware signal bit position.

21. The apparatus of claim 18, wherein the register has register fields selected from the group consisting of not­accessible register fields, read-only register fields, and read­write register fields.

US 7,266,786 B2

7. The method of claim 6 wherein if a request-delivery module asserts a signal at said hardware signal bit position a request is sent to said service module.

8. The method of claim 1 further comprising: generating a database for configuring circuitry, wherein said circuitry further comprises a register.

9. The method of claim 8 wherein said configuring results in an operative mode selected from the group consisting of non-accessible register, read only register, and read-write register.

10. The method of claim 8 wherein said configuring said register further comprises specifying said register as an export constant so that in said database a netlist portion of said register is exported to a top level of a netlist.

11. The method of claim 10 wherein said configuring further comprises configuring said netlist in time after said configuring of said register.

12. An apparatus, comprising: means for inputting configuration parameters that identify a first address location to a service module; means for inputting protection parameters, wherein the protection parameters are selected from the group consisting of a number of different protection keys that can exist for modules in a device and a number of different protection IDs that can exist for modules in the device; and means for determining whether a request from a first module in the device should be passed on to the service module and executed based upon the inputted protection parameters and the first address location of the service module.

13. The apparatus of claim 12, wherein the configuration parameters are selected from the group consisting of address width, number of segments, segment size, segment base, and number of regions.

22. The apparatus of claim 12, further comprising: means for checking the inputted protection and configuration parameters against predefined criteria set at design time for an integrated circuit.

23. The apparatus of claim 22, wherein the predefined criteria are selected from the group consisting of protection ID, destination address, request type, data, data type, and user provided request information.

24. The apparatus of claim 22, wherein the means for checking checks a segment address against one or more segment register fields, and checks a region address against one or more region register fields.

25. The apparatus of claim 22, wherein the means for checking checks a protection key against one or more protection register fields.

26. The apparatus of claim 22, wherein the means for checking checks requests received from a plurality of source units against an address map and a protection key map.

27. The apparatus of claim 22, further comprising: means for generating a request for additional parameters if the predefined criteria is met.

28. The apparatus of claim 12, further comprising: means for receiving a product specification; and means for generating a netlist.

29. A processing system comprising a processor, which when executing a set of requests performs the method of claim 1.

30. A machine-readable storage medium having stored instructions thereon, which when executed performs the method of claim 1.

31. The method of claim 1, further comprising: receiving a product specification; and generating a netlist that represents an optimized address mapping and hardware protection.

32. The apparatus of claim 12, further comprising: means for associating a first address region in the system with at least one protection key; and means for assigning a first processing unit at least one possible protection ID.

33. An apparatus comprising: means for inputting configuration parameters that associate a first address location with a service module; means for inputting protection parameters, wherein the protection parameters are selected from the group consisting of a number of different protection keys that can exist for modules in a device and a number of different protection IDs that can exist for modules in the device; means for determining whether a request from a first module in the device should be passed on to the service module and executed based upon the inputted protection parameters; and means for comparing a first protection ID assigned to the request to a map of one or more protection key registers to determine whether the request should be passed on to the service module and executed.

34. The apparatus of claim 12, further comprising: a first processing unit having at least two possible protection IDs.

35. An apparatus comprising: a configuration register to input configuration parameters that associate a first address location with a service module; a protection key register to input protection parameters, wherein the protection parameters are selected from the group consisting of a number of different protection keys that can exist for modules in a device and a number of different protection IDs that can exist for modules in the device; and security checking logic to determine whether a request from a first module in the device should be passed on to the service module and executed based upon the inputted protection parameters and the first address location of the service module; and a first field in a software database to direct the security checking logic on which protection key register is to be used by the security checking logic when matching a first protection ID associated with the request to the first address location of the service module.

36. An apparatus comprising: a configuration register to input configuration parameters that associate a first address location with a service module; a protection key register to input protection parameters, wherein the protection parameters are selected from the group consisting of a number of different protection keys that can exist for modules in a device and a number of different protection IDs that can exist for modules in the device; security checking logic to determine whether a request from a first module in the device should be passed on to the service module and executed based upon the inputted protection parameters and the first address location of the service module; and a bit vector to direct the security checking logic on which protection IDs associated with the request are allowed to access one or more service modules that are linked by region registers pointing to a first protection key register.

37. The apparatus of claim 12, wherein a first field of protection key registers can be re-configured at run-time because the first field is read/writable.

38. The method of claim 1, further comprising: configuring the protection parameter information at design time of an implementation of the device.

39. A machine-readable storage medium having stored instructions thereon, which when executed generates the apparatus of claim 12.

40. An apparatus, comprising: a first set of registers for inputting configuration parameters that associate a first address location with a service module; a second set of registers for inputting protection parameters, wherein the protection parameters are selected from the group consisting of a number of different protection keys that can exist for modules in a device and a number of different protection IDs that can exist for modules in the device; and a comparator to determine whether a request from a first module in the device should be passed on to the service module and executed based upon the inputted protection parameters in the second set of registers and the inputted configuration parameters in the first set of registers, wherein the comparator compares a first protection ID assigned to the request to a map of one or more protection key registers to determine whether the request should be passed on to the service module and executed.

41. A machine-readable storage medium having stored instructions thereon, which when executed generates the apparatus of claim 40.

* * * * *
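The apparatus claims above (for example, claims 33, 35, 36, and 40) each recite logic that decides whether a request should be passed on to a service module by matching a protection ID carried with the request against one or more protection key registers mapped to the request's address. A minimal sketch of that style of check follows; the region layout, register encoding, and all function names are illustrative assumptions, not the claimed implementation:

```python
# Hypothetical sketch of a protection check in the style recited by the claims:
# each address region points to a protection key register, modeled here as a
# bit vector of the protection IDs allowed to access that region.

def build_region_map():
    # region -> (address range, allowed-ID bit vector); layout is invented
    return [
        (range(0x0000, 0x1000), 0b0011),  # protection IDs 0 and 1 allowed
        (range(0x1000, 0x2000), 0b0100),  # only protection ID 2 allowed
    ]

def check_request(address, protection_id, region_map):
    """Return True if the request should be passed on to the service module."""
    for addr_range, allowed_ids in region_map:
        if address in addr_range:
            # compare the request's protection ID against the key register
            return bool(allowed_ids & (1 << protection_id))
    return False  # no region matches the address: deny the request

regions = build_region_map()
print(check_request(0x0800, 1, regions))  # True: ID 1 may access first region
print(check_request(0x1800, 1, regions))  # False: ID 1 blocked from second region
```

The bit-vector form mirrors claim 36's recitation of "a bit vector to direct the security checking logic on which protection IDs ... are allowed to access one or more service modules."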


Exhibit C


(12) United States Patent
Vinogradov et al.

(54) METHODS AND APPARATUSES FOR DECOUPLING A REQUEST FROM ONE OR MORE SOLICITED RESPONSES

(75) Inventors: Glenn S. Vinogradov, Philadelphia, PA (US); Drew E. Wingard, San Carlos, CA (US)

(73) Assignee: Sonics, Inc., Mountain View, CA (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 374 days.

This patent is subject to a terminal disclaimer.

(21) Appl. No.: 10/980,736

(22) Filed: Nov. 2, 2004

(65) Prior Publication Data
US 2006/0095635 A1, May 4, 2006

(51) Int. Cl.
G06F 13/00 (2006.01)
G06F 13/14 (2006.01)
G06F 13/36 (2006.01)
H04J 3/24 (2006.01)

(52) U.S. Cl. 710/314; 710/100; 710/112; 710/305; 370/469

(58) Field of Classification Search: 710/100, 710/112, 305, 214; 370/469. See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS

5,699,460 A * 12/1997 Kopet et al. 382/307
5,745,791 A 4/1998 Peek et al.
5,761,348 A 6/1998 Honma
5,799,203 A * 8/1998 Lee et al. 710/8
5,802,399 A 9/1998 Yumoto et al.
5,948,089 A * 9/1999 Wingard et al. 710/107
6,122,690 A 9/2000 Nannetti et al.

US007277975B2

(10) Patent No.: US 7,277,975 B2
(45) Date of Patent: *Oct. 2, 2007

6,182,183 B1 * 1/2001 Wingard et al. 710/305

(Continued)

FOREIGN PATENT DOCUMENTS

EP 1179785 2/2002

(Continued)

OTHER PUBLICATIONS

Culler, David E.: "Split-Phase Busses," CS 258, Computer Science Division, U.C. Berkeley, Spring 1999, pp. 1-23.

(Continued)

Primary Examiner: Paul R. Myers
Assistant Examiner: Jeremy S. Cerullo
(74) Attorney, Agent, or Firm: Blakely Sokoloff Taylor & Zafman LLP

(57) ABSTRACT

Embodiments of apparatuses, systems, and methods are described for communicating information between functional blocks of a system across a communication fabric. Translation logic couples to the communication fabric. The translation logic implements a higher level protocol layered on top of an underlining protocol and the communication fabric. The translation logic converts one initiator transaction into two or more write transactions and then transmits the write transactions using the underlining protocol of the communication fabric. The translation logic converts the initiator transaction into two or more write transactions and then transmits the write transactions using the underlining protocol of the communication fabric so that the communication fabric does not block or poll for responses, and that data may be transferred in a direction opposite from the initiator transaction request.

29 Claims, 10 Drawing Sheets


US 7,277,975 B2 Page 2

U.S. PATENT DOCUMENTS

6,330,283 B1* 12/2001 Lafe 375/240.18
6,393,500 B1* 5/2002 Thekkath 710/35
6,678,423 B1* 1/2004 Trenary et al. 382/250
6,725,313 B1* 4/2004 Wingard et al. 710/305
6,785,753 B2* 8/2004 Weber et al. 710/105
6,868,459 B1 3/2005 Stuber
6,970,013 B1 11/2005 Corey
6,981,088 B2 12/2005 Holm et al.

2002/0038393 A1* 3/2002 Ganapathy et al. 710/22
2002/0107903 A1* 8/2002 Richter et al. 709/201
2002/0161848 A1* 10/2002 Willman et al. 709/213
2003/0099254 A1* 5/2003 Richter 370/466
2003/0191884 A1* 10/2003 Anjo et al. 710/307
2003/0212743 A1* 11/2003 Masri et al. 709/204
2004/0017820 A1* 1/2004 Garinger et al. 370/419
2005/0210164 A1* 9/2005 Weber et al. 710/35
2005/0216641 A1* 9/2005 Weber et al. 710/307
2006/0092944 A1* 5/2006 Wingard et al. 370/395.2
2006/0095635 A1* 5/2006 Vinogradov et al. 710/309

FOREIGN PATENT DOCUMENTS

JP 63296158 12/1988

OTHER PUBLICATIONS

International Search Report, PCT/US2005/008239, mailed Jun. 13, 2005, 3 pp.

International Search Report, PCT/US2005/008235, mailed Nov. 30, 2005, 6 pp.
AMBA 3.0 (AMBA AXI), undated.
MIPS32 24K Family of Synthesizable Processor Cores, Product Brief, MIPS Technologies, 2005.
Open Core Protocol Monitor, Data Sheet, Mentor Graphics, undated.
IP Core-Centric Communications Protocol Introducing Open Core Protocol 2.0, www.design-reuse.com, Sep. 11, 2006.
Enabling Reuse Via an IP Core-Centric Communications Protocol: Open Core Protocol, Wolf-Dietrich Weber, Sonics, Inc., 2000.
NECoBus: A High-End SOC Bus with a Portable & Low-Latency Wrapper-Based Interface Mechanism, Anjo et al., NEC Corporation, IEEE 2002.
Open Core Protocol Specification, Release 2.0, pp. 1-3, 31-57, copyright 2003.
Socket-Centric IP Core Interface Maximizes IP Applications, www.ocpip.org, undated.
Definition of Burst Mode Access and Timing, www.pcguide.com, Sep. 12, 2006.
Definition of Direct Memory Access (DMA) Modes and Bus Mastering DMA, www.pcguide.com, Sep. 12, 2006.
Definition of PCI Bus Performance, www.pcguide.com, Sep. 12, 2006.
Weiss, Ray, "PCI-X Exposed," Tech OnLine, http://www.techonline.corn/community/ed_resource/feature_article/7114, Jun. 28, 2005.

* cited by examiner


U.S. Patent Oct. 2, 2007 Sheet 1 of 10 US 7,277,975 B2

[Figure 1a (drawing): a communication fabric with translation intelligence coupled to the communication fabric; drawing labels not legible in this copy]


[Figure 1b (drawing): CPU 102, Graphics, MPEG 2 Decode 126, Memory Scheduler 116, HDD Controller, and PCI blocks, each wrapped with a network adapter containing translation intelligence (104, 106), connected over signal interfaces; a firewall (115, 117) is also shown]


[Figure 2 (timing diagram, 260-272): the winner of each arbitration round, requests sent with CMD and Address, responses sent with Data and Address, with time measured in interconnect clock cycles]


U.S. Patent Oct. 2, 2007 Sheet 4 of 10 US 7,277,975 B2

Start

Generating a read request containing a piece of information that communicates that N number of read requests in this burst are going to related addresses in a target. The pieces of information may also communicate that the N number of read requests are for a block transaction such as a two-dimensional multimedia data request. The N number of read requests may have pieces of information including a command type, an address, a burst length, a length, a height, and a stride of the read request. (302)

Communicating the N number of read requests across a signal interface to an initiator agent connected to a shared resource, such as an interconnect. Two or more agents connected to the shared resource may form a distributed arbitration mechanism for access to the shared resource. (304)

Detecting for the presence of the additional pieces of information in the read request. If detected, the initiator agent and the target agent communicate requests and responses to each other through write-type request packets and response packets with annotations added to these packets. If not detected, responses and requests are processed as a direct transaction through the normal request and response protocol path. (306)

Converting the N number of read requests to a single write-type request packet with annotations in the data field of the write request to indicate how many read requests were combined, such as a burst length, and the addresses associated with each read request, such as a burst sequence based upon the addresses being related. If the N number of read requests indicated a non-incrementing address burst block transaction, the single request packet may also include a length of a raster line (row) occupied by a target data, a number of rows of raster lines occupied by the target data, and a length difference between starting addresses of two consecutive raster lines (rows) occupied by the target data. If the block transaction is for two-dimensional data then the single request packet also includes 1) a width of the two-dimensional object measured in the length of the row, 2) a height of the two-dimensional object measured in the number of rows the two-dimensional object will occupy, and 3) a stride of the two-dimensional object measured in the length difference between the starting addresses of two consecutive rows.

Figure 3a (Cont.)


U.S. Patent Oct. 2, 2007 Sheet 5 of 10 US 7,277,975 B2

Gaining access to the shared resource by winning a round of arbitration. (310)

Transmitting this single write-type request packet with annotations over the shared resource. The underlining protocol may transmit this single request packet with annotations across the communication fabric. The initiator agent may transmit the request packet in a non-blocking manner. (312)

Relinquishing control of the shared resource to allow other agents to issue their transactions. (314)

Receiving the single request packet with annotations and detecting for the annotations. (316)

Converting the single request packet into the original N number of read requests, each read request with its original start address. Decoding the single write request and storing both the initiator's address, such as a con ID, identification tag, etc., and the number of read requests in this series that were combined into this single write request. (318)

Transmitting the converted N number of read requests across a signal interface to the target. (320)

Generating responses to the original number of read requests, each response carrying data in bit words. (322)

Figure 3b (Cont.)


U.S. Patent Oct. 2, 2007 Sheet 6 of 10 US 7,277,975 B2

l When the responses are available, they will be communicated from the target IP

core. 324

+ Communicating the N number of responses to the read request across a signal

interface to the target agent connected to the shared resource. 326

• Receiving the data responses and determining if these are responses to the

single request packet. 328

~ If these are responses to the single request packet, converting each response

into write-type response packets. Generates the address in the address field of the response packet by using the stored address of the original initiator's address.

Noting the number of response packets in this series sent back to the initiator agent. Annotating the last response packet in this series as the last/final packet in a control field such as the Reqinfo field. If these are not responses to the single

request packet, process as a direct transaction through the normal response path. 330

• Gaining access to the shared resource by winning a round of arbitration.

332

• Transmitting the N number of write type data response packets with annotations

334 over the shared resource.

+ Figure 3c


U.S. Patent Oct. 2, 2007 Sheet 7 of 10 US 7,277,975 B2

Cont.

Receiving the response packets and detecting for the presence of the annotations indicating that these are response packets that correspond to the single request packet. (336)

Converting each write-type data response packet into a standard data response corresponding to the original read requests with the initiator's address as the destination address. (338)

Upon transmitting the last write-type data responses in this series, clearing the stored information regarding this transaction. Also relinquishing control of the shared resource to allow other agents to issue their transactions. (340)

Checking for the last/final packet annotation in the response packets. Upon converting the last write-type response packet in this series, clearing the stored information regarding this transaction. (342)

Communicating the N number of data responses to the original burst read requests across the signal interface to the initiator. (344)

End

Figure 3d
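The round trip in Figures 3a through 3d — N related read requests folded into a single annotated write-type request packet, carried over the shared resource, and unfolded back into the original reads on the far side — can be sketched as follows. The packet fields, word size, and helper names are illustrative assumptions, not the patent's actual encoding:

```python
# Illustrative sketch of the Figure 3a-3d conversion: N incrementing-address
# read requests become one write-type packet annotated with a burst length and
# base address; the receiving agent reconstructs the original read requests.

WORD = 4  # assumed word size in bytes, for illustration only

def fold_reads(read_addresses, initiator_id):
    """Convert N related read requests into a single annotated write-type packet."""
    base = read_addresses[0]
    # the flow applies when the addresses are related (an incrementing burst)
    assert all(a == base + i * WORD for i, a in enumerate(read_addresses))
    return {"cmd": "WR", "address": base,
            "burst_length": len(read_addresses),
            "reqinfo": {"initiator": initiator_id, "kind": "folded_read"}}

def unfold_packet(packet):
    """Reconstruct each original read request, with its original start address."""
    base, n = packet["address"], packet["burst_length"]
    return [{"cmd": "RD", "address": base + i * WORD} for i in range(n)]

packet = fold_reads([0x20, 0x24, 0x28, 0x2C], initiator_id=7)
reads = unfold_packet(packet)
print(packet["burst_length"])                 # 4
print([hex(r["address"]) for r in reads])     # ['0x20', '0x24', '0x28', '0x2c']
```

Only one packet crosses the shared resource in each direction, which is the source of the non-blocking behavior the flow diagram describes.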


[Figure 4a: request packet header fields ReqInfo, Burst CMD, Address, Burst Block, Burst Sequence, and Burst Length (474-482). Figure 4b: frame buffer 490 with Burst Length, Burst Data Block Size, and Burst Block Stride fields (484-492)]


[Figure 5 (transaction table): RD32 read requests on the initiator OCP interface are converted to annotated WR32/WR128 write-type request packets, carried across the IB-IM, TM-IB, TM-TB, and TB-IM OCP interfaces, reconverted to RD64 reads at the target OCP interface, and the data responses (Data0-Data3, DVA) returned as write-type response packets (502-510)]


[Figure 6 (transaction table): a block (BLT) burst of RD32 read requests annotated with (0x0020, 2, 1, BLT, 4) on the initiator OCP interface is converted to WR32/WR128 write-type packets across the IB-IM, TM-IB, TM-TB, and TB-IM OCP interfaces, reconverted to RD64 reads at the target OCP interface, and the data responses (Data0-Data3, DVA) returned as write-type response packets (602-610)]


US 7,277,975 B2

METHODS AND APPARATUSES FOR DECOUPLING A REQUEST FROM ONE OR MORE SOLICITED RESPONSES

TECHNICAL FIELD

Embodiments of the present invention pertain to the field of communication fabrics, and, more particularly, to a shared interconnect in a System On a Chip.

BACKGROUND

In classical bus-based architectures, communications between on-chip cores use a blocking protocol. Specifically, while a transfer is underway between an initiator and a target, the bus resources are not available for any other transfers to occur.

Some on-chip interconnect architectures incorporate the use of a pipelined polling protocol that alleviates the main inadequacy of a blocking protocol, yet still loses efficiency when communicating with targets with high and unpredictable latency.

For example, when an Intellectual Property (IP) core issues a read request to an on-chip SRAM device with predictable short latency, the response may be guaranteed to become available on the bus during the first attempt by the initiator to accept it. When an IP core issues a read request to an off-chip DRAM device with unpredictable and often high latency, multiple accesses to the bus may be required before the response becomes available to be accepted by the requesting entity. Each such access to the bus results in wasted cycles that ultimately degrade the overall bandwidth and efficiency of the system.

SUMMARY

Embodiments of apparatuses, systems, and methods are described for communicating information between functional blocks of a system across a communication fabric. The communication fabric implements either 1) a protocol that blocks the communications fabric during a time between transmission of a request and a transmission of an associated response, 2) a protocol that polls a responding block to solicit a response to an issued request, or 3) a protocol that only transfers data in the same direction as requests across the communication fabric. Translation logic couples to the communication fabric. The translation logic implements a higher level protocol layered on top of an underlining protocol and the communication fabric. The translation logic converts one initiator transaction into two or more write transactions and then transmits the write transactions using the underlining protocol of the communication fabric. The translation logic converts the initiator transaction into two or more write transactions and then transmits the write transactions using the underlining protocol of the communication fabric so that the communication fabric does not block or poll for responses, and that data may be transferred in a direction opposite from the initiator transaction request.

Other features and advantages of the present invention will be apparent from the accompanying drawings and the detailed description that follows.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments of the present invention are illustrated by way of example and are not intended to be limited by the figures of the accompanying drawings, in which:

FIG. 1a illustrates a block diagram of an embodiment of a communication fabric with translation intelligence coupled to the communication fabric.

FIG. 1b illustrates a block diagram of an embodiment of a shared interconnect with intelligent network adapters coupled to the shared interconnect.

FIG. 2 illustrates an embodiment of an example pipelined arbitration process without blocking or polling for solicited responses.

FIGS. 3a through 3d illustrate a flow diagram of an embodiment of a request packet and response packet transaction over a shared resource.

FIG. 4a illustrates an example block transaction request packet for a two-dimensional data object.

FIG. 4b illustrates an example frame buffer to store multimedia data for display on a display device.

FIG. 5 illustrates an example conversion of multiple read requests into a single request packet and the associated responses.

FIG. 6 illustrates an example conversion of a burst transaction for two-dimensional data converted into a single request packet and the associated responses.

DETAILED DESCRIPTION

In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the invention. However, it will be understood by those skilled in the art that certain embodiments of the present invention may be practiced without these specific details. In other instances, well-known methods, procedures, components, and circuits have not been described in detail so as to not obscure the presented embodiments of the invention. The following detailed description includes several network adapters, which will be described below. These network adapters may be implemented by hardware components, such as logic, or by a combination of hardware and software.

The term data response and response packets should both be construed to be responses. The term request and request packets should both be construed to be requests. A transaction may be a complete unit of data communication that is composed of one or more requests and one or more associated responses. In a write transaction, the direction of data transmission is the same as the associated requests. In a read transaction, the direction of data transmission is the same as the associated responses. The purpose of a transaction may be to move data between functional blocks. The association between the one or more requests that form a transaction may be based on transferring one or more data words between the same initiator and target wherein the one or more data words have a defined address relationship. A single request may be associated with the transfer of one or more data words. Similarly, a single response may be associated with the transfer of one or more data words. A write transaction may move data from an initiator to a target, in the same direction as the request. A read transaction may move data from a target to an initiator, in the same direction as the response. The number of requests and the number of responses that form a transaction may be the same, or there may be more requests than responses, or there may be more responses than requests. A request may be transferred as part of a transaction. The communication fabric may use coupled resources for transmitting requests and responses. Alternately, the communication fabric may use separate resources for transmitting requests and responses. An on-chip interconnect may be a collection of mechanisms that may be adapters and/or other logical modules along with interconnecting wires that facilitate address-mapped and arbitrated communication between multiple functional blocks on an SOC (System-on-Chip). A burst may be a set of transfers that are linked together into a transaction having a defined address sequence and number of transfers. A single (non-burst) request on an interface with burst support may be encoded as a request with any legal burst address sequence and a burst length of 1.
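A burst's defined address sequence combines with the block-transaction annotations the specification describes — a raster-line length, a number of rows, and a stride between the starting addresses of consecutive raster lines — to determine every address a two-dimensional block burst touches. A hedged sketch of that address generation, with all parameter names assumed for illustration:

```python
# Sketch of the address sequence implied by a 2-D block (BLT) burst:
# 'length' bytes per raster line, 'rows' raster lines, and 'stride' bytes
# between the starting addresses of two consecutive lines.

def block_burst_addresses(base, length, rows, stride, word=4):
    """Yield the word addresses covered by a two-dimensional block transaction."""
    for row in range(rows):
        row_start = base + row * stride  # stride separates consecutive raster lines
        for offset in range(0, length, word):
            yield row_start + offset

# Two raster lines of 8 bytes each, starting 0x20 bytes apart (cf. the
# (0x0020, 2, ...) annotations visible in Figure 6):
addrs = list(block_burst_addresses(base=0x0, length=8, rows=2, stride=0x20))
print([hex(a) for a in addrs])  # ['0x0', '0x4', '0x20', '0x24']
```

A degenerate call with rows=1 and length equal to one word reproduces the single (non-burst) request encoded with a burst length of 1.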

Apparatuses, systems, and methods are described for communicating information between functional blocks across a communication fabric. Translation intelligence connects to a communication fabric. The translation intelligence may include detection and conversion logic. The detection logic detects for a read request containing burst information that communicates one or more read requests in a burst from an initiator Intellectual Property (IP) core that are going to related addresses in a single target IP core. The conversion logic converts the one or more read requests to a single request with annotations in a field of the request to indicate how many read requests were combined and the addresses associated with each read request based upon the addresses in the target IP core being related. The series of burst transactions, such as a non-incrementing address pattern burst transaction, may be indicated as a block transaction. A request generated for the block transaction includes annotations indicating a length of a raster line occupied by a target data, a number of rows of raster lines occupied by the target data, and address spacing between two consecutive raster lines occupied by the target data. The conversion logic converts the one or more read requests to a single request with annotations in a field of the request to indicate how many read requests were combined and the addresses associated with each read request based upon the addresses in the target IP core being related. If the block transaction is for two-dimensional data then the single request packet also includes 1) a width of the two-dimensional object, 2) a height of the two-dimensional object measured in the number of rows that the two-dimensional object will occupy, and 3) a stride of the two-dimensional object measured in the address spacing between the addresses of two consecutive raster lines.

FIG. 1a illustrates a block diagram of an embodiment of a communication fabric with translation intelligence coupled to the communication fabric. The system may include a plurality of initiators 2-8, a plurality of targets 10-12, and a communication fabric 14. Information between the functional blocks 2-12 is communicated across the communication fabric 14, such as an interconnect, a bus, a network on a chip, or similar communication structure in a system. Translation intelligence 16-26 connects to the communication fabric 14. A request from an initiator, such as the first initiator 2, may be communicated to a target, such as the first target 12, over the communication fabric 14 to solicit one or more responses from the target. The request and the one or more responses form a transaction. The transfer of the request and the one or more responses may be split by communicating the one or more responses to the initiator when the one or more responses become available without the initiator having to poll for the communicated responses or block the communication fabric 14 waiting for a transmission of the one or more responses. The initiator is decoupled from the waiting on the one or more responses to the issued request because the initiator may issue additional requests prior to receiving the one or more responses. The initiator is decoupled from the waiting on the one or more responses to the issued request because the initiator may relinquish control of the communication fabric 14 prior to receiving the response.

The translation intelligence 16-26 may include detection logic and conversion logic. The detection logic detects for a read request containing burst information that communicates one or more read requests in a burst transaction from an initiator, such as an Intellectual Property (IP) core, that are going to related addresses in the single target. Thus, the translation logic may detect for a burst transaction communicating either an incrementing address burst transaction or a non-incrementing address pattern burst transaction. The conversion logic converts the one or more read requests in the burst transaction to a single request with annotations in a field of the request to indicate how many read requests were combined and the address sequence associated with the transaction. The detection logic communicates to the conversion logic that the burst information is detected. Alternatively, the detection logic converts the one or more read requests to the single request with annotations if an address of the target indicates that it is capable of decoding the annotations in the single request.
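The initiator-side detection and conversion described above can be sketched as follows. This is a hedged illustration: the request fields (`cmd`, `burst_len`, `burst_seq`, `unit_size`) are assumed names, and only the incrementing-address sequence is modeled:

```python
# Illustrative initiator-side conversion: N related reads collapse into one
# annotated request. Field names are assumptions, not the patent's format.
def combine_burst(reads):
    """reads: list of (addr, size) tuples from one initiator.
    Returns a single annotated request if the addresses form an
    incrementing burst, or None if they are unrelated."""
    if not reads:
        return None
    base, size = reads[0]
    # Detection: every transfer must follow the incrementing address sequence.
    for i, (addr, sz) in enumerate(reads):
        if sz != size or addr != base + i * size:
            return None  # unrelated addresses: leave as direct transactions
    return {"cmd": "WRITE", "addr": base, "burst_len": len(reads),
            "burst_seq": "INCR", "unit_size": size}

single = combine_burst([(0x2000, 4), (0x2004, 4), (0x2008, 4)])
```

The annotations carry exactly the two facts the text calls out: how many reads were combined (`burst_len`) and the address sequence that relates them (`burst_seq` plus the base address).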

The translation intelligence on the target side includes conversion logic to convert the single request with annotations into an original number of read requests, where each read request has its original target address.
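The target-side inverse of that conversion can be sketched in the same assumed format (the dictionary fields are hypothetical; only the incrementing sequence is shown):

```python
# Illustrative target-side conversion: rebuild the original read requests
# from the burst-length and address-sequence annotations.
def expand_request(packet):
    """Return the list of (addr, size) reads the packet stands for."""
    base = packet["addr"]
    n = packet["burst_len"]
    size = packet["unit_size"]
    if packet["burst_seq"] == "INCR":
        # Regenerate each original target address from the annotations.
        return [(base + i * size, size) for i in range(n)]
    raise NotImplementedError("non-incrementing sequences omitted in this sketch")

reads = expand_request({"addr": 0x2000, "burst_len": 3,
                        "burst_seq": "INCR", "unit_size": 4})
```

Because the annotations are sufficient to regenerate every original address, no per-read address needs to cross the fabric.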

A high level protocol implemented by the translation intelligence 16-26 may provide significant performance enhancements, such as the decoupling of the request from an initiator from the one or more solicited responses from a target, converting one or more read requests in a burst transaction to a single request with annotations, as well as feature-set enrichment, specifically in regards to non-incrementing (2-dimensional block, wrap, XOR) burst sequences. The high level protocol may be implemented on top of an underlining transaction level protocol implemented by an existing communication fabric. In an embodiment, the translation intelligence implements a higher level protocol layered on top of an underlining protocol and the communication fabric. The underlining protocol and communication fabric may implement either 1) a protocol that blocks the communications fabric during a time between transmission of a request and a transmission of an associated response, 2) a protocol that polls a responding block to solicit a response to an issued request, or 3) a protocol that only transfers data in the same direction as requests across the communication fabric. Thus, the existing communication fabric may be a polling, a blocking and/or write only type communication fabric. However, the high level protocol cooperates with the existing polling and/or blocking transaction level layered protocol to provide the above significant performance enhancements as well as feature-set enrichment. Thus, the translation intelligence adds a capability to do something the underlining protocol was not capable of when the underlining protocol and communication fabric was originally designed.

In an embodiment, the translation intelligence 16-22 coupled to the initiators may convert one or more burst read requests from an initiator to a single request packet write. The translation intelligence 24-26 coupled to the target may convert the single request packet write back to one or more burst read requests. The translation intelligence 24-26 coupled to the target may convert the burst read responses to response packet writes. The translation intelligence 16-22 coupled to the initiator may convert the response packet writes back to burst read responses solicited by the original burst read requests.
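The response leg of that round trip can be sketched as follows. The packet fields are assumed names; the point illustrated is that each read response travels back as a write-type packet addressed to the initiator, with the final packet annotated as last:

```python
# Illustrative target-side and initiator-side response conversions.
def to_response_packets(initiator_id, data_words):
    """Target side: re-encode each read response as a write-type response
    packet routed back to the initiator; mark the final one as last."""
    return [{"cmd": "WRITE",
             "addr": initiator_id,                 # routes back to the issuer
             "data": w,
             "last": i == len(data_words) - 1}     # last/final annotation
            for i, w in enumerate(data_words)]

def from_response_packets(packets):
    """Initiator side: convert response packets back to burst read responses."""
    assert packets[-1]["last"]                     # series must be terminated
    return [p["data"] for p in packets]

pkts = to_response_packets(0x7, [11, 22, 33])
```

Writing responses back to the initiator's address is what makes the fabric usable even when its native protocol only transfers data in the direction of requests.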

US 7,277,975 B2

The translation logic converts one initiator transaction into two or more write transactions and then transmits the write transactions using the underlining protocol of the communication fabric so that the communication fabric does not block or poll for responses, and that data may be transferred in a direction opposite from the initiator transaction request.

Note, generally protocols that only transfer data in the same direction as requests across the communication fabric are, for example, a write only network, a packet based network, and other similar networks. A write only network has two separate ports. The first port only issues write type transactions. The second port only receives write type transactions. The interface does not receive read requests or responses. Some communication systems have one input/output port that issues and receives both Read and Write transactions.

The first initiator may then communicate a single request fully describing attributes of a two-dimensional data block across the communication fabric to a target to decode the single request. The single request may contain annotations indicating a length of a row occupied by a target data, a number of rows occupied by the target data, and an address spacing between two consecutive rows occupied by the target data. Address spacing between two consecutive rows can be 1) a length difference between the starting addresses of two consecutive rows occupied by the target data, 2) a difference between the end of a previous row and the beginning of the next row, or 3) similar spacing.

Transmitting all of this information in a single two-dimensional block request provides additional efficiency of temporal proximity important to efficient accesses to off-chip DRAM-type memory devices that may be SDRAM, DDR SDRAM, DDR2-SDRAM, etc. Annotating the initial target address of the transaction to the early request phase of the transaction provides additional efficiency by communicating the locality information to the memory subsystem before the remaining attributes arrive during the later data phase of the transaction.

FIG. 1b illustrates a block diagram of an embodiment of a shared interconnect with intelligent network adapters coupled to the shared interconnect. A plurality of initiator Intellectual Property (IP) cores 102-118 may couple to a corresponding network adapter via a signal interface. An IP core may be a discrete wholly integrated functional block of logic that performs a particular function, such as a memory component, a wireless transmitter component, a Central Processing Unit (CPU) 102, Digital Signal Processors 116, hardware accelerators such as Moving Pictures Experts Group video compression components 104 and 106, Direct Memory Access components 118, etc. for a System On a Chip (SOC). Initiator Intellectual Property (IP) cores on the SOC may be CPUs 102, multimedia chip sets 108, etc. Some of the network adapters will be initiator network adapters, such as the first network adapter 120. Some of the network adapters will be target network adapters, such as the second network adapter 122. Target IP cores may be Memory Schedulers 118, PCI bus controllers 120, etc. Translation intelligence may be wrapped around a typical distributed network adapter connected to the shared interconnect such as the translation intelligence 124 wrapped around the first network adapter 120. The translation intelligence 124 may be added as an initiator bridge module with translation intelligence and/or a target bridge module with translation intelligence.

Note the described communication mechanism and network adapters may be used for any communication fabric, but a shared interconnect 144 will be used as an example shared resource. The two or more network adapters 120-142, such as bridge modules with translation intelligence, connect to the communication fabric to form a distributed arbitration mechanism for access to the shared resource.

The underlining protocol and communication fabric may implement either 1) a polling based protocol that solicits responses to an issued request over the communication fabric, 2) a blocking based protocol that blocks the communication fabric waiting for the one or more responses to become available for consumption by an initiator functional block, or 3) a write-only type network. The translation intelligence wrapped around the network adapters adds a capability to do something the underlining protocol was not capable of when the underlining protocol and communication fabric was originally designed. The translation intelligence implements a higher level protocol layered on top of the underlining protocol and the communication fabric. The translation intelligence separates the transaction of the issued request from the responses by communicating the responses to the initiator when the responses become available without the initiator having to poll for the communicated responses. The translation intelligence may also combine multiple initiator read requests into a single write type request that carries the number of the multiple requests that are combined and an address sequence associated with the multiple requests.

In operation, a first initiator IP core 102 may generate a read request with additional pieces of information including a burst length and a burst sequence. A burst length communicates that multiple read requests in this burst are coming from this same initiator IP core and are going to related addresses in a single target IP core. A burst type communicates the address sequence within the target IP core. The burst type may indicate that the request is for a series of incrementing addresses or non-incrementing addresses but a related pattern of addresses such as a block transaction. The burst sequence may be for non-trivial 2-dimensional block, wrap, xor or similar burst sequences. If the block transaction is for two-dimensional data then the request also includes 1) a width of the two-dimensional object measured in the length of the row (such as a width of a raster line), 2) a height of the two-dimensional object measured in the number of rows the two-dimensional object will occupy, and 3) a stride of the two-dimensional object that is measured in the address spacing between two consecutive rows.

The first initiator IP core 102 communicates multiple read requests across a first signal interface 146 to the first initiator network adapter 120 connected to the shared interconnect. The Initiator IP cores communicate to the shared interconnect 144 with intelligent network adapters 120-142 through signal interfaces, such as the first signal interface 146. In an embodiment, the standardized core-signaling interface (also known as a "core socket") may be provided by the Open Core Protocol. Target IP cores also communicate to the shared interconnect 144 with intelligent network adapters 120-142 through these signal interfaces.

The translation intelligence 124 in the initiator network adapter may detect for the presence of the additional pieces of information in the read request. If detected, the initiator network adapter and the target network adapter communicate requests and responses to each other through special write-type request packets and response packets with annotations added to these special request packets and response packets. If the burst information is detected, the translation intelligence 124 in the initiator network adapter 120 converts the multiple read requests to a single write request with
annotations in the data field of the write-type request packet to indicate how many read requests were combined, such as a burst length annotation, and the addresses associated with each read request, such as a burst address sequence annotation. Note, in an embodiment, control fields, such as a Reqinfo can be sub fields embedded within the data field of the write-type request packet. The Reqinfo field can be used as a location to communicate additional annotations.

The additional pieces of information detected for by the translation intelligence 124 in the initiator network adapter 120 may also be the target address of an incoming burst request. The initiator translation intelligence 124 may perform an address look up on the target address to determine if the target is capable of decoding the annotations. The initiator translation intelligence 124 converts the one or more read requests to the single write-type request packet with annotations based upon an address of the single target indicating that it is capable of decoding the annotations in the single request packet. The initiator network adapter 120 may bypass the translation intelligence 124 and the single request packet conversion process if the address of the target is not listed as capable of decoding the annotations of the single request packet.

In an embodiment, the address decoding logic in each initiator bridge module 124 labels certain regions as packet-capable targets (e.g. those targets serviced by a network adapter with translation intelligence). All transactions not addressed to these regions will be sequenced as direct transactions. A direct transaction occurs as a typical request or response transaction across the communication fabric and doesn't undergo the single request packet conversion process when communicated over the shared resource. The direct transactions essentially bypass the translation intelligence, causing a request to be forwarded directly to a standard initiator network adapter module, and a response to be forwarded directly back to the initiating IP core.

Thus, the detection logic in the translation intelligence 124 detects for additional information in a read request containing burst information that communicates one or more read requests in a burst from an initiator Intellectual Property (IP) core that are going to related addresses in a single target IP core. The detection logic may also perform an address lookup of the target address to determine whether the target is capable of decoding the annotations of the special single write-type request packet. If not capable, then the network adapter performs a standard direct transaction. The detection logic may communicate to conversion logic in the translation intelligence 124 that the burst information is detected.

The single request as well as the direct requests may be transmitted using the underlining protocol of the communication fabric. The transfer of the request is decoupled from the one or more responses by communicating the one or more responses to the initiator when the one or more responses become available without the initiator having to poll for the responses or block the communication fabric waiting for the responses. The request and the one or more responses form a fully decoupled Single Request Multiple Data (SRMD) transaction. The implementation allows for simultaneous utilization of the decoupled SRMD protocol for access to high latency targets and the native polling protocol of the underlining fabric for access to low latency targets.

The initiator network adapter 120 gains access to the shared interconnect 144 by winning a round of arbitration. The initiator network adapter 120 transmits this single write-type request packet with additional annotations over the shared resource. The transmission logic in the initiator network adapter 120 transmits this single write-type request packet with annotations. The initiator network adapter 120 relinquishes control of the interconnect 144 to allow other network adapters 122-142 to issue their transactions after the single request packet with annotations is transmitted. The initiator network adapter 120 may also issue additional requests prior to receiving the solicited responses.

The second network adapter 122, a target network adapter, contains receiving logic to receive the single request packet with annotations and detection logic to detect the annotations. The second network adapter 122 also contains conversion logic to convert the single request packet with annotations into the original number of read requests, where each read request has its original target address. The translation intelligence 148 in the target network adapter decodes annotations of the single write-type request packet, such as burst length and burst address sequence, to perform the conversion. The translation intelligence 148 also stores both the initiator's address, such as an initiator's identification tag, and the number of read requests in this burst series that were combined into this single write-type request packet.

The target network adapter 122 transmits the converted number of read requests across a second signal interface 150 to the target IP core. The target IP core, such as a Memory Scheduler 118, generates responses to the multiple read requests. Each response carries data in bit words.

The initiator network adapter 120 does not need to check on the status of the target IP core's generation of responses to the multiple read requests. The initiator network adapter does not need to poll. The target IP core 118 communicates the multiple responses to the read requests across the second signal interface 152 to the target network adapter connected to the shared interconnect 144 as the one or more responses become available without the initiator having to poll for the communicated responses. The translation intelligence 148 in the target network adapter 120 receives the two or more responses to the two or more number of read requests, each response carrying data in bit words. Note, "N" number may indicate a numerical number such as 2, 5, 8, etc.

The translation intelligence 148 in the target network adapter 122 converts each data response into a special write-type response packet with the address of the initiator in the address field of the write-type response packet. The target network adapter 122 generates the address in the address field of the write-type response packet by using the stored address of the original initiator's address, such as a con ID, identification tag, etc., in the translation intelligence 148. The translation intelligence 148 in the target network adapter 122 notes the number of response packets in this series sent back to the initiator network adapter 120. The target network adapter 122 annotates the last response packet in this series as the last/final packet in a control field such as the Reqinfo field.

The target network adapter 122 gains access to the communication fabric by winning a round of arbitration. The target network adapter 122 transmits the multiple write-type data response packets with annotations over the shared interconnect 144.

The translation intelligence 124 in the initiator network adapter 120 receives the write-type data response packets and detects for the presence of the annotations. If detected, the translation intelligence 124 in the initiator network adapter 120 converts each write-type data response packet into a standard data response to the read request.

Upon transmitting the last write-type data response packet in this series, the translation intelligence 148 in the target
network adapter 122 clears its stored information regarding this request and response transaction. The target network adapter 122 relinquishes control of the communication fabric to allow other network adapters 120, and 126-142 to issue their transactions after each response packet is transmitted.

The translation intelligence 124 in the initiator network adapter 120 checks for the last/final packet annotation in the response packets. Upon converting the last write-type data responses in this series, the translation intelligence 124 in the initiator network adapter 120 clears its stored information regarding this transaction.

The initiator network adapter 120 communicates the multiple data responses to the initial read requests across the first signal interface 146 to the initiating IP core 102.

Thus, the shared interconnect 144 with intelligent network adapters 102-142 may use a set of extensions with the write-type request and response packets to facilitate accelerated burst performance for multimedia and computing initiator cores.

In an embodiment, the communication fabric with intelligent network adapters 102-142 implements the higher level protocol that has the ability to enhance the burst read capabilities for target cores with long and/or variable latencies. The communication fabric with intelligent network adapters 102-142 accomplishes this task by converting most read transactions from initiator IP cores from multiple transactions into a single write-type transaction. The write-type transaction carries the read request information from the initiator IP core to the target IP core. Also, one or more write transactions carry read response packets and data from the target IP core back to the initiator core. Thus, the communication fabric with intelligent network adapters 102-142 may convert a Read access into an implicit Write back to the source to improve efficiency of access to targets with long or variable latency such as Dynamic RAM-based memory subsystems.

The communication fabric with intelligent network adapters 102-142 may be a configurable, scalable System On a Chip inter-block communication system. The interconnect 144 with intelligent network adapters 102-142 decouples IP cores from each other in the system. This enables IP cores to be independently designed and reused without redesign. The network adapters 102-142 are mated to an IP core to provide a decoupling of core functionality from inter-core communications.

The single request may also have annotations indicating the width of the data bus of the initiator, which corresponds to represent the unit size of the requests. The translation logic being configured to detect and decode a width of the data bus communicated in a request allows IP cores that vary in width to connect to the same communication fabric with annotations that show the data bus width of the requesting device. For example, an on chip processor may have a port with data width of 64 bits wide. The on-chip processor may send a burst request for multiple responses to a memory that has a port with data width of 128 bits wide. The 128 bits wide memory may send half the number of requests from what the initiator thinks it needs at twice data bandwidth. The translation logic knows the data width of the target and detects the data bandwidth of the requesting device from the annotation in the request. The translation logic performs a data width conversion to communicate the unit size of the request. The unit size of a burst length from an initiator may be in bytes or words of predefined width.

The high-level hardware protocol converts the series of one or more requests from an initiator to a single read or write request carrying in-band information fully describing the burst and the response routing. If the address sequence represented by the burst type is determined to be non-incrementing, additional attributes of the specific burst type may be annotated to the request.

FIG. 2 illustrates an embodiment of an example pipelined arbitration process without blocking or polling for solicited responses. The interconnect with intelligent network adapters may be non-blocking. A first initiator network adapter may win access to the shared resource, such as the interconnect, on a first clock cycle of the communication fabric 260. The first initiator network adapter may send a request with command and address annotations across that communication fabric on a second clock cycle of the communication fabric 262. The second initiator network adapter may also win access to the communication fabric on the second clock cycle of the communication fabric 262. The second initiator network adapter may send a request with command and address annotations across that communication fabric on a third clock cycle of the communication fabric 264.

The first initiator network adapter may again win access to the communication fabric on a third clock cycle of the communication fabric 264. The first initiator network adapter may send another request with command and address annotations across that communication fabric 264 on a fourth clock cycle of the communication fabric 266.

On a fifth clock cycle 268, a response to the second request has been generated. The response has become available. The second target network adapter may win access to the shared resource. On a sixth clock cycle 270, a response to the second request may be transmitted across that shared resource. Also on a sixth clock cycle 270, a response to the first request has been generated and a first target network adapter may win access to the shared resource. On a seventh clock cycle 272, a response to the first request may be transmitted across that shared resource.

Thus, an initiating core may issue two or more requests prior to receiving a response to the first request because the request is processed in a non-blocking pipeline process. The first network adapter issued requests on the second clock cycle 262 and the fourth clock cycle 264. The second communication fabric clock cycle 262 started a delay period in communication fabric clock cycles to when the data responses to the read request would be generated. The target then arbitrates for use of the communication fabric when the responses are available. The initiator does not poll, i.e. generate status checks transmitted across the communication fabric, to check on the status of the availability of the response. Thus, the initiating core may issue two or more requests prior to receiving a response to the first request and does not need to block the communication fabric waiting for a transmission of the one or more responses.

As discussed, the interconnect with intelligent network adapters may be a non-blocking interconnect architecture using a higher level split-transaction protocol to communicate between subsystems on the chip. The arbitration process to deliver the data response is separated from the arbitration process to send the read request. A read transaction is initiated by transmitting the annotated read request to all other network adapters connected to the interconnect. The issuing network adapter then relinquishes control of the interconnect, allowing other network adapters to issue their transactions. When the data is available, the target network adapter supplying the data, typically memory, arbitrates for
the interconnect, and then transmits the data along with the initiators address and transaction ID so that the original issuer can match the data with the particular request. The interconnect is non-blocking because the issuing network adapter relinquishes control of the interconnect to allow other network adapters to issue their transactions. The non-blocking transaction protocol enables multiple indi­vidual initiator-target transactions to occur concurrently across the interconnect structure. The protocol allows each network adapter to have up to multiple transactions in 10

progress at one time. Referring to FIG. 1, in an embodiment, all of the request

packet transfers over the interconnect 144 with intelligent network adapters 102-142 can be memory-mapped. Unique address spaces may be assigned to all target network adapt- 15

ers. Up to four non-overlapping address spaces can be assigned to each target network adapter. A System Address Map configured by a user in the Address Map pane of Intellectual Property generator tool can be implemented as a distributed collection of address match registers located 20

inside the corresponding target network adapters. Each address match register corresponds to a target address space. This helps initiating network adapters determine whether a transaction addresses a target region defined in the target network adapter and is capable of receiving and processing 25

request packet transactions. In an embodiment, the internal packet format between

network adapters may be as follows. Each request packet has a packet header. This header contains information carried as in-band request qualifiers. Each response packet has its 30

header carried as in-band request qualifiers; all data fields associated with response packets contain the actually requested read data for the overall transaction, except in the case of errors.

All packet headers in the protocol may be constructed 35

having defined fields in the header part of the packet. The packet header may contain various field entries such as Command type, (i.e. read transaction, write transaction or WriteNonPost transaction), Burst incrementing code, Address sequence, and a Reqinfo sub-control field for 40

additional annotations. Further, block transactions can be supported and may be treated as a series of incrementing bursts.

The request packet format assists to achieve burst acceleration and payload throughput in the interconnect 144 with intelligent network adapters 102-142 by implementation of a Single-Request/Multiple-Data (SRMD) internal transfer protocol. Request packets may be formed using a simple (posted) Write burst of one or more transfers. The address field of the first transfer in the packet specifies the initiator word-aligned starting address for the transaction specified by the request packet. The lowest address bits may be determined from the ByteEn field. The packet header carries the first address and Reqinfo field. For read packets, the initiating network adapter issues a single transfer carrying the packet header. This transfer uses a LAST annotation as its burst code. For write request packets, the translation intelligence 124 propagates burst codes along with the addresses received from the initiator IP core to the target network adapter. The target network adapter may ignore these later addresses, and simply use a local address sequencer to generate the appropriate target protocol addresses.

The initiator bridge module with translation intelligence 124 annotates most of the request packet header via its Reqinfo output to the initiating network adapter. The user Reqinfo field may be located after the header fields and/or in the data field. The width of the Reqinfo field can be user-programmable.

If a new transaction request of an initiating network adapter is addressing a target network adapter, it can be remapped to a pseudo-SRMD transaction with the request format described above.

When an initiating network adapter decodes a Write, WriteNonPost, or Read transaction with a valid incrementing burst code that addresses a target network adapter, the initiating network adapter generates the write-type request packet transaction. All request packet transactions can be issued as incrementing posted-write bursts. The incrementing burst packet header has its Reqinfo sub-fields encoded with the BurstLength and the initiating network adapter's address information.

Request and response packets may have some differences in their formats. The packet may have a CMD field, a burst field, an Address field, a control field, and a data field.

The response packet format may differ from the request packet. The Response Packet Format may have a CMD field, an address field, a burst field that also indicates the Last response in this series, a control Reqinfo field, and a data field. The response packet format uses a few bits of the Reqinfo field to transmit its header. The MData field may be annotated as a "don't care" on all non-posted write responses as well as on read error responses, and should be set to 0. For successful reads, the MData field carries the read data. The address field can be composed of a fixed field which identifies the response packet address region where the initiating network adapter is located in the address space, a response target ID, such as an initiating network adapter SBConnID, and a number of reserved bits. The Reqinfo field can also be used to transfer the value of the number of responses returned from the target, and a bit indicating whether the current transfer is the LAST in a sequence of response packets to the burst read request packet. Request packets may use two low-order bits of the Reqinfo field to distinguish between direct transactions (identified as all 0's) and response packet (identified as not all 0's) transfers.

The protocol works together with the intelligent network adapter to define explicit uses of the Reqinfo field to support enhanced transaction options. In addition to the capabilities of the legacy initiator network adapter, the initiator bridge module with translation intelligence 124 accelerates precise incrementing bursts of types RD, WR, and WNP with an arbitrary length between 1 and 63 OCP words, as well as block burst (2-D) transactions, which are characterized by length (width), size (height), and stride. The initiator bridge module with translation intelligence 124 also supports up to 8 bits of user-defined Reqinfo request information.

For direct transactions (except where the address matches a packet response region), the operation can be essentially a bypass of the bridge logic, i.e., the request is forwarded directly to the standard initiator network adapter module, and the response is forwarded directly back to the initiating OCP. For packet transactions, the initiator bridge module with translation intelligence 124 uses the result of the address decode, together with the other request fields, to construct the packet header, and to issue a request packet. The typical network adapter logic can be used to provide any required responses for posted writes, and to all but the last response to non-posted write packets. The initiator bridge module with translation intelligence 124 then waits for the response packets, if the associated request expects an explicit response. When a response packet arrives, the initiator bridge module with translation intelligence 124 extracts


US 7,277,975 B2 13

response information from the Reqinfo fields of the packet, and provides the response to the initiating core together with any requested read data.
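A minimal sketch of the Reqinfo discrimination described above: the two low-order bits separate direct transfers (all 0's) from response-packet transfers (not all 0's). The positions chosen here for the LAST flag and the response count are assumptions for illustration only.

```python
# Assumed bit positions within Reqinfo; only the two low-order
# discriminator bits are described explicitly in the text above.
LAST_BIT = 2      # "last response in this series" flag (assumed position)
COUNT_SHIFT = 3   # number of responses returned (assumed position)

def is_response_packet(reqinfo: int) -> bool:
    """Not-all-zero low-order bits mark a response-packet transfer."""
    return (reqinfo & 0b11) != 0

def make_response_reqinfo(count: int, last: bool) -> int:
    """Build a response Reqinfo with the non-zero discriminator set."""
    return (count << COUNT_SHIFT) | (int(last) << LAST_BIT) | 0b01

def decode_response(reqinfo: int) -> tuple[int, bool]:
    """Extract (response count, LAST flag) from a response Reqinfo."""
    return reqinfo >> COUNT_SHIFT, bool((reqinfo >> LAST_BIT) & 1)
```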

The target network adapter can receive both direct and packet requests. A request packet may be indicated by annotations in the command field in the Reqinfo header arriving with the request packet. The target bridge module with translation intelligence 148 interprets this packet header, and then issues the appropriate protocol request via the target core interface. Each target bridge module with translation intelligence 148 is capable of propagating/generating all types of legal bursts, regardless of the capabilities of the attached target, to ensure maximum interoperability. The target bridge module with translation intelligence 148 passes responses to direct requests through to the normal response path and generates response packets in response to a request packet. The header for such responses indicates the response code, and whether the response is last in a burst. The target bridge module with translation intelligence 148 routes the response packet back to the initiator bridge module with translation intelligence 124 wrapped around the requesting initiator network adapter 120.

The target bridge module with translation intelligence 148 tracks the response pipeline, discarding any posted write responses, and all but the last non-posted write responses. The target bridge module with translation intelligence 148 generates response packets for the read responses and the last non-posted write response. The header for such responses indicates the response code, and whether the response is last in a burst. The target bridge module with translation intelligence 148 creates the response address by concatenating some user-defined upper address bits (that map the transfer into reserved spaces in the direct address map) together with the SBConnID of the request. This routes the response packet back to the initiating network adapter. Any read data associated with the response is transported as write data via the MData field.
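The response-address construction just described (user-defined upper address bits concatenated with the requester's SBConnID) might be sketched as follows; the widths and the region pattern are assumptions, not values from the patent.

```python
CONN_ID_BITS = 6         # assumed SBConnID width
RESERVED_BITS = 10       # assumed low-order reserved bits
RESPONSE_REGION = 0x3F   # assumed upper-bit pattern selecting the
                         # reserved response region in the address map

def response_address(sb_conn_id: int) -> int:
    """Concatenate region bits, the SBConnID, and zeroed reserved bits."""
    return ((RESPONSE_REGION << CONN_ID_BITS) | sb_conn_id) << RESERVED_BITS

def conn_id_from_address(addr: int) -> int:
    """Recover the routing SBConnID that steers the packet back."""
    return (addr >> RESERVED_BITS) & ((1 << CONN_ID_BITS) - 1)
```

Because the SBConnID sits in a fixed slice of the address, the interconnect can route the response packet back to the initiating network adapter by address decode alone.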

The translation intelligence 124 may have additional logic to detect for a series of incrementing burst transactions indicated as a block transaction. The request generated for the block transaction includes annotations indicating that multiple read requests in this burst are coming from this same initiator and are going to related addresses in a single target. The annotations also include 1) a length of the row occupied by a target data, 2) a number of rows occupied by the target data, and 3) an address spacing between two consecutive rows occupied by the target data.

FIG. 4a illustrates an example block transaction request packet for a two-dimensional data object. The block transaction request packet 472 may have many sub-fields in the header portion of the packet. The block transaction request packet may contain a command field 474, an address field 476, a block burst indicating field 478, a burst sequence field 480, a burst block length field 482, a burst block size field 484, a burst block stride field 486, as well as other similar fields. The Reqinfo field may contain the burst block field 478 through the burst block length field 482. The burst block size field 484 and burst block stride field 486 may be located in the data field/payload area. The two-dimensional data may be multimedia data, imaging data, or similar data.

FIG. 4b illustrates an example frame buffer to store multimedia data for display on a display device. The rows of memory cells 488 of the frame buffer may correspond to the width of the raster lines for that display device. The two-dimensional multimedia data object may occupy neighboring rows of the frame buffer corresponding to raster lines on the display device. A single request packet may be generated and communicated over the communication fabric for the block transaction for the two-dimensional data object.

Referring to FIGS. 4a and 4b, the block transaction may be a request for two-dimensional data. The width 490 of the two-dimensional object can be measured in the length of the raster line. The burst block length field 482 may contain the width 490 of the two-dimensional object. The height 492 of the two-dimensional object can be measured in the number of rows of raster lines the two-dimensional object will occupy. The burst block size field 484 may contain the number of rows of raster lines 492 that the two-dimensional object will occupy. The stride 494 of the two-dimensional object can be measured in the length difference between the starting addresses of two consecutive raster lines. The burst block stride field 486 may contain the stride 494 of the two-dimensional object.

The translation intelligence has logic to convert, if a block transaction annotation is detected, the block transaction to the single read or write request packet with annotations in a Reqinfo field of the request to indicate 1) how many read requests were combined, 2) the addresses associated with each read request, 3) the length of the raster line occupied by a target data, 4) the number of rows of raster lines occupied by the target data, and 5) the length difference between the starting addresses of two consecutive raster lines occupied by the target data.

In an embodiment, the fields of the Request Packet Header may have parameters such as a Burst Length 482 with a range of 1-63, a BurstBlock Size 484 of the block burst indicating the number of rows, a BurstBlock Stride 486 indicating the start-to-start spacing of block bursts, and the Initiator information.

Thus, the translation intelligence in the initiator network adapter converts the N number of read requests in the incrementing burst transaction to a single write-type request packet with annotations in a control field of the request packet to indicate how many read requests were combined, such as a burst block, the addresses associated with each read request, such as a burst sequence, the length of a raster line the 2D object occupies, the number of raster lines the 2D object will occupy, such as a raster height dimension, and the length difference between raster lines, such as a stride.
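Folding an incrementing 2D burst into one annotated request, as described above, can be sketched like this; the container and its field names mirror the description (burst length = row width, block size = number of rows, block stride = start-to-start spacing) but are otherwise assumptions rather than the patent's wire format.

```python
from dataclasses import dataclass

@dataclass
class BlockRequestPacket:
    """Illustrative single request packet for a 2D block transaction."""
    command: str
    address: int        # word-aligned start of the first raster line
    burst_length: int   # width: words per raster line (range 1-63)
    block_size: int     # height: number of raster lines (rows)
    block_stride: int   # start-to-start spacing between rows, in words

def combine_2d_reads(start: int, width: int, rows: int, stride: int) -> BlockRequestPacket:
    """Fold rows x width individual reads into one annotated request."""
    if not 1 <= width <= 63:
        raise ValueError("burst length must be 1-63 words per the text above")
    return BlockRequestPacket("READ", start, width, rows, stride)
```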

In an embodiment, the two-dimensional block transaction can be defined as an aggregate transaction of BurstBlockSize incrementing Read or Write burst transactions of equal length (MBurstLength), with a constant stride (BurstBlockStride) between the starting addresses of consecutive member burst transactions. BurstBlockSize represents the number of member burst transactions (rows). BurstBlockStride is measured in ip.data_wdth words.

Referring to FIG. 1, in an embodiment, the interconnect

144 with intelligent network adapters 102-142 manages to more effectively move the high frame rate video, 3D graphics, and 2D still image data streams found in multimedia applications. The performance of multimedia SOCs can depend on efficiently streaming large amounts of variable data through the device and to and from its single external DRAM memory. The interconnect 144 with intelligent network adapters 102-142 may concurrently handle multiple asymmetrical and variable length data flows at the maximum throughput of the memory channel with an aggregate bandwidth of, for example, up to 4 Gigabyte/second. Configurable levels of QoS (quality of service) enable tailoring of data channel characteristics to meet application-specific performance requirements.

The 2D block transaction feature accelerates memory access to fundamental data structures used in video and

graphics applications. As a result of these new features the interconnect 144 with intelligent network adapters 102-142 may deliver a sustained level, for example, of 90 percent interconnect utilization in the extremely demanding traffic conditions created by multiple image-based data flows between the multimedia SOC and its memory subsystem. Media and consumer-video SOC chips, for example, have to move multiple streams of data from place to place at high speed. The interconnect 144 with intelligent network adapters 102-142 may be well suited to maintain all that bandwidth, with MPEG video going here, DSP streams going there, and so on.

Consumer Digital Multimedia Convergence (CDMC) applications have a large reliance upon the ability to efficiently move two-dimensional graphics, image, and/or video data between processing engines and memory (principally some form of DRAM). The protocol has defined user extension capabilities that can be used to capture 2D-related transaction information in the optional control field known as Reqinfo. The interconnect 144 with intelligent network adapters 102-142 will have an option to be configured to support a defined use of these extension fields to explicitly capture 2D transactions, and will thereby support efficient bridging between 2D-capable initiators and non-2D targets.

As discussed, a block transaction may be simply a chained set of incrementing bursts. The BurstBlockSize (Y-count) parameter determines the number of such bursts (rows) in the set. The BurstBlockStride parameter specifies the distance (spacing) between the starting addresses of two consecutive rows in the physical memory. The main implications of block transactions are related to the fact that this operation requires a set of "holding" registers saving the member burst (row) length (X-count), the starting address of the previous row, and another counter (Y-counter) that counts the number of rows. At the beginning of a block transaction the X-count Holding Register is loaded with the RBurstLength value, and the Address Holding Register is loaded with the value of RAddr. Upon arrival of the Data phase of the Packet Header transfer the RBurstBlockSize (Y-count) and the RBurstBlockStride registers are initialized. When the X-counter reaches one, it is reloaded with its original value from its holding register, while the starting address of the next burst is loaded with the sum of the previous starting address held in its holding register and the RBurstBlockStride register. A block transaction is considered completed when both the X and the Y counters are done.

At the IP OCP, a block transaction can be composed of BurstBlockSize legal OCP 2.4 bursts. This implies that the Burst field has to contain valid incrementing OCP burst codes with a LAST in the Burst field of the last transfer of every member transaction (row). It is then legal for the initiator to issue both read and write block transactions.

FIGS. 3a through 3d illustrate a flow diagram of an embodiment of a request packet and response packet transaction over a shared resource.

In block 302, the initiator Intellectual Property (IP) core generates a read request containing a piece of information that communicates that N number of read requests in this burst are coming from this same initiator and are going to related addresses in a single target IP core. The pieces of information may also communicate that the multiple read requests are for a block transaction such as a two-dimensional multimedia data request. The initiator IP core may also generate the multiple read request with several more pieces of information including a command type, an address, a burst length, a length, a height of the read request, and a stride of the read request.

In block 304, the initiator IP core communicates the multiple read requests across a signal interface to an initiator network adapter connected to a shared resource, such as an interconnect. Two or more network adapters connected to the communication fabric may form a distributed arbitration mechanism for access to the shared resource.

In block 306, the translation intelligence in the initiator network adapter detects for the presence of the additional pieces of information in the read request. If detected, the initiator network adapter and the target network adapter communicate requests and responses to each other through special write type request packets and response packets with annotations added to these packets. If not detected, responses and requests are processed as a direct transaction through the normal request and response protocol path. Also, if the address of the target is not indicated as capable of decoding a single request packet, then the responses and requests are processed as a direct transaction through the normal request and response protocol path. The translation intelligence/logic may implement a higher level protocol layered on top of an underlining protocol and the communication fabric.

In block 308, the translation intelligence in the initiator network adapter converts the multiple read requests to a single write-type request packet with annotations in the data field of the write request to indicate how many read requests were combined, such as a burst length, and the addresses associated with each read request, such as a burst address sequence. If the multiple read request indicated a non-incrementing address burst block transaction, the single request packet may also include a length of a row occupied by a target data, a number of rows occupied by the target data, and a length difference between starting addresses of two consecutive rows occupied by the target data. If the block transaction is for two-dimensional data then the single request packet also includes 1) a width of the two-dimensional object measured in the length of the row, 2) a height of the two-dimensional object measured in the number of rows the two-dimensional object will occupy, and 3) a stride of the two-dimensional object measured in the length difference between the starting addresses of two consecutive rows.

In block 310, the initiator network adapter gains access to the communication fabric by winning a round of arbitration.

In block 312, the initiator network adapter transmits this single write-type request packet with annotations over the communication fabric. The underlining protocol of the initiator network adapter transmits this single request packet with annotations. The initiator network adapter may transmit the request packet in a non-blocking manner and issue additional requests prior to receiving the responses to the initial request.

In block 314, the initiator network adapter relinquishes control of the communication fabric to allow other network adapters to issue their transactions.

In block 316, the translation intelligence in the target network adapter receives the single request packet with annotations and detects for the annotations.

In block 318, the translation intelligence in the target network adapter converts the single request packet into the original multiple read requests, each read request with its original start address. The translation intelligence in the target network adapter decodes the single write request and


stores both the initiator's address, such as a ConnID, and the number of read requests in this series that were combined into this single write request.

In block 320, the target network adapter transmits the converted multiple read requests across a signal interface to the target IP core.

In block 322, the target IP core generates responses to the original number of read requests, each response carrying data in bit words.

In block 324, the initiator network adapter does not need to check on the status of the target IP core generation of responses to the number of read requests. When the responses are available, they will be communicated from the target IP core.

In block 326, the target IP core communicates the multiple responses to the read request across a signal interface to the target network adapter connected to the shared resource.

In block 328, the translation intelligence in the target network adapter receives the data responses and determines if these are responses to the single request packet.
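The member-burst regeneration performed by the target-side translation intelligence (blocks 316 through 318, together with the X-count/Y-count sequencing described earlier for block transactions) can be sketched as a simple address generator. The 4-byte word size is an assumption for illustration.

```python
def expand_block_request(start_addr: int, row_words: int, rows: int,
                         stride_words: int, word_bytes: int = 4) -> list[int]:
    """Regenerate the word addresses of every member burst (row).

    The inner loop plays the X counter (incrementing burst within a row);
    when it expires, the next row's start is the previous row's start plus
    the stride, and the outer loop plays the Y counter over the rows.
    """
    addresses = []
    row_start = start_addr
    for _ in range(rows):                 # Y counter
        addr = row_start
        for _ in range(row_words):        # X counter
            addresses.append(addr)
            addr += word_bytes
        row_start += stride_words * word_bytes
    return addresses
```

The transaction is complete when both loops finish, matching the "X and Y counters are done" condition described above.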

In block 330, if these are responses to the single request packet, the translation intelligence in the target network adapter converts each response to the read request into the special write type response packets. The target network adapter generates the address in the address field of the response packet by using the stored address of the original initiator's address, such as a ConnID, in the translation intelligence. The translation intelligence in the target network adapter notes the number of response packets in this series sent back to the initiator network adapter. The target network adapter annotates the last response packet in this series as the last/final packet in a control field such as the Reqinfo field. If these are not responses associated with the single request packet, then the response is processed as a direct transaction through the normal response path.

In block 332, the target network adapter gains access to the communication fabric by winning a round of arbitration.

In block 334, the target network adapter transmits the multiple write type data response packets with annotations over the shared resource.

In block 336, the translation intelligence in the initiator network adapter receives the response packets and detects for the presence of the annotations indicating that these are response packets that correspond to the single request packet.

In block 338, the translation intelligence in the initiator network adapter converts each write-type data response packet into a standard data response corresponding to the original read requests with the initiator's address as the destination address.

In block 340, upon transmitting the last write type data responses in this series, the translation intelligence in the target network adapter clears its stored information regarding this transaction. The target network adapter also relinquishes control of the communication fabric to allow other network adapters to issue their transactions.

In block 342, the translation intelligence in the initiator network adapter checks for the last/final packet annotation in the response packets. Upon converting the last write-type response packet in this series, the translation intelligence in the initiator network adapter clears its stored information regarding this transaction.

In block 344, the initiator network adapter communicates the multiple data responses to the original burst read requests across the signal interface to the initiating IP core.

FIG. 5 illustrates an example conversion of multiple read requests into a single request packet and the associated responses.

In block 502, the initiator sends multiple read requests indicating that this is a burst transaction. In block 504, the first instance of the translation logic converts the multiple requests to a single write request with annotations in a control field and a data field of the write request to indicate how many read requests were combined into the write request and an address sequence associated with a burst transaction. In block 506, the underlining protocol transmits the single write request over the communication fabric. In block 508, the second instance of translation logic converts the single write request into an original number of read requests, where each read request has its original target address. The second instance of translation logic also performs a data width conversion of the number of original requests from the initiator to match the data width capability of the target.

In block 510, the target generates the responses. The target communicates the number of responses to the second instance of translation logic as the number of responses become available. In block 508, the second instance of translation logic makes some annotations to the write responses and transmits them over the communication fabric. In block 504, the first instance of translation logic converts the write responses into data responses and communicates the multiple responses to the initiator.

FIG. 6 illustrates an example conversion of a burst transaction for two-dimensional data converted into a single request packet and the associated responses. FIG. 6 operates similar to the process described in FIG. 5. The read request in FIG. 6 contains additional information such as a number of rows occupied by the target data, and a length difference between starting addresses of two consecutive rows occupied by the target data.

In an embodiment, an example communication fabric with translation intelligence may be implemented in the following environment. In the Consumer Digital Multimedia Convergence market a number of portable digital multimedia devices are appearing in the consumer market that have technical requirements that in most ways mirror those for the wireless market. Some others (digital video camcorders, portable DVD players, etc.) have technical requirements that are still closer to the main-stream multimedia devices, yet with much higher emphasis on power and size. Some other sub-markets in Consumer Digital Multimedia include High-Definition digital set top boxes, which increasingly are morphing into generalized Residential Gateways and Home Servers, as well as High-Definition digital television devices, which, in turn, increasingly raise their feature and performance requirements closer to the ones normally associated with the DSTB.
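The FIG. 5 conversion flow can be sketched end-to-end under simplifying assumptions (a single incrementing address sequence, a dictionary standing in for the target's memory, and no data-width conversion); the function names are illustrative, not from the patent.

```python
def initiator_combine(addresses: list[int]) -> dict:
    """First translation-logic instance: N reads -> one annotated request."""
    step = addresses[1] - addresses[0] if len(addresses) > 1 else 0
    return {"count": len(addresses), "start": addresses[0], "step": step}

def target_expand_and_respond(packet: dict, memory: dict) -> list[int]:
    """Second instance: regenerate the reads and gather the responses."""
    addrs = [packet["start"] + i * packet["step"] for i in range(packet["count"])]
    return [memory.get(a, 0) for a in addrs]
```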

Real time processing of video frames may consume most of the processing capability in a SOC. Video frames are generally too large to be kept memory resident inside an SOC. Thus they are stored externally in the densest form of memory. The processing of video frames requires multiple steps, several of which are most often performed by dedicated hardware processing units. As a result, the multimedia SOC architecture can be controlled by the need to transport massive amounts of data in either small or large chunks (depending on a particular implementation) between multiple internal processing elements and external DRAM, and by the need to keep those processing elements busy working in parallel most of the time.

This application requirement affects communication fabrics in several ways. First, bandwidth guarantees become


valuable. Second, effective interface and sharing of external memory is valuable. The external memory may dictate the clock and data rates that must be supported. The communication fabric should be efficient at handling the external memory's asymmetric latency characteristics. The protocol implemented by the communication fabric can aid in efficient management of the external memory. Some data block sizes are larger than for computing applications, while some implementations of MPEG-2 decoders and encoders require efficient support of extremely small data blocks. The increasing utilization of Media Processors for Graphics, Audio, and other multimedia applications results in some degree of morphing between multimedia and computing requirements with a detrimental effect on the more traditional multimedia traffic patterns. As SOC designs evolve into Residential Gateway and Home Server functions they take on even more of the attributes of compute type applications.

The communication fabric should have techniques for exploitation of long bursts and 2D characteristics of some multimedia flows. Some multimedia applications may have 10 to 20 initiators, and very few targets (often one or two DRAM subsystems and a peripheral interconnect). It may be assumed that one of the targets is the head of a low performance peripheral interconnect (Synapse 3220) for access to control ports on the initiators and low-bandwidth peripheral devices. All but one or two of the initiators could be bandwidth sensitive. Often these initiators may have multiple concurrent internal activities, so multi-threaded (4 to 8) interfaces can be more common in this space than in most others.

The ultimate sources and destinations of the audio and video data may be external interfaces. Very often these interfaces have clock rates that are dictated by interface standards. The set of them that must be supported typically has a lot of asynchronicity. This application requirement directly impacts the interconnect requirements by putting a premium on effective clock decoupling.

In the foregoing specification, the invention has been described with reference to specific exemplary embodiments thereof. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention as set forth in the appended claims. The specification and drawings are, accordingly, to be regarded in an illustrative sense rather than a restrictive sense.

For example, the information representing the apparatuses and/or methods may be contained in an Instance, soft instructions in an IP generator, or similar machine-readable medium storing this information. A machine-readable medium includes any mechanism that provides (e.g., stores and/or transmits) information in a form readable by a machine (e.g., a computer). For example, a machine-readable medium includes read only memory (ROM); random access memory (RAM); magnetic disk storage media; optical storage media; flash memory devices; DVD's; Electrically Programmable ROMs; Electrically Erasable PROMs; FLASH memory; magnetic or optical cards; or any type of media suitable for storing electronic instructions. The information representing the apparatuses and/or methods stored on the machine-readable medium may be used in the process of creating the apparatuses and/or methods described herein.

The IP generator may be used for making a highly configurable, scalable System On a Chip inter-block communication system that integrally manages data, control, debug and test flows, as well as other applications. In an embodiment, an example intellectual property generator may comprise the following: a graphic user interface; a common set of processing elements; and a library of files containing design elements such as circuits, control logic, and cell arrays that define the intellectual property generator.

The instructions and operations also may be practiced in distributed computing environments where the machine-readable media is stored on and/or executed by more than one computer system. In addition, the information transferred between computer systems may either be pulled or pushed across the communication media connecting the computer systems.

While some specific embodiments of the invention have been shown, the invention is not to be limited to these embodiments. For example, the higher layer protocol may communicate a single write request or a single read request over the communication fabric to solicit either multiple data responses or multiple data writes from a target. The invention is to be understood as not limited by the specific embodiments described herein, but only by scope of the appended claims.

What is claimed is:

1. An apparatus, comprising:

a communication fabric to facilitate communications between functional blocks of a system, wherein the communication fabric implements either 1) a protocol that blocks the communications fabric during a time between transmission of a request and a transmission of an associated response, 2) a protocol that polls a responding block to solicit a response to an issued request, or 3) a protocol that only transfers data in the same direction as requests across the communication fabric; and

translation logic to implement a higher level protocol layered on top of an underlining protocol and the communication fabric, wherein the translation logic converts one initiator transaction into two or more write transactions and then transmits the write transactions using the underlining protocol of the communication fabric.

2. The apparatus of claim 1, wherein a first write trans­action includes a write request with annotations in a control field and a data field of the write request to indicate how many read requests were combined into the write request and an address sequence associated with a burst transaction.

3. The apparatus of claim 1, wherein the translation logic relinquishes control of the communication fabric to allow other functional blocks to issue their transactions after a first write transaction from the two or more write transactions is transmitted over the communication fabric to prevent the blocking or the polling on the communication fabric.

4. The apparatus of claim 1, wherein the communication fabric is located in a System On a Chip.

5. A computer readable medium containing instructions to cause a machine to generate the apparatus of claim 1.

6. An apparatus, comprising: a communication fabric to facilitate communications

between functional blocks of a system; and translation logic to implement a higher level protocol

layered on top of an underlining protocol and the communication fabric, wherein the translation logic combines multiple initiator read requests into a single request that carries the number of requests that were combined into the single request and an address sequence associated with the requests.

7. The apparatus of claim 6, wherein the single request is transmitted using the underlining protocol of the communi­cation fabric.


US 7,277,975 B2

8. The apparatus of claim 6, wherein the communication fabric is an interconnect located in a System On a Chip and the functional blocks are Intellectual Property cores.

9. A computer readable medium containing instructions to cause a machine to generate the apparatus of claim 6.

10. A method, comprising:
converting two or more read requests from an initiator into a single request that carries the number of the multiple requests that are combined and an address sequence associated with the multiple requests; and
transmitting the single request across a communication fabric.

11. The method of claim 10, wherein the single request has annotations in a field of the request to indicate how many read requests were combined and an address sequence associated with the multiple requests.

12. The method of claim 10, wherein a higher level protocol layered on top of an underlying protocol combines the multiple read requests into the single request and uses the underlying protocol to transmit the single request across the communication fabric.

13. A computer readable medium containing instructions to cause a machine to generate apparatuses that perform the operations in claim 10.

14. The method of claim 10, further comprising:
detecting a width of a data bus of a requesting device in the single request; and
performing a data width conversion between the width of the data bus of the requesting device and a width of a data bus of a target based on the data unit size indicated in the request.

15. A method, comprising:
separating a transfer of an issued request by an initiator functional block from an associated number of responses with a higher level protocol layered on top of a communication fabric that implements an underlying protocol that either 1) solicits responses to the issued request over the communication fabric, or 2) blocks the communication fabric waiting for the number of responses to become available for consumption by the initiator functional block; and
communicating the number of responses to the initiator functional block as the number of responses become available without the initiator functional block having to poll for the communicated responses.

16. The method of claim 15, further comprising:
relinquishing control of the communication fabric to allow other functional blocks to issue their transactions after the request is transmitted over the communication fabric to prevent blocking the communication fabric waiting for one or more responses to become available for consumption by an initiator functional block.

17. The method of claim 15, further comprising:
issuing a second request from the same initiator functional block prior to receiving the communicated responses.

18. A computer readable medium containing instructions to cause a machine to generate apparatuses that perform the operations in claim 15.

19. A system, comprising:
a communication fabric to facilitate communications between functional blocks of a system;
translation logic to implement a higher level protocol layered on top of the communication fabric that implements either 1) a polling based protocol that solicits responses to an issued request over the communication fabric, or 2) a blocking based protocol that blocks the communication fabric waiting for the one or more responses to become available for consumption by an initiator functional block; and
wherein the translation logic separates the transaction of the issued request from the responses by communicating the responses to the initiator when the responses become available without the initiator having to poll for the communicated responses.

20. The system of claim 19, wherein the translation logic is to convert a number of read requests from an initiator functional block into a single request that carries the number of the multiple requests that are combined and an address sequence associated with the multiple requests.

21. The system of claim 19, wherein a first instance of the translation logic converts one or more read requests to a single request based upon an address of the single target indicating that it is capable of decoding the single request.

22. The system of claim 19, wherein the communication fabric is an interconnect in a System On a Chip.

23. The system of claim 19, further comprising:
a second instance of translation logic to convert the single request into an original number of read requests, where each read request has its original target address.

24. The system of claim 19, further comprising:
a second instance of translation logic to decode annotations in the single request, and to store both the initiator's identification tag and the number of read requests that were combined into the single request.

25. The system of claim 24, further comprising:
a second instance of translation logic to transmit the responses back across the communication fabric based on the stored transaction attributes.

26. A computer readable medium containing instructions to cause a machine to generate the apparatuses in the system of claim 19.

27. An apparatus, comprising:
a communication structure to facilitate communications between functional blocks of a system; and
translation logic to convert multiple read requests from an initiator functional block into a single request that carries the number of the multiple requests that are combined and an address sequence associated with the multiple requests.

28. A computer readable medium containing instructions to cause a machine to generate the apparatus of claim 27.

29. The apparatus of claim 27, wherein the single request also carries a width of a data bus of an initiator functional block to represent a data unit size of the initiator functional block.

* * * * *
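The request-combining scheme recited in the claims above (folding several initiator reads into one annotated request, then restoring them at a second translation-logic instance) can be sketched as follows. This is a minimal illustration, not the patented implementation: the class and field names (`ReadRequest`, `CombinedRequest`, `stride`) are hypothetical, and a fixed-stride incrementing address sequence is assumed as the "address sequence" annotation.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ReadRequest:
    initiator_id: int   # initiator's identification tag
    address: int        # original target address

@dataclass
class CombinedRequest:
    initiator_id: int
    count: int          # how many read requests were combined
    base_address: int
    stride: int         # encodes the address sequence (fixed-stride burst assumed)

def combine(requests: List[ReadRequest], stride: int) -> CombinedRequest:
    """First translation-logic instance: fold N incrementing reads into one request."""
    first = requests[0]
    assert all(r.initiator_id == first.initiator_id for r in requests)
    assert all(r.address == first.address + i * stride
               for i, r in enumerate(requests))
    return CombinedRequest(first.initiator_id, len(requests), first.address, stride)

def expand(combined: CombinedRequest) -> List[ReadRequest]:
    """Second translation-logic instance: recover each read with its original address."""
    return [ReadRequest(combined.initiator_id,
                        combined.base_address + i * combined.stride)
            for i in range(combined.count)]
```

A second translation-logic instance near the target would call `expand()` to recover each read request with its original target address, in the manner of claims 23 and 24.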


Exhibit D


(12) United States Patent Weber

(54) METHOD AND APPARATUS FOR SCHEDULING OF REQUESTS TO DYNAMIC RANDOM ACCESS MEMORY DEVICE

(75) Inventor: Wolf-Dietrich Weber, San Jose, CA (US)

(73) Assignee: Sonics, Inc., Mountain View, CA (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 221 days.

(21) Appl. No.: 09/977,510

(22) Filed: Oct. 12, 2001

(65) Prior Publication Data: US 2003/0074519 A1, Apr. 17, 2003

(51) Int. Cl.: G06F 13/14
(52) U.S. Cl.: 711/169; 711/154; 711/155; 711/167; 711/168
(58) Field of Search: 711/167-169, 151, 105, 154-155

(56) References Cited

U.S. PATENT DOCUMENTS

5,218,456 A 6/1993 Stegbauer et al.
5,274,769 A 12/1993 Ishida
5,287,464 A 2/1994 Kumar et al.
5,363,484 A 11/1994 Desnoyers et al.
5,469,473 A 11/1995 McClear et al.
5,530,901 A 6/1996 Nitta
5,557,754 A 9/1996 Sone et al.
5,664,153 A 9/1997 Farrell
5,809,538 A 9/1998 Pollmann et al.
5,926,649 A 7/1999 Ma et al.
5,982,780 A 11/1999 Bohm et al.
5,996,037 A 11/1999 Emnett
6,023,720 A * 2/2000 Aref et al.
6,092,137 A 7/2000 Huang et al.
6,104,690 A 8/2000 Feldman et al.


(10) Patent No.: US 6,961,834 B2
(45) Date of Patent: Nov. 1, 2005

6,122,690 A 9/2000 Nannetti et al.
6,167,445 A 12/2000 Gai et al.
6,253,269 B1 6/2001 Cranston et al.
6,330,225 B1 12/2001 Weber et al.
6,335,932 B2 1/2002 Kadambi et al.
6,363,445 B1 3/2002 Jeddeloh

OTHER PUBLICATIONS

Rixner, Scott, et al., Memory Access Scheduling, To appear in ISCA-27 (2000), Computer Systems Laboratory, Stanford University, Stanford, CA 94305, pp. 1-11.*
Rixner et al., "A Bandwidth-Efficient Architecture for Media Processing", Micro-31, 1998, pp. 1-11.
Lamport, Leslie; How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs, IEEE Transactions On Computers, vol. C-28, No. 9, Sep. 1979, pp. 690-691.
Search Report for PCT/US02/05288, mailed May 20, 2002, 1 page.
Search Report for PCT/US02/05438, mailed May 24, 2002, 1 page.
Rixner, Scott, et al., Memory Access Scheduling, To appear in ISCA-27 (2000), Computer Systems Laboratory, Stanford University, Stanford, CA 94305, pp. 1-11.
Search Report for PCT/US02/05287, mailed Jul. 11, 2002, 2 pages.
Search Report for PCT/US02/05439, mailed Jun. 26, 2002, 1 page.

* cited by examiner

Primary Examiner: T. Nguyen
(74) Attorney, Agent, or Firm: Blakely, Sokoloff, Taylor & Zafman LLP

(57) ABSTRACT

The present invention provides for the scheduling of requests to one resource from a plurality of initiator devices. In one embodiment, scheduling of requests within threads and scheduling of initiator device access is performed wherein requests are only reordered between threads.

20 Claims, 5 Drawing Sheets

[Front-page figure: per-thread requests 325 enter a scheduler whose scheduling decision is sent to the DRAM controller 310.]


U.S. Patent, Nov. 1, 2005, Sheet 1 of 5, US 6,961,834 B2

[FIG. 1: requests 10 and responses 50 cross a multi-threaded interface 15 into per-thread request queues; a DRAM and thread scheduler with thread state drives CMD and DATA to the DRAM.]

[FIG. 3: per-thread requests 325 enter a thread QOS scheduler and a DRAM scheduler (using thread state and DRAM state); a combiner 360 merges their outputs into the scheduling decision sent to the DRAM controller 310.]


U.S. Patent, Nov. 1, 2005, Sheet 2 of 5, US 6,961,834 B2

[FIG. 2: flow diagram: determine preferred order for QOS guarantees (205); determine preferred order for DRAM efficiency (210), both steps performed within the constraints of thread ordering; if needed, find the next-best DRAM efficiency order; schedule request according to DRAM efficiency order (220); end.]
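The FIG. 2 flow (schedule by DRAM efficiency, falling back to the next-best efficiency order until the QOS guarantees are met) can be sketched as below. This is an illustrative reading, not the patent's implementation; the function names and the representation of candidate orders as lists of request labels are hypothetical, and every candidate order is assumed to already respect intra-thread ordering.

```python
def pick_schedule(efficiency_ranked_orders, satisfies_qos):
    """Walk candidate request orders from best to next-best DRAM efficiency
    (steps 210 and 225) and return the first one that also meets the QOS
    guarantees (step 215); that order is then scheduled (step 220)."""
    for order in efficiency_ranked_orders:
        if satisfies_qos(order):
            return order
    raise RuntimeError("no candidate order meets the QOS guarantees")
```

The loop mirrors the repeated "find next-best DRAM efficiency order" step in the flow diagram.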


U.S. Patent, Nov. 1, 2005, Sheet 3 of 5, US 6,961,834 B2

[FIG. 4: the latency/efficiency trade-off of the scheduling decision: an eager switch favors latency, a lazy switch favors efficiency.]

[FIG. 5: cost-function DRAM bus scheduler: per-thread requests and software 525 feed the DRAM data bus scheduler 505, which keeps the bus direction state 510, a count 515, and a switch point register 520.]


U.S. Patent, Nov. 1, 2005, Sheet 4 of 5, US 6,961,834 B2

[FIG. 6: flow diagram: with at least one request available, a request for the current direction is processed and the count incremented (620, 625); when no request for the current direction exists, or the count has reached the switch point and an opposite-direction request is waiting, the bus direction is changed and the count reset.]
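The count-and-compare decision of FIG. 6 (keep servicing the current bus direction until a programmable switch point is reached, then prefer the opposite direction) can be sketched as below. This is a minimal illustration with hypothetical names; a simple per-direction pending-request count stands in for the hardware state the patent describes, and the returned count includes the request just processed.

```python
def next_direction(current_dir, count, switch_point, pending):
    """Choose the data-bus direction for the next request.

    current_dir  -- "read" or "write", the bus's last direction
    count        -- requests already done in current_dir since the last switch
    switch_point -- programmable threshold (the switch point register)
    pending      -- dict mapping "read"/"write" to waiting-request counts
    Returns (direction, new_count)."""
    other = "write" if current_dir == "read" else "read"
    if pending[current_dir] == 0:
        if pending[other] == 0:
            return current_dir, count      # nothing available to process
        return other, 1                    # forced switch: reset count
    if count >= switch_point and pending[other] > 0:
        return other, 1                    # switch point reached: prefer opposite
    return current_dir, count + 1          # keep direction, increment count
```

Note that when only one direction has pending requests there is no choice, matching the text accompanying FIG. 5.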


U.S. Patent, Nov. 1, 2005, Sheet 5 of 5, US 6,961,834 B2

[FIG. 7: per-thread requests 705 pass through a selective filter 710 governed by a control unit 715 holding thread state; only selected requests emerge.]

[FIG. 8: requests pass through stacked filters: absolute DRAM scheduling (using DRAM state), thread QOS scheduling, cost-function DRAM scheduling sorted by cost (830), and prioritization by arrival time (840, using arrival history), yielding the winning request 860.]


METHOD AND APPARATUS FOR SCHEDULING OF REQUESTS TO DYNAMIC RANDOM ACCESS MEMORY DEVICE

FIELD OF THE INVENTION

The mechanism described herein applies to systems where multiple independent initiators are sharing a dynamic random access memory (DRAM) subsystem.

BACKGROUND

In systems that are built on a single chip it is not uncommon that there are several independent initiators (such as microprocessors, signal processors, etc.) accessing a dynamic random access memory (DRAM) subsystem that for cost, board area, and power reasons is shared among these initiators. The system may require different qualities of service (QOS) to be delivered for each of the initiators. Secondly, the memory ordering model presented to the initiators is important. Ideally, the initiators want to use a memory model that is as strongly ordered as possible. At the same time, the order in which DRAM requests are presented to the DRAM subsystem can have a dramatic effect on DRAM performance. Yet re-ordering of requests for thread QOS or DRAM efficiency reasons can compromise a strongly ordered memory model. What is required is a unified DRAM scheduling mechanism that presents a strongly ordered memory model, gives differential quality of service to different initiators, and keeps DRAM efficiency as high as possible.

The request stream from each different initiator can be described as a thread. If a DRAM scheduler does not re-order requests from the same thread, intra-thread request order is maintained, and the overall DRAM request order is simply an interleaving of the sequential per-thread request streams. This is the definition of Sequential Consistency, the strongest memory ordering model available for systems that include multiple initiator components. (For further discussion regarding Sequential Consistency, see Leslie Lamport, How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs, IEEE Transactions On Computers, Vol. C-28, No. 9, September 1979, pgs. 690-691.)

Existing systems either order the requests at a different point in the system than where the DRAM efficiency scheduling occurs (if any is done), and/or the systems re-order requests within a processing thread. For example, requests may be carried from the initiators to the DRAM Controller via a standard computer bus. Request order (between threads and within threads) is established at the time of access to the computer bus, and is not allowed to be changed by the DRAM controller. In this case, DRAM scheduling for efficiency is more constrained than it needs to be, resulting in lower DRAM efficiency. In a different example, each initiator may have its own individual interface with the DRAM Controller, allowing the DRAM controller to schedule requests while maintaining thread ordering. This kind of system has the potential of achieving sufficient results, but it is wasteful of wires to the DRAM controller. It is possible, in such a system, to reorder DRAM requests within a thread. While this may result in higher DRAM efficiency, it also considerably loosens the memory model, i.e. it no longer presents a memory model of Sequential Consistency. It is important to retain a strong memory model while at the same time allowing a reordering of memory requests to achieve a high DRAM efficiency and QOS guarantees.

SUMMARY OF THE INVENTION

The present invention provides for the scheduling of requests to one resource, such as a DRAM subsystem, from a plurality of initiators. Each initiating thread is provided different quality-of-service while resource utilization is kept high and a strong ordering model is maintained.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates one embodiment of the system of the present invention.
FIG. 2 is a simplified flow diagram illustrating one embodiment of combining thread scheduling and device scheduling.
FIG. 3 illustrates one embodiment of a DRAM and thread scheduler.
FIG. 4 is a simplified example illustrating the tradeoff of cost function scheduling.
FIG. 5 illustrates one embodiment of a cost function DRAM bus scheduler.
FIG. 6 is a flow diagram illustrating one embodiment of a cost function DRAM bus scheduling process.
FIG. 7 illustrates one embodiment of a scheduling component as a request filter.
FIG. 8 illustrates one embodiment of ordering of thread scheduling and device scheduling to achieve the desired results.

DETAILED DESCRIPTION

The mechanism described herein applies to systems where multiple independent initiators share a dynamic random access memory (DRAM) subsystem.

In one embodiment, the present invention allows different initiators to be given a pre-defined quality of service independent of one another while at the same time keeping DRAM efficiency as high as possible and presenting a strong memory ordering model to the initiators.

FIG. 1 shows a high-level block diagram of one embodiment of a DRAM scheduling system. Requests 10 from different initiators arrive over a multi-threaded interface 15. An initiator may be embodied as a device or a process. Requests 10 from different initiators are communicated across different threads that are identified by different thread identifiers ("thread IDs") at the interface. This allows requests to be split by thread (or initiator) into per-thread request queues, e.g. 20, 25, 30. Requests from these thread queues 20, 25, 30 are presented in parallel to the DRAM and thread scheduler block 35. The scheduler block 35 decides the order in which requests are presented to the DRAM Controller 40, which in turn is responsible for sending the requests to the actual DRAM subsystem 45. When responses 50 return from the DRAM controller 45, they are sent back to the initiators via the multi-threaded interface 15. The delivery of requests from the initiators was described using a multi-threaded interface and thread identifiers. An alternative embodiment uses individual single-threaded interfaces for each initiator.

The DRAM and Thread scheduler 35 acts as the synchronization point that establishes the order in which DRAM requests are processed. Even though requests can arrive over the multi-threaded interface in one order, the requests may be re-ordered by the scheduler block 35 in order to satisfy thread quality of service (QOS) guarantees, or in order to increase DRAM efficiency. Conversely, the DRAM Controller 40 block processes requests in order, so that the order


established by the scheduler block 35 is indeed the order in which requests are committed. However, if the scheduler block 35 does not re-order requests from the same thread, intra-thread request order is maintained, and the overall DRAM request order is simply an interleaving of the sequential per-thread request streams.

One embodiment of the process is illustrated by the simplified flow diagram of FIG. 2. At step 205, a preferred request order for QOS guarantees is identified or determined. The preferred order for processing requests for DRAM efficiency is determined at step 210. In performing steps 205 and 210 the constraints of the memory ordering model are taken into account. If the preferred DRAM efficiency order satisfies QOS guarantees, step 215, then a request is scheduled according to the DRAM efficiency order, step 220. If the DRAM efficiency order does not satisfy QOS guarantees, step 215, the next-best DRAM efficiency order is determined, step 225. This step is repeated until the DRAM efficiency order meets QOS guarantees.

The process illustrated by FIG. 2 is only one embodiment. Other embodiments are also contemplated. For example, in one embodiment, a request order is determined that satisfies QOS guarantees and is then modified to optimize DRAM efficiency.

FIG. 3 offers a more detailed look at one embodiment of the DRAM and Thread Scheduler of FIG. 1. The requests 320, 325, 330 from different threads are presented and sequenced to the DRAM controller 310. The scheduling decision for which request gets to proceed at any one time is derived using a combination of thread quality of service scheduling and DRAM scheduling.

The thread quality of service scheduler 340 keeps and uses thread state 350 to remember thread scheduling history and help it determine which thread should go next. For example, if threads are being guaranteed a certain amount of DRAM bandwidth, the thread scheduler 340 keeps track of which thread has used how much bandwidth and prioritizes threads accordingly. The DRAM scheduler 345, on the other hand, attempts to sequence requests from different threads so as to maximize DRAM performance. For example, the scheduler 345 might attempt to schedule requests that access the same DRAM page close to each other so as to increase the chance of getting DRAM page hits. The DRAM scheduler 345 uses and keeps state 355 on the DRAM and access history to help with its scheduling decisions.

The thread quality of service scheduler 340 and the DRAM scheduler 345 are optimized for different behavior and may come up with conflicting schedules. Outputs of the two schedulers 340, 345 have to be combined 360 or reconciled in order to achieve the promised thread quality of service while still achieving a high DRAM efficiency.

The DRAM scheduler 345 itself has to balance several different scheduling goals. In one embodiment, scheduling components can be categorized into two broad categories, referred to herein as absolute and cost-function scheduling.

Absolute scheduling refers to scheduling where a simple yes/no decision can be made about every individual request. An example is DRAM bank scheduling. Any given DRAM request has exactly one bank that it addresses. Either that bank is currently available to receive the request, or it is busy with other requests and there is no value in sending the request to DRAM at this time.

Cost-function scheduling is more subtle, in that there is no immediate yes/no answer to every request. At best it can be said that sending the request to DRAM at a certain time is more or less likely to yield a high DRAM efficiency.

An example of cost function scheduling is request scheduling based on the direction of a shared DRAM data bus. Typically, there is a cost associated with changing the DRAM data bus direction from read to write and vice versa. It is thus advantageous to collect requests that require the same data bus direction together rather than switching between every request. How many requests should be collected together depends on the expected request input pattern and a trade-off between efficiency and latency, an example of which is illustrated in FIG. 4. If the DRAM scheduling algorithm is set to switch frequently between directions, the expected efficiency is low because a lot of switches result in many wasted data bus cycles. On the other hand, the average waiting time (latency) of a request is low because it gets serviced as soon as it arrives.

If the DRAM scheduling algorithm is set to switch less frequently (i.e. to collect more requests of each direction together) the overall DRAM efficiency is likely to be higher but the average latency of requests is also higher. The best point for overall system performance is not easily determined and depends on the request pattern, the trade-off between latency and efficiency, and the cost of switching.

The example below uses bus direction as the basis for cost-function scheduling. However, it is contemplated that a variety of other criteria may be used to implement cost-function scheduling. Other examples of cost-function scheduling include deciding when to close one DRAM page and open another and deciding when to switch DRAM requests to use a different physical bank.

FIG. 5 illustrates one embodiment of a DRAM bus scheduler that is programmable so as to allow dynamic adjustment of the switch point for optimum performance. In one embodiment, the scheduler 505 keeps track of the last direction (read or write) of the data bus 510, and a count 515 of the number of requests that had that direction. A register 520 is added to hold the switch point information. In one embodiment, this register 520 can be written from software 525 while the system is running in order to dynamically configure the DRAM scheduler for optimum performance. For example, it may be desirable to update the switch point dynamically according to the application and/or by the application. In one embodiment, the switch point is empirically determined based upon past and possibly current performance.

As requests are presented on the different threads, the scheduler 505 looks at the current direction of the DRAM data bus, the count of requests that have already been sent, the configurable switch point, and the direction of incoming new requests. Before the count reaches the switch point, requests that have the same direction as the current DRAM data bus are preferred over those going in the opposite direction. Once the switch point is reached, requests to the opposite direction are preferred. If only requests from one direction are presented, there is no choice in which direction the next request will go. In the present embodiment, a count and compare function is used to determine the switch point. However, it is contemplated that other functions may be used. Furthermore, although the example herein applies the count and compare function to bus direction, all types of measures for the count may be used.

One embodiment of the process is illustrated by FIG. 6. At step 605, considering that at least one request is available, it is determined whether there are any requests for the current direction of the bus. If there are not, the bus direction is changed, step 610, the count resets, step 615, and the request is processed using the new direction of the bus, step 620. The count keeping track of the number of requests performed in the current bus direction is incremented, step 625.

If there are requests for the current direction of the bus, it is then checked to see if the count has reached the switch point, step 630. If the switch point has been reached then it is determined whether there are any requests for the opposite direction of the bus, step 635. If there are not, then the request for the current direction is processed, step 620, and the count incremented, step 625. In addition, if the count has not reached the switch point, step 630, then the process continues with the request for the current direction being processed and the count being incremented, steps 620 and 625.

It is desirable, in one embodiment, to combine thread quality of service scheduling and DRAM scheduling to achieve a scheduling result that retains the desired quality of service for each thread while maximizing DRAM efficiency. One method for combining the different scheduling components is to express them as one or more request filters, one of which is shown in FIG. 7. Per-thread requests 705 enter, and are selectively filtered, so that only a subset of the requests filters through, i.e. exits, the filter 710. The decision of which requests should be filtered out is made by the control unit 715 attached to the filter. The unit 715 bases its decision on the incoming requests and possibly some state of the unit 715. For example, for a cost function filter that decides to switch the direction of the DRAM data bus, the decision is based on the current direction of the bus, the number of requests that have already passed in that direction since the last switch, and the types of requests being presented from the different threads. The decision might be to continue with the same direction of the DRAM data bus, and so any requests that are for the opposite direction are filtered out.

Once the different scheduling components have been expressed as filters, the various filters can be stacked to combine the scheduling components. The order of stacking the filters determines the priority given to the different scheduling components.

FIG. 8 is a block diagram of one embodiment illustrating the ordering of the different portions of the two scheduling algorithms to achieve the desired results. Each of the blocks 810, 820, 830, 840 shown in FIG. 8 acts like a filter for requests entering 805 and emerging 860. For each filter, for example, 810, 820, 830, only requests that meet the criteria of that stage of scheduling are allowed to pass through. For example, DRAM bank scheduling 810 allows only requests to available banks to pass through and filters out those requests that do not meet the criteria. Thread quality of service scheduling 820 allows only threads that are in the desired priority groups to pass through. Data bus scheduling, an example of cost-function scheduling, 830 might preferentially allow only reads or writes to pass through to avoid data bus turnaround.

More particularly, in one embodiment, DRAM requests 805 from different threads enter and the absolute DRAM scheduling components 810 are exercised, so that requests that cannot be sent to DRAM are filtered out, and only requests that can be sent continue on to the thread scheduler 820. The thread scheduler 820 schedules requests using the quality of service requirements for each thread. The scheduler 820 filters out requests from threads that should not receive service at this time. Any remaining requests are passed on to the cost-function DRAM scheduler 830. Here, requests are removed according to cost-function scheduling. If there is more than one cost-function component to DRAM scheduling, the different components are ordered from highest switch cost to lowest. For example, if data bus turnaround costs 3 cycles and switching from one physical DRAM bank to another costs 1 cycle, then DRAM data bus scheduling is placed ahead of physical bank scheduling. If more than one request emerges from the bottom of the cost-function DRAM scheduler, they are priority ordered by arrival time. This last filter 840 prevents requests from getting starved within their thread priority group.

It is readily apparent that the above is just one implementation of a DRAM scheduling system. It is readily recognized that different filter types, having different thresholds and switch points, and/or different ordering of filters, can be implemented to achieve desired results. Furthermore, although represented in the drawings as separate filter elements, the filters may be implemented by a single logic processor or process that performs the stages of the process representative of the filtering functions described above. The invention has been described in conjunction with one embodiment. It is evident that numerous alternatives, modifications, variations and uses will be apparent to those skilled in the art in light of the foregoing description.

What is claimed is:

1. A process for scheduling requests to access a resource, said requests originating from at least one thread from at least one initiator, said process comprising combining scheduling of requests between threads and scheduling of requests of initiator access to the resource and processing at least one of, read and write, requests within each thread in the order that they are issued.

2. The process as set forth in claim 1, wherein combining comprises using a combination of thread quality of service (QOS) scheduling and resource scheduling.

3. The process as set forth in claim 2, wherein combining further comprises:
determining an order of requests to meet QOS guarantees;
determining an order of requests for resource efficiency; and
if the resource efficiency order satisfies QOS guarantees, and intra-thread order is maintained, scheduling a request according to a first resource efficiency order, else scheduling a request in accordance with a second resource efficiency order.

4. The process as set forth in claim 1, further comprising maintaining and using a thread scheduling history to at least in part determine scheduling of threads.

5. The process as set forth in claim 4, wherein thread scheduling history comprises thread bandwidth usage.

6. The process as set forth in claim 1, further comprising maintaining a state and access history on the device to at least in part determine scheduling of the resource.

7. The process as set forth in claim 1, wherein scheduling is determined by prioritizing threads according to bandwidth usage and sequencing requests from different threads so as to achieve a determined device performance.

8. The process as set forth in claim 1, wherein scheduling is selected from the group consisting of absolute and cost-function scheduling.

9. The process as set forth in claim 1, wherein the resource is a dynamic random access memory (DRAM) and scheduling is selected from the group consisting of deciding when to close one dynamic random access memory (DRAM) page and open another, deciding when to switch DRAM requests to use a different physical bank of DRAM, and deciding when to switch direction of a bus coupled to the DRAM.

10. A scheduling apparatus for scheduling access to a resource, comprising:

Page 88: Sonics, Inc. - Complaint for Patent Infringement

US 6,961,834 B2 7

an input coupled to receive at least one access request originating from at least one thread from at least one initiator;

8 (DRAM) and a cost function scheduling is selected from the group consisting of deciding when to close a DRAM page and open another, deciding when to switch DRAM requests to use a different physical bank of DRAM, and deciding logic to combine scheduling of requests between threads

and scheduling of initiator access to the resource and processing requests within each thread in the order that they are issued.

5 when to switch direction of a bus coupled to the DRAM.

11. The scheduling apparatus as set forth in claim 10, wherein the resource is selected from the group consisting of a process, apparatus and a dynamic random access memory 10

(DRAM). 12. The scheduling apparatus as set forth in claim 10,

wherein the logic utilizes a combination of thread quality of service (QOS) guarantees and resource cost-function sched­uling.

13. The scheduling apparatus as set forth in claim 10, further comprising a thread scheduling history, said logic using the thread history to at least in part determine sched­uling.

15

14. The scheduling apparatus as set forth in claim 13, 20

wherein the thread scheduling history comprises thread bandwidth usage.

15. The scheduling apparatus as set forth in claim 10, further comprising a state and access history used to at least in part determine scheduling of the resource.

16. The scheduling apparatus as set forth in claim 10, wherein scheduling of threads from at least one initiator is selected from the group consisting of absolute and cost function scheduling.

25

17. The scheduling apparatus as set forth in claim 10, 30

wherein the resource is dynamic random access memory

18. An apparatus, comprising:

means for scheduling requests to access a resource, wherein the requests originate from at least one thread of at least one initiator;

means for combining scheduling of requests between threads; and

means for scheduling of requests of initiator access to the resource and processing requests within each thread in the order that they are issued.

19. The apparatus of claim 18, further comprising: means for determining an order of requests to meet QOS guarantees; means for determining an order of requests for resource efficiency, wherein if the resource efficiency order satisfies QOS guarantees, and intra-thread order is maintained, then scheduling a request according to a first resource efficiency order, otherwise scheduling a request in accordance with a second resource efficiency order.

20. The apparatus of claim 18, further comprising:

means for deciding when to switch DRAM requests to use a different physical bank of DRAM.

* * * * *


Exhibit E


(12) United States Patent
Weber

(54) METHOD AND APPARATUS FOR SCHEDULING A RESOURCE TO MEET QUALITY-OF-SERVICE RESTRICTIONS

(75) Inventor: Wolf-Dietrich Weber, San Jose, CA (US)

(73) Assignee: Sonics, Inc., Mountain View, CA (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 33 days.

(21) Appl. No.: 10/963,271

(22) Filed: Oct. 11, 2004

(65) Prior Publication Data

US 2005/0086404 A1 Apr. 21, 2005

Related U.S. Application Data

(63) Continuation of application No. 09/977,602, filed on Oct. 12, 2001, now Pat. No. 6,804,738.

(51) Int. Cl. G06F 13/14 (2006.01)

(52) U.S. Cl. ........................ 710/244; 710/45; 710/117; 370/395.21; 370/395.41; 370/395.42

(58) Field of Classification Search ..................... None
See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS

4,688,188 A 8/1987 Washington
5,107,257 A 4/1992 Fukuda
5,218,456 A 6/1993 Stegbauer et al.
5,265,257 A 11/1993 Simcoe et al.

US007191273B2

(10) Patent No.: US 7,191,273 B2
(45) Date of Patent: Mar. 13, 2007

5,274,769 A 12/1993 Ishida
5,287,464 A 2/1994 Kumar et al.
5,363,484 A 11/1994 Desnoyers et al.

(Continued)

FOREIGN PATENT DOCUMENTS

EP 02 70 7854 11/2004

(Continued)

OTHER PUBLICATIONS

Lamport, Leslie; "How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs", IEEE Transactions on Computers, vol. C-28, No. 9, Sep. 1979, pp. 690-691.

(Continued)

Primary Examiner: Glenn A. Auve
(74) Attorney, Agent, or Firm: Blakely, Sokoloff, Taylor & Zafman, LLP

(57) ABSTRACT

The present invention is directed to a method and apparatus for scheduling a resource to meet quality of service guarantees. In one embodiment of three levels of priority, if a channel of a first priority level is within its bandwidth allocation, then a request is issued from that channel. If there are no requests in channels at the first priority level that are within the allocation, requests from channels at the second priority level that are within their bandwidth allocation are chosen. If there are no requests of this type, requests from channels at the third priority level or requests from channels at the first and second levels that are outside of their bandwidth allocation are issued. The system may be implemented using rate-based scheduling.

31 Claims, 3 Drawing Sheets


U.S. PATENT DOCUMENTS

5,379,379 A 1/1995 Schwartz et al.
5,469,473 A 11/1995 McClear et al.
5,530,901 A 6/1996 Nitta
5,546,546 A 8/1996 Bell et al.
5,557,754 A 9/1996 Stone et al.
5,634,006 A * 5/1997 Baugher et al. ..... 709/228
5,664,153 A 9/1997 Farrell
5,673,416 A 9/1997 Chee et al.
5,745,913 A 4/1998 Pattin et al.
5,748,629 A 5/1998 Caldara et al.
5,809,538 A 9/1998 Pollmann et al.
5,917,804 A 6/1999 Shah et al.
5,926,649 A 7/1999 Ma et al.
5,948,089 A 9/1999 Wingard et al.
5,982,780 A 11/1999 Bohm et al.
5,996,037 A 11/1999 Emnett
6,023,720 A 2/2000 Aref et al.
6,092,137 A 7/2000 Huang et al.
6,104,690 A 8/2000 Feldman et al.
6,119,183 A 9/2000 Briel et al.
6,122,690 A 9/2000 Nannetti et al.
6,141,713 A 10/2000 Kang
6,167,445 A 12/2000 Gai et al.
6,199,131 B1 3/2001 Melo et al.
6,212,611 B1 4/2001 Nizar et al.
6,253,269 B1 6/2001 Cranston et al.
6,266,718 B1 7/2001 Klein
6,330,225 B1 12/2001 Weber et al.
6,335,932 B2 1/2002 Kadambi et al.
6,363,445 B1 3/2002 Jeddeloh
6,430,156 B1 * 8/2002 Park et al. ..... 370/232
6,499,090 B1 12/2002 Hill et al.
6,510,497 B1 1/2003 Strongin et al.
6,530,007 B2 3/2003 Olarig et al.
6,578,117 B2 6/2003 Weber
6,628,609 B2 9/2003 Chapman et al.
6,636,482 B2 10/2003 Cloonan et al.
6,721,325 B1 * 4/2004 Duckering et al. ..... 370/395.4
6,804,738 B2 10/2004 Weber
6,804,757 B2 10/2004 Weber
6,862,265 B1 * 3/2005 Appala et al. ..... 370/235
6,961,834 B2 11/2005 Weber
6,976,106 B2 12/2005 Tomlinson et al.

2001/0026535 A1 * 10/2001 Amou et al. ..... 370/235
2002/0129173 A1 9/2002 Weber et al.
2002/0138687 A1 9/2002 Yang et al.
2002/0174227 A1 11/2002 Hartsell et al.
2003/0074504 A1 4/2003 Weber
2003/0074519 A1 4/2003 Weber
2003/0079080 A1 4/2003 DeMoney

FOREIGN PATENT DOCUMENTS

EP 02 72 1116 5/2006
WO WO 00/29956 A 5/2000
WO WO 01/75620 A 10/2001

OTHER PUBLICATIONS

Rixner, Scott et al., "Memory Access Scheduling", to appear in ISCA-27 (2000), Computer Systems Laboratory, Stanford University, Stanford, CA 94305, pp. 1-11.
Search Report for PCT/US02/05438, mailed May 24, 2002, 1 page.
Search Report for PCT/US02/05288, mailed May 20, 2002, 1 page.
Search Report for PCT/US02/05439, mailed Jun. 26, 2002, 1 page.
Search Report for PCT/US02/05287, mailed Jul. 11, 2002, 1 page.
Rixner et al., "A Bandwidth-Efficient Architecture for Media Processing", Micro-31, 1998, pp. 1-11.
Drew Wingard, "MicroNetworks-Based Integration for SOCs." In Design Automation Conference, 2001, pp. 673-677, 5 pages.
Ron Ho, et al., "The Future of Wires." In Proceedings of the IEEE, vol. 89, No. 4, pp. 490-504, Apr. 2001, 15 pages.
William J. Dally, et al., "Route Packets, Not Wires: On-Chip Interconnection Networks." In Design Automation Conference, pp. 684-689, Jun. 2001, 6 pages.
Jim Kurose, "Open Issues and Challenges in Providing Quality of Service Guarantees in High-Speed Networks", ACM Computer Communication Review, vol. 23, No. 1, pp. 6-15, Jan. 1993, 10 pages.
Hui Zhang, "Service Disciplines for Guaranteed Performance Service in Packet-Switching Networks", Proceedings of the IEEE, vol. 83, No. 10, Oct. 1995, pp. 1374-1396, 23 pages.
Dimitrios Stiliadis, et al., "Latency-Rate Servers: A General Model for Analysis of Traffic Scheduling Algorithms", In Proceedings of IEEE INFOCOM 96, Apr. 1996, pp. 111-119, 9 pages.
K. Lahiri, et al., "LOTTERYBUS: A New High-Performance Communication Architecture for System-on-Chip Designs". In Proceedings of Design Automation Conference 2003, Las Vegas, Jun. 2001, pp. 15-20, 6 pages.
William J. Dally, "Virtual-channel Flow Control", In Proceedings of the 17th Int. Symp. on Computer Architecture, ACM SIGARCH, May 1990, vol. 18, No. 2, pp. 60-68, 9 pages.
Drew Wingard, et al., "Integration Architecture for System-on-a-Chip Design", In Proc. of the 1998 Custom Integrated Circuits Conference, May 1998, pp. 85-88, 4 pages.
Weber, Wolf-Dietrich, et al., "Enabling Reuse via an IP Core-centric Communications Protocol: Open Core Protocol", In Proceedings of the IP 2000 System-on-Chip Conference, Mar. 2000, pp. 1-5.
Ivo Adan and Jacques Resing, "Queueing Theory", Eindhoven University of Technology, Feb. 14, 2001, pp. 23-27.
PCT International Search Report for PCT/US2004/035863, Int'l filing Oct. 27, 2004, mailed Aug. 9, 2005, 9 pages.
Wingard, Drew, "Sonics SOC Integration Architecture," Sonics, Inc., 1500 presentation, 1999, 25 pages, www.OCP-IP.org.
Kamas, Alan, "The SystemC OCP Models; An Overview of the SystemC Models for the Open Core Protocol," NASCUG, 2004, 30 pages.
Wingard, Drew, "Socket-Based Design Using Decoupled Interconnects," Sonics, Inc., 30 pages, downloaded Jun. 14, 2004, www.OCP-IP.org.
Haverinen, Anssi, "SystemC™ Based SoC Communication Modeling for the OCP Protocol," White Paper, Oct. 2002, V1.0, 39 pages.
Wingard, Drew, "Tiles-An Architectural Abstraction for Platform-Based Design," Perspective article 2, EDA Vision, Jun. 2002, 3 pages, www.edavision.com.
Weber, Wolf-Dietrich, "Efficient Shared DRAM Subsystems for SOCs," Sonics, Inc., 2001, 6 pages.
"Open Core Protocol Specification," OCP International Partnership, Release 1.0, 2001.
Wingard, Drew, PhD., "Integrating Semiconductor IP Using µNetworks," ASIC Design, Jul. 2000, electronic engineering, 3 pages.
Wingard, Drew, "Tiles: the Heterogeneous Processing Abstraction for MPSoC," Sonics, Inc., Smart Interconnect IP, 2004, 35 pages, www.OCP-IP.org.
Chou, Joe, "System-Level Design Using OCP Based Transaction-Level Models," presentation, Denali MemCom Taiwan 2005, OCP International Partnership, 23 pages.
Wingard, Drew, "A Non-Blocking Intelligent Interconnect for AMBA-Connected SoCs," Sonics, Inc., CoWare Arm Developer's Conference, Oct. 6, 2005, 39 pages.
Weber, Wolf-Dietrich, et al., "A Quality-of-Service Mechanism for Interconnection Networks in System-on-Chips," 1530-1591/05, 2005 IEEE, 6 pages.
Casini, Phil, "Measuring the Value of Third Party Interconnects," Sonics, Inc., White Paper, 2005, 11 pages, www.sonicsinc.com.
European Search Report for International Application No. EP 02 71 3653, mailed on May 29, 2006, 3 pages.

* cited by examiner

U.S. Patent    Mar. 13, 2007    Sheet 1 of 3    US 7,191,273 B2

[FIG. 1: block diagram of the arbitration system 40. Channels are ordered from top to bottom: priority channels (within allocation); allocated-bandwidth channels (within allocation); best-effort channels; and priority and allocated-bandwidth channels (outside allocation), feeding the resource.]

[FIG. 2: priority order of service levels 205, 210, 215, 220; the bottom two levels may optionally be combined into one level.]

U.S. Patent    Mar. 13, 2007    Sheet 2 of 3    US 7,191,273 B2

[FIG. 3: flow diagram of the arbitration process. Each priority level is examined in turn; if an eligible request exists at a level, a request is issued from one channel at that level.]

U.S. Patent    Mar. 13, 2007    Sheet 3 of 3    US 7,191,273 B2

[FIG. 4: requests 412, 414, 416, 418 serviced at varying times within one scheduling period 410; subsequent periods 420, 430 follow.]

[FIG. 5: rate-based scheduling counters for one channel. A rate counter, incremented every cycle, increments the allocation counter on overflow; the allocation counter is decremented on each request serviced, and its sign (above or below allocation) is the primary output.]

METHOD AND APPARATUS FOR SCHEDULING A RESOURCE TO MEET QUALITY-OF-SERVICE RESTRICTIONS

This application is a continuation application of and claims the benefit of U.S. application Ser. No. 09/977,602 filed Oct. 12, 2001, which will issue as U.S. Pat. No. 6,804,738 on Oct. 12, 2004.

FIELD OF THE INVENTION

The field of the invention relates to a system where access to a resource is scheduled to provide a particular quality-of-service to two or more requestors competing for access to that resource.

BACKGROUND

In computer systems it is common that a given resource (such as a system bus, a memory bank, etc.) is shared between several competing requesting devices or processes ("requesters") that would like to make use of the resource. Access to that resource therefore has to be arbitrated, in order to determine which requester can access the resource when there are concurrent and conflicting requests to the resource. It is desirable to be able to specify different quality-of-service (QOS) guarantees for different requestors in order for the system to operate properly. Examples of QOS guarantees include data bandwidth and latency. For example, it may be desirable to allow a processor to have very high-priority and therefore low-latency access to a memory system. Another example is that one might want a video system to have a certain reserved bandwidth on a system bus so that the video screen can be updated as required at a fixed frame rate.

Existing arbitration schemes that aim to provide QOS guarantees include fixed-priority arbitration and time division multiplexing. In fixed-priority arbitration each requestor is assigned a fixed priority and requesters are serviced in priority order. In time division multiplexing, each requestor is pre-allocated a certain set of fixed access periods during which it can access the resource. While these arbitration schemes have their value in certain systems, they fall short of providing QOS guarantees when there is a mix of requesters with different QOS requirements and perhaps unpredictable request arrival times. For example, it is not possible to give any kind of bandwidth guarantee to multiple different requestors if fixed-priority arbitration is used unless the exact request pattern of each initiator is known a priori. Time division multiplexing is inefficient when the arrival times of requests are not deterministic, or when the requests require differing amounts of service time from the resource depending on the type of request or the recent history of other requests.

What is desired is a resource scheduling scheme that can provide different QOS guarantees to different requesters and further can efficiently handle non-deterministic arrival and service times.

SUMMARY OF THE INVENTION

The present invention is directed to a method and apparatus for scheduling a resource to meet quality of service guarantees. In one embodiment of three levels of priority, if a channel of a first priority level is within its bandwidth allocation, then a request is issued from that channel. If there are no requests in channels at the first priority level that are within the allocation, requests from channels at the second priority level that are within their bandwidth allocation are chosen. If there are no requests of this type, requests from channels at the third priority level or requests from channels at the first and second levels that are outside of their bandwidth allocation are issued. The system may be implemented using rate-based scheduling.

BRIEF DESCRIPTION OF THE DRAWINGS

The objects, features and advantages of the invention will be apparent from the following detailed description in which:

FIG. 1 is a simplified diagram of one embodiment of an arbitration system that operates in accordance with the technology of the present invention.
FIG. 2 illustrates one embodiment of priority order.
FIG. 3 is a simplified flow diagram illustrating one embodiment of an arbiter.
FIG. 4 illustrates an embodiment of rate-based scheduling.
FIG. 5 is a simplified diagram illustrating one embodiment of rate-based scheduling in accordance with the teachings of the present invention.

DETAILED DESCRIPTION

FIG. 1 shows one embodiment of an arbitration system. Requests 10 arrive from different requesting devices or processes and are stored in a channel, e.g., channels 15, 20, 25 that are contemplated to be logically or physically implemented. In this embodiment, each channel accommodates requests from one requestor. Thus, requests from device A are stored in channel 0 (15), requests from device B are stored in channel 1 (20), etc. In the present embodiment, it is assumed here that requests within each channel are serviced in the order they are received in each channel, but this is not a necessary requirement for the invention described herein.

The arbitration unit 30 is responsible for scheduling access by each channel to the resource 35. A resource can be a variety of different apparatuses or processes, including memory and the like. The arbitration unit 30 is configured with the desired quality-of-service (QOS) guarantees for each of the channels using the QOS configuration unit 40. QOS may include a variety of criteria including one or more minimum, maximum or ranges of performance criteria for a particular dataflow, process or device. The unit 30 also keeps track of recent scheduling decisions using the scheduling history unit 45. Although the QOS unit 40 and scheduling history unit 45 are illustrated as separate units, it is readily apparent that the functionality of one or both of the quality of service configuration unit 40 and scheduling history unit 45 can be configured to be part of the arbitration unit 30 or joined into a single unit coupled to the arbitration unit 30. Further, it is contemplated that the one or more of the units 30, 40, 45 may be physically part of the resource 35.

If more than one channel 15, 20, 25 has a request waiting for service, the arbitration unit 30 selects the channel that can proceed to service using the scheduling history and desired QOS information retrieved respectively from the scheduling history unit 45 and quality of service configuration unit 40. The next request of the selected channel proceeds to access the resource and exits the system.

In one embodiment, the arbiter 30 uses the scheduling history 45 to determine if certain QOS guarantees can be met. For example, it is possible that the amount of time

needed to access the resource depends on the relative timing of access of the resource or the type of request. For example, when accessing a bi-directional system bus, it may take longer for a write request to access the bus if it has recently been accessed with a read request, because the bus direction may need to be turned around first. This information may be determined from the scheduling history 45 and in turn affects the scheduling history 45. As noted above, the amount of time needed to access the resource may also depend on the type of request. For example, when accessing a dynamic random access memory (DRAM) memory system, a request to an open DRAM page might take much less time than a request to a closed DRAM page.

In one embodiment, the different QOS modes used for scheduling may include priority service, allocated-bandwidth service and best-effort service. Each channel is assigned one QOS mode. For a channel to receive priority service or allocated-bandwidth service, it must be given a bandwidth allocation. Priority service provides bandwidth guarantees up to the allocated bandwidth, plus minimum latency service. Allocated-bandwidth service provides only bandwidth guarantees up to the allocated bandwidth. Best-effort service provides no QOS guarantees, and may actually receive no service at all. Additional quality-of-service modes are possible.

Further arbitration, e.g., using the scheduling history, may be used to determine selection of one of a plurality of pending requests at the same level of QOS. For example, if the scheduling history indicates the resource, e.g. a bus, has been operating in one direction, the arbiter may grant priority to requests arguing operation in that same direction.

In the present embodiment, each of the channels 15, 20, 25 are allocated one QOS mode and this information is placed in the QOS configuration unit 40. In order to allocate bandwidth to different channels, it is important to calculate the overall bandwidth available in the resource that is being accessed. This total bandwidth may be relatively easy to calculate, or it may depend on the request stream itself and must therefore be estimated using a particular expected request stream for a particular system. For example, when estimating the total available bandwidth of a DRAM system, it may be necessary to estimate the expected fraction of page hits and page misses in the request stream. When it is not possible to make a precise estimate of the total available bandwidth, a conservative estimate should be made that can be achieved under all or substantially all foreseeable conditions in the given system.

In the present embodiment, priority channels and allocated-bandwidth channels are all allocated a certain amount of bandwidth. In order to meet the QOS guarantees under all conditions, no more than the total available bandwidth of the resource should be allocated to these channels.

Channels using a priority QOS mode or allocated-bandwidth mode receive a higher QOS than channels that use a best-effort mode, but only while the channels continue to operate within their respective bandwidth allocation. It is possible for a channel to request more than its allocated bandwidth, but in this case the QOS may degrade to below or equal best-effort channels.

FIG. 2 illustrates one example of the priority order in which channels get access to a resource. The top level 205 is reserved for priority channels that are within their bandwidth allocation. If there are any channels with requests in this category, they are provided access (serviced) as soon as possible, thus achieving low-latency access to the resource. The next lower level 210 is for allocated-bandwidth channels that are within their bandwidth allocation. Thus, if there are no eligible priority requests, allocated-bandwidth requests are serviced. Best-effort channels and priority or allocated-bandwidth channels that are outside of their allocated bandwidth are serviced with the lowest priority. These two groups can either be combined or serviced as two separate priorities 215, 220 as shown in FIG. 2.

Using this scheduling method, allocated bandwidth and priority channels are substantially guaranteed to receive their allocated bandwidth. Amongst the two, priority channels are serviced with a higher priority, so these channels experience a lower access latency to the resource. If and when the priority and allocated-bandwidth channels are not using up the total available bandwidth of the resource, best-effort and other channels that are outside of their allocation can make use of the resource, thus ensuring that the resource does not sit idle while there are requests waiting.

FIG. 3 illustrates one embodiment of an arbitration process. For purposes of discussion, it is assumed that one channel is assigned to each level. However, it is contemplated that multiple channels can be operative at the same level. At step 300, if there are pending requests to be serviced by a particular resource, and resource bandwidth is available, step 305, a first level, for example, the level with the highest priority of service is examined, step 310. At step 315, if any requests from channels at the first level are within their bandwidth allocation, a request is issued from one channel of the first level, step 320. If requests from channels at the first level are not within allocation, requests from channels at the next level, which in one embodiment are channels at a level of a next-lower priority, are examined, step 325. If the channels at the level are within allocation, or alternately are not assigned an allocation bandwidth, step 330, a request is issued from a channel of that level, step 335. This process continues, step 340, for each level of channels.

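The level-by-level selection described for FIG. 2 and FIG. 3 can be sketched as follows. This is a hypothetical illustration, not the patent's implementation; the `Channel` fields and the `arbitrate` function are assumed names, and the per-channel `within_allocation` flag is presumed to be maintained elsewhere (e.g., by the credit counters discussed below FIG. 5):

```python
# Hypothetical sketch of the priority-order arbitration of FIGS. 2 and 3:
# priority channels within their allocation are considered first, then
# allocated-bandwidth channels within their allocation, then everything
# else (best-effort channels and channels outside their allocation).
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Channel:
    mode: str                       # "priority", "allocated", or "best_effort"
    within_allocation: bool = True  # tracked elsewhere, e.g. by credit counters
    requests: deque = field(default_factory=deque)

def arbitrate(channels):
    """Return the channel that may issue its next request, or None."""
    levels = [
        lambda c: c.mode == "priority" and c.within_allocation,
        lambda c: c.mode == "allocated" and c.within_allocation,
        lambda c: True,  # lowest level: best-effort and over-allocation channels
    ]
    for eligible in levels:
        for channel in channels:
            if channel.requests and eligible(channel):
                return channel
    return None
```

A priority channel that exceeds its allocation falls through to the lowest level, matching the text's point that an over-allocation channel may degrade to or below best-effort service.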
Utilizing the scheduling system of the present invention, the system can determine whether a given channel is operating within or beyond its allocated bandwidth. One way to implement this mechanism is described by the embodiment of FIG. 4. As shown in FIG. 4, time is divided into equal-sized scheduling periods, for example, periods 410, 420, 430. Bandwidth is allocated on the basis of a fixed number of requests per scheduling period. The scheduling unit decides to schedule those requests at any suitable time during the scheduling period. For purposes of discussion, this scheduling method is referred to as "rate-based" scheduling. By allowing the scheduling unit to schedule requests at any time during the scheduling period it does not rely on a known request arrival time. Furthermore, the scheduling unit is able to schedule requests so as to maximize the efficiency of the resource. In the example of FIG. 4, four requests 412, 414, 416, 418 are always being scheduled in each scheduling period, but the exact time that each request is processed by the resource varies from scheduling period, e.g., period 420, to scheduling period, e.g., period 430.

One embodiment of rate-based scheduling is shown in FIG. 5. The advantage of this embodiment is that it requires very little state: just two counters per channel. The mechanism shown is for one channel and is sufficient to determine whether the given channel is above or below its bandwidth allocation. Multiple versions of this can be implemented to support multiple channels. In addition, it is contemplated that the functionality described can be implemented in a variety of ways.

A rate counter 510 is incremented at some small periodic interval, such as once every cycle. In the present embodiment, the rate counter 510 is configured to have a maximum value that is based on the allocated bandwidth of the channel. For example, if the bandwidth allocated to a particular channel is ten requests during each 100-cycle scheduling period, then the rate counter would be set up with a maximum value of 10.

Once the rate counter 510 reaches its maximum value, it causes the allocation counter 515 to be incremented, thus signaling that there is one more request "credit" available for that channel. The rate counter 510 resets and begins counting again. In one embodiment the rate counter is implemented as a simple register or location in memory with associated logic to test the value of the counter. Alternately, the overflow bit of the counter may be used to increment the allocation counter 515 and reset the rate counter 510.

Each time a request is sent from that channel to the resource, the allocation counter is decremented, thus removing a credit. As long as the allocation count is positive, the channel is operating within its bandwidth allocation.

In one embodiment, the allocation count does not go beyond the number of requests allocated per scheduling period (either positive or negative). Saturation logic is included to insure that the allocation count does not exceed a specified saturation value. For example, a register or memory corresponding to counter 515 may include control logic that would not change the value in the counter beyond a positive saturation value or negative saturation value. This enables the bandwidth use history to fade with time, allowing a channel to use more than its allocation during certain periods when bandwidth is available, while still maintaining the bandwidth guarantees when bandwidth becomes tight again.

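The two-counter mechanism of FIG. 5 (rate counter, allocation counter, and saturation) can be sketched as below. The class and method names are illustrative assumptions, not taken from the patent; the example numbers follow the text's case of ten requests per 100-cycle period (so a credit accrues every 10 cycles):

```python
# Hypothetical sketch of the per-channel mechanism of FIG. 5: a rate
# counter incremented every cycle rolls over at a maximum derived from
# the channel's allocation and grants one request "credit" to the
# allocation counter; servicing a request removes a credit. The
# allocation counter saturates in both directions so that bandwidth
# history fades with time.
class RateBasedMeter:
    def __init__(self, cycles_per_credit, saturation):
        self.cycles_per_credit = cycles_per_credit  # e.g. 100 cycles / 10 requests = 10
        self.saturation = saturation                # clamp on the credit count
        self.rate_count = 0
        self.allocation_count = 0

    def tick(self):
        """Advance one cycle; on rollover, add a credit (with saturation)."""
        self.rate_count += 1
        if self.rate_count >= self.cycles_per_credit:
            self.rate_count = 0
            self.allocation_count = min(self.allocation_count + 1, self.saturation)

    def serviced(self):
        """A request from this channel was sent to the resource."""
        self.allocation_count = max(self.allocation_count - 1, -self.saturation)

    def within_allocation(self):
        # The channel is within its allocation while the credit count is positive.
        return self.allocation_count > 0
```

The clamp in both directions models the saturation logic: an idle channel can bank at most `saturation` credits, and an over-consuming channel's deficit is bounded, so its history "fades" once bandwidth is available again.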
While the above-described scheduling method is suitable for all kinds of systems that have multiple requesters competing for a shared resource, it is particularly well suited for shared dynamic random access memory (DRAM) memory systems. DRAM systems are especially difficult to schedule, because the service time of each request depends on the request type (e.g., read or write, burst size, etc.) and the request history which determines whether a particular request hits a page that is open or whether a page must first be opened before it can be accessed. Given a conservative estimate of the bandwidth that can be achieved with a certain set of request streams from different initiating (requester) devices or processes, the described scheduling method can guarantee different QOS to different requestors while achieving a very high DRAM efficiency.

The invention has been described in conjunction with the preferred embodiment. It is evident that numerous alternatives, modifications, variations and uses will be apparent to those skilled in the art in light of the foregoing description.

What is claimed is:

1. A method, comprising: scheduling access to a resource to meet quality of service guarantees for requests within an apparatus; storing a first request in a first channel, wherein the first channel is assigned with a first priority level; keeping track of bandwidth usage from the first channel; demoting a priority level of the first channel based on exceeding an allotted amount of bandwidth usage from the first channel in a specified period of time; scheduling the first channel to issue the first request to a resource to meet quality of service guarantees for requests within the apparatus; and keeping track of the first channel's bandwidth usage history by incrementing a rate counter every unit cycle and by incrementing an allocation counter every predetermined number of rate count and decrementing the allocation counter each time a request is received from the first channel.

2. A method, comprising: scheduling access to a resource to meet quality of service guarantees for requests; storing a first request in a first channel that is assigned with a first priority level; keeping track of scheduling history from the first channel; demoting a priority level of the first channel based on exceeding a tracked feature of scheduling history associated with the first channel in a specified period of time, wherein the tracked feature of scheduling history associated with the channel is a type of request within a request stream; and scheduling the first channel to issue the first request to a resource to meet quality of service guarantees for requests.

3. The method of claim 2, further comprising: keeping track of bandwidth usage from the first channel; and demoting a priority level of the first channel based on exceeding an allotted amount of bandwidth usage from the first channel in a specified period of time.

4. The method of claim 3, further comprising: continuing to grant access to the resource to requests from the first channel, if the first channel is at an assigned highest priority and is within its allotted amount of bandwidth usage regardless of whether requests are waiting for access in lower priority channels.

5. The method of claim 3, further comprising: granting access to the resource to a second request from a second channel at a second priority level that is within its allotted amount of bandwidth usage if there are no requests in channels at the first priority level that are within their allotted amount of bandwidth usage.

6. The method of claim 3, further comprising: granting access to the resource to a third request from a third channel at a third priority level if there are no requests in channels at the first priority level or a second priority level that are within their allotted amount of bandwidth usage.

7. The method of claim 3, further comprising: promoting the priority level of the first channel when the first channel is back within its allotted amount of bandwidth usage.

8. The method of claim 3, further comprising: using scheduling history to decide whether the first channel is within its allocated amount of bandwidth usage.

9. The method of claim 3, further comprising: using rate-based scheduling to decide whether the first channel is within its allocated amount of bandwidth usage.

10. A method, comprising: scheduling access to a resource to meet quality of service guarantees for requests within an apparatus; storing a first request in a first channel that is assigned with a first priority level; keeping track of scheduling history from the first channel; demoting a priority level of the first channel based on exceeding a tracked feature of scheduling history associated with the first channel in a specified period of time; and scheduling the first channel to issue the first request to a resource to meet quality of service guarantees for requests within the apparatus, wherein the tracked feature of scheduling history associated with the channel is an amount of time needed to access the resource relative to a current timing of accessing the resource.

Page 98: Sonics, Inc. - Complaint for Patent Infringement

US 7,191,273 B2

11. The method of claim 10, further comprising: keeping track of bandwidth usage from the first channel; and demoting a priority level of the first channel based on exceeding an allotted amount of bandwidth usage from the first channel in a specified period of time.

12. The method of claim 11, further comprising: continuing to grant access to the resource to requests from the first channel, if the first channel is at an assigned highest priority and is within its allotted amount of bandwidth usage regardless of whether requests are waiting for access in lower priority channels.

13. The method of claim 11, further comprising: granting access to the resource to a second request from a second channel at a second priority level that is within its allotted amount of bandwidth usage if there are no requests in channels at the first priority level that are within their allotted amount of bandwidth usage.

14. The method of claim 11, further comprising: granting access to the resource to a third request from a third channel at a third priority level if there are no requests in channels at the first priority level or a second priority level that are within their allotted amount of bandwidth usage.

15. The method of claim 11, further comprising: promoting the priority level of the first channel when the first channel is back within its allotted amount of bandwidth usage.

16. The method of claim 11, further comprising: using scheduling history to decide whether the first channel is within its allocated amount of bandwidth usage.

17. The method of claim 11, further comprising: using rate-based scheduling to decide whether the first channel is within its allocated amount of bandwidth usage.

18. An apparatus, comprising: means for scheduling access to a resource to meet quality of service guarantees for requests within the apparatus; means for storing a first request in a first channel, wherein the first channel is assigned with a first priority level; means for keeping track of bandwidth usage from the first channel; means for demoting a priority level of the first channel based on exceeding an allotted amount of bandwidth usage from the first channel in a specified period of time; means for scheduling the first channel to issue the first request to a resource to meet quality of service guarantees for requests within the apparatus; and means for keeping track of the first channel's bandwidth usage history by incrementing a rate counter every unit cycle and by incrementing an allocation counter every predetermined number of rate count and decrementing the allocation counter each time a request is received from the first channel.

19. An apparatus, comprising: means for scheduling access to a resource to meet quality of service guarantees for requests; a first channel that is assigned with a first priority level to store a first request; means for keeping track of bandwidth usage from the first channel; means for demoting a priority level of the first channel based on exceeding an allotted amount of bandwidth usage from the first channel in a specified period of time; means for scheduling the first channel to issue the first request to a resource to meet quality of service guarantees for requests; an allocation counter; and a rate counter to keep track of the first channel's bandwidth usage history by incrementing the rate counter every unit cycle and by incrementing the allocation counter every predetermined number of rate count and decrementing the allocation counter each time a request is received from the first channel.

20. An apparatus, comprising: means for scheduling access to a resource to meet quality of service guarantees for requests within the apparatus; means for storing a first request in a first channel, wherein the first channel is assigned with a first priority level; means for keeping track of scheduling history from the first channel; means for demoting a priority level of the first channel based on exceeding a tracked feature of scheduling history associated with the first channel in a specified period of time; and means for scheduling the first channel to issue the first request to a resource to meet quality of service guarantees for requests within the apparatus, wherein the tracked feature of scheduling history associated with the channel is a type of request within a request stream.

21. The method of claim 20, further comprising: keeping track of bandwidth usage from the first channel; and demoting a priority level of the first channel based on exceeding an allotted amount of bandwidth usage from the first channel in a specified period of time.

22. The method of claim 21, further comprising: continuing to grant access to the resource to requests from the first channel, if the first channel is at an assigned highest priority and is within its allotted amount of bandwidth usage regardless of whether requests are waiting for access in lower priority channels.

23. The method of claim 21, further comprising: granting access to the resource to a second request from a second channel at a second priority level that is within its allotted amount of bandwidth usage if there are no requests in channels at the first priority level that are within their allotted amount of bandwidth usage.

24. The method of claim 21, further comprising: granting access to the resource to a third request from a third channel at a third priority level if there are no requests in channels at the first priority level or a second priority level that are within their allotted amount of bandwidth usage.

25. The method of claim 21, further comprising: promoting the priority level of the first channel when the first channel is back within its allotted amount of bandwidth usage.

26. The method of claim 21, further comprising: using scheduling history to decide whether the first channel is within its allocated amount of bandwidth usage.

27. The method of claim 21, further comprising: using rate-based scheduling to decide whether the first channel is within its allocated amount of bandwidth usage.


US 7,191,273 B2

28. An apparatus, comprising: means for scheduling access to a resource to meet quality of service guarantees for requests; a first channel that is assigned with a first priority level to store a first request; means for keeping track of scheduling history from the first channel; means for demoting a priority level of the first channel based on exceeding a tracked feature of scheduling history associated with the first channel in a specified period of time, wherein the tracked feature of scheduling history associated with the channel is an amount of time needed to access the resource relative to a current timing of accessing the resource; and means for scheduling the first channel to issue the first request to a resource to meet quality of service guarantees for request.

29. An apparatus, comprising: an arbiter configured to schedule access to a resource for requests from a plurality of channels within the apparatus, where one or more of the channels have an assigned priority level and convey to the arbiter requests to access the resource, the arbiter configured to determine if the resource is available to service requests within the apparatus, the arbiter configured to demote a priority level of a first channel based on exceeding a tracked feature of scheduling history associated with the first channel in a specified period of time, the arbiter configured with a quality-of-service (QOS) guarantee for requests from the first channel, the arbiter configured to schedule the first channel to issue a first request to the resource to meet the quality of service guarantee for the first request, wherein the tracked feature of scheduling history associated with the channel is an amount of time needed to access the resource relative to a current timing of accessing the resource.

30. The apparatus of claim 29, wherein the arbiter is configured to continuously grant access to the resource to requests from the first channel, if the first channel is at an assigned highest priority and is within its allotted amount of bandwidth usage regardless of whether requests are waiting for access in lower priority channels.

31. The apparatus of claim 29, wherein the arbiter is configured to promote the priority level of the first channel when the first channel is back within its allotted amount of bandwidth usage.

* * * * *

Exhibit F

(12) United States Patent Ebert et al.

(54) METHOD AND APPARATUS FOR DECOMPOSING AND VERIFYING CONFIGURABLE HARDWARE

(75) Inventors: Jeffrey Allen Ebert, Half Moon Bay, CA (US); Ravi Venugopalan, Santa Clara, CA (US); Scott Carlton Evans, Santa Clara, CA (US)

(73) Assignee: Sonics, Inc., Mountain View, CA (US)

( *) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 2 days.

This patent is subject to a terminal dis­claimer.

(21) Appl. No.: 10/293,734

(22) Filed: Nov. 12, 2002

(65) Prior Publication Data

US 2004/0093186 A1 May 13, 2004

(51) Int. Cl.7 ................................................. G06F 11/30 (52) U.S. Cl. ........................... 702/182; 702/57; 702/85;

702/123; 714/733; 714/738; 716/4; 716/5; 716/7; 716/12; 703/23; 324/763; 324/765

(58) Field of Search ............................... 702/57-59, 85, 702/117-123, 182, 189; 714/733, 738; 716/4,

5, 7, 12; 703/23, 25; 324/763, 765

(56) References Cited

U.S. PATENT DOCUMENTS

5,801,956 A * 9/1998 Kawamura et al. ............ 716/4
2002/0091979 A1 * 7/2002 Cooke et al. ............... 714/733
2002/0161568 A1 * 10/2002 Sample et al. ................ 703/25
2002/0171449 A1 * 11/2002 Shimizu et al. ............. 324/765

US006816814B2

(10) Patent No.: US 6,816,814 B2
(45) Date of Patent: *Nov. 9, 2004

2003/0067319 A1 * 4/2003 Cho ........................... 324/765

OTHER PUBLICATIONS

ALDEC, 'What is TCL/TK Scripting', Jan. 2002, ALDEC Support, pp. 1-9.*
Thaker et al., 'Register-Transfer Level Fault Modeling and Testing Evaluation Techniques for VLSI Circuits', Jan. 2000, IEEE Publication, pp. 940-949.*
Lin et al., 'A Functional Test Planning System for Validation of DSP Circuits Modeled in VHDL', Mar. 1998, IEEE Publication, pp. 172-177.*
Evans et al., Honey I Shrunk the SOC Verification Problem, Sonics Inc., SNUG San Jose 2001, 11 pages.
Thaker et al., "Register-Transfer Level Fault Modeling and Test Evaluation Techniques for VLSI Circuits", ITC International Test Conference, 2000 IEEE, Paper 35.3, pp. 940-949.
VSI Alliance reference, "An overview of VSIA" from http://www.vsi.org/aboutVSINindex.htm, 2004.
Lin et al., "A Functional Test Planning System for Validation of DSP Circuits Modeled in VHDL", 1998 International Verilog, pp. 172-177.

* cited by examiner

Primary Examiner-Marc S. Hoff Assistant Examiner-Elias Desta (74) Attorney, Agent, or Firm-Blakely, Sokoloff, Taylor & Zafman LLP

(57) ABSTRACT

The present invention includes a method and apparatus for decomposing and verifying configurable hardware. In one embodiment, the method includes automatically decompos­ing a hardware system into a set of one or more units, creating a test-bench for each of the set of units, and verifying each of the set of units before verifying the hardware system design.

39 Claims, 7 Drawing Sheets
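The abstract's verify-units-before-the-system ordering can be pictured as a small recursive routine. The `Unit` class and the `run_testbench` callback below are illustrative stand-ins, not the patent's actual interfaces.

```python
# Hypothetical sketch of the verification order in the abstract: decompose a
# system into units, verify every (leaf) unit with its own test-bench first,
# and only then verify the composed system. Unit and run_testbench are
# illustrative stand-ins, not the patent's API.

from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Unit:
    name: str
    subunits: List["Unit"] = field(default_factory=list)

def verify(unit: Unit, run_testbench: Callable[[Unit], bool]) -> bool:
    # Verify every subunit (recursively) before the enclosing system,
    # so system-level verification can focus on interactions between
    # known-good components.
    for sub in unit.subunits:
        if not verify(sub, run_testbench):
            return False
    return run_testbench(unit)
```

Running this on a two-level hierarchy visits the leaves first and the enclosing system last, which mirrors the "known good units" assembly order described in the background section.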


U.S. Patent, Nov. 9, 2004, Sheet 1 of 7, US 6,816,814 B2
[FIG. 1: configurable hardware library 101 and configuration data storage unit 105 (holding configuration data 113) feeding the generation and verification unit 103 (generation module 107, decomposition module 109, verification module 111), which produces the system 104 of units 1 through N]

U.S. Patent, Nov. 9, 2004, Sheet 2 of 7, US 6,816,814 B2
[FIG. 2: flow 200: read configuration data (202); analyze configurable hardware database (204); create system (206); decompose the system and verify the system components (208)]

U.S. Patent, Nov. 9, 2004, Sheet 3 of 7, US 6,816,814 B2
[FIG. 3: flow 300: does the configuration data conform to rules for syntax and semantics? (302); derive parameters (304); configure the HDL preprocessing statements based on the parameters (306); preprocess HDL source code (308); produce HDL code for the configurable hardware system specified in the configuration data (310)]

U.S. Patent, Nov. 9, 2004, Sheet 4 of 7, US 6,816,814 B2
[FIG. 4: system 400 containing units 1 through 4 joined by communication paths 401-411]

U.S. Patent, Nov. 9, 2004, Sheet 5 of 7, US 6,816,814 B2
[FIG. 5: map system configuration data onto a selected unit's parameters (502); for each connection to other units add interface models, monitors, protocol checkers, etc. (504); generate new configuration data file (506); generate design based on the configuration data file (508); generate tests and scripts and/or inputs to analysis tool (510); execute the scripts (512); end]

U.S. Patent, Nov. 9, 2004, Sheet 6 of 7, US 6,816,814 B2
[FIG. 6: test-bench in which models A, B, and C connect to unit 1 over communication paths 401, 403, and 410, with the test stimulus and response checking unit 415 connected to the models]

U.S. Patent, Nov. 9, 2004, Sheet 7 of 7, US 6,816,814 B2
[FIG. 7: system 700 with processor(s) 702, memory 732, graphics controller 734, display device 737, input/output controller hub (ICH) 740, IDE drive(s) 742, USB port(s) 744, keyboard 751, mouse 752, parallel port(s) 753, serial port(s) 754, floppy disk drive 755, and network interface 756, hosting the configuration hardware library 101, generation and verification unit 103, and configuration data storage unit 105]


US 6,816,814 B2

METHOD AND APPARATUS FOR DECOMPOSING AND VERIFYING CONFIGURABLE HARDWARE

FIELD OF THE INVENTION

The present invention pertains to hardware verification. More particularly, the present invention relates to verifying configurable hardware.

BACKGROUND OF THE INVENTION

"Configurable hardware" or "parameterized hardware" describes hardware systems that are customized automatically at design creation time by using specified values for a set of parameters or attributes. Such hardware may also support changes at run-time depending on parameter settings. Configurable hardware systems typically provide better performance than software running on a general-purpose computer system and greater flexibility than conventional application specific integrated circuits (ASICs) without increasing circuit size and cost.

In conventional hardware systems, it is necessary to verify a system's functionality by testing the system and its components. Typically, the complexity of verifying a system's functionality increases with the number of components that make up the system. Therefore, the conventional approach is to manually verify each unit individually and then to assemble the "known good units" into a system. If hardware is hierarchically arranged, verification must be performed for each level in the hierarchy. If each individual unit has been verified before assembling the system, verifying system functionality can focus on potential problems with interactions between components rather than on each component's capabilities.

Configurable hardware systems can be verified using this type of conventional hierarchical decomposition. However, because each instance of a configurable hardware system is different, each time a configuration parameter is modified, the system and its components must be manually verified. The cost of repeatedly manually verifying a system and its components often offsets the advantages of configurable hardware.

SUMMARY OF THE INVENTION

The present invention includes a method and apparatus for decomposing and verifying configurable hardware. In one embodiment, the method includes automatically decomposing a hardware system into a set of one or more units, creating a test-bench for each of the set of units, and verifying each of the set of units before verifying the hardware system design.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention is illustrated by way of example and not limitation in the Figures of the accompanying drawings, in which like references indicate similar elements and in which:

FIG. 1 is a block diagram for decomposing and verifying configurable hardware according to one embodiment of the invention;

FIG. 2 is a block diagram illustrating a system for creating and verifying configurable hardware according to one embodiment of the invention;

FIG. 3 is a flow diagram illustrating the creation of a system, according to embodiments of the invention;

FIG. 4 is a conceptual block diagram of a system design according to embodiments of the invention;

FIG. 5 is a flow diagram illustrating operations for decomposing and verifying a configurable hardware system according to embodiments of the invention;

FIG. 6 is a block diagram illustrating a test-bench for verifying a unit according to embodiments of the invention;

FIG. 7 illustrates an exemplary system for decomposing and verifying configurable hardware, according to embodiments of the invention;

DETAILED DESCRIPTION

A method and apparatus for decomposing and verifying configurable hardware are described. Note that in this description, references to "one embodiment" or "an embodiment" mean that the feature being referred to is included in at least one embodiment of the invention. Further, separate references to "one embodiment" in this description do not necessarily refer to the same embodiment; however, neither are such embodiments mutually exclusive, unless so stated and except as will be readily apparent to those skilled in the art. Thus, the present invention can include any variety of combinations and/or integrations of the embodiments described herein.

Herein, block diagrams illustrate exemplary embodiments of the invention. Also herein, flow diagrams illustrate operations of the exemplary embodiments of the invention. The operations of the flow diagrams will be described with reference to the exemplary embodiments shown in the block diagrams. However, it should be understood that the operations of the flow diagrams could be performed by embodiments of the invention other than those discussed with reference to the block diagrams, and embodiments discussed with references to the block diagrams could perform operations different than those discussed with reference to the flow diagrams.

Overview

In one embodiment of the invention, a generation and verification unit generates a configurable hardware system based on configuration data and a configurable hardware library. The configurable hardware system is made up of a number of units. In one embodiment of the invention, the generation and verification unit hierarchically decomposes a configurable hardware system into units that make up the system design. Configuration data is applied to each unit so that it can be removed and verified or analyzed outside of the system without changing the unit itself. The generation and verification unit creates a test-bench, tests, and controlling scripts for each unit.

Exemplary Architecture

FIG. 1 is a block diagram for decomposing and verifying configurable hardware according to one embodiment of the invention. FIG. 1 includes a generation and verification unit 103, which further includes a generation module 107, decomposition module 109, and verification module 111. The generation and verification unit 103 is connected to a configurable hardware library 101, and a configuration data storage unit 105. The configuration data storage unit 105 includes configuration data 113. The generation and verification unit 103 generates a configurable hardware system 104 and its constituent units (illustrated as units 1-N).

In a configurable hardware system design hierarchy, the term "system" refers to the composition of units at a particular hierarchy level, where details of the units are hidden. Therefore, at a particular level in a configurable hardware system design hierarchy, units are indivisible


US 6,816,814 B2

components. However, at lower hierarchy levels, the units from a higher level have their details and internal compo­nents exposed. For example, referring to FIG. 1, at one design hierarchy level, the system 104 is viewed as a "black box" unit of a larger system, where the details about units 5 1-N are concealed. However, at a lower design hierarchy level, the system 104 is viewed as including units 1-N, where the unit connection details are exposed. At even lower levels of the configurable hardware system design hierarchy, the internal details of units 1-N are exposed. At the lowest

10 hierarchy level, a unit cannot be decomposed. The genera­tion of system 104 and units 1-N will be described in more detail below in FIG. 4.

4 configurable hardware library 101 to determine the possible configurations of the hardware components necessary for generating the hardware system defined by the configuration data 113. Control continues at block 206.

As shown in block 206, a configurable hardware system is created. For example, the generation module 107 creates a configurable hardware system based on the configuration data 113 and the configurable hardware library 101. The operation of block 206 is further described below with reference to FIG. 3. Control continues at block 208.

At block 208, the system is decomposed and the system and its components are verified. For example, the decom­position module 109 and the verification module 111 decom­pose and verify the system components. The operation in

15 block 208 will be described in more detail below in the

The configuration data storage unit 105 includes configu­ration data 113, which hierarchically describes a config­urable hardware system. For example, the configuration data 113 specifies the system and unit parameters at all relevant hierarchy levels. While the end user sets most parameters in the configuration data 113, the generation and verification unit 103 sets some parameters during the hardware integra­tion and/or decomposition process. The configuration data 20

113 may be represented by any suitable electronic design automation scripting language, according to embodiments of the invention. In one embodiment of the invention, the configuration data 113 is represented in the tool control language (TCL) scripting language. In particular, the con- 25

figuration data 113 may include a TCL text file defining a system design name, system-level parameters, unit-level names and parameters, unit-level connection parameters (e.g., number of wires in a signal bundle, handshaking protocols, pipelining behavior, etc.), and interface state- 30

ments for binding unit instances to particular connections. In an alternative embodiment of the invention, this system information could be represented in the extensible markup language (XML) format or in a relational database.

Because multiple instances of any particular hardware 35

unit can be included in a hardware system, each unit instance is uniquely named in the configuration data 113. Moreover, different instances of the same unit can be configured differently. For example, one instance of a FIFO may be configured to have a depth of 10 bytes, while another 40

instance of a FIFO may be configured to have a depth of 100 bytes.

The configurable hardware library 101 describes all pos­sible configurations of the system's hardware components. For example, the configuration hardware library 101 may 45

describe all possible configurations of a FIFO, including its depth, width, and other configurable parameters. In one embodiment of the invention, the configurable hardware library includes hardware description language (HDL) code (e.g. Verilog or VHDL) embedded with preprocessing state- 50

ments that describe how to interpret the configuration data 113.

discussion of FIG. 5. It should be evident to one of ordinary skill in the art that

the operations described in the flow diagram 200 could be repeated for generating and verifying hardware at any level in the hierarchical system design. For example, to verify a system at a particular hierarchy level, all of the system's components must be verified. This may require verifying lower level systems, which may in turn require verifying even lower level systems. Once the lowest level system is verified, the higher level systems may in turn be verified. Hence, the operations set forth in the flow diagram 200 can be repeated for creating and verifying systems and/or com­ponents at any design hierarchy level.

FIG. 2 is a flow diagram illustrating the creation, decomposition, and verification of a configurable hardware system, according to embodiments of the invention. The operations of the flow diagram 200 will be described with reference to the block diagram of FIG. 1. At process block 202, the configuration data is read. For example, according to the embodiment of the invention illustrated in FIG. 1, the generation module 107 of the generation and verification unit 103 reads the configuration data 113 from the configuration data storage unit 105. As noted above, the configuration data 113 may be a TCL file that hierarchically defines a configurable hardware system. Control continues at block 204.

At block 204, the configurable hardware library is analyzed. For example, the generation module 107 analyzes the

FIG. 3 is a flow diagram illustrating the creation of a system, according to embodiments of the invention. The operations of the flow diagram 300 will be described with reference to the exemplary embodiment illustrated in FIG. 1. At decision block 302, it is determined whether the configuration data conforms to rules for syntax and semantics. For example, the integration module 109 determines whether configuration data 113 from the configuration data storage unit 105 conforms to rules for syntax and semantics. As a more specific example, in an embodiment where the configuration data 113 is represented by a TCL text file, the integration module 109 determines whether the TCL file conforms to the syntax and semantics rules of the HDL used by the configurable hardware library 101. In one embodiment, the integration module 109 employs a high-level language program (e.g., a C++, Python, or Java program) to analyze a TCL file for syntax and semantics. If the configuration data file conforms to the syntax and semantics rules, control continues at block 304. Otherwise, the flow ends with an error report.

At block 304, parameters are derived. For example, the integration module 109 derives system parameters from the configuration data 113. As a more specific example, in one embodiment, the integration module 109 derives the system's parameters by analyzing a TCL file, which defines a configurable hardware system. For example, a system parameter may specify the minimum bandwidth required for an internal communications path. From this setting, parameters for specifying the number of wires used at various connection points in the system are derived according to the rules in the configuration data. Control continues at block 306.

As shown in block 306, the preprocessing statements are configured based on the derived parameters. For example, in one embodiment of the invention, the integration module 109 configures HDL code preprocessing statements (stored in the configurable hardware library 101) that are affected by the specified and derived parameters. In doing this, the
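The syntax/semantics check and parameter derivation described above (blocks 302 through 306) can be sketched in a short program. This is a hypothetical illustration only: the file schema, the "set key value" idiom, the rule set, and every function name below are assumptions for the sketch, not the implementation disclosed in the patent.

```python
# Hypothetical sketch of blocks 302-304: validate a TCL-style configuration
# file and derive parameters from it. The schema (REQUIRED_KEYS), the
# bandwidth-to-wires rule, and all names are illustrative assumptions.
import re

REQUIRED_KEYS = {"unit", "min_bandwidth_mbps", "clock_mhz"}  # assumed schema

def parse_config(text):
    """Parse 'set key value' lines (a common TCL idiom) into a dict."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        m = re.fullmatch(r"set\s+(\w+)\s+(\S+)", line)
        if m is None:
            raise SyntaxError(f"bad syntax: {line!r}")   # block 302: syntax check
        config[m.group(1)] = m.group(2)
    return config

def check_semantics(config):
    """Block 302: semantic rules, e.g. required keys and numeric ranges."""
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if int(config["min_bandwidth_mbps"]) <= 0:
        raise ValueError("min_bandwidth_mbps must be positive")

def derive_parameters(config):
    """Block 304: derive a wire count from a minimum-bandwidth setting."""
    bandwidth = int(config["min_bandwidth_mbps"])
    clock = int(config["clock_mhz"])
    # Assume one wire carries clock_mhz megabits per second; round up.
    wires = -(-bandwidth // clock)
    return {"unit": config["unit"], "data_wires": wires}

text = """
set unit crossbar0
set min_bandwidth_mbps 1600
set clock_mhz 200
"""
cfg = parse_config(text)
check_semantics(cfg)
print(derive_parameters(cfg))   # {'unit': 'crossbar0', 'data_wires': 8}
```

The derived wire-count rule here is invented for illustration; the patent only says that such parameters are "derived according to the rules in the configuration data."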

integration module 109 may impart particular values or control structures to preprocessor statements embedded in the HDL code. Control continues at block 308.

As shown in block 308, the HDL source code is preprocessed. For example, the integration module 109 preprocesses the HDL source code that was configured according to the derived parameters. In one embodiment of the invention, the integration module 109 includes a macro language preprocessor (e.g., a C preprocessor, an M4 preprocessor, or a SIMPLE preprocessor) for preprocessing the embedded HDL source code. Control continues at block 310.

At block 310, the HDL code for the configurable hardware system specified in the configuration data is generated. For example, the integration module 109 generates the HDL code for the system specified in the configuration data 113 using HDL code from the configurable hardware library 101. From block 310, control ends.

FIG. 4 is a conceptual block diagram of a system design according to embodiments of the invention. As described above, according to an embodiment of the invention, the operations of FIG. 3 produce a system design represented in HDL code. FIG. 4 provides a graphical representation of such a system. FIG. 4 includes system 400, which includes unit 1, unit 2, unit 3, and unit 4. In system 400, unit 1 communicates to systems outside of the system 400 over a communication path 410. Unit 1 is coupled to unit 2 and unit 3 with communication paths 401 and 403, respectively. Unit 2 communicates with systems outside of system 400 through communication path 411. Unit 3 is coupled to unit 4 with communication paths 405 and 408. Unit 3 is also coupled to unit 2 with a communication path 409.

Unit 2, illustrated with broken lines, is an optional unit in the system 400. Connection paths 401 and 409 are also optional. For a given level of the system design hierarchy, a unit (or connection path) is optional when it is unknown whether factors external to the system will require the optional unit's functionality (e.g., a system at a higher level in the design hierarchy). For example, if system 400 could be configured to operate in two different modes, unit 2 would be optional if its functionality were required by the first mode, but not by the second mode.

FIGS. 5 and 6 illustrate how system 400 is decomposed and verified according to embodiments of the invention. FIG. 5 is a flow diagram illustrating operations for decomposing and verifying a configurable hardware system according to embodiments of the invention. FIG. 6 is a block diagram illustrating a test-bench for verifying a unit according to embodiments of the invention. FIGS. 5 and 6 will be described with reference to the exemplary system of FIG. 4 and the exemplary embodiment of FIG. 1.

Referring to the flow diagram of FIG. 5, at block 502, the configuration data 113 is mapped onto a selected unit's parameters. For example, the decomposition module 109 maps the configuration data 113 defining system 400 onto the parameters of a unit of system 400 (e.g., unit 1). As a more specific example, the decomposition module 109 analyzes the configuration data 113 to determine how unit 1's parameters should be configured to meet the requirements of system 400. Control continues at block 504.

At block 504, for each connection to other units, interface models, monitors, and/or protocol checkers are added to the unit. For example, the verification module 111 analyzes the configuration data 113 to determine the connections for the selected unit (e.g., unit 1). For each connection (e.g., communication path), the decomposition module 109 couples a model to the unit, which may include an interface driver model, an interface monitor, and/or a protocol checker. This operation is conceptually illustrated in FIG. 6. In FIG. 6, model A is connected to unit 1 through communication path 403, while model B is connected to unit 1 through communication path 401. Model C is connected to unit 1 through communication path 410. The test stimulus and response checking unit 415 is connected to models A, B, and C. The test stimulus and response checking unit 415 monitors and facilitates testing operations. In the test-bench, models are used for sending and receiving information to the unit being verified. For example, models A and B will receive streams of data from unit 1 according to unit 1's parameters (e.g., according to the particular communication protocol defined for the particular communication path). Similarly, model C will transmit data to unit 1 according to unit 1's parameters. The particular data to be transmitted to and from the unit will be determined by the tests used for verifying the unit. These tests will be discussed in more detail below. From block 504, control continues at block 506.

At block 506, the configuration data is generated. For example, decomposition module 109 generates configuration data 113 specifying the selected unit's parameters. According to one embodiment of the invention, the decomposition module 109 generates configuration data 113 in the form of a TCL file, as described above in the discussion of FIG. 1. According to an alternative embodiment, the decomposition module 109 generates configuration data 113 in the form of an XML file.

At block 508, a design based on the configuration data 113 is generated. For example, the generation module 107 uses the configurable hardware library 101 to generate a configurable hardware system design based on the configuration data 113. This operation is described in more detail above, in the discussion of FIG. 3. In one embodiment, the design is represented by HDL code. Control continues at block 510.

At block 510, tests and scripts and/or inputs to an analysis tool are generated. For example, the verification module 111 generates tests and scripts for running the tests and/or inputs to analysis tools. In generating the tests, the verification module 111 may use pre-existing tests that are known to verify the functionality of a particular unit, or it may generate customized tests based on an analysis of the unit configuration. These tests will exercise and verify the functionality of the configured unit being tested. According to an embodiment of the invention, the verification module 111 generates tests that are capable of verifying any configuration of the unit. In this embodiment, the tests read the configuration data 113 and modify their stimulus accordingly while the test is running, rather than before testing begins. The verification module 111 can also generate scripts for automatically performing the tests.

As an additional or alternative form of testing, the verification module 111 provides the design to an analysis tool, which performs a static analysis of the design. For example, according to one embodiment of the invention, the verification module 111 provides the unit design represented by HDL code to a static verification tool that analyzes the HDL code for errors. In one embodiment, the static verification tool generates warnings and/or error messages based on its analysis of the HDL code. From block 510, control continues at block 512.

As shown in block 512, the scripts are executed. For example, the verification module 111 executes the scripts, which automatically test and verify the selected unit.

It should be apparent to one of ordinary skill in the art that the operations described in the flow diagram of FIG. 5 can be repeated to verify any unit/system at any level in a design hierarchy.
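The model-attachment step of block 504, in which a driver, monitor, and/or protocol checker is coupled to each of a unit's communication paths, can be sketched as below. The class layout, the in/out direction convention, and all identifiers are illustrative assumptions, not the structure disclosed in the patent.

```python
# Hypothetical sketch of FIG. 5, block 504: for each communication path of a
# selected unit, couple interface models (driver, monitor, protocol checker)
# into that unit's test-bench. All names and the direction convention are
# illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class Model:
    path: str   # communication path the model is attached to
    role: str   # "driver", "monitor", or "checker"

@dataclass
class TestBench:
    unit: str
    models: list = field(default_factory=list)

def build_test_bench(unit, config):
    """Attach a driver or monitor, plus a protocol checker, per connection."""
    bench = TestBench(unit)
    for path, direction in config["connections"][unit].items():
        # A driver stimulates the unit's inputs; a monitor observes its
        # outputs; a protocol checker watches the path either way.
        role = "driver" if direction == "in" else "monitor"
        bench.models.append(Model(path, role))
        bench.models.append(Model(path, "checker"))
    return bench

# Paths 401, 403, 410 mirror unit 1's connections in FIG. 6; the directions
# are assumed for the sketch.
config = {"connections": {"unit1": {"401": "out", "403": "out", "410": "in"}}}
bench = build_test_bench("unit1", config)
print([(m.path, m.role) for m in bench.models])
```

A generator like this would then be paired with the test and script generation of block 510, which the patent describes only at the level of the flow diagram.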

Referring to FIG. 1, the generation and verification unit 103, configurable hardware library 101, and configuration data storage unit 105 may be implemented in the form of a conventional computing platform, including one or more processors, application specific integrated circuits (ASICs), memories, and/or machine-readable media whereon instructions are stored for performing operations according to embodiments of the invention. Machine-readable media includes any mechanism that provides (i.e., stores and/or transmits) information in a form readable by a machine (e.g., a computer). For example, a machine-readable medium includes read only memory (ROM); random access memory (RAM); magnetic disk storage media; optical storage media; flash memory devices; electrical, optical, acoustical or other form of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.); etc. In one embodiment, the units shown in FIG. 1 are machine-readable media executing on a processor to carry out the operations described herein. However, in alternative embodiments, the units of FIG. 1 are other types of logic (e.g., digital logic) for executing the operations described herein. Alternatively, according to one embodiment of the invention, the generation and verification unit 103, configurable hardware library 101, and configuration data storage unit 105 can include one or more separate computer systems. It should also be understood that, according to embodiments of the invention, the components illustrated in FIG. 1 could be distributed over a number of networked computers, wherein they could be remotely controlled and operated.

FIG. 7 illustrates an exemplary system for decomposing and verifying configurable hardware, according to embodiments of the invention. As illustrated in FIG. 7, computer system 700 comprises processor(s) 702. Computer system 700 also includes a memory 732, processor bus 710, and input/output controller hub (ICH) 740. The processor(s) 702, memory 732, and ICH 740 are coupled to the processor bus 710. The processor(s) 702 may comprise any suitable processor architecture. For other embodiments of the invention, computer system 700 may comprise one, two, three, or more processors, any of which may execute a set of instructions that are in accordance with embodiments of the present invention.

The memory 732 stores data and/or instructions, and may comprise any suitable memory, such as a dynamic random access memory (DRAM), for example. In one embodiment of the invention, the configuration hardware library 101, generation and verification unit 103, and configuration data storage unit 105 are stored in memory 732. However, they may be stored in any or all IDE drive(s) 742, memory 732, and/or other suitable storage devices. A graphics controller 734 controls the display of information on a display device 737, according to embodiments of the invention.

The input/output controller hub (ICH) 740 provides an interface to I/O devices or peripheral components for computer system 700. The ICH 740 may comprise any suitable interface controllers to provide for any suitable communication link to the processor(s) 702, memory 732, and/or to any suitable device or component in communication with the ICH 740. For one embodiment of the invention, the ICH 740 provides suitable arbitration and buffering for each interface.

For one embodiment of the invention, the ICH 740 provides an interface to one or more suitable integrated drive electronics (IDE) drives 742, such as a hard disk drive (HDD) or compact disc read only memory (CD ROM) drive, for example, to store data and/or instructions, and, for example, to one or more suitable universal serial bus (USB) devices through one or more USB ports 744. For one embodiment of the invention, the ICH 740 also provides an interface to a keyboard 751, a mouse 752, a floppy disk drive 755, one or more suitable devices through one or more parallel ports 753 (e.g., a printer), and one or more suitable devices through one or more serial ports 754. For one embodiment of the invention, the ICH 740 also provides a network interface 756 through which the computer system 700 can communicate with other computers and/or devices.

Accordingly, computer system 700 includes a machine-readable medium on which is stored a set of instructions (i.e., software) embodying any one, or all, of the methodologies described herein. For example, software can reside, completely or at least partially, within memory 732 and/or within processor(s) 702.

While the invention has been described in terms of several embodiments, those skilled in the art will recognize that the invention is not limited to the embodiments described. The method and apparatus of the invention can be practiced with modification and alteration within the spirit and scope of the appended claims. The description is thus to be regarded as illustrative instead of limiting on the invention.

We claim:

1. A computer implemented method, comprising:

automatically decomposing a configurable hardware system into a set of one or more units;

creating a test-bench for each of the set of units; and

verifying each of the set of units before verifying the configurable hardware system, wherein the configurable hardware system is customized at design creation time by using specified values for a set of parameters and a first instance of the configurable hardware system is different in function than a second instance of the configurable hardware system.

2. The computer implemented method of claim 1, wherein the set of units is defined in a configurable hardware library, and wherein the configurable hardware system is specified in a configuration data storage unit.

3. The computer implemented method of claim 2, wherein the configuration data is represented in a hierarchical language and wherein the configurable hardware library is represented in hardware design language (HDL).

4. The computer implemented method of claim 1, wherein the test-benches include models attached to each unit connection, wherein the models send data to and receive data from the unit according to parameters of the unit.

5. A computer implemented method comprising:

automatically decomposing a set of one or more units at a first level of a configurable hardware system design hierarchy, into a set of one or more units of a lowest level of the hardware system design hierarchy, wherein the configurable hardware system design hierarchy includes a set of one or more hierarchy levels; and

individually verifying units of each hierarchy level of the hardware system design hierarchy successively from the lowest level to the first level with test benches dynamically built for each unit of each successive level, wherein the configurable hardware system is customized at a design creation time.

6. The computer implemented method of claim 5, wherein the automatically decomposing is based on configuration data and the contents of a configurable hardware library.

7. The computer implemented method of claim 6, wherein the configuration data specifies parameters for the units of each of the set of configurable hardware system design levels.
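The bottom-up flow recited in claim 5, decomposing a hierarchy down to its lowest-level units and then verifying each level from the lowest up to the first, can be sketched as follows. The nested-dictionary hierarchy and the stand-in verification step are illustrative assumptions, not the claimed implementation.

```python
# Hypothetical sketch of claim 5: decompose a system hierarchy into per-level
# unit lists, then verify level by level from the lowest level to the first.
# The data structure and function names are illustrative assumptions.
def decompose(unit, levels=None, depth=0):
    """Collect unit names into per-level lists (depth 0 = first level)."""
    if levels is None:
        levels = []
    if depth == len(levels):
        levels.append([])
    levels[depth].append(unit["name"])
    for child in unit.get("units", []):
        decompose(child, levels, depth + 1)
    return levels

def verify_bottom_up(system):
    """Return the verification order required by claim 5: lowest level first."""
    levels = decompose(system)
    order = []
    for level in reversed(levels):   # lowest level up to the first level
        for name in level:
            order.append(name)       # stand-in for dynamically building a
                                     # test bench and running its tests
    return order

system = {"name": "system400",
          "units": [{"name": "unit1", "units": [{"name": "fifo"}]},
                    {"name": "unit3"}]}
print(verify_bottom_up(system))   # ['fifo', 'unit1', 'unit3', 'system400']
```

Each name yielded here would correspond, in the claimed method, to building a test bench for that unit and verifying it before its parent level is verified.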

8. The computer implemented method of claim 6, wherein the configurable hardware library defines the units of each of the set of hierarchy levels.

9. A computer implemented method, comprising:

mapping a set of configuration data onto a corresponding configurable unit definition selected from a configurable hardware library to generate a set of one or more configurable hardware units;

dynamically generating a test bench for each of the set of configurable hardware units based on the configuration data;

verifying each of the set of configurable hardware units with their corresponding test bench;

integrating the verified set of configurable hardware units into a configurable hardware system; and

verifying the configurable hardware system during a functional logic verification in a design creation environment.

10. The computer implemented method of claim 9 further comprising:

generating tests to be run on the test-benches;

generating scripts for executing the tests; and

generating inputs to an analysis tool.

11. The computer implemented method of claim 9, wherein the corresponding configurable unit definition is represented in hardware design language (HDL).

12. The computer implemented method of claim 9, wherein the configuration data is represented in tool control language (TCL), and wherein the configuration data defines parameters for each of the set of configurable hardware units.

13. The computer implemented method of claim 9, wherein the test-benches include models connected to each communication path of the unit.

14. A computer implemented method of functional logic verification, comprising:

building a set of one or more test benches for individual units decomposed from a configurable hardware system based on configuration data and a configurable hardware library, wherein the individual units are of a lower level of a configurable hardware design hierarchy, wherein the system is of a higher level of a configurable hardware design hierarchy;

building a system test-bench; and

verifying the configurable hardware system after verifying the individual units.

15. The computer implemented method of claim 14, wherein the system is verified using the system test-bench and the individual units are verified using the set of test-benches.

16. The computer implemented method of claim 14 further comprising:

generating tests to be run on the set of test-benches and the system test-bench;

generating scripts for executing the tests; and

generating inputs to an analysis tool.

17. The computer implemented method of claim 14, wherein the configurable hardware library defines the units included in the system, and wherein the configuration data specifies the parameters of the system.

18. An apparatus, comprising:

a generation and verification unit to automatically generate, decompose, and verify a configurable hardware system;

a configuration data storage unit including configuration data to define a configurable hardware system design, the configuration data storage unit coupled to the generation and verification unit; and

a configurable hardware library to store definitions of configurable hardware units for integrating and decomposing hardware systems, the configurable hardware library coupled to the generation and verification unit.

19. The apparatus of claim 18, wherein the definitions of the configurable hardware units are represented in a hardware design language (HDL).

20. The apparatus of claim 18, wherein the configuration data is represented in a hierarchical language.

21. The apparatus of claim 18, wherein the generation and verification unit includes a generation module to generate units and systems within a hardware design hierarchy, a decomposition module to decompose the units and the systems, and a verification module to build test-benches for the units and the systems and to verify the units and the systems.

22. The apparatus of claim 21, wherein the units and systems are represented in HDL.

23. A machine-readable medium that provides instructions, which when executed by a machine, cause the machine to perform operations comprising:

automatically decomposing a configurable hardware system into a set of one or more units;

creating a test-bench for each of the set of units; and

verifying each of the set of units before verifying the configurable hardware system design, wherein the configurable hardware system is customized at design creation time by using specified values for a set of parameters and a first instance of the configurable hardware system is different in function than a second instance of the configurable hardware system.

24. The machine-readable medium of claim 23, wherein the set of units is defined in a configurable hardware library, and wherein the system is specified in a configuration data storage unit.

25. The machine-readable medium of claim 24, wherein the configuration data is represented in a hierarchical language and wherein the configurable hardware library is represented in hardware design language (HDL).

26. The machine-readable medium of claim 23, wherein the test-benches include models attached to each unit connection, wherein the models send data to and receive data from the unit according to parameters of the unit.

27. A machine-readable medium that provides instructions, which when executed by a machine, cause the machine to perform operations comprising:

automatically decomposing a set of one or more units at a first level of a configurable hardware system design hierarchy, into a set of one or more units of a lowest level of the hardware system design hierarchy, wherein the configurable hardware system design hierarchy includes a set of one or more hierarchy levels; and

individually verifying units of each hierarchy level of the hardware system design hierarchy successively from the lowest level to the first level with test benches dynamically built for each unit of each successive level, wherein the configurable hardware system is customized at design creation time.

28. The machine-readable medium of claim 27, wherein the automatically decomposing is based on configuration data and the contents of a configurable hardware library.

29. The machine-readable medium of claim 28, wherein the configuration data specifies parameters for the units of each of the set of configurable hardware system design levels.

30. The machine-readable medium of claim 28, wherein the configurable hardware library defines the units of each of the set of hierarchy levels.

31. A machine-readable medium that provides instructions, which when executed by a machine, cause the machine to perform operations comprising:

mapping a set of configuration data onto a corresponding configurable unit definition selected from a configurable hardware library to generate a set of one or more configurable hardware units;

dynamically generating a test bench for each of the set of configurable hardware units based on the configuration data;

verifying each of the set of configurable hardware units with their corresponding test bench;

integrating the verified set of configurable hardware units into a configurable hardware system; and

verifying the configurable hardware system during a functional logic verification in a design creation environment.

32. The machine-readable medium of claim 31, further comprising:

generating tests to be run on the test-benches;

generating scripts for executing the tests; and

generating inputs to an analysis tool.

33. The machine-readable medium of claim 31, wherein the corresponding configurable unit definition is represented in hardware design language (HDL).

34. The machine-readable medium of claim 31, wherein the configuration data is represented in tool control language (TCL), and wherein the configuration data defines parameters for each of the set of configurable hardware units.

35. The machine-readable medium of claim 31, wherein the test-benches include models connected to each communication path of the unit.

36. A machine-readable medium that provides instructions, which when executed by a machine, cause the machine to perform operations of functional logic verification, comprising:

building a set of one or more test benches for individual units decomposed from a configurable hardware system based on configuration data and a configurable hardware library, wherein the individual units are of a lower level of a configurable hardware design hierarchy, wherein the system is of a higher level of a configurable hardware design hierarchy;

building a system test-bench; and

verifying the configurable hardware system after verifying the individual units.

37. The machine-readable medium of claim 36, wherein the system is verified using the system test-bench and the individual units are verified using the set of test-benches.

38. The machine-readable medium of claim 36 further comprising:

generating tests to be run on the set of test-benches and the system test-bench;

generating scripts for executing the tests; and

generating inputs to an analysis tool.

39. The machine-readable medium of claim 36, wherein the configurable hardware library defines the units included in the system, and wherein the configuration data specifies the parameters of the system.

* * * * *


Exhibit G


(12) United States Patent
Ebert et al.

(54) METHOD AND APPARATUS FOR DECOMPOSING AND VERIFYING CONFIGURABLE HARDWARE

(75) Inventors: Jeffrey Allen Ebert, Half Moon Bay, CA (US); Ravi Venugopalan, Santa Clara, CA (US); Scott Carlton Evans, Santa Clara, CA (US)

(73) Assignee: Sonics, Incorporated, Mountain View, CA (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 277 days.

This patent is subject to a terminal dis­claimer.

(21) Appl. No.: 11/118,044

(22) Filed: Apr. 29, 2005

(65) Prior Publication Data

US 2005/0198611 A1 Sep. 8, 2005

Related U.S. Application Data

(63) Continuation-in-part of application No. 10/976,456, filed on Oct. 29, 2004, which is a continuation of application No. 10/293,734, filed on Nov. 12, 2002, now Pat. No. 6,816,814.

(51) Int. Cl. G06F 11/30 (2006.01)

(52) U.S. Cl. ........ 702/182; 702/57; 702/87

(58) Field of Classification Search ........ 702/57-59, 702/85, 117-123, 182, 189; 714/733, 738; 703/23, 25; 716/4, 5, 7, 12; 324/763, 765

See application file for complete search history.

US007299155B2

(10) Patent No.: US 7,299,155 B2
(45) Date of Patent: *Nov. 20, 2007

(56) References Cited

U.S. PATENT DOCUMENTS

5,801,956 A       9/1998   Kawamura et al.
6,701,474 B2 *    3/2004   Cooke et al. ............... 714/724
6,816,814 B2      11/2004  Ebert et al.
2002/0091979 A1   7/2002   Cooke et al.
2002/0161568 A1   10/2002  Sample et al.
2002/0171449 A1   11/2002  Shimizu et al.
2003/0067319 A1   4/2003   Cho

OTHER PUBLICATIONS

Thaker et al., "Register-Transfer Level Fault Modeling and Test Evaluation Techniques for VLSI Circuits", ITC International Test Conference, 2000 IEEE, Paper 35.3, pp. 941-949.

(Continued)

Primary Examiner-Eliseo Ramos-Feliciano
Assistant Examiner-Elias Desta
(74) Attorney, Agent, or Firm-Blakely Sokoloff Taylor & Zafman LLP

(57) ABSTRACT

The present invention includes a method and apparatus for decomposing and verifying configurable hardware. In one embodiment, the method includes automatically decomposing a set of one or more units at a first level of a configurable hardware system design hierarchy into a set of two or more units of a lower level of the hardware system design hierarchy. The set of one or more units at a first level includes one or more units dynamically instantiated at design creation time as well as at least a first unit composed of a previously instantiated hardware system composed with two or more levels of units within the hardware system design hierarchy of the previously instantiated hardware system.

22 Claims, 7 Drawing Sheets

[Representative drawing: configurable hardware library 101 and configuration data storage unit 105 coupled to generation and verification unit 103; design and test 104.]


US 7,299,155 B2 Page 2

OTHER PUBLICATIONS

VSI Alliance reference, "An Overview of VSIA" from http://www.vsi.org/aboutVSIA/index.htm, 2004, 1 page total.

Lin et al., "A Functional Test Planning System for Validation of DSP Circuits Modeled in VHDL", 1998 International Verilog, pp. 172-177.

Evans et al., "Honey I Shrunk the SOC Verification Problem", Sonics, Inc., SNUG San Jose, 2001, 11 pages total.

ALDEC, "What is TCL/TK Scripting?", Jan. 2002, ALDEC Support, pp. 1-9.

PCT Notification of Transmittal of International Preliminary Examination Report for Int'l. Application No. PCT/US03/35336, Int'l Filing Date Nov. 5, 2003, mailed May 31, 2005, 5 pgs.

* cited by examiner

U.S. Patent, Nov. 20, 2007, Sheet 1 of 7, US 7,299,155 B2

[FIG. 1: configuration data storage unit 105 (holding configuration data 113) and configurable hardware library 101 coupled to generation and verification unit 103, which contains generation module 107, decomposition module 109, and verification module 111, and which produces a system of unit 1 through unit N; design and test 104.]

[Sheet 2 of 7, FIG. 2, flow diagram 200: Begin; read configuration data (202); analyze configurable hardware database (204); create system (206); decompose the system and verify the system components (208); End.]

[Sheet 3 of 7, FIG. 3, flow diagram 300: does the configuration data conform to rules for syntax and semantics? (302); if yes, derive parameters (304); configure the HDL preprocessing statements based on the parameters (306); preprocess HDL source code (308); produce HDL code for the configurable hardware system specified in the configuration data (310).]

[Sheet 4 of 7, FIG. 4: system 400 containing unit 1, unit 2 (drawn in broken lines), unit 3, and unit 4, connected by communication paths 401, 403, 405, 407, 408, 409, 410, and 411.]

[Sheet 5 of 7, FIG. 5: Begin; map system configuration data onto a selected unit's parameters (502); for each connection to other units add interface models, monitors, protocol checkers, etc. (504); generate new configuration data file (506); generate design based on the configuration data file (508); generate tests and scripts and/or inputs to analysis tool (510); execute the scripts (512); End.]

[Sheet 6 of 7, FIG. 6: test-bench in which models (e.g., model B) are coupled to unit 1 over its communication paths (e.g., 401, 409) and to the test stimulus and response checking unit 415.]

[Sheet 7 of 7, FIG. 7: computer system 700 with processor(s) 702; memory 732 holding configuration hardware library 101, generation and verification unit 103, and configuration data storage unit 105; graphics controller 734 and display device 737; and input/output controller hub (ICH) 740 connected to IDE drive(s) 742, USB port(s) 744, keyboard 751, mouse 752, parallel port(s) 753, serial port(s) 754, floppy disk drive 755, and network interface 756.]

Page 125: Sonics, Inc. - Complaint for Patent Infringement

US 7,299,155 B2

METHOD AND APPARATUS FOR DECOMPOSING AND VERIFYING CONFIGURABLE HARDWARE

RELATED APPLICATIONS

The present patent application is a Continuation-in-Part of prior application Ser. No. 10/976,456, filed Oct. 29, 2004, which was a Continuation of prior application Ser. No. 10/293,734, filed Nov. 12, 2002, entitled "A METHOD AND AN APPARATUS FOR DECOMPOSING AND VERIFYING CONFIGURABLE HARDWARE," and issued on Nov. 9, 2004 as U.S. Pat. No. 6,816,814.

FIELD

The present invention pertains to hardware verification. More particularly, the present invention relates to verifying configurable hardware.

BACKGROUND

"Configurable hardware" or "parameterized hardware" describes hardware systems that are customized automatically at design creation time by using specified values for a set of parameters or attributes. Such hardware may also support changes at run-time depending on parameter settings. Configurable hardware systems typically provide better performance than software running on a general-purpose computer system and greater flexibility than conventional application specific integrated circuits (ASICs) without increasing circuit size and cost.

In conventional hardware systems, it is necessary to verify a system's functionality by testing the system and its components. Typically, the complexity of verifying a system's functionality increases with the number of components that make up the system. Therefore, the conventional approach is to manually verify each unit individually and then to assemble the "known good units" into a system. If hardware is hierarchically arranged, verification must be performed for each level in the hierarchy. If each individual unit has been verified before assembling the system, verifying system functionality can focus on potential problems with interactions between components rather than on each component's capabilities.

Configurable hardware systems can be verified using this type of conventional hierarchical decomposition. However, because each instance of a configurable hardware system is different, each time a configuration parameter is modified, the system and its components must be manually verified. The cost of repeatedly manually verifying a system and its components often offsets the advantages of configurable hardware.

SUMMARY

The present invention includes a method and apparatus for decomposing and verifying configurable hardware. In one embodiment, the method includes automatically decomposing a set of one or more units at a first level of a configurable hardware system design hierarchy into a set of two or more units of a lower level of the hardware system design hierarchy, wherein the set of one or more units at a first level includes one or more units dynamically instantiated at design creation time as well as at least a first unit composed of a previously instantiated hardware system composed with two or more levels of units within the hardware system design hierarchy of the previously instantiated hardware system.

BRIEF DESCRIPTION OF THE DRAWINGS

Some embodiments of the invention are illustrated by way of example and not limitation in the Figures of the accompanying drawings, in which like references indicate similar elements and in which:

FIG. 1 is a block diagram for decomposing and verifying configurable hardware according to one embodiment of the invention;

FIG. 2 is a block diagram illustrating a system for creating and verifying configurable hardware according to one embodiment of the invention;

FIG. 3 is a flow diagram illustrating the creation of a system, according to embodiments of the invention;

FIG. 4 is a conceptual block diagram of a system design according to embodiments of the invention;

FIG. 5 is a flow diagram illustrating operations for decomposing and verifying a configurable hardware system according to embodiments of the invention;

FIG. 6 is a block diagram illustrating a testbench for verifying a unit according to embodiments of the invention; and

FIG. 7 illustrates an exemplary system for decomposing and verifying configurable hardware, according to embodiments of the invention.

DETAILED DESCRIPTION

In general, a method and apparatus for decomposing and verifying configurable hardware are described. Note, separate references to "one embodiment" in this description do not necessarily refer to the same embodiment; however, neither are such embodiments mutually exclusive, unless so stated and except as will be readily apparent to those skilled in the art. Further, specific numeric references, such as "first memory," may be made. In general, a specific numeric reference should not be interpreted as a literal sequential order but rather interpreted to mean that the first unit is different from a second unit. Thus, the specific details and implementations set forth are merely exemplary. Likewise, the present invention can include any variety of combinations and/or integrations of the exemplary, but not inclusive, embodiments described herein.

Herein, block diagrams illustrate exemplary embodiments of the invention. Also herein, flow diagrams illustrate operations of the exemplary embodiments of the invention. The operations of the flow diagrams will be described with reference to the exemplary embodiments shown in the block diagrams. However, it should be understood that the operations of the flow diagrams could be performed by embodiments of the invention other than those discussed with reference to the block diagrams, and embodiments discussed with reference to the block diagrams could perform operations different than those discussed with reference to the flow diagrams.

Overview

In one embodiment of the invention, a generation and verification unit generates a configurable hardware system based on configuration data and a configurable hardware library. The configurable hardware system is made up of a number of units. In one embodiment of the invention, the generation and verification unit hierarchically decomposes a



configurable hardware system into units that make up the system design. Configuration data is applied to each unit so that it can be removed and verified or analyzed outside of the system without changing the unit itself. The generation and verification unit creates a testbench, tests, and controlling scripts for each unit.

Likewise, the generation and verification unit may automatically decompose a set of one or more units at a top level of a configurable hardware system design hierarchy into a set of two or more units of a lower level of the hardware system design hierarchy. The set of one or more units at a first lower level may include one or more units dynamically instantiated at design creation time. The set of one or more units at the first lower level may also include at least a first unit composed of a previously instantiated hardware system composed with two or more levels of units within the hardware system design hierarchy of the previously instantiated hardware system. Thus, the previously instantiated hardware system may be a completely functional system with all of its models and verification tests already generated. The previously instantiated hardware system may be integrated into a new configurable hardware system as one part of that new system.

The generation and verification unit may individually verify units of each hierarchy level of the new configurable hardware system design hierarchy successively, from the lower levels to the first level, with testbenches. The testbenches comprise the units under test and the corresponding models. Beneficially, however, the sequences of test inputs for the previously instantiated hardware system may be reused when testing that previously instantiated hardware system in the new configurable hardware system.

Exemplary Architecture

FIG. 1 is a block diagram for decomposing and verifying configurable hardware according to one embodiment of the invention. FIG. 1 includes a generation and verification unit 103, which further includes a generation module 107, decomposition module 109, and verification module 111. The generation and verification unit 103 is connected to a configurable hardware library 101, and a configuration data storage unit 105. The configuration data storage unit 105 includes configuration data 113. The generation and verifi­cation unit 103 generates a configurable hardware system 104 and its constituent units (illustrated as units 1-N). One or more of the units stored in the configurable hardware library 101 may be a previously instantiated entire hardware system.
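The FIG. 1 architecture lends itself to a small structural sketch. The following Python model is purely illustrative (the class and method names are invented here, not taken from the patent), but it shows how a generation and verification unit could tie the configurable hardware library, the configuration data, and its three modules together.

```python
# Illustrative sketch only: class and method names are invented for
# exposition and do not appear in the patent.

class GenerationModule:
    def generate(self, config, library):
        # Instantiate a unit for every entry named in the configuration data.
        return {name: library[params["kind"]](name, params)
                for name, params in config["units"].items()}

class DecompositionModule:
    def decompose(self, system):
        # Expose the units one level below the "black box" system view.
        return list(system.values())

class VerificationModule:
    def verify(self, unit):
        # Stand-in predicate for testbench-based verification of one unit.
        return unit.is_consistent()

class GenerationAndVerificationUnit:
    """Ties together a configurable hardware library (cf. 101), configuration
    data (cf. 113), and generation/decomposition/verification modules
    (cf. 107, 109, 111)."""
    def __init__(self, library, config):
        self.library, self.config = library, config
        self.generation = GenerationModule()
        self.decomposition = DecompositionModule()
        self.verification = VerificationModule()

    def build_and_verify(self):
        system = self.generation.generate(self.config, self.library)
        units = self.decomposition.decompose(system)
        return all(self.verification.verify(u) for u in units)
```

The three-module split mirrors the figure: generation builds the system from the library, decomposition exposes its units, and verification checks each unit individually.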

In a configurable hardware system design hierarchy, the term "system" refers to the composition of units at a particular hierarchy level, where details of the units are hidden. Therefore, at a particular level in a configurable hardware system design hierarchy, units are indivisible components. However, at lower hierarchy levels, the units from a higher level have their details and internal components exposed. For example, referring to FIG. 1, at one design hierarchy level, the system 104 is viewed as a "black box" unit of a larger system, where the details about units 1-N are concealed. However, at a lower design hierarchy level, the system 104 is viewed as including units 1-N, where the unit connection details are exposed. At even lower levels of the configurable hardware system design hierarchy, the internal details of units 1-N are exposed. At the lowest hierarchy level, a unit cannot be decomposed. The generation of system 104 and units 1-N will be described in more detail below in FIG. 4.

The configuration data storage unit 105 includes configuration data 113, which hierarchically describes a configurable hardware system. For example, the configuration data 113 specifies the system and unit parameters at all relevant hierarchy levels. While the end user sets most parameters in the configuration data 113, the generation and verification unit 103 sets some parameters during the hardware integration and/or decomposition process. The configuration data 113 for specific units may be supplied by a user through a text file, through input supplied by a user through a graphic user interface, or through a random configuration data generator. The configuration data 113 may be represented by any suitable electronic design automation scripting language, according to embodiments of the invention. In one embodiment of the invention, the configuration data 113 is represented in the tool control language (TCL) scripting language. In particular, the configuration data 113 may include a TCL text file defining a system design name, system-level parameters, unit-level names and parameters, unit-level connection parameters (e.g., number of wires in a signal bundle, handshaking protocols, pipelining behavior, etc.), and interface statements for binding unit instances to particular connections. In an alternative embodiment of the invention, this system information could be represented in the extensible markup language (XML) format or in a relational database.

Because multiple instances of any particular hardware unit can be included in a hardware system, each unit instance is uniquely named in the configuration data 113. Moreover, different instances of the same unit can be configured differently. For example, one instance of a FIFO may be configured to have a depth of 10 bytes, while another instance of a FIFO may be configured to have a depth of 100 bytes.

The configurable hardware library 101 describes all possible configurations of the system's hardware components. For example, the configurable hardware library 101 may describe all possible configurations of a FIFO, including its depth, width, and other configurable parameters. In one embodiment of the invention, the configurable hardware library includes hardware description language (HDL) code (e.g., Verilog or VHDL) embedded with preprocessing statements that describe how to interpret the configuration data 113.

The configurable hardware library 101 contains the code for an instance of each potential unit. Thus, the hardware description language code describing a particular unit's design is exactly the same as that used at the system (top) level. The code for an instance of that unit is obtained from the configurable hardware library 101 rather than having an existing unit regenerate itself. During verification, merely the testbench components remain to be generated for each unit. In this way, the actual system generated from code in the library will have been tested and verified, rather than instances of similar units being tested and verified. In an embodiment, the code for units may come from the configurable hardware library 101 and/or by having an existing unit regenerate itself.

Overall, the generation and verification unit 103 may automatically generate, decompose, and verify a configurable hardware system. A first instance of the configurable hardware system may be different in function than a second instance of the configurable hardware system. The generation and verification unit 103 may create code for each unique instance of a unit from configurable parameters based on configuration provided at design creation time.
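The hierarchical configuration data described above can be sketched as a nested structure. The following fragment is a hypothetical illustration only; the patent describes a TCL text file, and every concrete name and value below is invented:

```python
# Hypothetical configuration data mirroring the structure the patent ascribes
# to the TCL file: a system design name, system-level parameters, uniquely
# named unit instances, connection parameters, and interface statements.
config_data = {
    "system": "example_soc",
    "system_params": {"min_bandwidth_bps": 64_000_000},
    "units": {
        # Two instances of the same FIFO unit, configured differently.
        "fifo_shallow": {"kind": "fifo", "depth_bytes": 10},
        "fifo_deep":    {"kind": "fifo", "depth_bytes": 100},
    },
    "connections": {
        "path_401": {"wires": 8, "handshake": "req_ack", "pipelined": False},
    },
    # Interface statements bind unit instances to particular connections.
    "interfaces": [("fifo_shallow", "path_401")],
}

# Instance names must be unique, even for instances of the same unit.
assert len(set(config_data["units"])) == len(config_data["units"])
```

The two FIFO entries illustrate the point made above: the same library unit appears twice under distinct instance names, with depths of 10 and 100 bytes respectively.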



FIG. 2 is a flow diagram illustrating the creation, decomposition, and verification of a configurable hardware system, according to embodiments of the invention. The operations of the flow diagram 200 will be described with reference to the block diagram of FIG. 1. At process block 202, the configuration data is read. For example, according to the embodiment of the invention illustrated in FIG. 1, the generation module 107 of the generation and verification unit 103 reads the configuration data 113 from the configuration data storage unit 105. As noted above, the configuration data 113 may be a TCL file that hierarchically defines a configurable hardware system. Control continues at block 204.

At block 204, the configurable hardware library is analyzed. For example, the generation module 107 analyzes the configurable hardware library 101 to determine the possible configurations of the hardware components necessary for generating the hardware system defined by the configuration data 113. Control continues at block 206.

As shown in block 206, a configurable hardware system is created. For example, the generation module 107 creates a configurable hardware system based on the configuration data 113 and the configurable hardware library 101. The operation of block 206 is further described below with reference to FIG. 3. Control continues at block 208.

At block 208, the system is decomposed and the system and its components are verified. For example, the decomposition module 109 and the verification module 111 decompose and verify the system components. The operation in block 208 will be described in more detail below in the discussion of FIG. 5.

It should be evident to one of ordinary skill in the art that the operations described in the flow diagram 200 could be repeated for generating and verifying hardware at any level in the hierarchical system design. For example, to verify a system at a particular hierarchy level, all of the system's components must be verified. This may require verifying lower-level systems, which may in turn require verifying even lower-level systems. Once the lowest-level system is verified, the higher-level systems may in turn be verified. Hence, the operations set forth in the flow diagram 200 can be repeated for creating and verifying systems and/or components at any design hierarchy level.
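The level-by-level repetition described above amounts to a bottom-up recursion. A minimal sketch follows, with invented structure and names, not the patent's implementation:

```python
# Minimal sketch of hierarchical, bottom-up verification: a system at any
# level is verified only after all of its sub-units verify.  The dict-based
# unit representation is illustrative only.
def verify_hierarchy(unit, verified_order):
    # Recurse into sub-units first, so the lowest levels are verified first.
    for sub in unit.get("units", []):
        if not verify_hierarchy(sub, verified_order):
            return False
    verified_order.append(unit["name"])
    # At this level the unit itself is checked; a stand-in predicate here.
    return unit.get("ok", True)

system = {"name": "top", "units": [
    {"name": "u1", "units": [{"name": "u1a"}, {"name": "u1b"}]},
    {"name": "u2"},
]}
order = []
assert verify_hierarchy(system, order)
# Leaves verify before their parent, and the top level verifies last.
assert order.index("u1a") < order.index("u1") < order.index("top")
```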

Thus, the generation and verification unit 103 may automatically decompose a set of one or more units at a top level of a configurable hardware system design hierarchy into a set of two or more units of a lower level of the hardware system design hierarchy. The set of one or more units, such as Unit 1-Unit N, at a first lower level may include one or more units dynamically instantiated at design creation time. The set of one or more units, such as Unit 1-Unit N, at the first lower level may also include at least a first unit composed of a previously instantiated hardware system composed with two or more levels of units within the hardware system design hierarchy of the previously instantiated hardware system. Thus, the previously instantiated hardware system may be a completely functional system with all of its models and verification tests already generated. The previously instantiated hardware system may be integrated into a new configurable hardware system as one part of that new system.

The generation and verification unit 103 may individually verify units of each hierarchy level of the new configurable hardware system design hierarchy successively, from the lower levels to the first level, with testbenches. Beneficially, however, the sequences of test inputs for the previously instantiated hardware system may be reused when testing that previously instantiated hardware system in the new configurable hardware system.

The generation and verification unit 103 can integrate that existing design into a new system with additional units. The generation and verification unit 103 treats the existing configurable hardware system design as a single unit in the new system. The generation and verification unit 103 may reuse the sequences of test inputs previously constructed for the existing design when testing that single unit in the new system.

FIG. 3 is a flow diagram illustrating the creation of a system, according to embodiments of the invention. The operations of the flow diagram 300 will be described with reference to the exemplary embodiment illustrated in FIG. 1. At decision block 302, it is determined whether the configuration data conforms to rules for syntax and semantics. For example, the verification module 111 determines whether configuration data 113 from the configuration data storage unit 105 conforms to rules for syntax and semantics. As a more specific example, in an embodiment where the configuration data 113 is represented by a TCL text file, the verification module 111 determines whether the TCL file conforms to the syntax and semantics rules of the HDL used by the configurable hardware library 101. In one embodiment, the verification module 111 employs a high-level language program (e.g., a C++, Python, or Java program) to analyze a TCL file for syntax and semantics. If the configuration data file conforms to the syntax and semantics rules, control continues at block 304. Otherwise, the flow ends with an error report. Note, the verification module 111 may stipulate configuration data for a first unit to verify a specific set of parameters to cause a legal result for that set of parameters. The verification module 111 may stipulate configuration data for the first unit to verify a specific set of parameters that intentionally causes a rule violation for that set of parameters, prior to the first unit being part of a composed system or when the first unit is part of the composed system. Thus, the configuration data may be checked for errors at one or more points in the process.

At block 304, a set of parameters is derived for that unit. For example, the verification module 111 derives system parameters from the configuration data 113. As a more specific example, in one embodiment, the verification module 111 derives the system's parameters by analyzing a TCL file, which defines a configurable hardware system. For example, a system parameter may specify the minimum bandwidth required for an internal communications path. From this setting, parameters specifying the number of wires used at various connection points in the system are derived according to the rules in the configuration data. Control continues at block 306.

As shown in block 306, the preprocessing statements are configured in code of a programming language based on the derived parameters. For example, in one embodiment of the invention, the verification module 111 configures HDL code preprocessing statements (stored in the configurable hardware library 101) that are affected by the specified and derived parameters. In doing this, the verification module 111 may impart particular values or control structures to preprocessor statements embedded in the HDL code. Control continues at block 308.

As shown in block 308, the HDL source code is preprocessed. For example, the verification module 111 preprocesses the HDL source code that was configured according to the derived parameters. In one embodiment of the invention, the verification module 111 includes a macro language
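Blocks 302 and 304 can be illustrated with a small sketch. Both the validation rule and the wires-from-bandwidth formula below are invented stand-ins for whatever rules the configuration data actually carries:

```python
import math

def check_config(config):
    """Stand-in syntax/semantics check (cf. block 302): report rule
    violations rather than silently accepting bad configuration data."""
    errors = []
    for name, params in config["units"].items():
        if params.get("depth_bytes", 0) <= 0:
            errors.append(f"{name}: depth must be positive")
    return errors

def derive_wire_count(min_bandwidth_bps, wire_rate_bps):
    """Stand-in derivation (cf. block 304): the number of wires needed so a
    path meets its minimum bandwidth, assuming (purely for illustration)
    that each wire carries wire_rate_bps."""
    return math.ceil(min_bandwidth_bps / wire_rate_bps)

cfg = {"units": {"fifo_a": {"depth_bytes": 10}, "bad": {"depth_bytes": 0}}}
assert check_config(cfg) == ["bad: depth must be positive"]
# 64 Mbit/s required, 10 Mbit/s per wire: ceil(6.4) = 7 wires.
assert derive_wire_count(64_000_000, 10_000_000) == 7
```

The checker also shows the second use mentioned above: a configuration deliberately violating a rule (here, a zero depth) should be reported, which is itself a testable behavior.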



preprocessor (e.g., a C preprocessor, an M4 preprocessor, or a SIMPLE preprocessor) for preprocessing the embedded HDL source code. Control continues at block 310.

At block 310, the HDL code for the configurable hardware system specified in the configuration data is generated. For example, the verification module 111 generates the HDL code for the system specified in the configuration data 113 using HDL code from the configurable hardware library 101. The configured code in the programming language is processed to emit a hardware description of the unique instance of the unit. From block 310, control ends.

FIG. 4 is a conceptual block diagram of a system design according to embodiments of the invention. As described above, according to an embodiment of the invention, the operations of FIG. 3 produce a system design represented in HDL code. FIG. 4 provides a graphical representation of such a system. FIG. 4 includes system 400, which includes unit 1, unit 2, unit 3, and unit 4. In system 400, unit 1 communicates with systems outside of the system 400 over a communication path 410. Unit 1 is coupled to unit 2 and unit 3 with communication paths 401 and 403, respectively. Unit 2 communicates with systems outside of system 400 through communication path 411. Unit 3 is coupled to unit 4 with communication paths 405 and 408. Unit 3 is also coupled to unit 2 with a communication path 409.

Unit 2, illustrated with broken lines, is an optional unit in the system 400. Connection paths 401 and 409 are also optional. For a given level of the system design hierarchy, a unit (or connection path) is optional when it is unknown whether factors external to the system will require the optional unit's functionality (e.g., a system at a higher level in the design hierarchy). For example, if system 400 could be configured to operate in two different modes, unit 2 would be optional if its functionality were required by the first mode but not by the second mode.

Accordingly, the topology of units in the first instance of the configurable hardware system can be modified at design creation time. The topology of units in a system includes, for example, the number of units and which other units connect to a particular unit. Such modification allows the topology of units in the first instance of the configurable hardware system to be different from the topology of units in the second instance of the configurable hardware system. Thus, the arrangement of units in a system, including the number of units, the size of the units, which other units connect to a particular unit, etc., can be modified at design creation time.

The design creation time may be the design phase of an electronic system, such as a System on a Chip, which occurs before the fabrication of the electronic system. An electronic system design is typically functionally verified prior to the actual fabrication of the electronic system containing the design. Generally, as will be discussed in more detail later, in computer-aided electronic system creation there exist two major stages: front-end processing and back-end processing.

FIGS. 5 and 6 illustrate how system 400 is decomposed and verified according to embodiments of the invention. FIG. 5 is a flow diagram illustrating operations for decomposing and verifying a configurable hardware system according to embodiments of the invention. FIG. 6 is a block diagram illustrating a testbench for verifying a unit according to embodiments of the invention. FIGS. 5 and 6 will be described with reference to the exemplary system of FIG. 4 and the exemplary embodiment of FIG. 1.

Referring to the flow diagram of FIG. 5, at block 502, the configuration data 113 is mapped onto a selected unit's parameters. For example, the decomposition module 109 maps the configuration data 113 defining system 400 onto the parameters of a unit of system 400 (e.g., unit 1). As a more specific example, the decomposition module 109 analyzes the configuration data 113 to determine how unit 1's parameters should be configured to meet the requirements of system 400. Control continues at block 504.

At block 504, for each connection to other units, interface models, monitors, and/or protocol checkers are added to the unit. For example, the verification module 111 analyzes the configuration data 113 to determine the connections for the selected unit (e.g., unit 1). For each connection (e.g., communication path), the decomposition module 109 couples a model to the unit, which may include an interface driver model, an interface monitor, and/or a protocol checker. This operation is conceptually illustrated in FIG. 6. In FIG. 6, model A is connected to unit 1 through communication path 403, while model B is connected to unit 1 through communication path 401. Model C is connected to unit 1 through communication path 410. The test stimulus and response checking unit 415 is connected to models A, B, and C. The test stimulus and response checking unit 415 monitors and facilitates testing operations. In the testbench, models are used for sending and receiving information to the unit being verified. For example, models A and B will receive streams of data from unit 1 according to unit 1's parameters (e.g., according to the particular communication protocol defined for the particular communication path). Similarly, model C will transmit data to unit 1 according to unit 1's parameters. The particular data to be transmitted to and from the unit will be determined by the tests used for verifying the unit. These tests will be discussed in more detail below. From block 504, control continues at block 506.

At block 506, the configuration data is generated. For example, the decomposition module 109 generates configuration data 113 specifying the selected unit's parameters. According to one embodiment of the invention, the decomposition module 109 generates configuration data 113 in the form of a TCL file, as described above in the discussion of FIG. 1. According to an alternative embodiment, the decomposition module 109 generates configuration data 113 in the form of an XML file.

At block 508, a design based on the configuration data 113 is generated. For example, the generation module 107 uses the configurable hardware library 101 to generate a configurable hardware system design based on the configuration data 113. This operation is described in more detail above, in the discussion of FIG. 3. In one embodiment, the design is represented by HDL code. Control continues at block 510.

At block 510, tests and scripts and/or inputs to an analysis tool are generated. For example, the verification module 111 generates tests and scripts for running the tests and/or inputs to analysis tools. From block 510, control continues at block 512. In generating the tests, the verification module 111 may use pre-existing tests that are known to verify the functionality of a particular unit, or it may generate customized tests based on an analysis of the unit configuration. These tests will exercise and verify the functionality of the configured unit being tested. According to an embodiment of the invention, the verification module 111 generates tests that are capable of verifying any configuration of the unit. In this embodiment, the tests read the configuration data 113 and modify their stimulus accordingly while the test is running, rather than before testing begins. Accordingly, the sequences of test inputs for the one or more units dynamically instantiated at design creation time may be dynamically built at run time. The test generation may occur with run-time adaptation. Components and test sequences may be generated at run time rather than at compile time. Thus, one program file determines what type and number of units are being tested, and then another program generates the test sequences appropriate for those units. The verification module 111 can also generate scripts for automatically performing the tests.
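Blocks 306 through 310 describe substituting specified and derived parameters into library HDL via preprocessing. The sketch below uses Python string templating as a stand-in (the patent names C, M4, or SIMPLE preprocessors, not Python); the Verilog skeleton and parameter names are invented:

```python
from string import Template

# Illustrative stand-in for blocks 306-310: the library holds HDL text with
# embedded parameter placeholders; preprocessing substitutes the specified
# and derived parameters and emits the code for one unique unit instance.
FIFO_TEMPLATE = Template("""\
module ${instance_name} #(
    parameter DEPTH = ${depth},
    parameter WIDTH = ${width}
) (
    input  wire             clk,
    input  wire [WIDTH-1:0] din,
    output wire [WIDTH-1:0] dout
);
    // ... body elided ...
endmodule
""")

def emit_unit(instance_name, params):
    """Emit the HDL text for a uniquely named, uniquely configured instance."""
    return FIFO_TEMPLATE.substitute(instance_name=instance_name, **params)

hdl = emit_unit("fifo_shallow", {"depth": 10, "width": 8})
assert "module fifo_shallow" in hdl and "DEPTH = 10" in hdl
```

Because the same template text serves every instance, the emitted code for a unit is, as the text above puts it, exactly the code that is used at the system level.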

As noted above, the verification module 111 may use pre-existing tests or it may generate customized tests based on an analysis of the unit configuration. Thus, during verification the verification module 111 can use an existing collection of test components, including the unit and its associated models, while replacing a previously generated set of test input sequences with an entirely new set of test input sequences. The verification module 111 may reuse old test components with new sequences of inputs other than the original series of test input sequences to the circuit formed from those test components.
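The reuse described above, keeping the test components while swapping in a new sequence of test inputs, can be sketched as follows; all names are invented for illustration:

```python
# Toy sketch: the test components (the unit under test plus its models) are
# kept, while the sequence of test inputs is replaced with a new one.
class FifoUnderTest:
    def __init__(self, depth):
        self.depth, self.data = depth, []
    def push(self, word):
        if len(self.data) < self.depth:
            self.data.append(word)
            return True
        return False  # overflow is rejected

def run_sequence(unit, inputs):
    """Drive one sequence of test inputs through the existing testbench."""
    accepted = [unit.push(w) for w in inputs]
    unit.data.clear()  # reset between sequences; the unit itself is reused
    return accepted

dut = FifoUnderTest(depth=2)
old_sequence = [1, 2, 3]   # previously generated inputs
new_sequence = [7, 8]      # entirely new inputs, same test components
assert run_sequence(dut, old_sequence) == [True, True, False]
assert run_sequence(dut, new_sequence) == [True, True]
```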

Tests can be customized or replaced after testbenches are generated. In other words, the tests may be manipulated independently after the testbench is constructed.

As an additional or alternative form of testing, the verification module 111 provides the design to an analysis tool, which performs a static analysis of the design. For example, according to one embodiment of the invention, the verification module 111 provides the unit design represented by HDL code to a static verification tool that analyzes the HDL code for errors. In one embodiment, the static verification tool generates warnings and/or error messages based on its analysis of the HDL code.

Thus, the generation and verification unit 103 may automatically decompose the configurable hardware system into the set of one or more units. The verification module 111 may create inputs into the static verification analysis tool that do not require executing a stimulus through the set of one or more units to verify the results of the static test. The static verification analysis may be performed as part of functional logic verification in a phase of electronic system and circuit design, which includes System on Chip design, to ensure that a configured logic design correctly implements the product specification for that logic design. The configurable hardware system may be customized at design creation time by using specified values for a set of parameters. Each instance of the configurable hardware system may be different in function than another instance of the configurable hardware system.

… memories, and/or machine readable media whereon instructions are stored for performing operations according to embodiments of the invention. Machine-readable media includes any mechanism that provides (i.e., stores and/or transmits) information in a form readable by a machine (e.g., a computer). For example, a machine-readable medium includes read only memory (ROM); random access memory (RAM); magnetic disk storage media; optical storage media; flash memory devices; electrical, optical, acoustical or other forms of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.); etc. In one embodiment, the units shown in FIG. 1 are machine readable media executing on a processor to carry out the operations described herein. However, in alternative embodiments, the units of FIG. 1 are other types of logic (e.g., digital logic) for executing the operations described herein. Alternatively, according to one embodiment of the invention, the generation and verification unit 103, configurable hardware library 101, and configuration data storage unit 105 can include one or more separate computer systems. It should also be understood that, according to embodiments of the invention, the components illustrated in FIG. 1 could be distributed over a number of networked computers, wherein they could be remotely controlled and operated.

FIG. 7 illustrates an exemplary system for decomposing and verifying configurable hardware, according to embodiments of the invention. As illustrated in FIG. 7, computer system 700 comprises processor(s) 702. Computer system 700 also includes a memory 732, processor bus 710, and input/output controller hub (ICH) 740. The processor(s) 702, memory 732, and ICH 740 are coupled to the processor bus 710. The processor(s) 702 may comprise any suitable processor architecture. For other embodiments of the invention, computer system 700 may comprise one, two, three, or more processors, any of which may execute a set of instructions that are in accordance with embodiments of the present invention.

The memory 732 stores data and/or instructions, and may comprise any suitable memory, such as a dynamic random access memory (DRAM), for example. In one embodiment of the invention, the configuration hardware library 101, generation and verification unit 103, and configuration data storage unit 105 are stored in memory 732. However, they may be stored in any or all of IDE drive(s) 742, memory 732,

An embodiment of an IP analysis tool can run other analysis tools such as code coverage (particularly merging results from bottom up), lint, formal verification, power analysis, gate-level simulation (as opposed to RTL simulation). The static verification analysis tool may be the lint test

that performs rule checking, the formal verification test that performs a mathematical proof, or similar tool that does not require an executing of a stimulus, such as test input sequences, through the hardware model to verifY the results

45 and/or other suitable storage devices. A graphics controller 734 controls the display of information on a display device 737, according to embodiments of the invention.

of the static test. As shown in block 512, the scripts are executed. For

example, the verification module 111 executes the scripts, which automatically test and verify the selected unit.

It should be apparent to one of ordinary skill in the art that the operations described in the flow diagram of FIG. 5 can be repeated to verify any unit/system at any level in a design hierarchy.

Referring to FIG. 1, the generation and verification unit 103, configurable hardware library 101, and configuration data storage unit 105 may be implemented in the form of a conventional computing platform, including one or more processors, application specific integrated circuits (ASICs ),
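The stimulus-free static checking described above (lint-style rule checking that needs no test stimulus driven through the hardware model) can be sketched in Python. The rule names, the `static_verify` helper, and the Verilog fragment below are illustrative assumptions, not anything defined by the patent:

```python
import re

# Hypothetical sketch of stimulus-free static checking of generated HDL.
# Each "rule" inspects the HDL text and returns a list of findings.

def rule_no_tabs(hdl_text):
    """Lint rule: flag tab characters, which some downstream tools mishandle."""
    return [f"line {i}: tab character"
            for i, line in enumerate(hdl_text.splitlines(), 1) if "\t" in line]

def rule_balanced_blocks(hdl_text):
    """Lint rule: every 'begin' keyword should have a matching 'end'."""
    tokens = re.findall(r"\b(begin|end)\b", hdl_text)  # 'endmodule' is excluded by \b
    begins, ends = tokens.count("begin"), tokens.count("end")
    return [] if begins == ends else [f"unbalanced begin/end ({begins} vs {ends})"]

def static_verify(hdl_text, rules):
    """Run every rule over the HDL text; no stimulus is ever executed."""
    warnings = []
    for rule in rules:
        warnings.extend(rule(hdl_text))
    return warnings

unit = """module counter(input clk, output reg [3:0] q);
  always @(posedge clk) begin
    q <= q + 1;
  end
endmodule
"""
print(static_verify(unit, [rule_no_tabs, rule_balanced_blocks]))  # → []
```

A formal-verification tool would slot into the same `rules` list: the interface is simply "text in, findings out," with no test input sequences executed through the model.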

US 7,299,155 B2

The input/output controller hub (ICH) 740 provides an interface to I/O devices or peripheral components for computer system 700. The ICH 740 may comprise any suitable interface controllers to provide for any suitable communication link to the processor(s) 702, memory 732, and/or to any suitable device or component in communication with the ICH 740. For one embodiment of the invention, the ICH 740 provides suitable arbitration and buffering for each interface.

For one embodiment of the invention, the ICH 740 provides an interface to one or more suitable integrated drive electronics (IDE) drives 742, such as a hard disk drive (HDD) or compact disc read only memory (CD ROM) drive, for example, to store data and/or instructions, and, for example, to one or more suitable universal serial bus (USB) devices through one or more USB ports 744. For one embodiment of the invention, the ICH 740 also provides an interface to a keyboard 751, a mouse 752, a floppy disk drive 755, one or more suitable devices through one or more parallel ports 753 (e.g., a printer), and one or more suitable devices through one or more serial ports 754. For one embodiment of the invention, the ICH 740 also provides a network interface 756 through which the computer system 700 can communicate with other computers and/or devices.
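The decompose-then-verify flow that the specification describes (FIG. 5's repeatable per-level verification, proceeding from the lowest levels up and reusing stored test inputs for previously instantiated hardware systems) can be sketched in Python, one of the general-purpose languages the specification itself names. The `Unit` class, the test-input format, and the testbench callback here are hypothetical illustrations:

```python
# Hypothetical sketch: decompose a configurable system into units and
# verify every hierarchy level bottom-up, reusing stored test inputs
# for previously instantiated sub-systems.

class Unit:
    def __init__(self, name, children=(), test_inputs=None):
        self.name = name
        self.children = list(children)        # lower-level units
        self.test_inputs = test_inputs or []  # reusable stimulus sequences

def decompose(unit):
    """Return units grouped by hierarchy level, deepest level first."""
    levels = []
    def walk(u, depth):
        while len(levels) <= depth:
            levels.append([])
        levels[depth].append(u)
        for c in u.children:
            walk(c, depth + 1)
    walk(unit, 0)
    return list(reversed(levels))             # bottom-up order

def verify_bottom_up(system, run_testbench):
    """Verify each unit of each level, lower levels before the first level."""
    results = {}
    for level in decompose(system):
        for u in level:
            results[u.name] = run_testbench(u, u.test_inputs)
    return results

# Previously instantiated sub-systems carry their own reusable stimulus.
fifo = Unit("fifo", test_inputs=[[1, 2, 3]])
arbiter = Unit("arbiter", test_inputs=[[0, 1]])
soc = Unit("soc", children=[fifo, arbiter])

print([u.name for lvl in decompose(soc) for u in lvl])  # → ['fifo', 'arbiter', 'soc']
```

Running `verify_bottom_up` visits "fifo" and "arbiter" before "soc", mirroring verification "successively from the lower levels to the first level," with each sub-system's stored test inputs reused when it is tested inside the composed system.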

Accordingly, computer system 700 includes a machine-readable medium on which is stored a set of instructions (i.e., software) embodying any one, or all, of the methodologies described herein. For example, software can reside, completely or at least partially, within memory 732 and/or within processor(s) 702.

As discussed, the software used to facilitate aspects of an electronic circuit and system design process can be embodied onto a machine-readable medium. A machine-readable medium includes any mechanism that provides (e.g., stores and/or transmits) information in a form readable by a machine (e.g., a computer). The information representing the apparatuses and/or methods stored on the machine-readable medium may be used in the process of creating the apparatuses and/or methods described herein. For example, the information representing the apparatuses and/or methods may be contained in an Instance, soft instructions in an IP generator, soft instructions in a testbench tool, or a similar machine-readable medium storing this information.

The IP generator may be used for making highly configurable hardware systems. In an embodiment, an example intellectual property generator may comprise the following: a graphic user interface; a common set of processing elements; and a library of files containing design elements such as circuits, control logic, and cell arrays that define the intellectual property generator. In an embodiment, a designer chooses the specifics of the configurable hardware system to produce a set of files defining the requested configurable hardware system instance. The configurable hardware system instance may include front-end views and back-end files.

Front-end processing consists of the design and architecture stages, which include design of the SOC schematic. The front-end views support documentation, simulation, debugging, and testing. The front-end processing may include connecting models, configuration of the design, and simulating and tuning during the architectural exploration. The design is simulated and tested. Front-end processing traditionally includes simulation of the circuits within the SOC and verification that they work correctly. The integration of the electronic circuit design may include packing the cores, verifying the cores, simulation, and debugging.

Back-end processing traditionally includes programming of the physical layout of the SOC, such as placing and routing, or floor planning, of the circuit elements on the chip layout, as well as the routing of all interconnects between components. The back-end files, such as a layout, physical LEF, etc., are for layout and fabrication. Thus, the floor plan may be generated, imported, and edited. After this, the design may be outputted into a Netlist of one or more hardware design languages (HDL) such as Verilog, VHDL, or Spice. A Netlist describes the connectivity of an electronic design, such as the components included in the design, the attributes of each component, and the interconnectivity amongst the components. After the Netlist is generated, synthesizing may occur. Accordingly, back-end processing further comprises the physical verification of the layout, to verify that it is physically manufacturable and that the resulting SOC will not have any function-preventing physical defects. If there are defects, the placement of circuit elements and interconnect routing is revisited, which requires one or more revisions to the Netlist. Such a process can lead to increased design time, since the physical placement of the components happens much later in the design stages.

While the invention has been described in terms of several embodiments, those skilled in the art will recognize that the invention is not limited to the embodiments described. For example, units in a configurable hardware system may be expressed in a general programming language instead of a hardware description language. The configurable hardware code may be represented in the general-purpose programming language. Such configurable hardware code emits a hardware description when executed. Scripts written in the general-purpose programming language may be used to generate the code to create units in the configurable hardware system and to create tests to functionally verify those units. C++, Java, and Python, natively or derivations thereof, may all be examples of general programming languages. Similarly, SystemC may be used for hardware modeling and testing rather than a hardware description language. The functions of one module may be combined with another module or spread out into two or more discrete modules. The method and apparatus of the invention can be practiced with modification and alteration within the spirit and scope of the appended claims. The description is thus to be regarded as illustrative instead of limiting on the invention.

We claim:

1. A method, comprising:
automatically decomposing a set of one or more units at a first level of a configurable hardware system design hierarchy into a set of two or more units of a lower level of the hardware system design hierarchy, wherein the set of one or more units at a first level includes one or more units dynamically instantiated at design creation time as well as at least a first unit composed of a previously instantiated hardware system composed with two or more levels of units within the hardware system design hierarchy of the previously instantiated hardware system; and
individually verifying units of each hierarchy level of the hardware system design hierarchy successively from the lower levels to the first level with testbenches, wherein the sequences of test inputs for the previously instantiated hardware system are reused when testing that previously instantiated hardware system in the configurable hardware system.

2. The method of claim 1, further comprising:
using a general purpose programming language to create units in the configurable hardware system and to create tests to functionally verify those units.

3. The method of claim 1, wherein a first unit in a configurable hardware system is expressed in a general programming language instead of a hardware description language.

4. The method of claim 1, further comprising:
dynamically building, at run time, sequences of test inputs for the one or more units dynamically instantiated at design creation time.

5. An apparatus generated by the method of claim 1.

6. A machine readable medium that contains instructions, which when executed by the machine cause the machine to perform the operations of claim 1.

7. The method of claim 1, further comprising:
using a SystemC programming language to create units in the configurable hardware system and to create tests to functionally verify those units.

8. The method of claim 1, further comprising:
using a Hardware Description Language to create units in the configurable hardware system and to create tests to functionally verify those units.

9. An apparatus, comprising:
a generation and verification unit to automatically generate, decompose, and verify a configurable hardware system; wherein a first instance of the configurable hardware system is different in function than a second instance of the configurable hardware system, and the generation and verification unit to create code for a unique instance of a unit from configurable parameters based on configuration data provided at design creation time.

10. The apparatus of claim 9, wherein the generation and verification unit creates code for the unique instance of the unit from configurable parameters based on configuration data using a verification module that determines:
whether the configuration data conforms to rules for syntax and semantics;
what a set of derived parameters should be for that unit; and
processes configured code in a programming language to emit a hardware description of the unique instance of the unit.

11. The apparatus of claim 9, wherein the generation and verification unit creates code for the unique instance of the unit from configurable parameters based on configuration data using the verification module that determines:
whether the configuration data conforms to rules for syntax and semantics;
what a set of derived parameters should be for that unit; and
configures code in a programming language with preprocessing statements based on the set of derived parameters.

12. The apparatus of claim 9, wherein the generation and verification unit has a generation module to determine at design creation time a topology of units in a first instance of the configurable hardware system, and the topology of units in the first instance of the configurable hardware system is different from the topology of units in a second instance of the configurable hardware system.

13. The apparatus of claim 9, wherein the generation and verification unit includes
a hardware library to store a previously instantiated collection of test components including a first unit and its associated models; and
a verification module to replace a first set of test input sequences with a second set of test input sequences to execute on the test components.

14. The apparatus of claim 9, wherein the generation and verification unit has a decomposition module to determine at run time what type and number of units are being tested to allow test sequences appropriate for those units to be generated.

15. The apparatus of claim 9, wherein the configurable hardware system is for a System on a Chip.

16. The apparatus of claim 9, wherein the generation and verification unit has a verification module to obtain configuration data for a first unit to verify if a specific set of parameters causes a legal result or a rule violation for the specific set of parameters prior to the first unit being part of a composed system.

17. The apparatus of claim 9, wherein the generation and verification unit has a generation module to receive configuration data for a first unit supplied by a user through a text file, supplied by a user through a graphic user interface, or supplied through a random configuration data generator.

18. The apparatus of claim 9, further comprising
a library to store code in a programming language for an instance of a unit, wherein the generation and verification unit to obtain the code for a first instance of the unit from the library.

19. A method, comprising:
automatically decomposing a configurable hardware system into a set of one or more units;
creating inputs into a static verification analysis tool that does not require an executing of a stimulus through the set of one or more units to verify the results of the static test; and
performing the static verification analysis as part of functional logic verification in a phase of electronic system and circuit design to ensure that a configured logic design correctly implements a product specification for that logic design, wherein the configurable hardware system is customized at design creation time by using specified values for a set of parameters and a first instance of the configurable hardware system is different in function than a second instance of the configurable hardware system.

20. The method of claim 19, wherein the electronic system and circuit design is for a System on a Chip.

21. An apparatus generated by the method of claim 19.

22. A machine readable medium that contains instructions, which when executed by the machine cause the machine to perform the operations of claim 19.

* * * * *