introduction of design-oriented profiler of specc language · existing profiling tools are...

Introduction of design-oriented profiler ofSpecC language

Technical Report ICS– 00-47June 2001

Lukai Cai, Dan GajskiInformation and Computer Science

University of California, IrvineIrvine, CA 92697

(949)824-8059{lcai, gajski}ics.uci.edu

II

Index

1 Introduction ................................................................................................................................................12 SpecC profiler .............................................................................................................................................23 Input and output models of SpecC profiler ...................................................................................................44 Behavior profiler .........................................................................................................................................4

4.1 Design flow of the behavior profiler .....................................................................................................44.2 Behavior statistics description ..............................................................................................................5

4.2.1 Total execution number of behavior ...............................................................................................54.2.2 Average execution number of operations. .......................................................................................64.2.3 Traffic...........................................................................................................................................64.2.4 Storage..........................................................................................................................................7

4.3 Behavior dependency analysis ..............................................................................................................84.3.1 Calling dependency analysis. .........................................................................................................84.3.2 Data dependency analysis. .............................................................................................................9

4.4 Two algorithms in behavior profiler ......................................................................................................94.4.2 Algorithm of analyzing port access ..............................................................................................10

5 Retargetable profiler .................................................................................................................................125.1 Design flow of the retargetable profiler ...............................................................................................125.2 Weight table generation ......................................................................................................................135.3 Output statistics of Retargetable profiler .............................................................................................13

5.3.1 Design Performance ....................................................................................................................135.3.2 Traffic.........................................................................................................................................145.3.3 Memories ....................................................................................................................................15

6 A design methodology of Using SpecC profiler ..........................................................................................166.1 Design assumption..............................................................................................................................166.2 Specification analysis .........................................................................................................................166.3 Critical path analysis ..........................................................................................................................176.4 PE selection .......................................................................................................................................176.5 Behavior partitioning..........................................................................................................................186.6 Behavior scheduling ...........................................................................................................................19

7 Conclusion ................................................................................................................................................20Reference:........................................................................................................................................................20

III

List of FiguresFigure 1: Design flow of SpecC methodology of refining from specification model to architecture model……….2Figure 2: Display of behavior statistics…………………………………………………………………………………4Figure 3: Design flow of behavior profiler ..........................................................................................................5Figure 4: Example of behavior instance nodes .....................................................................................................5Figure 5: Example of specification model ...........................................................................................................7Figure 6: Example of use of behavior dependency ...............................................................................................8Figure 7: Example for algorithm of port access. ................................................................................................10Figure 8: Design flow of analyzing port access..................................................................................................11Figure 9: Design flow of retargetable profiler ...................................................................................................13Figure 10: Design flow of the simple methodology for architecture exploration .................................................16Figure 11: Specification display........................................................................................................................17

1

Abstract

To design from higher level of abstraction andtomake architecture exploration decision at early stage,designer should know the characteristics ofspecification on the higher level of abstraction.Designers should also have a way to evaluate thesystem in early stage, to ensure that the system meetsthe constraint requirement. In this report, SpecCprofiler, a design-oriented profiler, is introduced tocomplete above two tasks, by evaluating thespecification model of SpecC language.

Index Terms-- behavior profiler, retargetable profiler,specification model, design-oriented, SpecC

1 IntroductionWith the requirement of time to market and theincrease of complexity of design, the design industryhas tried to make the design decisions in earlier stageof design flow. Using SpecC methodology [1][2], thedesign process will be smooth and efficient.

However, in the past, our design experiences onJPEG[5][6], GSM vocoder[7], and JBIG[8] systemshow the difficulty to get the satisfied profiling result,from existing profiling tools, for the specificationmodel of SpecC language. The reason is that theexisting profiling tools are code-oriented: thepurposes of existing profiling tools are to find thebottleneck of the executed algorithm, thus to optimizespecification of algorithms. The characteristics ofthese profiling tools are:

a) They are machine-limited: For example, DSP56600 instruction set simulator [9] only providescycle timing result of instruction for DSP 56000processor. Similarly, Hierarchical profilers ofCodewarrior[10] only analyzes performance forIntel Pentium/484/AMD K6/AMD K7 processors.

b) They can only analyze performance of the design.Traffic and needed size of memory of systemcomponents are not concerned.

c) They only consider sequential execution offunctions, without considering the parallelexecution.

The code_oriented profilers work well in case ofevaluating the specification executed on the processorthat the profiling tools support. However, becauseSoC design incorporates at least a programmableprocessor, on chip memory, and accelerating functionunits implemented in hardware [11], the need of SoCdesign is more complex than that code-orientedprofilers can generate. For example, functions ofspecification can be implemented in any selected PEs.If the system components (PEs: processing elements)are not Intel Pentium/484/AMD K6/AMD K7processors, Codewarrior can not be used.Furthermore, The traffic between different PEs alsoshould be evaluated. The traffic does not onlyinfluence the performance of system, but alsoinfluence the protocol selection. Finally, since PEscan be executed in parallel in SoC, the parallelexecution between different functions should beconsidered.

Since the existing profilers cannot help us toimplement SoC design, in the JBIG project, a manualapproach of profiling were implemented. It took 2months one person to generate acceptable profilingresult.

To fasten the design process, SpecC profiler, whichis a design-oriented profiler, is developed. Comparedwith the code-oriented profiler, the design-orientedprofiler provides needed profiling result for SoCdesign. Unlike code-oriented profiler, the purpose ofdesign-oriented profiler is to help designers toevaluate specification and design system thus to findgood architecture exploration solution. For example, itcan tell designers that which function should beimplemented in faster PE to improve the performanceof system. Also, it can identify that two functionsshould be implemented in same PE, to reduce theperformance overhead for communication. Thus, withthe help of SpecC profiler, designers can makearchitecture exploration decision more easily. The characteristics of SpecC profilers are.

a) SpecC profiler is retargetable profiler: it canevaluate the characteristics of behavior that isexecuting on any selected PEs. Moreover, it canprovide profiling result for the case that different

Introduction of design-oriented profiler of SpecClanguage

2

functions of specification are executed ondifferent PEs.

b) SpecC profiler not only evaluates performance,but also evaluate the traffic, needed memory size,for each function.

c) SpecC profiler evaluates parallel execution amongfunctions as well as sequential execution.

SpecC profiler belongs to a set of tools refining thespecification model into architecture model, of SpecCmethodology [2].

SpecC profiler works on the specification model ofSpecC language. The profiling results are based on thenumber of operations of specification, therefore it isnot real cycle-accurate. However, it is good enough togive designers a first look of system of design.

This report describes SpecC profiler. In section 2,the overview of SpecC profiler as well as the designflow of refining from specification model intoarchitecture model, are illustrated. The input andoutput models of the profiler are introduced in section3. Two main parts of SpecC profiler, behaviorprofiler, and retargetable profiler are described insection 4 and 5 respectively. In section 6, a simplemethodology and a example are given to teachdesigners how to use SpecC profiler. Finally, aconclusion is made in section 7.

2 SpecC profilerSpecC profiler works on specification model of

SpecC language [1]. The specification model is themodel with the highest level of abstraction. It is anaccurate model of the system in terms of purefunctionality but does not reflect its structure ortiming.

SpecC Profiler consists of two parts: behaviorprofiler and retargetable profiler. In Figure1, thesetwo parts are illustrated in the environment of SpecCmethodology, for refining the specification model intothe architecture model .

At the initial stage of design, as soon as originalspecification model is semantically and syntacticallycorrect and an executable testbench is selected,behavior profiler can be performed.

At the beginning, the first part of SpecC profiler,behavior profiler, inserts statements into the originalspecification model, to collect execution number ofbasic blocks of specification, at simulation-time. Aftersimulating with selected testbenches, behaviorprofiler analyzes the specification, based on theexecution numbers of basic blocks that are generatedduring simulation. Behavior profiler creates tworesults: behavior statistics, and behavior dependency

Figure 3: Design flow of SpecC methodology of refining from specification model toarchitecture model

G U I

O rg .S p e c if ic a tio n

m o d e l

A rc h ite c tu rem o d e l

P Elib ra ry

A rc h ite c tu rere f in in g to o ls

B e h a v io rp r o file r

R e ta r g e ta b lep r o file r

D e s ig n e r:

A rc h ite c tu re re f in in g to o ls

S p e c C p ro f ile r :

D e s ig nc o n s tra in t

E v a lu a tio n -a n n o ta te d

sp e c if ic a tio nm o d e l

P ro f il in g -a n n o ta te d

sp e c if ic a tio nm o d e l

D e c is io n -a n n o ta te d

sp e c i fic a tio nm o d e l

T e s tb e n ch e s

D e s ig n e r

3

(In SpecC language, Behavior is a class thatencapsulates related functions and connects to otherbehaviors by its ports). Behavior statistics consists ofthe statistics of behaviors, including execution numberof behaviors, average execution number of operationsper behavior execution, the size of needed memory ofbehavior, and the average traffic of behavior, perbehavior execution. Behavior dependency describesthe executing relation between behaviors, includingcalling/called relations and sequence/parallelexecution relations. Behavior statistics and behaviordependency are not related to system architecture,therefore, it is implementation-independent.

Behavior profiler annotates behavior statistics andbehavior dependency into original specificationmodel. Profiling-annotated specification model isproduced and sent to GUI. GUI is a graphical userinterface that helps designers to explicitly andautomatically read and write different specificationmodels [4].

With the help of behavior statistics and behaviordependency, designers should make the architectureexploration decision, based on their designexperience. Architecture exploration consists offollowing three items:

a) Allocation: Needed PEs are selected from PElibrary, for assembling the system.

b) Partitioning: Behaviors of specifiction are mappedto selected PEs.

c) Scheduling: Whether the execution relationsamong behaviors are parallel or sequential aredecided.

The architecture exploration decision will beannotated into profiled-annotated specification modelby GUI, thus decision-annotated specification modelis created.

With decision-annotated specification model asinput, the second part of the SpecC profiler,retargetable profiler, is used to re-profilespecification. In this stage, the re-profiling results areimplementation-dependent, based on designers’architecture exploration decision. The re-profilingresult includes performance of behaviors, size of

memory of each behavior, and the amount of trafficsand the traffic time between different behaviors. Theresult of retargetable-profiler will be annotated intodecision-annotated specification model. The newmodel, evaluation-annotation specification model, iscreated and displayed to designers by GUI.

After re-profiling, designers will evaluate whetherthe current design will meet the constraintrequirement, based on evaluation-annotationspecification. If the current implementation cannotmeet the requirement, a new architecture explorationdecision will be made and again annotated intoprofiled-annotated specification model. The processof designers’ decision making and re-profiling iscontinued until the constraint requirement of design issatisfied by system, as shown in the shaded part ofFigure1. As soon as the final architecture explorationdecision is made, evaluation-annotation specificationmodel will be sent to the architecture-refining tool.The architecture-refining tool refines the specificationmodel into the corresponding architecture modelautomatically.

In general, designers control the design process andmake the architecture exploration decision, based ontheir design experience. To help beginners to makecorrect architecture exploration decision, a simpledesign methodology of how to make architectureexploration decision with profiling result is describedin Section 6 in great detail.

Besides the characteristics of SpecC profilersdescribed in Section 1, SpecC profiler has two moreadvantages. First, it analyzes the behaviordependency, which provides designers more flexibilityof scheduling. This part will be illustrated in Section4. Second, since the behavior profiler and theretargetable profiler are separated, design simulationis only executed once during design process, bybehavior profiler. The retargetable profiler can beexecuted as many times as needed, based on the resultof behavior profiler, without specification simulation.Since the specification simulation and execution ofbehavior profiler are slow, while process of re-profiling is fast, this advantage makes more systemalternative can be explored during the architectureexploration, than the traditional methodology.

4

3 Input and output models ofSpecC profiler

As shown in Figure1, there are four specificationmodels: original specification model, profiling-annotated specification model, decision-annotatedspecification model, and evaluation-annotatedspecification model. All of the specification modelsare described in the format of the .SIR file. The .SIRfile is the internal representation of SpecC model. Itcan be transformed to and from .SC file by SpecCcompiler [3].

Among these models, original specification modeland decision-annotated specification model are theinput models of SpecC profiler. Profiling-annotatedmodel and evaluation-annotated specification modelare the output models of SpecC profiler. The onlydifferences among these models are annotations:Original specification model has no annotation;profiling-annotated specification model hasinformation of behavior dependency and behaviorstatistic; decision-annotated specification modelcontains architecture exploration decision; evaluation-

annotated specification model contains re-profilingstatistics.

As shown in Figure1, GUI should be developed tohelp designers to read and write specification modelsautomatically and explicitly. To make SpecC profileran independent tool, we also provide a way fordesigner to read and write model by simply readingand writing texture files. This part of work isdescribed in the profiler manual.

4 Behavior profiler

4.1 Design flow of the behavior profiler

Figure2 illustrates the design flow of the behaviorprofiler.

First of all, task “instrument for profiling”decomposes the original specification model intocombinatorial and basic blocks, and inserts statementsfor counting the execution number of blocks intooriginal specification model. After statementinsertion, instrumented specification model, which isan internal model, is developed, as shown in Figure 2.

Execution number ofbehavior instance nodes

(N_N)

Execution number ofoperations per

behavior execution(Op)

Communication trafficper behaviorexecution (T)

Storage for eachbehavior (S)

Total portwidth (T_S)

Total portaccess

(T_R, T_W)

Static storage per data type(S_Static_DataType)

Sub-Sub storage per datatype per behavior

(S_Sub_DataType_Beh)

Execution number ofoperations

per operation type(Op_OpType)

Execution number ofoperations per operation

type per data type(OP_OpType_DataType)

Execution number ofbehavior (N_B)

Execution number ofbehavior

instance(N_I)

Port width perdata type

(T_S_DataType)

Port access perdata type

(T_R_DataType,T_W_DataType)

Port width of port(T_S_Porti,j)

Port access ofport(T_R_Porti,j,,T_W_Porti,j )

Staticstorage (S_Static)

Sub-Sub storage persub_behavior (S_Sub_Beh)

Heap per data typestorage(S_heap_DataType)

Heap storage(S_heap)

Stock storage per data typeper called function

(S_Stack_DataType_Fun)

Stockstorage (S_Stack)

Stock storageper called function

(S_Stack_Fun)

Sub-Sub storage(S_Sub)

Figure 4: Display of behavior statistics

5

Org. specificationmodle

Instrument forprofiling

Instrumented–specification model

simulator

Instrumentedresult

Behaviors statisticsannotation

Profiling-annotatedspecification model

Behavior statistics analysis

Behaviorstatistics

Behaviordependency

Behavior dependency analysis

Testbenches

Figure 5: Design flow of behavior profiler

The instrumented specification model is thencompiled to a executable file by SpecC compiler.After the executable file is simulated with selectedtestbench, two output files are created: _BB_Counterfile contains the number of execution of basic blockswhile _Heap_Counter file contains the heap size offunctions. Based on _BB_Counter file and_Heap_Counter file, the task “behavior statisticsanalysis” and the task “behavior dependencyanalysis” analyze original specification model andproduce behavior statistics and behavior dependency.Finally, the task “behaviors statistics annotation”annotates the behavior statistics and behaviordependency into original specification model. Thisprocess will produce profiling-annotated specificationmodel.

4.2 Behavior statistics description

The behavior statistics represents the statistics ofeach behavior based on the simulation with theselected testbenches. The behavior statistics consistsof four types: execution numbers of behavior, averageexecution numbers of operations per behaviorexecution, average traffics per behavior execution,and needed behavior memory. These statistics arelisted in Figure 3 hierarchically.

4.2.1 Total execution number of behaviorThere are several execution numbers of behavior

entities should be counted, based on the need ofsystem evaluation.

First of all, the total execution number of behaviorsshould be counted. Furthermore, since each behaviorcan have a number of behavior instances and thebehavior instances can be mapped into different PEsduring architecture exploration, behavior profiler alsoprovides the execution number of behavior instances.During research, we found that even the executionnumber of behavior instances cannot provide enoughinformation for system evaluation. For example, inFigure 4(a), there are three behaviors: X, A, B.Behavior X contains behavior instances A1 and A2,which are both the instantiation of behavior A.Similarly, behavior A contains behaviors instances B1and B2, which are both the instantiation of behaviorB. We use a hierarchical calling tree, called behaviorcalling tree, to displayed this relation, as Figure 4 (b).The tree contains seven nodes: X, A1(X), A2(X),B1(X_A1), B2(X_A1), B1(X_A2), B2(X_A2). Forexample, B1(X_A1) represent the behavior instanceB1 of behavior instance A1 of behavior instance X.We call the node in behavior calling tree behaviorinstance node.

(b)(a)

A XA1

A2

B1

B2

X

A1 (X)

A2 (X)

B1 (X_A1)

B2 (X_A1)

B1 (X_A2)

B2 (X_A2)

PE0

PE1

PE3

PE4

(c)

Figure 6: Example of behavior instance nodes

In Figure4(c), four leaf behavior instance nodes aremapped to different PEs. Therefore, if we want toanalyze the system performance based on thispartitioning, the execution number of behaviorinstance nodes is needed.

Three types of execution numbers of behaviorentities are calculated as follows:

a) N_B: Total execution number of behavior:N_Bi refers to the execution number of behavior i.

N_Bi equals to the execution number of combinatorialblock that represents behavior i’s main function.

6

b) N_Ii,j: Average execution number of behaviorinstance:

If behavior instance j is declared in behavior i, itrefers to the average execution number of behaviorinstance j per execution of behavior i.

If N_Bi is not zero, assume that the basic blockwhich contains behavior instance j has executed Xtimes,

N_Ii,j = X / N_Bi ;

Otherwise N_Ii,j = 0;

c) N_Ni: Total execution number of behaviorinstance node:

N_Ni refers to the total execution number ofbehavior instance node i.

N_Ni can be calculated as: i. The top behavior instance node is Main

behavior of specification. N_N Main = 1; ii. If behavior instance node A ‘s parent is

behavior instance node B,

N_NA = N_N B * N_IB,A.

4.2.2 Average execution number of operations.Average execution number of operations per

behavior execution is another essential behaviorstatistic, which will be used for evaluatingperformance of design. In SpecC language, there are56 operation types and 29 data types. Therefore, theexecution number of operations can be calculatedhierarchically in three levels. If we use OpType torepresent a chosen operation type and use DataType torepresent a chosen data type, these three levels can bedefined as:

a) OP_OpType_DataType_i : it represents theaverage execution number of operations ofoperation type OpType for data type DataType,per behavior i's execution. Since the number ofoperations in each basic block per blockexecution is fixed, this number can be derivefrom original specification model. If we writethis number as BB_OpType_DataTypeK,, foroperation type OpType and data type DataType,in basic block k,

OP_OpType_DataType_i= ΣK(BB_OpType_DataTypek * (Execution number of Basic block K))

b) OP_OpType_i: it represents the averageexecution number of operations of operation typeOpType for all the data type, per behavior i'sexecution.

OP_OpType_i = ΣDataTypeOP_OpType_DataType_i.

c) Op_i: it represents the average execution numberof operations of all operation types for all datatypes, per behavior i's execution.

Op_i = ΣOpType OP_OpType_i.

About execution number of operations arecalculated without considering function calls. If the abehavior has behavior instances or function calls, theexecution numbers of operations of its behaviorinstance or called function should be added to theexecution number of operations of the behavior. Thealgorithm for this case is explained in 4.4.1.

4.2.3 TrafficThe traffic (T) represents the communication

throughput of behavior. SpecC profiler provides twotypes of traffic as follows:

4.2.3.1 Port width

The port width refers to the bit number of behaviorports. It consists of three levels:

a) T_S_Porti,j represents the bit number of port jof behavior i.

b) T_S_DataTypei: represents the bit number ofall the ports which have data type DataType,of behavior i.

c) T_Si represents the bit number of all the portsof behavior i.

T_Si = Σ DataType T_S_DataTypei = Σj T_S_Porti, j

Port width can be directly analyzed from originalspecification model.

4.2.3.2 Port Access.

Port access refers to the average access number ofeach behavior's port per behavior execution. The portaccess includes read access and write access. . Forexample, if x is behavior's port, executing "x = x +x"once creates two read access and one write access.Similar to port width, port access consists of threelevels:

7

a) T_R_Porti,j represents the average number ofread access for port j of behavior i.T_W_Porti,j represents the average number of

write access for port j of behavior i.b) T_R_DataTypei represents the read access of

all the ports that have data type DataType, ofbehavior i.T_W_DataTypei represents the write access

of all the ports which have data type DataType,of behavior i.

c) T_Ri represents the total read access of all theports of behavior i.T_Wi represents the total write access of all

the ports of behavior i.

The approach of getting port access will beillustrated in 4.4.2.

Besides communication through ports, behaviorsalso can communicate through it channels. Theconcept of channel is defined in [1]. Behavior usechannel in term of function calls. For example, ifbehavior reads from channel A and save it into localvariable b, behavior includes the statement b=A.receive(). On the other hand, if behavior writesvalue of variable b into the channel A, behaviorincludes the statement A.send(b). Behavior profiler calculates the argument of called channel functions asread access. If there are return data of channelfunctions, such as A.receive(), behavior profilercalculated them as write access.

4.2.4 StorageThere are four types of storage’s: static storage,

stack storage, heap storage, and sub_behavior storage.

4.2.4.1 Static storage

Static storage of behavior consists of variables outsideany behaviors, variables in behaviors but outside anyfunctions, variables in the main function of behaviors,and behavior's ports. When a behavior is executed,static storage must be allocated and the size of staticstorage will not changed during behavior execution.

Static storage contains two levels:

a) S_Static_DataTypei represents the total amountof the static storage of data type DataType ofbehavior i.

b) S_Statici represents the total amount of thestatic storage of all the data types of behavior i.

S_Statici = Σ DataType S_Static_DataTypei

In Figure 5, S_StaticParent =sizeof(G) +sizeof(A) +sizeof(B) + sizeof(C) +sizeof(*D).

Static storage can be directly analyzed fromoriginal specification model.

4.2.4.2 Heap storage

Heap storage refers to the storage that are allocatedby "malloc" statements and are freed by “free”statements.

In this project, heap storage contains two levels:

a) S_Heap_DataTypei represents the total amountof the heap storage of data type DataType ofbehavior i.

b) S_Heapi represents the total amount of the heapstorage of all the data types of behavior i.

S_Heapi = Σ DataType S_Heap_DataTypei In figure 5, S_HeapParent = sizeof(D).

Unlike static storage, heap storage is analyzedbased on simulation result _Heap_Counter file and_BB_Counter_file mentioned in 4.1

4.2.4.3 Stack storage

Stack storage is the first hierarchical storageconcerned. It refers to the storage of the calledfunctions of behaviors. The storage of functionconsists of function’s static and stack storage andfunction’s parameters. Since all the functions arecalled sequentially in behavior, only current calledfunction storage is needed at time. Therefore, we usethe maximum of storage of all the called functions ofbehavior as behaviors stack storage.

void F0(){ int X; X=0; } void F1(int L){ int E; F0(); E=L++;

} void F2(){

int X; X=0; }

void main(){ int C, *D; F1(C); F2(); Inst1.main(); Inst2.main(); D = (int*) malloc (5);

} };

int G; Behavior Sub_1( int K, int K2){ void main(){ int M; K = K2;

} }; Behavior Sub_2(int K3, int K4){ void main(){ k4 = k3+1;

} }; Behavior Parent(int A){ int B; Sub_1 Inst1(B, A); Sub_2 Inst2(A, B);

Sub_3 Inst2(A, B);

Figure 7: Example of specification model

8

Stack storage of behavior contains three levels:

(a) S_Stack_DataType_Funi,j represents the totalamount of storage of data type DataType of calledfunction j of behavior i.

(b) S_Stack_Funi,j represents the total amount of thestorage of called function j of behavior i.

S_Stack_Funi,j = Σ DataType S_Stack_DataType_Funci,j

(c) S_Stack represents the largest S_Stack_Funi of allthe called functions, of behavior i.

S_Stacki = Maxj(S_Stack_Funi,j)

In Figure 5, S_Stackparent= Max ( S_Stack_Fun parent, F1 , S_Stack_Funparent, F2 )= S_Stack_Fun parent, F1= sizeof(E) + sizeof(L) + S_Stack_Fun parent,F0= sizeof(E) + sizeof(L) + sizeof(X).

Stack storage can be achieved by hierarchicallyanalyzing original specification model.

4.2.4.4 Sub_behavior storage

Sub_behavior storage is the second hierarchicalstorage concerned. Unlike stack storage, two behaviorinstances can be executed in parallel. Thussub_behavior is defined as the sum of the storage ofits sub_behavior instances. The storage ofsub_behavior instance consists of instance’s static,stack, heap, and sub_behavior storage.

In this project, Sub_behavior storage contains threelevels:

a) S_Sub_DataType_Behi,j represents the totalamount of the sub-behavior storage of data typeDataType of sub_behavior instantiation j ofbehavior i.

b) S_Sub_Behi represents the total amount of thesub-behavior storage of sub_behaviorinstantiation j of behavior i.

S_Sub_Behi,j =Σ DataType S_Sub_DataType_Behi,j

c) S_Subi represents the total amount of the sub-behavior storage of all the data of behavior i.

S_Subi = Maxj(S_Sub_Behi,j)

In Figure 5,S_SubParent = S_Sub_BehSub_1 +

S_Sub_BehSub_2 = sizeof (K) + sizeof(K2) +sizeof (K3)+sizeof (K4).

Similar to stack storage, sub_behavior storage canbe achieved by hierarchically analyzing originalspecification model.

4.3 Behavior dependency analysis

Besides the statistics of each behavior, the relationsamong behaviors are also very useful for designers tomake design decision. For example, if there is notraffic between behavior D and E as shown in Figure6(a), the behavior D and E can be executed in parallelinstead of in sequential, as shown in Figure 6(b),which may improve the performance.

Two types of behavior dependencies are analyzed:calling dependency analyzes the called/callingrelations; data dependency analyzes whether there istraffic between behaviors.

Behavior dependency can be achieved byhierarchically analyzing original specification modeland by analyzing port access statistics.

A

(b)

B C

A

B C

D

E

(a)

D E

Figure 8: Example of use of behavior dependency

4.3.1 Calling dependency analysis.Calling dependency contains two parts:

(a) Parent behavior.(b) Children behaviors and their

execution relations.

9

For example, in Figure 8(a), Behavior A does nothave parent behavior. Its children behaviors and theirexecution relations can be described as ( B || C ) -> D-> E, while "B || C" represents parallel executionbetween behavior B and behavior C. "D -> E "represents D and E are executed sequentially and E isexecuted after D.

4.3.2 Data dependency analysis.Data dependency represents whether there is traffic

between sub_behavior instances of the behavior. InFigure 6, data dependency represents the amount oftraffic among behavior B, C, D, and E.

The sub_behavior instances are connected by theirports. There are two ways to connect ports ofsub_behavior instances. First, the ports are connectedthrough the global variables/channels that defined inthe parent behavior, such as traffic between Inst1 andInst2 through variable B, in Figure 5. We called thesetypes of variable as connected variable, betweenconnected behavior instances. Second, the ports areconnected by the parent behavior's ports, such astraffic between Inst1 and Inst2 through port A inFigure 5. We called these types of variable asconnected port, between connected behaviorinstances.

Based on the result of behavior statistics analysis,between any two behavior instances, the port-to-porttraffics is calculated, for each connected port and eachconnected variable. The port-to-port traffic is calledbased on following equations.

a) Traffic for each connected port/variable

T_CVk = Max(T_R_Porti, Map(i,k) ,T_W_Portj, Map(j,k) ) + Max(T_W_Porti,Map(i,k) , T_R_Portj, Map(j,k) );T_CPk = Max(T_R_Porti, Map(i,k) ,T_W_Porti, Map(j,k) ) + Max(T_W_Porti,Map(i,k) , T_R_Porti, Map(j,k) )

T_CVk/T_CPk represents the amount of trafficthrough connected variable/port k, betweenbehavior instance i and j. Map(i, k) represents theport of behavior i that is mapped to connectedvariable/port k. In Figure5, for connected port Ain behavior Parent, Map(Inst2, A) is port K3.T_R_Port and T_W_Port is the read and writeaccess of port, described in 4.2.3.2. For trafficbetween behavior instance i and j throughconnected variable/port, read access of mappedport in i may not equals to write access ofmapped port in j. Therefore, maximum of read

access of mapped port in i and write access ofmapped port in j is added with maximum of writeaccess of mapped port in j and write access ofmapped port in i, which will be used asT_CVk/T_CPk. In Figure5, for traffic betweenbehavior instance Inst1 and Inst2 throughconnected port A, T_CVA = Max(T_R_Port Inst1,K2 , T_W_PortInst2, K3 ) + Max(T_W_PortInst, K2,T_R_PortInst2, K3 ).

b) Traffic for all connected ports/variables

T_CP = ΣT_CPk T_CV = ΣT_CVk

The total traffic for all the connectedvariables/ports can be calculated by (15) and(16).

c) Traffic between behavior instances

T_BBi,j = T_CP + T_CV

The total traffic between behavior instance iand j are the sum of traffic for connectedvariables and connected ports.

If there is no traffic between two sub_behaviorinstance, we call this two behavior instances "dataindependent". Otherwise, it is called "datadependent". Furthermore, the closeness of twosub_behavior instances is represented by the trafficbetween sub_behavior instances.

4.4 Two algorithms in behavior profiler

In behavior profiler, several algorithms are appliedto achieve behavior statistics. The complexity ofimplementing behavior profiler comes from thehierarchical analysis of specification. In thesesub_section, two algorithms are described. The firstone is for recursive function calls as well as operationcalculation. The second one is for port access.

4.4.1 Algorithm for recursive function calls andoperation calculation

In 4.2.2, we derive average execution numbers ofoperations of by equations (3)(4)(5), withoutconsidering function calls. In this sub-section, thealgorithm for calculating average execution number ofoperations with considering sub_function calls isdescribed.

There are three types of functions in SpecC. Localfunctions are the functions defined and used inside

10

behaviors; global functions are the functions definedoutside behaviors but in specification; and libraryfunctions are the functions defined in libraries butused in specification. The local functions of eachbehavior can be called recursively. The globalfunctions also can be called recursively.

The average execution number of operations forbehavior equals to the average execution number ofoperations for main function of the behavior.Therefore, we can achieve the execution number ofoperations for behavior by only considering functions.

We use local functions as our example to illustratethe way of solving recursive-calling problem. In thisstage, we ignore library functions and assume theexecution numbers of operations of global functionsare already calculated.

We calculated each OP_OpType_DataType by usingfollowing algorithm. To add the execution number ofoperations of called functions into execution numberof operations of calling functions, the followingequations are adopted [12]

A = C * A + O

A is n-dimensional vector, where its item Ai is theaverage number of operations executed by thefunction i and n is the number of local functions in thebehavior, including the main function. C is a squarematrix, where Ci,j denotes how many times function icalled function j. Ci,j equals to execution number ofbasic block which contains calling statement dividedby execution number of function j. O is dimensinalvector.

Oj = Σk( (Op_Bk +ΣiOp_Global_Function_i+ ΣlOp_Subl) * BBk) / N_Fj

Op_Bk is the number of operations in basic block k offunction j, Op_Global_Function is the averageexecution number of operations of called globalfunction in basic block k. Op_Sub is the averageexecution number of operations in the sub_behaviorinstance. BBk is execution number of basic block k offunction j. N_Fj is the execution number of function j.

Behavior Main_B(int x,int y,int z){

void C(cx1, cx2){int i;i =( cx1+ cx2) * cx1;y = x;

}

void B(b) {int i, j;

for(i=0; i

11

execution. For argument cx2 of function C, there isone read access, per C's execution. Since behaviorMain_B’s port x, y, and z are bound to arguments ofcalled function C, cx1 and cx2, the arguments accessof cx1 and cx2 should be counted as the port access ofx, y, z. We call this port access indirect port access.On the other hand, the statement “x = y” in function Cis called direct port access.Behavior profiler calculates the port access bycompleting following steps as shown in Figure 8:

Build function-calling tree

Calculate execution number of function instance nodes

Build port-argument binding tree

Calculate average argument access for functions

Calculate average direct port access for functions

Calculate average indirect port access for function instance

Calculate port access from global function calls/sub-behavior instance

Calculate total port access for function instance nodes

Calculate total port access for functions

Calculate avg port access for functions

Calculate port access for recursive function call

Figure 10: Design flow of analyzing port access

a) Build function-calling tree: similar tobehavior calling tree in 4.2.1, behaviorprofiler analyzes the function-calling tree, asshown in Figure 9(b). The function-callingtree reflects the called and calling relationbetween function instance nodes.

b) Calculate execution number of functioninstance nodes in function-calling tree.Similar to behavior instance nodes in 4.2.1, ifwe define F_I (i,j) as the execution number ofcalled function j per execution of callingfunction i, and define F_N(i) as the totalexecution number of function instance node i,

F_N(i) = F_N(j) * F_I (i,j)

While F_N(main) = 1. F_I(i,j) equals toexecution number of the basic block thatcontains calling statement of function jdivided by execution number of function i.For example, in Figure 9, the F_I(main, B1)= 5, F_I(B1, C1) = 6. Therefore, F_N(B1(main) ) = F_N(main) * F_I(main, B1) =5, F_N( C1(main_B1) ) = F_N( B1(main) ) *F_I(B1, C1 ) = 30.

c) Build port-argument binding tree, for eachbehavior instance nodes: The bindinginformation is recorded if the argument offunction instance node is bound to the port ofbehavior. For example, the port-argumentbinding tree is displayed as shown in Figure9(c). In this binding tree, the argument cx1 offunction instance node C1(main_B1) ismapped to port x of behavior.

d) Calculate average argument access for eachfunction. In Figure 7, the average access forargument cx1 of function C is 2 read.

e) Calculate the direct port access D_P(i,j),while i refers to function instance node i and jrefers to port j.

D_P(i,j) = port access(j) per functionexecution * F_N(i)

In Figure 7, D_P( C1(Main_B1) ,x) =1(read) * 30 = 30(read).

f) Calculate the indirect port access, based onport-argument binding tree and function-calling tree, for each port of function instancenodes. First, the total argument access offunction instance node D_A(i,k) arecalculated, while i refers to function instancenode i and k refers to argument k.

D_A(i,k) = argument access(k) perfunction execution* F_N(i)

The indirect port access is:

I_P(i, j) = Σk D_A(i,k)

for all the argument k bound to port j. Forexample I_P( C1(main_B1) , x) = D_A(C1(main_B1), cx1) = 2(read) * 30 = 60(read), since argument cx1 of Main_B1_C1 isbound to port x.

g) Calculate the port access from its globalfunction calls and sub_behavior instances, foreach port of function instance nodes,. Since

12

the port accesses of behaviors are calculatedin the order from children behaviors to parentbehaviors, the port accesses of sub_behaviorinstances are already known. The argumentaccesses of global function call are calculatedbefore any behavior port calculation. Afterbinding the ports of behavior to the ports ofsub_behavior instance or arguments of globalfunctions, the port access from its globalfunction calls and sub_behavior instances canbe derived. We define it as G_P(i,j), forfunction instance node i and port j.

h) Calculate the total port accesses for eachfunction instance node FIN_P(i,j).

FIN_P(i,j) = D_P(i,j) + I_P(i,j) +G_P(i,j).

i) Calculate the port accesses for each function,without considering local function calls.

F_P(i,j) = Σk FIN_P(k,j)

While k is the function instance node thathave function type i. For example F_P(C) =FIN_P( C1(main_B1) ) + FIN_P(C2(main_B1) ) + FIN_P( C1(main_B2) ) +FIN_P( C2(main_B2) ) + FIN_P(C1(main_D_B3) ) + FIN_P( C2(main_D_B3)).

j) Calculate the average port accesses for eachfunction, AF_P(i, j), without consideringlocal function call.

AF_P(i, j) = F_P(i,j) / F_F(i)

while F_F(i) is the execution number offunction i.

k) Calculate the average port accesses for eachfunction, AF_P(i, j), considering localfunction calls. Algorithm in 4.4.1 is used tosolve this recursive problem.

In the C program, the arguments of the data typesuch as “int” can only be read but not written. On theother hand, for the pointer type, the argument can beread and written through pointer access. In behaviorprofiler, we calculated the argument access based onthis fact.

5 Retargetable profiler

5.1 Design flow of the retargetableprofiler

Statistics generated by behavior profiler arearchitecture-independent. Allowing users to estimatethe system that reflects architecture explorationdecision, we designed the retargetable profiler.:

Retargetable profiler did the following tasks

a) Retargetable profiler assigns a weight table toeach behavior. The weight table representsthe characteristics of operation, traffic, ormemory for the PE that the behavior ismapped to. Thus, with the weight tables andthe characteristics from behavior profiler,retargetable profiler generates theperformance, traffic, and memoryinformation of the behavior.

b) Each PE consists of a number of behaviors.All of these behaviors are executedsequentially. The behaviors communicate withthe behaviors in the different PEs. Since thestatistics of behavior can be achieved from a),the statistics of PE can also be generated.

c) The system consists of PEs. Retargetableprofiler can generate the statistics of systembased on the statistics of PEs.

SpecC profiler complete separates the behaviorprofiler and retargetable profiler. It makes thatretargetable profiler does only depend on the outputof behavior profiler, but not on the simulating result.That is, the task of retargetable profiler is to only useweight table, behavior statistics, and behaviordependency as input, without simulating testbenchesand analyzing behaviors. Therefore, the process ofexecuting retargetable profiler is very fast. Forexample, for the vocoder project [7] , in Sun’s Ultra5 system, process of behavior profiler took 68seconds, testbench simulation took 22 seconds, andprocess of retargetable profiler only took 5 seconds.Since the statistics of system for architectureexploration decision is calculated by executingretargetable profiler once, designers can completechanging the architecture exploration decision and re-profiling the design in very short time. It makes thatthe architecture can be explored in a very large range.

The process of retargetable profiler is in Figure 9,which has explained in section 1.

13

GUI

Implementation evaluation and annotation

Decision-annotated specification model

Evaluation-annotated specification model

Profiling-annotated specification model

Figure 11: Design flow of retargetable profiler

5.2 Weight table generation

The items in weight table can be divided to threetypes: weight for operation, weight for traffic, andweight for memory. The weight for operationrepresents an operation weight for certain operationtype and certain data type. The weight for trafficrepresents a traffic weight for certain data type. Theweight for memory represents a memory weight forcertain data type.

Each weight table represents one PE. PE can be acustom ASIC, a programmable processor, or an IP. Ifthe PE refers to IP, the IP provider should providerelated weight table. If PE refers to custom ASIC, theweight table should be generated by analyzing theASIC architecture.

If PE refers to a programmable processor, we candevelop the weight table by reading processormanuals and analyzing based on source codegeneration [12]. For example, the SpecC source codeis

{int c, a; c = a + 123;}.

After compiling, the target machine code is:

{MOVE a, R1;MOVE #123, R1; ADD R1, R2;Move R2, c}

From these results, we concluded that each integeridentifier and constant contributed with one MOVEinstruction into the target code, each integer additioncontributed with one ADD instruction, and there was

no contribution from assignment. Using this way, theweight table can be achieved.

The number in the weight table can represents thetime, clock-cycle, and number of bit/Byte, accordingto designers’ purpose.

5.3 Output statistics of Retargetableprofiler

The statistics of retargetable profiler includeperformance, traffic for behavior instance nodes andmemory for behaviors. To simply the name, in thissubsection, we also call behavior instance node asbehavior.

5.3.1 Design PerformanceAverage performance of each behavior is evaluated.

The performance can be counted as execution time,clock cycles of execution, or number of instruction,depending on the meaning of items in PE weight table.The average performance of behavior can becomputed by adding weighted execution numbers ofoperations, which are the product of execution numberof operations generated in 4.2.1 and theircorresponding weight.

Two types of behaviors, leaf behavior andcombinatorial behavior, are treated separately toachieve the performance of the design.

(a) Leaf behaviorLeaf behaviors are the behaviors that do not have anysub_behavior instances. The performance of each leafbehavior contains four levels:

P_OpType_DataType_i =Op_OpType_DataType_i *Weight_OpType_DataType_i

P_OpType_i = ΣDatatypeP_OpType_DataType_i

Avg_P_i = ΣOpTypeP_OpType_i

Total_P_i = Avg_P_i * N_Ni

Op_OpType_DataType_i is the average executionnumber of operation type OpType of data typeDataType, for behavior i, as show in 4.2.2. TheWeight_OpType_DataType_i is weight of the PE thatbehavior i partitions to, for operation type OpType

14

and data type DataType. N_Ni is the execution numberof behavior instance nodes in 4.2.1.

The four levels of performance counted are:P_OpType_DataType_i represents the averageperformance of operation type OpType for data typeDataType of the behavior i. P_OpType_i representsthe average performance of operation type OpType forall data types of behavior i. Avg_P_i represents theaverage performance for all operation type and alldata type of behavior i. Total_P_i represents the totalperformance of behavior i.

(b) Combinatorial behavior and PECombinatorial behaviors are the behaviors that have atleast one sub_behavior. Combinatorial behavior canbe used to represent the performance of PE or thewhole design. The performance of combinatorialbehavior i can be calculated as follows:

i f execut ion relat ions (sub_behavior) == sequentialthen

Total_P i = ΣS u b_ be h(Total_P S u b_ be h ) ;else

if execut ion relat ions(sub_behavior) == paral lelthen

if sub_behaviors are mapped to same PEthen

Total_P i = ΣS u b_ be h(Total_P S u b_ be h) ;else

Total_P i = Max S u b_ be h(Total_P S u b_ be h) ;endif

endifendif

Total_PSub_beh represents the performance ofsub_behavior instances, which should be calculatedbefore the performance of behavior i.

Though the performance of each PE is not explicitlydisplayed by behaviors, it is already considered byusing weight tables. The total performance of designcan be represent by the performance of Mainbehavior.

During performance estimation, we do not consider“waiting time”. For example, when two behaviors Aand B are executed in parallel, and A and B aremapped to different PEs, if B will not executed until Breceive a input from A, there will be some waitingtime for Behavior B. Designers should adjust the

performance for this case, based on the profilingresult.

5.3.2 TrafficRetargetable profiler calculates the amount of

traffic among behaviors. Compared with port widthand port access, the amount of traffic is weightedtraffic thus it is PE dependent. Based on port widthand port access, two types of statistics, static trafficand dynamic traffic are generated by retargetableprofiler.

a) Average static trafficIn some cases, when a behavior is executed, the

data communication is only happened twice: rightbefore the execution of behavior and right after theexecution of behavior. During the execution, the datawill be saved in local memories. In these cases, thetotal amount of data communication per behaviorexecution is called static traffic. If C_S representsaverage static traffic, static traffic can be calculatedby equation

C_Si = 2 * Σ DataType (T_S_DataTypei *Weight_Traffic_DataType)

T_S_DataTypei is the port width defined in 4.2.3.1and Weight_Traffic_DataType is the weight of trafficfor data type DataType.

b) Average dynamic trafficIf the data communication is happened whenever

ports are access, the average traffic for each executionof behavior i is called average dynamic traffic, whichis represented by C_D_Wi and C_D_Ri

C_D_Wi = Σ DataType (T_W_DataTypei *Weight_Traffic_DataType)

C_D_Ri = Σ DataType (T_R_DataTypei *Weight_Traffic_DataType)

while T_W_DataTypei and T_R_DataTypei are writeaccess and read access defined in 4.2.3.2.

c) Total traffic of behaviorFurthermore, designer can calculate the total traffic

of behavior i using average static traffic and averagedynamic traffic. For example, assuming when eachbehavior is executed, the data communication is only

15

happened twice as discussed in 5.3.2 a), the totaltraffic

Ci = C_Si * N_Ni.

While N_Ni is the total execution number of behaviorinstance node i in 4.2.1.On the other hand, if the data communication ishappened whenever ports are access for eachbehavior, such as in 5.3.2.(b), the total traffic

Ci = (C_D_Wi + C_D_Ri) * N_Ni.

d) Traffic between behaviors and PEsBesides the traffic in and out a behavior nodes, the

average traffic between behavior instances weregenerated, based on behavior dependency and trafficfor each behavior, by calculating the port-to-porttraffic between behaviors.

For example, if Behavior are executed sequentially,such as behavior D execute before behavior E asshown in Figure 8(a), and D and E communicatethrough connected ports, the traffic will be exist onlywhen write D and read E. The amount of traffic CD, Eis:

I. If D, E share a PE, there is no traffic.II. If D, E communicate through a global

memory:

CD,,E, =N_ND*T_SD, +N_NE *T_RE,

III. If D, E communicate through a localmemory and

i) If the local memory in D:

CD,,E, =N_NE*T_RE

ii) If the local memory in E:

CD,,E, = N_ND*T_SD

Retargetable profiler only provides the totaldynamic traffic between behaviors instances.Designers should give the correct traffic based ondifferent situation.

The traffic between two PEs are also can becalculated. The traffic between any behavior in PE1and any behavior in PE2 are added to the trafficbetween PE1 and PE2. Designers can easily do thiswork manually.

5.3.3 MemoriesStatistics of memories are weighted storages.

a) Static memoryStatic memory

M_Statici = Σ DataType (S_Static_DataTypei* Weight_Memory_DataType)

While S_Static_DataTypei is described in 4.2.4.1 andWeight_Memory_DataType is the weight of memoryfor data type DataType.

b) Heap memoryHeap memory

M_Heap i = Σ DataType (S_Heap_DataTypei* Weight_Memory_DataType)

While S_Heap_DataTypei is described in 4.2.4.2.

c) Stack memoryStack memory

M_Stack i = Σ DataType (S_Stack_DataTypei* Weight_Memory_DataType)

While S_Stack_DataTypei is described in 4.2.4.3

d) Sub_behavior memorySub_behavior memory

M_Sub i = Σ DataType (S_Sub_DataTypei *Weight_Memory_DataType)

While S_Sub_DataTypei is described in 4.2.4.4.

f) Total memory of behavior and PEThe total memory of behavior

M i = M_Statici +M_Heap i + M_Stack i +M_Sub i

The total memory of PE

M_Pk = Max(M i)

For all behavior i in the PE k.

16

6 A design methodology of UsingSpecC profiler

As mentioned in section 1, the designers makearchitecture exploration decision based on their designexperience. In this section, a simple methodology ofmaking architecture exploration decision by usingSpecC profiler is introduced. This designmethodology was developed based on our designexperience and the advantages of SpecC profiler. It issimple and extendable. Thus designers can improve itaccording to different design cases. The design flowof this methodology is described in Figure 12.

Not succeedNext PE

Specification analysis

Critical path analysis

PE selection for PE i (i =1..n)

Behavior partitioning for PE i (i =1..n)

Specification schedulingFinish allthe PEs

Architecture refinement

Figure 12: Design flow of the simple methodology forarchitecture exploration

6.1 Design assumption

We assume that the specification of design isspecified in SpecC language, the execution relationbetween behavior instances in specification is eithersequential or parallel. During the architectureexploration, these execution relations are not changed.

Since the architecture is assembled from itscomponents, the component library ( PE library ) isalready existed. For each PE, PE table includes threeattributes: MIPS (Million instruction per second),MOPS (Million operation per second), and weighttables for operation, traffic and memory. For differentweight tables, the items of weight tables can havedifferent meanings. For example, if the weight foroperation “+” is 3, it can be explained as thatoperation “+” is compiled to 3 instruction, or itsexecution time is 3ms.

The PE table used in our example is displayed inTable 1.

Name MOPS MIPS WeightTableSW1 2 7 WT1SW2 5 17 WT2HW 10 32 WT3

Table 1 : PE Table

6.2 Specification analysis

To illustrate the methodology, we use a simpleexample.

First of all, designers derived behavior statistic andbehavior dependency by behavior profiler. Behaviorcalling tree of the example is displayed in Figure13(a). In behavior calling tree, (--) represents thesequential execution among subbehavior instances,while (||) represents parallel execution and (Leaf)represents leaf behavior. Since there are no twobehavior instances that are the instances of samebehavior, we use behavior’s name as behaviorinstance name in the behavior calling tree. In thissection, we behavior instance node as behavior forsipilicity.

In table 2, the behavior statistics of the example islisted. To make it simple, only the total executionnumber of operation, total traffic, and total memorysize for each behavior are chosen.

Name Operation Traffic MemMain 56k 0 5kAB 20k 50 Word 3kCD 11k 50 Word 1kA 20k 0 2kB 8k 0 1kC 9k 0 0.5kD 11k 0 0.5kE 25k 0 1kA_1 5k 0 0.5kA_2 15 0 0.5k

Table 2 : Behavior statisitics of example

Based on behavior statistics and behavior callingtree, the behavior parallel graph can be generated bydesigners, as shown in Figure 13(b).

17

(Leaf) A_1

(Leaf) A_2

(--) Main

(||) AB

(||) CD

(Leaf) E

(Leaf) B

(--) A

(Leaf)C

(Leaf)D

(a) Behavior calling tree

A

C

B

D

E

(b) Behavior parallel graph

Op(B)

Op(C)

Op(E)

Op(D)

Op(A)

Column1 Column2

Figure 13: Specification display

In behavior parallel graph[13], each white blockrepresents a behavior, labeled with behavior name.The behaviors in different columns are executed inparallel, such as A and B, or C and D, based on thespecification. In each column, the behaviors areexecuted sequentially, starting from the top. If twobehaviors are executed in parallel, the successor ofthem can not be executed until the executions of twobehaviors are finished. For example, C and D willstart executing after execution of A and B. Inbehavior parallel graph, the length of block representsthe total execution number of operations (Op) forbehavior. The shaded block means do nothing.

6.3 Critical path analysis

If two behaviors are executed in parallel, the onethat has the greater execution time will influence theperformance of the system. Therefore, critical path isfirst analyzed. Critical path consists of behaviors. Ifthe behaviors are executed in parallel, the behaviorthat have largest Op is in the critical path, such as Aand D in Figure 10. Sequential behaviors are alwaysin the critical path, such as E in Figure10.

Designers should rearrange the behaviors inbehavior parallel display, for showing the criticalpath. For any behavior I and J, if Op(I)> Op(J), thenColumn(I)< Column(J), while column(I) refers to thecolumn number of behavior I. Therefore, the critical

path is represented in column 1, as shown in Figure13.

6.4 PE selection

The behaviors on different columns of behaviorparallel graph are executed in different PEs.Therefore, PEs are first selected column by column.Since the column 1 represents the critical path, PEselection should in the order from column 1 to columnn. Op of column equals to the sum of Ops ofbehaviors in the column. Since execution time ofsystem is related to Op of critical path, we use criticalpath as an example to show our methodology.

In the beginning, designers should try to find easiestdesign solution: pure SW solution. In this solution, allthe behaviors in critical path are implemented in oneprogrammable processor. From 6.3, we knew the totalexecution number of operation of critical path, whichis called Op(Critical_Path). We also know the timeconstraint of system, which is called T_Constraint.Thus, Op(Critical_Path) divided by T_Constraint cantell us how many operations are needed to proceed persecond, which is called Need_MOPS. By comparingNeed_MOPS and MOPS of PEs in PE library, we canfind the suitable PE.

Using MOPS, we select PE in operation level,which is not accurate. After using retargetableprofiler, performance of critical path, which is calledP(critical_path), can be generated. If the items inweight table represent the number of instructions foreach operation, P(critical_path) represents the totalexecution number of instructions. Similar,P(Critical_Path) divided by T_Constraint can tell ushow many instructions are needed to proceed persecond, which is called Need_MIPS. By comparingNeed_MIPS and MIPS of selected PE, we can ensurethat the PE we select based on MOPS can meet timeconstraint.

The algorithm of selecting the SW PE can bedescribed as follows:

Need_MOPS = Op(Cri t ical_Path)/T_Constraint ;for test_PE = PE1 to PE n do

i f Need_MOPS > MOPS(test_PE) thentest next PE;

else / / Have SW PE good for MOPSGenerate P(Cri t ical_Path), based on test_PE;Need_MIPS = P(Cri t ical_Path) /

T_Constraint ;i f Need_MIPS

18

return ( test_PE);/ / Find SW PE good for MIPS

endif endif

endfor

/* If no PE can be selected based on MOPS, t ry thefastest PE based on MIPS, because MOPS is notaccurate for PE select ion */Generate P(Cri t ical_Path), based on PEn;Need_MIPS = P(Cri t ical_Path) / T_Constraint ;i f Need_MIPS P(Selected_Beh)then

Selected_Beh = Leaf_behavior(i);endif

endfor

P(Crit ical_Path) = P_REPROFILE(Crit ical_Path,Selected_Beh);while (Selected_Beh != NULL) & (P(Cri t ical_path)> T_constraint ) do

P(Crit ical_Path) = P_REPROFILE(Crit ical_Path,Selected_Beh);

Selected_Beh = Parent (Selected_Beh);Generate P(Cri t ical_Path) ;

endwhilereturn Selected_Beh;

In our example, Leaf_behaviors are A_1, A_2, B,C, D, and E. The Parent(A_1) is A. The purpose ofP_REPROFILE is to derive the performance ofcritical path, in the case that the second argumentSelcted_Beh is mapped to HW while the rest ofbehaviors are mapped to SW, for the column

19

represented by first argument. P_REPROFILE isimplemented by retargetable profiler.

The advantage of this algorithm is simple. Thedisadvantage is that it needs to execute retargetableprofiler several times for function P_REPROFILE.

A revised algorithm is given in the following.Firstly, retargetable profiler generates performance ofeach behavior for SW and HW, respectively. We useP_S(i) for SW performance of behavior i, whileP_H(i) for HW performance of behavior i. Furthermore, P_Comm(i) is used to represent theperformance overhead for traffic of behavior i.

To find the behavior for HW, we calculateNeed_Improvement.

Needed_Improvment = P_S(Critical_path)–T_constraint

P_S(Critical_path) is the performance of criticalpath/system for pure SW solution. Secondly,Perf_Gain is calculated for each behavior.

Perf_Gain(i) = P_S(i) - P_H(i) –P_Comm(i)

For any behavior i, if Perf_Gain(i) >Needed_Improvment, we can say that the system canmeet the time constraint by implementing behavior iin HW while rest of behaviors in SW. Thus, we needto find the behavior that can satisfy above equation.

Finally, we can get the behavior for HW by followingalgorithm,

Selected_Beh= Null ;Perf_Gain(Selected_Beh) = 0;for i=1 to num_of_leaf_behavior do

if Perf_Gain ( Leaf_behavior(i) ) >Perf_Gain(Selected_Beh) then

Selected_Beh = Leaf_behavior(i);endif

endfor

while (Selected_Beh != NULL) &(Perf_Gain(Selected_Beh)< Needed_Improvement ) do

Selected_Beh = Parent (Selected_Beh);endwhilereturn Selected_Beh;

In our example the performance of behaviors arelisted in Table 3.

Name P_S(ms)

P_H(ms)

P_Comm(ms)

Perf_Gain(ms)

Main 11.52 6.07 0 5.45AB 6.49 3.44 0.50 2.55CD 2.26 1.20 0.50 0.56A 4.11 2.18 0 1.93B 1.64 0.87 0 0.77C 1.85 0.98 0 0.87D 2.26 1.20 0 1.06E 5.14 2.70 0 2.44A_1 1.02 0.53 0 0.49A_2 3.08 1.64 0 1.44

Table 3: Performance for behaviors in example

If the performance constraint of design is 10ms.Needed_Improvement = 11.52- 10 = 1.52 ms. Amongleaf behavior E, A_1, and A_2, we can found behaviorE has greatest Perf_Gain, which is 2.44ms. SincePerf_Gain(E) is greater than Need_Improvement, it isthe behavior for HW.

If Selected_Beh is found out by this algorithm, thesystem can meet the constraint requirement byimplementing Selected_Beh in HW.

If no behavior can be selected, and P_H(System) >T_constraint, it means even the pure HW solution cannot meet the requirement. In this case, the behaviorscheduling should be made.

6.6 Behavior scheduling

If the HW solution can not be meet the requirement,the specification should be scheduled by designers, tochange more sequential execution to parallelexecution. Designers can make this change based onbehavior-dependency, since behavior-dependency cantell designers whether two behaviors are independent.For example, if the performance constraint of designis 5ms, and SpecC profiler tell designers that there areno traffic between behavior E and behavior CD or AB,designers should change the specification to makebehavior E parallel executed with behavior AB andCD. The PE selection, behavior partitioning andbehavior scheduling should be make executed againfor new specification.

20

7 ConclusionIn this report, the SpecC profiler is introduced. SpecCprofilers contain two parts: behavior profiler andretargetable profiler. Behavior profiler provides thecharacteristics of specification, while retargetableprofiler provides the characteristics of system,according to designers’ architecture explorationdecision.

SpecC profiler has following advantagesa) SpecC profiler is retargetable profiler: it can

evaluate the characteristics of behavior that isexecuting on any selected PEs. Moreover, it canprovide profiling result for the case that differentfunctions of specification are executed ondifferent PEs.

b) SpecC profiler not only evaluates performance,but also evaluates the amount of traffic, neededmemory size, for each function.

c) SpecC profiler evaluates parallel execution amongfunctions as well as sequential execution.

d) SpecC profiler analyzes the behavior dependency,which provides designers more flexibility ofscheduling.

e) Two part of SpecC profiler, behavior profiler andthe retargetable profiler, are separated.Therefore, design simulation is only executedonce. It makes the process of executingretargetable profiler fast and makes large rangearchitecture exploration possible.

f) SpecC profiler is designed for SpecC language,which is worldwide-accepted system level designlanguage. Thus, it can be used as one of tools inSpecC Engine for other Chip designers and EDAvenders.

Besides SpecC profiler, a simple methodology ofmaking architecture exploration decision is given.

The SpecC profiler works on specification model ofSpecC language. The evaluation result is not realcycle-accurate. However, it is first step to limit therange of architecture exploration and it outlinesprofiler of abstract level in SoC design.

Reference:[1] D. D. Gajski, J. Zhu, R. Dömer, A. Gerstlauer, S. Zhao,

SpecC: Specification Language and Methodology, KluwerAcademic Publishers

[2] Andreas Gerstlauer, Rainer Doemer , J. Peng, D Gajski,System design: a practical guide of SpecC, Kluwer AcademicPublishers.

[3] Rainer Doemer , SpecC SIR Internal representation, CECSInternal report IR99-03, June 1999

[4] David Berner, Development of a Visual Refinement- andExploration-Tool for SpecC, Technical Report ICS-01-122000

[5] L. Cai, J. Peng, C. Chang, A. Gerstlauer, H. Li, A. Selka, C.Siska, L. Sun, S. Zhao and D. Gajski, "Design of a JPEGEncoding System," UC Irvine, Technical Report ICS-TR-99-54, November 1999.

[6] Hanyu Yin, Haito Du, Tzu-Chia Lee, Daniel D. Gajski,"Design of a JPEG Encoder using SpecC Methodology," UCIrvine, Technical Report ICS-TR-00-23, July, 2000.

[7] Andreas Gerstlauer, Shuqing Zhao, Daniel D. Gajski andArkady M. Horak, "Design of a GSM Vocoder using SpecCMethodology," UC Irvine, Technical Report ICS-TR-99-11,March 1999.

[8] Junyu Peng, Lukai Cai, Anand Selka, Daniel D. Gajski,"Design of a JBIG Encoder using SpecC Methodology," UCIrvine, Technical Report ICS-TR-00-13, June, 2000.

[9] DSP56600 16-bit Digital Signal Processor Family Manual,Motorola Inc.

[10] Rick Grehan, Code Profilers: Choosing a Tool for analyzingperformance, A Metrowerks white paper

[11] Chang, Cooke, Hunt, Surviving of SoC Revolution, A guideto platform-Based Design, Kluwer Academic Publishers.

[12] Sinisa Srbljic, Mario Stefanec, Ivan Benc SpecC Profiler:Specification-level Exploration Tool

[13] D. Gajski, J. Peir Essential Issues in Multiprocessor Systems1985 IEEE

introduction of design-oriented profiler of specc language · existing profiling tools are...

Documents