
Representing Bit Vectors and Arrays in Control Data Flow Graph for High Level Synthesis

A thesis submitted in partial fulfillment of the requirements

for the degree of

Master of Technology in

Computer Science and Engineering by

Chaithanya Kiran Kona (06CS6007)

Under the guidance of Dr. Dipankar Sarkar

and Dr. Chittaranjan Mandal

Dept. of Computer Science and Engineering Indian Institute of Technology

Kharagpur

April 2008

Dept. of Computer Science and Engineering Indian Institute of Technology Kharagpur

Certificate May, 2008

This is to certify that the thesis entitled “Representing Bit Vectors and Arrays in Control Data

Flow Graph for High Level Synthesis” submitted to the Department of Computer Science and

Engineering, Indian Institute of Technology, Kharagpur by Chaithanya Kiran Kona (Roll No.

06CS6007) for the partial fulfillment of the requirements for the award of degree of Master of

Technology in Computer Science and Engineering is a bonafide record of the work carried out

by him under my supervision and guidance. The research report and results embodied in this

thesis have not been submitted for any other degree or diploma in any other University or

Institute.

Prof. C.R.Mandal, Dept. of Computer Science and Engineering, Indian Institute of Technology, Kharagpur – 721302

Prof. Dipankar Sarkar, Dept. of Computer Science and Engineering, Indian Institute of Technology, Kharagpur – 721302

Acknowledgement

I avail this unique opportunity to express my gratitude and indebtedness to my project

supervisors Prof. Dipankar Sarkar and Prof. C.R.Mandal, Department of Computer Science And

Engineering, Indian Institute of Technology, Kharagpur, for their sustained interest, advice,

perpetual encouragement and thoughtful constructive criticisms during the course of the

investigation and preparation of the manuscript.

I am sincerely grateful to Prof. Indranil Sen Gupta, Professor and Head, Department of

Computer Science and Engineering, Indian Institute of Technology, Kharagpur for providing all

necessary facilities for the successful completion of my project.

I further acknowledge Chandan Karfa for his support to enhance the features of the

synthesis tool. I would like to extend my heartfelt thanks to my friends for their support and help

to overcome the difficulties by always being with me in my ups and downs during the project.

I would also like to thank the supporting staff of Department of Computer Science and

Engineering for their timely help and assistance.

Abstract

High-Level Synthesis (HLS) translates a behavioral specification of a system into its corresponding Register Transfer Level (RTL) specification. The Structured Architecture Synthesis Tool (SAST) takes the behavioral description of an input design and outputs synthesizable RTL Verilog code. This work involves enhancing SAST by adding interfaces for the verifier.

The first step of HLS involves deriving control and data-path information from

the behavioral code into a Control Data Flow Graph (CDFG). We added array access

notations to the existing CDFG representation. Using that representation we can

schedule the array operations.

The preprocessor converts the input CDFG into an intermediate representation (IR), which consists of the precedence constraints (partial order) between the operations in each basic block, along with the sets of incoming (in) and outgoing (out) variables for each basic block. We implemented the preprocessing stage of SAST so that it handles array operations. The generated intermediate representation is used as input to the scheduler.

Key words: High-Level synthesis, DFG, CDFG, RTL, Preprocessor

Contents

1 Introduction
   1.1 High-level Synthesis
   1.2 Contributions of the Present Work
   1.3 Organization of thesis

2 Related Works

3 Structured Architecture Synthesis Tool (SAST)
   3.1 Target Architecture
   3.2 Features of SAST

4 Data Flow Graphs
   4.1 Data Flow Graphs (DFG)
   4.2 Data Flow Graphs for Array Operations
      4.2.1 Array access notation
      4.2.2 Array change notation

5 Control Data Flow Graphs (CDFG)
   5.1 CDFG Representation
   5.2 Example Representation of CDFG

6 Preprocessor
   6.1 Live variable analysis
   6.2 Dependency Graph Extraction
   6.3 Intermediate representation
   6.4 Implementation

7 Experimentation and Results
   7.1 Results
   7.2 Conclusions
   7.3 Future Work
   7.4 Lift controller results

Bibliography

List of Figures

3.1 Hand-in-hand synthesis and verification
3.2 Schematic of structured architecture
4.1 A basic block in C
4.2 The basic block in single-assignment form
4.3 An extended data flow graph for sample block
4.4 Standard data flow graph for sample basic block
4.5 Array access notation
4.6 Array change notation
4.7 Flow chart for lift controller
4.8 Data flow graph for lift controller
5.1 CDFG representation of GCD
5.2 CDFG representations with array operations
6.1 Data flow diagram for preprocessor before scheduling
6.2 Code for diffeq
6.3 Partial order for diffeq code

Chapter 1

Introduction

1.1 High-level Synthesis

Synthesis is the process of translating a behavioral description into a

structural description. Synthesis is also defined as the process of

interconnecting primitive components at a certain level of abstraction (target

level) to realize a specification at a higher level of abstraction (source level). The

transformation of the design at source level is carried out to achieve some

predefined performance goals or constraints.

The synthesis process spans several levels: system synthesis, high level synthesis, logic synthesis and layout synthesis. High level synthesis (HLS) takes

algorithmic or high level behavior as input and outputs the register transfer level

(RTL) description consisting of functional units, storage and interconnecting

units. Individual synthesis systems cater to different constraint goal sets. Typical

user constraints are area, clock speed and power.

The high-level synthesis (HLS) process consists of translating a behavioral

specification into an RTL structural description containing a data path and a

controller so that the data transfers under the control of the controller exhibit the

specified behavior. The input behavioral description to the HLS is represented as a

control data flow graph (CDFG) and the output of the HLS is a structural data

path of interconnected components and a controller. The synthesis process

consists of several interdependent sub-tasks such as scheduling, allocation, binding and controller design. The operations in the behavioral description are

assigned control steps through a scheduling process. Each control step is the

basic time unit of the synchronous digital system. The allocation process

computes the minimum number of functional units and registers required to

synthesize the design based on the scheduling information of the operations and

the operators available in the component library. The variables are mapped to

registers and the operations are mapped to functional units by the binding

process. Finally, the controller is designed based on the data transfer required

among the data path elements in different control steps.
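As a minimal illustration of what it means to assign operations to control steps, the sketch below applies as-soon-as-possible (ASAP) scheduling to a small dependency graph. ASAP is a textbook technique chosen here only for brevity; it is not the genetic algorithm based scheduler used by SAST. The operation names and the assumption that every operation takes exactly one control step are hypothetical.

```python
def asap_schedule(deps):
    """Assign each operation the earliest control step that respects its dependencies.

    deps maps an operation name to the list of operations whose results it needs.
    Assumption: every operation takes exactly one control step.
    """
    step = {}

    def visit(op):
        if op not in step:
            # An operation with no predecessors is placed in control step 1;
            # otherwise it follows its latest predecessor.
            step[op] = 1 + max((visit(p) for p in deps[op]), default=0)
        return step[op]

    for op in deps:
        visit(op)
    return step


# Hypothetical operations: t3 needs t1 and t2, and t4 needs t3.
deps = {"t1": [], "t2": [], "t3": ["t1", "t2"], "t4": ["t3"]}
schedule = asap_schedule(deps)
print(schedule)
```

Operations t1 and t2 share a control step because neither depends on the other; allocation would then need at least two functional units for that step, which is the kind of information the allocation process works from.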

1.2 Contributions of the Present Work

The goal of the present work is to incorporate arrays and bit vectors into SAST. SAST takes a behavioral description having array variables as input and produces synthesizable RTL Verilog as the output. It takes a resource library and user architectural constraints as additional inputs.

The features already present in the existing system are as follows.

• A compiler that converts the VHDL behavioral specification into an

intermediate form (CDFG). This CDFG is later processed to extract the

dependencies between the operations which will be used in scheduling

phase.

• A genetic algorithm (GA) based scheduling algorithm, which takes the behavioral description in the form of a CDFG. Each operation in the CDFG is represented in three-address code. An operator library and the architectural constraints are taken as additional inputs to the scheduling algorithm.

• To reduce the number of control steps and the resource requirement, a

method of handling the variable assignments has been devised which is

different from that of handling operations.

• Construction of the data path and controller for the scheduled input design, and generation of synthesizable RTL Verilog for both the data path and the controller.

• A system that verifies the results generated at each step of the synthesis process. It takes the two FSMDs from two steps of the synthesis process as input and checks the equivalence between them.

• To increase the performance, operation chaining feature was added to the

system on a per block basis based on a delay model.

To make SAST a full-fledged high level synthesis system that handles array operations, the present work enhances and extends the existing system as follows.

• We encoded the behavior of a lift controller that contains array operations and extracted the corresponding CDFG, which is used as input to SAST.

• We added array access and change notations to the existing CDFG

representation to handle the array operations.

• We enhanced the preprocessing phase of SAST by introducing array

access and change notations in the DFG (data flow graph).

1.3 Organization of thesis

The thesis for the present work is organized as follows:

• Chapter 1 introduces HLS and its phases, the basic block representation involving operations on data arrays, and the scheduler, which assigns control steps to memory operations along with arithmetic and logic operations.

• Chapter 2 discusses related work.

• Chapter 3 describes the Structured Architecture Synthesis Tool (SAST).

• Chapter 4 describes data flow graphs and the array access and array change notations.

• Chapter 5 describes the CDFG representation for array operations.

• Chapter 6 describes the preprocessing of the array operations.

• Chapter 7 presents the experimental results along with conclusions and future work.

Chapter 2

Related Works

The use of arrays and records in modern hardware-description languages

(HDL) allows designs to be modeled at very high levels of abstraction. Many

behavioral descriptions that manipulate large amounts of data use array variables to represent data storage.

Aggregate data types such as arrays are often used in describing designs

at higher-levels of abstraction. These complex types are useful for grouping

related data into a single object, which makes the description more readable and

concise. From a design style point of view, the use of aggregate types makes the

design description more maintainable.

Another reason for the frequent use of arrays is their widespread use in software languages. Many methodologies today incorporate the

use of 'C' for algorithmic development. This 'C' description is then translated to

an HDL (e.g., VHDL or Verilog) for synthesis. The arrays used in the 'C'

description can be mapped directly to arrays in the HDL. Thus, it is important to

be able to synthesize these data types efficiently.

The existing SAST tool does not support arrays and bit vectors. We want to enhance the synthesis tool SAST by incorporating arrays and bit vectors.

Chapter 3

Structured Architecture Synthesis Tool (SAST)

To deal with the increasing complexity of today's VLSI designs the use of

high level synthesis systems becomes increasingly crucial. Several HLS systems

like Maha [1], HAL [2], STAR [3], SAM [4], and GABIND [5] are now available to

support the HLS of digital systems. Over the last several years, these systems

have evolved from elementary systems producing non-optimized data paths to

more sophisticated systems generating data paths optimized with respect to area,

time, power and testability. With the advancement of VLSI circuit technology, feature sizes have scaled rapidly. Device scaling

implies that the circuit performance will be increasingly determined by the

interconnection performance. For instance, interconnection contributes 50

percent of total delay in 0.35 micron technology whereas it is expected to rise up

to 70 percent in 0.25 micron technology. Thus, interconnections are expected to

play the most critical role in design of chips in deep sub-micron technologies.

The development of FPGAs has also taken place around this time. These are now becoming attractive platforms for prototyping designs, simulation acceleration, hardware-in-the-loop simulation, etc. To the best of our knowledge, however, most HLS tools produce designs optimized in terms of resources, time steps, power, area, etc. without much emphasis on reducing long and random interconnections.

SAST is a genetic algorithm based HLS tool designed to synthesize the behavioral specification into a structured architecture framework. This tool is depicted in Figure 3.1. This structured architecture (SA) leads to interconnect optimization. The schematic diagram of the SA is shown in Figure 3.2. The datapath is organized as architectural blocks (A-blocks). Each A-block has a local functional unit (FU), local storage and internal interconnections, as shown in Figure 3.2. SA also permits the use of memories as architectural components. These are connected to the global buses like the A-blocks. These structured data paths avoid random interconnections between datapath elements. Each A-block has a simple implementation. This makes the generated design easy to implement on programmable devices such as FPGAs.

3.1 Target Architecture

Figure 3.1 Hand-in-hand synthesis and verification

The Structured Architecture Synthesis Tool (SAST) essentially takes the behavioral description of an input design in the form of three-address code and outputs synthesizable RTL Verilog code. The generated data path is organized as architectural blocks (A-blocks). Each A-block has a local functional unit (FU), local storage and local buses (also called access links). All the A-blocks in a design are interconnected by a number of global buses. In addition to the local memories in the A-blocks, SAST also permits the use of global memories as architectural components. A global memory is similar to an A-block, except that it does not contain a functional unit. These memories can be accessed by all the A-blocks, to which they are connected by the global buses. The global memory units in the structured architecture play an important role as a convenient interface for the system. While it may be difficult to initialize a specific storage location within an A-block, it is considerably easier to store initial operands in, and retrieve final results from, the global memory units. Global memories help improve the availability of operands and relieve the storage requirement in individual A-blocks.

The schematic diagram of the Structured Architecture (SA) is shown in Figure 3.2. All the data path components are of the same width. That is, the local buses, storage units and functional units in the A-blocks and the global buses have the same width.

Input/output ports are connected to the global buses, so that all the A-blocks can access any of the ports. Each A-block has a local memory in the form of a register bank, which is connected to the global buses through internal buses (access links). Each A-block also has one functional unit (FU), which takes its inputs either from the local memory or from the internal buses. The output of the functional unit is sent back either to the register bank or to the internal buses. Switches in the design enable or disable the connection between any two components in the A-block. The group of switches that connects the internal buses and the output of the FU to the input ports of the registers is called the in-switches. The group of switches that connects the outputs of the registers and the output of the FU to the internal buses is called the out-switches. The global buses are connected to the input ports of the FU through the internal buses and in-switches. The output port of the FU is connected to the global buses through the internal buses and out-switches. The schematic diagram of an A-block is shown in Figure 3.

The phases in the synthesis tool are VHDL translation, scheduling, allocation and binding, controller generation and RTL code generation.

In SAST, after the CDFG is extracted from the VHDL, we have to construct the sets of live variables that carry data across basic blocks and the partial order between the operations of each basic block. The module that does this is called the preprocessor; it produces the intermediate representation (IR) necessary for scheduling the design specified in the CDFG.

The correctness of the HLS process is verified in three phases. Phase I verifies the scheduling process and is also called scheduling verification. The input of this phase is the CDFG and the output is the scheduled behavior. In phase II, the datapath generated after allocation and binding is verified against the scheduled behavior. We verify the sharing of registers among the variables of the input specification. This phase is called datapath verification. In phase III, the controller is verified against the data path. This phase is called controller verification, and it involves checking the correctness of the control signals.

Figure 3.2 Schematic of structured architecture.

The structure of the data path is characterized by a set of architectural constraints: the number of A-blocks, the number of global memories, the number of global buses interconnecting the A-blocks, the number of access links (the access width) connecting an A-block to the global buses, and the maximum number of writes per time step to storage locations in an A-block. The architectural parameters that are internal to an A-block (e.g. the number of access links and the number of write ports to internal memory) are the same for every A-block. These structured data paths avoid random interconnects between data path elements. Each A-block has a simple implementation. This makes the generated design easy to implement on programmable devices such as FPGAs.

3.2 Features of SAST

Reduction in Interconnection Cost: Many high level synthesis tools are currently available. To the best of our knowledge, however, most of them produce an optimized RTL design with random interconnections among the data path components, which raises the interconnection cost when the design is fabricated. Field programmable gate arrays (FPGAs) are naturally attractive for prototyping the designs generated by high level synthesis (HLS). Programmable devices tend to have limited wiring resources between the data path elements, so designs implemented on such devices need to avoid long-distance interconnections. We use a structured architecture for HLS, which produces predictable interconnections among the data path components. This leads to a low interconnection cost in the design.

VHDL to CDFG Parser: SAST has a parser which converts a VHDL behavioral specification to a CDFG, which is an intermediate form. This intermediate form is given as input to the preprocessing phase, which extracts the dependencies between the operations and the control flow between the basic blocks.

Scheduling: SAST uses a genetic algorithm based scheduler for scheduling the input design. SAST supports both time-constrained and resource-constrained scheduling. Resource information and the maximum number of control steps the scheduler can take for each basic block are provided to the scheduler as input constraints. It also handles multi-cycle and pipelined functional units.

Register allocation and binding: After scheduling is completed, the next step is live variable analysis and register allocation for each A-block. SAST uses the minimum number of registers to store the intermediate values in the design. It uses register interconnection optimization (RIO), which reduces the number of interconnections and switches in the design.

Data path Generation: The data path for each A-block consists of a functional unit, a register bank and access links, which connect the A-block to the global buses. SAST uses the structured architecture (SA) in the data path, which reduces the interconnection length between the data path components. SAST uses the minimum number of buses to schedule the input design.

RTL Generation: The final output from SAST is the RTL description in Verilog. It generates synthesizable Verilog code for both the data path and the control path.

Chapter 4

Data Flow Graphs

4.1 Data Flow Graphs

A data flow graph is a model of a program with no conditionals. In a high-level programming language, a code segment with no conditionals (more precisely, with only one entry and one exit point) is known as a basic block. Figure 4.1 below shows a simple basic block. As the C code is executed, we would enter this basic block at the beginning and execute all the statements.

w = a + b;

x = a - c;

y = x + d;

x = a + c;

z = y + e;

Figure 4.1 A basic block in C

Before we are able to draw the data flow graph for this code we need to modify it slightly. There are two assignments to the variable x: it appears twice on the left side of an assignment. We need to rewrite the code in single-assignment form, in which a variable appears only once on the left side. Since our specification is C code, we assume that the statements are executed sequentially, so that any use of a variable refers to its latest assigned value. In this case, the second value assigned to x is not used again in this block (presumably it is used elsewhere), so we just have to eliminate the multiple assignment to x. The result is shown in Figure 4.2 below, where we have used the names x1 and x2 to distinguish the separate assignments to x.

w = a + b;

x1 = a - c;

y = x1 + d;

x2 = a + c;

z = y + e;

Figure 4.2 The basic block in single-assignment form
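The renaming step described above can be sketched in a few lines. The following is a minimal illustration, not the procedure used inside SAST: the function name `to_single_assignment` and the representation of a statement as a (target, expression) pair are assumptions made for this example, and the sketch does not guard against a generated name such as x1 colliding with an existing variable.

```python
import re


def to_single_assignment(statements):
    """Rename variables so that each name is assigned at most once.

    statements is a list of (target, expression) pairs in program order.
    A variable that is assigned more than once gets numbered versions
    (x1, x2, ...); each later use refers to the most recent version.
    """
    # Count assignments to find the variables that need renaming.
    counts = {}
    for target, _ in statements:
        counts[target] = counts.get(target, 0) + 1

    version = {}  # number of versions created so far for each renamed variable
    current = {}  # original name -> name that currently holds its value
    result = []
    for target, expr in statements:
        # Replace every identifier in the expression by its current name.
        new_expr = re.sub(r"[A-Za-z_]\w*",
                          lambda m: current.get(m.group(0), m.group(0)),
                          expr)
        if counts[target] > 1:
            version[target] = version.get(target, 0) + 1
            new_target = f"{target}{version[target]}"
        else:
            new_target = target
        current[target] = new_target
        result.append((new_target, new_expr))
    return result


# The basic block of Figure 4.1.
block = [("w", "a + b"), ("x", "a - c"), ("y", "x + d"),
         ("x", "a + c"), ("z", "y + e")]
ssa_block = to_single_assignment(block)
for target, expr in ssa_block:
    print(f"{target} = {expr};")
```

Applied to the block of Figure 4.1, the sketch reproduces the statements of Figure 4.2.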

The single-assignment form is important because it allows us to identify a

unique location in the code where each named location is computed. As an

introduction to the data flow graph, we use two types of nodes in the graph: round nodes denote operators and square nodes represent values. The value

nodes may be either inputs to the basic block, such as a and b, or variables

assigned to within the block, such as w and x1. The data flow graph for our

single-assignment code is shown in Figure 4.3 below.

The single-assignment form means that the data flow graph is acyclic. If we assigned to x multiple times, then the second assignment would form a cycle

in the graph including x and the operators used to compute x. Keeping the data

flow graph acyclic is important in many types of analyses we want to do on the

graph.

Figure 4.3 An extended data flow graph for sample block
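To make the two node types and the acyclicity property concrete, the sketch below builds such a graph as a list of directed edges and checks it for cycles with Kahn's algorithm. The tuple layout of a statement, the operator node names (op0:+ and so on) and the second, cyclic example (x = a + b; x = x + c, where a reassigned variable is also read) are assumptions introduced for illustration only.

```python
def build_extended_dfg(statements):
    """Build a data flow graph with value nodes and operator nodes.

    statements: list of (target, left, op, right) tuples.
    Returns directed edges as (source, destination) pairs. Value nodes are
    variable names; operator nodes get hypothetical names such as "op0:+".
    """
    edges = []
    for i, (target, left, op, right) in enumerate(statements):
        op_node = f"op{i}:{op}"
        edges.append((left, op_node))    # operands flow into the operator
        edges.append((right, op_node))
        edges.append((op_node, target))  # the operator produces the target value
    return edges


def is_acyclic(edges):
    """Return True if the directed graph has no cycle (Kahn's algorithm)."""
    nodes = {n for edge in edges for n in edge}
    indegree = {n: 0 for n in nodes}
    for _, dst in edges:
        indegree[dst] += 1
    ready = [n for n in nodes if indegree[n] == 0]
    visited = 0
    while ready:
        node = ready.pop()
        visited += 1
        for src, dst in edges:
            if src == node:
                indegree[dst] -= 1
                if indegree[dst] == 0:
                    ready.append(dst)
    # Nodes on a cycle never reach indegree 0, so they are never visited.
    return visited == len(nodes)


# Single-assignment block of Figure 4.2.
single = [("w", "a", "+", "b"), ("x1", "a", "-", "c"), ("y", "x1", "+", "d"),
          ("x2", "a", "+", "c"), ("z", "y", "+", "e")]
# Hypothetical block that reads x while reassigning it.
reassigned = [("x", "a", "+", "b"), ("x", "x", "+", "c")]

single_ok = is_acyclic(build_extended_dfg(single))
reassigned_ok = is_acyclic(build_extended_dfg(reassigned))
print(single_ok, reassigned_ok)
```

In the second block the single value node x is both an input and the output of the same operator, which is the kind of cycle that renaming to x1 and x2 removes.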

The data flow graph is generally drawn in the form shown in Figure 4.4

below. Here, the variables are not explicitly represented by nodes. Instead, the

edges are labeled with the variables they represent. As a result, a variable can be

represented by more than one edge. However, the edges are directed and all the

edges for a variable must come from a single source. We use this form for its

simplicity and compactness.

Figure 4.4 Standard data flow graph for sample basic block

The data flow graph for the code makes the order in which the operations

are performed in the C code much less obvious. This is one of the advantages of

the data flow graph. We can use it to determine feasible reorderings of the

operations, which may help us to reduce pipeline or cache conflicts. We can also

use it when the exact order of operations simply doesn't matter. The data flow

graph defines a partial ordering of the operations in the basic block. We must

ensure that a value is computed before it is used, but generally there are several

possible orderings of evaluating expressions that satisfy this requirement.
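The partial ordering can be written down as a set of precedence constraints between statements. The sketch below derives them for the single-assignment block of Figure 4.2 and checks whether a proposed execution order respects them. The function names and the (target, operands) representation are hypothetical, and statements are identified by their index in the block.

```python
def precedence_constraints(statements):
    """Return (producer, consumer) index pairs for a single-assignment block.

    A pair (i, j) means statement j reads the value defined by statement i,
    so i must be executed before j.
    """
    defined_at = {}
    constraints = set()
    for j, (target, operands) in enumerate(statements):
        for name in operands:
            if name in defined_at:
                constraints.add((defined_at[name], j))
        defined_at[target] = j
    return constraints


def is_feasible(order, constraints):
    """Check that every producer appears before its consumer in the order."""
    position = {stmt: k for k, stmt in enumerate(order)}
    return all(position[i] < position[j] for i, j in constraints)


# Statements 0..4 of Figure 4.2: w, x1, y, x2, z.
block = [("w", ("a", "b")), ("x1", ("a", "c")), ("y", ("x1", "d")),
         ("x2", ("a", "c")), ("z", ("y", "e"))]
constraints = precedence_constraints(block)
print(sorted(constraints))

reordered_ok = is_feasible([3, 0, 1, 2, 4], constraints)  # x2 computed first
broken = is_feasible([2, 1, 0, 3, 4], constraints)        # y before x1
print(reordered_ok, broken)
```

Only two constraints arise (x1 before y, and y before z), so w and x2 can be placed anywhere in the order.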

4.2 Data Flow Graphs for Array Operations

We introduced array access and array change notations into the data flow graphs so that they can represent array operations. From these data flow graphs we can construct the control data flow graph (CDFG), which is used as the input to the preprocessor.

4.2.1 Array access notation

The array access notation is represented in a data flow graph as shown in Figure 4.5. Consider the array variable X[i].

Figure 4.5 Data flow graph for array access notation

4.2.2 Array change notation

Consider the array operation shown below. It can be represented in a data flow graph using the array change notation, as shown in Figure 4.6.

X[i] = Y

Figure 4.6 Data flow graph for array change notation
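One possible way to model the two notations as data flow graph nodes is sketched below. The operator symbols "[ ]" and "[ ]=" are taken from the node labels of the figures; everything else is an assumption made for illustration, in particular that an access node takes the array and the index as inputs, that a change node takes the array, the index and the new value, and that a change node outputs the updated array as a whole.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DfgNode:
    op: str        # "[ ]" for array access, "[ ]=" for array change
    inputs: tuple  # names of the values flowing into the node
    output: str    # name of the value the node produces


def array_access(array, index, result):
    """Node for result = array[index]."""
    return DfgNode("[ ]", (array, index), result)


def array_change(array, index, value):
    """Node for array[index] = value; assumed to produce the updated array."""
    return DfgNode("[ ]=", (array, index, value), array)


def evaluate(node, env):
    """Apply one node to an environment (name -> value); return an updated copy."""
    env = dict(env)
    if node.op == "[ ]":
        array, index = node.inputs
        env[node.output] = env[array][env[index]]
    elif node.op == "[ ]=":
        array, index, value = node.inputs
        updated = list(env[array])  # copy so the input environment is untouched
        updated[env[index]] = env[value]
        env[node.output] = updated
    else:
        raise ValueError(f"unsupported operator: {node.op}")
    return env


env = {"X": [5, 6, 7], "i": 1, "Y": 9}
after_change = evaluate(array_change("X", "i", "Y"), env)       # X[i] = Y
after_access = evaluate(array_access("X", "i", "t"), after_change)  # t = X[i]
print(after_change["X"], after_access["t"])
```

Treating the change node as producing a new value of the whole array keeps later accesses to the same array dependent on the change, which is what a scheduler needs to order array operations safely.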

Figure 4.7 shows the flow chart for the lift controller, which includes array operations, and Figure 4.8 shows the corresponding data flow graph.


Figure 4.7 Flow chart for lift controller

[Flow chart not reproduced. Process nodes: req = a ^ b; current = 0, scan = 0, move = 1, up = 1, doorclose = 1, atfloor = 1; move = 1; move = 0, doorclose = 0, req[current] = 0, scan = current; current = current+1; current = current-1; scan = scan+1; scan = scan-1; up = 0; up = 1. Decision nodes: req == 1; req[scan] == 1; req[current] == 1; up == 1; scan >= 7; scan <= 0.]

[Data flow graph not reproduced. Node and edge labels: req, scan, current, [ ], [ ]=, ==, req[scan], req[current].]

Figure 4.8 Data flow graph for lift controller

Chapter 5

Control Data Flow Graphs

Most HLS systems require their input in the form of a data flow graph, which is used in all the phases. Manually writing a data-flow specification along with the control parameters is time consuming and error-prone. Instead, a behavioral design specification at a very high abstraction level is provided as input to these systems. The designs are coded in high-level languages like C or hardware description languages like VHDL and Verilog. The translation scheme comprises the methodologies used to translate the high-level specification into a CDFG. These methodologies are similar to those in the front end of a typical compiler flow. In this section we give details of the CDFG representation used, the VHDL subset and the methodology of the translation scheme.

5.1 CDFG Representation

The CDFG representation used for SAST is block based. The Control Data Flow Graph is a directed graph that can be represented as B = (V, E). Each node v ∈ V represents a basic block. The data dependency is maintained in each basic block. Moreover, each basic block can itself be represented by a data flow graph, each node of which is a three-address instruction specifying the operation to be performed. The basic blocks that branch contain a single conditional statement representing a conditional construct such as IF, CASE or LOOP. Thus the directed edges in the CDFG represent the transfer of a value or of control from one node to another.

The interpretation of B is imperative: an operation is executed after one of its predecessors is executed. The CDFG representation includes READ and WRITE operations for reading data from and writing data to ports. There is a directed edge from block bi to bj if bj immediately follows bi in some execution sequence; that is, if

• there is a conditional or unconditional jump from the last statement

of bi to the first statement of bj , or

• bj immediately follows bi in the order of the program, and bi does

not end in an unconditional jump.
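The two edge rules above can be sketched as follows. This is an illustration only: the function name and the encoding of a block's last statement as None (no jump), ("goto", target) or ("if", target) are assumptions, and the four-block example is hypothetical rather than taken from the thesis.

```python
def block_successors(blocks):
    """Derive CDFG edges from basic blocks listed in program order.

    blocks: list of (name, jump) pairs, where jump describes the last
    statement of the block: None, ("goto", target) or ("if", target).
    """
    edges = []
    for k, (name, jump) in enumerate(blocks):
        if jump is not None:
            # Rule 1: a conditional or unconditional jump to the target block.
            edges.append((name, jump[1]))
        falls_through = jump is None or jump[0] == "if"
        if falls_through and k + 1 < len(blocks):
            # Rule 2: the next block in program order follows, unless this
            # block ends in an unconditional jump.
            edges.append((name, blocks[k + 1][0]))
    return edges


# Hypothetical loop: B1 tests the exit condition, B2 is the loop body.
blocks = [("B0", None), ("B1", ("if", "B3")), ("B2", ("goto", "B1")), ("B3", None)]
edges = block_successors(blocks)
print(edges)
```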

We say that bi is a predecessor of bj, and bj is a successor of bi. A basic block is represented by a record consisting of the number of three-address statements in the basic block, the list of those statements, and the list of logical successors of the basic block. Both the source and the result variables of the operations in the CDFG are converted into a sanitized form: if a variable v occurring in an operation is at the i-th position in the symbol table, the sanitized operation contains i in place of v. In translating a functional specification to a register transfer level (RTL) design, one needs to know the data dependency information in each basic block in order to do scheduling and data-path allocation. Within each basic block this dependency is preserved between the operations.
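A minimal sketch of the sanitization step is given below. It assumes that symbol table positions start at 0, that a variable enters the table the first time it is encountered, and that every operation has the form (result, operator, source1, source2); none of these details are fixed by the text, and the handling of constants is left out.

```python
def sanitize(operations):
    """Replace each variable name by its position in the symbol table.

    operations: list of (result, op, source1, source2) tuples of names.
    Returns the symbol table (a list of names) and the sanitized operations.
    """
    symbol_table = []
    index_of = {}

    def lookup(name):
        # Add the variable to the symbol table the first time it is seen.
        if name not in index_of:
            index_of[name] = len(symbol_table)
            symbol_table.append(name)
        return index_of[name]

    sanitized = [(lookup(result), op, lookup(src1), lookup(src2))
                 for result, op, src1, src2 in operations]
    return symbol_table, sanitized


# Hypothetical three-address operations: w = a + b; y = w - c.
operations = [("w", "+", "a", "b"), ("y", "-", "w", "c")]
symbol_table, sanitized = sanitize(operations)
print(symbol_table)
print(sanitized)
```

Working with integer positions instead of names lets later stages, such as live set construction, index directly into per-variable tables.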

5.2 Example Representation of CDFG

Figure 5.1 shows the CDFG representation of the GCD behavioral description. In the figure, B0 is the initial block and B3 is the final block.

Figure 5.1 CDFG representation of GCD

In B0, the values of y1 and y2 are read from the read ports, and the final output is written in block B3.

20
B0 2 read (p0, a) read (p1, b)
B1 6 current = 0 scan = 0 move = 0 up = 1 doorclose = 1 atfloor = 1
B2 1 req1 = a^b
C1 1 req1 == 1
B3 1 req = req1
C2 1 req[scan] == 1
B4 1 move = 1
C3 1 req[current] == 1
C4 1 up == 1
B5 1 current = current+1
B6 1 current = current-1
B7 4 move = 0 doorclose = 0 req[current] = 0 scan = current
C5 1 req == 1
C6 1 up == 1
B8 1 scan = scan+1
C7 1 scan >= 7
B9 1 up = 0
B10 1 scan = scan-1
C8 1 scan <= 0
B11 1 up = 1

20
B0 1 B1
B1 1 B2
B2 1 C1
C1 2 0 B0 1 B3
B3 1 C2
C2 2 0 C5 1 B4
B4 1 C3
C3 2 0 C4 1 B7
C4 2 0 B6 1 B5
B5 1 C3
B6 1 C3
B7 1 C5
C5 2 0 B0 1 C6
C6 2 0 B10 1 B8
B8 1 C7
C7 2 0 C2 1 B9
B9 1 C2
B10 1 C8
C8 2 0 C2 1 B11
B11 0

Figure 5.2 CDFG representations with array operations
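The successor part of the listing in Figure 5.2 can be read mechanically. The sketch below is based on an interpretation inferred from the listing itself, not on a documented file format: the leading number is taken to be the number of blocks, each record is a block name followed by its number of successors, a single successor is given by name alone, and two successors are given as (condition value, successor) pairs. The function name is hypothetical.

```python
CDFG_SUCCESSORS = """
20
B0 1 B1
B1 1 B2
B2 1 C1
C1 2 0 B0 1 B3
B3 1 C2
C2 2 0 C5 1 B4
B4 1 C3
C3 2 0 C4 1 B7
C4 2 0 B6 1 B5
B5 1 C3
B6 1 C3
B7 1 C5
C5 2 0 B0 1 C6
C6 2 0 B10 1 B8
B8 1 C7
C7 2 0 C2 1 B9
B9 1 C2
B10 1 C8
C8 2 0 C2 1 B11
B11 0
"""


def parse_successors(text):
    """Parse the successor records into {block: [(condition_value, successor)]}.

    An unconditional successor is stored with condition value None.
    """
    tokens = text.split()
    count = int(tokens[0])
    pos = 1
    successors = {}
    for _ in range(count):
        name, n = tokens[pos], int(tokens[pos + 1])
        pos += 2
        if n == 1:
            successors[name] = [(None, tokens[pos])]
            pos += 1
        else:
            # n is 0 (no successor) or the number of (value, block) pairs.
            successors[name] = []
            for _ in range(n):
                successors[name].append((int(tokens[pos]), tokens[pos + 1]))
                pos += 2
    return successors


successors = parse_successors(CDFG_SUCCESSORS)
print(successors["C1"])
```

Under this reading, the condition block C1 (req1 == 1) returns to B0 when the condition is false and continues to B3 when it is true.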

Chapter 6

Preprocessor

The preprocessor converts the input CDFG into an intermediate representation (IR), which consists of the precedence constraints (partial order) between the operations in each basic block, along with the sets of incoming (in) and outgoing (out) variables (location availability in the A-blocks) for each basic block. The data flow diagram of the preprocessor, which consists of several subtasks, is shown in Figure 6.1. Each component in Figure 6.1 is explained briefly as follows.

• Input operations get the list of operations for each basic block in the

CDFG, constructs symbol table of variables in the CDFG.

• Build successors constructs the flow of control information for each basic

block, value on which control goes to successor block logically for each

basic block.

• Construct live sets compute input and output variables for each basic

block from the list of operations in the basic block and flow control

information.

• Construct partial order computes the precedence constraints between the

operations in the basic block, from the live sets and operations of the basic

block.

• Generate intermediate form puts the basic block information in a manner

which is suitable for scheduling basic blocks with the existing scheduling

algorithm of SAST, from the live sets and partial order of the basic blocks.

Figure 6.1 Data flow diagram for preprocessor before scheduling

6.1 Live variable analysis

This section describes the computation of the incoming and outgoing sets of variables of each basic block in the CDFG, using the flow of control information between basic blocks and the list of operations in each basic block built by the sanitization. Data flow analysis is performed over the CDFG to find the incoming and outgoing variables of each basic block.

Definition Data flow analysis is a collection of information that summarizes the

creation/destruction of values in a program, used to identify legal optimization

opportunities.

In live variable analysis we determine, for each point p and each variable v in a program, whether the value of v at p could be used along some path in the CDFG starting at p. If so, we say v is live at p; otherwise v is dead at p. For example, a three address statement in basic block i with operation j, O_ij : x = y + z, is said to define x and to use y and z, provided y and z are not defined in i before operation O_ij. Four sets are maintained for each basic block i:

use_i : set of variables whose values may be used in i prior to any definition of the variable,

def_i : set of variables defined in i prior to any use of that variable in i,

in_i : set of variables live at the entry point of i,

out_i : set of variables live at the exit point of i.

These sets are used in computing the incoming and outgoing variable sets of i from the flow of control information and the list of operations in i.

Computation of use, def sets Simply stated, for each basic block i, use_i is the set of variables used before being defined in i, and def_i is the union of all LHS variables in i. Algorithm 1 computes the use_i and def_i sets: its inputs are the total number of basic blocks and the operations of each basic block in the CDFG, and it returns the use_i and def_i sets of each basic block i.

Computation of in, out sets The data flow equations that compute the in_i and out_i sets of each basic block i from the use_i and def_i sets of i and the flow of control information are given below, where succ(i) denotes the set of successors of i:

$$ \mathit{out}_i = \bigcup_{s \in \mathit{succ}(i)} \mathit{in}_s \qquad (1) $$

$$ \mathit{in}_i = \mathit{use}_i \cup (\mathit{out}_i \setminus \mathit{def}_i) \qquad (2) $$

From equation 1, a variable v is live coming out of a basic block iff it is live coming into one of its successors. Similarly, from equation 2, a variable v is live coming into a basic block i if either it is used before redefinition in i, or it is live coming out of i and is not redefined in i. The algorithm that computes the in_i and out_i sets of each basic block i from the flow of control information is given as algorithm 2.

6.2 Dependency graph extraction

After the live sets of each basic block i have been constructed, the partial order, or dependency graph, between the operations within each basic block i is constructed. A dependency graph of a basic block consists of nodes representing functional operators, control operators, and read/write operators corresponding to the I/O interface. Nodes are connected by arcs that represent either the communication of values or the ordering of I/O operations by dependencies. If a node N1 computes a value that is used by node N2, then there is a path from N1 to N2. The communication between nodes along the path represents whether the computed value is actually used. In addition to the write-before-write, read-before-write, and write-before-read dependencies that exist between normal operations, there exist read-before-read dependencies between operations on an I/O port, since the value present at a port is changed by the execution sequence of port operations. So, port operations in a basic block that correspond to the same port must have a dependency in the dependency graph. With this approach, the partial order constructed for the set of three address statements in figure 6.2 is shown in figure 6.3, with the incoming variables in = {3, dx, u, x, y}. The top level function that performs this task is construct Partial Order, which takes the total number of basic blocks, the list of operations in the basic blocks, and the input and output variable sets as input parameters, and returns the dependency flow graph between the operations within each basic block.
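The dependency rules described above can be sketched for one basic block as a pairwise test between an earlier and a later operation. The C code below is a simplified illustration and not the construct Partial Order function itself: it records an edge for every dependent pair (so edges implied by transitivity are recorded as well), assumes at most 16 operations per block, and uses hypothetical names throughout.

```c
#include <assert.h>
#include <string.h>

#define MAX_OPS 16
#define MAX_SRC 2

struct op {
    int dst;             /* index of the defined variable, or -1 */
    int nsrc;            /* number of source operands */
    int src[MAX_SRC];    /* indices of the source operands */
    int port;            /* I/O port index for read/write operations, else -1 */
};

/* Does operation o read variable v? */
static int reads(const struct op *o, int v)
{
    for (int k = 0; k < o->nsrc; k++)
        if (o->src[k] == v)
            return 1;
    return 0;
}

/* dep[j][k] = 1 when operation k must follow operation j (j < k). */
void build_partial_order(const struct op *ops, int nops,
                         int dep[MAX_OPS][MAX_OPS])
{
    assert(ops != NULL && nops >= 0 && nops <= MAX_OPS);
    memset(dep, 0, sizeof(int) * MAX_OPS * MAX_OPS);
    for (int j = 0; j < nops; j++)
        for (int k = j + 1; k < nops; k++) {
            const struct op *a = &ops[j], *b = &ops[k];
            int raw = a->dst >= 0 && reads(b, a->dst);    /* write-before-read */
            int war = b->dst >= 0 && reads(a, b->dst);    /* read-before-write */
            int waw = a->dst >= 0 && a->dst == b->dst;    /* write-before-write */
            int io  = a->port >= 0 && a->port == b->port; /* same I/O port */
            dep[j][k] = raw || war || waw || io;
        }
}
```

Two reads from the same port are ordered by the `io` test even though neither uses the value defined by the other, which is the read-before-read case for port operations.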

Figure 6.2 Code for Diffeq

6.3 Intermediate representation

The computation of the live variable in and out sets and of the partial order within each basic block is now complete. The next step is to put this information, which is used for scheduling with SAST, into an intermediate representation for each basic block. We also have to build the location availability in A-blocks of the incoming and outgoing variable sets of the basic blocks in the CDFG, so that the destinations of the incoming and outgoing variables required for scheduling with SAST are known; this information is taken as input from the user. The intermediate representation consists of the incoming and outgoing variable sets along with their location availability in A-blocks, the partial order within each basic block, and the module library containing the set of operators, with the different kinds available for each operator. So, each element j of the in_i or out_i set of basic block i is a 2-tuple whose first component v_j is the jth variable in in_i or out_i and whose second component is the availability of v_j before coming into i or after coming out of i.

Figure 6.3 Partial order for diffeq code in figure 6.2

6.4 Implementation

This section gives the details of the data structures used to represent array operations in the preprocessing stage.

The following existing data structure is used to implement the preprocessor that handles array operations.

typedef struct A   // struct A holds necessary information to represent basic block
{
    char bid[10];      // basic block id
    char **ip;         // holds input operations in the block
    int **index;       // sanitized indexes of i/p operations
    int **extra;       // holds extra operations
    int *exin;         // constants in the basic block
    int nopn;          // total number of operations in block
    int nextra;        // total extra operations
    int exnin;         // extra inputs for constants
    int *in;           // variables live at the block entry
    int *out;          // variables live at exit of a block
    int *use;          // list of var. being used before defined
    int *def;          // list of var. being defined in this block
    int *total;        // total src operands in operations
    int **order;
    int *npred;        // number of predecessors as operations
    int *ncprd;        // number of i/p variables as operands
    int *pred;
    int *read;
    int *write;
    double *nconst;    // constants in the operations
    int *ncidx;        // number of A-blocks where i/p variables being stored
    int **clist;       // A-blocks of i/p variables
    int *plist;        // list of predecessor operations
    int *pflst;        // type of predecessors opn/ var assgmt
    int *xclist;       // A-blocks of outgoing variables
    int nin, nout, nuse, ndef;   // no. of in, out, use and def sets
    int block;
    struct B *succ;    // successor blocks, on what val control goes
} *block;

The following existing functions in the preprocessor have been modified so that the preprocessor handles array operations.

int get_Operand; //gets the operands from the input equations

double add_Operand;

int get_Operator; //gets the operators from the input equations

void add_Operator;
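To illustrate the kind of distinction these functions have to make, the C sketch below classifies a statement of the CDFG text as an array access (an indexed operand that is read, operator `[]` in the module library) or an array change (an assignment to an indexed operand, operator `[]=`). It is a hypothetical helper written for this explanation and does not reproduce the modified get_Operand or get_Operator; it assumes at most one indexed operand per statement, as in the lift controller example.

```c
#include <assert.h>
#include <string.h>

enum array_kind { NO_ARRAY, ARRAY_ACCESS, ARRAY_CHANGE };

/* Classify one statement such as "req[scan] == 1" or "req[current] = 0". */
enum array_kind classify_array_op(const char *stmt)
{
    assert(stmt != NULL);
    const char *rb = strchr(stmt, ']');
    if (strchr(stmt, '[') == NULL || rb == NULL)
        return NO_ARRAY;
    /* Skip blanks after ']' and look at what follows the indexed operand. */
    const char *p = rb + 1;
    while (*p == ' ')
        p++;
    /* A single '=' right after name[index] assigns to the element;
       "==" is a comparison, which only reads the element. */
    if (p[0] == '=' && p[1] != '=')
        return ARRAY_CHANGE;
    return ARRAY_ACCESS;
}
```

With the statements of the lift controller, `req[current] = 0` in B7 is classified as an array change, while the conditions `req[scan] == 1` and `req[current] == 1` are array accesses.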

The following files are taken as input to the preprocessor: the basic block information, operations and control steps in the file design.cdfg; the architectural parameters in the file design.arch; and the module library information in the file design.opr.

Chapter 7

Experimentation and Results

7.1 Results

The lift controller behavior is encoded and the corresponding CDFG is extracted. The preprocessing phase of SAST has been enhanced and tested successfully with the lift controller benchmark. The encoded benchmark covers all the array operations that the preprocessor can handle.

The preprocessor uses the following input files and produces the output files given below.

Inputs: 3 different files

(i) Lift_221.cdfg -- the input CDFG.

(ii) Lift_221.opr -- the module library with the set of operations.

(iii) Lift_221.arch -- the architectural parameter file.

Outputs: 3 different files

1. Lift_2211.po: .po denotes the partial order. This file contains the list of three address (3A) operations in the file "Lift_221.cdfg" represented as a partial order, that is, with the precedence information between the 3A operations. It also contains the control information between the basic blocks as given in the file "Lift_221.cdfg".

2. Lift_2211.bb: .bb denotes the basic block information. This file contains the architectural parameters (copied from the .arch file) followed by the live variable input and output sets of the basic blocks. The live variable sets also contain the A-block location availability for both the input and output sets.

3. Lift_2211.opr: contains the module library information as specified in the file "Lift_221.opr", with the type field of each operator removed.

Lift controller: The design has a request register in which the pending requests of users are stored. This register is an n-bit register, where n is the number of floors. The variables used in this design are:

current -- current floor of the lift carriage -- integer,

req -- bits of this register contain pending requests from the floors -- bit vector having as many bits as there are floors,

scan -- used to index the variable "req" when examining whether there is a pending request from a floor -- integer,

atfloor -- a (single) bit -- indicates whether the lift carriage is at a floor (value 1) or in between two floors (value 0),

move -- a bit -- indicates whether the lift is moving (value 1) or stationary (value 0),

up -- a bit -- indicates whether the lift is moving up, or is to do so (value 1), or moving down (value 0),

doorclose -- a bit -- indicates whether the door is closed.

The detailed results of the lift design are given in section 7.4, Lift Controller Results.
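The two array operations applied to req, reading one bit (as in the condition req[scan] == 1) and changing one bit (as in req[current] = 0), can be mimicked in software with shifts and masks. The C sketch below is only a behavioral illustration of these operations on a bit vector; it says nothing about how SAST realizes them in hardware. The 8-bit width is an assumption suggested by the test scan >= 7 in the CDFG, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NFLOORS 8   /* assumption: floors 0..7 */

/* Array access req[i]: value of bit i of the request register. */
int req_get(uint8_t req, int i)
{
    assert(i >= 0 && i < NFLOORS);
    return (req >> i) & 1;
}

/* Array change req[i] = bit: returns the updated request register. */
uint8_t req_set(uint8_t req, int i, int bit)
{
    assert(i >= 0 && i < NFLOORS);
    if (bit)
        return (uint8_t)(req | (1u << i));
    return (uint8_t)(req & ~(1u << i));
}
```

For instance, with pending requests at floors 2 and 4 (req = 0x14), serving floor 2 corresponds to `req_set(req, 2, 0)`, which leaves only the request at floor 4.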

7.2 Conclusions

This work is concerned with the development of an HLS tool that supports array and bit vector data types and the operations on them.

In this work we encoded a new benchmark, a lift controller, that contains array operations, and from it we extracted the CDFG along with the architectural parameters and the module library information. We introduced array access and array change notations into the DFG so that the corresponding CDFG is supported by the preprocessor, and we enhanced the preprocessor to support the array operations.

7.3 Future Work

SAST takes a VHDL behavioral description and produces an RTL description in synthesizable Verilog. To make it more effective and efficient, the following enhancements can be made:

• Array variable clustering can be used in the allocation of memories to array variables, which optimizes the number of global memories.

• Compiler optimizations such as code re-writing techniques and loop transformations can be included in the translation phase or the preprocessing phase, which reduces the number of memory operations.

7.4 Lift Controller Results

In this example the pending requests from the users are stored in the req register. The lift controller stops the lift wherever the req bit is set to 1, and afterwards resets that bit to 0.

The CDFG for the lift controller example is as follows. It has 20 basic blocks.

20
B0 2 read (p0, a) read (p1, b)
B1 6 current = 0 scan = 0 move = 0 up = 1 doorclose = 1 atfloor = 1
B2 1 req1 = a^b
C1 1 req1 == 1
B3 1 req = req1
C2 1 req[scan] == 1
B4 1 move = 1
C3 1 req[current] == 1
C4 1 up == 1
B5 1 current = current+1
B6 1 current = current-1
B7 4 move = 0 doorclose = 0 req[current] = 0 scan = current
C5 1 req == 1
C6 1 up == 1
B8 1 scan = scan+1
C7 1 scan >= 7
B9 1 up = 0
B10 1 scan = scan-1
C8 1 scan <= 0
B11 1 up = 1
20
B0 1 B1
B1 1 B2
B2 1 C1
C1 2 0 B0 1 B3
B3 1 C2
C2 2 0 C5 1 B4
B4 1 C3
C3 2 0 C4 1 B7
C4 2 0 B6 1 B5
B5 1 C3
B6 1 C3
B7 1 C5
C5 2 0 B0 1 C6
C6 2 0 B10 1 B8
B8 1 C7
C7 2 0 C2 1 B9
B9 1 C2
B10 1 C8
C8 2 0 C2 1 B11
B11 0

Module Library Information

Module library information used in the lift controller is given below. It has nine

operators.

9
2 2 2 2 2 2 2 2 2
= 0 5 1 1 5 1 1
[] 0 10 1 1 10 1 1
[]= 0 10 1 1 10 1 1
^ 0 10 1 1 10 1 1
== 1 10 1 1 10 1 1
+ 0 10 1 1 10 1 1
- 0 10 1 1 10 1 1
>= 1 10 1 1 10 1 1
<= 1 10 1 1 10 1 1

Architectural Parameters

1 2 2 2 2 0 0 19
B0 0 1 0 0 1 0 1 0 1 0
B1 0 1 0 0 1 0 1 0 1 0
B2 0 1 0 0 1 0 1 0 1 0
C1 0 1 0 0 1 0 1 0 1 0
B3 0 1 0 0 1 0 1 0 1 0
C2 0 1 0 0 1 0 1 0 1 0
B4 0 1 0 0 1 0 1 0 1 0
C3 0 1 0 0 1 0 1 0 1 0
C4 0 1 0 0 1 0 1 0 1 0
B5 0 1 0 0 1 0 1 0 1 0
B6 0 1 0 0 1 0 1 0 1 0
B7 0 1 0 0 1 0 1 0 1 0
C5 0 1 0 0 1 0 1 0 1 0
C6 0 1 0 0 1 0 1 0 1 0
C7 0 1 0 0 1 0 1 0 1 0
B9 0 1 0 0 1 0 1 0 1 0
C8 0 1 0 0 1 0 1 0 1 0
B10 0 1 0 0 1 0 1 0 1 0
B11 0 1 0 0 1 0 1 0 1 0

Basic Block Information

1 2 2 2 2
B0 0 0 2 0 0 1 0 1 1 1 1
B1 2 4 0 1 0 1 1 1 10 2 0 1 11 2 0 1 5 0 0 0 0 1 1 0 1 2 0 1 0 3 0 1 1 5 1 1 3
B2 5 5 0 1 0 1 1 1 2 1 0 3 1 0 5 1 1 4 2 0 0 2 3 1 0 3 5 0 0 4 8 0 1 0
C1 4 5 2 1 0 3 1 1 5 1 0 8 1 0 11 2 0 1 4 2 0 0 0 3 1 0 1 5 0 0 2 8 0 0 3
B3 4 4 2 1 0 3 1 1 5 1 0 8 1 0 4 2 0 0 0 3 1 0 1 5 0 0 2 9 0 1 0
C2 4 5 2 1 0 3 1 1 5 1 0 9 1 0 11 2 0 1 4 2 0 0 0 3 1 0 1 5 0 0 2 9 0 0 3
B4 3 4 2 1 0 5 1 0 9 1 0 11 2 0 1 3 2 0 0 0 5 1 0 1 9 0 0 2
C3 3 4 2 1 0 5 1 1 9 1 0 11 2 0 1 3 2 0 0 0 5 1 0 1 9 0 0 2
C4 3 4 2 1 0 5 1 1 9 1 0 11 2 0 1 3 2 0 0 0 5 1 0 1 9 0 0 2
B5 3 4 2 1 0 5 1 1 9 1 0 11 2 0 1 3 2 0 1 0 5 1 0 1 9 0 0 2
B6 3 4 2 1 0 5 1 1 9 1 0 11 2 0 1 3 2 0 1 0 5 1 0 1 9 0 0 2
B7 2 3 2 1 0 5 1 1 10 2 0 1 4 2 0 0 0 3 1 1 3 5 0 0 1 9 0 1 2
C5 4 5 2 1 0 3 1 1 5 1 0 9 1 0 11 2 0 1 4 2 0 0 0 3 1 0 1 5 0 0 2 9 0 0 3
C6 4 5 2 1 0 3 1 1 5 1 0 9 1 0 11 2 0 1 4 2 0 0 0 3 1 0 1 5 0 0 2 9 0 0 3
B8 4 5 2 1 0 3 1 1 5 1 0 9 1 0 11 2 0 1 4 2 0 0 0 3 0 1 0 5 0 0 2 9 0 0 3
C7 4 5 2 0 3 0 5 0 9 0 12 2 0 1 4 2 0 0 0 3 1 0 1 5 0 0 2 9 0 0 3
B9 3 4 2 1 0 3 1 1 9 1 0 10 2 0 1 4 2 0 0 0 3 1 0 1 5 0 1 0 9 0 0 2
B10 4 5 2 1 0 3 1 1 5 1 0 9 1 0 11 2 0 1 4 2 0 0 0 3 1 1 0 5 0 0 2 9 0 0 3
C8 4 5 2 1 0 3 1 1 5 1 0 9 1 0 10 2 0 1 4 2 0 0 0 3 1 0 1 5 0 0 2 9 0 0 3
B11 0 1 11 2 0 1 0
13 a b current scan move up doorclose atfloor req1 req 0 1 7
2 p0 p1

Partial Information

20
B0 2
read (p0, a) 0 0 1 0 -1 1 0 0 1 0 0
read (p1, b) 0 1 1 0 -1 1 1 0 1 0 1
B1 6
current = 0 0 2 0 0 -1 1 10 0 1 1 2 0
scan = 0 0 3 0 0 -1 1 10 0 1 1 2 0
move = 0 0 4 0 0 -1 1 10 0 1 1 2 0
up = 1 0 5 0 0 -1 1 11 0 1 1 3 0
doorclose = 1 0 6 0 0 -1 1 11 0 1 1 3 0
atfloor = 1 0 7 0 0 -1 1 11 0 1 1 3 0
B2 1
req1 = a^b 3 8 0 0 -1 2 0 1 0 2 0 0 1
C1 1
req1 == 1 4 32767 0 0 -1 2 8 11 0 2 1 3 4 0
B3 1
req = req1 0 9 0 0 -1 1 8 0 1 0 3
C2 1
req[scan] == 1 4 32767 0 0 -1 2 9 3 0 2 1 3 4 0
B4 1
move = 1 0 4 0 0 -1 1 11 0 1 1 3 0
C3 1
req[current] == 1 4 32767 0 0 -1 2 9 11 0 2 1 2 3 0
C4 1
up == 1 4 32767 0 0 -1 2 5 11 0 2 1 1 3 0
B5 1
current = current+1 5 2 0 0 -1 2 2 11 0 2 1 0 3 0
B6 1
current = current-1 6 2 0 0 -1 2 2 11 0 2 1 0 3 0
B7 4
move = 0 0 4 0 0 -1 1 10 0 1 1 2 0
doorclose = 0 0 6 0 0 -1 1 10 0 1 1 2 0
req[current] = 0 0 9 0 0 -1 1 10 0 1 1 2 0
scan = current 0 3 0 0 -1 1 2 0 1 0 0
C5 1
req == 1 4 32767 0 0 -1 2 9 11 0 2 1 3 4 0
C6 1
up == 1 4 32767 0 0 -1 2 5 11 0 2 1 2 4 0
B8 1
scan = scan+1 5 3 0 0 -1 2 3 11 0 2 1 1 4 0
C7 1
scan >= 7 7 32767 0 0 -1 2 3 12 0 2 1 1 4 0
B9 1
up = 0 0 5 0 0 -1 1 10 0 1 1 3 0
B10 1
scan = scan-1 6 3 0 0 -1 2 3 11 0 2 1 1 4 0
C8 1
scan <= 0 8 32767 0 0 -1 2 3 10 0 2 1 1 4 0
B11 1
up = 1 0 5 0 0 -1 1 11 0 1 1 0 0
20
B0 1 B1
B1 1 B2
B2 1 C1
C1 2 0 B0 1 B3
B3 1 C2
C2 2 0 C5 1 B4
B4 1 C3
C3 2 0 C4 1 B7
C4 2 0 B6 1 B5
B5 1 C3
B6 1 C3
B7 1 C5
C5 2 0 B0 1 C6
C6 2 0 B10 1 B8
B8 1 C7
C7 2 0 C2 1 B9
B9 1 C2
B10 1 C8
C8 2 0 C2 1 B11
B11 0

