Representing Bit Vectors and Arrays in Control Data Flow Graph for High Level Synthesis
A thesis submitted in partial fulfillment of the requirements
for the degree of
Master of Technology in
Computer Science and Engineering by
Chaithanya Kiran Kona (06CS6007)
Under the guidance of Dr. Dipankar Sarkar
and Dr. Chittaranjan Mandal
Dept. of Computer Science and Engineering Indian Institute of Technology
Kharagpur
April 2008
Certificate May, 2008
This is to certify that the thesis entitled “Representing Bit Vectors and Arrays in Control Data
Flow Graph for High Level Synthesis” submitted to the Department of Computer Science and
Engineering, Indian Institute of Technology, Kharagpur by Chaithanya Kiran Kona (Roll No.
06CS6007) for the partial fulfillment of the requirements for the award of degree of Master of
Technology in Computer Science and Engineering is a bonafide record of the work carried out
by him under my supervision and guidance. The research report and results embodied in this
thesis have not been submitted for any other degree or diploma in any other University or
Institute.
Prof. C.R.Mandal, Dept. of Computer Science and Engineering, Indian Institute of Technology, Kharagpur – 721302
Prof. Dipankar Sarkar, Dept. of Computer Science and Engineering, Indian Institute of Technology, Kharagpur – 721302
Acknowledgement
I avail this unique opportunity to express my gratitude and indebtedness to my project
supervisors Prof. Dipankar Sarkar and Prof. C.R.Mandal, Department of Computer Science and
Engineering, Indian Institute of Technology, Kharagpur, for their sustained interest, advice,
perpetual encouragement and thoughtful constructive criticism during the course of the
investigation and preparation of the manuscript.
I am sincerely grateful to Prof. Indranil Sen Gupta, Professor and Head, Department of
Computer Science and Engineering, Indian Institute of Technology, Kharagpur for providing all
necessary facilities for the successful completion of my project.
I further acknowledge Chandan Karfa for his support to enhance the features of the
synthesis tool. I would like to extend my heartfelt thanks to my friends for their support and help
to overcome the difficulties by always being with me in my ups and downs during the project.
I would also like to thank the supporting staff of Department of Computer Science and
Engineering for their timely help and assistance.
Abstract
High-Level Synthesis (HLS) comprises translating a behavioral specification into
its corresponding Register Transfer Level (RTL) specification of the system. Structured
Architecture Synthesis Tool (SAST) takes the behavioral description of an input design
and outputs the synthesizable RTL Verilog code. This work involves enhancing the
SAST by adding interfaces for the verifier.
The first step of HLS involves deriving control and data-path information from
the behavioral code into a Control Data Flow Graph (CDFG). We added array access
notations to the existing CDFG representation. Using that representation we can
schedule the array operations.
The preprocessor converts the input CDFG into an intermediate representation (IR)
which consists of the precedence constraints or partial order between the operations in
each basic block, along with the incoming (in) and outgoing (out) variable sets for each
basic block. We implemented the preprocessing stage of SAST so that it handles array
operations. The generated intermediate representation is used as input to the
scheduler.
Key words: High-Level synthesis, DFG, CDFG, RTL, Preprocessor
Contents
1 Introduction
1.1 High-level Synthesis
1.2 Contributions of the Present Work
1.3 Organization of thesis
2 Related Works
3 Structured Architecture Synthesis Tool (SAST)
3.1 Target Architecture
3.2 Features of SAST
4 Data Flow Graphs
4.1 Data Flow Graphs (DFG)
4.2 Data Flow Graphs for Array Operations
4.2.1 Array access notation
4.2.2 Array change notation
5 Control Data Flow Graphs (CDFG)
5.1 CDFG Representation
5.2 Example Representation of CDFG
6 Preprocessor
6.1 Live variable analysis
6.2 Dependency Graph Extraction
6.3 Intermediate representation
6.4 Implementation
7 Experimentation and Results
7.1 Results
7.2 Conclusions
7.3 Future Work
7.4 Lift controller results
Bibliography
List of Figures
3.1 Hand-in-hand synthesis and verification
3.2 Schematic of structured architecture
4.1 A basic block in C
4.2 The basic block in single-assignment form
4.3 An extended data flow graph for sample block
4.4 Standard data flow graph for sample basic block
4.5 Array access notation
4.6 Array change notation
4.7 Flow chart for lift controller
4.8 Data flow graph for lift controller
5.1 CDFG representation of GCD
5.2 CDFG representations with array operations
6.1 Data flow diagram for preprocessor before scheduling
6.2 Code for diffeq
6.3 Partial order for diffeq code
Chapter 1
Introduction
1.1 High-level Synthesis
Synthesis is the process of translating a behavioral description into a
structural description. Synthesis is also defined as the process of
interconnecting primitive components at a certain level of abstraction (target
level) to realize a specification at a higher level of abstraction (source level). The
transformation of the design at the source level is carried out to achieve some
predefined performance goals or constraints.
Several levels of the synthesis process are system synthesis, high level
synthesis, logic synthesis and layout synthesis. High level synthesis (HLS) takes
algorithmic or high level behavior as input and outputs the register transfer level
(RTL) description consisting of functional units, storage and interconnecting
units. Individual synthesis systems cater to different constraint goal sets. Typical
user constraints are area, clock speed and power.
The high-level synthesis (HLS) process consists of translating a behavioral
specification into an RTL structural description containing a data path and a
controller so that the data transfers under the control of the controller exhibit the
specified behavior. The input behavioral description to the HLS is represented as a
control data flow graph (CDFG) and the output of the HLS is a structural data
path of interconnected components and a controller. The synthesis process
consists of several interdependent sub-tasks: scheduling, allocation,
binding and controller design. The operations in the behavioral description are
assigned control steps through a scheduling process. Each control step is the
basic time unit of the synchronous digital system. The allocation process
computes the minimum number of functional units and registers required to
synthesize the design based on the scheduling information of the operations and
the operators available in the component library. The variables are mapped to
registers and the operations are mapped to functional units by the binding
process. Finally, the controller is designed based on the data transfers required
among the data path elements in different control steps.
1.2 Contributions of the Present Work
The goal of the present work is to incorporate arrays and bit vectors
into SAST. SAST takes a behavioral description containing array variables as input
and produces synthesizable RTL Verilog as output. It takes a resource
library and user architectural constraints as additional inputs.
The existing system already provides the following features.
• A compiler that converts the VHDL behavioral specification into an
intermediate form (CDFG). This CDFG is later processed to extract the
dependencies between the operations which will be used in scheduling
phase.
• A GA based scheduling algorithm, which takes the behavioral description
in the form of a CDFG. Each operation in the CDFG is represented in three
address code. An operator library and the architectural constraints are
taken as additional inputs to the scheduling algorithm.
• To reduce the number of control steps and the resource requirement, a
method of handling the variable assignments has been devised which is
different from that of handling operations.
• Construction of the data path and controller for the scheduled input
design and generating the synthesizable RTL verilog for both, the data
path and the controller.
• A system to verify the results generated at each step of synthesis process
is implemented. It takes the two FSMDs from two steps of synthesis
process as input and finds the equivalence between them.
• To increase the performance, operation chaining feature was added to the
system on a per block basis based on a delay model.
To make SAST a full-fledged high level synthesis system that handles array
operations, the present work enhances and extends the existing system as
follows.
• We encoded a lift controller behavior that contains array
operations and extracted the corresponding CDFG, which is used as
input to SAST.
• We added array access and change notations to the existing CDFG
representation to handle the array operations.
• We enhanced the preprocessing phase of SAST by introducing array
access and change notations in the DFG (data flow graph).
1.3 Organization of thesis
The thesis for the present work is organized as follows:
• Chapter 1 introduces HLS and the phases involved in it, including the
basic block representation, which involves operations on data arrays, and
the scheduler, which assigns control steps to the memory operations along
with arithmetic and logic operations.
• Chapter 2 discusses related work.
• Chapter 3 describes the Structured Architecture Synthesis Tool
(SAST).
• Chapter 4 describes data flow graphs and the array access and array
change notations.
• Chapter 5 describes the CDFG representation for array operations.
• Chapter 6 describes the preprocessing of array operations.
• Chapter 7 presents the experimental results along with conclusions and
future work.
Chapter 2
Related Works
The use of arrays and records in modern hardware-description languages
(HDL) allows designs to be modeled at very high levels of abstraction. Many
behavioral descriptions for manipulating large amounts of data computations
use array variables to represent data storages.
Aggregate data types such as arrays are often used in describing designs
at higher-levels of abstraction. These complex types are useful for grouping
related data into a single object, which makes the description more readable and
concise. From a design style point of view, the use of aggregate types makes the
design description more maintainable.
Another reason for the frequent use of arrays is their widespread
use in software languages. Many methodologies today incorporate the
use of 'C' for algorithmic development. This 'C' description is then translated to
an HDL (e.g., VHDL or Verilog) for synthesis. The arrays used in the 'C'
description can be mapped directly to arrays in the HDL. Thus, it is important to
be able to synthesize these data types efficiently.
The existing SAST tool does not support arrays and bit vectors. The present
work aims to enhance SAST by incorporating them.
Chapter 3
Structured Architecture Synthesis Tool (SAST)
To deal with the increasing complexity of today's VLSI designs the use of
high level synthesis systems becomes increasingly crucial. Several HLS systems
like Maha [1], HAL [2], STAR [3], SAM [4], and GABIND [5] are now available to
support the HLS of digital systems. Over the last several years, these systems
have evolved from elementary systems producing non-optimized data paths to
more sophisticated systems generating data paths optimized with respect to area,
time, power and testability. With the advancement of VLSI circuit
technology, feature sizes have scaled down rapidly. Device scaling
implies that circuit performance will be increasingly determined by
interconnection performance. For instance, interconnection contributes 50
percent of total delay in 0.35 micron technology, whereas it is expected to rise
to 70 percent in 0.25 micron technology. Thus, interconnections are expected to
play the most critical role in the design of chips in deep sub-micron technologies.
FPGAs have also been developed around this time. These are now
becoming attractive platforms for prototyping designs, simulation acceleration,
hardware-in-the-loop simulation, etc. To the best of our knowledge, however,
most HLS tools produce designs optimized in terms of resources, time steps,
power, area, etc. without much emphasis on reducing long and random
interconnections.
SAST is a genetic algorithm based HLS tool designed to synthesize the
behavioral specification into a structured architecture framework. This tool is
depicted in Figure 3.1. This structured architecture (SA) leads to interconnect
optimization. The schematic diagram of the SA is shown in Figure 3.2. The
datapath is organized as architectural blocks (A-blocks). Each A-block has a local
functional unit (FU), local storage and internal interconnections, as shown in
Figure 3.2. SA also permits the use of memories as architectural components.
These are connected to the global buses like the A-blocks. These structured data
paths avoid random interconnections between datapath elements. Each A-block
has a simple implementation. This makes the generated design easy to
implement on programmable devices such as FPGAs.
3.1 Target Architecture
Figure 3.1 Hand-in-hand synthesis and verification
Structured Architecture Synthesis Tool (SAST) essentially takes the behavioral
description of an input design in the form of three address code, and outputs
synthesizable RTL Verilog code. The generated data path is organized as
architectural blocks (A-blocks). Each A-block has a local functional unit (FU), local
storage and local buses (also called access links). All the A-blocks in a design
are interconnected by a number of global buses. In addition to the local memories in
the A-blocks, SAST also permits the use of global memories as architectural
components. A global memory is similar to an A-block, except that it does not
contain any functional unit. These memories can be accessed globally by all
the A-blocks, to which they are connected by the global
buses. The global memory units in the structured architecture play an important
role as a convenient interface for the system. While it may be difficult to initialize
a specific storage location within an A-block, it is considerably easier to store
initial operands in and retrieve final results from the global memory units. Global
memories help improve the availability of operands and relieve the storage
requirement in individual A-blocks.
The schematic diagram of the Structured Architecture (SA) is shown in
Figure 3.2. All the data path components are of the same width. That is, the local
buses, storage units and functional units in the A-blocks and the global buses have
the same width.
There are input/output ports that are connected to the global buses, so that all
the A-blocks can access any of the ports. Each A-block has a local memory in the
form of a register bank, which is connected to the global buses through internal buses
(access links). Each A-block also has one functional unit (FU), which takes its inputs
either from the local memory or from the internal buses. The output of the functional
unit is sent back either to the register bank or to the internal buses. Switches in
the design enable or disable the connection between any two components in the
A-block. The group of switches that connects the internal buses and the
output of the FU to the input ports of the registers is called the in-switches. The group
of switches that connects the outputs of the registers and the output of the FU to the
internal buses is called the out-switches. Global buses are connected to the input ports
of the FU through internal buses and in-switches. The output port of the FU is connected
to the global buses through internal buses and out-switches. The schematic diagram of
an A-block is shown in the Figure 3.
The phases in the synthesis tool are VHDL translation, scheduling,
allocation and binding, controller generation and RTL code generation.
In SAST, after the CDFG is extracted from the VHDL, we have to
compute the live variables carrying data across basic blocks and the
partial order between the operations of each basic block. The module that does
this is called the preprocessor; it produces the intermediate representation (IR)
necessary for scheduling the design specified in the CDFG.
The correctness of the HLS process is verified in three phases. Phase I
verifies the scheduling process. This phase is also called scheduling verification.
The input of this phase is the CDFG and the output is the scheduled behavior. In
phase II, the datapath generated after allocation and binding is verified against
the scheduled behavior. Here we verify the sharing of registers among the variables
of the input specification. This phase is called datapath verification. In phase III,
the controller is verified against the data path. This phase is called controller
verification. The verification task of this phase involves checking the correctness of the
control signals.
Figure 3.2 Schematic of structured architecture.
The structure of the data path is characterized by a set of architectural
constraints such as the number of A-blocks, the number of global memories, the
number of global buses interconnecting the A-blocks, the number of access links
(the access width) connecting an A-block to the global buses and the maximum
number of writes per time step to storage locations in an A-block. The
architectural parameters which are internal to an A-block (e.g. the number of
access links and the number of write ports to internal memory) are the same for every
A-block. These structured data paths avoid random interconnects between data
path elements. Each A-block has a simple implementation. This makes the
generated design easy to implement on programmable devices such as FPGAs.
3.2 Features of SAST
Reduction in Interconnection Cost: Many high level synthesis tools are
currently available in the market. However, the present tools try to produce
optimal RTL with random interconnections among the data path components,
which raises the interconnection cost when the design is fabricated. Field
programmable gate arrays (FPGAs) are naturally attractive for prototyping the
designs generated by high level synthesis (HLS). Programmable devices tend to
have limited wiring resources between the data path elements, so designs
implemented on such devices need to avoid long-distance
interconnections. We use a structured architecture for HLS, which produces
predictable interconnections among the data path components. This lowers the
interconnection cost of the design.
VHDL to CDFG Parser: SAST has a parser which converts a VHDL
behavioral specification to a CDFG, which is an intermediate form. This
intermediate form is given as input to the preprocessing phase, which extracts the
dependencies between the operations and the control flow between the basic
blocks.
Scheduling: SAST uses genetic algorithm based scheduler for scheduling
the input design. SAST supports both time constrained and resource constrained
scheduling. Resource information and the maximum number of control steps the
scheduler can take to schedule for each basic block are provided to the scheduler
as input constraints. It also handles multi-cycle and pipelined functional units.
Register allocation and binding: After scheduling is completed, the next step
is live variable analysis and register allocation for each A-block. SAST uses the
minimum number of registers to store the intermediate values in the design. It
uses register interconnection optimization (RIO), which reduces the number of
interconnections and switches in the design.
Data path Generation: The data path for each A-block consists of a functional
unit, a register bank and access links, which connect the A-block to the global
buses. SAST uses the structured architecture (SA) in the data path, which reduces the
interconnection length between the data path components. SAST uses the minimum
number of buses to schedule the input design.
RTL Generation: The final output from SAST is the RTL description in
Verilog. It generates synthesizable Verilog code for both the data path and the
control path.
Chapter 4
Data Flow Graphs
4.1 Data Flow Graphs
A data flow graph is a model of a program with no conditionals. In a
high-level programming language, a code segment with no conditionals (more
precisely, with only one entry and one exit point) is known as a basic block. Figure 4.1
below shows a simple basic block. As the C code is executed, we would enter this
basic block at the beginning and execute all the statements.
w = a + b;
x = a - c;
y = x + d;
x = a + c;
z = y + e;
Figure 4.1 a basic block in C
Before we are able to draw the data flow graph for this code we need to
modify it slightly. There are two assignments to the variable x: it appears twice
on the left side of an assignment. We need to rewrite the code in
single-assignment form, in which a variable appears only once on the left side. Since
our specification is C code, we assume that the statements are executed
sequentially, so that any use of a variable refers to its latest assigned value. In
this case, the second value of x is not reused in this block (presumably it is used
elsewhere), so we just have to eliminate the multiple assignment to x. The result is
shown in Figure 4.2 below, where we have used the names x1 and x2 to distinguish
the separate assignments to x.
w = a + b;
x1 = a - c;
y = x1 + d;
x2 = a + c;
z = y + e;
Figure 4.2 The basic block in single-assignment form
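The renaming step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used inside SAST: the function name to_single_assignment is hypothetical, every statement is assumed to be a plain assignment of the form "lhs = expression;", and the generated names (x1, x2) are assumed not to clash with existing variables.

```python
import re
from collections import Counter


def to_single_assignment(statements):
    """Rename variables that are assigned more than once (hypothetical helper)."""
    parsed = []
    for stmt in statements:
        lhs, rhs = stmt.rstrip(";").split("=", 1)
        parsed.append((lhs.strip(), rhs.strip()))

    assignments = Counter(lhs for lhs, _ in parsed)
    version = {}  # variable -> number of assignments seen so far
    current = {}  # variable -> name that holds its latest value
    result = []
    for lhs, rhs in parsed:
        # A use on the right side refers to the latest assigned value.
        rhs = re.sub(r"[A-Za-z_]\w*",
                     lambda m: current.get(m.group(0), m.group(0)), rhs)
        if assignments[lhs] > 1:
            version[lhs] = version.get(lhs, 0) + 1
            new_name = f"{lhs}{version[lhs]}"
        else:
            new_name = lhs
        current[lhs] = new_name
        result.append(f"{new_name} = {rhs};")
    return result


block = ["w = a + b;", "x = a - c;", "y = x + d;", "x = a + c;", "z = y + e;"]
single = to_single_assignment(block)
for line in single:
    print(line)
```

Only variables with more than one assignment receive a numeric suffix, so w, y and z keep their names while x becomes x1 and x2.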
The single-assignment form is important because it allows us to identify a
unique location in the code where each named location is computed. As an
introduction to the data flow graph, we use two types of nodes in the graph:
round nodes denote operators and square nodes represent values. The value
nodes may be either inputs to the basic block, such as a and b, or variables
assigned to within the block, such as w and x1. The data flow graph for our
single-assignment code is shown in Figure 4.3 below.
The single-assignment form means that the data flow graph is acyclic. If
we assigned to x multiple times, then the second assignment would form a cycle
in the graph including x and the operators used to compute x. Keeping the data
flow graph acyclic is important in many types of analyses we want to do on the
graph.
Figure 4.3 an extended data flow graph for sample block
The data flow graph is generally drawn in the form shown in Figure 4.4
below. Here, the variables are not explicitly represented by nodes. Instead, the
edges are labeled with the variables they represent. As a result, a variable can be
represented by more than one edge. However, the edges are directed and all the
edges for a variable must come from a single source. We use this form for its
simplicity and compactness.
Figure 4.4 Standard data flow graph for sample basic block
The data flow graph for the code makes the order in which the operations
are performed in the C code much less obvious. This is one of the advantages of
the data flow graph. We can use it to determine feasible reorderings of the
operations, which may help us to reduce pipeline or cache conflicts. We can also
use it when the exact order of operations simply doesn't matter. The data flow
graph defines a partial ordering of the operations in the basic block. We must
ensure that a value is computed before it is used, but generally there are several
possible orderings of evaluating expressions that satisfy this requirement.
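As a small illustration of this partial ordering, the Python sketch below derives the producer-to-consumer edges of the single-assignment block and checks whether a proposed execution order respects them. The helper names dependency_edges and respects_partial_order are hypothetical, and the sketch assumes plain assignment statements; it is not the dependency extraction implemented in SAST.

```python
import re


def dependency_edges(statements):
    """Return sorted (producer, consumer) index pairs for a single-assignment block."""
    defined_at = {}
    edges = set()
    for idx, stmt in enumerate(statements):
        lhs, rhs = (part.strip() for part in stmt.rstrip(";").split("=", 1))
        for name in re.findall(r"[A-Za-z_]\w*", rhs):
            if name in defined_at:  # value produced earlier in this block
                edges.add((defined_at[name], idx))
        defined_at[lhs] = idx
    return sorted(edges)


def respects_partial_order(order, edges):
    """True if every value is computed before it is used in the given order."""
    position = {op: i for i, op in enumerate(order)}
    return all(position[p] < position[c] for p, c in edges)


block = ["w = a + b;", "x1 = a - c;", "y = x1 + d;", "x2 = a + c;", "z = y + e;"]
edges = dependency_edges(block)
print(edges)                                           # dependency pairs
print(respects_partial_order([3, 1, 0, 2, 4], edges))  # a legal reordering
print(respects_partial_order([2, 1, 0, 3, 4], edges))  # y computed before x1
```

Only the chain x1, y, z is constrained in this block; the operations computing w and x2 can be placed anywhere, which is the freedom a scheduler can exploit.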
4.2 Data Flow graphs for Array Operations
We introduced array access and array change notations to the data flow
graphs such that they can represent array operations. From these data flow
graphs we can construct control data flow graph (CDFG) which is used as the
input to the preprocessor.
4.2.1 Array access notation
The array access notation can be represented in a data flow graph as shown
in figure 4.5. Consider the array variable X[i].
[Figure: an array access node "[ ]" with operands i and X, producing X[i].]
Figure 4.5 data flow graph for Array access notation
4.2.2 Array change notation
Consider the array operation shown below. It can be represented in a data
flow graph using the array change notation as shown in figure 4.6.
X[i] = Y
[Figure: an array change node "[ ]=" with operands X, i and Y.]
Figure 4.6 data flow graph for Array change notation
Figure 4.7 shows the flow chart for a lift controller that includes array operations
and figure 4.8 shows the corresponding data flow graph for the lift controller.
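One possible way to model the two notations in software is sketched below in Python. The class names ArrayAccess and ArrayChange, the temporary t and the execute helper are all hypothetical and are not taken from SAST; the sketch only shows that an access node reads X[i] into a result, while a change node writes Y into X[i].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArrayAccess:
    """Node "[ ]": result = array[index]."""
    result: str
    array: str
    index: str


@dataclass(frozen=True)
class ArrayChange:
    """Node "[ ]=": array[index] = value."""
    array: str
    index: str
    value: str


def execute(nodes, env):
    """Evaluate the nodes in order on a copy of env (name -> int or list)."""
    env = {k: (list(v) if isinstance(v, list) else v) for k, v in env.items()}
    for node in nodes:
        if isinstance(node, ArrayAccess):
            env[node.result] = env[node.array][env[node.index]]
        elif isinstance(node, ArrayChange):
            env[node.array][env[node.index]] = env[node.value]
        else:
            raise TypeError(f"unknown node: {node!r}")
    return env


# X[i] = Y followed by t = X[i]; t is a hypothetical temporary.
nodes = [ArrayChange("X", "i", "Y"), ArrayAccess("t", "X", "i")]
initial = {"X": [0, 0, 0, 0], "i": 2, "Y": 7}
final = execute(nodes, initial)
print(final["X"], final["t"])
```

Because the access follows the change to the same array, the two nodes must stay in this order; a dependency of this kind is what the notations make visible to later phases.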
Figure 4.7 flow chart for lift controller
[Flow chart not reproduced. Its boxes initialize current, scan, move, up, doorclose and atfloor, compute req = a ^ b, and test req, req[scan], req[current], up and the bounds scan <= 0 and scan >= 7 in order to update move, doorclose, req[current], current, scan and up.]
[Data flow graph not reproduced. Its nodes include the operators ^, ==, + and -, the array access node "[ ]" producing req[scan] and req[current], and the array change node "[ ]=" applied to req.]
Figure 4.8 data flow graph for lift controller
Chapter 5
Control Data Flow Graphs
Most HLS systems require their input in the form of a data flow graph that is
used in all the phases. Manually writing a data-flow specification along with
the control parameters is time consuming and error-prone. So, instead, a behavioral
design specification at a very high level of abstraction is provided as input to
such systems. The designs are coded in high-level languages like C or hardware
description languages like VHDL or Verilog. The translation scheme comprises the
methodologies for translating the high-level specification into a CDFG.
These methodologies are similar to those in the front end of a typical
compiler flow. In this section we give details of the CDFG representation used, the
VHDL subset and the methodology for the translation scheme.
5.1 CDFG representation
The CDFG representation used for SAST is block based. The Control and
Data Flow Graph is a directed graph that can be represented as B = (V, E). Each
node v ∈ V represents a basic block. The data dependency is maintained in each
basic block. Moreover, each basic block can further be represented by a data flow
graph. Here each node of the data flow graph is a three-address instruction
specifying the appropriate operation to be performed. The basic blocks showing
branching have single conditional statements representing conditional constructs
like IF, CASE or LOOP constructs. Thus the directed edges in the CDFG
represent the transfer of a value or of control from one node to another.
The interpretation of B is imperative: an operation is executed after one of
its predecessors is executed. The CDFG representation includes notations for
reading data from and writing data to ports, namely the READ and WRITE
operations. There is a directed edge from block bi to bj if bj immediately
follows bi in some execution sequence; that is, if
• there is a conditional or unconditional jump from the last statement
of bi to the first statement of bj , or
• bj immediately follows bi in the order of the program, and bi does
not end in an unconditional jump.
We say that bi is a predecessor of bj, and bj is a successor of bi. A basic
block is represented by a record consisting of the number of three address
statements in the basic block, the list of three address statements, and the list of
logical successors of the basic block. Both the source and result variables of the
operations in the CDFG are converted into a sanitized form: if a variable v occurs
in an operation of the CDFG and v is at the ith position in the symbol table, the
sanitized operation contains i in place of v. In translating a functional
specification to a register transfer level
(RTL) design, one needs to know the data dependency information in each basic
block to do scheduling and data-path allocation. Within each basic block this
dependency is preserved between the operations.
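The record and the sanitized form described above might be modelled as in the following Python sketch. It is a sketch under assumptions, not the data structure used in SAST: the names BasicBlock, build_symbol_table and sanitize are hypothetical, statements are assumed to be (result, operator, source1, source2) tuples, and operands are tagged as ("var", i) or ("const", c) so that a symbol table position cannot be confused with a constant.

```python
from dataclasses import dataclass, field


@dataclass
class BasicBlock:
    name: str
    statements: list  # three address statements: (result, op, src1, src2)
    successors: list = field(default_factory=list)  # logical successor names

    @property
    def num_statements(self):
        return len(self.statements)


def build_symbol_table(blocks):
    """Collect variable names in order of first appearance (ints are constants)."""
    table = []
    for block in blocks:
        for result, _op, src1, src2 in block.statements:
            for operand in (result, src1, src2):
                if isinstance(operand, str) and operand not in table:
                    table.append(operand)
    return table


def sanitize(block, table):
    """Replace each variable by its position in the symbol table."""
    index = {name: i for i, name in enumerate(table)}

    def tag(operand):
        if isinstance(operand, str):
            return ("var", index[operand])
        return ("const", operand)

    return [(tag(r), op, tag(a), tag(b)) for r, op, a, b in block.statements]


# Two blocks taken from the lift controller listing: B5 and B8.
b5 = BasicBlock("B5", [("current", "+", "current", 1)], ["C3"])
b8 = BasicBlock("B8", [("scan", "+", "scan", 1)], ["C7"])
table = build_symbol_table([b5, b8])
print(table)
print(sanitize(b8, table))
```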
5.2 Example Representation of CDFG
Figure 5.1 shows the CDFG representation of the GCD behavioral
description. From the figure we can see that B0 is the initial block and B3 is the
final block.
Figure 5.1 CDFG representation of GCD
In B0, the values of y1 and y2 are read from the read ports, and the final output is
written in block B3.
20
B0 2
read (p0, a)
read (p1, b)
B1 6
current = 0
scan = 0
move = 0
up = 1
doorclose = 1
atfloor = 1
B2 1
req1 = a^b
C1 1
req1 == 1
B3 1
req = req1
C2 1
req[scan] == 1
B4 1
move = 1
C3 1
req[current] == 1
C4 1
up == 1
B5 1
current = current+1
B6 1
current = current-1
B7 4
move = 0
doorclose = 0
req[current] = 0
scan = current
C5 1
req == 1
C6 1
up == 1
B8 1
scan = scan+1
C7 1
scan >= 7
B9 1
up = 0
B10 1
scan = scan-1
C8 1
scan <= 0
B11 1
up = 1
20
B0 1 B1
B1 1 B2
B2 1 C1
C1 2 0 B0 1 B3
B3 1 C2
C2 2 0 C5 1 B4
B4 1 C3
C3 2 0 C4 1 B7
C4 2 0 B6 1 B5
B5 1 C3
B6 1 C3
B7 1 C5
C5 2 0 B0 1 C6
C6 2 0 B10 1 B8
B8 1 C7
C7 2 0 C2 1 B9
B9 1 C2
B10 1 C8
C8 2 0 C2 1 B11
B11 0
Figure 5.2 CDFG representations with array operations
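A listing of this kind can be read back with a short Python sketch. The layout assumed here is our reading of Figure 5.2 and is not a documented SAST file format: a block count, then for each block a header line "name count" followed by count statements on separate lines, then a second count and one successor line per block, where a conditional block lists (condition value, successor) pairs. The function name parse_cdfg is hypothetical, and the sample is a three-block excerpt of the listing.

```python
def parse_cdfg(text):
    """Parse a line-oriented CDFG listing into operations and successors."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    pos = 0

    n_blocks = int(lines[pos])
    pos += 1
    operations = {}
    for _ in range(n_blocks):
        name, count = lines[pos].split()
        pos += 1
        operations[name] = lines[pos:pos + int(count)]
        pos += int(count)

    n_entries = int(lines[pos])
    pos += 1
    successors = {}
    for _ in range(n_entries):
        fields = lines[pos].split()
        pos += 1
        name, count, rest = fields[0], int(fields[1]), fields[2:]
        if count == 1:
            # Unconditional flow: a single successor, no condition value.
            successors[name] = {None: rest[0]}
        else:
            # Conditional flow: condition value -> successor block.
            successors[name] = {int(rest[i]): rest[i + 1]
                                for i in range(0, 2 * count, 2)}
    return operations, successors


sample = """
3
B2 1
req1 = a^b
C1 1
req1 == 1
B3 1
req = req1
3
B2 1 C1
C1 2 0 B0 1 B3
B3 1 C2
"""
operations, successors = parse_cdfg(sample)
print(operations["C1"], successors["C1"])
```

Under this reading, block C1 transfers control to B0 when its condition evaluates to 0 and to B3 when it evaluates to 1.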
Chapter 6
Preprocessor
The preprocessor converts the input CDFG into an intermediate
representation (IR), which consists of the precedence constraints, or partial
order, between the operations in each basic block, along with the incoming (in)
and outgoing (out) variable sets (with their location availability in the
A-blocks) for each basic block. The data flow diagram of the preprocessor, which
consists of several subtasks, is shown in figure 6.1. Each component in figure 6.1
is briefly explained as follows:
• Input operations reads the list of operations for each basic block in the
CDFG and constructs the symbol table of the variables in the CDFG.
• Build successors constructs the flow of control information for each basic
block, i.e., the value on which control passes to each logical successor
block.
• Construct live sets computes the input and output variables of each basic
block from the list of operations in the basic block and the flow of control
information.
• Construct partial order computes the precedence constraints between the
operations in the basic block from the live sets and the operations of the
basic block.
• Generate intermediate form puts the basic block information, derived from
the live sets and the partial order of the basic blocks, into a form suitable
for scheduling the basic blocks with the existing scheduling algorithm of SAST.
Figure 6.1 Data flow diagram for preprocessor before scheduling
6.1 Live variable analysis
This section describes the computation of the incoming and outgoing sets of
variables for each basic block in the CDFG, using the flow of control information
between basic blocks and the list of operations in each basic block built during
sanitization. Data flow analysis is performed over the CDFG to find the incoming
and outgoing variables of each basic block.
Definition: Data flow analysis is the collection of information that summarizes
the creation and destruction of values in a program; it is used to identify legal
optimization opportunities.
Alternatively, for each point p and each variable v in a program,
determine whether the value of v at p could be used along some path in the
CDFG starting at p. If so, we say v is live at p; otherwise v is dead at p. For
example, a three address statement in basic block i with operation j,
O_ij : x = y + z, is said to define x and to use y and z; a use counts toward
use_i only if the variable has not been defined in i before O_ij. Four different
sets need to be maintained for each basic block i:
use_i : set of variables whose values may be used in i prior to any definition of
the variable,
def_i : set of variables defined in i prior to any use of that variable in i,
in_i : set of variables live at the entry point of i,
out_i : set of variables live at the exit point of i.
These sets are used in computing the incoming and outgoing variable sets
for i from the flow of control information and the operations list in i.
Computation of use, def sets: Simply stated, for each basic block i, use_i is the
set of variables used before being defined in basic block i, and def_i is the union
of all LHS variables in basic block i. Algorithm 1 computes the use_i and def_i
sets; its inputs are the total number of basic blocks and the operations in each
basic block, and it returns the use_i and def_i sets for each basic block i in the
CDFG.
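The computation of the use and def sets for a single basic block can be sketched in C as follows. This is a minimal sketch and not the thesis's algorithm 1: it assumes sanitized variable indexes below 32 so that a set fits into one unsigned integer, and the names VarSet, Op and compute_use_def are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Bit v is set when the variable with sanitized index v is in the set. */
typedef unsigned int VarSet;

typedef struct {
    int dst;      /* index of the defined variable, or -1 if none (e.g. a condition) */
    int src[2];   /* indexes of the source variables, -1 for an absent operand */
} Op;

/* Scan the operations of one block in order and fill its use and def sets. */
void compute_use_def(const Op *ops, size_t nops, VarSet *use, VarSet *def)
{
    assert(use != NULL && def != NULL);
    *use = 0;
    *def = 0;
    for (size_t j = 0; j < nops; j++) {
        for (int k = 0; k < 2; k++) {
            int v = ops[j].src[k];
            assert(v < 32);
            /* A source belongs to use only if no earlier operation defined it. */
            if (v >= 0 && !(*def & (1u << v)))
                *use |= 1u << v;
        }
        assert(ops[j].dst < 32);
        if (ops[j].dst >= 0)
            *def |= 1u << ops[j].dst;   /* def collects every LHS variable */
    }
}
```

For the block x = y + z; y = x + w, with x, y, z, w at indexes 0 to 3, use contains y, z and w, while def contains x and y. The source x of the second operation is not in use because the first operation defines it.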
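Computation of in, out sets: The data flow equations that compute the in_i and
out_i sets for each basic block i from the use_i and def_i sets of i and the flow
of control information are:
$\mathit{out}_i = \bigcup_{s \in \mathit{succ}(i)} \mathit{in}_s$    (1)
$\mathit{in}_i = \mathit{use}_i \cup (\mathit{out}_i \setminus \mathit{def}_i)$    (2)
where succ(i) denotes the set of successors of basic block i.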
From equation 1, a variable v is live coming out of a basic block iff it is
live coming into one of its successors. Similarly, from equation 2, a variable v
is live coming into a basic block i if either it is used before redefinition in i,
or it is live coming out of i and is not redefined in i. Algorithm 2 computes the
in_i and out_i sets for each basic block i from the flow of control information.
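The two relations just described (out is the union of in over the successors; in is use together with whatever is in out but not in def) can be solved by iterating until nothing changes. The C sketch below shows this standard fixed-point iteration. It is not a transcription of algorithm 2, and the names VarSet, Block and compute_live_sets, as well as the limit of two successors per block, are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SUCC 2

typedef unsigned int VarSet;   /* bit v set: variable v is in the set */

typedef struct {
    VarSet use, def;      /* computed beforehand for the block */
    VarSet in, out;       /* filled in by compute_live_sets */
    int succ[MAX_SUCC];   /* indexes of the successor blocks */
    int nsucc;            /* number of valid entries in succ */
} Block;

/* Iterate equations 1 and 2 over all blocks until a fixed point is reached. */
void compute_live_sets(Block *blocks, size_t nblocks)
{
    assert(blocks != NULL || nblocks == 0);
    for (size_t i = 0; i < nblocks; i++)
        blocks[i].in = blocks[i].out = 0;

    int changed = 1;
    while (changed) {
        changed = 0;
        /* Visiting blocks in reverse order usually needs fewer passes. */
        for (size_t i = nblocks; i-- > 0; ) {
            VarSet out = 0;
            for (int s = 0; s < blocks[i].nsucc; s++)
                out |= blocks[blocks[i].succ[s]].in;            /* equation 1 */
            VarSet in = blocks[i].use | (out & ~blocks[i].def); /* equation 2 */
            if (in != blocks[i].in || out != blocks[i].out) {
                blocks[i].in = in;
                blocks[i].out = out;
                changed = 1;
            }
        }
    }
}
```

The sets only grow from one pass to the next and are bounded by the number of variables, so the loop terminates. A block that is its own successor, as in a loop, simply needs an extra pass before its out set stabilizes.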
6.2 Dependency graph extraction
After the live sets have been constructed for each basic block i, the
partial order, or dependency graph, between the operations within each basic
block i is constructed. A dependency graph of a basic block consists of nodes
representing functional operators, control operators, and read/write operators
corresponding to the I/O interface. Nodes are connected by arcs that represent
either the communication of values or the ordering of I/O operations by
dependencies. If a node N1 computes a value that is used by node N2, then there
is a path from N1 to N2. The communication between nodes along the path
represents whether the computed value is actually used. In addition to the
write-before-write, read-before-write, and write-before-read dependencies that
exist between normal operations, there exist read-before-read dependencies
between operations on an I/O port, since the value present at a port is changed
by the execution sequence of port operations. So, port operations in a basic
block that correspond to the same port must be ordered by a dependency in the
dependency graph. With this approach, the partial order constructed for the set
of three address statements in figure 6.2 is shown in figure 6.3, with the
incoming variables in = {3, dx, u, x, y}. The top level function that performs
this task is construct Partial Order, which takes the total number of basic
blocks, the list of operations in the basic blocks, and the input and output
variable sets as input parameters, and returns the dependency flow graph between
the operations within each basic block.
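The dependency rules above can be sketched in C for the operations of one basic block. This is an illustrative sketch only and not the thesis's construct Partial Order function: the names Op, reads and build_partial_order are hypothetical, and the matrix records every pairwise dependency rather than a reduced graph.

```c
#include <assert.h>
#include <string.h>

#define MAX_OPS 16

typedef struct {
    int dst;      /* variable written, -1 if none */
    int src[2];   /* variables read, -1 if absent */
    int port;     /* I/O port accessed, -1 for a normal operation */
} Op;

/* Does operation `op` read variable v? */
static int reads(const Op *op, int v)
{
    return v >= 0 && (op->src[0] == v || op->src[1] == v);
}

/* dep[j][k] = 1 iff operation k must execute after operation j (j < k). */
void build_partial_order(const Op *ops, int nops, int dep[MAX_OPS][MAX_OPS])
{
    assert(nops >= 0 && nops <= MAX_OPS);
    memset(dep, 0, sizeof(int) * MAX_OPS * MAX_OPS);
    for (int j = 0; j < nops; j++) {
        for (int k = j + 1; k < nops; k++) {
            int wbr = reads(&ops[k], ops[j].dst);   /* write-before-read  */
            int rbw = reads(&ops[j], ops[k].dst);   /* read-before-write  */
            int wbw = ops[j].dst >= 0 &&
                      ops[j].dst == ops[k].dst;     /* write-before-write */
            int io  = ops[j].port >= 0 &&
                      ops[j].port == ops[k].port;   /* same I/O port      */
            dep[j][k] = wbr || rbw || wbw || io;
        }
    }
}
```

Two reads from the same port are ordered even though they touch different variables, which corresponds to the read-before-read rule for port operations; two operations on unrelated variables and different ports remain unordered and may be scheduled in either order.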
Figure 6.2 Code for Diffeq
6.3 Intermediate representation
Once the live variable in and out sets and the partial order within each
basic block have been computed, the next step is to put this information, which
is used for scheduling each basic block with SAST, into an intermediate
representation. The location availability in A-blocks of the incoming and
outgoing variable sets of the basic blocks in the CDFG, which is required to know
the destination of incoming and outgoing variables when scheduling with SAST, is
taken as input from the user. The intermediate representation consists of the
incoming and outgoing variable sets along with their location availability in
A-blocks, the partial order within each basic block, and the module library
containing the set of operators, with the different kinds available for each
operator. So, each element j of the in_i or out_i set of basic block i is a
2-tuple whose first component is v_j, the jth variable in in_i or out_i, and
whose second component is the A-block availability of v_j before coming into i
or after coming out of i.
Figure 6.3 Partial order for the Diffeq code in figure 6.2
6.4 Implementation
This section gives the details of the data structures used to represent
array operations in the preprocessing stage.
The following existing data structure is used to implement the preprocessor
that handles array operations.
typedef struct A // struct A holds the information needed to represent a basic block
{
    char bid[10];    // basic block id
    char **ip;       // holds input operations in the block
    int **index;     // sanitized indexes of i/p operations
    int **extra;     // holds extra operations
    int *exin;       // constants in the basic block
    int nopn;        // total number of operations in block
    int nextra;      // total extra operations
    int exnin;       // extra inputs for constants
    int *in;         // variables live at the block entry
    int *out;        // variables live at exit of a block
    int *use;        // list of var. being used before defined
    int *def;        // list of var. being defined in this block
    int *total;      // total src operands in operations
    int **order;
    int *npred;      // number of predecessors as operations
    int *ncprd;      // number of i/p variables as operands
    int *pred;
    int *read;
    int *write;
    double *nconst;  // constants in the operations
    int *ncidx;      // number of A-blocks where i/p variables being stored
    int **clist;     // A-blocks of i/p variables
    int *plist;      // list of predecessor operations
    int *pflst;      // type of predecessors opn/ var assgmt
    int *xclist;     // A-blocks of outgoing variables
    int nin, nout, nuse, ndef; // no. of in, out, use and def sets
    int block;
    struct B *succ;  // successor blocks, on what val control goes
} *block;
The following existing functions in the preprocessor have been modified
so that the preprocessor handles array operations.
int get_Operand; //gets the operands from the input equations
double add_Operand;
int get_Operator; //gets the operators from the input equations
void add_Operator;
The preprocessor takes the following files as input: basic block
information, operations, and control steps in the file design.cdfg; architectural
parameters in the file design.arch; and module library information in the file
design.opr.
Chapter 7
Experimentation and Results
7.1 Results
The lift controller behavior is encoded and the corresponding CDFG is
extracted. The preprocessing phase of SAST has been enhanced and tested
successfully with the lift controller benchmark. The encoded benchmark covers
all the array operations that the preprocessor can handle.
The preprocessor uses the following input files and produces the output files
listed below.
Inputs: 3 different files
(i) Lift_221.cdfg -- the input CDFG.
(ii) Lift_221.opr -- the module library with the set of operations.
(iii) Lift_221.arch -- the architectural parameter file.
Outputs: 3 different files
1. Lift_2211.po: .po denotes the partial order. This file contains the list of
three-address (3A) operations in the file "Lift_221.cdfg", represented as a
partial order. It contains the precedence information between the 3A
operations. It also contains the control information between the basic
blocks as given in the file "Lift_221.cdfg".
2. Lift_2211.bb: .bb for the basic blocks information. This file contains
architectural parameters (copied from .arch file) followed by the live
variable input and output sets of the basic blocks. Live variable sets also
contain the A-block location availability for both the input and output
sets.
3. Lift_2211.opr: contains the module library information as specified in the
file "Lift_221.opr", with the type field of each operator removed.
Lift controller: The design has a request register in which the pending
requests of users are stored. This is an n-bit register, where n is the number of
floors. The variables used in this design are:
current -- current floor of the lift carriage -- integer,
req -- bits of this register contain pending requests from the floors -- bit
vector having as many bits as there are floors
scan -- for examining whether there is a pending request from a floor or
not -- used to index the variable "req" -- integer,
atfloor -- a (single) bit -- indicating whether the lift carriage is at
a floor (value 1) or in between two floors (value 0),
move -- a bit -- indicating whether lift is moving (value 1) or
stationary (value 0),
up -- a bit -- indicates whether the lift is moving up (or is to do so) (value 1)
or moving down,
doorclosed -- a bit -- indicates whether door is closed.
The detailed results of the lift design are given in section 7.4, Lift Controller Results.
7.2 Conclusions
This work is concerned with the development of an HLS tool that supports
array and bit vector data types and the operations on them.
In this work we encoded a new lift controller benchmark that contains
array operations, and we extracted from it the CDFG along with the architectural
parameters and module library information. We introduced array access and
array change notations into the DFG so that the corresponding CDFG is supported
by the preprocessor, and we enhanced the preprocessor to support the array
operations.
7.3 Future Work
SAST takes a VHDL behavioral description and produces an RTL description in
synthesizable Verilog. To make it more effective and efficient, the following
enhancements can be made:
• Array variable clustering can be used in the allocation of memories to
array variables, optimizing the number of global memories.
• Compiler optimizations such as code rewriting techniques and loop
transformations can be included in the translation phase or the
preprocessing phase, reducing the number of memory operations.
7.4 Lift Controller Results
In this example, pending requests from the users are stored in the req register.
The lift controller stops the lift wherever the corresponding req bit is set to 1,
and afterwards resets that bit to 0.
The CDFG for the lift controller example is as follows. It has 20 basic blocks.
20
B0 2
    read (p0, a)
    read (p1, b)
B1 6
    current = 0
    scan = 0
    move = 0
    up = 1
    doorclose = 1
    atfloor = 1
B2 1
    req1 = a^b
C1 1
    req1 == 1
B3 1
    req = req1
C2 1
    req[scan] == 1
B4 1
    move = 1
C3 1
    req[current] == 1
C4 1
    up == 1
B5 1
    current = current+1
B6 1
    current = current-1
B7 4
    move = 0
    doorclose = 0
    req[current] = 0
    scan = current
C5 1
    req == 1
C6 1
    up == 1
B8 1
    scan = scan+1
C7 1
    scan >= 7
B9 1
    up = 0
B10 1
    scan = scan-1
C8 1
    scan <= 0
B11 1
    up = 1
20
B0 1 B1
B1 1 B2
B2 1 C1
C1 2 0 B0 1 B3
B3 1 C2
C2 2 0 C5 1 B4
B4 1 C3
C3 2 0 C4 1 B7
C4 2 0 B6 1 B5
B5 1 C3
B6 1 C3
B7 1 C5
C5 2 0 B0 1 C6
C6 2 0 B10 1 B8
B8 1 C7
C7 2 0 C2 1 B9
B9 1 C2
B10 1 C8
C8 2 0 C2 1 B11
B11 0
Module Library Information
Module library information used in the lift controller is given below. It has nine
operators.
9
2 2 2 2 2 2 2 2 2
= 0 5 1 1 5 1 1
[] 0 10 1 1 10 1 1
[]= 0 10 1 1 10 1 1
^ 0 10 1 1 10 1 1
== 1 10 1 1 10 1 1
+ 0 10 1 1 10 1 1
- 0 10 1 1 10 1 1
>= 1 10 1 1 10 1 1
<= 1 10 1 1 10 1 1
Architectural Parameters
1 2 2 2 2 0 0 19
B0 0 1 0 0 1 0 1 0 1 0
B1 0 1 0 0 1 0 1 0 1 0
B2 0 1 0 0 1 0 1 0 1 0
C1 0 1 0 0 1 0 1 0 1 0
B3 0 1 0 0 1 0 1 0 1 0
C2 0 1 0 0 1 0 1 0 1 0
B4 0 1 0 0 1 0 1 0 1 0
C3 0 1 0 0 1 0 1 0 1 0
C4 0 1 0 0 1 0 1 0 1 0
B5 0 1 0 0 1 0 1 0 1 0
B6 0 1 0 0 1 0 1 0 1 0
B7 0 1 0 0 1 0 1 0 1 0
C5 0 1 0 0 1 0 1 0 1 0
C6 0 1 0 0 1 0 1 0 1 0
C7 0 1 0 0 1 0 1 0 1 0
B9 0 1 0 0 1 0 1 0 1 0
C8 0 1 0 0 1 0 1 0 1 0
B10 0 1 0 0 1 0 1 0 1 0
B11 0 1 0 0 1 0 1 0 1 0
Basic Block Information
1 2 2 2 2
B0 0 0 2 0 0 1 0 1 1 1 1
B1 2 4 0 1 0 1 1 1 10 2 0 1 11 2 0 1 5 0 0 0 0 1 1 0 1 2 0 1 0 3 0 1 1 5 1 1 3
B2 5 5 0 1 0 1 1 1 2 1 0 3 1 0 5 1 1 4 2 0 0 2 3 1 0 3 5 0 0 4 8 0 1 0
C1 4 5 2 1 0 3 1 1 5 1 0 8 1 0 11 2 0 1 4 2 0 0 0 3 1 0 1 5 0 0 2 8 0 0 3
B3 4 4 2 1 0 3 1 1 5 1 0 8 1 0 4 2 0 0 0 3 1 0 1 5 0 0 2 9 0 1 0
C2 4 5 2 1 0 3 1 1 5 1 0 9 1 0 11 2 0 1 4 2 0 0 0 3 1 0 1 5 0 0 2 9 0 0 3
B4 3 4 2 1 0 5 1 0 9 1 0 11 2 0 1 3 2 0 0 0 5 1 0 1 9 0 0 2
C3 3 4 2 1 0 5 1 1 9 1 0 11 2 0 1 3 2 0 0 0 5 1 0 1 9 0 0 2
C4 3 4 2 1 0 5 1 1 9 1 0 11 2 0 1 3 2 0 0 0 5 1 0 1 9 0 0 2
B5 3 4 2 1 0 5 1 1 9 1 0 11 2 0 1 3 2 0 1 0 5 1 0 1 9 0 0 2
B6 3 4 2 1 0 5 1 1 9 1 0 11 2 0 1 3 2 0 1 0 5 1 0 1 9 0 0 2
B7 2 3 2 1 0 5 1 1 10 2 0 1 4 2 0 0 0 3 1 1 3 5 0 0 1 9 0 1 2
C5 4 5 2 1 0 3 1 1 5 1 0 9 1 0 11 2 0 1 4 2 0 0 0 3 1 0 1 5 0 0 2 9 0 0 3
C6 4 5 2 1 0 3 1 1 5 1 0 9 1 0 11 2 0 1 4 2 0 0 0 3 1 0 1 5 0 0 2 9 0 0 3
B8 4 5 2 1 0 3 1 1 5 1 0 9 1 0 11 2 0 1 4 2 0 0 0 3 0 1 0 5 0 0 2 9 0 0 3
C7 4 5 2 0 3 0 5 0 9 0 12 2 0 1 4 2 0 0 0 3 1 0 1 5 0 0 2 9 0 0 3
B9 3 4 2 1 0 3 1 1 9 1 0 10 2 0 1 4 2 0 0 0 3 1 0 1 5 0 1 0 9 0 0 2
B10 4 5 2 1 0 3 1 1 5 1 0 9 1 0 11 2 0 1 4 2 0 0 0 3 1 1 0 5 0 0 2 9 0 0 3
C8 4 5 2 1 0 3 1 1 5 1 0 9 1 0 10 2 0 1 4 2 0 0 0 3 1 0 1 5 0 0 2 9 0 0 3
B11 0 1 11 2 0 1 0
13 a b current scan move up doorclose atfloor req1 req 0 1 7
2 p0 p1
Partial Information
20
B0 2
    read (p0, a)
        0 0 1 0 -1 1 0 0 1 0 0
    read (p1, b)
        0 1 1 0 -1 1 1 0 1 0 1
B1 6
    current = 0
        0 2 0 0 -1 1 10 0 1 1 2 0
    scan = 0
        0 3 0 0 -1 1 10 0 1 1 2 0
    move = 0
        0 4 0 0 -1 1 10 0 1 1 2 0
    up = 1
        0 5 0 0 -1 1 11 0 1 1 3 0
    doorclose = 1
        0 6 0 0 -1 1 11 0 1 1 3 0
    atfloor = 1
        0 7 0 0 -1 1 11 0 1 1 3 0
B2 1
    req1 = a^b
        3 8 0 0 -1 2 0 1 0 2 0 0 1
C1 1
    req1 == 1
        4 32767 0 0 -1 2 8 11 0 2 1 3 4 0
B3 1
    req = req1
        0 9 0 0 -1 1 8 0 1 0 3
C2 1
    req[scan] == 1
        4 32767 0 0 -1 2 9 3 0 2 1 3 4 0
B4 1
    move = 1
        0 4 0 0 -1 1 11 0 1 1 3 0
C3 1
    req[current] == 1
        4 32767 0 0 -1 2 9 11 0 2 1 2 3 0
C4 1
    up == 1
        4 32767 0 0 -1 2 5 11 0 2 1 1 3 0
B5 1
    current = current+1
        5 2 0 0 -1 2 2 11 0 2 1 0 3 0
B6 1
    current = current-1
        6 2 0 0 -1 2 2 11 0 2 1 0 3 0
B7 4
    move = 0
        0 4 0 0 -1 1 10 0 1 1 2 0
    doorclose = 0
        0 6 0 0 -1 1 10 0 1 1 2 0
    req[current] = 0
        0 9 0 0 -1 1 10 0 1 1 2 0
    scan = current
        0 3 0 0 -1 1 2 0 1 0 0
C5 1
    req == 1
        4 32767 0 0 -1 2 9 11 0 2 1 3 4 0
C6 1
    up == 1
        4 32767 0 0 -1 2 5 11 0 2 1 2 4 0
B8 1
    scan = scan+1
        5 3 0 0 -1 2 3 11 0 2 1 1 4 0
C7 1
    scan >= 7
        7 32767 0 0 -1 2 3 12 0 2 1 1 4 0
B9 1
    up = 0
        0 5 0 0 -1 1 10 0 1 1 3 0
B10 1
    scan = scan-1
        6 3 0 0 -1 2 3 11 0 2 1 1 4 0
C8 1
    scan <= 0
        8 32767 0 0 -1 2 3 10 0 2 1 1 4 0
B11 1
    up = 1
        0 5 0 0 -1 1 11 0 1 1 0 0
20
B0 1 B1
B1 1 B2
B2 1 C1
C1 2 0 B0 1 B3
B3 1 C2
C2 2 0 C5 1 B4
B4 1 C3
C3 2 0 C4 1 B7
C4 2 0 B6 1 B5
B5 1 C3
B6 1 C3
B7 1 C5
C5 2 0 B0 1 C6
C6 2 0 B10 1 B8
B8 1 C7
C7 2 0 C2 1 B9
B9 1 C2
B10 1 C8
C8 2 0 C2 1 B11
B11 0
52
Bibliography
[1] F.L Camposano, R. Saunders, and M.R Tabet, VHDL as input for High –level
Synthesis, proceedings of IEEE Design and Test of Computers, pp.43-49, 1991.
[1] Daniel D.Gajski, Nikil D.Dutt , Allen C-H Wu, and Steve Y-L Lin , High level
synthesis : Introduction to chip and System Design ,Kluwer Academic
Publishers , 1992.
[3] C.R.Mandal ,P.P.Chakrabarti , and S.Ghose , Gabind : a ga approach to
allocation and binding for the high-level synthesis of data paths, IEEE
Transactions on Very Large Scale Integration (VLSI) Systems , vol. 8 ,no. 6 ,pp
747-750 ,2000.
[4] C.R.Mandal, R.M. Zimmer , A Genetic Algorithm for Synthesis of Structured
Data Paths , Proceedings of the 13th International Conference on VLSI Design
,2006
[5] Synthesis of Arrays and Records , Pradip K.Jha , Stephen Barn field and
John Weaver ,IBM EDA Lab , Fishkill , NY , USA ,Rudra Mukherjee, Viewlogic
Systems Inc , San Jose, CA ,USA , Reinaldo A.Bergamaschi , IBM T.J Waston
research Center , NY ,USA.
[6] C.R.Mandal , P.P. Chakrabarti , and S.Ghose , Allocation and binding for
data path synthesis using a genetic approach , in proceedings of VLSI design
’96, pp.122-125 ,1996.
53
[7] Ramachandan , N.Gajski , D.D Chaiyakul , An algorithm for array variable
clustering , in proceedings of EUROASIC , The European Event in ASIC Design
on European Design and Test Conference ,1994 .
[8] M. Rahmouni and A. A. Jerraya, “Formulation and evaluation of scheduling
techniques for control flow graphs”, in Proceedings of EuroDAC'95, (Brighton),
pp. 386.391, 18-22 September 1995.
[9] C. Tseng and D.P Siewiorek, FACET : A procedure for the Automated
Synthesis of Digital Systems , 20th Design Automation Conference ,1983.
[10] Holmes, N.D. Gajski , D.D Architectural exploration for data paths with
memory hierarchy , in Proceedings of ED & TC on European Design and Test
Conference,1995.
[11] R. Camposano, “Path-based scheduling for synthesis”, IEEE transactions on
computer-Aided Design of Integrated Circuits and Systems, vol. Vol 10 No 1, pp.
85.93, Jan. 1991.
[12] Herman Schmit, Donald E. Thomas, Synthesis of application-specific
memory designs, in proceedings of IEEE Transactions on VLSI Systems, 1997.
[13] Peeter Ellervee ,Ahmed Hemani , Bengt Sventesson , High level Synthesis of
Control and Memory Intensive Applications , in Proceedings of IEEE
International Conference 1995.
top related