Kazi Spring 2008 CSCI 660 1 CSCI-660 Introduction to VLSI Design Khurram Kazi

Post on 19-Dec-2015


Page 1

Kazi Spring 2008 CSCI 660 1

CSCI-660

Introduction to VLSI Design

Khurram Kazi

Page 2

Overview of Synthesis flow

Page 3

Fundamental Steps to a Good Design

If you have a good start, the project will go smoothly.

Partitioning the design is a good start. Partition by:

• Functionality

• Don’t mix two different clock domains in a single block

• Don’t make the blocks too large

• Optimize for synthesis

Page 4

Block Diagram of the Framer, Receiver Direction

Is it partitioned well? Does it follow the suggestions of the previous slide?

[Figure: block diagram of FRAMER.vhd. Blocks: serial to parallel converter; Frame_detect (framing state machine); bit counter and byte counter; clock generation (byte clock); overhead bytes RAM controller (generates signals for RAM); overhead bytes RAM; SPE data out processor (transports data, generates SPE_Valid, etc.); D1_3 bytes reader from RAM with parallel to serial data conversion; D4_12 bytes reader from RAM with parallel to serial data conversion. Signal labels: ser_in, reset_b, clk, 8 bits data, counters, D1_12 data, D1_3_clk, D4_12_clk, D1_3_data, D4_12_data, D1_3_data_val, D4_12_data_val, SPE_data, SPE_val.]

Page 5

Partitioning

Partition the design into smaller components. Partitioning can be done:

• In HDL, or

• During synthesis

Page 6

Recommended rules for Synthesis

• Share resources whenever possible

• When implementing combinatorial paths, do not spread them across hierarchy

• Register all outputs

• Do not implement glue logic between blocks; partition them well

• Separate designs on functional boundaries

• Keep block sizes reasonable

• Separate core logic, pads, clock and JTAG

Page 7

Resource Sharing

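HDL description:

    -- "select" is a reserved word in VHDL, so the control signal is named sel here
    if (sel = '1') then
      sum <= A + B;
    else
      sum <= C + D;
    end if;

[Figure: two implementations of the description above.
One possible implementation: two adders compute A + B and C + D, and a mux controlled by select picks the result for sum.
Another implementation (shared resource, area-efficient): two muxes controlled by select pick A or C and B or D, and a single adder produces sum.]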
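The shared-resource implementation can also be written out explicitly. The following is only a sketch under assumptions: the entity name `shared_adder`, the `WIDTH` generic and the use of `unsigned` operands from `ieee.numeric_std` are illustrative choices, not taken from the slides. Whether a synthesis tool keeps this structure or restructures it still depends on the constraints.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example entity: one adder shared between two operand pairs.
entity shared_adder is
  generic (WIDTH : positive := 8);  -- assumed operand width
  port (
    sel        : in  std_logic;     -- "select" is reserved in VHDL, hence "sel"
    a, b, c, d : in  unsigned(WIDTH - 1 downto 0);
    sum        : out unsigned(WIDTH - 1 downto 0)
  );
end entity shared_adder;

architecture rtl of shared_adder is
  signal op1, op2 : unsigned(WIDTH - 1 downto 0);
begin
  -- Two multiplexers choose the operand pair ...
  op1 <= a when sel = '1' else c;
  op2 <= b when sel = '1' else d;
  -- ... and a single adder serves both branches of the if statement.
  sum <= op1 + op2;
end architecture rtl;
```

The trade-off is the one shown in the figure: one adder fewer, at the cost of a mux in front of each adder input.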

Page 8

Sharable HDL Operators

The following HDL (VHDL and Verilog) synthetic operators can result in a shared implementation:

*   +   -   >=   <   <=   =   /=   ==

Operators can be shared only within the same block (i.e. when they are in the same process).

Page 9

DesignWare Implementation Selection

•The DesignWare implementation that is selected depends on area and timing goals

•The smallest implementation that meets the timing goals is selected

[Figure: the synthetic module for "+" maps to a range of implementations, from smallest (ripple carry) to fastest (carry look-ahead).]

Page 10

Sharing Common Sub-Expressions

•Design Compiler tries to share common sub-expressions to reduce the number of resources needed to implement the design -> area savings while timing goals are met

    SUM1 <= A + B + C;
    SUM2 <= A + B + D;
    SUM3 <= A + B + E;

[Figure: one adder computes A + B; three further adders add C, D and E to that result to produce SUM1, SUM2 and SUM3.]

Page 11

Sharing Common Sub-Expressions: Limitations

•Sharable terms must be in the same order within each expression

    sum1 <= A + B + C;
    sum2 <= B + A + D;    -- not sharable
    sum3 <= A + B + E;    -- sharable

•Sharable terms must occur in the same position (or use parentheses to maintain ordering)

    sum1 <= A + B + C;
    sum2 <= D + A + B;    -- not sharable
    sum3 <= E + (A + B);  -- sharable
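One way to avoid depending on term order and position is to name the common sub-expression yourself. This is a sketch, not something the slides prescribe: the entity name `shared_subexpr`, the intermediate signal `ab` and the `WIDTH` generic are hypothetical, and the operands are assumed to be `unsigned` vectors from `ieee.numeric_std`.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example entity: A + B is computed once and reused.
entity shared_subexpr is
  generic (WIDTH : positive := 8);  -- assumed operand width
  port (
    a, b, c, d, e    : in  unsigned(WIDTH - 1 downto 0);
    sum1, sum2, sum3 : out unsigned(WIDTH - 1 downto 0)
  );
end entity shared_subexpr;

architecture rtl of shared_subexpr is
  signal ab : unsigned(WIDTH - 1 downto 0);  -- the common term A + B
begin
  ab   <= a + b;
  sum1 <= ab + c;
  sum2 <= ab + d;
  sum3 <= e + ab;  -- position no longer matters, ab is a single signal
end architecture rtl;
```

Written this way the description already contains four adders instead of six, so the result relies less on the tool recognizing the shared term.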

Page 12

How to Infer a Specific Implementation (Adder with Carry-In)

•The following expression infers an adder with carry-in

    sum <= A + B + Cin;

where A and B are vectors, and Cin is a single bit

[Figure: a single adder with inputs A, B and Cin producing sum.]
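A complete, compilable version of the carry-in expression might look like the sketch below. It assumes `unsigned` operands from `ieee.numeric_std`; the entity name `adder_cin`, the `WIDTH` generic and the one-bit helper signal `cin_vec` are illustrative. The helper is used because adding a bare `std_logic` to an `unsigned` is only defined in VHDL-2008. Whether a given tool maps this onto a single adder with carry-in is tool dependent.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example entity: adder with a single-bit carry input.
entity adder_cin is
  generic (WIDTH : positive := 8);  -- assumed operand width
  port (
    a, b : in  unsigned(WIDTH - 1 downto 0);
    cin  : in  std_logic;
    sum  : out unsigned(WIDTH - 1 downto 0)
  );
end entity adder_cin;

architecture rtl of adder_cin is
  -- One-bit vector wrapper so that "+" from numeric_std applies.
  signal cin_vec : unsigned(0 downto 0);
begin
  cin_vec(0) <= cin;
  -- Same shape as the slide's expression: sum <= A + B + Cin;
  sum <= a + b + cin_vec;
end architecture rtl;
```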

Page 13

Operator Reordering

•Design Compiler can reorder arithmetic operators to produce the fastest design

•For example

    Z <= A + B + C + D;    -- Z is time constrained

Initially the ordering is from left to right

[Figure: a chain of three adders: A + B, then + C, then + D, producing Z.]

Page 14

Reordering of the Operator for a Fast Design

•If the arrival times of all the signals A, B, C and D are the same, Design Compiler reorders the operators into a balanced tree architecture

[Figure: A + B and C + D are computed in parallel, and a third adder combines the two results to produce Z.]

Page 15

Reordering of the Operator for a Fast Design

•If signal A arrives latest, Design Compiler reorders the operators to accommodate the late-arriving signal

[Figure: a chain of three adders: C + B, then + D, then + A last, producing Z.]
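The three orderings from the last three slides can be written with parentheses, which the earlier slide on sharing limitations mentions as a way to maintain ordering. This is a sketch under assumptions: the entity name `add_order`, the output names and the `WIDTH` generic are hypothetical, and a constraint-driven tool may still pick a different structure on its own.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example entity: three orderings of the same four-input sum.
entity add_order is
  generic (WIDTH : positive := 8);  -- assumed operand width
  port (
    a, b, c, d : in  unsigned(WIDTH - 1 downto 0);
    z_chain    : out unsigned(WIDTH - 1 downto 0);
    z_tree     : out unsigned(WIDTH - 1 downto 0);
    z_late_a   : out unsigned(WIDTH - 1 downto 0)
  );
end entity add_order;

architecture rtl of add_order is
begin
  -- Left-to-right chain: three adders in series.
  z_chain  <= ((a + b) + c) + d;
  -- Balanced tree: two adder levels, suited to equal arrival times.
  z_tree   <= (a + b) + (c + d);
  -- Late-arriving a: it passes through the last adder only.
  z_late_a <= ((c + b) + d) + a;
end architecture rtl;
```

All three outputs carry the same value; only the depth of logic seen by each input differs.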

Page 16

Avoid hierarchical combinatorial blocks

• The path between reg1 and reg2 is divided among three different blocks

• Because of the hierarchical boundaries, the combinatorial logic cannot be optimized as a whole

• Synthesis tools (Synopsys) preserve the I/O ports of each block, so combinatorial logic cannot be optimized across blocks (unless “grouping” is used)

Not a recommended design practice.

[Figure: the path from reg1 to reg2 runs through Combinatorial Logic 1, Combinatorial Logic 2 and Combinatorial Logic 3, which sit in Block A, Block B and Block C respectively.]

Page 17

Recommended Way to Handle Combinatorial Paths

• All the combinatorial circuitry is grouped in the same block, the one whose output is connected to the destination flip-flop

• This allows optimal minimization of the combinatorial logic during synthesis

• It also simplifies the description of the timing interface

Recommended practice.

[Figure: reg1 in Block A; Combinatorial Logic 1, Logic 2 and Logic 3 merged into Block C together with reg2.]

Page 18

Register all outputs

• Simplifies the synthesis design environment: inputs to each block arrive with the same relative delay (caused by wire delays)

• Output requirements don’t really need to be specified, since paths start at flip-flop outputs

• Take care of fanout: as a rule of thumb, keep the fanout to 16 (depends on the technology and on the components driven by the output)

[Figure: Block X with a registered output (reg1) driving two downstream blocks, both labelled Block Y, that contain reg2 and reg3.]
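A block with a registered output might be sketched as follows. The names are illustrative assumptions: `block_x` echoes the figure, `reset_b` is borrowed from the framer diagram as an active-low asynchronous reset, and the adder simply stands in for whatever combinatorial logic the block contains.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example entity: all combinatorial logic sits before the
-- output register, so the output port is driven directly by a flip-flop.
entity block_x is
  generic (WIDTH : positive := 8);  -- assumed data width
  port (
    clk     : in  std_logic;
    reset_b : in  std_logic;        -- assumed active-low asynchronous reset
    a, b    : in  unsigned(WIDTH - 1 downto 0);
    q       : out unsigned(WIDTH - 1 downto 0)
  );
end entity block_x;

architecture rtl of block_x is
begin
  process (clk, reset_b)
  begin
    if reset_b = '0' then
      q <= (others => '0');
    elsif rising_edge(clk) then
      q <= a + b;  -- placeholder logic, registered before leaving the block
    end if;
  end process;
end architecture rtl;
```

Because `q` comes straight from a flip-flop, the block that receives it sees only clock-to-Q plus wire delay on that input.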

Page 19

NO GLUE LOGIC between blocks

No glue logic between blocks, no matter what the temptation.

Under time pressure, a bug may turn up that could be fixed simply by adding some glue logic at the top level. RESIST THE TEMPTATION!!!

Glue logic placed at this level of the hierarchy cannot be absorbed into any lower-level block.

[Figure: Top containing Block X (reg1) and Block Y (reg3).]

Page 20

Separate Designs with Different Goals

[Figure: Top containing a time-critical path and slow logic, with registers reg1 and reg3.]

• reg1 may be driven by a time-critical function, and hence will have different optimization constraints

• reg3 may be driven by slow logic, so there is no need to constrain it for speed

Page 21

Optimization based on design requirements

[Figure: Top split into a speed-optimized block (time-critical path) and an area-optimized block (slow logic), with registers reg1 and reg3.]

• Use different entities to partition design blocks

• This allows different constraints during synthesis, to optimize for area, for speed, or for both

Page 22

Separate FSM from Random Logic

Separating the FSM from the random logic allows you to use FSM-optimized synthesis.

[Figure: Top split into a random-logic block (standard optimization techniques used) and an FSM block (use the FSM optimization tool), with registers reg1 and reg3.]

Page 23

Maintain a reasonable block size

• Partition your design so that each block is between 1,000 and 10,000 gates (this is strictly tool and technology dependent)

• The larger the block, the longer the run time, so quick iterations cannot be done

Page 24

Partitioning of Full ASIC

• The top-level block includes the I/O pads and the instantiation of the Mid block

• Mid includes the clock generator, JTAG and CORE logic

• CORE logic includes all the functionality and the internal scan circuitry

[Figure: Top (I/O pads) contains Mid, which contains the clock generator (PLL etc.), JTAG and CORE logic.]

Page 25

Synthesis Constraints

Specifying an area goal:

• Area constraints are vendor/library dependent (e.g. 2-input NAND gate, square mils, grid, etc.)

• Design Compiler has the max area constraint as one of its constraint attributes

Page 26

Timing constraints for synchronous designs

• Define the timing paths within the design, i.e. paths leading into the design, internal paths and paths leading out of the design

• Define the clock

• Define the I/O timing relative to the clock

[Figure: block to be synthesized, containing registers reg2 and reg3 clocked by clk, with path segments labelled A, B, C, D and E.]

Page 27

Define a clock for synthesis

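A clock definition specifies:

• Clock source

• Period

• Duty cycle

Defining the clock constrains the internal timing paths.

[Figure: block to be synthesized with registers reg2 and reg3 and path segments B, C and D; the clk waveform shows the clock period and duty cycle, and the register-to-register path is allowed 1 clock cycle.]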

Page 28

Timing goals for synchronous design

• Define timing constraints for all paths within a design

• Define the clocks

• Define the I/O timing relative to the clock

[Figure: block to be synthesized with registers reg2 and reg3, ports A and E, and clk. Path C is constrained by clk; paths B and D are still unconstrained.]

Page 29

Constraining input path

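• Input delay is specified relative to the clock

• External logic uses some time within the clock period, i.e. TclkToQ (clock-to-Q delay) + Tw (net delay), measured at the input to B

• Example command for this in Synopsys Design Compiler:

    dc_shell> set_input_delay -clock clk 5 A

  (where 5 is the input delay and A is the input port being constrained)

[Figure: an external register clocked by clk drives input A of the block to be synthesized through net W; inside the block, logic B feeds reg2. TclkToQ and Tw are marked on the external part of the path.]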
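The input-delay budget can be written as a pair of relations. This is a simplified sketch: $T_{clk}$ (clock period), $T_B$ (delay of logic B inside the block) and $T_{setup}$ (setup time of reg2) are symbols introduced here for illustration, and clock skew and uncertainty are ignored.

```latex
\begin{align*}
T_{\text{input delay}} &= T_{clkToQ} + T_{w} \\
T_{B} + T_{setup} &\le T_{clk} - T_{\text{input delay}}
\end{align*}
```

For instance, with the input delay of 5 from the command above and a hypothetical clock period of 10 (same units), logic B plus the setup time of reg2 must fit within the remaining 5.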

Page 30

Constraining output path

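• Output delay is specified relative to the clock

• How much of the clock period does the external logic (shown by cloud b) use up? Tb + Tsetup. This is the amount to specify as the output delay.

[Figure: reg2 inside the block to be synthesized drives an output (TclkToQ marked at the register); outside the block, external logic b (delay Tb) feeds an external register with setup time Tsetup, clocked by clk.]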
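The output side mirrors the input side. In this sketch, $T_{clk}$ (clock period) and $T_{int}$ (delay of any internal logic between reg2 and the output port) are symbols assumed for illustration, and clock skew and uncertainty are again ignored.

```latex
\begin{align*}
T_{\text{output delay}} &= T_{b} + T_{setup} \\
T_{clkToQ} + T_{int} &\le T_{clk} - T_{\text{output delay}}
\end{align*}
```

The larger the share of the period claimed by the external logic, the less time is left for the path inside the block.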

Page 31

Timing paths

Page 32

Combinatorial logic may have multiple paths

•Static Timing Analysis uses the longest path to calculate a maximum delay or the shortest path to calculate a minimum delay.

Page 33

Schematic converted into a timing graph

Each arrow represents a net or a cell delay (timing arc)

Page 34

Calculating a Path’s Delay

[Figure: timing graph starting at 0.0 with a delay on each timing arc; two paths through the graph are traced.]

Path delay = 1.0 + 0.5 + 0.34 + 0.25 + 0.12 = 2.21

Path delay = 0.75 + 0.45 + 0.56 + 0.1 + 0.2 + 0.1 = 2.16
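In general, a path's delay is the sum of the delays of its timing arcs, and static timing analysis takes the maximum or minimum over all paths. The notation below ($D(p)$ for the delay of path $p$, $d_a$ for the delay of arc $a$) is introduced here for illustration, and the last two lines assume the two paths above are the only paths between the same start and end points.

```latex
\begin{align*}
D(p) &= \sum_{a \in p} d_a \\
D_{\max} &= \max(2.21,\ 2.16) = 2.21 \\
D_{\min} &= \min(2.21,\ 2.16) = 2.16
\end{align*}
```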

Page 35

Summarizing: High-Level Synthesis Is Constraint Driven

• Resource sharing, sharing of common sub-expressions and implementation selection all depend on design constraints and coding style

• Based on the timing constraints, Design Compiler decides what to share, how to implement it and what operator ordering to use

• If no constraints are given, area-based optimization is performed (this may be a good start to get an idea of the synthesized circuit)

• It is imperative that realistic constraints be set prior to compilation

• High-level synthesis takes place only when optimizing an HDL description