
EECS 150 - Components and Design

Techniques for Digital Systems

Lec 27 – Summary (whirlwind)

12-9-04

David Culler

Electrical Engineering and Computer Sciences

University of California, Berkeley

http://www.eecs.berkeley.edu/~culler

http://www-inst.eecs.berkeley.edu/~cs150

12/9/04 EECS150 Lec26 - Summary 2

Background

Transfer Function

Transistor Physics

Devices

Circuits

FlipFlops

Machine Organization

Instruction Set Arch

Pgm Language

Asm / Machine Lang

CS 61C

Deep Digital Design Experience

Fundamentals of Boolean Logic

Synchronous Circuits

Finite State Machines

Timing & Clocking

Device Technology & Implications

Controller Design

Arithmetic Units

Bus Design

Encoding, Framing

Testing, Debugging

Hardware Architecture

HDL, Design Flow (CAD)

Course Content

Components and Design Techniques for Digital Systems

Synchronous Digital Hardware Systems

Example digital representation: acoustic waveform

A series of numbers is used to represent the waveform, rather than a voltage or current, as in analog systems.

• Synchronous: “Clocked” - all changes in the system are controlled by a global clock and happen at the same time (not asynchronous)

• Digital: All inputs/outputs and internal values (signals) take on discrete values (not analog).

Trick you into building an extreme project

• FPGA/SDRAM provides full game logic

– Court, obstructions

– Moving paddles

– Moving, colliding ball

– All the physics

• Court displayed to NTSC (TV) Video Output

– Real time Sound effects ???

• N64 controller (and switches) for input

• How to make it multiplayer?

– The network

Levels of Digital Design

What makes Digital Systems tick?

Combinational

What determines the system's performance?

The 150 “stuff”

• Building blocks of computer systems

– ICs (Chips), PCBs, Chassis, Cables & Connectors

• CMOS Transistors

– Voltage controlled switches

– Complementary forms (nmos, pmos)

• Logic gates from CMOS transistors

– Logic gates implement particular boolean functions

» N inputs, 1 output

– Serial and parallel switches

– Dual structure

– P-type “pull up” transmit 1

– N-type “pull down” transmit 0

• Complex gates: mux

• Synchronous Sequential Elements

– D FlipFlops

Combinational Logic (CL) Defined

y_i = f_i(x_0, ..., x_{n-1}), where each x and y is in {0,1}.

Y is a function of only X.

• If we change X, Y will change – immediately (well almost!).

– There is an implementation dependent delay from X to Y.

Transistor-level Logic Circuits - NAND

• Inverter (NOT gate)

• NAND gate

• Logic Function:

– out = 0 iff both a AND b = 1, therefore out = (ab)’

– pFET network and nFET network are duals of one another.

How about AND gate?

a b out

nand (out, a, b)

Combinational logic summary

• Logic functions, truth tables, and switches

– NOT, AND, OR, NAND, NOR, XOR, . . ., minimal set

• Axioms and theorems of Boolean algebra

– Proofs by re-writing and perfect induction

• Gate logic

– Networks of Boolean functions and their time behavior

• Canonical forms

– Two-level and incompletely specified functions

• Optimization

– Two-level simplification using K-maps

– Automation of simplification

– Multi-level logic

• Later

– Design case studies

– Time behavior

Transistor-level Logic Circuits - Latch

• Positive Level-sensitive latch

• Transistor Level

• Positive edge-triggered flip-flop built from two level-sensitive latches:

clk’

D FlipFlop

10 gates

D Flip-Flop

• Make S and R complements of each other in Master stage

– Eliminates 1s catching problem

» Input only needs to settle by clock edge

– Can't just hold previous value (must have new value ready every clock period)

– Value of D just before clock goes low is what is stored in flip-flop

– Can make R-S flip-flop by adding logic to make D = S + R' Q

master stage slave stage

Timing Methodologies

• Rules for interconnecting components and clocks

– Guarantee proper operation of system when strictly followed

• Approach depends on building blocks used for memory elements

– Focus on systems with edge-triggered flip-flops

» Found in programmable logic devices

– Many custom integrated circuits focus on level-sensitive latches

• Basic rules for correct timing:

– (1) Correct inputs, with respect to time, are provided to the flip-flops

– (2) No flip-flop changes state more than once per clocking event

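There is a timing "window" around the clocking event during which the input must remain stable and unchanged in order to be recognized.

[Figure: timing diagram of data (stable, then changing) around the clock edge, with the window marked Tsu before and Th after the edge; two cascaded D flip-flops]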

Timing Methodologies (cont’d)

• Definition of terms

– clock: periodic event, causes state of memory element to change; can be rising or falling edge, or high or low level

– setup time: minimum time before the clocking event by which the input must be stable (Tsu)

– hold time: minimum time after the clocking event until which the input must remain stable (Th)

What’s an FSM?

• Next state is function of state and input

• Moore Machine: output is a function of the state

• Mealy Machine: output is a function of state and input

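Often PLAs

[Figure: Moore state diagram with bubbles labeled "State / output" and arcs labeled inputA, inputB; Mealy state diagram with bubbles labeled "State" and arcs labeled inputA/outputA, inputB/outputB]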

Formal Design Process for FSMs

• Circuit Diagram:

– XOR gate for ns calculation

– DFF to hold present state

– no logic needed for output

Logic equations from table:

OUT = PS

NS = PS xor IN

• Review of Design Steps:

1. Circuit functional specification

2. State Transition Diagram

3. Symbolic State Transition Table

4. Encoded State Transition Table

5. Derive Logic Equations

6. Circuit Diagram

– FFs for state

– CL for NS and OUT
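The two logic equations above (OUT = PS, NS = PS xor IN) can be checked with a small software model. This is only a sketch: the function names `step` and `run` and the reset state of 0 are assumptions made for illustration, not part of the lecture.

```python
def step(ps, inp):
    """Model one clock cycle of the FSM: returns (next_state, output)."""
    out = ps          # OUT = PS: Moore output, no extra output logic
    ns = ps ^ inp     # NS = PS xor IN: a single XOR gate feeding the DFF
    return ns, out


def run(inputs, ps=0):
    """Clock the FSM once per input bit, starting from state ps (assumed reset value 0)."""
    outputs = []
    for inp in inputs:
        ps, out = step(ps, inp)
        outputs.append(out)
    return outputs, ps


if __name__ == "__main__":
    outs, final_state = run([1, 0, 1, 1])
    print(outs, final_state)
```

The state flips whenever the input is 1, so after any input sequence the stored bit is the parity of the 1s seen so far.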

Composing FSMs into larger designs

FSM FSM

Sequential Synchronous Elements

• Basic registers

– Common control, MUXes

• Simple, important FSMs

– simple internal feedback

– Ring counters, Pattern detectors

– Binary Counters

• Universal Shift Register

• Using Counters to build controllers

– Simplify control by controlling simpler FSM

150 and the changing times

• Advancing technology changes the trade-offs and design techniques

– 2x transistors per chip every 18 months

• ASIC, Programmable Logic, Microprocessor

• Programmable logic invests chip real-estate to reduce design time & time to market

• FPGA:

– programmable interconnect

– configurable logic blocks

» LUT + storage

– Block RAM

– IO Blocks

• PLAs

– General devices for SoP or PoS logic

Virtex-E Configurable Logic Block (CLB)

CLB = 4 logic cells (LC) in two slices

LC: 4-input function generator, carry logic, storage element

80 x 120 CLB array on 2000E

16x1 synchronous RAM

FF or latch

• Basic Idea:

– Language constructs describe circuits with two basic forms:

– Structural descriptions similar to hierarchical netlist.

– Behavioral descriptions use higher-level constructs (similar to conventional programming).

• Originally designed to help in abstraction and simulation.

– Now “logic synthesis” tools exist to automatically convert from behavioral descriptions to gate netlist.

– Greatly improves designer productivity.

– However, this may lead you to falsely believe that hardware design can be reduced to writing programs!

• “Structural” example:

Decoder(output x0,x1,x2,x3; inputs a,b)
{
  wire abar, bbar;
  inv(bbar, b);
  inv(abar, a);
  nand(x0, abar, bbar);
  nand(x1, abar, b   );
  nand(x2, a,    bbar);
  nand(x3, a,    b   );
}

• “Behavioral” example:

Decoder(output x0,x1,x2,x3; inputs a,b)
{
  case [a b]
    00: [x0 x1 x2 x3] = 0x0;
    01: [x0 x1 x2 x3] = 0x2;
    10: [x0 x1 x2 x3] = 0x4;
    11: [x0 x1 x2 x3] = 0x8;
  endcase;
}

Finite State Machines in Verilog

[Figure: FSM block diagram with inputs, combinational logic, next state, current state, Moore outputs, and Mealy outputs]

Design Methodology in Detail

Design Specification

Design Partition

Design Entry: Behavioral Modeling

Simulation/Functional Verification

Pre-Synthesis Sign-Off

Synthesize and Map: Gate-level Net List

Postsynthesis Design Validation

Postsynthesis Timing Verification

Test Generation and Fault Simulation

Cell Placement/Scan Insertion/Routing

Verify Physical and Electrical Rules

Synthesize and Map: Gate-level Net List

Design Integration and Verification

Design Sign-Off

Configuring CLBs

Logic block (3-LUT, FF, latch) set by configuration bit-stream

3-input "look up table": the INPUTS A2 A1 A0 (000 through 111) select one stored bit as the OUTPUT

NAND gate in FPGA CLB

out = ~(A2 A1 A0)

LUT contents for inputs 000 through 111: 11111110

Nextstate bit in FPGA CLB

nextstate = A2 xor A1

LUT contents for inputs 000 through 111: 00111100
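A 3-input look-up table is just an 8-entry memory indexed by the inputs, which a few lines of software can imitate. The sketch below assumes the table contents are listed from input 000 to input 111 with A2 as the most significant index bit; the helper names `lut3` and `make_lut` are illustrative only.

```python
def lut3(contents, a2, a1, a0):
    """Read one bit from an 8-entry LUT; contents[0] is the entry for A2 A1 A0 = 000."""
    index = (a2 << 2) | (a1 << 1) | a0
    return int(contents[index])


def make_lut(func):
    """Build the 8-bit configuration string for any 3-input boolean function."""
    return "".join(str(func((i >> 2) & 1, (i >> 1) & 1, i & 1)) for i in range(8))


NAND3 = "11111110"      # out = ~(A2 A1 A0)
NEXTSTATE = "00111100"  # nextstate = A2 xor A1 (A0 is ignored)

if __name__ == "__main__":
    print(make_lut(lambda a2, a1, a0: 1 - (a2 & a1 & a0)))  # regenerates the NAND table
    print(make_lut(lambda a2, a1, a0: a2 ^ a1))             # regenerates the next-state table
```

Changing the function never changes the hardware, only the eight stored bits, which is the point of the configuration bit-stream.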

Configuring Routes

Timing for Synchronous Circuits

• In general, for correct operation:

T ≥ time(clk→Q) + time(CL) + time(setup)

for all paths.

• How do we enumerate all paths?

– Any circuit input or register output to any register input or circuit output.

– “setup time” for circuit outputs depends on what it connects to

– “clk-Q time” for circuit inputs depends on from where it comes.

[Figure: registers (reg) separated by combinational logic (CL) blocks, with clock, input, output, and optional feedback]
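The clock period constraint (period at least clk-to-Q plus combinational delay plus setup, on every path) turns into a one-line maximum once the paths are enumerated. The sketch below uses made-up delays in nanoseconds; the function name and the numbers are assumptions for illustration, not values from the lecture.

```python
def min_clock_period(paths):
    """Smallest legal clock period given (clk_to_q, cl_delay, setup) per path, all in ns."""
    return max(clk_to_q + cl_delay + setup for clk_to_q, cl_delay, setup in paths)


if __name__ == "__main__":
    # Hypothetical paths: register-to-register (short and long CL), and a circuit
    # input whose "clk-Q time" and "setup" depend on what it connects to.
    paths = [
        (1.0, 4.0, 0.5),
        (1.0, 6.5, 0.5),
        (0.0, 3.0, 2.0),
    ]
    period_ns = min_clock_period(paths)
    print(period_ns, "ns minimum period,", 1000.0 / period_ns, "MHz maximum clock")
```

Only the slowest path matters, so speeding up any other path does not raise the maximum clock rate.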

Typical SRAM Organization: 16-word x 4-bit

[Figure: array of SRAM cells, 16 rows (Word 0, Word 1, ..., Word 15) by 4 columns. An address decoder selects one word line; each column has a write driver (inputs Din 0 to Din 3, enabled by WrEn) and a sense amp (outputs Dout 0 to Dout 3)]

Classical DRAM Organization (Square)

[Figure: RAM cell array with a row decoder driven by the row address (word/row select lines) and a column selector & I/O circuits block driven by the column address (bit/data lines)]

Row and column address together select 1 bit at a time

Each intersection represents a 1-T DRAM cell

Square keeps the wires short: power and speed advantages. Less RC means faster precharge and discharge, which gives faster access time!

DRAM with Column buffer

Column Latches

Sense Amps

Memory Array(2,048 x 2,048)

A0…A10…

Word LineStorage Cell

DECODER

Pull column into fast buffer storage

Access sequence of bits from there

Digital Arithmetic

• Circuit design for unsigned addition

– Full adder per bit slice

– Delay limited by Carry Propagation

» Ripple is algorithmically slow, but wires are short

• Carry select

– Simple, resource-intensive

– Excellent layout

• Carry look-ahead

– Excellent asymptotic behavior

– Great at the board level, but wire length effects are significant on chip

• Digital number systems

– How to represent negative numbers

– Simple operations

– Clean algorithmic properties

• 2s complement is most widely used

– Circuit for unsigned arithmetic

– Subtract by complement and carry in

– Overflow when cin xor cout of sign-bit is 1

2s Complement Adder/Subtractor

A - B = A + (-B) = A + B’ + 1

Add/Subtract

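[Figure: 4-bit ripple adder/subtractor. For each bit i = 3 down to 0, a mux controlled by Sel chooses B i or its complement B i’; each full adder takes A i and the mux output and produces S 3 S 2 S 1 S 0, plus an Overflow output]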
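The adder/subtractor can be modeled bit by bit to see how the same ripple adder serves both operations and where the overflow flag comes from. This is a sketch under stated assumptions: operands are given as unsigned n-bit patterns, the default width of 4 matches the figure, and the name `add_sub` is made up for illustration.

```python
def add_sub(a, b, subtract, n=4):
    """Ripple-carry add or subtract on n-bit patterns; returns (result_bits, overflow)."""
    mask = (1 << n) - 1
    b_in = (b ^ mask) if subtract else b   # Sel picks B or its bitwise complement
    carry = 1 if subtract else 0           # the "+ 1" of the 2s complement is the carry in
    result = 0
    carry_into_sign = 0
    for i in range(n):
        if i == n - 1:
            carry_into_sign = carry        # cin of the sign-bit full adder
        ai = (a >> i) & 1
        bi = (b_in >> i) & 1
        result |= (ai ^ bi ^ carry) << i   # full adder sum bit
        carry = (ai & bi) | (ai & carry) | (bi & carry)  # full adder carry out
    overflow = carry_into_sign ^ carry     # cin xor cout of the sign bit
    return result, overflow


if __name__ == "__main__":
    print(add_sub(0b0101, 0b0011, subtract=True))   # 5 - 3
    print(add_sub(0b0111, 0b0001, subtract=False))  # 7 + 1 does not fit in 4 signed bits
```

In the second call the result pattern 1000 reads as -8 in 2s complement, and the overflow flag is what reveals that the true answer did not fit.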

Digital design - as we’ve seen it

System specification (in words)

Datapath specification Controller specification

Comb. logic operations

Verilog dataflow

Gates / LUTs

FSM generation

STT / STD / Encoding

Logic: nextstate/outputs

Verilog behavior

Gates / LUTs / FF

ARTART

Lec 7, 8: FSM impl.

Lec 8, 9: Modeling FSMs

Lec 5, 6: Logic min.

Lec 4: HDL, Labs

Lec 2, 3: CMOS, FPGA

Lec 5, 6: Logic min.

Lec 4: HDLs, Labs

Lec 2, 3: CMOS, FPGA

Final Example: Ant Brain (Ward, MIT)

• Sensors: L and R antennae, 1 if touching wall

• Actuators: F - forward step, TL/TR - turn left/right slightly

• Goal: find way out of maze

• Strategy: keep the wall on the right

Serial Line TX/RX – dealing with I/O

Receiver

DisplayControl

Display

ResetR

CharRcvd

Sender

Send 8

CharToSend

Keyboard

KeyboardDecode

ResetS

The GAME

• CP1: N64 interface

• CP2: Digital video encoder

• CP3: SDRAM controller

• CP4: IEEE 802.15.4 (cc2420) interface

• Project CP: game engine

• Endgame

Game Physics

Video Encode

ADV7194

composite video

ITU 601/656

N64 controller interface

SDRAM Control

board state

Render Engine

SDRAM Control

Joystick

Interface

Computer Organization

• Computer design as an application of digital logic design procedures

• Computer = processing unit + memory system

• Processing unit = control + datapath

• Control = finite state machine

– Inputs = machine instruction, datapath conditions

– Outputs = register transfer control signals, ALU operation codes

– Instruction interpretation = instruction fetch, decode, execute

• Datapath = functional units + registers + interconnect

– Functional units = ALU, multipliers, dividers, etc.

– Registers = program counter, shifters, storage registers

– Interconnect = busses and wires

• Instruction Interpreter vs Fixed Function Device

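[Figure: design hierarchy tree. The system splits into data-path and control; the data-path contains code registers, multiplexer, and comparator; the control contains state registers and combinational logic; the leaves are registers, logic, and switching networks]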

Design hierarchy

Datapath vs Control

• Datapath: Storage, FU, interconnect sufficient to perform the desired functions

– Inputs are Control Points

– Outputs are signals

• Controller: State machine to orchestrate operation on the data path

– Based on desired function and signals

Datapath Controller

Control Points

signals

Datapath Design

• Datapath consists of state (reg, reg file), function units (adders, ALUs), and interconnect (mux, tri-state & bus)

• It can perform certain register transfers: source regs through function units and interconnect to dest reg

– Set of reg. transfers occurs on each cycle

• Each datapath element has control points

– Reg (LD), FU (op), MUX (sel), TriState (OE)

– Controller asserts the proper control points to cause the datapath to carry out the requested register transfers

• The RTLs associated with each step in the high level algorithm determine the STD of the controller

– Controller inputs are datapath outputs (conditions)

– Controller outputs are datapath inputs (control points)

Array Multiplier

b3 0 b2 0 b1 0 b0 0

P7 P6 P5 P4

bj sum in

sum out

carryout

carryin

Each row: n-bit adder with AND gates

What is the critical path?

Generates all n partial products simultaneously.

“Shift and Add” Multiplier

• Sums each partial product, one at a time.

• In binary, each partial product is a shifted version of A or 0.

Control Algorithm:

1. P ← 0, A ← multiplicand, B ← multiplier

2. If LSB of B==1 then add A to P, else add 0

3. Shift [P][B] right 1

4. Repeat steps 2 and 3 n-1 times.

5. [P][B] has product.

[Figure: P and B are n-bit shift registers, A is an n-bit register, feeding an n-bit adder]

• Cost ∝ n, time = n clock cycles.

• What is the critical path for determining the min clock period?
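The control algorithm can be traced in software, one loop iteration per clock cycle. This is a minimal sketch assuming unsigned n-bit operands and assuming the adder's carry-out is shifted into the top of P; the function name is illustrative.

```python
def shift_add_multiply(a, b, n=4):
    """Multiply two unsigned n-bit values using the [P][B] shift register pair."""
    p = 0                                   # step 1: P = 0, A and B already loaded
    for _ in range(n):                      # steps 2 and 3, n times in total
        if b & 1:                           # step 2: LSB of B selects A or 0
            p = p + a                       # may need n+1 bits (adder carry out)
        # step 3: shift [P][B] right by one; P's LSB moves into B's MSB
        b = (b >> 1) | ((p & 1) << (n - 1))
        p >>= 1
    return (p << n) | b                     # step 5: [P][B] holds the 2n-bit product


if __name__ == "__main__":
    print(shift_add_multiply(13, 11))
```

Each iteration contains one n-bit add followed by a register load, which suggests where to look for the critical path that sets the minimum clock period.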

DIVIDE HARDWARE Version 2

• 32-bit Divisor register, 32-bit ALU, 64-bit Remainder register, 32-bit Quotient register

Remainder

Quotient

Divisor

add/sub

Shift Left

WriteControl

32 bits

64 bits

Shift Left

Register Transfers - interconnect

• Point-to-point connection

– Dedicated wires

– Muxes on inputs of each register

• Common input from multiplexer

– Load enables for each register

– Control signals for multiplexer

• Common bus with output enables

– Output enables and load enables for each register

[Figure: registers rs, rt, rd, R4 connected in each of the three styles]

Register Transfer Level Descriptions

• A standard high-level representation for describing systems.

• It follows from the fact that all synchronous digital systems can be described as a set of state elements connected by combinational logic (CL) blocks.

• RTL comprises a set of register transfers with optional operators as part of the transfer.

• Example:

regA ← regB

regC ← regA + regB

if (start==1) regA ← regC

• Personal style:

– Use “;” to separate transfers that occur on separate cycles.

– Use “,” to separate transfers that occur on the same cycle.

• Example (2 cycles):

regA ← regB, regB ← 0;

regC ← regA;


List Processor Example

• RTL gives us a framework for making high-level optimizations.

– Fixed function unit

– Approach extends to instruction interpreters

• General design procedure outline:

1. Problem, Constraints, and Component Library Spec.

2. “Algorithm” Selection

3. Micro-architecture Specification

4. Analysis of Cost, Performance, Power

5. Optimizations, Variations

6. Detailed Design

A_SEL01

Memory

NEXT_SEL

LD_NEXT

NEXT_ZERO

SUM_SEL

LD_SUM

3. Architecture #1

Direct implementation of RTL description:

Datapath

Controller

If (START==1) NEXT ← 0, SUM ← 0;
repeat {
  SUM ← SUM + Memory[NEXT+1];
  NEXT ← Memory[NEXT];
} until (NEXT==0);
R ← SUM, DONE ← 1;
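The list processor's RTL reads naturally as a loop over a linked list stored in memory. The sketch below assumes each node occupies two consecutive words (the next pointer at address X, the value at X+1), that the list starts at address 0, and that a next pointer of 0 ends the list; the memory contents are invented for the example.

```python
def list_sum(memory):
    """Follow the RTL: accumulate Memory[NEXT+1], then advance NEXT = Memory[NEXT]."""
    next_ptr, total = 0, 0                 # NEXT = 0, SUM = 0 when START is asserted
    while True:
        total += memory[next_ptr + 1]      # SUM = SUM + Memory[NEXT+1]
        next_ptr = memory[next_ptr]        # NEXT = Memory[NEXT]
        if next_ptr == 0:                  # until (NEXT == 0)
            break
    return total                           # R = SUM, DONE = 1


if __name__ == "__main__":
    # Hypothetical memory image: node at 0 -> node at 4 -> node at 2 -> end.
    memory = [4, 10, 0, 30, 2, 20]
    print(list_sum(memory))
```

Each loop iteration makes two memory reads, which is the kind of cost the later optimization steps of the design procedure would try to reduce.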

Approaching an ISA

• Instruction Set Architecture

– Defines set of operations, instruction format, hardware supported data types, named storage, addressing modes, sequencing

• Meaning of each instruction is described by RTL on architected registers and memory

• Given technology constraints assemble adequate datapath

– Architected storage mapped to actual storage

– Function units to do all the required operations

– Possible additional storage (e.g. MAR, MBR, …)

– Interconnect to move information among regs and FUs

• Map each instruction to sequence of RTLs

• Collate sequences into symbolic controller STD

• Lower symbolic STD to control points

• Implement controller

Instruction Sequencing

• Example – an instruction to add the contents of two registers (Rx and Ry) and place result in a third register (Rz)

• Step 1: Fetch the ADD instruction from memory into an instruction register

• Step 2: Decode instruction

– Instruction in IR has the code of an ADD instruction

– Register indices used to generate output enables for registers Rx and Ry

– Register index used to generate load signal for register Rz

• Step 3: Execute instruction

– Enable Rx and Ry output and direct to ALU

– Set up ALU to perform ADD operation

– Direct result to Rz so that it can be loaded into register

Initialize Machine

Register-to-Register

Branch Not Taken

Branch Taken

Instruction Execution

• Control State Diagram (for each diagram)

– Reset

– Fetch instruction

– Decode

– Execute

• Instructions partitioned into three classes

– Branch

– Load/store

– Register-to-register

• Different sequence through diagram for each instruction type

• Controller manipulates the data path to perform the instruction

Fetch Instr.

XEQ Instr.

Load/Store

Branch

Incr. PC

Networking Layers

Application:send @sdata dest

Application:rcv @rdata [src]

logical communication

actual

System:

header data trailer

actual

System:

actual

Hardware:

actual

header trailer

Hardware

actual

Analog Transmitter

actual

Analog Receiver

What the PHY does

• Code, transmit, receive, decode frames

• activation and deactivation of the radio transceiver

• energy detection (ED) within current channel

• link quality indication (LQI) for received packets

• channel selection

• clear channel assessment (CCA) for CSMA-CA

• Carrier Sense Media Access – Collision Avoidance (CSMA-CA)

• Listen for a period of time to hear if the channel is free (CCA)

• If hear traffic, back off for random period of time

– Typically exponentially increasing backoff

– Try again

– May also do random delay before first CCA

• If channel is clear, transmit

• Ethernet does CSMA-CD (collision detect)
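The listen, back off, retry loop of CSMA-CA can be sketched as follows. This is an illustrative model only: delays are counted in abstract slots rather than real time, the window doubling and the retry limit are assumed parameters, and none of it is meant to reproduce the exact IEEE 802.15.4 procedure.

```python
import random


def csma_ca_backoffs(channel_clear, rng, max_attempts=5, base_window=2):
    """Return the list of backoff delays (in slots) taken before transmitting,
    or None if the channel never cleared within max_attempts."""
    delays = []
    window = base_window
    for _ in range(max_attempts):
        if channel_clear():                 # clear channel assessment (CCA)
            return delays                   # channel is free: transmit now
        delays.append(rng.randrange(window))  # back off for a random period
        window *= 2                         # exponentially increasing backoff
    return None


if __name__ == "__main__":
    # Hypothetical channel: busy on the first two assessments, then clear.
    observations = iter([False, False, True])
    print(csma_ca_backoffs(lambda: next(observations), random.Random(0)))
```

Passing the random generator in as an argument keeps the sketch repeatable; a real radio would draw its delays from hardware or protocol-defined parameters.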

Error Correction Codes (ECC)

• Memory systems generate errors (accidentally flipped-bits)

– DRAMs store very little charge per bit

– “Soft” errors occur occasionally when cells are struck by alpha particles or other environmental upsets.

– Less frequently, “hard” errors can occur when chips permanently fail.

– Problem gets worse as memories get denser and larger

• Where is “perfect” memory required?

– servers, spacecraft/military computers, ebay, …

• Memories are protected against failures with ECCs

• Extra bits are added to each data-word

– used to detect and/or correct faults in the memory system

– in general, each possible data word value is mapped to a unique “code word”. A fault changes a valid code word to an invalid one - which can be detected.

Correcting Code Concept

• Detection: bit pattern fails codeword check

• Correction: map to nearest valid code word

• Example: Parity bit

Space of possible bit patterns (2^N)

Sparse population of code words (2^M << 2^N)

- with identifiable signature

Error changes bit pattern to non-code

SECDED

• You receive:

– 1111110

– 0000010

– 1010010

• What is the correct value?

position:  1    2    3    4    5    6    7
binary:    001  010  011  100  101  110  111
role:      P1   P2   d1   P3   d2   d3   d4

Position of error = C3C2C1

Where Ci is parity of group i
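The three received words can be checked mechanically. The sketch below rests on two assumptions not stated on the slide: the leftmost character of each word is position 1, and each parity group uses even parity. Under those assumptions the syndrome C3C2C1 equals the XOR of the positions of all 1 bits, so a nonzero syndrome names the bit to flip.

```python
def correct_hamming74(word):
    """Return (syndrome, corrected_word, data_bits) for a 7-bit Hamming code word."""
    bits = [int(ch) for ch in word]          # bits[0] is position 1
    syndrome = 0
    for position, bit in enumerate(bits, start=1):
        if bit:
            syndrome ^= position             # bit i of the result is the parity of group i
    if syndrome:
        bits[syndrome - 1] ^= 1              # flip the bit the syndrome points at
    corrected = "".join(str(b) for b in bits)
    data = "".join(str(bits[p - 1]) for p in (3, 5, 6, 7))  # d1 d2 d3 d4
    return syndrome, corrected, data


if __name__ == "__main__":
    for received in ("1111110", "0000010", "1010010"):
        print(received, correct_hamming74(received))
```

With these conventions the three words have errors at positions 7, 6, and 4 respectively, giving data bits 1111, 0000, and 1010. A 7-bit code of this kind corrects single errors only; detecting double errors as well would need one more overall parity bit.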

Concept: Redundant Check

• Send a message M and a “check” word C

• Simple function on <M,C> to determine if both received correctly (with high probability)

• Example: XOR all the bytes in M and append the “checksum” byte, C, at the end

– Receiver XORs <M,C>

– What should result be?

– What errors are caught?

bit i is XOR of ith bit of each byte
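The XOR checksum is small enough to try directly, which also answers the slide's two questions. The helper names below are assumptions for illustration, and the message bytes are arbitrary.

```python
from functools import reduce


def xor_checksum(message):
    """XOR all bytes together; bit i of the result is the XOR of bit i of every byte."""
    return reduce(lambda acc, byte: acc ^ byte, message, 0)


def append_checksum(message):
    """Sender side: append the checksum byte C to the message M."""
    return message + bytes([xor_checksum(message)])


def check(frame):
    """Receiver side: XOR over <M,C> should come out to zero."""
    return xor_checksum(frame) == 0


if __name__ == "__main__":
    frame = append_checksum(b"EECS150")
    print(check(frame))                                   # intact frame
    one_flip = bytes([frame[0] ^ 0x01]) + frame[1:]
    print(check(one_flip))                                # single flipped bit
    two_flips = bytes([frame[0] ^ 0x01, frame[1] ^ 0x01]) + frame[2:]
    print(check(two_flips))                               # same bit flipped in two bytes
```

Any odd number of flips in a given bit position is caught, while an even number of flips in the same bit position cancels out and goes undetected.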

CRC concept

• I have a msg polynomial M(x) of degree m

• We both have a generator poly G(x) of degree n

• Let r(x) = remainder of M(x) x^n / G(x)

– M(x) x^n = G(x) p(x) + r(x)

– r(x) is of degree less than n

• What is (M(x) x^n – r(x)) / G(x)?

• So I send you M(x) x^n – r(x)

– m+n degree polynomial

– You divide by G(x) to check

– M(x) is just the m+1 most significant coefficients, r(x) the lower n

• An n-bit message is viewed as the coefficients of a degree n-1 polynomial over the binary numbers

n bits of zero at the end

tack on n bits of remainder instead of the zeros

[Figure: shift register of D flip-flops with outputs Q1 to Q4, fed by serial_in]
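The CRC arithmetic can be reproduced with integers standing in for polynomials over the binary numbers, where bit i of the integer is the coefficient of x^i and both addition and subtraction are XOR. The generator x^3 + x + 1 and the message below are arbitrary choices for the example, not values from the lecture, and the function names are illustrative.

```python
def poly_mod(dividend, generator):
    """Remainder of polynomial division over GF(2); generator must have degree >= 1."""
    while dividend.bit_length() >= generator.bit_length():
        shift = dividend.bit_length() - generator.bit_length()
        dividend ^= generator << shift       # subtract an aligned copy of G(x)
    return dividend


def crc_encode(message, generator):
    """Form M(x) * x^n - r(x), where n is the degree of the generator."""
    n = generator.bit_length() - 1
    shifted = message << n                   # n bits of zero at the end
    return shifted ^ poly_mod(shifted, generator)  # replace the zeros with the remainder


if __name__ == "__main__":
    generator = 0b1011                       # x^3 + x + 1 (assumed for illustration)
    codeword = crc_encode(0b1101, generator)
    print(bin(codeword))
    print(poly_mod(codeword, generator))         # receiver's check on a clean frame
    print(poly_mod(codeword ^ 0b100, generator)) # receiver's check after one bit flips
```

The transmitted polynomial is an exact multiple of G(x), so the receiver's division leaves remainder zero unless the frame was corrupted. In hardware the same division is done one bit per clock by a shift register with XOR feedback.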

Controlling Energy Consumption

• Largest contributing component to CMOS power consumption is switching power:

P_avg = n · α_avg · f · c_avg · ½ · Vdd^2

• Factors influencing power consumption:

n: total number of nodes in circuit

α: activity factor (probability of each node switching)

f: clock frequency (does this affect energy consumption?)

Vdd: power supply voltage

• What control do you have over each factor?

• How does each affect the total energy?

What control do you have as a designer?
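A quick numeric experiment shows how each factor moves power and energy. The sketch assumes the usual switching-power form P = ½ · n · α · c · f · Vdd², with c taken to be the average capacitance per node; all numbers are invented for illustration.

```python
def switching_power(n, alpha, c, f, vdd):
    """Average switching power in watts (c in farads, f in hertz, vdd in volts)."""
    return 0.5 * n * alpha * c * f * vdd ** 2


def energy_for_cycles(n, alpha, c, vdd, cycles):
    """Energy in joules to run a fixed number of clock cycles: power times (cycles / f)."""
    return 0.5 * n * alpha * c * vdd ** 2 * cycles


if __name__ == "__main__":
    # Hypothetical chip: 1e6 nodes, 10% activity, 1 fF per node, 100 MHz, 2 V.
    base = switching_power(1e6, 0.1, 1e-15, 1e8, 2.0)
    half_vdd = switching_power(1e6, 0.1, 1e-15, 1e8, 1.0)
    half_f = switching_power(1e6, 0.1, 1e-15, 5e7, 2.0)
    print(base, half_vdd / base, half_f / base)
    print(energy_for_cycles(1e6, 0.1, 1e-15, 2.0, cycles=1e6))
```

Halving the clock frequency halves power but leaves the energy for a fixed amount of work unchanged in this model, whereas halving Vdd cuts both by a factor of four.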

Digital Design

Given a functional description and performance, cost, & power constraints, come up with an implementation using a set of primitives.

• How do we learn how to do this?

1. Learn about the primitives and how to generate them.

2. Learn about design representation.

3. Learn formal methods to optimally manipulate the representations.

4. Look at design examples.

5. Use trial and error - CAD tools and prototyping.

• Digital design is in some ways more an art than a science. The creative spirit is critical in combining primitive elements & other components in new ways to achieve a desired function.

• However, unlike art, we have objective measures of a design: performance, cost, power

Traversing Digital Design

EE 40 CS61C

EECS150 wks 1-6

So what’s on the final?

• 5 questions (one full design problem)

• Focused on latter third, but build upon everything we’ve done

– Digital arithmetic

– Datapath / Control / Computer Organization

– RTL

– Error coding

• But also

– Combinational logic, timing and delays, controller design

• Partly recalling what was presented, partly putting your knowledge to work to solve a new problem

Maintaining the Digital Abstraction (in an analog world)

• Circuit design with very sharp transitions

• Noise margin for logical values

• Carefully Design Storage Elements (SE)

– Internal feedback

• Structured System Design

– SE + CL, cycles must cross SE

• Timing Methodology

– All SE advance state together

– All inputs stable across state change

• Channel coding, framing, encapsulation

• Error coding, detection, correction


Moore’s Law: 2x stuff per year or so

Bell’s Law – new computer class per 10 years

streaming information to/from physical world

Number Crunching

Data Storage

productivity

interactive

• Enabled by technological opportunities

• Smaller, more numerous and more intimately connected

• Ushers in a new kind of application

• Ultimately used in many ways not previously imagined

What to take away from EECS 150

• Hands-on understanding of digital design techniques and their relationship to the underlying technology.

• Experience with the fundamental process of the design of digital systems

– Components, DP, RTL, FSM, Controller

• An intellectual toolbox for a changing world.
