
2. Design Methodology

A computer is viewed as a system consisting of components and their interconnections.

Two central properties of a system are its structure and behavior.

o Structure : an abstract graph consisting of its block diagram with no functional information.

o Behavioral description : for a given input a, describe the corresponding output f(a). → the function = the behavior

Example : XOR function, shown as a schematic diagram (structure) and a truth table (behavior)

[Figure: schematic diagram of x1 ⊕ x2 built from two NOT gates, two AND gates, and one OR gate.]

Hardware description language (HDL) : a language used to describe a system's structure and behavior.

VHDL (VHSIC Hardware Description Language) is an IEEE standard; VHSIC stands for Very High Speed Integrated Circuit.

o Provides precise, technology-independent descriptions of digital circuits at the gate level and register level.

o Processed by computers together with CAD programs. → A VHDL description of a processor P can be simulated to check the behavior and correctness of P before all the details of its design have been specified.

Truth table of x1 ⊕ x2 :

Input      Output
x1  x2     x1 ⊕ x2
0   0      0
0   1      1
1   0      1
1   1      0

Example) A VHDL description of a half adder

[Figure: block symbol of the half adder with inputs x, y and outputs sum, carry.]

A VHDL description has two parts : an entity part and an architecture part.

The entity part describes the structure at the highest level (as a single component), that is, the system interface.

    entity half_adder is
        port (x, y : in bit; sum, carry : out bit);
    end half_adder;

The architecture part specifies behavior and/or internal structure.

    architecture behavior of half_adder is
    begin
        sum <= x xor y;
        carry <= x and y;
    end behavior;

('xor' and 'and' are predefined operators)

The carry can also be described conditionally (inside a process):

    if x = '1' and y = '1' then carry <= '1'; else carry <= '0'; end if;


VHDL can also convey timing :

    carry <= x and y after 5 ns;

A structural description of the same half adder :

    architecture structure of half_adder is
        component xor_circuit port (a, b : in bit; c : out bit); end component;
        component nand_gate port (d, e : in bit; f : out bit); end component;
        signal alpha : bit;
    begin
        XOR1 : xor_circuit port map (a => x, b => y, c => sum);
        NAND1 : nand_gate port map (d => x, e => y, f => alpha);
        NAND2 : nand_gate port map (d => alpha, e => alpha, f => carry);
    end structure;
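The equivalence of the behavioral and structural descriptions can be checked by exhaustive simulation. The following is a minimal Python sketch, not VHDL; the function names are chosen here for illustration and only mirror the component names used in the slides.

```python
from itertools import product

def nand(d: int, e: int) -> int:
    """Two-input NAND gate on bits 0/1 (models nand_gate)."""
    return 1 - (d & e)

def half_adder_behavior(x: int, y: int) -> tuple:
    """Behavioral view: sum = x xor y, carry = x and y."""
    return x ^ y, x & y

def half_adder_structure(x: int, y: int) -> tuple:
    """Structural view: one XOR circuit and two NAND gates."""
    s = x ^ y                    # XOR1
    alpha = nand(x, y)           # NAND1
    carry = nand(alpha, alpha)   # NAND2 wired as an inverter
    return s, carry

# Compare both views on all four input combinations.
for x, y in product((0, 1), repeat=2):
    print(x, y, half_adder_structure(x, y), half_adder_behavior(x, y))
```

Both views agree on every input, which is the property a VHDL simulator would confirm for the two architectures.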

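[Figure: structural diagram of the half adder. Inputs x and y feed the XOR circuit (ports a, b; output c = sum) and NAND1 (ports d, e); the NAND1 output alpha feeds both inputs of NAND2, whose output f is carry.]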

The design problem : Given a desired range of behavior and a set of available components, determine a structure formed from these components that achieves the desired behavior at the minimum cost.

The behavior of the overall system may deviate from the desired behavior. → Performance evaluation

The design of a computer can be viewed at many different levels, depending on the components chosen as primitives.

Three major design levels : processor level, register level, and gate level (logic level)

Processor level – CPU, memory, I/O (VLSI/LSI)

Register level – registers, combinational circuits, and simple sequential circuits (MSI)

Gate level (logic) – logic gate (NAND, NOR), flip-flops

: the subject of classical switching theory, the concern of the logic diagram


Top-down hierarchical design

Step 1 : specify the processor-level structure of a system.

Step 2 : specify the register-level structure of each distinct component specified in step 1.

Step 3 : specify the gate-level structure of each distinct component specified in step 2.

Gate-level design has a substantial theoretical basis (switching theory); design at the higher levels is largely an art, highly dependent on the designer.

Design problem for combinational circuits : Design a logic circuit to realize a given set of combinational functions using the minimum number of gates, subject to the constraints (1) the number of logic levels < d, (2) fan-in (number of input lines) < k, and (3) fan-out < l.

2.2 The Gate Level

Combinational function : time-invariant; the input alone determines the output.

Sequential function : a function of time; the input and the internal state determine the output.

A physical realization of a combinational function is a combinational circuit constructed from gates : AND, OR, and NOT.

A set G of gate types is said to be functionally (logically) complete if any combinational function can be realized by a circuit that contains gates from G only.

NANDs and NORs are very important in logic design because each is a complete gate type by itself and can be manufactured easily and inexpensively using current IC technologies.

The behavior of any combinational circuit can be expressed by Boolean algebra.

o Two-level circuits(d=2) : literal, SOP (sum-of-products), POS (product-of-sum), minterm, maxterm
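The functional completeness of NAND mentioned above can be illustrated by building NOT, AND, and OR from NAND gates only and checking them against their truth tables. This is a small Python sketch; the helper names are chosen here for illustration.

```python
from itertools import product

def nand(a: int, b: int) -> int:
    return 1 - (a & b)

def not_(a: int) -> int:
    return nand(a, a)                # NOT a = a NAND a

def and_(a: int, b: int) -> int:
    return not_(nand(a, b))          # AND = inverted NAND

def or_(a: int, b: int) -> int:
    return nand(not_(a), not_(b))    # De Morgan: a + b = (a' b')'

# Exhaustive comparison with Python's bit operators.
ok = all(
    not_(a) == 1 - a and and_(a, b) == (a & b) and or_(a, b) == (a | b)
    for a, b in product((0, 1), repeat=2)
)
print(ok)
```

Since {AND, OR, NOT} is complete and each of the three is realized with NANDs alone, {NAND} is complete; the same argument works for NOR by duality.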

Implicant : A product term p is called an implicant of a function f if, for any input combination x, p(x) = 1 implies that f(x) = 1.

What is the difference between an implicant and a minterm?

An implicant p is a prime implicant (PI) iff it is not covered by any other implicant of the function.

The solution of the gate minimization problem in SOP form :

Step 1 : Compute all PIs of the given function f.

Step 2 : Select a minimal set of PIs whose logical sum is f.

Methods : K-map and the Quine-McCluskey tabular method

Irreducible set vs. minimal set of PIs : both cover all minterms and must include all essential PIs.

[Figure: two K-maps of the same function, one marked with an irreducible cover and one with a minimal cover.]

Quine-McCluskey (tabular) method

1. Arrange all minterms into groups such that all terms in the same group have the same number of 1's in their binary representation.

2. Compare every term of the lowest-index group with each term in the successive group. Whenever possible, combine the two terms being compared by means of g·xi + g·xi′ = g(xi + xi′) = g. Two terms from adjacent groups are combinable if their binary representations differ in just a single digit position (this forms the 1-cubes). Check off every term that has been combined.

3. Repeat the comparison on the newly formed terms until no further combinations are possible. The remaining unchecked terms constitute the set of PIs.
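The three steps above can be sketched in a few lines of Python. This is an illustrative sketch, not an optimized implementation: cubes are strings over '0', '1', '-', and instead of grouping by the number of 1's it simply tries every pair (two cubes that differ in one bit always lie in adjacent groups, so the result is the same). The function names are chosen here for illustration.

```python
from itertools import combinations

def combine(a: str, b: str):
    """Merge two cubes that differ in exactly one specified bit, else None."""
    diff = [i for i, (p, q) in enumerate(zip(a, b)) if p != q]
    if len(diff) == 1 and '-' not in (a[diff[0]], b[diff[0]]):
        i = diff[0]
        return a[:i] + '-' + a[i + 1:]
    return None

def prime_implicants(terms, n_vars: int) -> set:
    """terms: minterm (and don't-care) numbers; returns the set of PIs."""
    cubes = {format(m, f'0{n_vars}b') for m in terms}
    primes = set()
    while cubes:
        checked, merged = set(), set()
        for a, b in combinations(sorted(cubes), 2):
            c = combine(a, b)
            if c is not None:
                merged.add(c)
                checked.update((a, b))   # step 2: check off combined terms
        primes |= cubes - checked        # step 3: unchecked terms are PIs
        cubes = merged
    return primes

pis = prime_implicants([0, 1, 2, 5, 6, 7, 8, 9, 10, 13, 15], 4)
print(sorted(pis))
```

For the function used in the example that follows, this yields six PIs: 0-10, 011-, -00-, -0-0, --01, and -1-1, written as x1 x2 x3 x4.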

Ex) f(x1,x2,x3,x4) = Σ m(0,1,2,5,6,7,8,9,10,13,15)

Minterms grouped by number of 1's :

# of 1's   minterm   x1 x2 x3 x4
0          0         0  0  0  0
1          1         0  0  0  1
           2         0  0  1  0
           8         1  0  0  0
2          5         0  1  0  1
           6         0  1  1  0
           9         1  0  0  1
           10        1  0  1  0
3          7         0  1  1  1
           13        1  1  0  1
4          15        1  1  1  1

First combination (PI = left unchecked) :

terms      x1 x2 x3 x4
(0,1)      0  0  0  -
(0,2)      0  0  -  0
(0,8)      -  0  0  0
(1,5)      0  -  0  1
(1,9)      -  0  0  1
(2,6)      0  -  1  0    PI
(2,10)     -  0  1  0
(8,9)      1  0  0  -
(8,10)     1  0  -  0
(5,7)      0  1  -  1
(5,13)     -  1  0  1
(6,7)      0  1  1  -    PI
(9,13)     1  -  0  1
(7,15)     -  1  1  1
(13,15)    1  1  -  1

Second combination :

terms          x1 x2 x3 x4
(0,1,8,9)      -  0  0  -    PI
(0,2,8,10)     -  0  -  0    PI
(1,5,9,13)     -  -  0  1    PI
(5,7,13,15)    -  1  -  1    PI

Using the prime implicant chart, we can find the essential PIs :

PI            0   1   2   5   6   7   8   9   10  13  15
(2,6)                 ×       ×
(6,7)                         ×   ×
(0,1,8,9)     ×   ×                   ×   ×
(0,2,8,10)    ×       ×               ×       ×
(1,5,9,13)        ×       ×               ×       ×
(5,7,13,15)               ×       ×               ×   ×

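The essential PIs are (0,2,8,10) and (5,7,13,15), because minterm 10 is covered only by (0,2,8,10) and minterm 15 only by (5,7,13,15). So f(x1,x2,x3,x4) = (0,2,8,10) + (5,7,13,15) + further PIs that cover the remaining minterms 1, 6, and 9.

There are 4 different choices : (2,6) + (0,1,8,9), (2,6) + (1,5,9,13), (6,7) + (0,1,8,9), or (6,7) + (1,5,9,13).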
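The selection step can be reproduced with a small brute-force search. This Python sketch is only an illustration under simple assumptions: it picks the essential PIs first and then tries all subsets of the remaining PIs in order of size, which is practical only for small charts and is not the branching method described below. The function names are chosen here for illustration.

```python
from itertools import combinations

def covers(cube: str, m: int, n_vars: int) -> bool:
    """True if the cube (string over 0, 1, -) covers minterm m."""
    bits = format(m, f'0{n_vars}b')
    return all(c in ('-', b) for c, b in zip(cube, bits))

def minimal_covers(primes, minterms, n_vars: int):
    """Return (essential PIs, list of smallest completions of the cover)."""
    chart = {p: {m for m in minterms if covers(p, m, n_vars)} for p in primes}
    essential = set()
    for m in minterms:
        owners = [p for p in primes if m in chart[p]]
        if len(owners) == 1:             # a column with a single mark
            essential.add(owners[0])
    covered = set().union(*(chart[p] for p in essential))
    rest = sorted(set(minterms) - covered)
    others = sorted(set(primes) - essential)
    for k in range(len(others) + 1):     # smallest completions first
        found = [set(c) for c in combinations(others, k)
                 if all(any(m in chart[p] for p in c) for m in rest)]
        if found:
            return essential, found
    return essential, []

primes = {'0-10', '011-', '-00-', '-0-0', '--01', '-1-1'}
essential, choices = minimal_covers(primes, [0, 1, 2, 5, 6, 7, 8, 9, 10, 13, 15], 4)
print(sorted(essential), len(choices))
```

For the example, the two essential PIs are -0-0 and -1-1, and four two-PI completions cover the remaining minterms 1, 6, and 9.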

The reduced PI chart

A PI pj dominates a PI pk iff every minterm covered by pk is also covered by pj. The dominated row pk can then be removed.

[Chart: rows pj and pk over columns m1 to m4; pj covers every column that pk covers.]

Branching method

[Chart: PIs p1 to p5 over minterms m1 to m5, with the corresponding branching tree.]

If we choose p1 first, then p3 and p5 are next.

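Quine-McCluskey method (no limit on the number of variables)

Reduced PI chart for the previous example, after the essential PIs have been selected :

PI            1   6   9
(2,6)             ×
(6,7)             ×
(0,1,8,9)     ×       ×
(1,5,9,13)    ×       ×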

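Ex) f(A,B,C,D) = Σ m(3,9,11,12,13,14,15) + d(1,4,6)

PI chart (don't-care terms are not listed as columns) :

PI              3   9   11  12  13  14  15
(1,3,9,11)      ×   ×   ×
(4,6,12,14)                 ×       ×
(9,11,13,15)        ×   ×       ×       ×
(12,13,14,15)               ×   ×   ×   ×

Reduced PI chart (after selecting the essential PI (1,3,9,11)) :

PI              12  13  14  15
(4,6,12,14)     ×       ×
(9,11,13,15)        ×       ×
(12,13,14,15)   ×   ×   ×   ×

Result : (1,3,9,11) + (12,13,14,15)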

SYEN 3330 Digital Systems Chapter 6-1

Sequential Circuits

A sequential circuit contains :

Storage Elements : latches, flip-flops, binary registers

Combinatorial Logic : implements a multiple-output switching function

Inputs, labeled Inputs, are signals from the outside.

Outputs, labeled Outputs, are signals to the outside.

Other inputs, labeled State or Present State, are signals from memory elements.

The remaining outputs, labeled Next State, are inputs to memory elements.

[Figure: block diagram. Combinational Logic receives Inputs and State and produces Outputs and Next State; Storage Elements take Next State and return State.]


Sequential Circuits

Combinatorial Logic :

Next state function : Next State = f(Inputs, State)

Output function (Mealy) : Outputs = g(Inputs, State)

Alternate output function (Moore) : Outputs = h(State)

The type of output function heavily influences the design.


Sequential Logic Design Process

1. Word description

2. State table or State diagram

3. Reduced state table (not covered)

4. Code Assignment (State, Input, Output)

5. Choose flip-flop types

6. Derive output function(s)

7. Derive excitation functions

8. Draw the logic diagram

Truth Table for Combinational Circuit

Input                  Output
y(t)  x1(t)  x2(t)     y(t+1)  J(t)  K(t)  z(t)
0     0      0         0       0     d     0
0     0      1         0       0     d     1
0     1      0         0       0     d     1
0     1      1         1       1     d     0
1     0      0         0       d     1     1
1     0      1         1       d     0     0
1     1      0         1       d     0     0
1     1      1         1       d     0     1

K-maps (rows : y; columns : x1x2)

J(t)    00  01  11  10
y = 0           1
y = 1   d   d   d   d

z(t)    00  01  11  10
y = 0       1       1
y = 1   1       1

K(t)    00  01  11  10
y = 0   d   d   d   d
y = 1   1

J(t) = x1x2

K(t) = x1'x2'

Z(t) = x1 ⊕ x2 ⊕ y = x1'x2'y + x1'x2y' + x1x2'y' + x1x2y
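These excitation and output equations can be exercised with a short simulation. The Python sketch below assumes operands are fed least significant bit first and uses the standard JK characteristic equation y(t+1) = J·y' + K'·y; the helper names are chosen here for illustration.

```python
def jk_next(y: int, j: int, k: int) -> int:
    """JK flip-flop characteristic equation: y+ = J y' + K' y."""
    return (j & (1 - y)) | ((1 - k) & y)

def serial_add(a_bits, b_bits):
    """Add two equal-length bit lists, LSB first; returns (sum bits, final carry)."""
    y = 0                                # state S0: no carry
    out = []
    for x1, x2 in zip(a_bits, b_bits):
        out.append(x1 ^ x2 ^ y)          # Z = x1 xor x2 xor y
        j = x1 & x2                      # J = x1 x2
        k = (1 - x1) & (1 - x2)          # K = x1' x2'
        y = jk_next(y, j, k)
    return out, y

def to_bits(n: int, width: int):
    return [(n >> i) & 1 for i in range(width)]

def from_bits(bits) -> int:
    return sum(b << i for i, b in enumerate(bits))

z, carry = serial_add(to_bits(11, 4), to_bits(6, 4))
print(from_bits(z), carry)
```

With 11 + 6 = 17 on 4-bit operands, the sum bits encode 1 and the flip-flop holds the final carry 1.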

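Example : Design of a serial binary adder

[Figure: serial adder block with inputs x1 and x2, which enter serially (bit by bit), output z, and carry c.]

State table (entries : next state / output z)

                   Input x1x2
State              00     01     10     11
S0 (no carry)      S0/0   S0/1   S0/1   S1/0
S1 (carry)         S0/1   S1/0   S1/0   S1/1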

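[Figure: logic diagram of the serial adder with inputs x1 and x2, a JK flip-flop with outputs y and y', and output z.]

Reading Assignment

[Figure: timing diagram showing x1, x2, J, K, y, y', Z, and clk.]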

Example: Design of a 4-bit-stream serial adder

SUM(i) = x1(i) plus x2(i) plus x3(i) plus x4(i) plus C(i – 1)

2.3 The Register Level

The next highest level after the gate level. Related information bits are grouped into ordered sets such as words or vectors (a register = a storage device for words).

The major component types at the register level

Type            Component                    Function
Combinational   Word gates                   Boolean operations
                Multiplexers                 Data routing, general function generation
                Decoders and encoders        Code checking and conversion
                Programmable logic devices   General function generation
                Arithmetic elements          Numerical and logical operations
Sequential      (Parallel) registers         Information storage
                Shift registers              Information storage, serial-parallel conversion
                Counters                     Control/timing signal generation

[Figure: two register-level components connected by a bus.]

Gate level : the unit of information is the bit. Register level : the unit of information is the word.

Multiplexer

To implement an n-variable function, use a multiplexer with n-1 select inputs and 2^(n-1) data inputs.

The register-level components are linked together by buses. A circuit symbol of a register-level component

Example : f(x1, x2, x3) = Σ m(1,4,5,6)

x1  x2  x3    f
0   0   0     0
0   0   1     1
0   1   0     0
0   1   1     0
1   0   0     1
1   0   1     1
1   1   0     1
1   1   1     0

f(0, 0, x3) = x3

f(0, 1, x3) = 0

f(1, 0, x3) = 1

f(1, 1, x3) = x3'

[Figure: 4-input multiplexer with select inputs S0, S1 driven by x1 and x2 and data inputs x3, 0, 1, x3'; output Z = f(x1, x2, x3).]
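The multiplexer realization of f can be checked by simulation. In this Python sketch the select lines are driven by x1 and x2 and the four data inputs are the residual functions x3, 0, 1, x3' listed above; `mux4` and `f_mux` are names chosen here for illustration.

```python
def mux4(data, s1: int, s0: int) -> int:
    """4-input multiplexer: returns the data input selected by (s1, s0)."""
    return data[2 * s1 + s0]

def f_mux(x1: int, x2: int, x3: int) -> int:
    # Data inputs are f(0,0,x3), f(0,1,x3), f(1,0,x3), f(1,1,x3).
    data = [x3, 0, 1, 1 - x3]
    return mux4(data, x1, x2)

# Collect the minterms for which the multiplexer outputs 1.
minterms = [m for m in range(8)
            if f_mux((m >> 2) & 1, (m >> 1) & 1, m & 1)]
print(minterms)  # [1, 4, 5, 6]
```

A 3-variable function thus needs only a 4-input multiplexer (2 select lines) plus one inverter for x3'.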

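[Figure: circuit symbol of a register-level component (multifunction unit). Data input lines x1 to x5 and data output lines Z1 and Z2 carry bits of information in parallel; the control input lines are a k-bit Select and an Enable; the control output lines signal Normal End and Abnormal End.]

Data lines and control lines are kept separate.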

Word-based Boolean algebra

The Boolean algebra of binary variables extends in a straightforward way to a $2^m$-valued Boolean algebra whose elements are word-based combinational functions that perform the mapping :

$Z : (B^m)^n \to B^m$ (cf. $z : B^n \to B$, where $B = \{0,1\}$)

Let $z(x_0, x_1, \ldots, x_{n-1})$ be any two-valued Boolean function.

Let $X_0, X_1, \ldots, X_{n-1}$ denote m-bit binary words,

$X_i = (x_{i,0}, x_{i,1}, \ldots, x_{i,m-1})$ for $i = 0, 1, \ldots, n-1$

Define $Z : (B^m)^n \to B^m$ as follows:

$Z(X_0, X_1, \ldots, X_{n-1}) = [\,z(x_{0,0}, x_{1,0}, \ldots, x_{n-1,0}),\ z(x_{0,1}, x_{1,1}, \ldots, x_{n-1,1}),\ \ldots,\ z(x_{0,m-1}, x_{1,m-1}, \ldots, x_{n-1,m-1})\,]$

with this definition, we can extend the usual gate operations, AND, OR, NOT etc., to word-level gate operations. The set of all 22mn combinational function of

up to n m-bit words is a Boolean algebra with respect to the m-bit word-gate operations {AND, OR, NOT}.
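The bit-by-bit extension of a gate to a word gate can be sketched as follows. This Python example is illustrative: words are stored as integers, bit j of the integer is taken as component j of the word, and `word_gate` is a name chosen here, not one used in the slides.

```python
def word_gate(z, m: int):
    """Lift a bit-level function z(x0, ..., xn-1) to m-bit words."""
    def Z(*words: int) -> int:
        result = 0
        for j in range(m):
            bits = [(w >> j) & 1 for w in words]   # bit j of every operand
            result |= (z(*bits) & 1) << j          # apply z position by position
        return result
    return Z

nand_bit = lambda a, b: 1 - (a & b)
NAND4 = word_gate(nand_bit, 4)                     # 4-bit NAND word gate
AND4 = word_gate(lambda a, b: a & b, 4)

print(format(NAND4(0b1100, 0b1010), '04b'))  # 0111
```

Each output bit depends only on the operand bits in the same position, which is exactly the definition of Z above.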

A word-based Boolean algebra is useful only in analyzing certain aspects of register-level design, but it does not provide an adequate design theory.

1. The operations performed by some of the basic components are numerical rather than logical, so they cannot be incorporated into a Boolean algebra.

2. Many of the logical operations associated with the basic components are complex and do not have the properties of the gate functions (associativity, commutativity) that simplify gate-level design.

3. The lack of a uniform word size for all signals makes it difficult to define a useful algebra to describe operations on these signals.

Register-level hardware design remains more an art than a science; it relies on heuristic and intuitive methods.

Register-level hardware design is analogous to programming in an assembly language.

2.2 Register-Level Components

Word gates are universal design components (logically complete). In practice, their usefulness is limited because their operations are relatively simple (low-level) and because word size varies.

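Word gate

[Figure: word gate symbols and their gate-level realizations. An m-bit word gate with inputs X1, X2 and output Z = (z0, ..., zm-1) is built from m gates, one per bit position, with zi formed from xi,1 and xi,2.]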

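MUX

k-input, m-bit multiplexer (k = 2^p)

[Figure: multiplexer symbol with m-bit data inputs X0, X1, ..., X(2^p - 1), a p-bit select input, an enable input e, and an m-bit output Z.]

$$z_j = \left(x_{0,j}\,a_0 + x_{1,j}\,a_1 + \cdots + x_{2^p-1,j}\,a_{2^p-1}\right) e, \qquad j = 0, 1, \ldots, m-1$$

$$Z = \left(X_0\,a_0 + X_1\,a_1 + \cdots + X_{2^p-1}\,a_{2^p-1}\right) e$$

where a_i = 1 when the select input encodes i, and a_i = 0 otherwise.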

Decoder : A 1-out-of-2^n or 1/2^n decoder is a combinational circuit with n input data lines and 2^n output data lines such that each of the 2^n possible input combinations Xi activates exactly one of the 2^n output lines.

Primary application : addressing

Encoder : 2^k input data lines and k output address lines

[Figure: 8-input encoder with inputs x0 to x7 and outputs z0, z1, z2.]

A magnitude comparator

Register

Register : an m-bit register is an ordered set of m flip-flops (F/Fs) used to store an m-bit word.

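[Figure: a 4-bit register shown as a block symbol and as a realization with four D flip-flops sharing clock and clear lines, with data inputs x0 to x3 and outputs z0 to z3.]

Shift register

[Figure: shift register with data X = (x0, x1, x2, x3) and a clock input.]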

4-bit left-right shift register

Left (right) shifting an unsigned binary number is equivalent to multiplication (division) by 2.

Applications of registers

① data storage (most of the time)

② serial-parallel and parallel-serial conversion

③ arithmetic operations

[Figure: 4-bit left-right shift register with left-shift and right-shift serial inputs and outputs, parallel data input and output, and control lines for parallel load enable, left-shift enable, right-shift enable, and clear.]

Counter : a simple sequential machine designed to cycle through a predetermined sequence of k distinct states s0, … , sk-1 in response to clock pulses.

Applications of counters : ① a program counter

② to generate timing signals

Buses : A bus is a set of wires designed to transfer all bits of a w-bit word from a specified source to a specified destination (unidirectional vs. bidirectional; dedicated vs. shared).

Dedicated : connecting n units in all possible ways requires n(n-1) buses. Shared : connects one of several sources to one of several destinations.

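[Figure: modulo-16 counter driven by clock pulses, with outputs K0 to K3.]

[Figure: four units ① to ④ connected by dedicated buses, and the same four units connected by a shared bus.]

Shared : control is much more complicated.

Most of the time, shared buses are used.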

Programmable Logic Device

IC Cost Dilemma:
• IC circuit density increases exponentially with time.
• The more ICs you make, the cheaper they get.
• Complex logic ICs have very specific functions (so you make fewer).

Question : How do I make very high volume parts that are very dense?

Answer #1 : You make microprocessors or ICs (like automobile ignition controllers) that have very large volume, OR

Answer #2 : You make programmable parts.

Programming Devices

Devices may be:
1. Permanently programmed at the time of IC manufacture,
2. Programmed at the time of use (board-level manufacturing), or
3. Dynamically re-programmed during use.

Permanent programming techniques done at the time of manufacture include final-level interconnect addition via metalization, or device alteration through laser or e-beam programming.

Use-time programming techniques include shorting diodes, blowing fuses, shorting devices, and dumping charge into wells.

Dynamically reprogrammed devices can be bulk erased and reprogrammed, or incrementally erased and reprogrammed.

Read Only Memory

Read Only Memories (ROM) or Programmable Read Only Memories (PROM) have:
1. N input lines,
2. M output lines, and
3. 2^N decoded minterms.

The N input lines are connected to a fixed decoder AND array of 2^N lines. Each line represents a minterm of N variables. Thus there are 2^N decoded minterms.

Each of the M output lines is connected to an OR gate which has a programmable number of input connections. Any (or all) of the minterms may be ORed together for each of the M output lines.

A program map for a PROM (or ROM) looks like a multiple-output function table.

Read Only Memories (Continued)

Example: An 8 X 4 ROM (N = 3 input lines, M = 4 output lines)

The fixed "AND" array is a decoder with 3 input bits to one-of-8 minterm output lines.

The programmable "OR" array is shown as a "Wire-OR" function. A "Dot" in the array corresponds to including that minterm.

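[Figure: 8 X 4 ROM. Inputs A, B, C drive the decoder address lines A2, A1, A0; the decoder outputs D0 to D7 feed the programmable OR array, which produces F0 to F3.]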

Read Only Memories (Continued)

The 8 X 4 ROM example corresponds to the multiple-output truth table:

Input        Output
A  B  C      F0 F1 F2 F3
0  0  0      1  1  0  1
0  0  1      0  0  0  0
0  1  0      1  0  0  1
0  1  1      0  0  1  0
1  0  0      0  0  0  0
1  0  1      1  0  1  0
1  1  0      0  0  0  1
1  1  1      0  1  0  0
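A ROM behaves like a lookup table indexed by the decoded address. The Python sketch below stores the eight output words from the table above and recovers, for each output, the minterms that are ORed together; the names `ROM`, `read`, and `minterms_of` are chosen here for illustration.

```python
# One 4-bit word (F0, F1, F2, F3) per address ABC = 000 ... 111.
ROM = [
    (1, 1, 0, 1), (0, 0, 0, 0), (1, 0, 0, 1), (0, 0, 1, 0),
    (0, 0, 0, 0), (1, 0, 1, 0), (0, 0, 0, 1), (0, 1, 0, 0),
]

def read(a: int, b: int, c: int) -> tuple:
    """Decode the address (fixed AND array) and return the stored word."""
    return ROM[4 * a + 2 * b + c]

def minterms_of(output: int) -> list:
    """Minterms connected to the OR gate of the given output column."""
    return [addr for addr, word in enumerate(ROM) if word[output]]

print(read(1, 0, 1))                         # (1, 0, 1, 0)
print([minterms_of(k) for k in range(4)])    # [[0, 2, 5], [0, 7], [3, 5], [0, 2, 6]]
```

Each list corresponds to the dots in one column of the OR array, for example F0 = Σ m(0,2,5).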

The "internal organization" of the memory array often does not match the "logical organization".

For example, one manufacturer sells an N=13 (inputs decoded to 8192 minterms) by M=8 (outputs) PROM that is internally organized as a bit array of 128 rows by 512 columns. An output MUX selects a group of 8 bits.

Programmable Array Logic (PAL)

PAL devices are closely related to the ROM in that the device is organized as a regular array of programmable elements.

The PAL has a programmable set of AND terms, combined with a limited number of fixed OR terms.

Where the ROM array is guaranteed to implement any function of N inputs, the PAL may run out of OR terms. Thus, it may be very important to minimize the number of OR terms in order to use a PAL.

Another difference is that a ROM does not easily allow multi-level implementations. The designer must use separate ROMs for multiple levels. The PAL allows outputs from OR terms to be used as inputs to AND terms, making multi-level design easy.

Programmable Array Logic (Cont.)

Example: 4-input, 3-output PAL with fixed, 3-input OR terms and programmable-polarity outputs.

This device is unprogrammed.

[Figure: unprogrammed PAL with inputs In1 to In4 and outputs Out1 to Out3.]

Programmable Array Logic (Cont.)

An "X" at a cross line includes that variable in the AND term. An "X" in an AND gate removes that term. An "X" at the EXOR forms a "TRUE" term; otherwise the output is complemented. Thus we have:

Out1 = (In1In2'In3 + In1')'

Note that Out1 leaves the PAL in COMPLEMENT form, since the EXOR is tied to one. Out2 is shown in "TRUE" form.

[Figure: programmed PAL with inputs In1 to In4 and outputs Out1 to Out3.]

Programmable Array Logic (Cont.)

What are the equations for the other terms?

[Figure: the programmed PAL with inputs In1 to In4 and outputs Out1 to Out3.]

Programmable Logic Array (PLA)

The last type of programmable logic element we will discuss is the PLA, which has a programmable array of AND and OR terms.

A PLA typically has a large number of inputs and outputs and can be used to implement equations that are impractical for a ROM (because of the number of inputs required).

Generally the product terms limit the application of a PLA. Use minimization techniques to reduce the number of product terms in an implementation if it is to fit in a PLA.

The program for a PLA is very similar to the connection array for a multiple-output Boolean function, such as that generated by CAFE.

Field Programmable Gate Array (FPGA)

o Two-dimensional array of general-purpose logic blocks (cells) whose functions are programmable; the cells are linked to one another by programmable buses.

o The pattern of the data in the configuration memory (CM) determines the cells' functions and their interconnection wiring; replacing the contents of the CM makes design changes.

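Basic FPGA cell

[Figure: basic FPGA cell built from 4-input multiplexers with select inputs s0 and s1, data inputs x0 to x7, and output z, together with an example configuration that uses the signals a, b, c, d.]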

Check output by truth table

Main complexity

– FPGA can be reprogrammed repeatedly.


FPGAs are very well suited to CAD design and manufacture. The process of mapping a new design into one or more FPGA chips can be almost entirely automated.

1. Compiling from VHDL specification to logic models.

2. CAD tools from logic elements to cells.

3. Transfer to FPGA chips via program units.

Register-Level Design

: The behavior of a register-level machine is defined by a finite set of operations to be performed on words.

Program control unit : microprogrammed controller or hardwired controller

Data processing unit

Design Techniques

: given a set of algorithms or instructions, design a circuit using a specified set of register-level components while satisfying certain cost and performance criteria

A heuristic approach is used because appropriate mathematical tools are lacking.

Step1: Define the desired behavior by a set of sequences of register-transfer operations, such that each operation can be implemented directly using the available design components. This constitutes an algorithm AL to be executed.

Step2: Analyze AL to determine the types of components and the number of each type required for the data path DP.

C : A + B

C : A ← A + B, C ← C + D;

C(t0) : A ← A + B

C(t0+1) : C ← C + D

Step3: Construct a block diagram for DP using the components identified in step 2. Make the connections between the components so that all data paths implied by AL are present and the given performance-cost constraints are met.

Step4: Analyze AL and DP to identify the control signals needed. Introduce into DP the logic or control points necessary to apply these signals.

Step5: Design the control unit CU for DP that meets all the requirements of AL.

Step6: Verify, typically by computer simulation, that the final design operates correctly and meets all performance-cost goals.

ex) Design of a fixed-point binary multiplier → multiplying two 8-bit numbers in sign-magnitude form

X = x0 x1 ··· x7

Y = y0 y1 ··· y7

P = X · Y

Sign : p0 = x0 XOR y0

Magnitude : p1 ··· p14 = x1 ··· x7 * y1 ··· y7
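The sign-magnitude product can be sketched in Python as below. The shift-and-add loop is only one common way to form the magnitude and is an assumption here, since the slides do not show the multiplication algorithm; the packing of the words (sign bit x0 as the most significant bit of an 8-bit integer, result as a 15-bit integer p0 ··· p14) is likewise chosen for illustration.

```python
def multiply_sign_magnitude(x: int, y: int) -> int:
    """x, y: 8-bit sign-magnitude words (bit 7 = sign, bits 6..0 = magnitude).
    Returns a 15-bit word: sign p0 in bit 14, magnitude p1..p14 in bits 13..0."""
    sign = ((x >> 7) ^ (y >> 7)) & 1        # p0 = x0 XOR y0
    multiplicand = x & 0x7F
    multiplier = y & 0x7F
    acc = 0
    for i in range(7):                       # one step per multiplier bit
        if (multiplier >> i) & 1:
            acc += multiplicand << i         # add the shifted multiplicand
    return (sign << 14) | acc

def encode(n: int) -> int:
    """Encode an integer in [-127, 127] as an 8-bit sign-magnitude word."""
    return (0x80 if n < 0 else 0) | abs(n)

p = multiply_sign_magnitude(encode(-5), encode(3))
print(p >> 14, p & 0x3FFF)  # 1 15
```

The largest magnitude, 127 × 127 = 16129, fits in the 14 magnitude bits p1 ··· p14.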

Processor-Level Design

: It is concerned with the storage and processing of blocks of information such as programs and data files.

Components : VLSI

CPU, memories, I/O devices, interconnection networks

Typical questions of interest are

1. What is the time required to execute a given set of programs?

2. How much storage space is needed for a given set of programs or data?

3. To what extent are the various components of the system utilized?

No simple characterization of programs

→ no easy answer

Use average programs (benchmark programs)

→ probabilistic or statistical analysis

Performance evaluation

: The goal is to determine the function Y(x1, x2, ··· , xn), where each xi is a design parameter and Y is a performance measure such as processing time or resource utilization.

desirable: an analytic model (an algebraic expression in the xi)

→ difficult because the components can interact in complex ways, so an analytic model is usually not available.

Queueing Theory : A simple queueing model for a single queue and a single server

Limited resources (CPU, memory, I/O) must be shared.

[Figure: items requiring service enter a queue, are handled by the server (the shared resource), and leave as serviced items.]

Another approach : to construct a physical prototype model of the target system, run it under representative working conditions and monitor its performance.

Processor-Level components: CPU, memory, I/O devices, interconnection networks

· CPU

Important points in CPU design:
1. The types of instructions forming the CPU's instruction set and their execution times.
2. The register-level organization of the data processing unit.
3. The register-level organization of the program control unit.
4. The manner in which the CPU communicates with external devices.

VLSI → inexpensive single-chip CPU → inexpensive system on a single chip → changes in CPU architecture for many applications.

· Memory (affected seriously by VLSI) : varies greatly in cost and performance

1. Main memory : consists of relatively fast storage devices connected directly to and controlled by the CPU.
2. Secondary memory : consists of slower and less expensive devices that communicate indirectly with the CPU via main memory.
3. A cache is positioned between the CPU and main memory to reduce the average time to access the memory system. A cache is mostly integrated on the same IC chip as the CPU.

· I/O devices : the means by which a computer communicates with the outside world → data transfers do not change the information content or meaning of the data. An I/O device is slower than main memory. I/O devices can be controlled by the CPU or by an IOP.

· Interconnection network : establishes dynamic communication paths between components via buses → shared

How to solve contention → select one of the requesting devices based on some given priority and connect it to the desired bus.

The communication between processor-level components is generally asynchronous. The causes of asynchronous communication :
1. A high degree of independence exists among components; for example, CPUs and IOPs execute different types of programs and interact at unpredictable times.
2. Component operating speeds vary over a wide range.
3. The physical distance separating the components may be too large to permit synchronous transmission.

Asynchronous communication: by handshaking

[Figure: handshaking sequence between devices A and B: ready signal, recognize, transmission, acknowledge.]

The speed of data transfer is independent of the operating speeds of the two devices.

With a standard interface: easy modular expansion

Bus control is done by a processor such as the CPU or an IOP.

An IOP acts as a buffer between slow I/O devices and fast main memory.

In large systems, special processors are used to supervise data transfers over shared buses.

Computer network: most difficult communication problem

· Processor-level design technique

Processor-level design is less amenable to formal analysis than register-level design, because the desired behavior is difficult to describe precisely.

→ The usual approach is to take a prototype design of known performance and then modify it.

Performance Specification

1. Should be capable of executing a instructions of type b per second.

2. Should be able to support c I/O devices of type d.

3. Should be hardware/software compatible with computers of type e.

4. The total cost should not exceed f.

Lack of understanding of the relation between the structure of a computer and its performance

→ impossible to predict the performance accurately

Design Process

1. Select a prototype design and adapt it to satisfy the given performance constraints.

2. Determine the performance of the proposed system (by simulation or benchmark performance).

If unsatisfactory, modify the design and repeat until an acceptable design is obtained.

→ This contributes to the relatively slow evolution of computer architecture.

a single-processor single computer

a multiprocessor single computer

a multiprocessor multicomputer

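[Figure: a single-processor single computer. A CPU, main memory, and IOP1 to IOPk are connected through a switching network S; the I/O devices D1 to Dp attach to the IOPs.]

[Figure: a multiprocessor single computer. CPU1 to CPUn, memory modules M1 to Mn, and IOP1 to IOPk are connected through a switching network S1, with I/O devices D1 to Dp.]

[Figure: a multiprocessor multicomputer. Several computers, each with its own CPU, memory, IOP, and I/O devices, are linked through their switching networks.]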

Simple performance measure

: main memory bandwidth and CPU instruction execution speed

main memory bandwidth: the maximum rate, in bits per second, at which instructions and data can be fetched from memory

· CPU speed: the execution time of a CPU varies from one instruction to another.

Use the execution time of a common instruction, e.g. fixed-point addition may be chosen as representative.

→ better: take an average t_e of all CPU instruction execution times weighted by their frequency:

$$t_e = \sum_{i=1}^{n} p_i t_i$$

where p_i is the probability of occurrence of a type I_i instruction and t_i is the execution time of I_i.

1/t_e : CPU instruction execution rate

limitation: this is not the performance of the system as a whole (I/O operation is ignored)

use a benchmark: a set of actual representative programs in a particular environment

Queueing model

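Parameters: average arrival rate (λ) and average service rate (μ)

The actual arrival and service rates are described by probability distribution functions.

[Figure: items requiring service enter a queue, are handled by the server (the shared resource), and leave as serviced items.]

M / M / 1 model : the first M denotes a Markov arrival process, the second M a Markov (exponentially distributed) service process, and 1 is the number of servers.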

The probability of exactly n items arriving in a time period of length t follows the Poisson probability distribution:

$$P_P(n, t) = \frac{(\lambda t)^n}{n!}\, e^{-\lambda t}$$

Interarrival time distribution P_I(t)

→ The probability that at least one item arrives during time t:

$$P_I(t) = 1 - P_P(0, t) = 1 - e^{-\lambda t}$$

P_S(t) = the probability that the service required by an item is completed in time t or less after its removal from the queue:

$$P_S(t) = 1 - e^{-\mu t}$$

The performance measurement

1. mean queue length (l_Q)

: the average number of items in the system, including the items waiting for service and those actually being served.

2. mean waiting time (t_Q)

: the average time an item spends in the system, both waiting for service and being served.

P_Q(n, t): the probability that at time t there are exactly n items in the queueing system, either waiting for service or being served.

In a state of equilibrium, P_Q(n, t) = P_Q(n):

$$P_Q(n) = \left(1 - \frac{\lambda}{\mu}\right)\left(\frac{\lambda}{\mu}\right)^{n} \quad \text{if } \lambda < \mu$$

If λ > μ, then the queue grows indefinitely.

$$\rho = \frac{\lambda}{\mu}$$

ρ : traffic intensity, mean utilization of the server

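$$P_Q(n) = \rho^{n} (1 - \rho)$$

$$\begin{aligned}
l_Q &= \sum_{n=0}^{\infty} n\, P_Q(n) = \sum_{n=0}^{\infty} n\,(1-\rho)\,\rho^{n} \\
    &= (1-\rho)\,\rho \sum_{n=1}^{\infty} n\,\rho^{\,n-1} \\
    &= (1-\rho)\,\rho\, \frac{d}{d\rho} \sum_{n=0}^{\infty} \rho^{n} \\
    &= (1-\rho)\,\rho\, \frac{d}{d\rho}\left(\frac{1}{1-\rho}\right) \\
    &= (1-\rho)\,\rho\, \frac{1}{(1-\rho)^{2}} = \frac{\rho}{1-\rho}
\end{aligned}$$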

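$$t_Q = \frac{l_Q}{\lambda} = \frac{1}{\lambda}\cdot\frac{\rho}{1-\rho} = \frac{1}{\mu - \lambda}$$

$$l_W = l_Q - \rho = \frac{\rho}{1-\rho} - \rho = \frac{\rho^{2}}{1-\rho} = \frac{\lambda^{2}}{\mu(\mu - \lambda)}$$

$$t_W = t_Q - \frac{1}{\mu} = \frac{1}{\mu - \lambda} - \frac{1}{\mu} = \frac{\lambda}{\mu(\mu - \lambda)}$$

l_W : mean number of items waiting in the queue, excluding those being serviced.

t_W : mean time spent waiting in the queue, excluding service time.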

Example: New jobs arrive at a computer at an average of 10 per min. The computer is idle 25% of time. What is the average time T that each job spends in the computer? What is the average number of jobs (N) in main memory waiting for execution?

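$$\lambda = 10, \qquad \rho = 1 - 0.25 = 0.75, \qquad \mu = \frac{\lambda}{\rho} = \frac{10}{0.75} = \frac{40}{3}$$

$$T = t_Q = \frac{1}{\mu - \lambda} = \frac{1}{40/3 - 10} = \frac{3}{10} = 0.3 \text{ min}$$

$$N = l_W = \frac{\rho^{2}}{1-\rho} = \frac{0.75^{2}}{0.25} = 2.25$$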
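The same numbers can be reproduced with a few lines of Python. This sketch evaluates the M/M/1 formulas for l_Q, t_Q, l_W, and t_W; the function name `mm1` and the dictionary keys are chosen here for illustration.

```python
def mm1(lam: float, mu: float) -> dict:
    """Steady-state M/M/1 measures for arrival rate lam and service rate mu."""
    if lam >= mu:
        raise ValueError("the queue grows indefinitely unless lam < mu")
    rho = lam / mu                            # utilization of the server
    return {
        "rho": rho,
        "l_Q": rho / (1 - rho),               # mean number in the system
        "t_Q": 1 / (mu - lam),                # mean time in the system
        "l_W": rho ** 2 / (1 - rho),          # mean number waiting
        "t_W": lam / (mu * (mu - lam)),       # mean time waiting
    }

lam = 10.0              # jobs per minute
rho = 1 - 0.25          # the computer is idle 25% of the time
m = mm1(lam, lam / rho)
print(round(m["t_Q"], 3), round(m["l_W"], 3))  # 0.3 2.25
```

Each job spends on average 0.3 min in the computer, and on average 2.25 jobs wait in main memory for execution.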
