2. Design Methodology
A computer is viewed as a system consisting of components and their interconnections.
Two central properties of a system are its structure and its behavior.
o Structure : an abstract graph (the block diagram) showing the components and their interconnections, with no functional information.
o Behavior : for a given input a, the corresponding output f(a). → the function f is the behavior
Example : the XOR function, given as a schematic diagram (structure) and as a truth table (behavior)
[Figure: schematic diagram of the XOR function x1 ⊕ x2, built from NOT, AND, and OR gates]
Truth table
(behavior)
Hardware description language (HDL) : a language for describing a system's structure and behavior.
VHDL (VHSIC Hardware Description Language; VHSIC = Very High Speed Integrated Circuit) is an IEEE standard.
o Provides precise, technology-independent descriptions of digital circuits at the gate level and the register level.
o Processed by computers together with CAD programs. → A VHDL description of a processor P can be used to simulate the behavior of P and check its correctness before all the details of its design have been specified.
Truth table of x1 ⊕ x2 (behavior):
Input     Output
x1 x2     x1 ⊕ x2
0  0      0
0  1      1
1  0      1
1  1      0
Example) A VHDL description of a half adder
[Figure: block symbol of a half adder with inputs x, y and outputs sum, carry]
A VHDL description has two parts : an entity part and an architecture part.
The entity part describes the structure at the highest level (the system as a single component), i.e. the system interface.

entity half_adder is
    port (x, y : in bit; sum, carry : out bit);
end half_adder;

The architecture part specifies behavior and/or internal structure.

architecture behavior of half_adder is
begin
    sum   <= x xor y;
    carry <= x and y;
end behavior;

('xor' and 'and' are predefined functions. The identifier is written half_adder because a hyphen is not allowed in a VHDL name.)
The carry assignment can equivalently be written as a conditional:
    carry <= '1' when (x = '1' and y = '1') else '0';
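VHDL can also convey timing:
    carry <= x and y after 5 ns;

A structural description of the same half adder:

architecture structure of half_adder is
    component xor_circuit
        port (a, b : in bit; c : out bit);
    end component;
    component nand_gate
        port (d, e : in bit; f : out bit);
    end component;
    signal alpha : bit;
begin
    -- the label is XOR1 because 'xor' is a reserved word and cannot be used as a label
    XOR1  : xor_circuit port map (a => x, b => y, c => sum);
    NAND1 : nand_gate port map (d => x, e => y, f => alpha);
    NAND2 : nand_gate port map (d => alpha, e => alpha, f => carry);
end structure;

[Figure: half adder built from one XOR circuit (a, b → c) driving sum and two NAND gates (d, e → f); NAND1 produces alpha from x and y, NAND2 produces carry from alpha]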
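As a rough cross-check of the two half-adder descriptions, the Python sketch below models the behavioral equations (sum = x xor y, carry = x and y) and the structural netlist (one XOR circuit and two NAND gates joined by the internal signal alpha), then compares them over all inputs. The function names are illustrative choices and are not part of the VHDL text.

```python
from itertools import product


def nand(d: int, e: int) -> int:
    """Two-input NAND gate on single bits."""
    return 1 - (d & e)


def half_adder_behavior(x: int, y: int) -> tuple:
    """Behavioral model: returns (sum, carry)."""
    return x ^ y, x & y


def half_adder_structure(x: int, y: int) -> tuple:
    """Structural model: XOR circuit plus NAND1 and NAND2."""
    s = x ^ y                    # xor_circuit
    alpha = nand(x, y)           # NAND1
    carry = nand(alpha, alpha)   # NAND2 acts as an inverter
    return s, carry


def equivalent() -> bool:
    """True if both models agree on every input combination."""
    return all(
        half_adder_behavior(x, y) == half_adder_structure(x, y)
        for x, y in product((0, 1), repeat=2)
    )
```

Exhaustive comparison is practical here only because the circuit has two inputs; it is meant as an illustration of "same behavior, different structure", not as a general verification method.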
The design problem : Given a desired range of behavior and a set of available components, determine a structure formed from these components that achieves the desired behavior at minimum cost.
The behavior of the resulting system may deviate from the desired behavior. → Performance evaluation
The design of a computer can be viewed at many different levels, depending on the components used.
Three major design levels : processor level, register level, and gate level (logic level)
Processor level – CPU, memory, I/O (VLSI/LSI)
Register level – registers, combinational circuits, and simple sequential circuits (MSI)
Gate level (logic) – logic gates (NAND, NOR), flip-flops
: the subject of classical switching theory, the concern of the logic diagram
Top-down hierarchical design
Step 1 : specify the processor-level structure of a system.
Step 2 : specify the register-level structure of each distinct component specified in step 1.
Step 3 : specify the gate-level structure of each distinct component specified in step 2.
Gate-level design has a substantial theoretical basis (switching theory); design at the higher levels is largely an art, highly dependent on the designer.
Design problem for combinational circuits : Design a logic circuit to realize a given set of combinational functions using the minimum number of gates, under the constraints (1) # of logic levels < d, (2) fan-in (# of input lines) < k, and (3) fan-out < l.
2.2 The Gate Level
Combinational function : time-invariant; the input alone determines the output.
Sequential function : a function of time; the input and the internal state determine the output.
A physical realization of a combinational function is a combinational circuit constructed from gates : AND, OR, and NOT.
A set G of gate types is said to be functionally (logically) complete if any combinational function can be realized by a circuit that contains gates from G only.
NANDs and NORs are very important in logic design because each is a complete gate type by itself and both can be manufactured easily and inexpensively using current IC technologies.
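The completeness claim for NAND can be illustrated with a small Python sketch that rebuilds NOT, AND, and OR from a single NAND function. The helper names (`not_`, `and_`, `or_`) are chosen here for illustration only.

```python
def nand(a: int, b: int) -> int:
    """Two-input NAND on single bits."""
    return 1 - (a & b)


def not_(a: int) -> int:
    """NOT from NAND: tie both inputs together."""
    return nand(a, a)


def and_(a: int, b: int) -> int:
    """AND from NAND: invert the NAND output."""
    return not_(nand(a, b))


def or_(a: int, b: int) -> int:
    """OR from NAND via De Morgan: a + b = (a' b')'."""
    return nand(not_(a), not_(b))
```

Since {AND, OR, NOT} is complete and each of its members can be built from NAND alone, {NAND} is complete; the analogous construction works for NOR.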
The behavior of any combinational circuit can be expressed in Boolean algebra.
o Two-level circuits (d = 2) : literal, SOP (sum-of-products), POS (product-of-sums), minterm, maxterm
Implicant : A product term p is called an implicant of a function f if, for any input combination x, p(x) = 1 implies that f(x) = 1.
What is the difference between an implicant and a minterm?
An implicant p is a prime implicant (P.I.) iff it is not covered by any other implicant of the function.
The solution of the gate minimization problem in SOP form
Step 1 : Compute all P.I.s of the given function f.
Step 2 : Select a minimal set of P.I.s whose logical sum is f.
Methods : K-map and the Quine-McCluskey tabular method
Irreducible set vs. minimal set : a cover must cover all minterms and should include all essential P.I.s.
[Figure: two K-maps of the same function, one showing an irreducible cover and one showing a minimal cover]
Quine-McCluskey (tabular) method
1. Arrange all minterms in groups such that all terms in the same group have the same # of 1's in their binary representation.
2. Compare every term of the lowest-index group with each term in the succeeding group. Whenever possible, combine the two terms being compared by means of $g x_i + g x_i' = g(x_i + x_i') = g$, and check off both terms. Two terms from adjacent groups are combinable if their binary representations differ in just a single digit in the same position (they form a 1-cube).
3. Repeat the process on the combined terms until no further combinations are possible. The remaining unchecked terms constitute the set of PIs.
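The tabular steps above can be sketched in Python as follows. This is a minimal illustration, not an optimized implementation: cubes are strings over '0', '1', '-', and for brevity every pair of cubes is compared rather than only cubes from adjacent groups, which gives the same prime implicants. The function names are assumptions made for this sketch.

```python
from itertools import combinations


def combine(a: str, b: str):
    """Merge two cubes if they differ in exactly one position that holds
    a 0/1 pair (g x + g x' = g); otherwise return None."""
    diff = [i for i, (p, q) in enumerate(zip(a, b)) if p != q]
    if len(diff) == 1 and '-' not in (a[diff[0]], b[diff[0]]):
        i = diff[0]
        return a[:i] + '-' + a[i + 1:]
    return None


def prime_implicants(minterms, n_vars: int) -> set:
    """Return the prime implicants of the function given by its minterms."""
    cubes = {format(m, f'0{n_vars}b') for m in minterms}
    primes = set()
    while cubes:
        checked, merged = set(), set()
        for a, b in combinations(sorted(cubes), 2):
            c = combine(a, b)
            if c is not None:
                merged.add(c)
                checked.update((a, b))
        primes |= cubes - checked   # unchecked terms are prime
        cubes = merged              # continue with the combined terms
    return primes
```

For the four-variable function with minterms 0, 1, 2, 5, 6, 7, 8, 9, 10, 13, 15, this yields the six cubes 0-10, 011-, -00-, -0-0, --01, and -1-1.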
Ex) f(x1,x2,x3,x4) = Σ(0,1,2,5,6,7,8,9,10,13,15)

Step 1 : minterms grouped by # of 1's
# of 1's   minterm   x1 x2 x3 x4
0          0         0  0  0  0
1          1         0  0  0  1
           2         0  0  1  0
           8         1  0  0  0
2          5         0  1  0  1
           6         0  1  1  0
           9         1  0  0  1
           10        1  0  1  0
3          7         0  1  1  1
           13        1  1  0  1
4          15        1  1  1  1

Step 2 : pairs
(0,1)     0 0 0 -
(0,2)     0 0 - 0
(0,8)     - 0 0 0
(1,5)     0 - 0 1
(1,9)     - 0 0 1
(2,6)     0 - 1 0    unchecked → PI
(2,10)    - 0 1 0
(8,9)     1 0 0 -
(8,10)    1 0 - 0
(5,7)     0 1 - 1
(5,13)    - 1 0 1
(6,7)     0 1 1 -    unchecked → PI
(9,13)    1 - 0 1
(7,15)    - 1 1 1
(13,15)   1 1 - 1

Step 3 : quads (all unchecked → PI)
(0,1,8,9)     - 0 0 -
(0,2,8,10)    - 0 - 0
(1,5,9,13)    - - 0 1
(5,7,13,15)   - 1 - 1

Using the prime implicant chart, we can find the essential PIs:
               0   1   2   5   6   7   8   9   10  13  15
(2,6)                  x       x
(6,7)                          x   x
(0,1,8,9)      x   x                   x   x
(0,2,8,10)     x       x               x       x
(1,5,9,13)         x       x               x       x
(5,7,13,15)                x       x               x   x
The essential PIs are (0,2,8,10) and (5,7,13,15). So, f(x1,x2,x3,x4) = (0,2,8,10) + (5,7,13,15) + further PIs covering the remaining minterms 1, 6, and 9.
There are 4 different choices for these : (2,6) + (0,1,8,9), (2,6) + (1,5,9,13), (6,7) + (0,1,8,9), or (6,7) + (1,5,9,13)
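A minimal Python sketch of the essential-PI step for this example: a PI is essential when some minterm is covered by it alone. The chart is written as a dictionary from each PI (named by the minterms it covers) to its set of minterms; the helper names are illustrative assumptions.

```python
def essential_prime_implicants(chart: dict) -> set:
    """Return the PIs that are the only cover of at least one minterm."""
    essential = set()
    all_minterms = set().union(*chart.values())
    for m in all_minterms:
        covering = [pi for pi, covered in chart.items() if m in covered]
        if len(covering) == 1:
            essential.add(covering[0])
    return essential


def uncovered_minterms(chart: dict, chosen) -> set:
    """Minterms still uncovered after selecting the PIs in `chosen`."""
    all_minterms = set().union(*chart.values())
    covered = set().union(*(chart[pi] for pi in chosen))
    return all_minterms - covered


# PI chart of the example; each PI covers exactly the minterms in its name.
CHART = {pi: set(pi) for pi in [
    (2, 6), (6, 7), (0, 1, 8, 9), (0, 2, 8, 10), (1, 5, 9, 13), (5, 7, 13, 15),
]}
```

Minterm 10 is covered only by (0,2,8,10) and minterm 15 only by (5,7,13,15), which is why these two PIs are essential; the minterms left over are 1, 6, and 9.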
The reduced PI chart
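A PI pj dominates PI pk iff every minterm covered by pk is also covered by pj. A dominated row pk can be removed from the chart.
[Figure: PI chart with rows pj, pk and columns m1 ... m4 in which pj dominates pk]
Branching method : when no essential PI and no dominated row remains, choose one PI, continue with the reduced chart, and try the alternatives in turn.
[Figure: PI chart with rows p1 ... p5 and columns m1 ... m5, and the corresponding decision tree. If we choose p1 first, then p3, p5 are next.]
Quine-McCluskey method : no limitation on the # of variables
Reduced PI chart for the example (after selecting the essential PIs):
              1   6   9
(2,6)             x
(6,7)             x
(0,1,8,9)     x       x
(1,5,9,13)    x       x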
Ex) f(A,B,C,D) = Σ(3,9,11,12,13,14,15) + d(1,4,6)
PI chart:
                3   9   11  12  13  14  15
(1,3,9,11)      x   x   x
(4,6,12,14)                 x       x
(9,11,13,15)        x   x       x       x
(12,13,14,15)               x   x   x   x
Reduced PI chart (after selecting the essential PI (1,3,9,11)):
                12  13  14  15
(4,6,12,14)     x       x
(9,11,13,15)        x       x
(12,13,14,15)   x   x   x   x
Result: (1,3,9,11) + (12,13,14,15)
SYEN 3330 Digital Systems Chapter 6-1 Page 13
Sequential Circuits
A sequential circuit contains:
o Storage elements : latches, flip-flops, binary registers
o Combinational logic : implements a multiple-output switching function
  - Inputs, labeled Inputs, are signals from the outside.
  - Outputs, labeled Outputs, are signals to the outside.
  - Other inputs, labeled State or Present State, are signals from the memory elements.
  - The remaining outputs, labeled Next State, are inputs to the memory elements.
[Figure: block diagram of combinational logic with Inputs and Outputs, connected to the storage elements through State and Next State]
Sequential Circuits (continued)
Combinational logic:
o Next state function : Next State = f(Inputs, State)
o Output function (Mealy) : Outputs = g(Inputs, State)
o Alternate output function (Moore) : Outputs = h(State)
The type of output function heavily influences the design.
Sequential Logic Design Process
1. Word description
2. State table or State diagram
3. Reduced state table (not covered)
4. Code Assignment (State, Input, Output)
5. Choose flip-flop types
6. Derive output function(s)
7. Derive excitation functions
8. Draw the logic diagram
Truth table for the combinational circuit
Input                  Output
y(t) x1(t) x2(t)       y(t+1) J(t) K(t) z(t)
0    0     0           0      0    d    0
0    0     1           0      0    d    1
0    1     0           0      0    d    1
0    1     1           1      1    d    0
1    0     0           0      d    1    1
1    0     1           1      d    0    0
1    1     0           1      d    0    0
1    1     1           1      d    0    1
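Karnaugh maps (rows : y; columns : x1x2)
J(t)      00  01  11  10
y = 0     0   0   1   0
y = 1     d   d   d   d

z(t)      00  01  11  10
y = 0     0   1   0   1
y = 1     1   0   1   0

K(t)      00  01  11  10
y = 0     d   d   d   d
y = 1     1   0   0   0

J(t) = x1x2
K(t) = x1'x2'
z(t) = x1 ⊕ x2 ⊕ y = x1'x2'y + x1'x2y' + x1x2'y' + x1x2y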
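Example : Design of a serial binary adder
[Figure: serial adder block with inputs x1, x2, output z, and internal carry c; the operands enter serially (bit by bit)]
State table (entries are next state / output z):
                   Input x1x2
State              00     01     10     11
S0 (no carry)      S0/0   S0/1   S0/1   S1/0
S1 (carry)         S0/1   S1/0   S1/0   S1/1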
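[Figure: serial adder realized with a JK flip-flop; inputs x1, x2, and clk; the combinational logic drives J, K, and the output z; the flip-flop outputs y, y' hold the carry]
Reading Assignment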
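The serial adder design (J = x1x2, K = x1'x2', z = x1 ⊕ x2 ⊕ y) can be exercised with a short Python simulation. The sketch assumes the standard JK characteristic equation y(t+1) = J y' + K' y and feeds the operands least significant bit first; the function names and the integer packing of the bit streams are conveniences of this example.

```python
def jk_next(y: int, j: int, k: int) -> int:
    """JK flip-flop characteristic equation: y(t+1) = J y' + K' y."""
    return (j & (1 - y)) | ((1 - k) & y)


def serial_add(a: int, b: int, n_bits: int) -> tuple:
    """Add two n_bits-wide numbers bit by bit, LSB first.

    Returns (sum modulo 2**n_bits, final carry state y)."""
    y = 0            # state S0: no carry
    total = 0
    for i in range(n_bits):
        x1 = (a >> i) & 1
        x2 = (b >> i) & 1
        z = x1 ^ x2 ^ y              # output function
        j = x1 & x2                  # excitation functions
        k = (1 - x1) & (1 - x2)
        y = jk_next(y, j, k)         # clock edge: update the carry
        total |= z << i
    return total, y
```

For example, `serial_add(5, 3, 4)` walks through the states S0, S1, S1, S1, S0 and produces the sum bits 0, 0, 0, 1, i.e. 8 with no final carry.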
Example: Design of a 4-bit-stream serial adder
SUM(i) = x1(i) plus x2(i) plus x3(i) plus x4(i) plus C(i – 1)
2.3 The Register Level
The next highest level after the gate level. Related information bits are grouped into ordered sets such as words or vectors. (A register = a storage device for words.)
The major component types at the register level:
Type            Component                     Function
Combinational   Word gates                    Boolean operations
                Multiplexers                  Data routing, general function generation
                Decoders and encoders         Code checking and conversion
                Programmable logic devices    General function generation
                Arithmetic elements           Numerical and logical operations
Sequential      (Parallel) registers          Information storage
                Shift registers               Information storage, serial-parallel conversion
                Counters                      Control/timing signal generation
[Figure: register-level components connected by a bus]
Unit of information : gate level = bit; register level = word
Multiplexer
To implement an n-variable function, use a multiplexer with n-1 select inputs and 2^(n-1) data inputs.
Example : f(x1,x2,x3) = Σ(1,4,5,6)
x1 x2 x3    f
0  0  0     0
0  0  1     1
0  1  0     0
0  1  1     0
1  0  0     1
1  0  1     1
1  1  0     1
1  1  1     0
With x1, x2 applied to the select inputs, the four data inputs are:
f(0, 0, x3) = x3
f(0, 1, x3) = 0
f(1, 0, x3) = 1
f(1, 1, x3) = x3'
[Figure: 4-input multiplexer with select inputs S0, S1 driven by x1, x2, data inputs x3, 0, 1, x3', and output Z = f(x1, x2, x3)]
The register-level components are linked together by buses.
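The multiplexer realization can be checked against the minterm list with a few lines of Python. This is only a sketch: `mux4` models an ideal 4-input multiplexer, and the assignment of x1 to the high select bit is an assumption made here for concreteness.

```python
def mux4(data, s1: int, s0: int) -> int:
    """Ideal 4-input multiplexer: select data[2*s1 + s0]."""
    return data[2 * s1 + s0]


def f_mux(x1: int, x2: int, x3: int) -> int:
    """f realized with a 4-input MUX; data inputs are x3, 0, 1, x3'."""
    data = [x3, 0, 1, 1 - x3]
    return mux4(data, x1, x2)


def f_minterms(x1: int, x2: int, x3: int) -> int:
    """Reference definition: f = sum of minterms 1, 4, 5, 6."""
    return int(4 * x1 + 2 * x2 + x3 in {1, 4, 5, 6})
```

Two variables drive the select lines and the third appears only on the data inputs, which is why a 3-variable function needs just a 4-input multiplexer.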
Example : circuit symbol of a register-level component
[Figure: multifunction unit with data input lines x1 ... x5, data output lines Z1 and Z2 (bits of information in parallel), control input lines Select (k lines) and Enable, and control output lines Normal End and Abnormal End. Data lines and control lines are kept separate.]
Word-based Boolean algebra
The Boolean algebra of binary variables extends in a straightforward way to a $2^m$-valued Boolean algebra whose elements are word-based combinational functions that perform the mapping
$Z : (B^m)^n \to B^m$ (cf. $z : B^n \to B$, where $B = \{0,1\}$)
Let $z(x_0, x_1, \ldots, x_{n-1})$ be any two-valued Boolean function.
Let $X_0, X_1, \ldots, X_{n-1}$ denote m-bit binary words,
$X_i = (x_{i,0}, x_{i,1}, \ldots, x_{i,m-1})$ for $i = 0, 1, \ldots, n-1$
Define $Z : (B^m)^n \to B^m$ as follows:
$Z(X_0, X_1, \ldots, X_{n-1}) = [z(x_{0,0}, x_{1,0}, \ldots, x_{n-1,0}),\ z(x_{0,1}, x_{1,1}, \ldots, x_{n-1,1}),\ \ldots,\ z(x_{0,m-1}, x_{1,m-1}, \ldots, x_{n-1,m-1})]$
With this definition, we can extend the usual gate operations, AND, OR, NOT, etc., to word-level gate operations. The set of all $2^{2^{mn}}$ combinational functions of up to n m-bit words is a Boolean algebra with respect to the m-bit word-gate operations {AND, OR, NOT}.
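The bitwise definition of a word-level function can be sketched in Python: a two-valued function z is applied independently to bit position j of every operand word. Words are packed into integers here, and `word_gate` is a name invented for this illustration.

```python
def word_gate(z, *words: int, m: int) -> int:
    """Apply the two-valued function z bit by bit to m-bit words.

    Bit j of the result is z(bit j of X0, bit j of X1, ...)."""
    result = 0
    for j in range(m):
        bits = [(w >> j) & 1 for w in words]
        result |= (z(*bits) & 1) << j
    return result


def bit_nand(a: int, b: int) -> int:
    """Two-valued NAND, used below as an example of z."""
    return 1 - (a & b)
```

For instance, `word_gate(bit_nand, 0b1100, 0b1010, m=4)` gives `0b0111`, the 4-bit word NAND of the two operands.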
A word-based Boolean algebra is useful only in analyzing certain aspects of register-level design; it does not provide an adequate design theory, for the following reasons:
1. The operations performed by some of the basic components are numerical rather than logical, so they cannot readily be incorporated into a Boolean algebra.
2. Many of the logical operations associated with the basic components are complex and do not have the properties of the gate functions (associativity, commutativity) that simplify gate-level design.
3. The lack of a uniform word size for all signals makes it difficult to define a useful algebra to describe operations on these signals.
Register-level hardware design therefore remains more an art than a science, relying on heuristic and intuitive methods.
Register-level hardware design is analogous to programming in an assembly language.
2.2 Register-Level Components
Word gates are universal design components (logically complete). In practice, their usefulness is limited because the operations they perform are relatively simple (low-level) and because of the variability in word size.
Word gate
[Figure: two-input m-bit word gate with inputs X1, X2 and output Z, and its gate-level realization in which bit z_i is formed from x_{i,1} and x_{i,2} for i = 0 ... m-1; a second symbol combines an m-bit word X1 with a single bit x2]
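MUX
k-input m-bit multiplexer : $k = 2^p$ data inputs $X_0, X_1, \ldots, X_{2^p-1}$ of m bits each, a p-bit select input, an enable input e, and an m-bit output Z
[Figure: block symbol of the multiplexer]
$z_j = (x_{0,j} a_0 + x_{1,j} a_1 + \cdots + x_{2^p-1,j}\, a_{2^p-1})\, e$ for $j = 0, 1, \ldots, m-1$
$Z = (X_0 a_0 + X_1 a_1 + \cdots + X_{2^p-1}\, a_{2^p-1})\, e$
where $a_i = 1$ iff the select input has the value i.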
Decoder : A 1-out-of-$2^n$ (or $1/2^n$) decoder is a combinational circuit with n input data lines and $2^n$ output data lines such that each of the $2^n$ possible input combinations X_i activates exactly one of the $2^n$ output lines.
Primary application : addressing
Encoder : $2^k$ input data lines and k output address lines
[Figure: 8-input encoder with inputs x0 ... x7 and outputs z0, z1, z2]
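A decoder and a simple (non-priority) encoder are inverse mappings between an n-bit value and a one-hot vector of 2^n lines. The Python sketch below uses list positions as line numbers; the function names, and the rule that the encoder rejects inputs with anything other than exactly one active line, are assumptions of this illustration.

```python
def decode(value: int, n: int) -> list:
    """1-out-of-2**n decoder: exactly one output line is 1."""
    if not 0 <= value < 2 ** n:
        raise ValueError("value does not fit in n bits")
    return [1 if i == value else 0 for i in range(2 ** n)]


def encode(lines: list) -> int:
    """Simple encoder: index of the single active input line."""
    if sum(lines) != 1:
        raise ValueError("exactly one input line must be active")
    return lines.index(1)
```

Used for addressing, `decode(5, 3)` activates line 5 of 8, and `encode` recovers the 3-bit address from that one-hot pattern.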
A magnitude comparator
Register
Register : an m-bit register is an ordered set of m flip-flops (F/F) that stores an m-bit word.
[Figure: 4-bit register built from four D flip-flops with data inputs x0 ... x3, outputs z0 ... z3, and common clock and clear lines; block symbol with 4-bit input X = (x0,x1,x2,x3) and 4-bit output Z]
Shift register
4-bit left-right shift register
Left-shifting (right-shifting) an unsigned binary number is equivalent to multiplication (division) by 2, provided no significant bit is shifted out; division discards the remainder.
Applications of registers
① data storage (most of the time)
② serial-parallel and parallel-serial conversion
③ arithmetic operations
[Figure: 4-bit left-right shift register with left-shift input, right-shift input, parallel data input, left-shift output, right-shift output, parallel data output, and control lines parallel load enable, left-shift enable, right-shift enable, and clear]
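The arithmetic meaning of shifting can be illustrated with a small Python model of an m-bit shift register, with the register contents held as an unsigned integer. The function names and the default width of 4 bits are choices made for this sketch.

```python
def shift_left(word: int, m: int = 4, serial_in: int = 0) -> int:
    """Shift an m-bit word left by one; the top bit is lost."""
    mask = (1 << m) - 1
    return ((word << 1) | serial_in) & mask


def shift_right(word: int, m: int = 4, serial_in: int = 0) -> int:
    """Shift an m-bit word right by one; serial_in enters at the top."""
    return (word >> 1) | (serial_in << (m - 1))
```

With a zero serial input, `shift_left(0b0011)` doubles 3 to 6 and `shift_right(0b0110)` halves 6 to 3, whereas `shift_left(0b1001)` loses the top bit and no longer equals twice the original value.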
Counter : a simple sequential machine designed to cycle through a predetermined sequence of k distinct states s0, ..., sk-1 in response to clock pulses.
Applications of counters
① a program counter
② generating timing signals
[Figure: modulo-16 counter driven by clock pulses, with outputs K0 K1 K2 K3]
Buses : A bus is a set of wires designed to transfer all bits of a w-bit word from a specified source to a specified destination. (Unidirectional vs. bidirectional; dedicated vs. shared)
Dedicated : connecting n units in all possible ways requires n(n-1) buses.
Shared : connects one of several sources to one of several destinations; control is much more complicated.
[Figure: four units ① ... ④ connected by dedicated buses, and the same four units connected by a shared bus]
Most of the time we use shared buses.
Programmable Logic Device
IC cost dilemma:
• IC circuit density increases exponentially with time.
• The more ICs you make, the cheaper they get.
• Complex logic ICs have very specific functions (so you make fewer).
Question : How do I make very high volume parts that are very dense?
Answer #1 : You make microprocessors or ICs (like automobile ignition controllers) that have very large volume. OR
Answer #2 : You make programmable parts.
Programming Devices
Devices may be:
1. Permanently programmed at the time of IC manufacture,
2. Programmed at the time of use (board-level manufacturing), or
3. Dynamically re-programmed during use.
Permanent programming techniques, done at the time of manufacture, include adding the final level of interconnect via metallization and altering devices through laser or e-beam programming.
Use-time programming techniques include shorting diodes, blowing fuses, shorting devices, and dumping charge into wells.
Dynamically reprogrammed devices can be bulk erased and reprogrammed, or incrementally erased and reprogrammed.
Read Only Memory
Read Only Memories (ROM) or Programmable Read Only Memories (PROM) have:
1. N input lines,
2. M output lines, and
3. $2^N$ decoded minterms.
The N input lines are connected to a fixed decoder (AND array) with $2^N$ output lines. Each line represents a minterm of N variables. Thus there are $2^N$ decoded minterms.
Each of the M output lines is connected to an OR gate which has a programmable number of input connections. Any (or all) of the minterms may be ORed together for each of the M output lines.
A program map for a PROM (or ROM) looks like a multiple-output function table.
Read Only Memories (Continued)
Example: An 8 X 4 ROM (N = 3 input lines, M = 4 output lines)
The fixed "AND" array is a decoder from 3 input bits to one-of-8 minterm output lines.
The programmable "OR" array is shown as a "Wire-OR" function. A "Dot" in the array corresponds to including that minterm.
[Figure: 3-to-8 decoder with inputs A, B, C on address lines A2, A1, A0 and minterm lines D0 ... D7, feeding a programmable OR array with outputs F3, F2, F1, F0]
Read Only Memories (Continued)
The 8 X 4 ROM example corresponds to the multiple-output truth table:
Input      Output
A B C      F0 F1 F2 F3
0 0 0      1  1  0  1
0 0 1      0  0  0  0
0 1 0      1  0  0  1
0 1 1      0  0  1  0
1 0 0      0  0  0  0
1 0 1      1  0  1  0
1 1 0      0  0  0  1
1 1 1      0  1  0  0
The "internal organization" of the memory array often does not match the "logical organization".
For example, one manufacturer sells an N = 13 (inputs decoded to 8192 minterms) by M = 8 (outputs) PROM that is internally organized as a bit array of 128 rows by 512 columns. An output MUX selects a group of 8 bits.
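Viewed from software, the 8 X 4 ROM of the example is simply a lookup table indexed by the address ABC. The Python sketch below stores the eight output words of the truth table, in the column order F0 F1 F2 F3, and recovers which minterms are ORed into each output; the names `ROM`, `rom_read`, and `minterms_of` are illustrative.

```python
# One 4-bit word (F0, F1, F2, F3) per address ABC = 000 ... 111,
# copied from the multiple-output truth table of the example.
ROM = [
    (1, 1, 0, 1),  # 000
    (0, 0, 0, 0),  # 001
    (1, 0, 0, 1),  # 010
    (0, 0, 1, 0),  # 011
    (0, 0, 0, 0),  # 100
    (1, 0, 1, 0),  # 101
    (0, 0, 0, 1),  # 110
    (0, 1, 0, 0),  # 111
]


def rom_read(a: int, b: int, c: int) -> tuple:
    """Decode the address (the fixed AND array) and return the stored word."""
    return ROM[4 * a + 2 * b + c]


def minterms_of(output_index: int) -> list:
    """Minterms ORed into output F<output_index> (the programmed OR array)."""
    return [addr for addr, word in enumerate(ROM) if word[output_index]]
```

Each entry of `minterms_of(i)` corresponds to one "dot" in column Fi of the OR array, so F0 = Σ(0,2,5) for this table.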
Programmable Array Logic (PAL)
PAL devices are closely related to the ROM in that the device is organized as a regular array of programmable elements.
The PAL has a programmable set of AND terms, combined with a limited number of fixed OR terms.
Where the ROM array is guaranteed to implement any function of N inputs, the PAL may run out of OR terms. Thus, it may be very important to minimize the number of OR terms in order to use a PAL.
Another difference is that a ROM does not easily allow multi-level implementations. The designer must use separate ROMs for multiple levels. The PAL allows outputs from OR terms to be used as inputs to AND terms, making multi-level design easy.
Programmable Array Logic (Cont.)
Example: 4-input, 3-output PAL with fixed, 3-input OR terms and programmable-polarity outputs.
This device is unprogrammed.
[Figure: unprogrammed PAL with inputs In1 ... In4 and outputs Out1, Out2, Out3]
Programmable Array Logic (Cont.)
An "X" at a cross line includes that variable in the AND term. An "X" in an AND gate removes that term. An "X" at the EXOR forms a "TRUE" term, else the output is complemented. Thus we have:
Out1 = (In1In2'In3 + In1')'
Note that Out1 leaves the PAL in COMPLEMENT form, since the EXOR is tied to one. Out2 is shown in "TRUE" form.
[Figure: programmed PAL with inputs In1 ... In4 and outputs Out1, Out2, Out3]
Programmable Array Logic (Cont.)
What are the equations for the other terms?
[Figure: the programmed PAL with inputs In1 ... In4 and outputs Out1, Out2, Out3, repeated]
Programmable Logic Array (PLA)
The last type of programmable logic element we will discuss is the PLA, which has a programmable array of both AND and OR terms.
A PLA typically has a large number of inputs and outputs and can be used to implement equations that are impractical for a ROM (because of the number of inputs required).
Generally the product terms limit the application of a PLA. Use minimization techniques to reduce the number of product terms in an implementation if it is to fit in a PLA.
The program for a PLA is very similar to the connection array for a multiple-output Boolean function, such as that generated by CAFE.
Field Programmable Gate Array (FPGA)
o A two-dimensional array of general-purpose logic blocks (cells) whose functions are programmable; the cells are linked to one another by programmable buses.
o The pattern of data in the configuration memory (CM) determines the cells' functions and their interconnection wiring. Design changes are made by replacing the contents of the CM.
Basic FPGA cell
[Figure: basic FPGA cell built from 4-input MUXes with data inputs x0 ... x7, select inputs s0, s1, and output z; an example configuration with inputs a, b, c, d. Check the output by truth table.]
Main complexity
– An FPGA can be reprogrammed repeatedly.
FPGAs are very well suited to CAD-based design and manufacture. The process of mapping a new design into one or more FPGA chips can be almost entirely automated.
1. Compile the VHDL specification into logic models.
2. Use CAD tools to map the logic elements to cells.
3. Transfer the result to the FPGA chips via programming units.
Register-Level Design
: The behavior of a register-level machine is defined by a finite set of operations to be performed on words.
A register-level machine consists of
o a data processing unit
o a program control unit : microprogrammed controller or hardwired controller
Design Techniques
: Given a set of algorithms or instructions, design a circuit using a specified set of register-level components while satisfying certain cost and performance criteria.
→ heuristic approach, due to the lack of appropriate mathematical tools
Step 1: Define the desired behavior by a set of sequences of register-transfer operations, such that each operation can be implemented directly using the available design components. This constitutes an algorithm AL to be executed.
Step 2: Analyze AL to determine the types of components and the number of each type required for the data path DP.
Examples of register-transfer operations (c : control condition):
c : A ← A + B
c : A ← A + B, C ← C + D;   (both additions in the same step, so two adders are needed)
c(t0) : A ← A + B
c(t0+1) : C ← C + D   (one addition per step, so a single adder can be shared)
Step 3: Construct a block diagram for DP using the components identified in step 2. Make the connections between the components so that all data paths implied by AL are present and the given performance-cost constraints are met.
Step 4: Analyze AL and DP to identify the control signals needed. Introduce into DP the logic or control points necessary to apply these signals.
Step 5: Design the control unit CU for DP that meets all the requirements of AL.
Step 6: Verify, typically by computer simulation, that the final design operates correctly and meets all performance-cost goals.
Ex) Design of a fixed-point binary multiplier → multiplying two 8-bit numbers in sign-magnitude form
X = x0 x1 ··· x7, Y = y0 y1 ··· y7 (x0, y0 : sign; x1 ··· x7, y1 ··· y7 : magnitude)
P = X · Y
p0 = x0 XOR y0 (sign)
p1 ··· p14 = x1 ··· x7 * y1 ··· y7 (magnitude)
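The multiplier specification can be sketched in Python under some stated assumptions: each operand is packed into an 8-bit integer whose most significant bit is the sign (x0) and whose low 7 bits are the magnitude, and the magnitude product is formed by shift-and-add, one multiplier bit per step. This mirrors the specification only; it is not the register-level design itself, and the function name is invented here.

```python
def sign_magnitude_multiply(x: int, y: int) -> tuple:
    """Multiply two 8-bit sign-magnitude numbers.

    Returns (sign bit p0, 14-bit magnitude p1 ... p14 as an integer)."""
    sign = ((x >> 7) ^ (y >> 7)) & 1      # p0 = x0 XOR y0
    mx, my = x & 0x7F, y & 0x7F           # 7-bit magnitudes
    product = 0
    for i in range(7):                    # examine one multiplier bit per step
        if (my >> i) & 1:
            product += mx << i            # add the shifted multiplicand
    return sign, product
```

The largest magnitude product is 127 × 127 = 16129, which fits in the 14 magnitude bits p1 ··· p14.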
Processor-Level Design
: Concerned with the storage and processing of blocks of information such as programs and data files.
Components (VLSI) : CPU, memories, I/O devices, interconnection networks
Typical questions of interest are
1. What is the time required to execute a given set of programs?
2. How much storage space is needed for a given set of programs or data?
3. To what extent are the various components of the system utilized?
No simple characterization of programs → no easy answer
Use average programs (benchmark programs) → probabilistic or statistical analysis
Performance evaluation
: The goal is to determine the function Y(x1, x2, ···, xn), where each xi is a design parameter and Y is a performance measure such as processing time or resource utilization.
Desirable : an analytic model (an algebraic expression in the xi)
→ difficult, because the components can interact in complex ways, so an analytic model is usually not available.
Queueing theory : a simple queueing model for a single queue and a single server
Limited resources (CPU, memory, I/O) must be shared.
[Figure: items requiring service → queue → server (shared resource) → serviced items]
Another approach : construct a physical prototype model of the target system, run it under representative working conditions, and monitor its performance.
Processor-level components : CPU, memory, I/O devices, interconnection networks
· CPU
Important points in CPU design
1. The types of instructions forming the CPU's instruction set and their execution times.
2. The register-level organization of the data processing unit.
3. The register-level organization of the program control unit.
4. The manner in which the CPU communicates with external devices.
VLSI → inexpensive single-chip CPU → inexpensive complete system on a chip → changes in CPU architecture for many applications.
· Memory (seriously affected by VLSI) : memories vary greatly in cost and performance
1. Main memory : consists of relatively fast storage devices connected directly to, and controlled by, the CPU.
2. Secondary memory : consists of slower and less expensive devices that communicate indirectly with the CPU via main memory.
3. Cache : positioned between the CPU and main memory to reduce the average time to access the memory system. A cache is mostly integrated on the same IC chip as the CPU.
· I/O devices : the means by which a computer communicates with the outside world
→ data transfers do not change the information content or meaning of the data.
I/O devices are slower than main memory.
They can be controlled by the CPU or by an IOP (I/O processor).
· Interconnection network : establishes dynamic communication paths between components via buses → shared
How to resolve contention → by selecting one of the requesting devices based on some given priority and connecting it to the desired bus.
The communication between processor-level components is generally asynchronous.
The causes of asynchronous communication
1. A high degree of independence exists among components; for example, CPUs and IOPs execute different types of programs and interact at unpredictable times.
2. Component operating speeds vary over a wide range.
3. The physical distance separating the components may be too large to permit synchronous transmission.
Asynchronous communication : by handshaking
[Figure: handshake between devices A and B : ready signal, recognize, transmission, acknowledge]
The data transfer is independent of the operating speeds of the two devices.
With a standard interface : easy modular expansion
Bus control is done by a processor such as a CPU or an IOP.
An IOP acts as a buffer between slow I/O devices and fast main memory.
In large systems, special processors are used to supervise data transfers over shared buses.
Computer network : the most difficult communication problem
· Processor-level design technique
Processor-level design is less amenable to formal analysis than register-level design because it is difficult to give a precise description of the desired behavior.
→ The usual approach is to take a prototype design of known performance and then modify it.
Performance specification
1. Should be capable of executing a instructions of type b per second.
2. Should be able to support c I/O devices of type d.
3. Should be hardware/software compatible with computers of type e.
4. The total cost should not exceed f.
Lack of understanding of the relation between the structure of a computer and its performance
→ impossible to predict the performance accurately
Design Process
1. Select a prototype design and adapt it to satisfy the given performance constraints.
2. Determine the performance of the proposed system (by simulation or benchmark measurements).
3. If unsatisfactory, modify the design and repeat step 2 until an acceptable design is obtained.
→ This contributes to the relatively slow evolution of computer architecture.
Typical structures:
o a single-processor single computer
o a multiprocessor single computer
o a multiprocessor multicomputer
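[Figure: a single-processor single computer : CPU, main memory, and IOP1 ... IOPk connected by a switching network S, with I/O devices D1 ... Dp, ... attached to the IOPs]
[Figure: a multiprocessor single computer : CPU1 ... CPUn, memories M1 ... Mn, and IOP1 ... IOPk connected by a switching network S1, with I/O devices attached to the IOPs]
[Figure: a multiprocessor multicomputer : two computers, each with its own CPU, IOP, memory M, and switching network (S1, S2), and I/O devices D1 ... Dk]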
Simple performance measures
: main memory bandwidth and CPU instruction execution speed
· Main memory bandwidth : the maximum rate, in bits per second, at which instructions and data can be fetched from memory
· CPU speed : within a single CPU, the execution time varies from one instruction to another.
→ use the execution time of a common instruction, e.g. fixed-point addition may be chosen as representative
→ better : take an average of all CPU instruction execution times, weighted by their frequency:
$t_e = \sum_{i=1}^{n} P_i t_i$
where $P_i$ = probability of occurrence of a type $I_i$ instruction and $t_i$ = execution time of $I_i$
$1/t_e$ : CPU instruction execution rate
Limitation : this is not the performance of the system as a whole (I/O operations are ignored).
→ use a benchmark : a set of actual, representative programs for a particular environment
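The weighted average t_e = Σ P_i t_i is easy to evaluate once an instruction mix is known. In the Python sketch below the mix is entirely hypothetical (made-up probabilities and execution times, chosen only to show the arithmetic), and `average_execution_time` is a name introduced for this example.

```python
def average_execution_time(mix) -> float:
    """Return t_e = sum of P_i * t_i for a list of (P_i, t_i) pairs."""
    total_p = sum(p for p, _ in mix)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("instruction probabilities must sum to 1")
    return sum(p * t for p, t in mix)


# Hypothetical instruction mix: (probability, execution time in microseconds).
EXAMPLE_MIX = [(0.5, 1.0), (0.3, 2.0), (0.2, 5.0)]
```

For this made-up mix, t_e = 0.5·1 + 0.3·2 + 0.2·5 = 2.1 microseconds, so the execution rate 1/t_e is roughly 0.48 million instructions per second.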
Queueing model
Parameters : average arrival rate (λ), average service rate (μ)
Actual arrival / service rates : described by probability distribution functions
[Figure: items requiring service → queue → server (shared resource) → serviced items]
M / M / 1 model : first M = Markov process of arrival; second M = Markov process of service; 1 = the number of servers
The probability of exactly n items arriving in a time period of length t (Poisson probability distribution):
$P_p(n, t) = \frac{(\lambda t)^n}{n!} e^{-\lambda t}$
Interarrival time distribution $P_I(t)$ → the probability that at least one item arrives during time t:
$P_I(t) = 1 - P_p(0, t) = 1 - e^{-\lambda t}$
$P_S(t)$ = the probability that the service required by an item is completed in time t or less after its removal from the queue:
$P_S(t) = 1 - e^{-\mu t}$
The performance measures
1. mean queue length ($l_Q$)
: the average number of items in the system, including the items waiting for service and those actually being served.
2. mean waiting time ($t_Q$)
: the average time an item spends in the system, both waiting for service and being served.
$P_Q(n, t)$ : probability that at time t there are exactly n items in the queueing system, either waiting for service or being served.
In the state of equilibrium, $P_Q(n, t) = P_Q(n)$:
$P_Q(n) = \left(1 - \frac{\lambda}{\mu}\right) \left(\frac{\lambda}{\mu}\right)^n$ if $\lambda < \mu$
If λ > μ, then the queue grows indefinitely.
$\rho = \frac{\lambda}{\mu}$ : traffic intensity, mean utilization of the server
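$P_Q(n) = \rho^n (1 - \rho)$
$l_Q = \sum_{n=0}^{\infty} n P_Q(n) = (1-\rho) \sum_{n=0}^{\infty} n \rho^n = (1-\rho)\rho \sum_{n=1}^{\infty} n \rho^{n-1} = (1-\rho)\rho \frac{d}{d\rho} \sum_{n=0}^{\infty} \rho^n = (1-\rho)\rho \frac{d}{d\rho}\left(\frac{1}{1-\rho}\right) = (1-\rho)\rho \frac{1}{(1-\rho)^2} = \frac{\rho}{1-\rho}$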
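$t_Q = \frac{l_Q}{\lambda} = \frac{1}{\lambda} \cdot \frac{\rho}{1-\rho} = \frac{1}{\mu - \lambda}$
$l_W = l_Q - \rho = \frac{\rho}{1-\rho} - \rho = \frac{\rho^2}{1-\rho}$
$t_W = t_Q - \frac{1}{\mu} = \frac{1}{\mu - \lambda} - \frac{1}{\mu} = \frac{\lambda}{\mu(\mu - \lambda)}$
$l_W$ : mean number of items waiting in the queue, excluding those being serviced.
$t_W$ : mean time spent waiting in the queue, excluding service time.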
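Example: New jobs arrive at a computer at an average rate of 10 per min. The computer is idle 25% of the time. What is the average time T that each job spends in the computer? What is the average number of jobs N in main memory waiting for execution?
λ = 10 jobs/min, ρ = 1 – 0.25 = 0.75
$\rho = \frac{\lambda}{\mu}$, so $\mu = \frac{\lambda}{\rho} = \frac{10}{0.75} = \frac{40}{3}$ jobs/min
$T = t_Q = \frac{1}{\mu - \lambda} = \frac{1}{40/3 - 10} = \frac{3}{10} = 0.3$ min
$N = l_W = \frac{\rho^2}{1 - \rho} = \frac{0.75^2}{0.25} = 2.25$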
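The M/M/1 formulas can be collected into a small Python helper and checked against the worked example (λ = 10 jobs/min, ρ = 0.75). The function name and the dictionary keys are choices made for this sketch; the formulas are the ones derived above and hold only for λ < μ.

```python
def mm1_metrics(lam: float, mu: float) -> dict:
    """Steady-state measures of an M/M/1 queue with arrival rate lam
    and service rate mu (requires lam < mu)."""
    if lam >= mu:
        raise ValueError("no equilibrium: the queue grows indefinitely")
    rho = lam / mu                       # traffic intensity
    return {
        "rho": rho,
        "l_Q": rho / (1 - rho),          # mean number in the system
        "t_Q": 1 / (mu - lam),           # mean time in the system
        "l_W": rho ** 2 / (1 - rho),     # mean number waiting
        "t_W": lam / (mu * (mu - lam)),  # mean time waiting
    }


# Worked example: lam = 10 per min, utilization 0.75, so mu = 10 / 0.75.
EXAMPLE = mm1_metrics(10.0, 10.0 / 0.75)
```

Here `EXAMPLE["t_Q"]` is about 0.3 min (the time T each job spends in the computer) and `EXAMPLE["l_W"]` is about 2.25 (the number N of jobs waiting for execution).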