Functional Units - Edward Bosworth

Functional Units

Functional Units of a Modern Computer

We begin this lecture by repeating a figure from a previous lecture.

Logically speaking a computer has four components.

Connecting the Components

Early schemes for connecting the components could be very messy.

The backplane of a PDP–10, circa 1970

Another Motherboard

Copper Traces on a PCB

These traces, which do the job of wires, are the real advantage of a PCB. All modern busses are just collections of traces, aluminum or copper.

Types of Busses

There are a number of ways to classify computer busses.

Here we focus on one distinction: synchronous and asynchronous.

A bus transaction is an exchange of data or control information.

In simple transactions, a source asserts signals onto a bus, and a destination copies the values into some sort of buffer memory.

More complex transactions might comprise a number of simple transactions. A memory read transaction might include the following:

Assert the memory address on the address lines.

Assert READ on the control lines.

Memory places data on the data lines.

The distinction of simple vs. complex transactions is not important. The term “transaction” covers a variety of possible exchanges.

A synchronous bus is one in which there is a clock signal, used to coordinate bus transactions. Naturally, it has a line (or trace) dedicated to transmitting that signal.

An asynchronous bus is one without a clock signal. It often uses control signals, such as REQ and ACK, to coordinate transactions.

Clock Terminology

Clock frequencies are specified in Hertz, or cycles per second. Common units include kHz (thousand cycles per second), MHz (million cycles per second), and GHz (billion cycles per second).

The inverse of clock frequency is clock period or clock cycle time.

Consider a 2 GHz clock. The frequency is 2.0 × 10^9 per second, written more precisely as 2.0 × 10^9 sec^-1.

The period is 1.0 / (2.0 × 10^9 sec^-1) = 0.5 × 10^-9 sec = 0.5 nanosecond = 500 picoseconds.
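The arithmetic above can be checked with a short Python sketch. The function name clock_period_seconds is just an illustrative choice, not something from the lecture.

```python
def clock_period_seconds(frequency_hz):
    """Return the clock period (cycle time) in seconds for a frequency in Hz."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return 1.0 / frequency_hz


period = clock_period_seconds(2.0e9)    # the 2 GHz clock from the text
print(f"{period * 1e9:.1f} ns")         # period expressed in nanoseconds
print(f"{period * 1e12:.0f} ps")        # period expressed in picoseconds
```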

Clock signals are described with standard terminology.

The clock may be said to “tick” on every rising edge.

The Clock: System and Bus

A modern computer is a synchronous sequential machine.

A synchronous machine is one in which the actions are coordinated by a central clock signal, either directly or indirectly.

A sequential machine is one that has memory and in which the results of a computation depend on both the input and the contents of memory.

Each computer has one “master clock”, probably not called by that name.

All other clock signals are derived from this master clock.

We can have a frequency divider used to produce a slower clock. For example, a 2 GHz (2000 MHz) clock signal can be converted by a divide–by–four circuit into a 500 MHz clock signal.

There are also frequency doublers, used to produce a faster clock. A 1 GHz clock signal can be doubled to produce a 2 GHz clock signal.

Bus clocks are often in the 150 MHz to 250 MHz range.

Bus Clock Signals

The bus clock frequency is usually a fraction of the system clock frequency.

In a typical example, a 2 GHz system clock can be passed through a divide–by–8 circuit to produce a 250 MHz bus clock signal.
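As a quick check of the divider examples, here is a minimal Python sketch. The helper name divided_clock_mhz is hypothetical; it only models the ratio of frequencies, not the circuit.

```python
def divided_clock_mhz(source_mhz, divisor):
    """Frequency (in MHz) produced by a divide-by-N circuit."""
    if divisor < 1:
        raise ValueError("divisor must be at least 1")
    return source_mhz / divisor


# A 2 GHz (2000 MHz) system clock through two different dividers.
print(divided_clock_mhz(2000, 4))   # divide-by-four
print(divided_clock_mhz(2000, 8))   # divide-by-8, a typical bus clock
```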

System clock signals are generally represented as square waves, as we have done in previous slides in this lecture.

Bus clock signals are commonly represented as trapezoidal waves, to emphasize the fact that signal levels do not change instantaneously.

There is no abstract theory behind this representation; it is just common.

Signal Assertion Levels

A control signal is said to be asserted when it causes some bus action to take place.

A signal may be asserted high or asserted low.

The terms high and low refer to the voltages used to represent logical signals. In the original TTL protocol, the following are standard.

5.0 volts    logic 1

0.0 volts    logic 0

In TTL, a signal is asserted high when it is driven from 0 volts to 5 volts in order to achieve its effect.

In TTL, a signal is asserted low when it is driven from 5 volts to 0 volts in order to achieve its effect.

A control signal that is active high is simply named. X is active high.

There are several notations used to indicate that a signal is active low. Here are several common notations.

Example: Memory Control Signals

Consider a memory unit that is controlled by the CPU.

There are two signals to the memory. One activates it and one indicates the type of transaction commanded by the CPU.

SELECT#: This active–low signal selects the memory chip and activates its circuitry. When high, the chip is inactive.

R/W#: This is a two–valued signal, indicating which of the two possible transactions is commanded by the CPU.

The symbol with the #, W#, indicates the action commanded when the signal is asserted low.

The symbol without the #, R, indicates the action when the signal is high.

If R/W# = 0, the CPU is writing to memory.

If R/W# = 1, the CPU is reading from memory.

Memory Control

The effect of the control signals on memory is summarized in this truth table.

Select#   R/W#   Action
1         0      Memory contents are not changed or accessed. Nothing happens.
1         1      Memory contents are not changed or accessed. Nothing happens.
0         0      CPU writes data to the memory.
0         1      CPU reads data from the memory.

There is a standard shorthand for the above table.

Select#   R/W#   Action
1         d      Memory contents are not changed or accessed.
0         0      CPU writes data to the memory.
0         1      CPU reads data from the memory.

The “d” in the top row indicates that when Select# = 1, the value of the R/W# signal has no effect on what the memory does. It stands for “don’t care”.
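The truth table can be written as a small Python function. This is only a sketch: the function name memory_action and the strings it returns are illustrative choices. Note how the “don’t care” shows up as a branch that never looks at R/W#.

```python
def memory_action(select_n, rw_n):
    """Map the logic levels on SELECT# and R/W# (each 0 or 1) to an action."""
    if select_n not in (0, 1) or rw_n not in (0, 1):
        raise ValueError("signal levels must be 0 or 1")
    if select_n == 1:
        # SELECT# is high (not asserted): R/W# is a "don't care".
        return "idle"
    # SELECT# is low (asserted): R/W# chooses the transaction.
    return "write" if rw_n == 0 else "read"


for select_n in (1, 0):
    for rw_n in (0, 1):
        print(select_n, rw_n, memory_action(select_n, rw_n))
```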

Synchronous Control Signals

Here is a depiction of two control signals, each of which is asserted low.

Signal A# is asserted during the second half of clock pulse T1.

Signal B# is asserted during the first half of clock pulse T1 and is shown as asserted for the duration of this diagram.

Signal A# is somewhat unusual in that it is asserted at the same time as a clock transition.

Often signals are asserted before a transition so that the signal is stable at the time of the transition.

Remember the Garden Hose

At the time scales of human experience, propagation of electric signals appears to be instantaneous.

In the world of CPU design, where timings are routinely expressed in tens of picoseconds (10 picoseconds is 1/100 of a nanosecond, or 10^-11 second), times to charge a control line become significant.

Consider a bus with a length of 10 centimeters, about 4 inches.

The speed of light is about 30 centimeters per nanosecond. The speed of signal transmission on a bus is about 20 centimeters per nanosecond.

The signal will take about 0.5 nanoseconds, or 500 picoseconds, to traverse the length of the bus.

The signal on the bus is said to be stable after 500 picoseconds.
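The same estimate in Python, as a sketch. The default of 20 centimeters per nanosecond is the approximate figure quoted above; the function name is hypothetical.

```python
def propagation_delay_ns(length_cm, speed_cm_per_ns=20.0):
    """Time in nanoseconds for a signal to traverse a bus of the given length."""
    if length_cm < 0 or speed_cm_per_ns <= 0:
        raise ValueError("length must be non-negative and speed positive")
    return length_cm / speed_cm_per_ns


delay_ns = propagation_delay_ns(10.0)        # the 10 centimeter bus
print(delay_ns, "ns =", delay_ns * 1000, "ps")
```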

Collections of Signals

We now have notations for control signals.

What about other signals, such as addresses and data?

The interest in representing such non–control signals is not their actual value, but whether or not they are valid.

For example, we do not want to know the actual values on address lines, but whether or not there is a valid address being asserted.

The above notation might be used for a 32–bit address line.

A valid address is asserted on these lines in the middle of the first half of T1. It remains valid until the middle of the first half of T2.

A Synchronous Bus Timing Diagram

This is a bus read. The sequence: the address becomes valid, RD# is asserted, and later the data become valid.

An Asynchronous Bus Timing Diagram

Here, the important point is the interplay of the Master Synchronization (MSYN#) and Slave Synchronization (SSYN#) signals.

The sequence:

1. The address becomes valid; MREQ# and RD# are asserted low.

2. MSYN# is asserted low, causing the memory to react.

3. Data become valid and SSYN# is asserted low.

4. When SSYN# goes high, the data are no longer valid.
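The handshake can be modeled, very loosely, in Python. This sketch records the order in which the active–low signals change; it is not a timing model. The function name async_read and the dictionary used as a memory are illustrative, and the step in which the master copies the data and releases its own signals before the memory releases SSYN# is an assumption about the usual full handshake, since the list above does not spell it out.

```python
def async_read(memory, address):
    """Model one asynchronous bus read; return (data, trace of signal changes)."""
    # Active-low control lines: 1 means deasserted, 0 means asserted.
    signals = {"MREQ#": 1, "RD#": 1, "MSYN#": 1, "SSYN#": 1}
    trace = []

    def drive(name, level):
        signals[name] = level
        trace.append((name, level))

    # Step 1: the address is valid; the master asserts MREQ# and RD#.
    drive("MREQ#", 0)
    drive("RD#", 0)
    # Step 2: the master asserts MSYN#, causing the memory to react.
    drive("MSYN#", 0)
    # Step 3: the memory places data on the bus and asserts SSYN#.
    data_lines = memory[address]
    drive("SSYN#", 0)
    # Assumed step: the master copies the data, then releases its signals.
    latched = data_lines
    drive("MSYN#", 1)
    drive("MREQ#", 1)
    drive("RD#", 1)
    # Step 4: the memory releases SSYN#; the data lines are no longer valid.
    drive("SSYN#", 1)
    data_lines = None
    return latched, trace


data, trace = async_read({0x200: 0x5A}, 0x200)
print(hex(data))
for name, level in trace:
    print(name, level)
```

The point of the model is that no clock appears anywhere: each side acts only when it sees the other side's signal change.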

More on Commercial Memory Modules

Here again is a picture of a commercial memory module.

Here is a picture of the slot into which three memory modules may be placed.

Inserting the DIMM

The next figure shows how to insert the Dual In–Line Memory Module.

The previous memory module had eight memory chips. This module has eight memory chips and a control chip in the middle.

We shall discuss this organization of a module into multiple chips in a later lecture.

Connecting the CPU to the Motherboard

Early CPU chips in the Intel line, such as the 8086 and 8088, were packaged as DIP (dual in–line package) modules. Here is a picture of the Intel 8086.

Later Pentium CPU chips required more pins, hence different packaging.

Cooling the Central Processor Unit

A modern CPU generates a lot of heat. This has given rise to an industry devoted to building cooling devices. Here is a picture of one such device.

Note that the CPU is attached to the top of this radiator device. We do not see the fan.

The Fetch – Execute Cycle

Again we mention the fetch–execute cycle, which is the key feature of a stored program computer.

The instruction is fetched from memory and then executed.

The cycle can be described differently, with more phases, as useful to support discussions, but it is really the same.

Discussions of the MIPS–32 architecture divide the process into five phases:

1. Fetch instruction from memory.

2. Decode the instruction and read two registers.

3. Execute the operation or calculate an address.

4. Access an operand in data memory or write back a result.

5. For LW only, write the results of the memory read into a register.

LW (Load Word) loads a 32–bit register from memory.

This key cycle focuses on the IR (Instruction Register), which is a part of the CU (Control Unit) of the CPU.

Any binary data copied into the IR are automatically interpreted as an instruction.

What Is Executed?

Each executable instruction is represented in memory as a binary number.

Your instructor prefers to use hexadecimal representation of these numbers, as he finds it easier to read. Consider the following sequence.

B8 23 01 05
25 00 8B D8
03 D8 8B CB
2B C8 2B C0
EB EE

This is represented in standard format, with four bytes (eight hexadecimal digits) per line. At this point, we cannot interpret this sequence.

The sequence might represent four floating–point numbers, followed by a 16–bit integer. It might represent a strange sequence of character data.

If this sequence is read into the IR (Instruction Register), but not all at once, it will be interpreted as a sequence of assembly language instructions.

The process of reverse engineering a sequence of absolute binary code into a sequence of assembly language instructions is called disassembly.
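To see that the bytes carry no interpretation of their own, here is a Python sketch that reads the same 18 bytes as four single–precision floating–point numbers followed by a 16–bit integer, and then as character data. The choices of little–endian byte order and the Latin–1 character set are assumptions made only for the illustration.

```python
import struct

raw = bytes.fromhex("B8 23 01 05 25 00 8B D8 03 D8 8B CB 2B C8 2B C0 EB EE")
print(len(raw), "bytes")

# Interpretation 1: four 32-bit floats, then one unsigned 16-bit integer.
# "<" selects little-endian order with no padding (an assumption).
*floats, last = struct.unpack("<4fH", raw)
print(floats)
print(hex(last))

# Interpretation 2: one character per byte (Latin-1 is an assumption).
print(repr(raw.decode("latin-1")))
```

Nothing in memory says which reading is intended. Only the path the bytes take, here into the IR, decides how they are used.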

The Disassembly of the Sample Code

Here is the disassembled object code.

B82301   MOV AX, 0123   ; Move value 0x0123 to AX
052500   ADD AX, 0025   ; Add value 0x0025 to AX
8BD8     MOV BX, AX     ; Copy contents of AX into BX
03D8     ADD BX, AX     ; Add contents of AX to BX
8BCB     MOV CX, BX     ; Copy contents of BX into CX
2BC8     SUB CX, AX     ; Subtract AX from CX
2BC0     SUB AX, AX     ; Subtract AX from AX, clearing it
EBEE     JMP 100        ; Go to address 100

We shall discuss disassembly a bit more in future lectures. Basically, this is the main function of a good debugger.
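Disassembly of this particular sequence can be sketched in a few lines of Python. This is a toy, not a real disassembler: it handles only the six opcodes that occur in the sample, only 16–bit register–to–register forms, and it assumes the code is loaded at address 0x100 (the address that makes the final jump come out as JMP 100). The function and table names are illustrative.

```python
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]

# Opcodes followed by a 16-bit immediate operand, with AX as destination.
IMM_AX = {0xB8: "MOV", 0x05: "ADD"}
# Opcodes of the form "op reg16, r/m16", followed by a ModR/M byte.
REG_RM = {0x8B: "MOV", 0x03: "ADD", 0x2B: "SUB"}


def disassemble(code, origin=0x100):
    """Return one text line per instruction for the small subset above."""
    lines = []
    i = 0
    while i < len(code):
        op = code[i]
        if op in IMM_AX:
            imm = code[i + 1] | (code[i + 2] << 8)    # little-endian immediate
            text = f"{IMM_AX[op]} AX, {imm:04X}"
            size = 3
        elif op in REG_RM:
            modrm = code[i + 1]
            if modrm >> 6 != 0b11:
                raise ValueError("only register-to-register forms are handled")
            dest = REG16[(modrm >> 3) & 7]            # the "reg" field
            src = REG16[modrm & 7]                    # the "r/m" field
            text = f"{REG_RM[op]} {dest}, {src}"
            size = 2
        elif op == 0xEB:                              # short jump, signed offset
            disp = code[i + 1]
            if disp >= 0x80:
                disp -= 0x100
            target = origin + i + 2 + disp            # relative to next instruction
            text = f"JMP {target:X}"
            size = 2
        else:
            raise ValueError(f"unhandled opcode {op:02X}")
        lines.append(f"{code[i:i + size].hex().upper():<8}{text}")
        i += size
    return lines


sample = bytes.fromhex("B8230105 25008BD8 03D88BCB 2BC82BC0 EBEE")
for line in disassemble(sample):
    print(line)
```

Note that the decoder has to know how long each instruction is before it can find the start of the next one. This is why the bytes are read into the IR "not all at once".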

Terminology:

The name “assembly language” refers to the sequence of human readable code as seen in the listing above. Assembly language instructions, called “mnemonics”, are supposed to be suggestive to the programmer.

The term “object code” refers to the sequence of binary numbers, as shown in the left column and on the previous slide. Such code is often represented in hexadecimal form for easier reading.

The Assembly–Link–Execute Cycle

Here is the common sequence for processing assembly language programs. The process starts with a file in the form of readable text.

The assembler produces listing files (to be read by humans) as well as the object file, containing object code, which is almost ready for execution.

The linker connects the object code of the user program with object code from any external procedures used by that code. It produces a file ready for execution.

The loader places the executable file into memory and starts execution.

The process for higher level languages is similar, except for the fact that the source code is said to be compiled. Compilers for IBM mainframe languages tend to emit assembly language that is then assembled. Other compilers emit object code.

The Microsoft Common Language Infrastructure

Microsoft introduced its .NET architecture on February 13, 2002.

One of the goals of this architecture was to make it easier to call programs written in one of Microsoft’s languages from those written in another.

One common option was to call C++ code from Visual Basic code. The VB code would handle the user interface. The C++ code would handle the computations.

In order to make the languages interoperable, certain standards were required.

One change gave rise to the CIL (Common Intermediate Language), originally called MSIL (Microsoft Intermediate Language).

The process of producing an executable program in this architecture is a bit indirect.

1. The compiler converts the source code into a CIL file.

2. The CIL file is converted to a form called bytecode that is called a .NET assembly.

3. The .NET assembly is assembled into an object file and linked to form an executable file. This is then executed.

There are many more features to the CLI (Common Language Infrastructure) than are mentioned in this slide.

The Memory Component

Here are a few introductory remarks on memory. It will be studied in greater detail in two future sets of lectures.

Modern computers tend to have memory that is byte addressable. A byte corresponds to the size required to store the older character codes. It is a grouping of 8 binary bits.

The CPU has two registers dedicated to handling memory.

The MAR (Memory Address Register) holds the address being accessed.

The MBR (Memory Buffer Register) holds the data being written to the memory or being read from the memory. This is sometimes called the Memory Data Register.

Modern memory must allow for addressable entities that have larger sizes. Common lengths are 16, 32, and 64 bits, corresponding to 2, 4, and 8 bytes.

How do we address these larger items? Here, we assume standard memory alignments for multiple byte items, as these yield more efficient programs.

The standard: every byte has its own address.

16–bit (2 byte) words have addresses that are multiples of 2.

32–bit (4 byte) words have addresses that are multiples of 4.

Consider a 64 KB Memory

This is considered only because it allows for very easy illustrations.

64 K = 2^16 = 65,536.

Byte addresses run the range from 0 through 65,535.

16–bit word addresses: 0, 2, 4, …, 65,534.

32–bit word addresses: 0, 4, 8, …, 65,532.

The 16–bit word at address 65,534 contains the bytes at addresses 65,534 and 65,535.

The 32–bit word at address 65,532 contains the bytes at addresses 65,532, 65,533, 65,534, and 65,535.

In general, a 16–bit word at address N comprises two bytes, at address N and address N + 1.

In general, a 32–bit word at address N comprises four bytes, at addresses N, N + 1, N + 2, and N + 3.
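The addressing rules for the 64 KB memory can be expressed as a short Python sketch. The function name word_byte_addresses is hypothetical; it simply enforces the alignment rule and lists the bytes that make up a word.

```python
MEMORY_SIZE = 2 ** 16    # 64 KB of byte-addressable memory


def word_byte_addresses(address, size_bytes):
    """Return the byte addresses occupied by an aligned word at `address`."""
    if address % size_bytes != 0:
        raise ValueError(f"address {address} is not a multiple of {size_bytes}")
    if address < 0 or address + size_bytes > MEMORY_SIZE:
        raise ValueError("word does not fit in memory")
    return list(range(address, address + size_bytes))


print(word_byte_addresses(65534, 2))            # last 16-bit word
print(word_byte_addresses(65532, 4))            # last 32-bit word
print(max(range(0, MEMORY_SIZE, 4)))            # highest 32-bit word address
```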

Which bytes go where? This has given rise to the discussion called “Big–Endian vs. Little–Endian”.

Example: A 32–bit Value

Suppose that the 32–bit register EAX contains the value 0x01020304, which has decimal representation 16,909,060. The value in EAX can be represented as:

This figure uses the bit numbering scheme common for Intel computers.

Suppose the instruction MOV Z, EAX is executed. What is placed into address Z?

This depends on whether the computer is a big–endian or little–endian device.

(The Pentium is a little–endian device, but we shall examine both options.)

The little–endian design stores the “little end” at address Z.

The big–endian design stores the “big end” at address Z.

The 32–Bit Example Continued

The value that goes into each address is a one–byte number, comprising two hexadecimal digits. The hexadecimal digits are not reversed.

Address   Big-Endian   Little-Endian
Z         01           04
Z + 1     02           03
Z + 2     03           02
Z + 3     04           01
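The table can be reproduced with Python's built–in int.to_bytes method, which takes the byte order as an argument. This is only a sketch; the loop and print format are illustrative.

```python
value = 0x01020304

big = value.to_bytes(4, byteorder="big")        # "big end" stored first
little = value.to_bytes(4, byteorder="little")  # "little end" stored first

print("Address   Big-Endian   Little-Endian")
for offset in range(4):
    print(f"Z + {offset}     {big[offset]:02X}           {little[offset]:02X}")
```

Each byte keeps its two hexadecimal digits in the same order; only the order of the bytes changes.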

Here is a graphic representation of the same process.

Example: “Core Dump” at Address 0x200

Consider this memory map.

Address    0x200   0x201   0x202   0x203
Contents   02      04      06      08

What is the 32–bit value stored at address 0x200?

Big–endian: The number is 0x02040608. Its decimal value is
2 × 256^3 + 4 × 256^2 + 6 × 256^1 + 8 × 1 = 33,818,120.

Little–endian: The number is 0x08060402. Its decimal value is
8 × 256^3 + 6 × 256^2 + 4 × 256^1 + 2 × 1 = 134,611,970.

NOTE: Read the bytes backwards, not the hexadecimal digits.

What is the 16–bit value stored at address 0x200?

Big–endian: The number is 0x0204. Its decimal value is
2 × 256 + 4 × 1 = 516.

Little–endian: The number is 0x0402. Its decimal value is
4 × 256 + 2 × 1 = 1,026.

The 16–bit value at 0x200 comprises bytes at addresses 0x200 and 0x201 only. The values at 0x202 and 0x203 are not relevant.
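The readings of this memory map can be checked with Python's built–in int.from_bytes. In this sketch the memory is modeled as a dictionary from address to byte value, and the helper name read_value is hypothetical.

```python
memory = {0x200: 0x02, 0x201: 0x04, 0x202: 0x06, 0x203: 0x08}


def read_value(memory, address, size_bytes, byteorder):
    """Assemble `size_bytes` consecutive bytes into one unsigned integer."""
    raw = bytes(memory[address + i] for i in range(size_bytes))
    return int.from_bytes(raw, byteorder)


for size in (4, 2):
    for order in ("big", "little"):
        value = read_value(memory, 0x200, size, order)
        print(f"{size * 8}-bit {order:<6} {value:#x} = {value:,}")
```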

I/O Ports

Computer memory is best considered as an array of addressable bytes.

I/O ports are addressed similarly, but are best considered as an array of addressable registers, each of fixed width.

Each I/O device is connected to the computer bus through a number of registers called I/O ports. There are three main types used.

Data register: For an input device, this contains data to be sent to the CPU. For an output device, this contains data to be output.

Control register: This allows the CPU to control operation of the I/O device. This register is best considered as a set of Boolean flags, each controlling one option: can the device interrupt the CPU, etc.

Status register: This allows the CPU to detect the status of the device. Is the power on? Is there a paper jam?

The control register is an output register; the CPU writes information to it.

The status register is an input register; the CPU reads status information from it.

Some simple devices combine the two into a Control & Status register.
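The idea of a register as a set of Boolean flags can be sketched in Python with the standard enum.IntFlag class. Every flag name and bit position below is hypothetical, invented for the illustration; a real device defines its own layout.

```python
from enum import IntFlag


class Control(IntFlag):
    """Hypothetical control register bits (written by the CPU)."""
    INTERRUPT_ENABLE = 0x01
    START = 0x02


class Status(IntFlag):
    """Hypothetical status register bits (read by the CPU)."""
    POWER_ON = 0x01
    PAPER_JAM = 0x02
    READY = 0x04


# The CPU builds a control word and would write it to the control port.
control_register = Control.INTERRUPT_ENABLE | Control.START
print(f"{int(control_register):#04x}")

# The CPU reads a raw value from the status port and tests individual flags.
status_register = Status(0b101)
print(Status.POWER_ON in status_register)
print(Status.PAPER_JAM in status_register)
```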
