computer systems & networks i – computer systems roger webb (13dj01) [email protected]

Computer Systems & Computer Systems & NetworksNetworks

I – Computer Systems

Roger Webb (13DJ01)

[email protected]

SyllabusSyllabus

1. Introduction

2. Computer Interconnection Structures

3. Internal & External Memory

4. Input & Output

5. Operating Systems – part 1


7. Computer Arithmetic – part 1


9. Instruction Sets: Characteristics & Function

10. Instruction Sets: Addressing Modes & Formats

11. CPU Structure & Function – part 1


13. Control Unit Operation

Computer Organisation & Architecture – William Stallings

.....................................................................chapter 2

.....................chapter 3

........................................chapter 4

................................................................chapter 6

........................................chapter 7

.....................................chapter 8

........chapter 9

.chapter 10

.........................chapter 11

..............................................chapter 14

IntroductionIntroduction

Early Computing Devices: (8000BC – 1600)

8000 BC Tally Bones/Sticks

History of Computing:

Computer:- a person who performs computations

5000 BC – Sumeria – Abaq (writing in dust)

3000 BC – China - the Abacus

800 BC – Abax arrives in Europe

1594 – John Napier natural logs



Mechanical Computers: (1600-1800)

1615 – Slide Rule – William Oughtred

1617 – Napier’s Bones

1642 – Pascaline - Blaise Pascal

1804 – Punch Card Loom

– Jacquard


History of Computing:Electro-Mechanical Computers:

(1800-1890)

1833 – Difference Engine – Babbage

- to make accurate trig and log tables

- never actually finished – “of no use to science”1842 – Analytic Engine (programmable

version)

(1991 Science Museum spent £400,000 to build working model of simplified Difference Engine – 3tons, 6’ high and 10’ long)

1840 – Lady Ada Lovelace – 1st programmer

- suggested that Babbage’s engine could be programmed using punch cards



Mechanical Computers: (1890-1920)

1890 – Holerith’s Tabulating Machine - punch card data for sorting census data

1906 – Lee De Forest invents vacuum tube (triode)1911 – Holerith founds Tabulating Machines Company

1924 – Renamed International Business Machines Corporation (IBM) by Thomas J. Watson



Electro-Mechanical Computers: (1920-1945)

1943 – Colossus (Bletchley Park)

1920 – Punch Card Technology

1944 – Mark I (Harvard)

similar to Babbage engine.IBM: “electro-mechanical computers will never

replace punched card-equipment”

IntroductionIntroductionHistory of Computing:

Electronic Computers: (1940-1950)1942- First Electronic Computer - ABC

uses base-2 number system, memory and logic circuits built from vacuum tubes

IBM: “we will never be interested in an electronic computing machine”

1945: “I think there is a world market for maybe 5 computers” Chairman IBM

1946 – ENIAC (Electronic Numerical Integrator &

Computer)to compute trajectory tables for the army (only 20%of bombs got within 1000ft)ENIAC’s 18,000 vacuum tubes dimmed the lightsof Philadelphia when switched onUsed in building first atomic bomb tests at Los AlamosCalculated pi to 2000 places in seventy hoursOperated in decimal

ENIAC - detailsENIAC - details

• Decimal (not binary)• 20 accumulators of 10 digits• Programmed manually by switches• 18,000 vacuum tubes• 30 tons• 15,000 square feet• 140 kW power consumption• 5,000 additions per second

von Neumann/Turingvon Neumann/Turing

• Stored Program concept• Main memory storing programs and data• ALU operating on binary data• Control unit interpreting instructions from memory

and executing• Input and output equipment operated by control unit• Princeton Institute for Advanced Studies - IAS• Completed 1952

MainMemory

Arithmetic and Logic Unit

Program Control Unit

InputOutputEquipment


History of Computing:Electronic Computers: (1940-1950)

1947- First Computer “Bug”moth found in Mark II relay

causing malfunction – hence “debugging”

1947 – TransistorSchockley, Bardeen & Brattain

paves the way for smaller, low power

and more stable systems to be built using same designs but replacing valve technology with transistor technology

IAS ComputerIAS ComputerIAS – Institute of Advanced Study (Princeton) 1952Basic Von Neuman Architecture:

• 1000 x 40 bit words– Binary number– 2 x 20 bit instructions

• Set of registers (storage in CPU)– Memory Buffer Register– Memory Address Register– Instruction Register– Instruction Buffer Register– Program Counter– Accumulator– Multiplier Quotient

Structure of IAS - detailStructure of IAS - detail

MainMemory




MBR

Arithmetic & Logic Circuits

MQAccumulator

MAR

ControlCircuits

IBR

IR

PC

Address

Instructions& Data

Central Processing Unit


History of Computing:Electronic Computers: (1950-70)1954 - First commercial silicon transistor

Texas Instruments

1958- First Integrated CircuitJack Kilby from Texas Instruments

made simple oscillator

1971 – 1st Microprocessor -IntelIntel (1968) make 4004 processor to do job of 12 chips

1972 – Pioneer 10 uses 4004 chipsintel 4004 is first uprocessor to asteroid belt...



1st Generation Computers: (1950-60)

Using Valve Technology

1951 - First Commercial ComputerUNIVAC–1

correctly forecast US election in 1951 after

only 5% of vote was in

1953- IBM 7011st attempt from IBM

1954 – IBM 6502nd attempt made as upgrade of punch card machines estimated total market as 50 – sold 1000...



2nd Generation Computers: (1960-65)

Using transistor technology1958 – Univac – NTDS32kwords memory, 10,702 transistors, $500,000, 25kw

1960 – PDP 14kwords memory – CRT display

first two player computer game – spacewar (1962)

1963 – PDP 8$18,000 by 1971 25 companies making mini-comp



3rd Generation Computers: (1965-70)

Using Integrated Circuits1964 – IBM System 360

Upward compatibility - no need to redevelop when upgrading

1966 – Funding for ARPA netThe beginnings of the internet

1967 – First Handheld Electronic Calculator



4th Generation Computers: (1971- )

Using Large Scale Integrated Circuits

80486

Pentium 4

1971 – LSI ICs several thousand transistors on a chip

1971 – Intel invent Microprocessor

1980’s– VLSI ICsseveral tens of thousands of transistors


• Multi Processor Systems – the way ahead• Fastest machine on the planet – as of November

2002 – the Earth Simulator in Japan

Generations of ComputerGenerations of Computer

• Vacuum tube - 1946-1957• Transistor - 1958-1964• Small scale integration - 1965 on

– Up to 100 devices on a chip

• Medium scale integration - to 1971– 100-3,000 devices on a chip

• Large scale integration - 1971-1977– 3,000 - 100,000 devices on a chip

• Very large scale integration - 1978 to date– 100,000 - 100,000,000 devices on a chip

• Ultra large scale integration– Over 100,000,000 devices on a chip

Moore’s LawMoore’s Law

• Increased density of components on chip• Gordon Moore – co-founder of Intel• Number of transistors on a chip will double every year• Since 1970’s development has slowed a little

– Number of transistors doubles every 18 months• Cost of a chip has remained almost unchanged• Higher packing density means shorter electrical paths,

giving higher performance• Smaller size gives increased flexibility• Reduced power and cooling requirements• Fewer interconnections increases reliability

Growth in CPU Transistor Growth in CPU Transistor CountCount

Pentium 4

Itanium 2

Semiconductor MemorySemiconductor Memory• 1970• Fairchild produce SRAM

– Size of a single core• i.e. 1 bit of magnetic core storage

– Holds 256 bits– Non-destructive read – 6 transistors– Much faster than core

• Intel produce DRAM– Cheaper but slower – 1 transistor

• Capacity approximately doubles each year

DRAM

Core

Memory

Speeding it upSpeeding it up

• Pipelining• On board cache• On board L1 & L2 cache• Branch prediction• Data flow analysis• Speculative execution

Performance MismatchPerformance Mismatch

• Processor speed increased• Memory capacity increased• Memory speed lags behind processor speed

Trends in DRAM useTrends in DRAM use

SolutionsSolutions

• Increase number of bits retrieved at one time– Make DRAM “wider” rather than “deeper”

• Change DRAM interface– Cache

• Reduce frequency of memory access– More complex cache and cache on chip

• Increase interconnection bandwidth– High speed buses

– Hierarchy of buses

Internet ResourcesInternet Resources

• http://www.intel.com/ – Search for the Intel Museum

• http://www.ibm.com• http://www.dec.com• http://www.digidome.nl• http://ed-thelen.org/comp-hist/• http://www.computerhistory.org/• http://www.cs.man.ac.uk/CCS

SyllabusSyllabus

1. Introduction



4. Input & Output











.....................................................................chapter 2

.....................chapter 3

........................................chapter 4

................................................................chapter 6

........................................chapter 7

.....................................chapter 8

........chapter 9

.chapter 10

.........................chapter 11

..............................................chapter 14

von Neumann/Turingvon Neumann/Turing

• Stored Program concept• Main memory storing programs and data• ALU operating on binary data• Control unit interpreting instructions from memory

and executing• Input and output equipment operated by control unit• Princeton Institute for Advanced Studies - IAS• Completed 1952

MainMemory




Program ConceptProgram Concept• Hardwired systems are inflexible• General purpose hardware can do different tasks, given correct

control signals• Instead of re-wiring, supply a new set of control signals

What is a program?What is a program?• A sequence of steps• For each step, an arithmetic or logical operation is done• For each operation, a different set of control signals is needed

Function of Control UnitFunction of Control Unit• For each operation a unique code is provided

– e.g. ADD, MOVE

• A hardware segment accepts the code and issues the control signals

• We have a computer!

ComponentsComponents

• The Control Unit and the Arithmetic and Logic Unit constitute the Central Processing Unit

• Data and instructions need to get into the system and results out– Input/output

• Temporary storage of code and results is needed– Main memory

Top Level View

Instruction CycleInstruction Cycle

• Two steps:– Fetch

– Execute

Fetch CycleFetch Cycle

• Program Counter (PC) holds address of next instruction to fetch

• Processor fetches instruction from memory location pointed to by PC

MainMemory



InputOutput

MBR


MQAccumulator

MAR

ControlCircuits

IBR

IR

PC

Address

Instructions& Data

Central Processing Unit• Increment PC

– Unless told otherwise

• Instruction loaded into Instruction Register (IR)

• Processor interprets instruction and performs required actions

Execute CycleExecute Cycle

• Processor-memory– data transfer between CPU and main memory

• Processor I/O– Data transfer between CPU and I/O module

MainMemory



InputOutput

MBR


MQAccumulator

MAR

ControlCircuits

IBR

IR

PC

Address

Instructions& Data

Central Processing Unit• Data processing– Some arithmetic or

logical operation on data

• Control– Alteration of

sequence of operations

– e.g. jump

• Combination of above

Example of Program Example of Program ExecutionExecution

ConnectingConnecting

• All the units must be connected• Different type of connection for different type of unit

– Memory

– Input/Output

– CPU

Memory ConnectionMemory Connection

• Receives and sends data• Receives addresses (of locations)• Receives control signals

– Read

– Write

– Timing

Input/Output ConnectionInput/Output Connection

• Similar to memory from computer’s viewpoint• Output

– Receive data from computer

– Send data to peripheral

• Input– Receive data from peripheral

– Send data to computer

• Receive control signals from computer• Send control signals to peripherals

– e.g. spin disk

• Receive addresses from computer– e.g. port number to identify peripheral

• Send interrupt signals (control)

CPU ConnectionCPU Connection

• Reads instruction and data• Writes out data (after processing)• Sends control signals to other units• Receives (& acts on) interrupts

BusesBuses

• There are a number of possible interconnection systems• Single and multiple BUS structures are most common

• e.g. Control/Address/Data bus (PC)• e.g. Unibus (DEC-PDP)

Data BusData Bus• Carries data

– Remember that there is no difference between “data” and “instruction” at this level

• “Width” is a key determinant of performance– 8, 16, 32, 64 bit – number of parallel lines

Address busAddress bus• Identify the source or destination of data

• e.g. CPU needs to read an instruction (data) from a given location in memory

• Bus width determines maximum memory capacity of system– e.g. 8080 has 16 bit address bus giving 64k address space

Control BusControl Bus• Control and timing information

– Memory read/write signal– Interrupt request– Clock signals

Bus Interconnection Bus Interconnection SchemeScheme

Big, Red & Double Decker?Big, Red & Double Decker?

• What do buses look like?– Parallel lines on circuit boards

– Ribbon cables

– Strip connectors on mother boards• e.g. PCI

– Sets of wires

Single Bus ProblemsSingle Bus Problems

• Lots of devices on one bus leads to:– Propagation delays

• Long data paths mean that co-ordination of bus use can adversely affect performance

• If aggregate data transfer approaches bus capacity

• Most systems use multiple buses to overcome these problems

Traditional Bus StrtuctureTraditional Bus Strtucture

High Performance BusHigh Performance Bus

Bus TypesBus Types

• Dedicated– Separate data & address lines

• Multiplexed– Shared lines

– Address valid or data valid control line

– Advantage - fewer lines

– Disadvantages• More complex control• Ultimate performance

Bus ArbitrationBus Arbitration

• More than one module controlling the bus• e.g. CPU and DMA controller• Only one module may control bus at one time• Arbitration may be centralised or distributed

Centralised ArbitrationCentralised Arbitration

• Single hardware device controlling bus access– Bus Controller or Arbiter

• May be part of CPU or separate

Distributed ArbitrationDistributed Arbitration

• Each module may claim the bus• Control logic on all modules

TimingTiming

• Co-ordination of events on bus• Synchronous

– Events determined by clock signals

– Control Bus includes clock line

– A single 1-0 is a bus cycle

– All devices can read clock line

– Usually sync on leading edge

– Usually a single cycle for an event

Synchronous Timing Synchronous Timing DiagramDiagram

Asynchronous Timing Asynchronous Timing DiagramDiagram

SyllabusSyllabus

1. Introduction



4. Input & Output











.....................................................................chapter 2

.....................chapter 3

........................................chapter 4

................................................................chapter 6

........................................chapter 7

.....................................chapter 8

........chapter 9

.chapter 10

.........................chapter 11

..............................................chapter 14

Memory CharacteristicsMemory Characteristics

• Location• Capacity• Unit of transfer• Access method• Performance• Physical type• Physical characteristics• Organisation

CapacityCapacity• Word size– The natural unit of organisation

• Number of words– or Bytes

LocationLocation

• CPU – Internal - External

Unit of TransferUnit of Transfer• Internal

– Usually governed by data bus width

• External– Usually a block which is much larger than a word

• Addressable unit– Smallest location which can be uniquely addressed– Word internally

Access MethodsAccess Methods

• Sequential– Start at the beginning and read through in order– Access time depends on location of data and previous location– e.g. tape

• Direct– Individual blocks have unique address– Access is by jumping to vicinity plus sequential search– Access time depends on location and previous location– e.g. disk

• Random– Individual addresses identify locations exactly– Access time is independent of location or previous access– e.g. RAM

• Associative– Data is located by a comparison with contents of a portion of the store– Access time is independent of location or previous access– e.g. cache

Memory HierarchyMemory Hierarchy

• Registers– In CPU

• Internal or Main memory– May include one or more levels of cache

– “RAM”

• External memory– Backing store

PerformancePerformance

• Access time– Time between presenting the address and getting the

valid data

• Memory Cycle time– Time may be required for the memory to “recover” before

next access

– Cycle time is access + recovery

• Transfer Rate– Rate at which data can be moved

Physical TypesPhysical Types

• Semiconductor– RAM

• Magnetic– Disk & Tape

• Optical– CD & DVD

• Others– Bubble

– Hologram

Physical CharacteristicsPhysical Characteristics• Decay• Volatility• Erasable• Power consumption

OrganisationOrganisation• Physical arrangement of bits into words• Not always obvious• e.g. interleaved

The Bottom LineThe Bottom Line• How much?

– Capacity• How fast?

– Time is money• How expensive?

Hierarchy ListHierarchy List

• Registers• L1 Cache• L2 Cache• Main memory• Disk cache• Disk• Optical• Tape

CPU

Semiconductor MemorySemiconductor Memory

• RAM – Misnamed as all semiconductor memory is random

access

– Read/Write

– Volatile

– Temporary storage

– Static or dynamic

Dynamic RAMDynamic RAM

• Bits stored as charge in capacitors• Charges leak• Need refreshing even when powered• Simpler construction• Smaller per bit• Less expensive• Need refresh circuits• Slower• Main memory

Static RAMStatic RAM

• Bits stored as on/off switches• No charges to leak• No refreshing needed when powered• More complex construction• Larger per bit• More expensive• Does not need refresh circuits• Faster• Cache

Read Only Memory (ROM)Read Only Memory (ROM)

• Permanent storage• Microprogramming (see later)• Library subroutines• Systems programs (BIOS)• Function tables

Types of ROMTypes of ROM

• Written during manufacture– Very expensive for small runs

• Programmable (once)– PROM

– Needs special equipment to program

• Read “mostly”– Erasable Programmable (EPROM)

• Erased by UV

– Electrically Erasable (EEPROM)• Takes much longer to write than read

– Flash memory• Erase whole memory electrically

Organisation in detailOrganisation in detail

• A 16Mbit chip can be organised as 1M of 16 bit words

• A bit per chip system has 16 lots of 1Mbit chip with bit 1 of each word in chip 1 and so on

• A 16Mbit chip can be organised as a 2048 x 2048 x 4bit array– Reduces number of address pins

• Multiplex row address and column address• 11 pins to address (211=2048)• Adding one more pin doubles range of values so x4

capacity

RefreshingRefreshing

• Refresh circuit included on chip• Disable chip• Count through rows• Read & Write back• Takes time• Slows down apparent performance

Typical 16 Mb DRAM (4M x 4)Typical 16 Mb DRAM (4M x 4)

Module Module OrganisationOrganisation

1 bit per chip per word

Module Organisation (2)Module Organisation (2)Larger Memory Size

CacheCache

• Small amount of fast memory• Sits between normal main memory and CPU• May be located on CPU chip or module

So you want fast?So you want fast?

• It is possible to build a computer which uses only static RAM

• This would be very fast• This would need no cache

– How can you cache cache?

• This would cost a very large amount

Cache operation - overviewCache operation - overview

• CPU requests contents of memory location• Check cache for this data• If present, get from cache (fast)• If not present, read required block from main memory

to cache• Then deliver from cache to CPU• Cache includes tags to identify which block of main

memory is in each cache slot

Cache DesignCache Design

• Size• Mapping Function• Replacement Algorithm• Write Policy• Block Size• Number of Caches

Size does matterSize does matter

• Cost– More cache is expensive

• Speed– More cache is faster (up to a point)

– Checking cache for data takes time

Locality of ReferenceLocality of Reference

• During the course of the execution of a program, memory references tend to cluster• e.g. loops

Newer RAM Technology (1)Newer RAM Technology (1)

• Basic DRAM same since first RAM chips• Enhanced DRAM

– Contains small SRAM as well

– SRAM holds last line read (c.f. Cache!)

• Cache DRAM– Larger SRAM component

– Use as cache or serial buffer

Newer RAM Technology (2)Newer RAM Technology (2)

• Synchronous DRAM (SDRAM)– currently on DIMMs

– Access is synchronized with an external clock

– Address is presented to RAM

– RAM finds data (CPU waits in conventional DRAM)

– Since SDRAM moves data in time with system clock, CPU knows when data will be ready

– CPU does not have to wait, it can do something else

– Burst mode allows SDRAM to set up stream of data and fire it out in block

SDRAMSDRAM

SyllabusSyllabus

1. Introduction



4. Input & Output











.....................................................................chapter 2

.....................chapter 3

........................................chapter 4

................................................................chapter 6

........................................chapter 7

.....................................chapter 8

........chapter 9

.chapter 10

.........................chapter 11

..............................................chapter 14

Input/Output ProblemsInput/Output Problems

• Wide variety of peripherals– Delivering different amounts of data

– At different speeds

– In different formats

• All slower than CPU and RAM• Need I/O modules

– Interface to CPU and Memory

– Interface to one or more peripherals

I/O Module FunctionI/O Module Function

• Control & Timing• CPU Communication• Device Communication• Data Buffering• Error Detection

I/O StepsI/O Steps

• CPU checks I/O module device status• I/O module returns status• If ready, CPU requests data transfer• I/O module gets data from device• I/O module transfers data to CPU• Variations for output, DMA, etc.

I/O Module

Links to Peripheral Devices

Control bus

Data bus

Address bus

I/O Module DecisionsI/O Module Decisions

• Hide or reveal device properties to CPU• Support multiple or single device• Control device functions or leave for CPU• Also O/S decisions

– e.g. Unix treats everything it can as a file

Input Output TechniquesInput Output Techniques

• Programmed• Interrupt driven• Direct Memory Access (DMA)

Programmed I/OProgrammed I/O• CPU has direct control over I/O

– Sensing status

– Read/write commands

– Transferring data

• CPU waits for I/O module to complete operation– Wastes CPU time

Programmed I/O - detailProgrammed I/O - detail

• CPU requests I/O operation• I/O module performs operation• I/O module sets status bits• CPU checks status bits periodically• I/O module does not inform CPU directly• I/O module does not interrupt CPU• CPU may wait or come back later

I/O CommandsI/O Commands

• CPU issues address– Identifies module (& device if >1 per module)

• CPU issues command– Control - telling module what to do

• e.g. spin up disk

– Test - check status• e.g. power? Error?

– Read/Write• Module transfers data via buffer from/to device

Addressing I/O DevicesAddressing I/O Devices• Under programmed I/O data transfer is very like memory

access (CPU viewpoint)• Each device given unique identifier• CPU commands contain identifier (address)

I/O MappingI/O Mapping

• Memory mapped I/O– Devices and memory share an address space– I/O looks just like memory read/write– No special commands for I/O

• Large selection of memory access commands available

• Isolated I/O– Separate address spaces– Need I/O or memory select lines– Special commands for I/O

• Limited set

Interrupt Driven I/OInterrupt Driven I/O

• Overcomes CPU waiting• No repeated CPU checking of device• I/O module interrupts when ready

Interrupt Driven I/O - Basic OperationInterrupt Driven I/O - Basic Operation

• CPU issues read command• I/O module gets data from peripheral whilst CPU

does other work• I/O module interrupts CPU• CPU requests data• I/O module transfers data

CPU ViewpointCPU Viewpoint

• Issue read command• Do other work• Check for interrupt at end of each instruction cycle• If interrupted:-

– Save context (registers)

– Process interrupt• Fetch data & store

• See Operating Systems notes

Design IssuesDesign Issues

• How do you identify the module issuing the interrupt?• How do you deal with multiple interrupts?

– i.e. an interrupt handler being interrupted

Identifying Interrupting ModuleIdentifying Interrupting Module• Different line for each module

– Limits number of devices• Software poll

– CPU asks each module in turn - Slow• Daisy Chain or Hardware poll

– Interrupt Acknowledge sent down a chain– Module responsible places vector on bus– CPU uses vector to identify handler routine

• Bus Master– Module must claim the bus before it can raise interrupt– e.g. PCI & SCSI

Multiple InterruptsMultiple Interrupts

• Each interrupt line has a priority• Higher priority lines can interrupt lower priority lines• If bus mastering only current master can interrupt

Direct Memory AccessDirect Memory Access

• Interrupt driven and programmed I/O require active CPU intervention– Transfer rate is limited

– CPU is tied up

• DMA is the answer

DMA FunctionDMA Function

• Additional Module (hardware) on bus• DMA controller takes over from CPU for I/O

DMA OperationDMA Operation

• CPU tells DMA controller:-– Read/Write

– Device address

– Starting address of memory block for data

– Amount of data to be transferred

• CPU carries on with other work• DMA controller deals with transfer• DMA controller sends interrupt when finished

DMA Transfer - Cycle StealingDMA Transfer - Cycle Stealing

• DMA controller takes over bus for a cycle• Transfer of one word of data• Not an interrupt

– CPU does not switch context

• CPU suspended just before it accesses bus– i.e. before an operand or data fetch or a data write

• Slows down CPU but not as much as CPU doing transfer

DMA Configurations (1)DMA Configurations (1)

• Single Bus, Detached DMA controller• Each transfer uses bus twice

– I/O to DMA then DMA to memory

• CPU is suspended twice

CPUDMAController

I/ODevice

I/ODevice

Main Memory

DMA Configurations (2)DMA Configurations (2)

• Single Bus, Integrated DMA controller• Controller may support >1 device• Each transfer uses bus once

– DMA to memory

• CPU is suspended once

CPUDMAController

I/ODevice

I/ODevice

Main Memory

DMAController

I/ODevice

SyllabusSyllabus

1. Introduction



4. Input & Output











.....................................................................chapter 2

.....................chapter 3

........................................chapter 4

................................................................chapter 6

........................................chapter 7

.....................................chapter 8

........chapter 9

.chapter 10

.........................chapter 11

..............................................chapter 14

O/S Objectives and FunctionsO/S Objectives and Functions

• Convenience– Making the computer easier to use

• Efficiency– Allowing better use of computer resources

LayersLayers

Operating System ServicesOperating System Services

• Program creation• Program execution• Access to I/O devices• Controlled access to files• System access• Error detection and response• Accounting

Types of Operating SystemTypes of Operating System

• Interactive• Batch• Single program (Uni-programming)• Multi-programming (Multi-tasking)

Early SystemsEarly Systems

• Late 1940s to mid 1950s– No Operating System

– Programs interact directly with hardware

– Two main problems:– Scheduling– Setup time

Simple Batch SystemsSimple Batch Systems

• Resident Monitor program• Users submit jobs to operator• Operator batches jobs• Monitor controls sequence of events to process batch• When one job is finished, control returns to Monitor

which reads next job• Monitor handles scheduling

Desirable Hardware FeaturesDesirable Hardware Features

• Memory protection– To protect the Monitor (not screen saver!)

• Timer– To prevent a job monopolizing the system

• Privileged instructions– Only executed by Monitor

– e.g. I/O

• Interrupts– Allows for relinquishing and regaining control

Multi-programmed Batch SystemsMulti-programmed Batch Systems

• I/O devices very slow• When one program is waiting for I/O, another can use

the CPU

Single ProgramSingle Program

Multi-Programming with 2 ProgramsMulti-Programming with 2 Programs

Multi-Programming with 3 ProgramsMulti-Programming with 3 Programs

Time Sharing SystemsTime Sharing Systems

• Allow users to interact directly with the computer– i.e. Interactive

• Multi-programming allows a number of users to interact with the computer

Process StatesProcess States

Process Control BlockProcess Control Block

• Identifier PID• State Running, Ready, Blocked, New, Exit • Priority High, Medium, Low, etc….• Program Counter Where were we?• Memory pointersPreserve our calculations• Context data What was in the registers• I/O status Reading, Writing, Null• Accounting information How Much Used Already

Memory ManagementMemory Management

• Uni-program– Memory split into two

– One for Operating System (monitor)

– One for currently executing program

• Multi-program– “User” part is sub-divided and shared among active

processes

SwappingSwapping

• Problem: I/O is so slow compared with CPU that even in multi-programming system, CPU can be idle most of the time

• Solutions:– Increase main memory

• Expensive• Leads to larger programs

– Swapping

What is Swapping?What is Swapping?

• Long term queue of processes stored on disk• Processes “swapped” in as space becomes available• As a process completes it is moved out of main

memory• If none of the processes in memory are ready (i.e. all

I/O blocked)– Swap out a blocked process to intermediate queue

– Swap in a ready process or a new process

– But swapping is an I/O process...

PartitioningPartitioning

• Splitting memory into sections to allocate to processes (including Operating System)

• Fixed-sized partitions– May not be equal size

– Process is fitted into smallest hole that will take it (best fit)

– Some wasted memory

– Leads to variable sized partitions

Variable Sized Partitions (1)Variable Sized Partitions (1)

• Allocate exactly the required memory to a process• This leads to a hole at the end of memory, too small

to use– Only one small hole - less waste

• When all processes are blocked, swap out a process and bring in another

• New process may be smaller than swapped out process

• Another hole

Variable Sized Partitions (2)Variable Sized Partitions (2)

• Eventually have lots of holes (fragmentation)• Solutions:

– Coalesce - Join adjacent holes into one large hole

– Compaction - From time to time go through memory and move all hole into one free block (c.f. disk de-fragmentation)

Effect of Dynamic PartitioningEffect of Dynamic Partitioning

RelocationRelocation

• No guarantee that process will load into the same place in memory

• Instructions contain addresses– Locations of data

– Addresses for instructions (branching)

• Logical address - relative to beginning of program• Physical address - actual location in memory (this

time)• Automatic conversion using base address

PagingPaging

• Split memory into equal sized, small chunks -page frames

• Split programs (processes) into equal sized small chunks - pages

• Allocate the required number page frames to a process

• Operating System maintains list of free frames• A process does not require contiguous page frames• Use page table to keep track

Logical and Physical Logical and Physical Addresses - PagingAddresses - Paging

Virtual MemoryVirtual Memory

• Demand paging– Do not require all pages of a process in memory

– Bring in pages as required

• Page fault– Required page is not in memory

– Operating System must swap in required page

– May need to swap out a page to make space

– Select page to throw out based on recent history

ThrashingThrashing

• Too many processes in too little memory• Operating System spends all its time swapping• Little or no real work is done• Disk light is on all the time

• Solutions– Good page replacement algorithms

– Reduce number of processes running

– Fit more memory

BonusBonus

• We do not need all of a process in memory for it to run• We can swap in pages as required• So - we can now run processes that are bigger than total

memory available!

• Main memory is called real memory• User/programmer sees much bigger memory - virtual

memory

Page Table StructurePage Table Structure

SegmentationSegmentation

• Paging is not (usually) visible to the programmer• Segmentation is visible to the programmer• Usually different segments allocated to program and

data• May be a number of program and data segments

Advantages of SegmentationAdvantages of Segmentation

• Simplifies handling of growing data structures• Allows programs to be altered and recompiled

independently, without re-linking and re-loading• Lends itself to sharing among processes• Lends itself to protection• Some systems combine segmentation with paging

SyllabusSyllabus

1. Introduction



4. Input & Output











.....................................................................chapter 2

.....................chapter 3

........................................chapter 4

................................................................chapter 6

........................................chapter 7

.....................................chapter 8

........chapter 9

.chapter 10

.........................chapter 11

..............................................chapter 14

Arithmetic & Logic UnitArithmetic & Logic Unit

• Does the calculations• Everything else in the computer is there to service

this unit• Handles integers• May handle floating point (real) numbers• May be separate FPU (maths co-processor)• May be on chip separate FPU (486DX +)

Integer RepresentationInteger Representation

• Only have 0 & 1 to represent everything• Positive numbers stored in binary

– e.g. 41=00101001

• No minus sign• No decimal point• Instead we can use either:

– Sign-Magnitude

– Two’s compliment

Sign-MagnitudeSign-Magnitude

• Left most bit is sign bit• 0 means positive• 1 means negative• +18 = 00010010• -18 = 10010010• Problems

– Need to consider both sign and magnitude in arithmetic

– Two representations of zero (+0 and -0)

Two’s ComplimentTwo’s Compliment

• +3 = 00000011• +2 = 00000010• +1 = 00000001• +0 = 00000000• -1 = 11111111• -2 = 11111110• -3 = 11111101

BenefitsBenefits

• One representation of zero• Arithmetic works easily (see later)• Negating is fairly easy

– 3 = 00000011

– Boolean complement gives 11111100

– Add 1 to LSB 11111101

Geometric Depiction of Twos Geometric Depiction of Twos Complement IntegersComplement Integers

Negation Special Case 1Negation Special Case 1

• 0 = 00000000• Bitwise not 11111111• Add 1 to LSB +1• Result 1 00000000• Overflow is ignored, so:• - 0 = 0

Negation Special Case 2Negation Special Case 2

• -128 = 10000000• bitwise not 01111111• Add 1 to LSB +1• Result 10000000• So:• -(-128) = -128 X• Monitor MSB (sign bit)• It should change during negation• In this case 128 is too big to be represented

Range of NumbersRange of Numbers

• 8 bit 2s compliment– +127 = 01111111 = 27 -1

– -128 = 10000000 = -27

• 16 bit 2s compliment– +32767 = 011111111 11111111 = 215 - 1

– -32768 = 100000000 00000000 = -215

Conversion Between Conversion Between LengthsLengths

• Positive number pack with leading zeros• +18 = 00010010• +18 = 00000000 00010010• Negative numbers pack with leading ones• -18 = 10010010• -18 = 11111111 10010010• i.e. pack with MSB (sign bit)

Addition and SubtractionAddition and Subtraction

• Normal binary addition• Monitor sign bit for overflow

• Take twos compliment of subtrahend and add to minuend– i.e. a - b = a + (-b)

• So we only need addition and complement circuits

Hardware for Addition Hardware for Addition and Subtractionand Subtraction

MultiplicationMultiplication

• Complex• Work out partial product for each digit• Take care with place value (column)• Add partial products

Multiplication ExampleMultiplication Example

1011 Multiplicand (11 dec)

x 1101 Multiplier (13 dec)

1011 Partial products

0000 Note: if multiplier bit is 1 copy

1011 multiplicand (place value)

1011 otherwise zero

10001111 Product (143 dec)

Note: need double length result

Unsigned Binary MultiplicationUnsigned Binary Multiplication

1 0 1 1

1 1 0 10 0 0 0

If this digit is a one then

add multiplicand

Shift A and Q right

Initial Values


1 0 1 1

1 1 0 10 0 0 0


add multiplicand

Shift A and Q right

First Cycle

1 0 1 1


1 0 1 1

1 1 0 11 0 1 1


add multiplicand

Shift A and Q right

First Cycle

1 0 1 1


1 0 1 1

1 1 1 00 1 0 1


add multiplicand

Shift A and Q right

First Cycle

1 0 1 1


1 0 1 1

1 1 1 00 1 0 1


add multiplicand

Shift A and Q right

Second Cycle

0 0 0 0


1 0 1 1

1 1 1 10 0 1 0


add multiplicand

Shift A and Q right

Second Cycle

0 0 0 0


1 0 1 1

1 1 1 10 0 1 0


add multiplicand

Shift A and Q right

Third Cycle

1 0 1 1


1 0 1 1

1 1 1 11 1 0 1


add multiplicand

Shift A and Q right

Third Cycle

1 0 1 1


1 0 1 1

1 1 1 10 1 1 0


add multiplicand

Shift A and Q right

Third Cycle

1 0 1 1


1 0 1 1

1 1 1 10 1 1 0


add multiplicand

Shift A and Q right

Fourth Cycle

1 0 1 1


1 0 1 1

1 1 1 10 0 0 1


add multiplicand

Shift A and Q right

Fourth Cycle

1 0 1 1

1


1 0 1 1

1 1 1 1


add multiplicand

Shift A and Q right

Fourth Cycle

1 0 1 1

1 0 0 0

Flowchart for Unsigned Flowchart for Unsigned Binary MultiplicationBinary Multiplication

Multiplying Negative Multiplying Negative NumbersNumbers

• This does not work!• Solution 1

– Convert to positive if required

– Multiply as above

– If signs were different, negate answer

• Solution 2– Booth’s algorithm

Example of Booth’s AlgorithmExample of Booth’s Algorithm•Multiplier and multiplicand placed in Q and M•Additional 1 bit register Q-1

•Results appear in A and Q•A and Q initialised to 0•Q0 and Q-1 examined

•If both same (0-0 or 1-1)Shift A and Q right

If differentIf 0-1 then Add MIf 1-0 then Subtract MShift A and Q right

•Shift preserves the sign

An-1 An-2 and remains in An-1

DivisionDivision

• More complex than multiplication• Negative numbers are really bad!• Based on long division

001111

1011

00001101

100100111011001110

1011

1011100

Quotient

Dividend

Remainder

PartialRemainders

Divisor

Real NumbersReal Numbers

• Numbers with fractions• Could be done in pure binary

– 1001.1010 = 24 + 20 +2-1 + 2-3 =9.625

• Where is the binary point?• Fixed?

– Very limited

• Moving?– How do you show where it is?

Floating PointFloating Point

• +/- .significand x 2exponent

• Misnomer – not really a floating point• Point is actually fixed between sign bit and body of

mantissa• Exponent indicates place value (point position)

Sig

n bi

t

BiasedExponent

Significand or Mantissa

Floating Point ExamplesFloating Point Examples

• Mantissa is stored in 2s compliment• Exponent is in excess or biased notation

– e.g. Excess (bias) 128 means

– 8 bit exponent field

– Pure value range 0-255

– Subtract 128 to get correct value

– Range -128 to +127

FP RangesFP Ranges

• For a 32 bit number– 8 bit exponent

– +/- 2256 1.5 x 1077

• Accuracy– The effect of changing lsb of mantissa

– 23 bit mantissa 2-23 1.2 x 10-7

– About 6 decimal places

Expressible NumbersExpressible Numbers

IEEE 754IEEE 754

• Standard for floating point storage• 32 and 64 bit standards• 8 and 11 bit exponent respectively• Extended formats (both mantissa and exponent) for

intermediate results

FP Arithmetic +/-FP Arithmetic +/-

• Check for zeros• Align significands (adjusting exponents)• Add or subtract significands• Normalize

FP Arithmetic x/FP Arithmetic x/

• Check for zero• Add/subtract exponents • Multiply/divide significands (watch sign)• Normalize• Round• All intermediate results should be in double length

storage

FloatingFloatingPointPoint

MultiplicationMultiplication

FloatingFloatingPointPoint

DivisionDivision

SyllabusSyllabus

1. Introduction



4. Input & Output











.....................................................................chapter 2

.....................chapter 3

........................................chapter 4

................................................................chapter 6

........................................chapter 7

.....................................chapter 8

........chapter 9

.chapter 10

.........................chapter 11

..............................................chapter 14

What is an instruction set?What is an instruction set?

• The complete collection of instructions that are understood by a CPU

• Machine Code – set of binary digits• Usually represented by assembly codes

Elements of an InstructionElements of an Instruction

• Operation code (Op code)– Do this

• Source Operand reference– To this

• Result Operand reference– Put the answer here

• Next Instruction Reference– When you have done that, do this...

Operands – Where From?Operands – Where From?• Main memory (or virtual memory or cache)• CPU register• I/O device

Instruction RepresentationInstruction Representation

• In machine code each instruction has a unique bit pattern

• For human consumption (well, programmers anyway) a symbolic representation is used– e.g. ADD, SUB, LOAD

• Operands can also be represented in this way– ADD A,B

Instruction TypesInstruction Types

• Data processing• Data storage (main memory)• Data movement (I/O)• Program flow control

Number of Addresses (a)Number of Addresses (a)

• 3 addresses– Operand 1, Operand 2, Result

– a = b + c;

– May be a fourth - next instruction (usually implicit)

– Not common

– Needs very long “words” to hold everything

Number of Addresses (b)Number of Addresses (b)

• 2 addresses– One address doubles as operand and result

– a = a + b

– Reduces length of instruction

– Requires some extra work• Temporary storage to hold some results

Number of Addresses (c)Number of Addresses (c)

• 1 address– Implicit second address

– Usually a register (accumulator)

– Common on early machines

Number of Addresses (d)Number of Addresses (d)

• 0 (zero) addresses– All addresses implicit

– Uses a stack

– e.g. push a

– push b

– add

– pop c

– c = a + b

How Many Addresses is good?How Many Addresses is good?

• More addresses– More complex (powerful?) instructions

– More registers• Inter-register operations are quicker

– Fewer instructions per program

• Fewer addresses– Less complex (powerful?) instructions

– More instructions per program

– Faster fetch/execution of instructions

Design Decisions (1)Design Decisions (1)

• Operation repertoire– How many operations?

– What can they do?

– How complex are they?

• Data types • Instruction formats

– Length of op code field

– Number of addresses/instruction• Registers

– Number of CPU registers available

– Which operations can be performed on which registers?

• Addressing modes (later…)• RISC v CISC - whole lecture on its own

Types of OperandTypes of Operand

• Addresses• Numbers

– Integer/floating point

• Characters– ASCII etc.

• Logical Data– Bits or flags

• (Aside: Is there any difference between numbers and characters? )

Specific Data TypesSpecific Data Types

• General - arbitrary binary contents• Integer - single binary value• Ordinal - unsigned integer• Unpacked BCD - One digit per byte• Packed BCD - 2 BCD digits per byte• Near Pointer - 32 bit offset within segment• Bit field• Byte String• Floating Point• …………

Types of OperationTypes of Operation

• Data Transfer• Arithmetic• Logical• Conversion• I/O• System Control• Transfer of Control

Data TransferData Transfer

• Specify– Source

– Destination

– Amount of data

• May be different instructions for different movements– e.g. IBM 370

• L ; 32 bits from memory to register• LH ; 16 bits from memory to register• LE ; 32 bits from memory to fp register• Etc…

• Or one instruction and different addresses– e.g. VAX

• LD ;16 bits from anywhere to anywhere

ArithmeticArithmetic

• Add, Subtract, Multiply, Divide• Signed Integer• Floating point ?• May include

– Increment (a++)

– Decrement (a--)

– Negate (-a)

LogicalLogical

• Bitwise operations• AND, OR, NOT

ConversionConversion

• E.g. Binary to Decimal

Input/OutputInput/Output

• May be specific instructions• May be done using data movement instructions

(memory mapped)• May be done by a separate controller (DMA)

Systems ControlSystems Control

• Privileged instructions• CPU needs to be in specific state • For operating systems or compiler use

Transfer of ControlTransfer of Control

• Branch– e.g. branch to x if result is zero

• Skip– e.g. increment and skip if zero

– ISZ Register1

– Branch xxxx

– ADD A

• Subroutine call– c.f. interrupt call

Byte OrderByte Order

• What order do we read numbers that occupy more than one byte?

• e.g. (numbers in hex to make it easy to read)• 12345678 can be stored in 4x8bit locations as follows

• Address Value (1) Value(2)• 184 12 78• 185 34 56• 186 56 34• 187 78 12

• i.e. read top down or bottom up?

Byte Order NamesByte Order Names

• The problem is called Endian• The system on the left has the least significant byte in

the lowest address

• This is called big-endian• The system on the right has the least significant byte

in the highest address

• This is called little-endian

12 34 56 78184 185 186 187

12 34 56 78187 186 185 184

Standard…What Standard?Standard…What Standard?

• Pentium (80x86), VAX are little-endian• IBM 370, Moterola 680x0 (Mac), and most RISC are

big-endian• Internet is big-endian

– Makes writing Internet programs on PC more awkward!

– WinSock provides htoi and itoh (Host to Internet & Internet to Host) functions to convert

SyllabusSyllabus

1. Introduction



4. Input & Output











.....................................................................chapter 2

.....................chapter 3

........................................chapter 4

................................................................chapter 6

........................................chapter 7

.....................................chapter 8

........chapter 9

.chapter 10

.........................chapter 11

..............................................chapter 14

Addressing ModesAddressing Modes

• Immediate• Direct• Indirect• Register• Register Indirect• Displacement (Indexed) • Stack

Immediate AddressingImmediate Addressing

• Operand is part of instruction• Operand = address field• e.g. ADD 5

– Add 5 to contents of accumulator

– 5 is operand

• No memory reference to fetch data• Fast• Limited range

OperandOpcode

Instruction

Direct AddressingDirect Addressing

• Address field contains address of operand• Effective address (EA) = address field (A)• e.g. ADD A

– Add contents of cell A to accumulator

– Look in memory at address A for operand• Single memory reference to access data

• No additional calculations to work out effective address• Limited address space

Address AOpcode

Instruction

Memory

Operand

Indirect Addressing (1)Indirect Addressing (1)

• Memory cell pointed to by address field contains the address of (pointer to) the operand

• EA = (A)– Look in A, find address (A) and look there for operand

• e.g. ADD (A)

– Add contents of cell pointed to by contents of A to accumulator

Address AOpcodeInstruction

Memory

Operand

Pointer to operand

Indirect Addressing (2)Indirect Addressing (2)

• Large address space • 2n where n = word length• May be nested, multilevel, cascaded

– e.g. EA = (((A)))• Draw the diagram yourself

• Multiple memory accesses to find operand• Hence slower

Register Addressing (1)Register Addressing (1)

• Operand is held in register named in address filed• EA = R• Limited number of registers• Very small address field needed

– Shorter instructions

– Faster instruction fetch

Register Address ROpcode

Instruction

Registers

Operand

Register Addressing (2)Register Addressing (2)

• No memory access• Very fast execution• Very limited address space• Multiple registers helps performance

– Requires good assembly programming or compiler writing

• c.f. Direct addressing

Register Indirect AddressingRegister Indirect Addressing

• C.f. indirect addressing• EA = (R)• Operand is in memory cell pointed to by contents of

register R• Large address space (2n)• One fewer memory access than indirect addressing

Register Address ROpcode

Instruction

Memory

OperandPointer to Operand

Registers

Displacement AddressingDisplacement Addressing

• EA = A + (R)• Address field hold two values

– A = base value

– R = register that holds displacement

– or vice versa

Register ROpcode

Instruction

Memory

OperandPointer to Operand

Registers

Address A

+

Relative AddressingRelative Addressing

• A version of displacement addressing• R = Program counter, PC• EA = A + (PC)• i.e. get operand from A cells from current location

pointed to by PC• c.f locality of reference & cache usage

Base-Register AddressingBase-Register Addressing

• A holds displacement• R holds pointer to base address• R may be explicit or implicit• e.g. segment registers in 80x86

Indexed AddressingIndexed Addressing

• A = base• R = displacement• EA = A + R• Good for accessing arrays

– EA = A + R

– R++

CombinationsCombinations

• Postindex• EA = (A) + (R)

• Preindex• EA = (A+(R))

• (Draw the diagrams)

Stack AddressingStack Addressing

• Operand is (implicitly) on top of stack• e.g.

– ADD Pop top two items from stackand add

Instruction FormatsInstruction Formats

• Layout of bits in an instruction• Includes opcode• Includes (implicit or explicit) operand(s)• Usually more than one instruction format in an

instruction set

Instruction LengthInstruction Length

• Affected by and affects:– Memory size

– Memory organization

– Bus structure

– CPU complexity

– CPU speed

• Trade off between powerful instruction repertoire and saving space

Allocation of BitsAllocation of Bits

• Number of addressing modes• Number of operands• Register versus memory• Number of register sets• Address range• Address granularity

SyllabusSyllabus

1. Introduction



4. Input & Output











.....................................................................chapter 2

.....................chapter 3

........................................chapter 4

................................................................chapter 6

........................................chapter 7

.....................................chapter 8

........chapter 9

.chapter 10

.........................chapter 11

..............................................chapter 14

CPU StructureCPU Structure

• CPU must:– Fetch instructions

– Interpret instructions

– Fetch data

– Process data

– Write data

RegistersRegisters

• CPU must have some working space (temporary storage)• Called registers• Number and function vary between processor designs• One of the major design decisions• Top level of memory hierarchy

User Visible RegistersUser Visible Registers

• General Purpose• Data• Address• Condition Codes

General Purpose RegistersGeneral Purpose Registers

• May be true general purpose• May be restricted• May be used for data or addressing• Data

– Accumulator

• Addressing– Segment

• Make them general purpose– Increase flexibility and programmer options

– Increase instruction size & complexity

• Make them specialized– Smaller (faster) instructions

– Less flexibility

How Many GP Registers?How Many GP Registers?

• Between 8 - 32• Fewer = more memory references• More does not reduce memory references and takes

up processor real estate

How big?How big?

• Large enough to hold full address• Large enough to hold full word• Often possible to combine two data registers

– C programming

– long int a;

Condition Code RegistersCondition Code Registers

• Sets of individual bits– e.g. result of last operation was zero

• Can be read (implicitly) by programs– e.g. Jump if zero

• Can not (usually) be set by programs

Control & Status RegistersControl & Status Registers

• Program Counter• Instruction Decoding Register• Memory Address Register• Memory Buffer Register

• Revision: what do these all do?

Program Status WordProgram Status Word

• A set of bits - Includes Condition Codes– Sign - of last result– Zero – set when result of last op was zero– Carry – set if last op resulted in a carry– Equal – set if a logical compare was true– Overflow – set to indicate arithmetic overflow– Interrupt enable/disable

• Supervisor Mode• Intel ring zero

• Kernel mode

• Allows privileged instructions to execute

• Used by operating system

• Not available to user programs

Other RegistersOther Registers

• May have registers pointing to:– Process control blocks (see O/S)

– Interrupt Vectors (see O/S)

• N.B. CPU design and operating system design are closely linked

Example Register OrganizationsExample Register Organizations

Instruction CycleInstruction Cycle

Indirect CycleIndirect Cycle

• May require memory access to fetch operands• Indirect addressing requires more memory accesses• Can be thought of as additional instruction subcycle

Instruction Cycle with IndirectInstruction Cycle with Indirect

Instruction Cycle State DiagramInstruction Cycle State Diagram

Data Flow (Instruction Fetch)Data Flow (Instruction Fetch)• Depends on CPU design – in general:

– Fetch• PC contains address of next instruction• Address moved to MAR• Address placed on address bus• Control unit requests memory read• Result placed on data bus, copied to MBR, then to IR• Meanwhile PC incremented by 1

Data Flow (Data Fetch)Data Flow (Data Fetch)

• IR is examined• If indirect addressing, indirect cycle is performed

– Right most N bits of MBR transferred to MAR

– Control unit requests memory read

– Result (address of operand) moved to MBR

Data Flow (Execute)Data Flow (Execute)

• May take many forms• Depends on instruction being executed• May include

– Memory read/write

– Input/Output

– Register transfers

– ALU operations

Data Flow (Interrupt)Data Flow (Interrupt)– Current PC saved to allow resumption after interrupt

– Contents of PC copied to MBR

– Special memory location (e.g. stack pointer) loaded to MAR

– MBR written to memory

– PC loaded with address of interrupt handling routine

– Next instruction (first of interrupt handler) can be fetched

PrefetchPrefetch

• Fetch accessing main memory• Execution usually does not access main memory• Can fetch next instruction during execution of current

instruction• Called instruction prefetch

Improved PerformanceImproved Performance

• But not doubled:– Fetch usually shorter than execution

• Prefetch more than one instruction?

– Any jump or branch means that prefetched instructions are not the required instructions

• Add more stages to improve performance

PipeliningPipelining

• Fetch instruction• Decode instruction• Calculate operands (i.e. EAs)• Fetch operands• Execute instructions• Write result

• Overlap these operations

Timing of PipelineTiming of Pipeline

Branch in a PipelineBranch in a Pipeline

Dealing with BranchesDealing with Branches

• Multiple Streams• Prefetch Branch Target• Loop buffer• Branch prediction• Delayed branching

Multiple StreamsMultiple Streams

• Have two pipelines• Prefetch each branch into a separate pipeline• Use appropriate pipeline

• Leads to bus & register contention• Multiple branches lead to further pipelines being

needed

Prefetch Branch TargetPrefetch Branch Target

• Target of branch is prefetched in addition to instructions following branch

• Keep target until branch is executed• Used by IBM 360/91

Loop BufferLoop Buffer

• Very fast memory• Maintained by fetch stage of pipeline• Check buffer before fetching from memory• Very good for small loops or jumps• c.f. cache• Used by CRAY-1

Branch PredictionBranch Prediction• Predict never taken

– Assume that jump will not happen

– Always fetch next instruction

• Predict always taken– Assume that jump will happen

– Always fetch target instruction

• Predict by Opcode– Some instructions more likely to result in a jump – Can get up to 75% success

• Taken/Not taken switch– Based on previous history– Good for loops

• Delayed Branch– Do not take jump until you have to

– Rearrange instructions

Branch Prediction State DiagramBranch Prediction State Diagram

SyllabusSyllabus

1. Introduction



4. Input & Output











.....................................................................chapter 2

.....................chapter 3

........................................chapter 4

................................................................chapter 6

........................................chapter 7

.....................................chapter 8

........chapter 9

.chapter 10

.........................chapter 11

..............................................chapter 14

Micro-OperationsMicro-Operations

• A computer executes a program• Fetch/execute cycle• Each cycle has a number of steps

– see pipelining

• Called micro-operations• Each step does very little• Atomic operation of CPU

Constituent Elements of Constituent Elements of Program ExecutionProgram Execution

Fetch - 4 RegistersFetch - 4 Registers

• Memory Address Register (MAR) – Connected to address bus– Specifies address for read or write op

• Memory Buffer Register (MBR) – Connected to data bus– Holds data to write or last data read

• Program Counter (PC) – Holds address of next instruction to be fetched

• Instruction Register (IR) – Holds last instruction fetched

Fetch SequenceFetch Sequence

• Address of next instruction is in PC• Address (MAR) is placed on address bus• Control unit issues READ command• Result (data from memory) appears on data bus• Data from data bus copied into MBR• PC incremented by 1 (in parallel with data fetch from memory)• Data (instruction) moved from MBR to IR• MBR is now free for further data fetches

Fetch Sequence (symbolic)Fetch Sequence (symbolic)

t1: MAR <- (PC)

t2: MBR <- (memory)

PC <- (PC) +1

t3: IR <- (MBR)

(tx = time unit/clock cycle)

• Or (conceptually identical)t1: MAR <- (PC)

t2: MBR <- (memory)

t3: PC <- (PC) +1

IR <- (MBR)

Rules for Clock Cycle GroupingRules for Clock Cycle Grouping

• Proper sequence must be followed– MAR <- (PC) must precede MBR <- (memory)

• Conflicts must be avoided– Must not read & write same register at same time

– MBR <- (memory) & IR <- (MBR) must not be in same cycle

• Also: PC <- (PC) +1 involves addition– Use ALU

– May need additional micro-operations

Indirect Address CycleIndirect Address Cycle

MAR <- (IRaddress) - address field of IR

MBR <- (memory)

IRaddress <- (MBRaddress)

• MBR contains an address• IR is now in same state as if direct addressing had

been used

Interrupt CycleInterrupt Cycle

t1: MBR <-(PC)

t2: MAR <- save-address

PC <- routine-address

t3: memory <- (MBR)

• This is a minimum– May be additional micro-ops to get addresses

– N.B. saving context is done by interrupt handler routine, not micro-ops

Execute Cycle (ADD)Execute Cycle (ADD)

• Different for each instruction• e.g. ADD R1,X - add the contents of location X to

Register 1 , result in R1t1: MAR <- (IRaddress)

t2: MBR <- (memory)

t3: R1 <- R1 + (MBR)

• Note no overlap of micro-operations

Execute Cycle (ISZ)Execute Cycle (ISZ)

• ISZ X - increment and skip if zerot1: MAR <- (IRaddress)

t2: MBR <- (memory)

t3: MBR <- (MBR) + 1

t4: memory <- (MBR)

if (MBR) == 0 then PC <- (PC) + 1

• Notes:– if is a single micro-operation

– Micro-operations done during t4

Execute Cycle (BSA)Execute Cycle (BSA)

• BSA X - Branch and save address– Address of instruction following BSA is saved in X

– Execution continues from X+1t1: MAR <- (IRaddress)

MBR <- (PC)

t2: PC <- (IRaddress)

memory <- (MBR)

t3: PC <- (PC) + 1

Functional RequirementsFunctional Requirements• Define basic elements of processor• Describe micro-operations processor performs• Determine functions control unit must perform

Basic Elements of ProcessorBasic Elements of Processor• ALU• Registers• Internal data paths• External data paths• Control Unit

Types of Micro-operationTypes of Micro-operation

• Transfer data between registers• Transfer data from register to external• Transfer data from external to register• Perform arithmetic or logical ops

Functions of Control UnitFunctions of Control Unit

• Sequencing– Causing the CPU to step through a series of micro-

operations

• Execution– Causing the performance of each micro-op

• This is done using Control Signals

Control SignalsControl Signals• Clock

– One micro-instruction (or set of parallel micro-instructions) per clock cycle

• Instruction register– Op-code for current instruction

– Determines which micro-instructions are performed

• Flags– State of CPU

– Results of previous operations

• From control bus– Interrupts

– Acknowledgements

Control Signals - outputControl Signals - output

• Within CPU– Cause data movement

– Activate specific functions

• Via control bus– To memory

– To I/O modules

Example Control Signal Example Control Signal Sequence - FetchSequence - Fetch

• MAR <- (PC)– Control unit activates signal to open gates between PC

and MAR

• MBR <- (memory)– Open gates between MAR and address bus

– Memory read control signal

– Open gates between data bus and MBR

Internal OrganizationInternal Organization

• Usually a single internal bus• Gates control movement of data onto and off the bus• Control signals control data transfer to and from

external systems bus• Temporary registers needed for proper operation of

ALU

Hardwired Hardwired Implementation (1)Implementation (1)

• Control unit inputs• Flags and control bus

– Each bit means something

• Instruction register– Op-code causes different control signals for each

different instruction

– Unique logic for each op-code

– Decoder takes encoded input and produces single output

– n binary inputs and 2n outputs

Hardwired Hardwired Implementation (2)Implementation (2)

• Clock– Repetitive sequence of pulses

– Useful for measuring duration of micro-ops

– Must be long enough to allow signal propagation

– Different control signals at different times within instruction cycle

– Need a counter with different control signals for t1, t2 etc.

Problems With Hard Problems With Hard Wired DesignsWired Designs

• Complex sequencing & micro-operation logic• Difficult to design and test• Inflexible design• Difficult to add new instructions

• Better to use a microprogrammed controller……

computer systems & networks i – computer systems roger webb (13dj01) [email protected]

Documents