computer architecture ele 475 / cos 475 slide deck 1

Post on 19-Feb-2022

11 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

ComputerArchitectureELE475/COS475

SlideDeck1:IntroductionandInstructionSetArchitectures

DavidWentzlaffDepartmentofElectricalEngineering

PrincetonUniversity

1

WhatisComputerArchitecture?Application

2

WhatisComputerArchitecture?

Physics

Application

3

WhatisComputerArchitecture?

Physics

Application

Gaptoolargetobridgeinonestep

4

WhatisComputerArchitecture?

Initsbroadestdefinition,computerarchitectureisthedesignoftheabstraction/implementationlayersthatallowustoexecuteinformationprocessingapplicationsefficientlyusingmanufacturingtechnologies

Physics

Application

Gaptoolargetobridgeinonestep

5

WhatisComputerArchitecture?

Initsbroadestdefinition,computerarchitectureisthedesignoftheabstraction/implementationlayersthatallowustoexecuteinformationprocessingapplicationsefficientlyusingmanufacturingtechnologies

Physics

Application

Gaptoolargetobridgeinonestep

6

AbstractionsinModernComputingSystems

Physics

Devices

CircuitsGates

Register-TransferLevelMicroarchitecture

InstructionSetArchitecture

OperatingSystem/VirtualMachines

ProgrammingLanguage

Algorithm

Application

7

AbstractionsinModernComputingSystems

Physics

Devices

CircuitsGates

Register-TransferLevelMicroarchitecture

InstructionSetArchitecture

OperatingSystem/VirtualMachines

ProgrammingLanguage

Algorithm

Application

ComputerArchitecture(ELE475)

8

ComputerArchitectureisConstantlyChanging

Physics

Devices

CircuitsGates

Register-TransferLevelMicroarchitecture

InstructionSetArchitecture

OperatingSystem/VirtualMachines

ProgrammingLanguage

Algorithm

Application ApplicationRequirements:•  Suggesthowtoimprovearchitecture•  Providerevenuetofunddevelopment

TechnologyConstraints:•  Restrictwhatcanbedoneefficiently•  Newtechnologiesmakenewarch

possible

9

ComputerArchitectureisConstantlyChanging

Physics

Devices

CircuitsGates

Register-TransferLevelMicroarchitecture

InstructionSetArchitecture

OperatingSystem/VirtualMachines

ProgrammingLanguage

Algorithm

Application ApplicationRequirements:•  Suggesthowtoimprovearchitecture•  Providerevenuetofunddevelopment

TechnologyConstraints:•  Restrictwhatcanbedoneefficiently•  Newtechnologiesmakenewarch

possible

Architectureprovidesfeedbacktoguideapplicationandtechnologyresearchdirections

10

ComputersThen…

IASMachine.DesigndirectedbyJohnvonNeumann.FirstbootedinPrincetonNJin1952SmithsonianInstitutionArchives(SmithsonianImage95-06151) 11

ComputersNow

12

Robots

SupercomputersAutomobiles

Laptops

Set-topboxes

Smartphones

ServersMediaPlayers

SensorNets

Routers

CamerasGames

AudioAssistants

[fromKurzweil]

MajorTechnologyGenerations Bipolar

nMOS

CMOS

pMOS

Relays

VacuumTubes

Electromechanical

13

SequentialProcessorPerformance

14FromHennessyandPattersonEd.5ImageCopyright©2011,ElsevierInc.AllrightsReserved.

SequentialProcessorPerformance

RISC

15FromHennessyandPattersonEd.5ImageCopyright©2011,ElsevierInc.AllrightsReserved.

SequentialProcessorPerformance

RISC

Move to multi-processor

16FromHennessyandPattersonEd.5ImageCopyright©2011,ElsevierInc.AllrightsReserved.

CourseAdministrationInstructor: Prof.DavidWentzlaff(wentzlaf@princeton.edu) Office:EQuadB228

OfficeHours:Mon.1:30-2:30pmEQuadB228TA: AngLi(angl@princeton.edu)

OfficeHours:TBDEQuadF210Lectures: Monday&Wednesday11am-12:20pmEQuadB205Text: ComputerArchitecture:AQuantitativeApproach

HennesseyandPatterson,5thEdition(2012) ModernProcessorDesign:FundamentalsofSuperscalar Processors(2004) JohnP.ShenandMikkoH.Lipasti

Prerequisite: ELE375&ELE206CourseWebpage:https://eleclass.princeton.edu/classes/ele475/spring_2018/ 17

CourseStructure

•  Midterm(20%)•  FinalExam(30%)•  Labs(25%)

–  4Designlabs(Verilog)•  DesignProject(20%)

– UsingOpenPiton–  Insmallgroups

•  ClassParticipation(5%)•  UngradedProblemSets(5PS’s)(0%)

–  Veryusefulforexampreparation18

CourseContentComputerOrganization(COS/ELE375)

ComputerOrganization•  BasicPipelinedProcessor

~50,000Transistors

PhotoofBerkeleyRISCI,©UniversityofCalifornia(Berkeley)19

CourseContentComputerArchitecture(ELE475)

IntelNehalemProcessor,OriginalCorei7,ImageCreditIntel:http://download.intel.com/pressroom/kits/corei7/images/Nehalem_Die_Shot_3.jpg

20

CourseContentComputerArchitecture(ELE475)

IntelNehalemProcessor,OriginalCorei7,ImageCreditIntel:http://download.intel.com/pressroom/kits/corei7/images/Nehalem_Die_Shot_3.jpg

~700,000,000Transistors21

CourseContentComputerArchitecture(ELE475)

IntelNehalemProcessor,OriginalCorei7,ImageCreditIntel:http://download.intel.com/pressroom/kits/corei7/images/Nehalem_Die_Shot_3.jpg

~700,000,000Transistors

ComputerOrganization(ELE375)Processor

22

CourseContentComputerArchitecture(ELE475)

IntelNehalemProcessor,OriginalCorei7,ImageCreditIntel:http://download.intel.com/pressroom/kits/corei7/images/Nehalem_Die_Shot_3.jpg

~700,000,000Transistors

ComputerOrganization(ELE375)Processor

•  InstructionLevelParallelism–  Superscalar–  VeryLongInstructionWord(VLIW)

•  LongPipelines(PipelineParallelism)

•  AdvancedMemoryandCaches•  DataLevelParallelism

–  Vector–  GPU

•  ThreadLevelParallelism–  Multithreading–  Multiprocessor–  Multicore–  Manycore

23

Architecturevs.Microarchitecture

“Architecture”/InstructionSetArchitecture:•  Programmervisiblestate(Memory&Register)•  Operations(Instructionsandhowtheywork)•  ExecutionSemantics(interrupts)•  Input/Output•  DataTypes/SizesMicroarchitecture/Organization:•  TradeoffsonhowtoimplementISAforsomemetric(Speed,Energy,Cost)

•  Examples:Pipelinedepth,numberofpipelines,cachesize,siliconarea,peakpower,executionordering,buswidths,ALUwidths

24

SoftwareDevelopments

25

upto1955 Librariesofnumericalroutines-Floatingpointoperations-Transcendentalfunctions-Matrixmanipulation,equationsolvers,...

1955-60 HighlevelLanguages-Fortran1956OperatingSystems--Assemblers,Loaders,Linkers,Compilers-Accountingprogramstokeeptrackofusageandcharges

SoftwareDevelopments

26

upto1955 Librariesofnumericalroutines-Floatingpointoperations-Transcendentalfunctions-Matrixmanipulation,equationsolvers,...

1955-60 HighlevelLanguages-Fortran1956OperatingSystems--Assemblers,Loaders,Linkers,Compilers-Accountingprogramstokeeptrackofusageandcharges

Machinesrequiredexperiencedoperators•  Mostuserscouldnotbeexpectedtounderstandtheseprograms,muchlesswritethem•  Machineshadtobesoldwithalotofresidentsoftware

CompatibilityProblematIBM

27

Byearly1960’s,IBMhad4incompatiblelinesofcomputers!

701 ⇒ 7094650 ⇒ 7074702 ⇒ 70801401 ⇒ 7010

Eachsystemhaditsown• Instructionset• I/OsystemandSecondaryStorage: magnetictapes,drumsanddisks• assemblers,compilers,libraries,...• marketnichebusiness,scientific,realtime,...

CompatibilityProblematIBM

28

Byearly1960’s,IBMhad4incompatiblelinesofcomputers!

701 ⇒ 7094650 ⇒ 7074702 ⇒ 70801401 ⇒ 7010

Eachsystemhaditsown• Instructionset• I/OsystemandSecondaryStorage: magnetictapes,drumsanddisks• assemblers,compilers,libraries,...• marketnichebusiness,scientific,realtime,...

CompatibilityProblematIBM

29

Byearly1960’s,IBMhad4incompatiblelinesofcomputers!

701 ⇒ 7094650 ⇒ 7074702 ⇒ 70801401 ⇒ 7010

Eachsystemhaditsown• Instructionset• I/OsystemandSecondaryStorage: magnetictapes,drumsanddisks• assemblers,compilers,libraries,...• marketnichebusiness,scientific,realtime,...

⇒ IBM360

30

IBM360:AGeneral-PurposeRegister(GPR)Machine

• ProcessorState–  16General-Purpose32-bitRegisters

• maybeusedasindexandbaseregister• Register0hassomespecialproperties

–  4FloatingPoint64-bitRegisters–  AProgramStatusWord(PSW)

•  PC,Conditioncodes,Controlflags

•  A32-bitmachinewith24-bitaddresses–  Butnoinstructioncontainsa24-bitaddress!

•  DataFormats–  8-bitbytes,16-bithalf-words,32-bitwords,64-bitdouble-words

31

IBM360:AGeneral-PurposeRegister(GPR)Machine

• ProcessorState–  16General-Purpose32-bitRegisters

• maybeusedasindexandbaseregister• Register0hassomespecialproperties

–  4FloatingPoint64-bitRegisters–  AProgramStatusWord(PSW)

•  PC,Conditioncodes,Controlflags

•  A32-bitmachinewith24-bitaddresses–  Butnoinstructioncontainsa24-bitaddress!

•  DataFormats–  8-bitbytes,16-bithalf-words,32-bitwords,64-bitdouble-words

TheIBM360iswhybytesare8-bitslongtoday!

32

IBM360:InitialImplementations Model 30 . . . Model 70

Storage 8K - 64 KB 256K - 512 KB Datapath 8-bit 64-bit Circuit Delay 30 nsec/level 5 nsec/level Local Store Main Store Transistor Registers Control Store Read only 1µsec Conventional circuits

IBM 360 instruction set architecture (ISA) completely hid the underlying technological differences between various models. Milestone: The first true ISA designed as portable hardware-software interface!

33

IBM360:InitialImplementations Model 30 . . . Model 70

Storage 8K - 64 KB 256K - 512 KB Datapath 8-bit 64-bit Circuit Delay 30 nsec/level 5 nsec/level Local Store Main Store Transistor Registers Control Store Read only 1µsec Conventional circuits

IBM 360 instruction set architecture (ISA) completely hid the underlying technological differences between various models. Milestone: The first true ISA designed as portable hardware-software interface!

With minor modifications it still survives today!

IBM360:Over50yearslater…ThezSeriesz14Microprocessor

•  5.2GHzinIBM14nmSOICMOStechnology•  6.1billiontransistorsin696mm2•  64-bitvirtualaddressing

–  originalS/360was24-bit,andS/370was31-bitextension•  10-coredesign•  6-fetch/cycle•  10-issue/cycleout-of-ordersuperscalarpipeline•  Out-of-ordermemoryaccesses•  Redundantdatapaths

–  everyinstructionperformedintwoparalleldatapathsandresultscompared

•  128KBL1I-cache,128KBL1D-cacheon-chip•  2MBprivateI-cacheL2percore•  4MBprivateD-cacheL2percore•  On-Chip128MBeDRAML3cache•  Upto672MBeDRAML4

ImageCredit:IBMCourtesyofInternationalBusiness

MachinesCorporation,©InternationalBusinessMachinesCorporation. 34

SameArchitectureDifferentMicroarchitecture

AMDPhenomX4•  X86InstructionSet•  QuadCore•  125W•  Decode3Instructions/Cycle/Core•  64KBL1ICache,64KBL1DCache•  512KBL2Cache•  Out-of-order•  2.6GHz

IntelAtom•  X86InstructionSet•  SingleCore•  2W•  Decode2Instructions/Cycle/Core•  32KBL1ICache,24KBL1DCache•  512KBL2Cache•  In-order•  1.6GHz

ImageCredit:Intel

ImageCredit:AMD35

DifferentArchitectureDifferentMicroarchitecture

AMDPhenomX4•  X86InstructionSet•  QuadCore•  125W•  Decode3Instructions/Cycle/Core•  64KBL1ICache,64KBL1DCache•  512KBL2Cache•  Out-of-order•  2.6GHz

IBMPOWER7•  PowerInstructionSet•  EightCore•  200W•  Decode6Instructions/Cycle/Core•  32KBL1ICache,32KBL1DCache•  256KBL2Cache•  Out-of-order•  4.25GHz

ImageCredit:IBMImageCredit:AMD

CourtesyofInternationalBusinessMachinesCorporation,©InternationalBusinessMachinesCorporation.

36

WhereDoOperandsComefromAndWhereDoResultsGo?

37

WhereDoOperandsComefromAndWhereDoResultsGo?

38

ALU

WhereDoOperandsComefromAndWhereDoResultsGo?

39

ALU

Mem

ory

Processor

WhereDoOperandsComefromAndWhereDoResultsGo?

40

ALU

Mem

ory

WhereDoOperandsComefromAndWhereDoResultsGo?

41

WhereDoOperandsComefromAndWhereDoResultsGo?

42

…TOS

ALU

Processor

Mem

ory

Stack

WhereDoOperandsComefromAndWhereDoResultsGo?

43

…TOS

ALU

Processor

Mem

ory

ALU

Processor

Mem

ory

Stack Accumulator

WhereDoOperandsComefromAndWhereDoResultsGo?

44

…TOS

ALU

Processor

Mem

ory

ALU

Processor

Mem

ory

ALU

Processor

Mem

ory

…Stack Accumulator

Register-Memory

WhereDoOperandsComefromAndWhereDoResultsGo?

45

…TOS

ALU

Processor

Mem

ory

ALU

Processor

Mem

ory

ALU

Processor

Mem

ory

…Stack Accumulator

Register-Memory

Register-Register

ALU

Processor

Mem

ory

WhereDoOperandsComefromAndWhereDoResultsGo?

46

…TOS

ALU

Processor

Mem

ory

ALU

Processor

Mem

ory

ALU

Processor

Mem

ory

…Stack Accumulator

Register-Memory

Register-Register

0 1 2or3NumberExplicitlyNamedOperands:

ALU

Processor

Mem

ory

2or3

Stack-BasedInstructionSetArchitecture(ISA)

•  Burrough’sB5000(1960)•  Burrough’sB6700•  HP3000•  ICL2900•  Symbolics3600•  InmosTransputerModern•  Forthmachines•  JavaVirtualMachine•  Intelx87FloatingPointUnit

47

…TOS

ALU

Processor

Mem

ory

EvaluationofExpressions

48

(a + b * c) / (a + d * c - e) /

+

* + a e

-

a c

d c

* b

EvaluationofExpressions

49

(a + b * c) / (a + d * c - e) /

+

* + a e

-

a c

d c

* b

Reverse Polish a b c * + a d c * + e - /

EvaluationofExpressions

50

(a + b * c) / (a + d * c - e) /

+

* + a e

-

a c

d c

* b

Reverse Polish a b c * + a d c * + e - /

Evaluation Stack

EvaluationofExpressions

51

(a + b * c) / (a + d * c - e) /

+

* + a e

-

a c

d c

* b

Reverse Polish a b c * + a d c * + e - /

Evaluation Stack push a

a

EvaluationofExpressions

52

(a + b * c) / (a + d * c - e) /

+

* + a e

-

a c

d c

* b

Reverse Polish a b c * + a d c * + e - /

Evaluation Stack a

b

EvaluationofExpressions

53

(a + b * c) / (a + d * c - e) /

+

* + a e

-

a c

d c

* b

Reverse Polish a b c * + a d c * + e - /

Evaluation Stack a

push b

b

EvaluationofExpressions

54

(a + b * c) / (a + d * c - e) /

+

* + a e

-

a c

d c

* b

Reverse Polish a b c * + a d c * + e - /

Evaluation Stack a

c b

EvaluationofExpressions

55

(a + b * c) / (a + d * c - e) /

+

* + a e

-

a c

d c

* b

Reverse Polish a b c * + a d c * + e - /

Evaluation Stack a

push c

c b

EvaluationofExpressions

56

(a + b * c) / (a + d * c - e) /

+

* + a e

-

a c

d c

* b

Reverse Polish a b c * + a d c * + e - /

Evaluation Stack a

c b

EvaluationofExpressions

57

(a + b * c) / (a + d * c - e) /

+

* + a e

-

a c

d c

* b

Reverse Polish a b c * + a d c * + e - /

Evaluation Stack a

multiply

* b * c

c b

EvaluationofExpressions

58

(a + b * c) / (a + d * c - e) /

+

* + a e

-

a c

d c

* b

Reverse Polish a b c * + a d c * + e - /

Evaluation Stack a

b * c

EvaluationofExpressions

59

a

(a + b * c) / (a + d * c - e) /

+

* + a e

-

a c

d c

* b

Reverse Polish a b c * + a d c * + e - /

add Evaluation Stack

b * c

EvaluationofExpressions

60

a

(a + b * c) / (a + d * c - e) /

+

* + a e

-

a c

d c

* b

Reverse Polish a b c * + a d c * + e - /

add

+

Evaluation Stack

b * c

EvaluationofExpressions

61

a

(a + b * c) / (a + d * c - e) /

+

* + a e

-

a c

d c

* b

Reverse Polish a b c * + a d c * + e - /

add

+

Evaluation Stack

b * c a + b * c

Hardwareorganizationofthestack

•  Stackispartoftheprocessorstate⇒ stackmustbeboundedandsmall≈ numberofRegisters,notthesizeofmainmemory

•  Conceptuallystackisunbounded

⇒apartofthestackisincludedintheprocessorstate;therestiskeptinthemainmemory

62

StackOperationsandImplicitMemoryReferences

• Supposethetop2elementsofthestackarekeptinregistersandtherestiskeptinthememory.

Eachpushoperation⇒ 1memoryreferencepopoperation ⇒ 1memoryreference

NoGood!• BetterperformancebykeepingthetopNelementsinregisters,andmemoryreferencesaremadeonlywhenregisterstackoverflowsorunderflows.

Issue-whentoLoad/Unloadregisters?

63

StackOperationsandImplicitMemoryReferences

• Supposethetop2elementsofthestackarekeptinregistersandtherestiskeptinthememory.

Eachpushoperation⇒ 1memoryreferencepopoperation ⇒ 1memoryreference

NoGood!• BetterperformancebykeepingthetopNelementsinregisters,andmemoryreferencesaremadeonlywhenregisterstackoverflowsorunderflows.

Issue-whentoLoad/Unloadregisters?

64

StackOperationsandImplicitMemoryReferences

• Supposethetop2elementsofthestackarekeptinregistersandtherestiskeptinthememory.

Eachpushoperation⇒ 1memoryreferencepopoperation ⇒ 1memoryreference

NoGood!• BetterperformancebykeepingthetopNelementsinregisters,andmemoryreferencesaremadeonlywhenregisterstackoverflowsorunderflows.

Issue-whentoLoad/Unloadregisters?

65

StackSizeandMemoryReferences

66

program stack (size = 2) memory refs push a R0 a push b R0 R1 b push c R0 R1 R2 c, ss(a) * R0 R1 sf(a) + R0 push a R0 R1 a push d R0 R1 R2 d, ss(a+b*c) push c R0 R1 R2 R3 c, ss(a) * R0 R1 R2 sf(a) + R0 R1 sf(a+b*c) push e R0 R1 R2 e,ss(a+b*c) - R0 R1 sf(a+b*c) / R0

a b c * + a d c * + e - /

StackSizeandMemoryReferences

67

program stack (size = 2) memory refs push a R0 a push b R0 R1 b push c R0 R1 R2 c, ss(a) * R0 R1 sf(a) + R0 push a R0 R1 a push d R0 R1 R2 d, ss(a+b*c) push c R0 R1 R2 R3 c, ss(a) * R0 R1 R2 sf(a) + R0 R1 sf(a+b*c) push e R0 R1 R2 e,ss(a+b*c) - R0 R1 sf(a+b*c) / R0

a b c * + a d c * + e - /

4 stores, 4 fetches (implicit)

StackSizeandExpressionEvaluation

68

program stack (size = 4) push a R0 push b R0 R1 push c R0 R1 R2 * R0 R1 + R0 push a R0 R1 push d R0 R1 R2 push c R0 R1 R2 R3 * R0 R1 R2 + R0 R1 push e R0 R1 R2 - R0 R1 / R0

a b c * + a d c * + e - /

StackSizeandExpressionEvaluation

69

program stack (size = 4) push a R0 push b R0 R1 push c R0 R1 R2 * R0 R1 + R0 push a R0 R1 push d R0 R1 R2 push c R0 R1 R2 R3 * R0 R1 R2 + R0 R1 push e R0 R1 R2 - R0 R1 / R0

a b c * + a d c * + e - /

a and c are “loaded” twice ⇒not the best use of registers!

MachineModelSummary

70

…TOS

ALU

Processor

Mem

ory

ALU

Processor

…Mem

ory

ALU

Processor

Mem

ory

…Stack Accumulator

Register-Memory

Register-Register

ALU

Processor

Mem

ory

MachineModelSummary

71

…TOS

ALU

Processor

Mem

ory

ALU

Processor

…Mem

ory

ALU

Processor

Mem

ory

…Stack Accumulator

Register-Memory

Register-Register

ALU

Processor

Mem

ory

C=A+B

MachineModelSummary

72

…TOS

ALU

Processor

Mem

ory

ALU

Processor

…Mem

ory

ALU

Processor

Mem

ory

…Stack Accumulator

Register-Memory

Register-Register

ALU

Processor

Mem

ory

C=A+B

PushAPushBAddPopC

LoadAAddBStoreC

LoadR1,AAddR3,R1,BStoreR3,C

LoadR1,ALoadR2,BAddR3,R1,R2StoreR3,C

ClassesofInstructions•  DataTransfer

–  LD,ST,MFC1,MTC1,MFC0,MTC0•  ALU

–  ADD,SUB,AND,OR,XOR,MUL,DIV,SLT,LUI•  ControlFlow

–  BEQZ,JR,JAL,TRAP,ERET•  FloatingPoint

–  ADD.D,SUB.S,MUL.D,C.LT.D,CVT.S.W,•  Multimedia(SIMD)

–  ADD.PS,SUB.PS,MUL.PS,C.LT.PS•  String

–  REPMOVSB(x86)73

AddressingModes:HowtoGetOperandsfromMemory

74

AddressingMode

Instruction Function

Register AddR4,R3,R2 Regs[R4]<-Regs[R3]+Regs[R2]**

Immediate AddR4,R3,#5 Regs[R4]<-Regs[R3]+5**

Displacement AddR4,R3,100(R1) Regs[R4]<-Regs[R3]+Mem[100+Regs[R1]]

RegisterIndirect

AddR4,R3,(R1) Regs[R4]<-Regs[R3]+Mem[Regs[R1]]

Absolute AddR4,R3,(0x475) Regs[R4]<-Regs[R3]+Mem[0x475]

MemoryIndirect

AddR4,R3,@(R1) Regs[R4]<-Regs[R3]+Mem[Mem[R1]]

PCrelative AddR4,R3,100(PC) Regs[R4]<-Regs[R3]+Mem[100+PC]

Scaled AddR4,R3,100(R1)[R5] Regs[R4]<-Regs[R3]+Mem[100+Regs[R1]+Regs[R5]*4]

**Maynotactuallyaccessmemory!

DataTypesandSizes•  Types

–  BinaryInteger–  BinaryCodedDecimal(BCD)–  FloatingPoint

•  IEEE754•  CrayFloatingPoint•  IntelExtendedPrecision(80-bit)

–  PackedVectorData–  Addresses

•  Width–  BinaryInteger(8-bit,16-bit,32-bit,64-bit)–  FloatingPoint(32-bit,40-bit,64-bit,80-bit)–  Addresses(16-bit,24-bit,32-bit,48-bit,64-bit)

75

ISAEncodingFixedWidth:EveryInstructionhassamewidth•  Easytodecode(RISCArchitectures:MIPS,PowerPC,SPARC,ARM…)Ex:MIPS,everyinstruction4-bytesVariableLength:Instructionscanvaryinwidth•  Takeslessspaceinmemoryandcaches(CISCArchitectures:IBM360,x86,Motorola68k,VAX…)Ex:x86,instructions1-byteupto17-bytesMostlyFixedorCompressed:•  Ex:MIPS16,THUMB(onlytwoformats2and4bytes)•  PowerPCandsomeVLIWs(Storeinstructionscompressed,

decompressintoInstructionCache(Very)LongInstructionWord:•  Multipleinstructionsinafixedwidthbundle•  Ex:Multiflow,HP/STLx,TIC6000 76

ISAEncodingFixedWidth:EveryInstructionhassamewidth•  Easytodecode(RISCArchitectures:MIPS,PowerPC,SPARC,ARM…)Ex:MIPS,everyinstruction4-bytesVariableLength:Instructionscanvaryinwidth•  Takeslessspaceinmemoryandcaches(CISCArchitectures:IBM360,x86,Motorola68k,VAX…)Ex:x86,instructions1-byteupto17-bytesMostlyFixedorCompressed:•  Ex:MIPS16,THUMB(onlytwoformats2and4bytes)•  PowerPCandsomeVLIWs(Storeinstructionscompressed,

decompressintoInstructionCache(Very)LongInstructionWord:•  Multipleinstructionsinafixedwidthbundle•  Ex:Multiflow,HP/STLx,TIC6000 77

x86(IA-32)InstructionEncoding

78

ImmediateDisplacementScale,Index,BaseModR/MOpcodeInstruction

Prefixes

0,1,2,or4bytes

0,1,2,or4bytes

1byte(ifneeded)

1byte(ifneeded)

1,2,or3bytes

UptofourPrefixes(1byteeach)

x86andx86-64InstructionFormatsPossibleinstructions1to18byteslong

MIPS64InstructionEncoding

79ImageCopyright©2011,ElsevierInc.AllrightsReserved.

RealWorldInstructionSets

80

Arch Type #Oper #Mem DataSize #Regs AddrSize Use

Alpha Reg-Reg 3 0 64-bit 32 64-bit Workstation

ARM Reg-Reg 3 0 32/64-bit 16 32/64-bit CellPhones,Embedded

MIPS Reg-Reg 3 0 32/64-bit 32 32/64-bit Workstation,Embedded

SPARC Reg-Reg 3 0 32/64-bit 24-32 32/64-bit Workstation

TIC6000 Reg-Reg 3 0 32-bit 32 32-bit DSP

IBM360 Reg-Mem 2 1 32-bit 16 24/31/64 Mainframe

x86 Reg-Mem 2 1 8/16/32/64-bit

4/8/24 16/32/64 PersonalComputers

VAX Mem-Mem 3 3 32-bit 16 32-bit Minicomputer

Mot.6800 Accum. 1 1/2 8-bit 0 16-bit Microcontroler

WhytheDiversityinISAs?

TechnologyInfluencedISA•  Storageisexpensive,tightencodingimportant•  ReducedInstructionSetComputer

–  Removeinstructionsuntilwholecomputerfitsondie•  Multicore/Manycore

–  TransistorsnotturningintosequentialperformanceApplicationInfluencedISA•  InstructionsforApplications

–  DSPinstructions•  CompilerTechnologyhasimproved

–  SPARCRegisterWindowsnolongerneeded–  Compilercanregisterallocateeffectively

81

Recap

82

Physics

Devices

CircuitsGates

Register-TransferLevelMicroarchitecture

InstructionSetArchitecture

OperatingSystem/VirtualMachines

ProgrammingLanguage

Algorithm

Application

ComputerArchitecture(ELE475)

Recap

•  ISAvsMicroarchitecture•  ISACharacteristics

– MachineModels–  Encoding–  DataTypes–  Instructions–  AddressingModes

83

Physics

Devices

CircuitsGates

Register-TransferLevelMicroarchitecture

InstructionSetArchitecture

OperatingSystem/VirtualMachines

ProgrammingLanguage

Algorithm

Application

ComputerArchitectureLecture1

NextClass:MicrocodeandReviewofPipelining

84

Acknowledgements•  Theseslidescontainmaterialdevelopedandcopyrightby:

–  Arvind(MIT)–  KrsteAsanovic(MIT/UCB)–  JoelEmer(Intel/MIT)–  JamesHoe(CMU)–  JohnKubiatowicz(UCB)–  DavidPatterson(UCB)–  ChristopherBatten(Cornell)

•  MITmaterialderivedfromcourse6.823•  UCBmaterialderivedfromcourseCS252&CS152•  CornellmaterialderivedfromcourseECE4750

85

Copyright©2018DavidWentzlaff

86

top related