![Page 1: Lecture 01: Introduc/on - GitHub PagesLecture 01: Introduc/on CSE 564 Computer Architecture Summer 2017 Department of Computer Science and Engineering Yonghong Yan yan@oakland.edu](https://reader030.vdocuments.us/reader030/viewer/2022040300/5e68a390f3d2512a6e3defb7/html5/thumbnails/1.jpg)

Lecture 01: Introduction
CSE 564 Computer Architecture Summer 2017
Department of Computer Science and Engineering
Yonghong Yan
yan@oakland.edu …/~yan

Copyright and Acknowledgement
• Most slides were adapted from lecture notes of the two textbooks, with copyright of the publisher or the original authors, including Elsevier Inc, Morgan Kaufmann, David A. Patterson and John L. Hennessy.
• Some slides were adapted from the following courses:
  – UC Berkeley course “Computer Science 252: Graduate Computer Architecture” of David E. Culler, Copyright 2005 UCB
    • http://people.eecs.berkeley.edu/~culler/courses/cs252-s05/
  – Great Ideas in Computer Architecture (Machine Structures) by Randy Katz and Bernhard Boser
    • http://inst.eecs.berkeley.edu/~cs61c/fa16/
• I also refer to the following courses and lecture notes when preparing materials for this course:
  – Computer Science 152: Computer Architecture and Engineering, Spring 2016 by Dr. George Michelogiannakis from UC Berkeley
    • http://www-inst.eecs.berkeley.edu/~cs152/sp16/
  – Computer Science 252: Graduate Computer Architecture, Fall 2015 by Prof. Krste Asanović from UC Berkeley
    • http://www-inst.eecs.berkeley.edu/~cs252/fa15/
  – Computer Science S250: VLSI Systems Design, Spring 2016 by Prof. John Wawrzynek from UC Berkeley
    • http://www-inst.eecs.berkeley.edu/~cs250/sp16/
  – Computer System Architecture, Fall 2005 by Dr. Joel Emer and Prof. Arvind from MIT
    • http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-823-computer-system-architecture-fall-2005/
  – Synthesis Lectures on Computer Architecture
    • http://www.morganclaypool.com/toc/cac/1/1
• The slides of this course are for educational purposes only and should be used only in conjunction with the textbook. Derivatives of the slides must acknowledge the copyright notices of this and the originals. Permission for commercial purposes should be obtained from the original copyright holder and the successive copyright holders, including myself.

Contents
• Computers and computer components
• Computer architectures and great ideas in history and now
• Performance

The Computer Revolution
• Progress in computer technology
  – Underpinned by Moore’s Law
• Makes novel applications feasible
  – Computers in automobiles
  – Cell phones
  – Human genome project
  – World Wide Web
  – Search engines
• Computers are pervasive

Classes of Computers
• Personal Mobile Device (PMD)
  – e.g. smart phones, tablet computers
  – Emphasis on energy efficiency and real-time
• Desktop Computing
  – Emphasis on price-performance
• Servers
  – Emphasis on availability, scalability, throughput
• Clusters / Warehouse Scale Computers
  – Used for “Software as a Service (SaaS)”
  – Emphasis on availability and price-performance
  – Sub-class: Supercomputers, emphasis: floating-point performance and fast internal networks
• Embedded Computers
  – Emphasis: price

[Figure: The Post PC Era]

The Post PC Era
• Personal Mobile Device (PMD)
  – Battery operated
  – Connects to the Internet
  – Hundreds of dollars
  – Smart phones, tablets, electronic glasses
• Cloud computing
  – Warehouse Scale Computers (WSC)
  – Software as a Service (SaaS)
  – Portion of software run on a PMD and a portion run in the Cloud
  – Amazon and Google

[Figure: Old School Computer]
[Figure: New School Computer (#1): Personal Mobile Devices]
[Figure: New School “Computer” (#2)]

Components of a Computer
The BIG Picture
• Same components for all kinds of computer
  – Desktop, server, embedded
• Input/output includes
  – User-interface devices
    • Display, keyboard, mouse
  – Storage devices
    • Hard disk, CD/DVD, flash
  – Network adapters
    • For communicating with other computers

Inside the Processor (CPU)
• Functional units: perform computations
• Datapath: performs operations on data
• Control: sequences datapath, memory, ...
• Cache memory
  – Small fast SRAM memory for immediate access to data
[Figure: Apple A5]

A Safe Place for Data
• Volatile main memory
  – Loses instructions and data when powered off
• Non-volatile secondary memory
  – Magnetic disk
  – Flash memory
  – Optical disk (CDROM, DVD)

Contents
• Computers and computer components
• Computer architectures and great ideas in history and now
• Performance

What is “Computer Architecture”?
[Figure: layers of a computer system, from applications down to materials]
Applications
Operating System
Compiler, Firmware
Instruction Set Architecture
Instr. Set Proc., I/O system
Datapath & Control
Digital Design
Circuit Design
Layout & fab
Semiconductor Materials

The Instruction Set: a Critical Interface
[Figure: the instruction set as the interface between software and hardware]
• Properties of a good abstraction
  – Lasts through many generations (portability)
  – Used in many different ways (generality)
  – Provides convenient functionality to higher levels
  – Permits an efficient implementation at lower levels

Elements of an ISA
• Set of machine-recognized data types
  – bytes, words, integers, floating point, strings, ...
• Operations performed on those data types
  – Add, sub, mul, div, xor, move, ...
• Programmable storage
  – regs, PC, memory
• Methods of identifying and obtaining data referenced by instructions (addressing modes)
  – Literal, reg., absolute, relative, reg + offset, ...
• Format (encoding) of the instructions
  – Opcode, operand fields, ...

Computer Architecture: How things are put together in design and implementation
• Capabilities & Performance Characteristics of Principal Functional Units
  – (e.g., Registers, ALU, Shifters, Logic Units, ...)
• Ways in which these components are interconnected
• Information flows between components
• Logic and means by which such information flow is controlled
• Choreography of FUs to realize the ISA

Great Ideas in Computer Architectures
1. Design for Moore’s Law
2. Use abstraction to simplify design
3. Make the common case fast
4. Performance via parallelism
5. Performance via pipelining
6. Performance via prediction
7. Hierarchy of memories
8. Dependability via redundancy

Great Idea: “Moore’s Law”
Gordon Moore, Co-founder of Intel
• 1965: since the integrated circuit was invented, the number of transistors/inch² in these circuits roughly doubled every year; this trend would continue for the foreseeable future
• 1975: revised; circuit complexity doubles every two years
Image credit: Intel
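
As a rough illustration of the 1975 form of the law, the sketch below projects a transistor count forward under an idealized, fixed doubling period. The function name `projected_count` and the starting point (roughly 2,300 transistors for the Intel 4004 in 1971) are assumptions chosen for this example; they do not come from the slides.

```python
def projected_count(initial_count, years, doubling_period_years=2.0):
    """Idealized Moore's Law: the count doubles once per doubling period."""
    return initial_count * 2 ** (years / doubling_period_years)


# Assumed starting point: about 2,300 transistors (Intel 4004, 1971).
start_count = 2300
for elapsed in (10, 20, 30, 40):
    print(1971 + elapsed, f"{projected_count(start_count, elapsed):,.0f}")
```

Forty years at one doubling every two years is 20 doublings, so the 2011 projection is about 2.4 billion transistors. Real chips only follow this curve approximately.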

[Figure: Microprocessor Transistor Counts 1971-2011 & Moore's Law]
Source: https://en.wikipedia.org/wiki/Transistor_count

Moore’s Law trends
• More transistors = ↑ opportunities for exploiting parallelism at the instruction level (ILP)
  – Pipeline, superscalar, VLIW (Very Long Instruction Word), SIMD (Single Instruction Multiple Data) or vector, speculation, branch prediction
• General path of scaling
  – Wider instruction issue, longer pipeline
  – More speculation
  – More and larger registers and cache
• Increasing circuit density ~= increasing frequency ~= increasing performance
• Transparent to users
  – An easy way of getting better performance: buying faster processors (higher frequency)
• We have enjoyed this free lunch for several decades, however (TBD) …

Great Idea: Pipeline
Fundamental Execution Cycle
• Instruction Fetch: obtain instruction from program storage
• Instruction Decode: determine required actions and instruction size
• Operand Fetch: locate and obtain operand data
• Execute: compute result value or status
• Result Store: deposit results in storage for later use
• Next Instruction: determine successor instruction
[Figure: Processor (regs, F.U.s) connected to Memory (program, data); the link between them is the von Neumann bottleneck]

Pipelined Instruction Execution
[Figure: four instructions in program order overlapped in time (clock cycles 1 to 7); each instruction passes through the stages Ifetch, Reg, ALU, DMem, Reg]
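
The overlap in the figure can be sketched with a small timing model. It assumes an ideal five-stage pipeline (one cycle per stage, no stalls or hazards); the stage names are taken from the figure, while the function names are hypothetical.

```python
STAGES = ["Ifetch", "Reg", "ALU", "DMem", "Reg"]


def pipelined_cycles(num_instructions, num_stages=len(STAGES)):
    """Ideal pipeline: the first instruction fills it, then one finishes per cycle."""
    return num_stages + num_instructions - 1


def unpipelined_cycles(num_instructions, num_stages=len(STAGES)):
    """Without overlap, every instruction occupies all stages by itself."""
    return num_instructions * num_stages


def diagram(num_instructions):
    """One text row per instruction; instruction i enters the pipeline in cycle i + 1."""
    total = pipelined_cycles(num_instructions)
    rows = []
    for i in range(num_instructions):
        cells = ["."] * total
        for s, name in enumerate(STAGES):
            cells[i + s] = name
        rows.append(" ".join(f"{c:>6}" for c in cells))
    return rows


for row in diagram(4):
    print(row)
print(pipelined_cycles(4), "cycles pipelined vs", unpipelined_cycles(4), "unpipelined")
```

Under these assumptions, n instructions on a k-stage pipeline take k + n - 1 cycles instead of k * n, so throughput approaches one instruction per cycle as n grows.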

Great Idea: Abstraction (Levels of Representation/Interpretation)
High Level Language Program (e.g., C)
    temp = v[k]; v[k] = v[k+1]; v[k+1] = temp;
  ↓ Compiler
Assembly Language Program (e.g., MIPS)
    lw $t0, 0($2)
    lw $t1, 4($2)
    sw $t1, 0($2)
    sw $t0, 4($2)
  ↓ Assembler
Machine Language Program (MIPS)
    0000 1001 1100 0110 1010 1111 0101 1000
    1010 1111 0101 1000 0000 1001 1100 0110
    1100 0110 1010 1111 0101 1000 0000 1001
    0101 1000 0000 1001 1100 0110 1010 1111
  ↓ Machine Interpretation
Hardware Architecture Description (e.g., block diagrams)
  ↓ Architecture Implementation
Logic Circuit Description (Circuit Schematic Diagrams)
Anything can be represented as a number, i.e., data or instructions
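
To make the assembler step concrete, the sketch below packs the four MIPS instructions from this slide into 32-bit words using the standard I-type layout (6-bit opcode, 5-bit rs, 5-bit rt, 16-bit immediate). The helper name `encode_i_type` is hypothetical; the opcode and register numbers are the standard MIPS32 values. The bit patterns printed on the slide appear to be illustrative placeholders, so the output here is not expected to match them.

```python
def encode_i_type(opcode, rs, rt, imm):
    """Pack a MIPS I-type instruction: opcode(6) | rs(5) | rt(5) | immediate(16)."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


LW, SW = 0x23, 0x2B      # standard MIPS opcodes for load word / store word
T0, T1, BASE = 8, 9, 2   # register numbers of $t0, $t1 and the base register $2

program = [
    encode_i_type(LW, BASE, T0, 0),  # lw $t0, 0($2)
    encode_i_type(LW, BASE, T1, 4),  # lw $t1, 4($2)
    encode_i_type(SW, BASE, T1, 0),  # sw $t1, 0($2)
    encode_i_type(SW, BASE, T0, 4),  # sw $t0, 4($2)
]

for word in program:
    print(f"{word:032b}  0x{word:08X}")
```

Each instruction becomes a single integer, which is the sense in which instructions, like data, are just numbers.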

The Memory Abstraction
• Association of <name, value> pairs
  – typically named as byte addresses
  – often values aligned on multiples of size
• Sequence of Reads and Writes
• Write binds a value to an address
• Read of addr returns most recently written value bound to that address
[Figure: memory interface signals: address (name), command (R/W), data (W), data (R), done]
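
The read/write contract above can be sketched as a tiny class. This is a minimal model under stated assumptions: the class name `WordMemory`, the 4-byte word size, and the choice to return 0 for never-written addresses are all illustrative, not part of the slides.

```python
class WordMemory:
    """Memory as <address, value> pairs with word-aligned accesses."""

    WORD_BYTES = 4  # assumed word size for this sketch

    def __init__(self):
        self._cells = {}

    def _check_aligned(self, addr):
        if addr % self.WORD_BYTES != 0:
            raise ValueError(f"address {addr} is not word-aligned")

    def write(self, addr, value):
        """A write binds a value to an address."""
        self._check_aligned(addr)
        self._cells[addr] = value

    def read(self, addr):
        """A read returns the most recently written value (0 if never written)."""
        self._check_aligned(addr)
        return self._cells.get(addr, 0)


mem = WordMemory()
mem.write(8, 111)
mem.write(8, 222)   # rebinding: the later write wins
print(mem.read(8))
```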

Processor-DRAM Memory Gap (latency)
[Figure: performance (log scale, 1 to 1000) vs. time (1980 to 2000) for CPU and DRAM, annotated “Joy’s Law”]
• µProc: 60%/yr. (2X/1.5 yr)
• DRAM: 9%/yr. (2X/10 yrs)
• Processor-Memory Performance Gap: grows 50% / year
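
The "grows 50% / year" figure follows from the two growth rates on the slide: the ratio of processor to DRAM performance is multiplied by 1.60 / 1.09 each year. The sketch below works this out; the function names are hypothetical.

```python
def yearly_gap_factor(cpu_rate=0.60, dram_rate=0.09):
    """Factor by which the CPU/DRAM performance ratio grows each year."""
    return (1 + cpu_rate) / (1 + dram_rate)


def gap_after(years, cpu_rate=0.60, dram_rate=0.09):
    """Relative gap after a number of years, starting from a ratio of 1."""
    return yearly_gap_factor(cpu_rate, dram_rate) ** years


factor = yearly_gap_factor()
print(f"gap grows about {100 * (factor - 1):.0f}% per year")
print(f"relative gap after 10 years: about {gap_after(10):.0f}x")
```

The exact yearly factor is about 1.47, which the slide rounds to 50%.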

The Principle of Locality
• The Principle of Locality:
  – Programs access a relatively small portion of the address space at any instant of time.
• Two Different Types of Locality:
  – Temporal Locality (Locality in Time): If an item is referenced, it will tend to be referenced again soon (e.g., loops, reuse)
  – Spatial Locality (Locality in Space): If an item is referenced, items whose addresses are close by tend to be referenced soon (e.g., straight-line code, array access)
• Last 30 years, HW relied on locality for speed
[Figure: P, $, MEM (processor, cache, memory)]
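
Spatial locality can be illustrated by comparing two traversal orders over the same array. The sketch below generates the byte addresses touched by a row-by-row and a column-by-column walk of a row-major array, then counts how often consecutive accesses fall in different 64-byte blocks. The array shape, element size, and block size are assumptions for illustration only.

```python
def address_trace(rows, cols, row_major_walk, elem_size=4):
    """Byte addresses touched when walking a rows x cols array stored row by row."""
    if row_major_walk:
        order = [(r, c) for r in range(rows) for c in range(cols)]
    else:
        order = [(r, c) for c in range(cols) for r in range(rows)]
    return [(r * cols + c) * elem_size for r, c in order]


def block_switches(trace, block_size=64):
    """Count accesses that land in a different block than the previous access."""
    blocks = [addr // block_size for addr in trace]
    return sum(1 for prev, cur in zip(blocks, blocks[1:]) if prev != cur)


by_rows = block_switches(address_trace(64, 64, row_major_walk=True))
by_cols = block_switches(address_trace(64, 64, row_major_walk=False))
print("row-by-row walk:      ", by_rows, "block switches")
print("column-by-column walk:", by_cols, "block switches")
```

The row-by-row walk stays inside a block for 16 consecutive elements, while the column-by-column walk jumps to a new block on every access, which is why the former tends to be much friendlier to caches.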

Great Idea: Memory Hierarchy
Levels of the Memory Hierarchy

| Level | Capacity | Access Time | Cost | Staging Xfer Unit (to next lower level) | Managed by |
|---|---|---|---|---|---|
| CPU Registers | 100s Bytes | << 1s ns | | Instr. Operands, 1-8 bytes | prog./compiler |
| Cache | 10s-100s K Bytes | ~1 ns | $1s / MByte | Blocks, 8-128 bytes | cache cntl |
| Main Memory | M Bytes | 100 ns - 300 ns | $< 1 / MByte | Pages, 512-4K bytes | OS |
| Disk | 10s G Bytes | 10 ms (10,000,000 ns) | $0.001 / MByte | Files, Mbytes | user/operator |
| Tape | infinite | sec-min | $0.0014 / MByte | | |

Upper level: faster. Lower level: larger.

Jim Gray’s Storage Latency Analogy: How Far Away is the Data?

| Level | Latency (ns) | Analogy | Scaled time |
|---|---|---|---|
| Registers | 1 | My Head | 1 min |
| On Chip Cache | 2 | This Room | |
| On Board Cache | 10 | This Campus | 10 min |
| Main Memory | 100 | Lansing | 1.5 hr |
| Disk | 10^6 | Pluto | 2 Years |
| Tape / Optical Robot | 10^9 | Andromeda | 2,000 Years |

Jim Gray, Turing Award; B.S. Cal 1966, Ph.D. Cal 1969
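
The scaled times in the analogy come from stretching 1 ns to 1 minute. The sketch below redoes that arithmetic; the function names are hypothetical, and a 365-day year is assumed.

```python
MINUTES_PER_HOUR = 60
MINUTES_PER_YEAR = 60 * 24 * 365  # assumes a 365-day year


def scaled_hours(latency_ns):
    """Human-scale time in hours when 1 ns is stretched to 1 minute."""
    return latency_ns / MINUTES_PER_HOUR


def scaled_years(latency_ns):
    """Human-scale time in years when 1 ns is stretched to 1 minute."""
    return latency_ns / MINUTES_PER_YEAR


print(f"main memory (100 ns): {scaled_hours(100):.2f} hours")
print(f"disk (10^6 ns):       {scaled_years(10**6):.1f} years")
print(f"tape (10^9 ns):       {scaled_years(10**9):,.0f} years")
```

The results (about 1.7 hours, 1.9 years, and 1,900 years) agree with the slide's rounded values of 1.5 hr, 2 years, and 2,000 years.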

The Cache Design Space
• Several interacting dimensions
  – cache size
  – block size
  – associativity
  – replacement policy
  – write-through vs write-back
• The optimal choice is a compromise
  – depends on access characteristics
    • workload
    • use (I-cache, D-cache, TLB)
  – depends on technology / cost
• Simplicity often wins
[Figure: design space with axes Cache Size, Block Size, Associativity; a Good/Bad trade-off curve between Factor A and Factor B, from Less to More]
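
How these dimensions interact can be explored with a toy simulator. The sketch below models a set-associative cache with LRU replacement and reports the hit rate for a given address trace. It is a simplified illustration (reads only, no write policy, no timing), and the function name `hit_rate` and the example parameters are assumptions.

```python
from collections import OrderedDict


def hit_rate(trace, cache_bytes, block_bytes, ways):
    """Hit rate of a set-associative cache with LRU replacement (reads only)."""
    num_sets = cache_bytes // (block_bytes * ways)
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = 0
    for addr in trace:
        block = addr // block_bytes
        lines = sets[block % num_sets]
        if block in lines:
            hits += 1
            lines.move_to_end(block)       # mark as most recently used
        else:
            if len(lines) == ways:
                lines.popitem(last=False)  # evict the least recently used block
            lines[block] = True
    return hits / len(trace)


sequential = list(range(0, 1024, 4))  # one pass over 256 four-byte words
ping_pong = [0, 256] * 8              # two addresses that map to the same set

print(hit_rate(sequential, cache_bytes=256, block_bytes=16, ways=1))
print(hit_rate(sequential, cache_bytes=256, block_bytes=64, ways=1))
print(hit_rate(ping_pong, cache_bytes=256, block_bytes=16, ways=1))
print(hit_rate(ping_pong, cache_bytes=256, block_bytes=16, ways=2))
```

In this toy setup, larger blocks help the sequential walk, while the alternating trace only starts hitting once associativity is raised. This is one small instance of the point that the best choice depends on the workload.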

[Figure: Great Idea: Parallelism]

Defining Computer Architecture
• “Old” view of computer architecture:
  – Instruction Set Architecture (ISA) design
  – i.e. decisions regarding:
    • registers, memory addressing, addressing modes, instruction operands, available operations, control flow instructions, instruction encoding
• “Real” computer architecture:
  – Specific requirements of the target machine
  – Design to maximize performance within constraints: cost, power, and availability
  – Includes ISA, microarchitecture, hardware

Computer Architecture Topics
[Figure: topic map]
• Instruction Set Architecture: Addressing, Protection, Exception Handling
• Pipelining and Instruction Level Parallelism: Pipelining, Hazard Resolution, Superscalar, Reordering, Prediction, Speculation, Vector, Dynamic Compilation
• Memory Hierarchy: L1 Cache, L2/L3 Cache, DRAM; Coherence, Bandwidth, Latency; Interleaving, Bus protocols
• Input/Output and Storage: Disks, WORM, Tape; RAID; Emerging Technologies
• VLSI
• Network Communication, Other Processors

Why is Architecture Exciting Today?
[Figure: CPU clock speed over time: +15%/year, then CPU speed flat]

Problems of traditional ILP scaling
• Fundamental circuit limitations [1]
  – delays ⇑ as issue queues ⇑ and multi-port register files ⇑
  – increasing delays limit performance returns from wider issue
• Limited amount of instruction-level parallelism [1]
  – inefficient for codes with difficult-to-predict branches
• Power and heat stall clock frequencies
[1] The case for a single-chip multiprocessor, K. Olukotun, B. Nayfeh, L. Hammond, K. Wilson, and K. Chang, ASPLOS-VII, 1996.

[Figure: ILP impacts]
[Figure: Simulations of 8-issue Superscalar]

Power/heat density limits frequency
• Some fundamental physical limits are being reached

[Figure: We will have this …]

Revolution is happening now
• Chip density is continuing to increase ~2x every 2 years
  – Clock speed is not
  – Number of processor cores may double instead
• There is little or no hidden parallelism (ILP) to be found
• Parallelism must be exposed to and managed by software
  – No free lunch
Source: Intel, Microsoft (Sutter) and Stanford (Olukotun, Hammond)

Single Processor Performance
[Figure: single processor performance over time, annotated “RISC” and “Move to multi-processor”]

The trends
[Figure: supercomputer performance, 1950 to 2010, from 1 KFlop/s (10^3) to 1 PFlop/s (10^15); eras: Scalar, Super Scalar, Vector, Parallel, Super Scalar/Vector/Parallel; machines: EDSAC 1, UNIVAC 1, IBM 7090, CDC 6600, IBM 360/195, CDC 7600, Cray 1, Cray X-MP, Cray 2, TMC CM-2, TMC CM-5, Cray T3D, ASCI Red, ASCI White Pacific, IBM BG/L; annotation: 2X Transistors/Chip Every 1.5 Years]

| Year | Floating Point operations / second (Flop/s) |
|---|---|
| 1941 | 1 |
| 1945 | 100 |
| 1949 | 1,000 (1 KiloFlop/s, KFlop/s) |
| 1951 | 10,000 |
| 1961 | 100,000 |
| 1964 | 1,000,000 (1 MegaFlop/s, MFlop/s) |
| 1968 | 10,000,000 |
| 1975 | 100,000,000 |
| 1987 | 1,000,000,000 (1 GigaFlop/s, GFlop/s) |
| 1992 | 10,000,000,000 |
| 1993 | 100,000,000,000 |
| 1997 | 1,000,000,000,000 (1 TeraFlop/s, TFlop/s) |
| 2000 | 10,000,000,000,000 |
| 2005 | 131,000,000,000,000 (131 TFlop/s) |

[Figure: Recent multicore processors]

Recent manycore GPU processors

An Overview of the GK110 Kepler Architecture
Kepler GK110 was built first and foremost for Tesla, and its goal was to be the highest performing parallel computing microprocessor in the world. GK110 not only greatly exceeds the raw compute horsepower delivered by Fermi, but it does so efficiently, consuming significantly less power and generating much less heat output.

A full Kepler GK110 implementation includes 15 SMX units and six 64-bit memory controllers. Different products will use different configurations of GK110. For example, some products may deploy 13 or 14 SMXs.

Key features of the architecture that will be discussed below in more depth include:
• The new SMX processor architecture
• An enhanced memory subsystem, offering additional caching capabilities, more bandwidth at each level of the hierarchy, and a fully redesigned and substantially faster DRAM I/O implementation.
• Hardware support throughout the design to enable new programming model capabilities

[Figure: Kepler GK110 Full chip block diagram]

Streaming Multiprocessor (SMX) Architecture
Kepler GK110's new SMX introduces several architectural innovations that make it not only the most powerful multiprocessor we've built, but also the most programmable and power efficient.

[Figure] SMX: 192 single-precision CUDA cores, 64 double-precision units, 32 special function units (SFU), and 32 load/store units (LD/ST).

Kepler Memory Subsystem / L1, L2, ECC
Kepler's memory hierarchy is organized similarly to Fermi. The Kepler architecture supports a unified memory request path for loads and stores, with an L1 cache per SMX multiprocessor. Kepler GK110 also enables compiler-directed use of an additional new cache for read-only data, as described below.

64 KB Configurable Shared Memory and L1 Cache
In the Kepler GK110 architecture, as in the previous generation Fermi architecture, each SMX has 64 KB of on-chip memory that can be configured as 48 KB of shared memory with 16 KB of L1 cache, or as 16 KB of shared memory with 48 KB of L1 cache. Kepler now allows for additional flexibility in configuring the allocation of shared memory and L1 cache by permitting a 32KB / 32KB split between shared memory and L1 cache. To support the increased throughput of each SMX unit, the shared memory bandwidth for 64b and larger load operations is also doubled compared to the Fermi SM, to 256B per core clock.

48KB Read-Only Data Cache
In addition to the L1 cache, Kepler introduces a 48KB cache for data that is known to be read-only for the duration of the function. In the Fermi generation, this cache was accessible only by the Texture unit. Expert programmers often found it advantageous to load data through this path explicitly by mapping their data as textures, but this approach had many limitations.

• ~3k cores
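
The "~3k cores" figure can be checked from the numbers quoted above: a full GK110 has 15 SMX units with 192 single-precision CUDA cores each. The sketch below does the multiplication for the configurations the text mentions; the names are hypothetical.

```python
CUDA_CORES_PER_SMX = 192  # single-precision CUDA cores per SMX, as quoted above


def cuda_cores(num_smx):
    """Total single-precision CUDA cores for a GK110 configuration."""
    return num_smx * CUDA_CORES_PER_SMX


for smx_units in (13, 14, 15):
    print(smx_units, "SMX units:", cuda_cores(smx_units), "CUDA cores")
```

The full 15-SMX chip therefore has 2,880 single-precision cores, which is the roughly 3,000 the slide refers to.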

Current Trends in Architecture
• Cannot continue to leverage Instruction-Level parallelism (ILP)
  – Single processor performance improvement ended in 2003
• New models for performance:
  – Data-level parallelism (DLP)
  – Thread-level parallelism (TLP)
  – Heterogeneity
• These require explicit restructuring of the application

Parallelism
• Classes of parallelism in applications:
  – Data-Level Parallelism (DLP)
  – Task-Level Parallelism (TLP)
• Classes of architectural parallelism:
  – Instruction-Level Parallelism (ILP)
  – Vector architectures / Graphic Processor Units (GPUs)
  – Thread-Level Parallelism
  – Heterogeneity
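
The two application-level classes can be contrasted with a short sketch: data-level parallelism applies the same operation to different pieces of the data, while task-level parallelism runs different operations concurrently. The function names are hypothetical, and the example only shows program structure; in CPython, threads do not speed up CPU-bound work like this.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(data, num_chunks):
    """Split non-empty data into at most num_chunks contiguous pieces."""
    size = (len(data) + num_chunks - 1) // num_chunks
    return [data[i:i + size] for i in range(0, len(data), size)]


def data_parallel_sum(data, workers=4):
    """Data-level parallelism: the same operation (sum) on every chunk."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum, chunked(data, workers)))


def task_parallel_min_max(data):
    """Task-level parallelism: two different tasks run concurrently."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        lowest = pool.submit(min, data)
        highest = pool.submit(max, data)
        return lowest.result(), highest.result()


values = list(range(1, 101))
print(data_parallel_sum(values))
print(task_parallel_min_max(values))
```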

Architectural Challenges
• Massive (ca. 4X) increase in concurrency
  – Multicore (4 - <100) → Manycores (100s – 1ks)
• Heterogeneity
  – System-level (accelerators) vs chip level (embedded)
• Compute power and memory speed challenges (two walls)
  – 500x compute power and 30x memory of 2 PF HW
  – Memory access time lags further behind

[Figure: Three Eras of Processor Performance; each panel is marked “we are here”]
• Single-Core Era: Single-thread Performance vs. Time
  – Enabled by: …, Voltage Scaling, MicroArchitecture
  – Constrained by: Power, Complexity
• Multi-Core Era: Throughput Performance vs. Time (# of Processors)
  – Enabled by: …, Desire for Throughput, 20 years of SMP arch
  – Constrained by: Power, Parallel SW availability, Scalability
• Heterogeneous Systems Era: Targeted Application Performance vs. Time (Data-parallel exploitation)
  – Enabled by: …, Abundant data parallelism, Power efficient GPUs
  – Currently constrained by: Programming models, Communication overheads
Source: Chuck Moore, Data Processing in ExaScale-Class Computer Systems, Salishan, April 2011
![Page 49: Lecture 01: Introduc/on - GitHub PagesLecture 01: Introduc/on CSE 564 Computer Architecture Summer 2017 Department of Computer Science and Engineering Yonghong Yan yan@oakland.edu](https://reader030.vdocuments.us/reader030/viewer/2022040300/5e68a390f3d2512a6e3defb7/html5/thumbnails/49.jpg)
Exercise: Inspect ISA for sum
• cp ~yan/sum.c ~ (copy sum.c file from my home folder to your home folder)
• gcc -save-temps sum.c -o sum
• ./sum 102400
• vi sum.c
• vi sum.s
• Or check from:
  – https://passlab.github.io/CSE564/exercises/sum/
• View them from H drive
• Other system commands:
  – cat /proc/cpuinfo to show the CPU and # cores
  – top command to show system usage and memory
49
Backup
50
New-School Machine Structures
• Parallel Requests: assigned to computer, e.g., search "cats"
• Parallel Threads: assigned to core, e.g., lookup, ads
• Parallel Instructions: >1 instruction @ one time, e.g., 5 pipelined instructions
• Parallel Data: >1 data item @ one time, e.g., add of 4 pairs of words
• Hardware descriptions: all gates functioning in parallel at same time
Software and hardware together: Harness Parallelism & Achieve High Performance
[Figure: hierarchy from Warehouse-Scale Computer and Smart Phone, to Computer (Core … Core, Memory (Cache), Input/Output, Main Memory), to Core (Instruction Unit(s), Functional Unit(s) computing A3+B3, A2+B2, A1+B1, A0+B0), down to Logic Gates]
51
Coping with Failures
• 4 disks/server, 50,000 servers
• Failure rate of disks: 2% to 10% / year
  – Assume 4% annual failure rate
• On average, how often does a disk fail?
  a) 1 / month
  b) 1 / week
  c) 1 / day
  d) 1 / hour
52
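Coping with Failures
• 4 disks/server, 50,000 servers
• Failure rate of disks: 2% to 10% / year
  – Assume 4% annual failure rate
• On average, how often does a disk fail?
  a) 1 / month
  b) 1 / week
  c) 1 / day
  d) 1 / hour
• Worked answer:
  – 50,000 x 4 = 200,000 disks
  – 200,000 x 4% = 8000 disks fail per year
  – 365 days x 24 hours = 8760 hours per year
  – 8000 failures / 8760 hours ≈ 0.9 failures per hour, so roughly d) 1 / hour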
53
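The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch of the same calculation; the function name `disk_failures` and its parameters are made up for illustration and do not come from the lecture.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, as on the slide


def disk_failures(servers, disks_per_server, annual_failure_rate):
    """Return (total disks, expected failures per year, expected failures per hour)."""
    total_disks = servers * disks_per_server
    failures_per_year = total_disks * annual_failure_rate
    failures_per_hour = failures_per_year / HOURS_PER_YEAR
    return total_disks, failures_per_year, failures_per_hour


# Slide numbers: 50,000 servers, 4 disks each, assumed 4% annual failure rate.
total, per_year, per_hour = disk_failures(50_000, 4, 0.04)
print(f"{total} disks, {per_year:.0f} failures/year, {per_hour:.2f} failures/hour")
```

Rerunning it with 0.02 and 0.10 shows how the answer moves across the 2% to 10% range quoted on the slide.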
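Great Idea: Dependability via Redundancy
• Redundancy so that a failing piece doesn't make the whole system fail
[Figure: three units each compute 1+1; two return 2 and one returns 1 (FAIL!); 2 of 3 agree, so the voted result is 1+1=2]
Increasing transistor density reduces the cost of redundancy
54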
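The "2 of 3 agree" picture is a majority vote over replicated units. A minimal sketch of that idea follows; the names `majority_vote`, `good_adder`, and `faulty_adder` are hypothetical, and the off-by-one fault is invented only to reproduce the slide's 1+1=1 case.

```python
from collections import Counter


def majority_vote(results):
    """Return the value reported by a strict majority of replicas, else raise."""
    value, count = Counter(results).most_common(1)[0]
    if count * 2 <= len(results):
        raise ValueError("no majority among replicas")
    return value


def good_adder(a, b):
    return a + b


def faulty_adder(a, b):
    return a + b - 1  # hypothetical fault: result is off by one


# Three replicas compute the same sum; one of them is broken.
replicas = [good_adder, good_adder, faulty_adder]
outputs = [unit(1, 1) for unit in replicas]
voted = majority_vote(outputs)
print(outputs, "->", voted)
```

With three replicas the vote masks any single faulty unit; if two units fail in different ways there is no majority and the sketch raises instead of guessing.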
Great Idea: Dependability via Redundancy
• Applies to everything from datacenters to storage to memory to instructors
  – Redundant datacenters so that can lose 1 datacenter but Internet service stays online
  – Redundant disks so that can lose 1 disk but not lose data (Redundant Arrays of Independent Disks / RAID)
  – Redundant memory bits so that can lose 1 bit but no data (Error Correcting Code / ECC Memory)
55
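One common way to "lose 1 disk but not lose data" is to keep an XOR parity block alongside the data blocks. The slide does not say which RAID level it has in mind, so the single-parity scheme below is an assumption, and the names `xor_blocks` and `rebuild_missing` and the disk contents are made up for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Recover the single missing data block from the survivors plus parity."""
    return xor_blocks(surviving_blocks + [parity])


data_disks = [b"ARCH", b"CSE5", b"64!!"]  # made-up contents of three data disks
parity_disk = xor_blocks(data_disks)      # the redundant disk

lost_index = 1  # pretend the second data disk failed
survivors = [d for i, d in enumerate(data_disks) if i != lost_index]
recovered = rebuild_missing(survivors, parity_disk)
print(recovered == data_disks[lost_index])
```

The sketch relies on XOR being its own inverse: XOR-ing the parity with every surviving block cancels them out and leaves the missing block. It tolerates exactly one lost disk; a second failure would need more redundancy.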
Understanding Computer Architecture
[Image; source: de.pinterest.com]
56
End of Moore's Law?
Cost per transistor is rising as transistor size continues to shrink
57