Architectural Musings - IBM
Architectural Musings: Rethinking Computer Systems Architecture & Evaluation
Christopher Vick
March 23, 2014
Introduction
§ Vision Talk
  § How should we analyze, reason about and evaluate Computer System Architecture in the 21st century?
  § What can history tell us about these questions?
  § What does this mean for the research community?
§ Mobile computing and current technologies fundamentally change key parameters and constraints for computer system architecture
§ Vast new opportunities for research of great interest to and great relevance for industry
Outline
§ Computer System Architecture
§ Then (Circa 1970)
  § Scarce Resources & Bottlenecks
  § Optimizations
  § Evaluation
§ Now (Mobile Computing Platforms)
  § Scarce Resources & Bottlenecks
  § Optimizations?
  § Evaluation?
§ Questions?
COMPUTER SYSTEM ARCHITECTURE
Computer System Architecture
§ Hardware
  § The 5 classic components (Patterson & Hennessy)
    § Input, Output, Memory, Datapath, Control
§ Software
  § System Virtual Machine (Hypervisor, VM, or VMM)
  § Operating System
  § Compilers & Tools
§ Definitions
  § The way components fit together
  § The arrangement of the various devices in a complete computer system or network
  § The instruction set plus a model of the execution of the instruction set (Amdahl et al.)
§ Computer System Architecture
  § The selection and combination of hardware and software components to assemble an effective computer system
Combination
[Figure: system stack diagram with Software and Hardware regions. Labels: Application Programs, Libraries, Virtual Machine, Operating System, Drivers, Memory Manager, Scheduler, Hypercall Interface, Multicore Execution Unit, Interconnect, IO Devices, Memory.]
Effective
§ An optimization problem
§ Many variables
  § Selection of hardware/software components
  § Selection of interfaces/interconnects
§ Many constraints
  § Physical, sociological, technical & cost constraints
§ Scarce Resources and Bottlenecks
  § Maximize utilization of scarce resources
  § Minimize impact of bottlenecks
§ Evaluation
  § How do you measure effectiveness?
  § What effect does the evaluation have on the optimization?
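To make the "optimization problem" framing concrete, here is a minimal sketch of component selection as a search over variables under a cost constraint. Everything in it is hypothetical and not taken from the talk: the component names, their costs, the budget, and the scoring function. A real evaluation function would be far harder to define, which is the point of the Evaluation bullets.

```python
from itertools import product


def best_configuration(options, budget, score):
    """Exhaustively pick one candidate per slot, maximizing score within budget.

    options: dict mapping a slot name to a list of (component_name, cost).
    budget:  maximum total cost allowed (the constraint).
    score:   function taking {slot: component_name} and returning a number
             (the evaluation; entirely hypothetical here).
    Returns (score, configuration, cost), or None if nothing fits the budget.
    """
    slots = list(options)
    best = None
    for combo in product(*(options[s] for s in slots)):
        cost = sum(c for _, c in combo)
        if cost > budget:
            continue  # violates the cost constraint
        config = dict(zip(slots, (name for name, _ in combo)))
        value = score(config)
        if best is None or value > best[0]:
            best = (value, config, cost)
    return best


# Hypothetical candidates and a toy additive scoring function.
OPTIONS = {
    "cpu": [("small", 10), ("big", 30)],
    "memory": [("1GB", 5), ("2GB", 12)],
}
WEIGHTS = {"small": 1, "big": 3, "1GB": 1, "2GB": 2}


def toy_score(config):
    return sum(WEIGHTS[name] for name in config.values())


result = best_configuration(OPTIONS, budget=40, score=toy_score)
print(result)
```

Changing `toy_score` changes which configuration wins, which illustrates how the choice of evaluation feeds back into the optimization.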
THEN (CIRCA 1970)
Photo 1
Photo 2
Scarce Resources
§ CPU Cycles
  § CPUs expensive
  § Slow clock rates
§ Memory Locations
  § Random Access Memory expensive
  § Address/Data paths into CPU expensive
§ Skilled Programmers
  § Relatively new discipline
  § Poor language and tools support
Photo 3
Bottlenecks
§ Programmer Productivity
  § Software development slow and expensive
  § Low level programming paradigms
§ Memory Latency
  § RAM latency gated overall speed (~2-3 MHz)
  § Small RAM backed by vastly slower storage
§ I/O Bandwidth
  § Limited CPU connectivity
  § Crude communication mechanisms
Photo 4
Optimizations
§ Time Sharing
  § Effective sharing of limited resource
§ Virtual Memory
  § Effective sharing, and backing with cheaper alternative
§ Hardware Improvements
  § Smaller features provide more resource and faster clock
  § Large Scale Integration
  § Better signaling to improve bandwidth
§ High Level Programming Languages
  § Broadens productive programmer community
  § Abstracts away some hardware complexity
Evaluation
§ Started with primitive measures
  § MIPS
  § SLOC
§ Worked towards more sophisticated evaluation tools
  § Hennessy & Patterson very influential
  § SPEC CPU
  § TPM
  § Defect rate
§ Cost is always a factor
Examples
§ Digital PDP-11
  § 16-bit address space
  § Orthogonal instruction set
  § Memory mapped I/O
  § Unix, DOS, many others
§ IBM System/370
  § 24-bit address space
  § Virtual Memory
  § MVS, VM/370, DOS/VS
  § Backward compatibility with System/360
Photo 5
Photo 6
NOW (MOBILE COMPUTING)
Scarce Resources
§ Energy
  § Fixed Energy Budget for mobile devices
  § Thermal issues at all scales
  § Tradeoff between performance and energy
  § Process shrinks no longer significantly improve energy consumption
§ Memory Bandwidth
  § Providing bandwidth is expensive
  § Memory interconnect consumes significant energy
Bottlenecks
§ Memory Latency
  § Increasing gap between CPU speed and DRAM latency
  § Physical distance to DRAM devices a factor
§ Concurrency
  § Shortage of programmers who can handle this
  § Inadequate language/tools support
§ I/O Bandwidth/Latency
  § Wireless bandwidth lower than wired
  § Consumes large amounts of energy
Photo 7
Example
§ Samsung Galaxy S5
  § Processor: 2.5 GHz Qualcomm® Snapdragon™ 801 (Quad Core)
  § GPU: Qualcomm® Adreno 330
  § OS: Android™ 4.4.2
  § Memory RAM: 2 GB DDR2
  § Memory Storage: 16/32/64 GB onboard storage
  § Display: 5” AMOLED 1920 x 1080 HD
  § Network: LTE Cat 4, CDMA, UMTS/HSPA, GSM/GPRS/EDGE
  § Battery: 2600 mAh
  § Camera (Main): 16.5 megapixel, Ultra HD
  § Dimensions: 142 x 73 x 8.1 mm
§ This is a General Purpose Computer!
Optimizations?
§ Multi-core
  § Aggressive addition of cores and threads
  § Hardware concurrency outstripping software
  § New Concurrent Programming Models/Tools?
§ Memory Subsystem
  § Significant contributor to total energy consumption
  § Adding bandwidth is expensive
  § New technologies addressing some energy issues
§ Wireless bandwidth enhancements (LTE Advanced, etc.)
§ Solutions from desktop/server or embedded worlds may not directly apply in mobile space!
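One standard way to see why "hardware concurrency outstripping software" matters is Amdahl's law, which the slides do not cite for this purpose; it is brought in here only as an illustration. The sketch below assumes a workload with a fixed parallelizable fraction; the function name and the sample fractions are illustrative.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup on `cores` cores when only `parallel_fraction`
    of the work can run in parallel (Amdahl's law)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# With half the work serial, going from 4 to 8 cores buys very little.
for n in (1, 2, 4, 8):
    print(n, round(amdahl_speedup(0.5, n), 3))
```

Under this model, adding cores helps only as far as software exposes parallelism, which is consistent with the slide's call for new concurrent programming models and tools.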
Memory System Energy
§ Retaining data (one second)
  § DRAM: ~1-10 pJ/bit self-refresh
  § SRAM: 1200+ pJ/bit, and rising over time [ITRS 2009]
    § 4 pJ/bit (45 nm LP, standby) [Barasinski et al., ESSCIRC ‘08]
  § Flash, PCM, STT RAM…: Zero!
§ Moving Data
  § 32-bit value:
    § Recompute: 60 pJ (Razor)
    § Send 1 mm: 10 pJ
    § Retain in cache for 1 ms: 38 pJ
    § Retain in DRAM for 1 second: 32+ pJ
Photo 8
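The per-value figures on this slide invite a quick break-even comparison. The sketch below uses only the slide's numbers for a 32-bit value. It assumes that retention energy scales linearly with time and transfer energy scales linearly with distance, and it ignores DRAM access energy; those simplifications and all function names are assumptions made for illustration.

```python
# Energy figures for one 32-bit value, from the slide (picojoules).
RECOMPUTE_PJ = 60.0      # recompute the value (Razor)
SEND_PJ_PER_MM = 10.0    # move the value 1 mm
CACHE_PJ_PER_MS = 38.0   # retain in cache for 1 ms
DRAM_PJ_PER_S = 32.0     # retain in DRAM for 1 s (slide says "32+")


def cache_retain_pj(ms: float) -> float:
    """Energy to hold the value in cache (assumed linear in time)."""
    return CACHE_PJ_PER_MS * ms


def dram_retain_pj(ms: float) -> float:
    """Energy to hold the value in DRAM (assumed linear in time)."""
    return DRAM_PJ_PER_S * ms / 1000.0


def send_pj(mm: float) -> float:
    """Energy to move the value over a distance (assumed linear)."""
    return SEND_PJ_PER_MM * mm


def cheapest(ms_until_reuse: float, dram_distance_mm: float) -> str:
    """Lowest-energy way to have the value available again after a delay."""
    options = {
        "recompute": RECOMPUTE_PJ,
        "cache": cache_retain_pj(ms_until_reuse),
        "dram": dram_retain_pj(ms_until_reuse) + send_pj(dram_distance_mm),
    }
    return min(options, key=options.get)


# Time after which holding the value in cache costs more than recomputing it.
cache_break_even_ms = RECOMPUTE_PJ / CACHE_PJ_PER_MS
print(round(cache_break_even_ms, 2), "ms")
print(cheapest(1.0, 10.0), cheapest(5.0, 2.0), cheapest(5.0, 10.0))
```

Under these assumptions, caching wins only for short reuse intervals. Beyond roughly a millisecond and a half, recomputing or fetching from nearby DRAM becomes cheaper, which motivates "move less" and "retain less".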
Reducing Memory System Energy
§ Move less!
  § Caches physically close to CPU
  § Locality, locality, locality (the first rule of chip real estate)
§ Retain less!
  § Power off unused cache lines [Kaxiras et al., ISCA ‘01]
  § “Drowsy” caches [Flautner et al., ISCA ‘02]
    § … with compiler analysis [Zhang et al., Trans. Emb. Comp. Sys. 4(3) 2005]
  § Don’t refresh unused DRAM
    § … e.g. with garbage collection [Chen et al., CODES+ISSS ‘03]
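The "retain less" idea of powering off idle cache lines can be illustrated with a toy model. This sketch is only loosely in the spirit of the cited work and is not the authors' mechanism. It assumes a line is switched off after a fixed idle interval and switched back on at its next access, and it ignores the extra miss and refetch cost of a line that lost its contents. The function name and the sample trace are hypothetical.

```python
def powered_line_time(access_times, horizon, decay_interval=None):
    """Total time one cache line is powered during [0, horizon].

    access_times:   sorted times at which the line is accessed.
    decay_interval: if None, the line is always powered. Otherwise the line
                    is powered off once idle for this long and powered on
                    again at the next access (toy model; refetch cost ignored).
    """
    if decay_interval is None:
        return horizon
    powered = 0
    for i, t in enumerate(access_times):
        next_t = access_times[i + 1] if i + 1 < len(access_times) else horizon
        # Powered from this access until the next one or until it decays.
        powered += min(next_t - t, decay_interval)
    return powered


trace = [0, 10, 500]  # hypothetical access times, arbitrary units
always_on = powered_line_time(trace, horizon=1000)
with_decay = powered_line_time(trace, horizon=1000, decay_interval=100)
print(always_on, with_decay)
```

Whether the saving survives in practice depends on the cost of refetching decayed lines, which this model deliberately leaves out.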
Extending the Memory Model
§ Maintaining the illusion of a single flat memory address space is too expensive
  § On-chip caches can be major consumers of area and energy
  § Coherence protocols are expensive and difficult to scale
§ Alternative: software-managed memory hierarchies
  § Tightly-coupled memory (TCM), scratchpads
  § Do not require tag memory, address comparison logic
  § More area- and energy-efficient
  § Help bridge gap between bandwidth and throughput
New Challenges and Opportunities
§ Different programming paradigm: software explicitly orchestrates all transfers between on-chip and off-chip memory areas
§ Major implications on memory management
  § Scratchpad allocation strategies
  § Data partitioning strategies
  § Dynamic relocation between scratchpad and DRAM to track the program’s locality characteristics
§ Opportunities for compile-time and runtime optimization
§ Challenges in both Hardware and Software!
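As a rough illustration of software explicitly orchestrating transfers, the sketch below processes an array one scratchpad-sized tile at a time. Python lists stand in for DRAM and the scratchpad, and slice copies stand in for what would be DMA transfers on real hardware. The capacity constant and function name are hypothetical.

```python
SCRATCHPAD_WORDS = 4  # hypothetical scratchpad capacity, in words


def scale_in_tiles(dram, factor, tile=SCRATCHPAD_WORDS):
    """Scale every element of `dram` in place, one tile at a time.

    Returns the number of words moved between DRAM and the scratchpad,
    which is the traffic the software is now responsible for managing.
    """
    words_moved = 0
    for base in range(0, len(dram), tile):
        # Explicit transfer in: DRAM -> scratchpad.
        scratchpad = dram[base:base + tile]
        words_moved += len(scratchpad)
        # Compute entirely out of the scratchpad.
        for i in range(len(scratchpad)):
            scratchpad[i] *= factor
        # Explicit transfer out: scratchpad -> DRAM.
        dram[base:base + len(scratchpad)] = scratchpad
        words_moved += len(scratchpad)
    return words_moved


data = list(range(10))
moved = scale_in_tiles(data, 2)
print(data, moved)
```

The tile size, the partitioning of the data, and when to relocate it are all decisions a cache would have made implicitly. Here they become allocation and partitioning problems for the compiler or runtime.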
Evaluation
§ Energy/Power
  § Both matter
  § MIPS/Watt
  § Battery life
  § Hard to measure and lacking in precision
§ Performance
  § Currently rather primitive
    § Linpack, CaffeineMark, CoreMark, Quadrant
    § SPEC CPU
  § Following similar track to early PC evaluation, so should get more sophisticated
§ Need to more accurately measure/reflect the utility of the device
  § Balancing peak performance, throughput, battery life, etc.
§ Cost
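The two energy metrics named above can be written down in a few lines. This is a minimal sketch: the 2600 mAh capacity comes from the Galaxy S5 slide, while the 3.85 V nominal cell voltage, the 2 W average power draw, and the instruction count are assumptions chosen only to make the arithmetic concrete.

```python
def battery_energy_wh(capacity_mah: float, nominal_voltage: float) -> float:
    """Stored energy in watt-hours for a given capacity and nominal voltage."""
    return capacity_mah / 1000.0 * nominal_voltage


def battery_life_hours(capacity_mah: float, nominal_voltage: float,
                       avg_power_w: float) -> float:
    """Estimated runtime at a constant average power draw."""
    return battery_energy_wh(capacity_mah, nominal_voltage) / avg_power_w


def mips_per_watt(instructions: float, seconds: float,
                  avg_power_w: float) -> float:
    """Millions of instructions per second, per watt of average power."""
    mips = instructions / seconds / 1e6
    return mips / avg_power_w


# 2600 mAh is from the slide; 3.85 V and 2 W are assumed values.
hours = battery_life_hours(2600, 3.85, 2.0)
efficiency = mips_per_watt(2e9, 1.0, 2.0)
print(round(hours, 2), "hours;", efficiency, "MIPS/W")
```

Both numbers hinge on the average power figure, which is exactly the quantity the slide calls hard to measure and lacking in precision.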
Thank You
Photo Copyright Notices