![Page 1: ECE 259 / CPS 221 Advanced Computer Architecture II](https://reader031.vdocuments.us/reader031/viewer/2022012407/616a229211a7b741a34f2e1e/html5/thumbnails/1.jpg)
ECE 259 / CPS 221 Advanced Computer Architecture II (Parallel Computer Architecture)
Evaluation – Metrics, Simulation, and Workloads
Copyright 2010 Daniel J. Sorin, Duke University
![Page 2: ECE 259 / CPS 221 Advanced Computer Architecture II](https://reader031.vdocuments.us/reader031/viewer/2022012407/616a229211a7b741a34f2e1e/html5/thumbnails/2.jpg)
Outline
• Metrics
• Methodologies
  – Modeling
  – Simulation
• Workloads
(C) 2010 Daniel J. Sorin, ECE 259 / CPS 221
![Page 3: ECE 259 / CPS 221 Advanced Computer Architecture II](https://reader031.vdocuments.us/reader031/viewer/2022012407/616a229211a7b741a34f2e1e/html5/thumbnails/3.jpg)
Performance Metrics
• How do we tell if our design is good?
• Performance metrics
  – Clock speed (gigahertz)? No! Why not?
  – Instructions per cycle? No! Why not? (tougher question)
  – Database transactions per second?
• What’s important? Depends on workload …
  – Latency → interactive computing
  – Throughput → batch jobs/queries
  – Availability → enterprise applications
  – Power → mobile computing … and everything else now, too
  – Cost-efficiency → everything but perhaps supercomputing
![Page 4: ECE 259 / CPS 221 Advanced Computer Architecture II](https://reader031.vdocuments.us/reader031/viewer/2022012407/616a229211a7b741a34f2e1e/html5/thumbnails/4.jpg)
Metrics and Units
• Latency (an aspect of performance)
  – Response time
• Throughput (another aspect of performance)
  – Transactions per unit time (e.g., tpmC on TPC-C or QphH on TPC-H)
• Availability
  – How many “nines” (e.g., 5 nines = 99.999% available)
• Power: watts
• Energy: joules
• Hybrid metrics capture more than one aspect
  – Cost-efficiency: dollars-seconds
  – Power-delay (energy-delay): watts-seconds (joules-seconds)
  – Performability: combines performance with availability
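To make the units above concrete, here is a minimal Python sketch that converts a number of "nines" into expected downtime per year and computes energy and an energy-delay product from power and delay. The function names and the 100 W / 2 s example values are assumptions chosen for illustration, not figures from the lecture.

```python
# Illustrative helpers; names and example numbers are hypothetical.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # non-leap year


def availability_from_nines(nines: int) -> float:
    """Availability as a fraction, e.g. 5 nines -> 0.99999."""
    return 1.0 - 10.0 ** (-nines)


def downtime_seconds_per_year(nines: int) -> float:
    """Expected unavailable time per year for a given number of nines."""
    return SECONDS_PER_YEAR * 10.0 ** (-nines)


def energy_joules(power_watts: float, delay_seconds: float) -> float:
    """Energy = power x time (watt-seconds are joules)."""
    return power_watts * delay_seconds


def energy_delay_product(power_watts: float, delay_seconds: float) -> float:
    """Energy-delay product in joule-seconds."""
    return energy_joules(power_watts, delay_seconds) * delay_seconds


print(f"5 nines: about {downtime_seconds_per_year(5) / 60:.2f} minutes of downtime per year")
print(f"100 W for 2 s: {energy_joules(100.0, 2.0)} J, EDP {energy_delay_product(100.0, 2.0)} J*s")
```

Five nines works out to roughly five minutes of downtime per year, which is why each additional nine is expensive to achieve.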
![Page 5: ECE 259 / CPS 221 Advanced Computer Architecture II](https://reader031.vdocuments.us/reader031/viewer/2022012407/616a229211a7b741a34f2e1e/html5/thumbnails/5.jpg)
Secondary Metrics
• Metrics that we can use for insight, debugging, etc.
  – Quantify specific aspects of system, not holistic behavior
• Examples
  – Instructions per cycle (IPC)
  – Cache hit rates
  – Average memory request latency
  – Average network link utilization
  – Fraction of directory requests that require 3 hops
  – Etc.
• You can use these metrics to explain results
  – Otherwise, results are just inscrutable, unjustified numbers
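As a rough illustration of how such secondary metrics are derived, the Python sketch below computes IPC, cache hit rate, and average memory request latency from raw event counts. The `Counters` fields are hypothetical names for counters a simulator might report; they do not correspond to any particular tool's output format.

```python
from dataclasses import dataclass


@dataclass
class Counters:
    """Hypothetical raw event counts collected during one simulation run."""
    cycles: int
    instructions: int
    cache_accesses: int
    cache_hits: int
    memory_requests: int
    total_memory_latency_cycles: int  # sum of latencies over all memory requests


def ipc(c: Counters) -> float:
    """Instructions committed per cycle."""
    return c.instructions / c.cycles


def cache_hit_rate(c: Counters) -> float:
    """Fraction of cache accesses that hit."""
    return c.cache_hits / c.cache_accesses


def avg_memory_latency(c: Counters) -> float:
    """Mean latency, in cycles, of a memory request."""
    return c.total_memory_latency_cycles / c.memory_requests


run = Counters(cycles=1000, instructions=1500, cache_accesses=400,
               cache_hits=380, memory_requests=20,
               total_memory_latency_cycles=6000)
print(ipc(run), cache_hit_rate(run), avg_memory_latency(run))
```

Reporting these alongside the headline number lets a reader see, for example, whether a throughput gain comes from fewer misses or from lower miss latency.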
![Page 6: ECE 259 / CPS 221 Advanced Computer Architecture II](https://reader031.vdocuments.us/reader031/viewer/2022012407/616a229211a7b741a34f2e1e/html5/thumbnails/6.jpg)
Comparing to Prior Work
• How does your idea compare to prior work?
  – This is how we show that our idea is worthy of publication
  – E.g., 50% better throughput on TPC-C, but with 20% more power
• Why is comparison difficult?
  – Impossible to exactly reproduce experimental setup
• Example differences in experimental setup
  – Different system model
    » Different ISA, microarchitecture, network, etc.
  – Different workloads (or same workloads compiled differently)
    » Different OS
    » Even for same exact application, can have different jobs running in the background (e.g., kernel daemons)
  – Different simulator (or different configuration of simulator)
    » Assumptions about latencies, bandwidths, etc.
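The "50% better throughput, but with 20% more power" example on this slide can be folded into a single number. The sketch below uses throughput per watt, which is only one possible choice of hybrid metric and is an assumption here, not something the slide prescribes; the function name is hypothetical.

```python
def relative_perf_per_watt(throughput_gain: float, power_increase: float) -> float:
    """Ratio of new to baseline throughput-per-watt.

    Both arguments are fractional changes relative to the baseline,
    e.g. 0.50 for "50% better" and 0.20 for "20% more".
    """
    return (1.0 + throughput_gain) / (1.0 + power_increase)


ratio = relative_perf_per_watt(0.50, 0.20)
print(f"throughput per watt vs. baseline: {ratio:.2f}x")
```

Under this metric the new design is 1.5 / 1.2 = 1.25 times as efficient as the baseline. The ratio is only meaningful if both designs were measured under the same experimental setup.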
![Page 7: ECE 259 / CPS 221 Advanced Computer Architecture II](https://reader031.vdocuments.us/reader031/viewer/2022012407/616a229211a7b741a34f2e1e/html5/thumbnails/7.jpg)
Fair Comparisons
• Ideally, we’d make perfectly fair comparisons
  – Compare “apples and apples”
• If impossible, then give benefit of doubt to prior work
  – Assumptions about prior work should be optimistic
• Assumptions about our work should be pessimistic
  – Don’t assume that our 4MB cache can be accessed in 1 cycle
  – Find the worst-case scenario for our system
  – Assume that future trends will be less favorable than is likely
• Show that, even in our worst case, we still do well
  – Otherwise, readers will be less convinced
![Page 8: ECE 259 / CPS 221 Advanced Computer Architecture II](https://reader031.vdocuments.us/reader031/viewer/2022012407/616a229211a7b741a34f2e1e/html5/thumbnails/8.jpg)
Cost Effective Computing (Wood & Hill)
• DISCUSSION
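As a starting point for the discussion, here is a minimal Python sketch of the criterion usually associated with this paper: a parallel system is cost-effective when its speedup exceeds its "costup" (the parallel system's cost divided by the uniprocessor's cost). The dollar figures and run times are made-up illustrative values, and the function names are hypothetical.

```python
def speedup(time_uniprocessor: float, time_parallel: float) -> float:
    """How many times faster the parallel system finishes the job."""
    return time_uniprocessor / time_parallel


def costup(cost_parallel: float, cost_uniprocessor: float) -> float:
    """How many times more the parallel system costs."""
    return cost_parallel / cost_uniprocessor


def is_cost_effective(time_uni: float, time_par: float,
                      cost_par: float, cost_uni: float) -> bool:
    """True when speedup exceeds costup."""
    return speedup(time_uni, time_par) > costup(cost_par, cost_uni)


# Hypothetical numbers: an 8-processor machine that costs 3x the uniprocessor
# (shared memory and I/O dominate cost) and runs the job 4x faster.
print(is_cost_effective(time_uni=100.0, time_par=25.0,
                        cost_par=60_000.0, cost_uni=20_000.0))
```

In this made-up case the 8-processor machine is only 50% efficient (speedup 4 on 8 processors), yet it is still cost-effective because its cost grew by only 3x.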
![Page 9: ECE 259 / CPS 221 Advanced Computer Architecture II](https://reader031.vdocuments.us/reader031/viewer/2022012407/616a229211a7b741a34f2e1e/html5/thumbnails/9.jpg)
Outline
• Metrics
• Evaluation Methodologies
  – Modeling
  – Simulation
• Workloads
![Page 10: ECE 259 / CPS 221 Advanced Computer Architecture II](https://reader031.vdocuments.us/reader031/viewer/2022012407/616a229211a7b741a34f2e1e/html5/thumbnails/10.jpg)
Ways to Evaluate New Architectures
Tradeoff between three desired features
Building
![Page 11: ECE 259 / CPS 221 Advanced Computer Architecture II](https://reader031.vdocuments.us/reader031/viewer/2022012407/616a229211a7b741a34f2e1e/html5/thumbnails/11.jpg)
Building
• Construct a hardware prototype
  – ASIC vs. FPGA
• Advantages
  + Way cool to show off hardware to friends
  + Runs quickly
• Disadvantages
  – Takes long time (grad student time!) to build
  – Expensive
  – Not flexible (esp. ASIC)
ASICs generally too labor intensive for research studies, but FPGAs are viable options in many cases
![Page 12: ECE 259 / CPS 221 Advanced Computer Architecture II](https://reader031.vdocuments.us/reader031/viewer/2022012407/616a229211a7b741a34f2e1e/html5/thumbnails/12.jpg)
Modeling
• Mathematically model the system
  – Use probabilities and/or queuing models (see ECE 255/257)
• Advantages
  + Very flexible
  + Very quick to develop
  + Runs quickly
• Disadvantages
  – Cannot capture effects of system details
  – Architects are skeptical of models
Generally OK for back of the envelope estimates
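To show what a back-of-the-envelope queuing estimate can look like, the sketch below applies the textbook M/M/1 formulas (Poisson arrivals, exponential service, one server) to a single shared resource such as a memory controller. The choice of M/M/1 and the example numbers are assumptions for illustration; the slide does not say which queuing model to use.

```python
def mm1_utilization(arrival_rate: float, service_time: float) -> float:
    """Fraction of time the server is busy: rho = lambda * S."""
    return arrival_rate * service_time


def mm1_response_time(arrival_rate: float, service_time: float) -> float:
    """Mean time in system (queueing + service): R = S / (1 - rho)."""
    rho = mm1_utilization(arrival_rate, service_time)
    if rho >= 1.0:
        raise ValueError("queue is unstable: utilization must be below 1")
    return service_time / (1.0 - rho)


# Hypothetical memory controller: 10-cycle service time,
# one request arriving every 20 cycles on average.
print(mm1_utilization(0.05, 10.0))    # 0.5
print(mm1_response_time(0.05, 10.0))  # 20.0 cycles
```

Even at 50% utilization the average request takes twice its service time, a contention effect that is easy to estimate this way. Details such as bursty arrivals or request dependences are exactly what such a model cannot capture.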
![Page 13: ECE 259 / CPS 221 Advanced Computer Architecture II](https://reader031.vdocuments.us/reader031/viewer/2022012407/616a229211a7b741a34f2e1e/html5/thumbnails/13.jpg)
Simulating
• Write a program that mimics system behavior
• Advantages
  + Very flexible
  + Relatively quick to develop
• Disadvantages
  – Runs slowly (e.g., 30,000 times slower than hardware)
Method of choice for most architectural research
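To put the slowdown in perspective, this small Python sketch converts a span of target-machine execution time into simulator wall-clock time. The 30,000x figure comes from the slide; the function name and the one-second example are illustrative assumptions.

```python
def simulation_wall_clock_hours(target_seconds: float, slowdown: float = 30_000.0) -> float:
    """Wall-clock hours needed to simulate `target_seconds` of real execution."""
    return target_seconds * slowdown / 3600.0


hours = simulation_wall_clock_hours(1.0)
print(f"1 s of target execution takes about {hours:.1f} hours to simulate")
```

At this slowdown, one second of real execution costs more than eight hours of simulation, which is why simulated runs usually cover only a short slice of a workload.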
![Page 14: ECE 259 / CPS 221 Advanced Computer Architecture II](https://reader031.vdocuments.us/reader031/viewer/2022012407/616a229211a7b741a34f2e1e/html5/thumbnails/14.jpg)
Simulation Challenges
[Figure: block diagram with boxes labeled Workload, System description, Simulator, and Performance metrics, connected by arrows]
Tough problems associated with each arrow!
![Page 15: ECE 259 / CPS 221 Advanced Computer Architecture II](https://reader031.vdocuments.us/reader031/viewer/2022012407/616a229211a7b741a34f2e1e/html5/thumbnails/15.jpg)
Applications to Simulate
• We care how system does on important applications
• We’ll talk about this in a few slides …
![Page 16: ECE 259 / CPS 221 Advanced Computer Architecture II](https://reader031.vdocuments.us/reader031/viewer/2022012407/616a229211a7b741a34f2e1e/html5/thumbnails/16.jpg)
Describing Simulated System
• How detailed must our simulator be?
• Model every transistor in the processor?
  – Would take too long
• Abstract away details of processor organization?
  – Could miss important effects of processor features
  – Could achieve wrong conclusion
• Need balance
  – Model in detail only where necessary
  – E.g., model memory system in detail, but abstract disks
![Page 17: ECE 259 / CPS 221 Advanced Computer Architecture II](https://reader031.vdocuments.us/reader031/viewer/2022012407/616a229211a7b741a34f2e1e/html5/thumbnails/17.jpg)
Analytic Model of Shared Memory System
• Queuing model can capture behavior of system
• Optional reading from ISCA 1998: “Analytic Evaluation of Shared-Memory Parallel Systems with ILP Processors”
  – Models processor cores as request generators
  – Models cache coherent memory system (directory protocol) as queuing system where requests (customers) access
  – Outputs average utilizations, throughputs, waiting times, etc.
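To give a feel for this style of model, the sketch below implements generic textbook mean value analysis (MVA) for a single-class closed queueing network: a fixed number of outstanding requests cycle between a "think" delay (the processors generating requests) and a set of queueing resources. This is not the model from the ISCA 1998 paper. The resource list, think time, and service demands are hypothetical values chosen for illustration.

```python
from typing import List, Tuple


def mva(num_customers: int, think_time: float,
        service_demands: List[float]) -> Tuple[float, List[float], List[float]]:
    """Exact MVA for a single-class closed network.

    Returns (throughput, per-resource residence times, per-resource utilizations).
    """
    queue = [0.0] * len(service_demands)      # mean queue length at each resource
    residence = [0.0] * len(service_demands)  # mean time spent at each resource
    throughput = 0.0
    for n in range(1, num_customers + 1):
        # An arriving request sees the queue left by the other n-1 requests.
        residence = [d * (1.0 + q) for d, q in zip(service_demands, queue)]
        throughput = n / (think_time + sum(residence))
        queue = [throughput * r for r in residence]  # Little's law per resource
    utilization = [throughput * d for d in service_demands]
    return throughput, residence, utilization


# Hypothetical system: 4 outstanding requests, 90 cycles of compute between
# requests, and two resources (network: 5 cycles, directory/memory: 10 cycles).
x, r, u = mva(num_customers=4, think_time=90.0, service_demands=[5.0, 10.0])
print(f"throughput={x:.4f} req/cycle, residence={r}, utilization={u}")
```

A model like this runs in microseconds, so it can sweep parameters such as the number of processors far faster than a simulator. The price is that it ignores protocol details.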
![Page 18: ECE 259 / CPS 221 Advanced Computer Architecture II](https://reader031.vdocuments.us/reader031/viewer/2022012407/616a229211a7b741a34f2e1e/html5/thumbnails/18.jpg)
Some Processor Simulators
Classified along two axes: uniproc vs. MP, and not full-system vs. full-system
• SimpleScalar
• Simics (Virtutech)
• Simics + GEMS (Wisconsin)
• Simics + Flexus (CMU)
• M5 (Michigan)
• PTLsim (SUNY-Binghamton)
• ASIM (Intel)
• SimOS (Stanford)
• Liberty (Princeton)
• SESC (UCSC/Illinois)
• RSIM (Rice)
• Wisconsin Wind Tunnel
![Page 19: ECE 259 / CPS 221 Advanced Computer Architecture II](https://reader031.vdocuments.us/reader031/viewer/2022012407/616a229211a7b741a34f2e1e/html5/thumbnails/19.jpg)
Simics
• Simics is a full-system commercial simulator
• PRESENTATION
![Page 20: ECE 259 / CPS 221 Advanced Computer Architecture II](https://reader031.vdocuments.us/reader031/viewer/2022012407/616a229211a7b741a34f2e1e/html5/thumbnails/20.jpg)
RAMP
• PRESENTATION
![Page 21: ECE 259 / CPS 221 Advanced Computer Architecture II](https://reader031.vdocuments.us/reader031/viewer/2022012407/616a229211a7b741a34f2e1e/html5/thumbnails/21.jpg)
Outline
• Metrics
• Methodologies
  – Modeling
  – Simulation
• Workloads
![Page 22: ECE 259 / CPS 221 Advanced Computer Architecture II](https://reader031.vdocuments.us/reader031/viewer/2022012407/616a229211a7b741a34f2e1e/html5/thumbnails/22.jpg)
Workloads
• We care how system does on important applications
• But who defines “important”? (I do!)
• Types of applications
  – Scientific (genomics, weather simulation, protein folding)
  – Commercial (database, web serving, application serving)
  – Desktop (office productivity software, multimedia)
  – Portable (voice recognition)
  – ???
![Page 23: ECE 259 / CPS 221 Advanced Computer Architecture II](https://reader031.vdocuments.us/reader031/viewer/2022012407/616a229211a7b741a34f2e1e/html5/thumbnails/23.jpg)
DEC/Compaq/Intel (?) Workload Analysis
• Commercial workloads are different from scientific
• PRESENTATION
![Page 24: ECE 259 / CPS 221 Advanced Computer Architecture II](https://reader031.vdocuments.us/reader031/viewer/2022012407/616a229211a7b741a34f2e1e/html5/thumbnails/24.jpg)
“Simulating $2M Server on $2K PC”
• Commercial workloads are different from scientific
• Simulating them requires extra work
• PRESENTATION