MEMORY SYSTEM CHARACTERIZATION OF COMMERCIAL WORKLOADS
Authors:
Luiz André Barroso (Google, DEC; worked on Piranha)
Kourosh Gharachorloo (Compaq, DEC; worked on Dash and Flash)
Edouard Bugnion (one of the original founders of VMware; also worked on SimOS)
Presented by: David Eitel, March 31, 2010
Types of Commercial Applications
Online Transaction Processing (OLTP)
Decision Support Systems (DSS)
Web Index Search (WIS)
Source: S. Brin and L. Page. “The Anatomy of a Large-Scale Hypertextual Web Search Engine.”
Benchmarks
Oracle Database Engine
TPC-B Banking Benchmark for OLTP
TPC-D Benchmark for DSS (read-only queries)
AltaVista
Sources: http://georgiaconsortium.org/images/Banking-Coins.jpg, http://greencanada.files.wordpress.com/2009/04/databases.jpg, http://sixrevisions.com/web_design/popular-search-engines-in-the-90s-then-and-now/
Monitoring Results
Source: Fig. 4
OLTP has more complex queries than DSS/AV.
Low-latency access to non-primary caches is important because the OLTP working set is very large.
DSS has few cache misses; those that occur are mostly on large database tables.
Figure callouts:
Big CPI!
Lots of Bcache misses
Breakdown of the execution time
Sum of single- and dual-issue cycles
Pipeline and address translation related stalls
>75% memory stalls
Scache = secondary cache
Bcache = board-level cache
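The execution-time breakdown on this slide can be read as a sum of per-instruction cycle contributions. The sketch below shows that arithmetic in Python. The function name `cpi_breakdown` and every number in `example` are hypothetical, chosen only so that the memory-system share lands above the 75% mark called out on the slide; they are not the measured values from Fig. 4.

```python
def cpi_breakdown(components):
    """Return total CPI and each component's share of execution time."""
    total = sum(components.values())
    shares = {name: cycles / total for name, cycles in components.items()}
    return total, shares


# Hypothetical cycles-per-instruction contributions (NOT the paper's data).
example = {
    "issue": 1.0,         # sum of single- and dual-issue cycles
    "pipeline_tlb": 0.5,  # pipeline and address-translation stalls
    "scache": 1.5,        # stalls satisfied by the secondary cache
    "bcache": 1.5,        # stalls satisfied by the board-level cache
    "memory": 3.0,        # stalls that go all the way to main memory
}

total_cpi, shares = cpi_breakdown(example)

# Fraction of time lost anywhere in the memory system below the primary cache.
memory_system_share = shares["scache"] + shares["bcache"] + shares["memory"]

print(f"CPI = {total_cpi:.2f}, memory-system share = {memory_system_share:.0%}")
```

With these made-up inputs the issue cycles are a small slice of the total, which is the shape of result the slide summarizes as "Big CPI!".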
Simulation Results for OLTP
Source: Fig. 5
Associativity
Cache Size
Data capacity/Conflict misses
INST = instruction execution
CACHE = stalls within cache hierarchy
MEM = memory system stalls
Idle time increases with bigger caches.
The I/O latency cannot be hidden with faster processing rates.
Faster processing rates with a more efficient memory system = more commits ready for the log writer (I/O).
OLTP benefits from larger Bcaches.
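To make the cache size and associativity axes concrete, here is a minimal sketch of a set-associative cache with LRU replacement that only counts misses. It is an illustration, not the simulator used in the paper; the function `count_misses`, the 64-byte line size, and the toy address trace are all assumptions.

```python
from collections import OrderedDict


def count_misses(addresses, cache_bytes, line_bytes, ways):
    """Count misses for a set-associative cache with LRU replacement."""
    num_sets = cache_bytes // (line_bytes * ways)
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in addresses:
        line = addr // line_bytes
        lru = sets[line % num_sets]
        if line in lru:
            lru.move_to_end(line)        # hit: mark as most recently used
        else:
            misses += 1
            if len(lru) >= ways:
                lru.popitem(last=False)  # evict the least recently used line
            lru[line] = True
    return misses


# Toy trace: two addresses that collide in a 1 KB direct-mapped cache.
trace = [0, 1024] * 4

direct_mapped_1k = count_misses(trace, 1024, 64, ways=1)  # every access conflicts
two_way_1k = count_misses(trace, 1024, 64, ways=2)        # both lines share a set
direct_mapped_2k = count_misses(trace, 2048, 64, ways=1)  # lines map to different sets
```

In this toy trace either higher associativity or a larger cache removes the conflict misses, leaving only the two cold misses. Real OLTP traces are far larger, so the numbers here say nothing about the magnitudes in Fig. 5.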
More Simulation Results (OLTP and DSS)
DSS works well with current-sized caches because the working sets are small (few misses in on-chip caches).
Replacement and instruction miss rates are not affected by line size, which is good for larger cache sizes.
False sharing increases with cache line size.
What would be different if increased latency and bandwidth were accounted for when line size increases?
Are the results NOT valid because size(database) = size(main memory)?
Sources: Fig. 7 and Fig. 8
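The point about false sharing growing with line size can be illustrated with a small model. The sketch assumes an invalidation-based coherence protocol and a trace in which every address is private to one CPU, so any write to a line last written by a different CPU is counted as a false-sharing miss. The function `false_sharing_misses` and the trace are hypothetical and are not taken from the paper.

```python
def false_sharing_misses(writes, line_bytes):
    """Count writes that land on a line last written by a different CPU.

    writes: list of (cpu, address) pairs; each address is assumed to be
    private to one CPU, so every counted miss is false sharing.
    """
    last_writer = {}
    count = 0
    for cpu, addr in writes:
        line = addr // line_bytes
        previous = last_writer.get(line)
        if previous is not None and previous != cpu:
            count += 1
        last_writer[line] = cpu
    return count


# CPU 0 updates a counter at byte 0, CPU 1 updates one at byte 64, alternating.
writes = [(0, 0), (1, 64)] * 4

misses_by_line_size = {
    size: false_sharing_misses(writes, size) for size in (32, 64, 128, 256)
}
```

With 32- or 64-byte lines the two counters sit in different lines and never interfere. Once the line grows to 128 bytes they share a line, and every write after the first invalidates the other CPU's copy.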
Important Things to Remember
As the number of processors increases, communication stalls increase (see Fig. 6).
O/S activity and I/O latencies do not greatly affect the behavior of database engines.
OLTP has instruction and data locality that is helped by off-chip caches.
DSS and WIS have working sets that fit in memory; they are sensitive to on-chip caches.
Source: http://www.stress-treatment-21.com/wp-content/uploads/2009/05/thinking-monkey.bmp
Discussion Questions
What are some new commercial applications that have developed since this paper was written?
How much have the issues in this paper been addressed in recent architecture designs?
What should we focus on in the “parallel” future to increase performance for commercial applications?
Could we change commercial workloads to function more like scientific workloads to obtain performance gains?
Source: http://www.vosibilities.com/wp-content/uploads/2009/05/bpm-questions-you-should-ask-your-bpms-vendor1.jpg