TRANSCRIPT
Memory System Characterization of Big Data Workloads Martin Dimitrov, Karthik Kumar, Patrick Lu, Vish Viswanathan, Thomas Willhalm
INTEL CONFIDENTIAL
Agenda
Why big data memory characterization?
• Workloads, Methodology and Metrics
• Measurements and results
• Conclusion and outlook
Why big data memory characterization?
• Studies show exponential data growth to come.
• Big Data: information from unstructured data
• Primary technologies are Hadoop and NoSQL
Why big data memory characterization?
It is important to understand the memory usage of big data:
• Power: memory consumes up to 40% of total server power
• Performance: memory latency, capacity, and bandwidth are important
Large data volumes can put pressure on the memory subsystem.
Optimizations trade off CPU cycles to reduce the load on memory, e.g. compression.
Why big data memory characterization?
• DRAM scaling is hitting limits
• Emerging memories have higher latency
• Focus on latency-hiding optimizations
How do latency-hiding optimizations apply to big data workloads?
Executive Summary
• Provide insight into memory access characteristics of big data applications
• Examine implications on prefetchability, compressibility, cacheability
• Understand impact on memory architectures for big data usage models
Agenda
• Why big data memory characterization?
Workloads, Methodology and Metrics
• Measurements and results
• Conclusion and outlook
Big Data workloads
• Sort
• WordCount
• Hive Join
• Hive Aggregation
• NoSQL indexing
We analyze these workloads using hardware DIMM traces, performance counter monitoring, and performance measurements
General Characterization
Memory footprint from DIMM trace:
• Memory in GB touched at least once by the application
• Amount of memory needed to keep the workload "in memory"
EMON:
• CPI
• Cache behavior: L1, L2, LLC MPI
• Instruction and data TLB MPI
Understand how the workloads use memory
Cache Line Working Set Characterization
1. For each cache line, compute number of times it is referenced
2. Sort cache lines by their number of references
3. Select a footprint size, say X MB
4. What fraction of total references is contained in X MB of the hottest cache lines?
Identifies the hot working set of application
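The four steps above can be sketched as follows (a minimal sketch, assuming the trace is an iterable of byte addresses and 64-byte cache lines):

```python
from collections import Counter

def hot_footprint_coverage(trace, footprint_bytes, line_size=64):
    """Fraction of all references contained in the hottest cache lines
    that fit within footprint_bytes."""
    # 1. For each cache line, count how often it is referenced.
    refs = Counter(addr // line_size for addr in trace)
    # 2. Sort cache lines by reference count, hottest first.
    hottest = sorted(refs.values(), reverse=True)
    # 3. How many of the hottest lines fit in the chosen footprint?
    n_lines = footprint_bytes // line_size
    # 4. Fraction of total references covered by those lines.
    return sum(hottest[:n_lines]) / sum(hottest)

# Toy trace: line 0 referenced 4 times, lines 1 and 2 once each;
# a 64-byte footprint (one line) captures 4 of 6 references.
print(hot_footprint_coverage([0, 0, 0, 64, 128, 0], 64))
```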
Cache Simulation
Run the workload through an LRU cache simulator and vary the cache size.
Considers the temporal nature of accesses, not only spatial:
• Streaming through regions larger than the cache size
• Eviction and replacement policies impact cacheability
• Focus on smaller sub-regions
Hit rates indicate the potential for cacheability in a tiered memory architecture
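A minimal version of such a simulator, assuming a fully associative LRU cache with 64-byte lines (the slide does not specify associativity):

```python
from collections import OrderedDict

def lru_hit_rate(trace, cache_bytes, line_size=64):
    """Simulate a fully associative LRU cache over an address trace
    and return the hit rate."""
    capacity = cache_bytes // line_size
    cache = OrderedDict()  # cache line -> None, kept in LRU order
    hits = 0
    for addr in trace:
        line = addr // line_size
        if line in cache:
            hits += 1
            cache.move_to_end(line)  # mark as most recently used
        else:
            cache[line] = None
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return hits / len(trace)
```

Streaming through a region larger than the cache defeats LRU entirely: for example, cycling through three lines with a two-line cache yields a 0% hit rate, which is why the temporal view matters and not just the footprint.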
Entropy
• Compressibility and predictability are important
• A signal with high information content is harder to compress and difficult to predict
• Entropy helps quantify this behavior. For a set of cache lines K, with p_i the fraction of references going to line i, the formula can be reconstructed as the normalized Shannon entropy (consistent with the example below):

  H(K) = -(1 / log|K|) * Σ_{i∈K} p_i log p_i

Lower entropy → more compressibility, predictability
Entropy - example
Example: three reference distributions, each with a 640B footprint (ten 64-byte cache lines), 100 references, and an average of 10 references per line:

|                      | (A)  | (B)   | (C)   |
|----------------------|------|-------|-------|
| 64-byte cache hits   | 10%  | 19%   | 91%   |
| 192-byte cache hits  | 30%  | 57%   | 93%   |
| Entropy              | 1    | 0.785 | 0.217 |
Lower entropy → more compressibility, predictability
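The extreme cases of the example can be checked numerically. This sketch assumes the normalized Shannon entropy over per-line reference fractions; case (A) is uniform (ten lines, 10 references each) and case (C) concentrates 91 of 100 references on one hot line:

```python
import math

def normalized_entropy(ref_counts):
    """Normalized Shannon entropy of a cache-line reference distribution:
    H = -sum(p_i * log p_i) / log K, where K is the number of lines.
    H = 1 -> uniform references (hard to compress or predict);
    H near 0 -> references concentrated on a few hot lines."""
    total = sum(ref_counts)
    ps = [c / total for c in ref_counts if c > 0]
    h = -sum(p * math.log(p) for p in ps)
    return h / math.log(len(ref_counts))

# Case (A): ten lines, 10 references each -> maximum entropy
print(round(normalized_entropy([10] * 10), 3))       # 1.0
# Case (C): one line with 91 references, nine with 1 each
print(round(normalized_entropy([91] + [1] * 9), 3))  # 0.217
```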
Correlation and Trend Analysis
Examine trace for trendsEg: increasing trend in upper physical address ranges Aggressively prefetch to an upper cache
• With s = 64, l=1000, test function f mimics ascending stride through memory of 1000 cache lines
• Negative correlation with f indicates decreasing trend
High correlation strong trend predict, prefetch
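The exact form of the test function f is not reproduced in the transcript; the sketch below assumes a sawtooth f(i) = s * (i mod l), i.e. an ascending stride of s bytes repeating every l cache lines, and plain Pearson correlation against the address trace:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def trend_correlation(trace, s=64, l=1000):
    """Correlate an address trace with a hypothetical sawtooth test
    function f(i) = s * (i mod l): an ascending stride of s bytes
    through a region of l cache lines."""
    f = [s * (i % l) for i in range(len(trace))]
    return pearson(trace, f)
```

A perfectly ascending stride trace correlates at +1 with f, a descending one at -1, matching the slide's point that negative correlation indicates a decreasing trend.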
Agenda
• Why big data memory characterization?
• Big Data Workloads
• Methodology and Metrics
Measurements and results
• Conclusion and outlook
General Characterization
• NoSQL and sort have highest footprints
• Hadoop Compression reduces footprints and improves execution time
General Characterization
• Sort has highest cache miss rates (transform large volume from one representation to another)
• Compression helps reduce LLC misses
(Chart: L2 MPKI by workload)
General Characterization
• Workloads have high peak bandwidths
• Sort has a ~10x larger footprint than WordCount but a lower DTLB MPKI: WordCount's memory references are not well contained within page granularities and are widespread
Cache Line Working Set Characterization
• The hottest 100MB contains 20% of all references
• NoSQL indexing has the most spread among its cache lines
• Sort has 60% of the references in its 120GB footprint contained within 1GB
Cache Simulation
The percentage of cache hits is higher than the percentage of references from the footprint analysis → Big Data workloads operate on smaller memory regions at a time
Entropy
Big Data workloads have higher entropy (>13) than SPEC workloads (<7) → they are less compressible and predictable
from [Shao et al 2013]
Normalized Correlation
• Hive aggregation has high correlation magnitudes (both positive and negative)
• Enabling prefetchers yields higher correlation in general
Potential for effective prediction and prefetching schemes for workloads like Hive aggregation
Take-Aways & Next Steps
• Big Data workloads are memory intensive
• Potential for latency-hiding techniques (cacheability, predictability) to be successful
• A large 4th-level cache can benefit big data workloads
• Future work:
  • Including more workloads in the study
  • Scaling dataset sizes, etc.