ANALYZING STORAGE SYSTEM WORKLOADS
Paul G. Sikalinda, Pieter S. Kritzinger
{psikalin, psk}@cs.uct.ac.za, DNA Research Group
Computer Science Department, University of Cape Town,
and Lourens O. Walters,
Lourens.Walters@s1.com, Mosaic Software
Rondebosch
Cape Town Republic of South Africa.
Presentation Outline
Introduction
Motivation and Objectives
Storage Systems
Storage System Workloads
The Storage System Workload Analyzed
Statistical Methodology
Workload Analysis Results
Conclusions
Future Work
Introduction
The DNA Group specializes, among other things, in using theory, formal methods and software tools in the:
-specification of …
-design of …
-modelling of …
-building of …
-security of …
-*workload analysis of …
-correctness analysis of …
-performance analysis of …
concurrent computing systems (CCS).
Introduction (cont’d)
[Figure: diagram with the labels RP, RQ and PROCESSOR. Request attributes listed: start address, operation type, request size, timestamps, etc.]
Motivation and Objectives
A lot of effort is being spent on improving the I/O subsystem because it is a bottleneck in current computer systems.
-In the design, performance evaluation and correctness evaluation of storage systems, workload modelling is an important component.
Common assumptions are not correct:
-Uniform distribution of start addresses,
-Exponential inter-arrival times.
Therefore storage system workload analysis should be done to come up with correct models.
Motivation and Objectives (cont’d)
-Designing storage systems.
-Designing I/O optimization techniques (read caching, write caching, pre-fetching, I/O parallelism, I/O rescheduling) to improve performance.
-Understanding application behavior and requirements.
-Deciding to pool storage system resources (SSPs).
-Implementing intelligent storage systems.
etc.
Motivation and Objectives (cont’d)
Our aim was to analyze storage system workloads in terms of
(a) inter-arrival times, (b) sizes and (c) “seek distances” of I/O requests
and provide statistics for these parameters to be used to:
(a) derive models for storage system evaluation and
(b) design optimization techniques (read caching, I/O parallelism, etc.).
Storage Systems
Enterprise Storage System (ESS)
[Figure: ESS architecture with the labels host/bus adapter, cache, array controller and disk drives, connected by the paths to host, controller, cache and disks.]
Storage Systems (cont’d)
ESS are powerful disk storage systems with the following capabilities:
-High performance*,
-Large capacity and availability,
-Protection against physical drive failure, which can be provided using RAID methods.
*But they still cannot match processor speeds because of the mechanical processes in the disk drives.
Storage System Workloads
I/O request servicing and workload classification:
-Logical Workloads (File System Workloads)
-Storage System Workloads (Physical I/O Traffic)
[Figure: I/O request servicing, with the labels Application Software, Operating System, File System and Disk System, and two arrows labelled "I/O request".]
Storage System Workloads (cont’d)
Workload Parameters:
-Logical Volume Number
-*Start Address (seek distances)
-*Request Size
-Operation Type (i.e., read or write)
-*Time Stamp (inter-arrival times)
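The starred parameters can be derived from a trace roughly as sketched below. This is a minimal illustration, not the authors' tooling: the comma-separated field order, the `IORequest` class and the helper names are assumptions, and "seek distance" is assumed here to mean the signed difference between the start addresses of consecutive requests, which the slides do not define.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IORequest:
    volume: int         # logical volume number
    start_address: int  # start address of the request
    size: int           # request size
    op: str             # operation type: "r" (read) or "w" (write)
    timestamp: float    # arrival time of the request


def parse_line(line: str) -> IORequest:
    """Parse one record; the field order is an assumption for this sketch."""
    volume, start, size, op, ts = line.strip().split(",")[:5]
    return IORequest(int(volume), int(start), int(size), op.lower(), float(ts))


def inter_arrival_times(reqs: List[IORequest]) -> List[float]:
    """Time between each request and the one before it."""
    return [b.timestamp - a.timestamp for a, b in zip(reqs, reqs[1:])]


def seek_distances(reqs: List[IORequest]) -> List[int]:
    """Signed start-address difference between consecutive requests (assumed definition)."""
    return [b.start_address - a.start_address for a, b in zip(reqs, reqs[1:])]


# Made-up records, for illustration only.
trace = ["0,1000,8192,r,0.0", "0,1016,8192,R,0.25", "1,200,16384,w,1.0"]
requests = [parse_line(line) for line in trace]
print(inter_arrival_times(requests))  # differences between consecutive timestamps
print(seek_distances(requests))       # differences between consecutive start addresses
```

Both derived series have one value fewer than the number of requests, which is consistent with the sample sizes reported in the results (1055448 differences from 1055449 requests).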
The Storage System Workload Analyzed
We analyzed the inter-arrival times, request sizes, and “seek distances” of I/O requests from a system running a web search engine.
We obtained the I/O trace files from the Storage Performance Council (SPC) (http://www.storageperformance.org).
Statistical Methodology
-Visual Techniques:
-Histogram and
-ECDF graphs.
-Key Data Statistics:
-Sample mean,
-Variance and standard deviation,
-Coefficient of skew, kurtosis, and variation,
-Five number data summaries (minimum, lower quartile, median, upper quartile, maximum).
-Lower and upper outlier limits
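The key statistics above can be computed with the Python standard library along the following lines. This is a sketch under stated assumptions: `summarize` is a hypothetical helper, skew and kurtosis are the moment-based (non-excess) versions, quartiles use the "inclusive" interpolation rule, and the outlier limits are quartiles plus or minus `k` times the interquartile range with `k = 1.5`. The slides do not say which definitions or which multiplier were used, so results may differ from the tables that follow.

```python
import math
import statistics


def summarize(data, k=1.5):
    """Key statistics for a sample with at least two distinct values."""
    xs = sorted(data)
    n = len(xs)
    mean = statistics.fmean(xs)
    variance = statistics.variance(xs, mean)  # sample variance, divisor n - 1
    std_dev = math.sqrt(variance)

    # Central moments with divisor n, used for moment-based skew and kurtosis.
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n

    q1, median, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "n": n,
        "five_number": (xs[0], q1, median, q3, xs[-1]),
        "mean": mean,
        "variance": variance,
        "std_dev": std_dev,
        "cv": std_dev / mean if mean != 0 else math.nan,
        "skew": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,  # non-excess: 3 for a normal distribution
        "lower_outlier_limit": q1 - k * iqr,  # multiplier k is an assumption
        "upper_outlier_limit": q3 + k * iqr,
    }


stats = summarize([1, 2, 3, 4, 5])  # toy data, not the trace
for name, value in stats.items():
    print(name, value)
```

The coefficient of variation is reported as NaN when the mean is zero; for a quantity centred near zero, such as the seek distance, it is very sensitive to the mean and should be read with care.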
Results 1: inter-arrival times (µs)
Sample Size 1055448
Five Number Summary (126, 242, 1695, 4487, 100100)
Sample Mean 2985.761
Sample Variance 12508927
Standard Deviation 3536.796
Coefficient of Variation 1.184554
Coefficient of Skew 2.142186
Coefficient of Kurtosis 8.884555
Upper Outlier 26142
Results 1: inter-arrival times
-Highly variable data: the range is 126 to 100100 microseconds.
-The coefficient of kurtosis shows that the distribution is heavy-tailed.
Results 2: Request sizes (bytes)
Sample Size 1055449
Five Number Summary (512, 8192, 8192, 24580, 1138000)
Sample Mean 15510
Sample Variance 102017528
Standard Deviation 10100.37
Coefficient of Variation 0.6512577
Coefficient of Skew 3.441212
Coefficient of Kurtosis 287.6503
Upper Outlier 106520
Results 2: Request sizes
Distribution peaks: 8192 (60%), 16384 (10%), 24576 (9%) and 32768 (20%).
Reason: the OS filesystem block size is 8192 bytes, and every peak is a multiple of it.
Results 3: Seek distances (blocks)
Sample Size 1055448
Five Number Summary (-34926160, -8581248, 6.4, 8580496, 34910700)
Sample Mean 27.95
Sample Variance 170691900000000
Standard Deviation 13064910
Coefficient of Skew 0
Coefficient of Variation 467398.8
Upper Outlier 51482656
Lower Outlier -51482528
Conclusions
(1) Analyzing storage system workloads is necessary to model the workloads properly:
-To model Web inter-arrival times, the Weibull, lognormal, beta, gamma and exponential probability density functions should be considered.
-To model Web request sizes and seek distances, a probability mass function is more appropriate.
*We intend to use the models in simulations of ESS.
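A probability mass function for a discrete parameter such as request size can be estimated directly from the observed values and then sampled in a simulation. The sketch below is illustrative only: `empirical_pmf` and `sample_from_pmf` are hypothetical helpers, and the input is made-up toy data rather than the SPC trace.

```python
import random
from collections import Counter


def empirical_pmf(values):
    """Map each distinct value to its relative frequency in the sample."""
    counts = Counter(values)
    n = len(values)
    return {v: c / n for v, c in sorted(counts.items())}


def sample_from_pmf(pmf, k, rng):
    """Draw k values with probabilities given by the PMF."""
    return rng.choices(list(pmf), weights=list(pmf.values()), k=k)


# Toy request sizes in bytes, not taken from the trace.
sizes = [8192] * 6 + [16384] + [24576] + [32768] * 2
pmf = empirical_pmf(sizes)
rng = random.Random(42)  # fixed seed so that a simulation run is repeatable
synthetic = sample_from_pmf(pmf, 5, rng)
print(pmf)
print(synthetic)
```

Sampling from the empirical PMF reproduces the peaks at the observed sizes, which a smooth density fitted to the same data would blur.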
Conclusions (cont’d)
(2) The analysis results are useful when designing optimization techniques for storage systems, e.g.:
-Cache management block size – 8192 bytes.
-I/O rescheduling and background tasking would be ideal for the workload.
-The storage system handling the workload we analyzed can be optimized to handle the symmetrical behavior*.
*The results are not broadly applicable.
Conclusions (cont’d)
(3) Other conclusions:
-Request sizes are influenced by the filesystem in use.
-Seek distances are not always uniformly distributed.
*In summary, we have provided statistics about the parameters for the storage system workload that we analyzed and have shown how we can use them to derive models and design I/O optimization techniques.
Future Work
-Rigorously find a probability density function matching a given data set of inter-arrival times.
-Analyze the storage system workloads in terms of other parameters (e.g., logical volume numbers and operation types).
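One possible starting point for the first item is to compare the empirical CDF of the data with each candidate distribution using the Kolmogorov-Smirnov distance. The sketch below is an assumption about how this could be done, not a method taken from the slides: the helper names are hypothetical, only two of the candidate families are shown, the parameters are simple maximum-likelihood estimates, and the sample is made up.

```python
import math
import statistics


def ks_distance(data, cdf):
    """Largest gap between the empirical CDF of data and a candidate CDF."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The ECDF jumps from (i - 1) / n to i / n at x; check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def exponential_cdf(mean):
    def cdf(x):
        return 1.0 - math.exp(-x / mean) if x > 0 else 0.0
    return cdf


def lognormal_cdf(mu, sigma):
    def cdf(x):
        if x <= 0:
            return 0.0
        z = (math.log(x) - mu) / (sigma * math.sqrt(2.0))
        return 0.5 * (1.0 + math.erf(z))
    return cdf


# Made-up inter-arrival times in microseconds, for illustration only.
sample = [130.0, 210.0, 250.0, 900.0, 1700.0, 2600.0, 4500.0, 9800.0]
logs = [math.log(x) for x in sample]
candidates = {
    "exponential": exponential_cdf(statistics.fmean(sample)),
    "lognormal": lognormal_cdf(statistics.fmean(logs), statistics.pstdev(logs)),
}
distances = {name: ks_distance(sample, cdf) for name, cdf in candidates.items()}
print(distances)  # a smaller distance means a closer fit
```

Because the parameters are estimated from the same data, the standard Kolmogorov-Smirnov critical values do not apply directly; the distance is best used here only to rank candidates.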