
Page 1

The Performance Impact of Kernel Prefetching on Buffer Cache Replacement Algorithms

(ACM SIGMETRICS '05) ACM International Conference on Measurement & Modeling of Computer Systems

Ali R. Butt, Chris Gniady, Y. Charlie Hu

Purdue University

Presented by Hsu Hao Chen

Page 2

Outline

- Introduction
- Motivation
- Replacement Algorithms: OPT, LRU, LRU-2, 2Q, LIRS, LRFU, MQ, ARC
- Performance Evaluation
- Conclusion

Page 3

Introduction

- Improving file system performance: design effective block replacement algorithms for the buffer cache
- Almost all buffer cache replacement algorithms have been proposed and studied comparatively without taking into account the file system prefetching that exists in all modern operating systems
- The cache hit ratio is used as the sole performance metric. But what about the actual number of disk I/O requests, and the actual running time of applications?

Page 4

Introduction (Cont.)

- Various kernel components lie on the path from a file system operation to the disk
- Kernel prefetching in Linux is beneficial for sequential accesses (see the sketch below)
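As a rough illustration of how an application's sequential reads interact with kernel readahead (my own sketch, not from the paper; the file path and sizes are arbitrary placeholders), Linux detects the sequential pattern automatically, and posix_fadvise can hint it explicitly:

```python
import os

# Minimal sketch (not from the paper): read a file sequentially and hint the
# kernel's readahead logic. Linux detects sequential reads on its own;
# POSIX_FADV_SEQUENTIAL merely encourages a larger readahead window.
PATH = "/tmp/prefetch_demo.dat"                     # arbitrary placeholder path
CHUNK = 64 * 1024                                   # 64 KB per read() call

with open(PATH, "wb") as f:                         # create some data to read
    f.write(os.urandom(4 * 1024 * 1024))            # 4 MB

fd = os.open(PATH, os.O_RDONLY)
os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)   # hint: sequential access

total = 0
while True:
    buf = os.read(fd, CHUNK)       # reads may be served from prefetched pages
    if not buf:
        break
    total += len(buf)

os.close(fd)
print(f"read {total} bytes")
```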

Page 5

Motivation

- The goals of a buffer cache replacement algorithm: minimize the number of disk I/Os and reduce the running time of applications
- Example: without prefetching, Belady's algorithm results in 16 misses while LRU results in 23 misses
- With prefetching, Belady's algorithm is no longer optimal: prefetching changes both which blocks enter the cache and when disk I/Os are issued, so minimizing demand misses no longer minimizes the number of disk I/Os or the running time

Page 6

Replacement Algorithm: OPT

- Evicts the block that will be referenced farthest in the future; often used as a yardstick in comparative studies
- Prefetched blocks are assumed to be accessed most recently, so OPT can immediately determine whether a prefetch was right or wrong (a minimal simulation sketch follows)
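A minimal offline simulation of OPT on a demand reference trace (my own sketch, not the authors' simulator; prefetching and I/O clustering are ignored here):

```python
def opt_misses(trace, cache_size):
    """Count misses for Belady's OPT on a block reference trace."""
    cache, misses = set(), 0
    for i, blk in enumerate(trace):
        if blk in cache:
            continue
        misses += 1
        if len(cache) >= cache_size:
            # Evict the cached block whose next reference is farthest away
            # (or that is never referenced again).
            def next_use(b):
                for j in range(i + 1, len(trace)):
                    if trace[j] == b:
                        return j
                return float("inf")
            cache.remove(max(cache, key=next_use))
        cache.add(blk)
    return misses

# Example: a small looping trace with a 3-block cache.
print(opt_misses([1, 2, 3, 4, 1, 2, 3, 4], cache_size=3))
```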

Page 7

Replacement Algorithm: LRU

- Replaces the block that has not been accessed for the longest time
- Prefetched blocks are inserted at the MRU position just like regular blocks (a sketch follows)
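A minimal LRU buffer cache sketch (my own illustration on a toy block-number interface; the paper's simulator additionally models kernel prefetching and I/O clustering):

```python
from collections import OrderedDict

class LRUCache:
    """Toy LRU block cache: most recently used blocks at the right end."""
    def __init__(self, size):
        self.size = size
        self.blocks = OrderedDict()
        self.misses = 0

    def access(self, blk, prefetch=False):
        if blk in self.blocks:
            self.blocks.move_to_end(blk)     # hit: move to MRU position
            return
        if not prefetch:
            self.misses += 1                 # count demand misses only
        if len(self.blocks) >= self.size:
            self.blocks.popitem(last=False)  # evict the LRU block
        self.blocks[blk] = True              # insert at MRU, prefetched or not
```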

Page 8

Replacement Algorithm: LRU pathological case

- The working set size is larger than the cache and the application has a looping access pattern
- In this case, LRU replaces every block just before it is used again, so all accesses miss (a small demonstration follows)
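A quick, self-contained way to see the pathological case (the loop length and cache size below are arbitrary): replay a cyclic reference string through an LRU cache that is one block smaller than the loop and count misses.

```python
from collections import OrderedDict

def lru_misses(trace, size):
    cache, misses = OrderedDict(), 0
    for blk in trace:
        if blk in cache:
            cache.move_to_end(blk)          # hit
        else:
            misses += 1
            if len(cache) >= size:
                cache.popitem(last=False)   # evict LRU block
            cache[blk] = True
    return misses

loop = list(range(10)) * 5          # loop over 10 blocks, 5 passes
print(lru_misses(loop, size=9))     # 50: every access misses under LRU
```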

Page 9

Replacement Algorithm: LRU-2

- Tries to avoid the pathological cases of LRU
- LRU-K replaces the block whose Kth-to-last reference is oldest; the authors recommend K=2
- LRU-2 can quickly remove cold blocks from the cache
- Each block access requires log(N) operations to manipulate a priority queue, where N is the number of blocks in the cache (a simplified sketch follows)
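A simplified LRU-2 sketch (my own; for brevity it scans all cached blocks at eviction time, which is O(N) per eviction, whereas the real algorithm maintains a priority queue for O(log N) updates). Blocks referenced only once are treated as having an infinitely old second-to-last reference, so cold blocks are evicted first.

```python
class LRU2Cache:
    """Toy LRU-2: evict the block with the oldest 2nd-to-last reference."""
    def __init__(self, size):
        self.size = size
        self.hist = {}        # block -> [second-to-last ref time, last ref time]
        self.clock = 0

    def access(self, blk):
        self.clock += 1
        if blk in self.hist:
            prev, last = self.hist[blk]
            self.hist[blk] = [last, self.clock]   # shift the reference history
            return True                           # hit
        if len(self.hist) >= self.size:
            # Evict the block with the oldest penultimate reference time;
            # blocks referenced only once carry -inf and go first.
            victim = min(self.hist, key=lambda b: self.hist[b][0])
            del self.hist[victim]
        self.hist[blk] = [float("-inf"), self.clock]
        return False                              # miss
```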

Page 10

Replacement Algorithm: 2Q

- Proposed to achieve page replacement performance similar to LRU-2 at low overhead (constant time, like LRU)
- All missed blocks go into the A1in queue; addresses of blocks replaced from A1in go into the A1out queue; re-referenced blocks go into the Am queue
- Prefetched blocks are treated like on-demand blocks; if a prefetched block is evicted from the A1in queue before any on-demand access, it is simply discarded (a sketch follows)
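A compact sketch of simplified 2Q (my own; the Kin and Kout thresholds below are arbitrary defaults, whereas the published algorithm tunes them as fractions of the cache size):

```python
from collections import OrderedDict, deque

class TwoQCache:
    """Toy simplified 2Q: A1in (FIFO), A1out (ghost FIFO of addresses), Am (LRU)."""
    def __init__(self, size, kin=None, kout=None):
        self.size = size
        self.kin = kin or max(1, size // 4)      # max resident blocks in A1in
        self.kout = kout or size // 2            # max addresses remembered in A1out
        self.a1in = OrderedDict()                # recent first-time blocks
        self.a1out = deque()                     # ghost queue: addresses only
        self.am = OrderedDict()                  # re-referenced ("hot") blocks

    def _evict_if_full(self):
        if len(self.a1in) + len(self.am) < self.size:
            return
        if len(self.a1in) > self.kin:
            blk, _ = self.a1in.popitem(last=False)   # FIFO eviction from A1in
            self.a1out.append(blk)                   # remember its address
            if len(self.a1out) > self.kout:
                self.a1out.popleft()
        else:
            self.am.popitem(last=False)              # LRU eviction from Am

    def access(self, blk):
        if blk in self.am:
            self.am.move_to_end(blk)                 # hit in Am: move to MRU
            return True
        if blk in self.a1in:
            return True                              # hit in A1in: leave in place
        self._evict_if_full()
        if blk in self.a1out:
            self.a1out.remove(blk)
            self.am[blk] = True                      # re-reference: promote to Am
        else:
            self.a1in[blk] = True                    # first reference: into A1in
        return False                                 # miss
```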

Page 11

Replacement Algorithm: 2Q

Page 12

Replacement Algorithm: LIRS (Low Inter-reference Recency Set)

- LIR block: has been accessed again since it was inserted on the LRU stack
- HIR block: referenced less frequently
- Prefetched blocks are inserted into the part of the cache that maintains HIR blocks (an IRR sketch follows)
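A full LIRS implementation needs a pruned LRU stack plus a small resident-HIR queue, so the sketch below (my own) only illustrates the quantity LIRS ranks blocks by: the inter-reference recency (IRR), i.e. the number of distinct other blocks accessed between two consecutive accesses to the same block.

```python
def inter_reference_recencies(trace):
    """Return {block: list of IRRs}, where an IRR is the number of distinct
    other blocks seen between consecutive accesses to the block."""
    last_pos = {}                      # block -> index of its previous access
    irrs = {}
    for i, blk in enumerate(trace):
        if blk in last_pos:
            distinct = len(set(trace[last_pos[blk] + 1:i]))
            irrs.setdefault(blk, []).append(distinct)
        last_pos[blk] = i
    return irrs

# Blocks with consistently small IRRs would be classified as LIR (kept in the
# cache), the rest as HIR (first candidates for eviction).
print(inter_reference_recencies([1, 2, 3, 1, 2, 4, 1, 3]))
```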

Page 13

Replacement Algorithm: LRFU (Least Recently/Frequently Used)

- Replaces the block with the smallest C(x) value
- Every block x is assigned a value C(x); initially C(x) = 0, and at every time t: C(x) = 1 + 2^(-λ) C(x) if x is referenced at time t, and C(x) = 2^(-λ) C(x) otherwise, where λ is a tunable parameter
- Prefetched blocks are treated as the most recently accessed
- Problem: how to assign the initial weight C(x) of a prefetched block. Solution: a prefetched flag is set, and the initial value is assigned when the block is accessed on-demand (a CRF-update sketch follows)
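A small sketch of the LRFU bookkeeping (my own; it applies the 2^(-λ) decay lazily by remembering when each block's C value was last updated, and the λ value is an arbitrary choice):

```python
class LRFUCache:
    """Toy LRFU: evict the block with the smallest CRF value C(x)."""
    def __init__(self, size, lam=0.001):
        self.size = size
        self.lam = lam          # small lambda weights frequency, large weights recency
        self.clock = 0
        self.crf = {}           # block -> (C value, time it was last updated)

    def _value_now(self, blk):
        c, t = self.crf[blk]
        return c * 2 ** (-self.lam * (self.clock - t))   # lazy decay to "now"

    def access(self, blk):
        self.clock += 1
        if blk in self.crf:
            self.crf[blk] = (1 + self._value_now(blk), self.clock)
            return True                                   # hit
        if len(self.crf) >= self.size:
            victim = min(self.crf, key=self._value_now)   # smallest C(x)
            del self.crf[victim]
        self.crf[blk] = (1.0, self.clock)                 # C = 1 + 2^(-λ)·0
        return False                                      # miss
```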

Page 14

Replacement Algorithm: MQ (Multi-Queue)

- Uses m LRU queues (typically m = 8), Q0, Q1, ..., Qm-1, where Qi contains blocks that have been referenced at least 2^i times but no more than 2^(i+1) - 1 times recently
- The reference counter is not incremented when a block is prefetched (a queue-selection sketch follows)
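A fragment showing only how MQ could pick a queue from a block's reference count (my own sketch; the full algorithm also keeps per-block expiration times and a ghost queue, which are omitted):

```python
import math

M = 8   # number of LRU queues, Q0 .. Q7

def queue_index(ref_count):
    """Qi holds blocks referenced at least 2^i but at most 2^(i+1)-1 times."""
    return min(int(math.log2(ref_count)), M - 1)

def on_access(block, ref_counts, prefetch=False):
    # Prefetches do not bump the reference counter, so a prefetched block
    # stays in a low queue until demand accesses arrive.
    if not prefetch:
        ref_counts[block] = ref_counts.get(block, 0) + 1
    return queue_index(max(ref_counts.get(block, 1), 1))

counts = {}
print(on_access("b1", counts))                  # 1 reference  -> Q0
print(on_access("b1", counts))                  # 2 references -> Q1
print(on_access("b1", counts, prefetch=True))   # prefetch: count unchanged -> Q1
```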

Page 15

Replacement Algorithm: MQ (Multi-Queue)

Page 16

Replacement Algorithm: ARC (Adaptive Replacement Cache)

- Maintains two LRU lists: L1 holds pages that have been referenced only once, L2 holds pages that have been referenced at least twice
- Each list has the same length c as the cache; the cache contains the tops of both lists, T1 and T2, with |T1| + |T2| = c
  [Diagram: lists L1 and L2 with their cached tops T1 and T2]

Page 17

Replacement Algorithm: ARC (Cont.)

- ARC attempts to maintain a target size B_T1 for list T1
- When the cache is full, ARC replaces the LRU page of T1 if |T1| > B_T1, and the LRU page of T2 otherwise
- If a prefetched block is already in the ghost queue, it is not moved to the second queue but to the first queue (see the sketch below)
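A fragment of just the replacement decision described above (my own sketch; the surrounding ARC machinery that adapts B_T1 from ghost-list hits is not shown, and the data structures are simple stand-ins):

```python
from collections import OrderedDict

def arc_replace(t1, t2, b1, b2, target_b_t1):
    """Evict one resident page when the cache (T1 ∪ T2) is full.

    t1, t2: OrderedDicts of resident pages (LRU at the front).
    b1, b2: ghost lists holding addresses of pages evicted from T1 / T2.
    target_b_t1: ARC's adaptive target size for T1 (B_T1 in the slides).
    """
    # Evict from T1 when it exceeds its target (or when T2 is empty).
    if t1 and (len(t1) > target_b_t1 or not t2):
        page, _ = t1.popitem(last=False)   # LRU page of T1
        b1[page] = True                    # remember its address in ghost list B1
    else:
        page, _ = t2.popitem(last=False)   # LRU page of T2
        b2[page] = True                    # remember its address in ghost list B2
    return page

# Usage with toy contents: T1 is over its target, so its LRU page ("a") goes.
t1 = OrderedDict.fromkeys(["a", "b", "c"])
t2 = OrderedDict.fromkeys(["x", "y"])
b1, b2 = OrderedDict(), OrderedDict()
print(arc_replace(t1, t2, b1, b2, target_b_t1=2))   # prints "a"
```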

Page 18

Performance Evaluation

- Simulation environment: a buffer cache simulator that functionally implements the Linux kernel's prefetching and I/O clustering; combined with DiskSim, it simulates the I/O time of applications
- Applications:
  - Sequential access: cscope, glimpse
  - Random access: tpc-h, tpc-r
  - Multi1: workload in a code development environment
  - Multi2: workload in a graphics development and simulation environment
  - Multi3: workload in a database and a web index server

Page 19

Performance Evaluation (Cont.): cscope (sequential)

Figures: hit ratio, number of clustered disk requests, execution time

Page 20

Performance Evaluation (Cont.): cscope (sequential)

Figures: hit ratio, number of clustered disk requests, execution time

Page 21

Performance Evaluation (Cont.): glimpse (sequential)

Figures: hit ratio, number of clustered disk requests, execution time

Page 22

Performance Evaluation (Cont.): tpc-h (random)

Figures: hit ratio, number of clustered disk requests, execution time

Page 23

Performance Evaluation (Cont.): tpc-r (random)

Figures: hit ratio, number of clustered disk requests, execution time

Page 24

Performance Evaluation (Cont.)

- Concurrent applications:
  - Multi1: hit ratios and disk requests, with or without prefetching, exhibit behavior similar to cscope
  - Multi2: behavior is similar to Multi1, but prefetching does not improve the execution time (the CPU-bound viewperf dominates)
  - Multi3: behavior is similar to tpc-h
- Synchronous vs. asynchronous prefetching: number and size of disk I/Os (cscope at a 128 MB cache size)
  - With prefetching, the number of disk requests is at least 30% lower than without prefetching for all algorithms except OPT, especially when asynchronous prefetching is used

Page 25

Conclusion

- Kernel prefetching can have a significant performance impact on different replacement algorithms
- Application file access patterns (sequential vs. random) determine how much prefetching disk data helps
- With or without prefetching, the hit ratio should not be used as the sole performance metric