fse2016 - cacheoptimizer: helping developers configure caching frameworks for hibernate-based...
CacheOptimizer: Helping Developers Configure Caching Frameworks for Hibernate-based
Database-Centric Web Applications
Mohamed Nasser, Parminder Flora
Tse-Hsun (Peter) Chen, Ahmed E. Hassan, Weiyi Shang
Modern Database-Centric Web Applications: Millions of Users, Billions of Transactions
– Over 1 billion page views per day
– 44 billion SQL executions per day
– 8 billion minutes online every day
– Over 1.2 million photos a second at peak
Downtime of large-scale applications is very costly
– Gmail’s 25- to 55-minute outage affected 42 million users.
– Azure service was interrupted for 11 hours, affecting Azure users worldwide.
– Facebook went down for 35 minutes, losing $854,700.
Dates shown on the slide (2014): Jan 24th, Nov 19th, Oct 28th
Often caused by performance problems
$1.6 billion loss for a one-second slowdown
Slow database access is often the performance bottleneck
Application-level caches improve performance
[Diagram: User, Application server (Hibernate, application-level caches), Database]
Developers must manually tell the frameworks what should be cached!
Over 67% of Java developers use Hibernate to access databases
[Chart labels: 22%, 67%]
We focus on Hibernate due to its popularity, but our approach should be applicable to
other database technologies
An example class with Hibernate code
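Group.java

    import javax.persistence.Cacheable;
    import javax.persistence.Column;
    import javax.persistence.Entity;
    import javax.persistence.Id;
    import javax.persistence.Table;
    import org.hibernate.Query;
    import org.hibernate.Session;

    @Entity
    @Table(name = "group")      // Group class is mapped to the "group" table in the DB
    @Cacheable                  // Object-level cache (caches retrieval by id)
    public class Group {
        @Id
        @Column(name = "id")    // id is mapped to the column "id" in the group table
        private int id;

        @Column(name = "name")
        String groupName;

        Group findGroupById(Session session, int id) {
            Query query = session.createQuery("select g from Group g where g.id = :id");
            query.setParameter("id", id);
            query.setCacheable(true);   // Query-level cache (caches the query result)
            return (Group) query.uniqueResult();
        }
    }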
There can be thousands of possible cache configurations
Optimal cache configuration is often determined by how users use the application
Caching helps improve performance
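[Diagram: the application server calls Group g = findGroupByID(1); Hibernate reads Group 1 from the database and places it in the app-level cache.]
…
[Diagram: a later call to Group g = findGroupByID(1) is served by Hibernate from the app-level cache (Group 1).]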
Sub-optimal cache configurations are harmful to performance
[Diagram: the application server calls Group g = findGroupByID(1), then g.setName("FSE"), then Group g = findGroupByID(1) again. Labels shown: database, application server, Group 1.]
…
It is important to understand user behaviors in order to find the optimal cache
configuration
Problem: Understanding user behavior in production is very difficult
[Diagram: User, Application server (Hibernate)]
The optimal cache configuration evolves in production, so it must be updated regularly
Instrumentation adds too much overhead!
Our solution: Recover user behaviors by analyzing readily-available logs
[Diagram labels: User, Source Code, Application server, Database, CacheOptimizer, Apply optimal cache config, Update executable]
Overview of CacheOptimizer
[Diagram: Source Code, Static analysis, Database access information]
Apply static analysis to extract database access information
    @Get
    @Path("/group/{id}")
    Group getGroup(id) {
        getGroupById(id);
        …
    }

    Group getGroupById(id) {
        select from Group g where g.id = id
        …
    }
Finding HTTP request handler methods by analyzing annotations
Apply inter-procedural data flow analysis to see if inputs from the HTTP request are used as querying criteria
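The first step above, locating request handlers through their annotations, can be illustrated with a minimal sketch. It is not CacheOptimizer's implementation: it uses runtime reflection instead of static analysis to keep the example short, and it omits the data flow analysis entirely. The Get and Path annotations and the GroupResource class are hypothetical stand-ins defined inside the snippet to mirror the slide's example.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

public class HandlerFinder {
    // Hypothetical stand-ins for the slide's @Get and @Path annotations.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Get {}

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Path {
        String value();
    }

    // Hypothetical resource class mirroring the slide's example handler.
    static class GroupResource {
        @Get
        @Path("/group/{id}")
        public String getGroup(int id) {
            return "group-" + id;
        }

        // Not annotated, so it must not be reported as a handler.
        public String helper() {
            return "not a handler";
        }
    }

    // Maps each URL template to the name of the method annotated to handle it.
    static Map<String, String> findHandlers(Class<?> resource) {
        Map<String, String> handlers = new TreeMap<>();
        for (Method m : resource.getDeclaredMethods()) {
            Path path = m.getAnnotation(Path.class);
            if (m.isAnnotationPresent(Get.class) && path != null) {
                handlers.put(path.value(), m.getName());
            }
        }
        return handlers;
    }

    public static void main(String[] args) {
        System.out.println(findHandlers(GroupResource.class));
    }
}
```

A static analysis would read the same annotations from source code or bytecode without loading the classes, and would then follow the request parameter (here id) into the query.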
Overview of CacheOptimizer
[Diagram: Source Code, Static analysis, Database access information (e.g., @Get @Path("/group/{id}") with select from Group g where g.id = id …); Build, System running in production, System logs (e.g., 10.10.10.1 - - [11/Apr/2015:12:19:30] 200 "GET /app/group/1 " …), User database accesses]
Example: Recovered database access
    10.10.10.1 - - [11/Apr/2015:12:19:30] 200 "GET /app/group/1 "
    10.10.10.1 - - [11/Apr/2015:12:19:31] 200 "GET /app/group/2 "
    10.10.10.1 - - [11/Apr/2015:12:19:32] 200 "GET /app/group/1 "

    @Get
    @Path("/group/{id}")
    Group getGroup(id) {
        …
        select from Group g where g.id = id
        …
    }
Read operation on Group table, record with id 1, time is 11/Apr/2015:12:19:30
Read operation on Group table, record with id 2, time is 11/Apr/2015:12:19:31
Read operation on Group table, record with id 1, time is 11/Apr/2015:12:19:32
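The recovery shown above can be sketched as matching each log line against a URL template that static analysis has linked to a query. This is a minimal sketch under stated assumptions: the log layout is the simplified one on the slide, the /app prefix and the single Group template are hard-coded, and the class and method names (LogRecovery, recover) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogRecovery {
    // Simplified access-log layout from the slide:
    // <ip> - - [<timestamp>] <status> "GET <path> "
    private static final Pattern LOG_LINE = Pattern.compile(
            "^\\S+ - - \\[([^\\]]+)\\] \\d{3} \"GET (\\S+)\\s*\"");

    // Assumed result of the static analysis: GET /app/group/{id}
    // reads the Group record whose id appears in the URL.
    private static final Pattern GROUP_PATH = Pattern.compile("^/app/group/(\\d+)$");

    // Turns access-log lines into descriptions of the database reads they caused.
    static List<String> recover(List<String> logLines) {
        List<String> accesses = new ArrayList<>();
        for (String line : logLines) {
            Matcher log = LOG_LINE.matcher(line);
            if (!log.find()) {
                continue; // not a GET request in the expected layout
            }
            Matcher path = GROUP_PATH.matcher(log.group(2));
            if (!path.matches()) {
                continue; // request is not linked to a known query
            }
            accesses.add("Read operation on Group table, record with id "
                    + path.group(1) + ", time is " + log.group(1));
        }
        return accesses;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        log.add("10.10.10.1 - - [11/Apr/2015:12:19:30] 200 \"GET /app/group/1 \"");
        log.add("10.10.10.1 - - [11/Apr/2015:12:19:31] 200 \"GET /app/group/2 \"");
        log.add("10.10.10.1 - - [11/Apr/2015:12:19:32] 200 \"GET /app/group/1 \"");
        for (String access : recover(log)) {
            System.out.println(access);
        }
    }
}
```

Because only existing web access logs are read, this step needs no extra instrumentation in production.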
Overview of CacheOptimizer
[Diagram: Source Code, Static analysis, Database access information (e.g., @Get @Path("/group/{id}") with select from Group g where g.id = id …); Build, System running in production, System logs (e.g., 10.10.10.1 - - [11/Apr/2015:12:19:30] 200 "GET /app/group/1 " …), User database accesses, Cache configuration]
Calculating optimal cache configuration via workload simulation
[Timeline diagram of incoming requests (read group with id 1, update group with id 1), marking cache hits, invalidated caches, entries under cache consideration, and entries no longer considered for caching.]
Miss ratio is ½ (one cache hit)
We keep track of the cache miss ratio for each potential cache location
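Tracking the miss ratio for one potential cache location can be sketched as a replay of the recovered reads and updates. This is a minimal sketch, not the paper's algorithm: it assumes an unbounded cache, counts only reads toward the ratio, and treats an update as invalidating the cached record. The names (WorkloadSimulation, Op, missRatio) are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class WorkloadSimulation {
    enum Kind { READ, UPDATE }

    // One recovered database access against a single cache location.
    static final class Op {
        final Kind kind;
        final int id;

        Op(Kind kind, int id) {
            this.kind = kind;
            this.id = id;
        }

        static Op read(int id) { return new Op(Kind.READ, id); }
        static Op update(int id) { return new Op(Kind.UPDATE, id); }
    }

    // Replays the workload in time order and returns misses / reads.
    static double missRatio(List<Op> workload) {
        Set<Integer> cached = new HashSet<>();
        int reads = 0;
        int misses = 0;
        for (Op op : workload) {
            if (op.kind == Kind.READ) {
                reads++;
                // add() returns true when the id was not cached yet: a miss.
                if (cached.add(op.id)) {
                    misses++;
                }
            } else {
                // An update invalidates the cached copy of that record.
                cached.remove(op.id);
            }
        }
        // Assumption: with no reads, caching has nothing to gain.
        return reads == 0 ? 1.0 : (double) misses / reads;
    }

    public static void main(String[] args) {
        List<Op> workload = new ArrayList<>();
        workload.add(Op.read(1));
        workload.add(Op.read(1));
        System.out.println(missRatio(workload));
    }
}
```

Two reads of group 1 give one miss and one hit, which matches the slide's miss ratio of ½. Inserting an update of group 1 between them turns the second read into a miss as well.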
Studied applications
– E-commerce application: > 35K LOC
– Medical record application: > 3.8M LOC
– Simple open-source application for a pet clinic: 3.3K LOC
Performance benchmarking
• We use JMeter tests to simulate user behaviours
• Database is pre-populated with hundreds of MB of data
Comparing throughput improvements under different cache configs
• CacheAll: Enable all caches
• Default: Cache configurations that are already added in the application (what developers think should be cached)
• CacheOptimizer: The optimal cache config discovered using CacheOptimizer
We compare three different cache configurations against having no cache (baseline)
CacheOptimizer gives significant improvements over other configs
[Bar charts: % of throughput improvement over having no cache, for the CacheAll, Default, and CacheOptimizer configurations]
Tse-Hsun (Peter) Chen http://petertsehsun.github.io