fse2016 - cacheoptimizer: helping developers configure caching frameworks for hibernate-based...


CacheOptimizer: Helping Developers Configure Caching Frameworks for Hibernate-based Database-Centric Web Applications

Tse-Hsun (Peter) Chen, Ahmed E. Hassan, Weiyi Shang

Mohamed Nasser, Parminder Flora

– Over 1 billion page views per day
– 8 billion minutes online every day
– 44 billion SQL executions per day
– Over 1.2 million photos a second at peak

Modern Database-Centric Web Applications: Millions of Users, Billions of Transactions

Downtime of large-scale applications is very costly

Incidents in 2014 (Jan 24th, Nov 19th, Oct 28th):

– Gmail's 25-to-55-minute outage affected 42 million users.
– The Azure service was interrupted for 11 hours, affecting Azure users worldwide.
– Facebook went down for 35 minutes, losing $854,700.


Often caused by performance problems


$1.6 billion loss for a one-second slowdown


Slow database access is often the performance bottleneck


Application-level caches improve performance

[Diagram: User → Application server (Hibernate with application-level caches) → database]

Developers need to manually tell the frameworks what should be cached!

Over 67% of Java developers use Hibernate to access databases

[Chart: 67% (Hibernate) compared with 22% for another option whose label is not recoverable]

We focus on Hibernate due to its popularity, but our approach should be applicable to other database technologies.

An example class with Hibernate code

Group.java:

    @Entity
    @Table(name = "group")   // Group class is mapped to the "group" table in the DB
    @Cacheable               // object-level cache (cache retrieval by id)
    public class Group {

        @Column(name = "id")     // id is mapped to the column "id" in the group table
        private int id;

        @Column(name = "name")
        String groupName;

        Group findGroupById(int id) {
            query = "select g from Group g where g.id = id";
            query.execute().cache();   // query-level cache (cache query result); pseudocode
        }
    }

There can be thousands of possible cache configurations


Optimal cache configuration is often determined by how users use the application

Caching helps improve performance

Group g = findGroupByID(1);

[Diagram: application server, Hibernate, app-level cache, database; Group1 is loaded from the database and then held in the app-level cache]

Sub-optimal cache configurations are harmful to performance

Group g = findGroupByID(1);
g.setName("FSE");

[Diagram: application server, database; the cached Group1 is updated]

It is important to understand user behaviors in order to find the optimal cache configuration.

Problem: Understanding user behavior in production is very difficult

[Diagram: User, Application server, Hibernate]

The optimal cache configuration evolves in production and therefore requires regular updates.

Instrumentation adds too much overhead!

Our solution: Recover user behaviors by analyzing readily-available logs

[Diagram: User, Application server, Database, Source Code, CacheOptimizer; CacheOptimizer applies the optimal cache config and updates the executable]

Overview of CacheOptimizer

[Diagram: Source Code → Static analysis → Database access information]

Apply static analysis to extract database access information

    @Get
    @Path("/group/{id}")
    Group getGroup(id) {
        getGroupById(id);
        …
    }

    Group getGroupById(id) {
        select from Group g where g.id = id
        …
    }

Finding HTTP request handler methods by analyzing annotations

Apply inter-procedural data flow analysis to see if inputs from the HTTP request are used as querying criteria
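The first step above, finding request handlers from their annotations, can be sketched in a few lines of Java. This is only an illustration: it uses runtime reflection rather than the static analysis the slides describe, and it leaves out the inter-procedural data flow step entirely. The `Get` and `Path` annotations, `GroupResource`, and `HandlerScanner` are hypothetical stand-ins defined here so that the example is self-contained; they are not the annotations of any real framework.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

final class HandlerScanner {
    // Hypothetical stand-ins for the @Get/@Path annotations shown on the slide.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Get {}

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Path { String value(); }

    // Hypothetical resource class mirroring the slide's example.
    static final class GroupResource {
        @Get
        @Path("/group/{id}")
        public String getGroup(int id) { return getGroupById(id); }

        // Not a handler: it carries no HTTP annotations.
        String getGroupById(int id) { return "group-" + id; }
    }

    /** Maps each URL template to the name of the GET handler annotated with it. */
    static Map<String, String> findHandlers(Class<?> resource) {
        Map<String, String> handlers = new TreeMap<>();
        for (Method m : resource.getDeclaredMethods()) {
            Path path = m.getAnnotation(Path.class);
            if (m.isAnnotationPresent(Get.class) && path != null) {
                handlers.put(path.value(), m.getName());
            }
        }
        return handlers;
    }

    public static void main(String[] args) {
        System.out.println(findHandlers(GroupResource.class));
    }
}
```

Keying the result by URL template makes it straightforward to match a logged request path back to the handler, and from there to the query it runs.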


[Diagram: Source Code → Static analysis → Database access information (e.g., @Get @Path('/group/{id}') select from Group g where g.id = id …). Source Code → Build → System running in production → logs (e.g., 10.10.10.1 - - [11/Apr/2015:12:19:30] 200 "GET /app/group/1 " …). The two are combined into user database accesses.]

Example: Recovered database access

    10.10.10.1 - - [11/Apr/2015:12:19:30] 200 "GET /app/group/1 "
    10.10.10.1 - - [11/Apr/2015:12:19:31] 200 "GET /app/group/2 "
    10.10.10.1 - - [11/Apr/2015:12:19:32] 200 "GET /app/group/1 "

    @Get
    @Path("/group/{id}")
    Group getGroup(id) {
        …
        select from Group g where g.id = id
        …
    }

Read operation on Group table, record with id 1, time is 11/Apr/2015:12:19:30
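The recovery step can be illustrated with a small Java sketch that turns log lines of the shape shown above into read operations on the Group table. It assumes straight double quotes in the log, that the `/group/{id}` handler is served under the `/app` context seen in the sample lines, and that the handler reads exactly one Group record. `AccessLogRecovery` and the output string format are hypothetical names chosen for this example, not part of CacheOptimizer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AccessLogRecovery {
    // Captures timestamp, status, HTTP method, and path from a line shaped like the sample.
    private static final Pattern LOG_LINE =
        Pattern.compile("\\[([^\\]]+)\\]\\s+(\\d{3})\\s+\"(\\w+)\\s+(\\S+)\\s*\"");

    // Assumption: the handler template /group/{id} is deployed under the /app context.
    private static final Pattern GROUP_BY_ID = Pattern.compile("/app/group/(\\d+)");

    /** Returns one recovered read per matching log line; all other lines are skipped. */
    static List<String> recover(List<String> logLines) {
        List<String> accesses = new ArrayList<>();
        for (String line : logLines) {
            Matcher log = LOG_LINE.matcher(line);
            if (!log.find() || !"GET".equals(log.group(3))) {
                continue;
            }
            Matcher path = GROUP_BY_ID.matcher(log.group(4));
            if (path.matches()) {
                accesses.add("READ Group id=" + path.group(1) + " at " + log.group(1));
            }
        }
        return accesses;
    }

    public static void main(String[] args) {
        List<String> log = List.of(
            "10.10.10.1 - - [11/Apr/2015:12:19:30] 200 \"GET /app/group/1 \"",
            "10.10.10.1 - - [11/Apr/2015:12:19:31] 200 \"GET /app/group/2 \"",
            "10.10.10.1 - - [11/Apr/2015:12:19:32] 200 \"GET /app/group/1 \"");
        recover(log).forEach(System.out::println);
    }
}
```

In a fuller version the path pattern would be derived from the handler templates found by static analysis instead of being hard-coded.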




[Diagram: the same overview, extended with a final step: User database accesses → Cache configuration]

Calculating optimal cache configuration via workload simulation

[Timeline diagram: incoming requests over time ("Read group with id 1", "Update group with id 1") are marked as cache consideration, cache hit, or invalidated cache. In the example the miss ratio is ½ (one cache hit). A location can become "no longer considered for caching".]

We keep track of the cache miss ratio for each potential cache location
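A minimal sketch of such a simulation in Java follows. It replays recovered reads and updates against an unbounded simulated cache, treats every update as an invalidation, and reports the miss ratio per table. The `Access` type, the per-table granularity, and the `maxMissRatio` threshold in `selectCacheable` are assumptions made for illustration; the slides do not state the cutoff CacheOptimizer uses.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class WorkloadSimulation {
    /** One recovered access (hypothetical type): a read or an update of one record. */
    static final class Access {
        final String table;
        final int id;
        final boolean update;

        Access(String table, int id, boolean update) {
            this.table = table;
            this.id = id;
            this.update = update;
        }

        static Access read(String table, int id) { return new Access(table, id, false); }
        static Access update(String table, int id) { return new Access(table, id, true); }
    }

    /** Replays the workload against an unbounded simulated cache; returns miss ratio per table. */
    static Map<String, Double> missRatios(List<Access> workload) {
        Set<String> cached = new HashSet<>();               // keys currently in the simulated cache
        Map<String, int[]> counts = new LinkedHashMap<>();  // table -> {reads, misses}
        for (Access a : workload) {
            String key = a.table + "#" + a.id;
            if (a.update) {
                cached.remove(key);                         // an update invalidates the cached copy
                continue;
            }
            int[] c = counts.computeIfAbsent(a.table, t -> new int[2]);
            c[0]++;
            if (cached.add(key)) {                          // add() is true when the key was absent: a miss
                c[1]++;
            }
        }
        Map<String, Double> ratios = new LinkedHashMap<>();
        for (Map.Entry<String, int[]> e : counts.entrySet()) {
            ratios.put(e.getKey(), (double) e.getValue()[1] / e.getValue()[0]);
        }
        return ratios;
    }

    /** Keeps tables whose miss ratio is at or below an assumed threshold. */
    static List<String> selectCacheable(Map<String, Double> ratios, double maxMissRatio) {
        List<String> keep = new ArrayList<>();
        for (Map.Entry<String, Double> e : ratios.entrySet()) {
            if (e.getValue() <= maxMissRatio) {
                keep.add(e.getKey());
            }
        }
        return keep;
    }

    public static void main(String[] args) {
        List<Access> workload = List.of(Access.read("Group", 1), Access.read("Group", 1));
        System.out.println(missRatios(workload));
    }
}
```

With two reads of group 1 the first is a miss and the second a hit, which matches the ½ miss ratio on the slide; inserting an update between them would turn the second read into a miss as well.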

Studied applications

• Performance benchmarking e-commerce application: > 35K LOC

• Medical record application: > 3.8M LOC

• Simple open-source application for a pet clinic: 3.3K LOC

• We use JMeter tests to simulate user behaviours

• Database is pre-populated with hundreds of MB of data

Comparing throughput improvements under different cache configs


• CacheAll: Enable all caches

• Default: Cache configurations that are already added in the application (what developers think should be cached)

• CacheOptimizer: The optimal cache config discovered using CacheOptimizer

We compare three different cache configurations against having no cache (baseline)

CacheOptimizer gives significant improvements over other configs

[Charts: % of throughput improvement over having no cache, comparing the CacheAll, Default, and CacheOptimizer configurations; three panels with axes up to 150%, 50%, and 30%]


Tse-Hsun (Peter) Chen http://petertsehsun.github.io
