improving the quality of large scale database-centric software systems by analyzing database access...
TRANSCRIPT
1
Improving the Quality of Large-Scale Database-Centric Software Systems by
Analyzing Database Access Code
Tse-Hsun(Peter) Chen
Supervised by Ahmed E. Hassan
2
Common performance problems in database-access code
DBMSCode
• Inefficient data access
3
Inefficient data access
for each userId{ … executeQuery(“select … where u.id = userId”);}
Code
SQL select … from user where u.id = 1 select … from user where u.id = 2…
4
Common performance problems in database-access code
DBMSCode
• Inefficient data access• Unneeded data access
5
Unneeded data access
DBMSCode
Requireuser data in
the code
Actual request sent
Team TableUser Tablejoin
6
Common performance problems in database-access code
DBMSCode
• Inefficient data access• Unneeded data access• Overly-strict isolation level
7
Overly-strict isolation level
DBMSCode
Reading onlyuser name
set read-write transactionselect … from user
8
Cannot find problems by only looking at the DBMS side
DBMS
select … from user where u.id = 1 select … from user where u.id = 2…for each userId{
… executeQuery(“select … where u.id = userId”);}
9
Database accesses are abstracted
DBMSCode
AbstractionLayers
Problems become more complex and frequent after adding abstraction layers
10
Accessing data using ORM incorrectly@EntityClass User{
…}
for each userId{ User u = findUserById(userId); u.getName();}
A database entity class
Objects
SQLselect … from user where u.id = 1 select … from user where u.id = 2…
11
Research statement
There are common problems in how developers write database-access code, and these problems are hard to diagnose by only looking at the DBMS. In addition, due to the use of different database abstraction layers, these problems become even harder to find.
By finding problems in the database access code and configuring the abstraction layers correctly, we can significantly improve the performance of database-centric systems.
12
Detecting inefficient data access code
Overview of the thesis
Detecting unneeded data access
Finding overly-strict isolation level
Finished work Future workUnder submission
13
Detecting unneeded data access
Detecting inefficient data access code
Overview of the thesis
Finding overly-strict isolation level
Finished work Future workUnder submission
Inefficient data access detection framework
Inefficient data access detection and ranking framework
Ranked according to performance impact
Ranked inefficient
data access code
Source Code
detection
ranking
14
Inefficient data access detection framework
Inefficient data access detection and ranking framework
Ranked according to performance impact
Ranked inefficient
data access code
Source Code
detection
ranking
15
Focus on one type of inefficient data access: one-by-one processing
Detecting one-by-one processingusing static analysis
16
First find all the functions that read/write from/to DBMS
Class User{getUserById()…getUserAddress()…
}
Identify the positions of all loopsfor each userId{
foo(userId)}
Check if the loop contains any database-accessing function
foo (userId){getUserById(userId)
}
Inefficient data access detection framework
Inefficient data access detection and ranking framework
Ranked according to performance impact
Ranked inefficient
data access code
Source Code
detection
ranking
17
Inefficient data access detection framework
Inefficient data access detection and ranking framework
Ranked according to performance impact
Ranked inefficient
data access code
Source Code
detection
ranking
18
Assessing inefficient data access impact by fixing the problem
ExecutionResponse time
for u in users{ update u}
Code with inefficient data access
users.updateAll()
Code without inefficient data access 19
Execute test suite 30 times
Response time after the fix
Avg. % improvement
Execution
Execute test suite 30 times
Inefficient data access causes large performance impacts
One-by-one processing0%
20%
40%
60%
80%
100%
20
% im
prov
emen
t in
resp
onse
tim
e
21
Detecting unneeded data access
Detecting inefficient data access code
Overview of the thesis
Finding overly-strict isolation level
Finished work Future workUnder submission
22
Detecting inefficient data access code
Overview of the thesis
Finding overly-strict isolation level
Finished work Future workUnder submission
Detecting unneeded data access
Intercepting system executions using byte code instrumentation
ORMDBMS
23
Intercept called functions in the code
Intercept ORM generated SQLs
Needed data
Requested data
diff
Mapping data access to called functions
24
@Entity@Table(name = “user”)public class User{
…
@Column(name=“name”)String userName;
@OneToManyList<Team> teams;public String getUserName(){
return userName;}
public void addTeam(Team t){ teams.add(t); }
User.java
We apply static analysis to find which database column a function is reading/modifying
24
Identifying unneeded data access from intercepted data
Only user name is needed in the application logic
Needed user name in code
Requested id, name, address, phone number
25
26
Detecting inefficient data access code
Overview of the thesis
Finding overly-strict isolation level
Finished work Future workUnder submission
Detecting unneeded data access
27
Detecting unneeded data access
Detecting inefficient data access code
Overview of the thesis
Finding overly-strict isolation level
Finished work Future workUnder submission
A real life example of transaction abstraction problem
28
All functions in foo will be executed in transactions
@Transactionalpublic class foo{
int getFooVal(){return foo.val;
}…
}
This function does not need to be executed in transactions
Where to put the transaction may affect system behavior and performance
Plans for finding transaction problems related abstraction
29
Bug Reports
Categories of transaction-related bugs
We plan to empirically study bug reports to find root causes and implement a
detection framework
30
31
32
33
34
35
36
37
38
http://petertsehsun.github.io