dal deck
DESCRIPTION
Peter Teitelbaum presented Database Abstraction Layers to the LAMySQL and LAPHP Groups on October 1, 2013. These are his slides.TRANSCRIPT
Database Abstraction LayersPHP & MySQL
Presented By Peter Teitelbaum
Topics
• Intro to Database Abstraction Layers (DAL)• Security• Caching• Scaling Reads• Logging & Error Handling
Note
Not a guide to functionality every abstraction layer should have.
View this as an à la carte menu of ways to solve common problems. Some solutions will likely be useful and others not
depending on the application.
Introduction DB Abstraction Layers
• Database Abstraction layer is a middleware layer between application and database
• In computing we often abstract resources – mem, storage, etc• A centralized layer supporting all DB communication gives you
great flexibility and control• Concepts are universal and can be used with other languages
or databases• Can be ultra lightweight
World’s simplest DALclass SampleDAL extends mysqli
{
} // END class
World’s simplest DAL - extendedclass SampleDAL extends mysqli
{
// Overload the parent
public function query($sql)
{
// Do something extra
return parent::query($sql);
} // END method
} // END class
Security
• Filter invalid SQL– Example: prohibit execution of SET statement to prevent changes to
MySQL session variables
• SQL Injection – regex for keywords like SLEEP, etc– You don’t have to reject, write it to a log for later review, monitoring,
etc.– Not as good as web application firewall but may buy some time
Filtering examplepublic function query($sql)
{
if (preg_match('/^\s*set\s+.i'), $sql)
{
throw new Exception('SET not permitted');
} // END if
return parent::query($sql);
} // END method
Scaling Reads
• Architecture review – master/slave topology
Master/Slave Topology
R/O VIP
Web servers
master
slaves
write
read
replication
Scaling Reads• Architecture review – master/slave topology• Separation of read and writes
– Looked at MySQL Proxy for redirecting reads – could not handle volume– If all queries pass through a central DAL, no need to rewrite the entire app
• Basic concept- SELECTs to slave(s), DML to master• Use a regex to detect/redirect SELECT statements to slave(s)• Also valuable if most of application sends reads to slave(s) but some parts
still read from master• Can log queries hitting the master rather than redirect• DIY load-balancing (load-balancing on a budget)
– Not the most efficient method – good when time/budget is very limiting– Use randomizer function – don’t use a loop or attempt round robin– Not real load-balancing and no way to know if a slave has failed
Redirecting SELECTs
Call query method
Is query a SELECT?
Open slave db connection
Execute query on slave
ReturnExecute query on master
Yes
No
Logging SELECTs
Call query method
Is query aSELECT?
ANDconnected to
master
Execute query Return
Log query
Yes
No
Load Balancing on a Budgetclass SampleDAL extends mysqli
{
public function __construct($host, $user, $pass, $db)
{
if ($host == ‘dbslave1.example.com’)
{
$slave_list = array(‘dbslave1.example.com’,
‘dbslave2.example.com’)
return parent::__construct($slave_list[mt_rand(0, count($slave_list))],
$user,
$pass,
$db);
} // END if
return parent::__construct($host, $user, $pass, $db)
} // END method
} // END class
Caching
• MySQL query cache is generally inefficient– Single mutex on cache shared by all threads– Cached result expires with changes to any of the joined tables
regardless of whether or not the result is affected.– Cache not shared across multiple slaves – each slave must re do the
same work– Real time data may not be required
Caching
• Create a read-through cache by adding a caching component into the DAL that will offload queries to a cached object store
• Memcache is most popular, others can be used.– In theory results could even be serialized, written to disk, using atime
to see if expired
Caching• Allow developers to specify a TTL for each query as a parameter,
inherit a default, or explicitly disable caching• Hash of SELECT statement for cache key – MD5 is good• Append query with TTL as comment. If query becomes slow,
increasing the TTL may be a better option than query tuning.• CMS system and front end presentation layer share the DAL?
Bypass caching for administrative users• Splitting a large join into several smaller queries can often be more
efficient. Only parts will be executed when they expire.• Delayed/lazy-loaded connections – don’t connect to DB until
needed• Empty result set cache
Caching – Query Method
Call query method
Is query cacheable? Create hash key Is result
cached?
Retrieve from cache and store as class property
Return
Valid DB connection?
Execute query
Connect to DB Append TTL as comment
Yes Yes
Yes
No
No No
Caching – get results method
Call get results method
Is result empty?
Are results cached?
Replace empty array with #
Retrieve results from DB
Is result cacheable?
Retrieve results from class property
Return resultIs result a #
Convert # to empty array
Write result into cache
Yes YesYes
Yes
No
No
No
No
Caching - Things to consider
• Queries with a full current timestamp will defeat caching– Maximum useful cache life will be 1 second– Consider rounding to a defined interval, 1min, 5min, 20min, etc.– Or consider using the NOW() function – queries are identical
regardless of when executed
• If the application receives a high volume of traffic, adding a reverse proxy cache like squid or varnish should be considered
Logging and Error Handling• Grab URL and append the SQL as a comment. May aide in identifying and locating
slow or problematic SQL.• Multiple slaves in a load-balanced pool? Grab the hostname – it may be valuable
as error output when debugging.• Check for errors to determine if query executed successfully. If not throw an
exception.• Smart error handling – look at error code for different resulting actions on
different failures. – Reconnect once on terminated connection – useful for connection pooling– Log specific errors
• Application level general log. If multiple applications are connecting to the same DB server, logging at the DAL will give visibility into a targeted application for debugging.
• Log queries that disable/bypass caching for review• Capture timestamp pre/post query execution
Conclusion
• Concepts are portable to other languages and databases• DALs don’t have to be heavy and require large overhead• DAL’s can increase performance• Examples extended mysqli – not necessary to
extend/overload, just map methods or create new ones
Thanks!
Questions/thoughts?