building a sql database that works
DESCRIPTION
TRANSCRIPT
Building a SQL Databasethat works
Josh BerkusPostgreSQL Experts, Inc.OpenSourceBridge 2009
How Not To Do Itfour popular methods
1. One Big Spreadsheet
2. EAV & E-Blob
ID Property Setting
407 Eyes Brown
407 Height 73in
407 Married? TRUE
408 Married? FALSE
408 Smoker FALSE
408 Age 37
409 Height 66in
ID Properties
407 <eyes=”brown”><height=”73”> <married=”1”><smoker=”1”>
408 <hair=”brown”><age=”49”> <married=”0”><smoker=”0”>
409 <age=”37”><height=”66”> <hat=”old”><teeth=”gold”>
3. Incremental Development
4. Leave It to the ORM
E.F. CoddDatabase Engineer, IBM 1970
IBM Databases Run Amok
1.losing data
2.duplicate data
3.wrong data
4.crappy performance
5.downtime for database redesign whenever anyone made an
application change
The Relational Model
All the Relational ModelYou Need to Know
in less than 10 minutes
Set (Bag) Theory
Relations
Relation(table, view, rowset)
Tuples
Relation(table, view, rowset)
Tuple (row)Tuple (row) Tuple (row)
Tuple (row)Tuple (row)
Attributes
Relation(table, view, rowset)
Tuple (row) Tuple (row)
Tuple (row)Tuple (row)
Tuple (row)Attribute
Attribute
Attribute
Attribute
Attribute
Attribute
Attribute
Attribute
Attribute
Attribute
Attribute
Attribute
Attribute
Attribute
Attribute
Attribute
Attribute
Attribute
Attribute
AttributeAttribute
Attribute Attribute
Domains (types)
Relation(table, view, rowset)
Tuple (row) Tuple (row)
Tuple (row)Tuple (row)
Tuple (row)INT
DATE
TEXT
DATE
TEXT
Attribute
Attribute
DATE
TEXT
Attribute
Attribute
DATE
TEXT
Attribute
Attribute
DATE
TEXT
INTINT
INT INT
Keys
Relation(table, view, rowset)
Tuple (row) Tuple (row)
Tuple (row)Tuple (row)
Tuple (row)Attribute
Attribute
Attribute
Attribute
AttributeKey
Attribute
Attribute
Attribute
Attribute
Attribute
Attribute
Attribute
Attribute
Attribute
Attribute
Attribute
Attribute
Attribute
Attribute
KeyKey
Key Key
Constraints
Relation(table, view, rowset)
Tuple (row) Tuple (row)
Tuple (row)Tuple (row)
Tuple (row)Attribute
Attribute
Attribute
Attribute
Attribute
Attribute
Attribute
Attribute
Attribute
Attribute
Attribute
Attribute
Attribute
Attribute
Attribute
Attribute
Attribute
Attribute
Attribute
AttributeAttribute
Attribute Attribute
Attribute > 5
Foreign Key Constraint
Derived Relation (query)
Problem 1:Losing Data
?
The Atomic Age
Users
Simple User Table
Admins
● name (text)● email (text)● login (text)● password (text)● status (char)
(user, inactive, admin)
Users
Non-Atomic Attributes
Admins
● name (text)● email (text)● login (text)● password (text)● status (char)
(user, inactive, admin)
What's Atomic?
The simplest form of a datum, which is not divisible without loss of information.
nameJosh Berkus
Statusa
SELECT SUBSTR(name,STRPOS(name, ' ')) ...
… WHERE status = 'a' or status = 'u' ...
Statusi
… WHERE ??? ...
What's Atomic?
The simplest form of a datum, which is not divisible without loss of information.
first_nameJosh
activeTRUE
last_nameBerkus
accessa
Users
Table Atomized!
Admins
● first_name (text)● last_name (text)● email (text)● login (text)● password (text)● active (boolean) ● access (char)
Atomic, Shmomic. Who Cares?
● Atomic Values:– retain data
– make joins easier
– make constraints easier
● Non-atomic Values:– make data loss more likely
– increase CPU usage
– make you more likely to forget something
Transactions
Splitting a Bulletin Board Threadthe hard way
INSERT INTO threads VALUES ( .... );If $dbh('success') then
for $these_posts.date > $cutdate loopUPDATE posts SET thread = $newthreadWHERE id = $these_posts.id;
if not $dbh('success') thenfor $these_posts.id > $last_id loop
UPDATE posts SET thread = $oldthreadWHERE id = $these_posts.id;DELETE FROM threadsWHERE id = $newthread;
Splitting a Bulletin Board Threadthe transactional way
BEGIN;INSERT INTO threads VALUES ( .... );$newthread = curval();UPDATE posts SET thread = $newthread
WHERE thread = $oldthreadAND date > $cutdate;
END;
Problem 2:Duplicate Data
nameJosh Berkus
Joshua Berkus
Josh Berkus
Berkus, Josh
user_nameJosh BerkusJosh Berkus
adminJosh BerkusBerkus
user_nameJosh Berkus
Joshua Berkus
Josh
Where are my Keys?
Key
Users
Candidate (Natural) Keys
Admins
● first_name (text)● last_name (text)● email (text)● login (text)● password (text)● active (boolean) ● access (char)
A Good Key
● Should have to be unique because the application requires it to be.
● Expresses a unique predicate which describes the tuple (row):
– user with login “jberkus”
– post from “jberkus” on “2009-05-02 13:41:22” in thread “Making your own wine”
● If you can't find a good key, your table design is missing data.
Key
Surrogate Key
Users
Admins
● first_name (text)● last_name (text)● email (text)● login (text)● password (text)● active (boolean) ● access (char)
● user_id (serial)
We All Just Want to Be Normal
Abby Normal
login level last_name
jberkus u Berkus
selena a Deckelman
login title posted level
jberkus Dinner? 09:28 u
selena Dinner? 09:37 u
jberkus Dinner? 09:44 a
How can I be “Normal”?1. Each piece of data only appears in one relation
– except as a “foreign key” attribute
● No “repeated” attributes
login level last_name
jberkus u Berkus
selena a Deckelman
login title posted
jberkus Dinner? 09:28
selena Dinner? 09:37
jberkus Dinner? 09:44
Problem 3:Wrong Data
Constraints
Users Run Amok
● first_name (text)● last_name (text)● email (text)● login (text)● password (text)● active (boolean) ● access (char)
Ensure Your Data is Consistentno matter where it came from
first_name last_name email login password active levelJosh Berkus TRUE aNULL Kelley k NULL TRUE uMark Twain samuel NULL I
S F gavin twitter FALSE x
[email protected] jberkus [email protected] [email protected]
Users Under Constraint
● first_name (text) length() > 1● last_name (text) length() > 1● email (text) ILIKE '%@%.%'● login (text) length() > 5● password (text) length() > 5● active (boolean) NOT NULL● access (char) IN ( 'a','u' )
note: email and other validators would, of course, be more complex
Foreign Keys
Users
Admins
Posts
attributelogin
Posts Table
● title (text) NOT NULLREFERENCES threads ( title )ON DELETE CASCADE ON UPDATE CASCADE
● posted (timestamp) NOT NULL● user (text) NOT NULL
REFERENCES users ( login )ON DELETE CASCADE ON UPDATE CASCADE
● content (text) NOT NULL
Beautiful Cascades
users.loginJosh Berkusjberkus
selena
posts.contentJosh Berkus
I'm going crazy!
What's up?
Why?
OSB! It's too much!
I told you so ...
jerkyboywww.pornking.com
www.whitehouse.com
www.whiteslavery.com
www.lolcats.com
Problem 4:Crappy Performance
Things We've Already Done
● Atomicization– less CPU on parsing, calculations
● Normalization– less data duplication
– smaller tables
● Transactions– more batches, less iteration
– less locking
Denormalized Derived Relationsmaterialized views for the win
Users
Admins
Posts
user_postcount
Problem 5:Database Changes Cause
Application Downtimeand vice-versa
Stuff We've Already Doneto make our data “agile”
● Atomicization
– data isn't in specific interface version formats● Normalization
– where to extend data is more obvious
– create a new table if you have to● Transactions
– prevent partial failures from changed schema
Views
Extending the Users Table● first_name (text)● last_name (text)● email (text)● login (text)● password (text)● active (boolean) ● access (char)● created (timestamp)● last_login (timestamp)
CREATE VIEW oldapp.users ASSELECT first_name || ' ' || last_name, email, login, password, active, accessFROM users;
The Restyou already know
● Write Migrations– deploy these in transactions, if supported
– if not, write rollback scripts
● Write Tests– use a realistic staging environment
Some Other Tips
● Pick a naming scheme– and stick to it
● Don't do your joins in the application– the database does them better
● Repurposing fields will bite you– sure you'll remember when that changed
● Don't micro-optimize– INT3, anyone?
More Information● me
– www.pgexperts.com
– it.toolbox.com/blogs/database-soup
● postgresql: www.postgresql.org● tutorial at OSCON
– monday, 8:30 am!
– see you in San Jose!
This presentation copyright 2009 Josh Berkus, licensed for distribution under the Creative Commons Attribution License, except for photos, most of which were stolen from other people's websites via images.google.com. Thanks, Google!