ncss: databases and sql · grant lname schultz schultz robinson varvel nemes barber herdman newton...
TRANSCRIPT
Python/sqlite3 DB Design API JOINs 2
Outline
1 Connecting to an SQLite database using Python
2 What is a ‘good’ database design?
3 A nice API
4 More on joins and subqueries...
Tim Dawborn Databases & SQL Lecture 2, January, 2016
Python/sqlite3 DB Design API JOINs 3
Database Application Programming
• Python standard library contains a module for interacting withSQLite databases: sqlite3
• You work primarily with Connection and Cursor objects• The connect method creates a Connection object to a SQLite
database
1 >>> import sqlite3
2 >>> conn = sqlite3.connect('sports.db')3 >>> conn
4 <sqlite3.Connection object at 0x312720>
5 >>>
• Once you are finished, the connection should be closed
6 >>> conn.close()
7 >>>
Tim Dawborn Databases & SQL Lecture 2, January, 2016
Python/sqlite3 DB Design API JOINs 4
Cursor Objects
• Cursors are the standard way to interact with a database fromwithin a programming language
• Cursor objects allow you to execute a query and iteratethrough the results of a query
• A Connection object allows you to obtain a cursor whichpoints into the database
1 >>> cur = conn.cursor()
2 >>> cur
3 <sqlite3.Cursor object at 0x387b00>
4 >>>
Tim Dawborn Databases & SQL Lecture 2, January, 2016
Python/sqlite3 DB Design API JOINs 5
Executing Queries
• The cursor has an execute method which allows you toexecute one SQL query at a result
• The cursor object itself can then be iterated through to obtainthe resultant rows
1 >>> cur.execute('SELECT * FROM events;')2 >>> for row in cur:
3 ... print(row)
4
5 (0, '100m', 16, 'M', '09:10')6 (1, '200m', 16, 'M', '09:15')7 (2, '100m', 17, 'M', '09:00')8 (3, '100m', 17, 'F', '09:05')9 >>>
Tim Dawborn Databases & SQL Lecture 2, January, 2016
Python/sqlite3 DB Design API JOINs 6
Executing Queries
• The fetchone method returns one row at a result, or None
• The previous is equivalent to the following
1 >>> cur.execute('SELECT * FROM events;')2 >>> while True:
3 ... row = cur.fetchone()
4 ... if row is None:
5 ... break
6 ... print(row)
7
8 (0, '100m', 16, 'M', '09:10')9 (1, '200m', 16, 'M', '09:15')
10 (2, '100m', 17, 'M', '09:00')11 (3, '100m', 17, 'F', '09:05')12 >>>
Tim Dawborn Databases & SQL Lecture 2, January, 2016
Python/sqlite3 DB Design API JOINs 7
Data Types: Python vs. SQL
• We know SQLite has 5 data types
• However, Python has a lot more than these five ...• str, int, float, list, dict, set, tuple, ...
• How do we coerce Python data types to SQLite andvice-versa?
SQLite Python
NULL None
INTEGER int
REAL float
TEXT str
BLOB bytes
Tim Dawborn Databases & SQL Lecture 2, January, 2016
Python/sqlite3 DB Design API JOINs 7
Data Types: Python vs. SQL
• We know SQLite has 5 data types
• However, Python has a lot more than these five ...• str, int, float, list, dict, set, tuple, ...
• How do we coerce Python data types to SQLite andvice-versa?
SQLite Python
NULL None
INTEGER int
REAL float
TEXT str
BLOB bytes
Tim Dawborn Databases & SQL Lecture 2, January, 2016
Python/sqlite3 DB Design API JOINs 8
Joining tables
• Why do we need to join tables together?
• Why not store all data in just one huge table?id0012345678
fnameBarryBarryPrueAndrewMathewMaraScottAlecKarenGrant
lnameSchultzSchultzRobinsonVarvelNemesBarberHerdmanNewtonBarberOvzinsky
genderMMFMMFMMFM
age16161716131716161417
ename100m200m100m100m100m100m100m
100m200m
eage16161716131716
1417
egenderMMFMMFMM
M
at09:1009:1509:0509:1009:1009:0509:1009:10
09:00
result00:1500:4000:2000:17
00:2000:21
• Problem: A lot of data is repeated and too many NULLs...
Tim Dawborn Databases & SQL Lecture 2, January, 2016
Python/sqlite3 DB Design API JOINs 8
Joining tables
• Why do we need to join tables together?
• Why not store all data in just one huge table?
id0012345678
fnameBarryBarryPrueAndrewMathewMaraScottAlecKarenGrant
lnameSchultzSchultzRobinsonVarvelNemesBarberHerdmanNewtonBarberOvzinsky
genderMMFMMFMMFM
age16161716131716161417
ename100m200m100m100m100m100m100m
100m200m
eage16161716131716
1417
egenderMMFMMFMM
M
at09:1009:1509:0509:1009:1009:0509:1009:10
09:00
result00:1500:4000:2000:17
00:2000:21
• Problem: A lot of data is repeated and too many NULLs...
Tim Dawborn Databases & SQL Lecture 2, January, 2016
Python/sqlite3 DB Design API JOINs 8
Joining tables
• Why do we need to join tables together?
• Why not store all data in just one huge table?id0012345678
fnameBarryBarryPrueAndrewMathewMaraScottAlecKarenGrant
lnameSchultzSchultzRobinsonVarvelNemesBarberHerdmanNewtonBarberOvzinsky
genderMMFMMFMMFM
age16161716131716161417
ename100m200m100m100m100m100m100m
100m200m
eage16161716131716
1417
egenderMMFMMFMM
M
at09:1009:1509:0509:1009:1009:0509:1009:10
09:00
result00:1500:4000:2000:17
00:2000:21
• Problem: A lot of data is repeated and too many NULLs...
Tim Dawborn Databases & SQL Lecture 2, January, 2016
Python/sqlite3 DB Design API JOINs 8
Joining tables
• Why do we need to join tables together?
• Why not store all data in just one huge table?id0012345678
fnameBarryBarryPrueAndrewMathewMaraScottAlecKarenGrant
lnameSchultzSchultzRobinsonVarvelNemesBarberHerdmanNewtonBarberOvzinsky
genderMMFMMFMMFM
age16161716131716161417
ename100m200m100m100m100m100m100m
100m200m
eage16161716131716
1417
egenderMMFMMFMM
M
at09:1009:1509:0509:1009:1009:0509:1009:10
09:00
result00:1500:4000:2000:17
00:2000:21
• Problem: A lot of data is repeated and too many NULLs...
Tim Dawborn Databases & SQL Lecture 2, January, 2016
Python/sqlite3 DB Design API JOINs 9
Central goal: minimizing redundancy
• Central goal of database design: minimizing data redundancy
• There’s a large body of theoretical work on this
• Data Normalization — in a nutshell:• Keep a table about data of just one concept.
(such as persons or events)• Use foreign keys to link tables in your schema
people
idfnamelnamegenderage
results
eventpersonresult
events
idnameagegenderat
Tim Dawborn Databases & SQL Lecture 2, January, 2016
Python/sqlite3 DB Design API JOINs 9
Central goal: minimizing redundancy
• Central goal of database design: minimizing data redundancy
• There’s a large body of theoretical work on this
• Data Normalization — in a nutshell:• Keep a table about data of just one concept.
(such as persons or events)• Use foreign keys to link tables in your schema
people
idfnamelnamegenderage
results
eventpersonresult
events
idnameagegenderat
Tim Dawborn Databases & SQL Lecture 2, January, 2016
Python/sqlite3 DB Design API JOINs 10
A well-done database schema
• The Good:• each table is about a well defined concept only• updates affect (typically) a single row only
• The Bad:• many tables• tendency to introduce unique IDs• a lot of joins...
• The Ugly:• it’s not always possible and very experience-driven
Tim Dawborn Databases & SQL Lecture 2, January, 2016
Python/sqlite3 DB Design API JOINs 11
Piecing it all together
• How are we going to integrate all of this into our NCSSBook?
• Model-View-Controller mentality
• Loading and writing data to and from a SQLite database• Database is stored in a single flatfile• Ease of setup – no server required• Convenient for version control
• Correct data modelling is important
Tim Dawborn Databases & SQL Lecture 2, January, 2016
Python/sqlite3 DB Design API JOINs 12
Example: Account table
• What do you think about the following users table design?id
123456
user
steveelainemirandajessemathewtim
pass
1234abcdeffoobar1Two3156ytfv1h8joshua
[email protected]@[email protected]@[email protected]@iinet.net.au
school
St. Mary’sSt. Mary’sX Public SchoolY State HighY State HighMercedes College
city
SydneySydneyGlebeTownsvilleTownsvillePerth
friends
2,311,4,53,53,4,6,105,10
• friends should be a separate table rather than a CSV-string
• Better also have schools in a separate table
• Unencrypted passwords? Seriously!?
Tim Dawborn Databases & SQL Lecture 2, January, 2016
Python/sqlite3 DB Design API JOINs 12
Example: Account table
• What do you think about the following users table design?id
123456
user
steveelainemirandajessemathewtim
pass
1234abcdeffoobar1Two3156ytfv1h8joshua
[email protected]@[email protected]@[email protected]@iinet.net.au
school
St. Mary’sSt. Mary’sX Public SchoolY State HighY State HighMercedes College
city
SydneySydneyGlebeTownsvilleTownsvillePerth
friends
2,311,4,53,53,4,6,105,10
• friends should be a separate table rather than a CSV-string
• Better also have schools in a separate table
• Unencrypted passwords? Seriously!?
Tim Dawborn Databases & SQL Lecture 2, January, 2016
Python/sqlite3 DB Design API JOINs 13
Example: Login code
• What do you think about the following login code?1 import sqlite3
2 def login(username, password):
3 conn = sqlite3.connect('ncssbook.db')4 cur = conn.execute('SELECT user, pass FROM users')5 found = False
6 for row in cur:
7 if row[0] == username and row[1] == password:
8 found = True
9 conn.close()
10 return found
• Do not scan a table and filter in Python – use SQL!• row[0] or row[1] relies on the positions; use row["user"]
• Use just one connection for the whole program• Store passwords hashed or encrypted!
Tim Dawborn Databases & SQL Lecture 2, January, 2016
Python/sqlite3 DB Design API JOINs 13
Example: Login code
• What do you think about the following login code?1 import sqlite3
2 def login(username, password):
3 conn = sqlite3.connect('ncssbook.db')4 cur = conn.execute('SELECT user, pass FROM users')5 found = False
6 for row in cur:
7 if row[0] == username and row[1] == password:
8 found = True
9 conn.close()
10 return found
• Do not scan a table and filter in Python – use SQL!• row[0] or row[1] relies on the positions; use row["user"]
• Use just one connection for the whole program• Store passwords hashed or encrypted!
Tim Dawborn Databases & SQL Lecture 2, January, 2016
Python/sqlite3 DB Design API JOINs 14
Example: Login code – a better approach
1 import sqlite3
2
3 conn = sqlite3.connect('ncssbook.db')4
5 def login(username, password):
6 cur = conn.execute('''7 SELECT id
8 FROM users
9 WHERE user=? AND pass=?
10 ''', (username, password))
11 row = cur.fetchone()
12 user_id = None if row is None else row['id']13 conn.commit()
14 return user_id
Tim Dawborn Databases & SQL Lecture 2, January, 2016
Python/sqlite3 DB Design API JOINs 15
Making others’ lives easier
• There are some operations that we’ll do a lot:• Create a (user) row in a table• Find a (user) row in a table• Update a (user) row in a table• Delete a (user) row in a table
• It’s painful to have to write SQL every time we need to dothis
• for every single type of table in the database!
Tim Dawborn Databases & SQL Lecture 2, January, 2016
Python/sqlite3 DB Design API JOINs 16
Idea 1: Let’s use functions
1 def find_user(username, conn):
2 cur = conn.execute('''SELECT * FROM users
3 WHERE user=?''', (username,))
4 row = cur.fetchone()
5 return row
• What’s wrong with this?
• The user of the function has to process the tuple• Exposes changes in the database schema• This is ugly and prone to failure
Tim Dawborn Databases & SQL Lecture 2, January, 2016
Python/sqlite3 DB Design API JOINs 16
Idea 1: Let’s use functions
1 def find_user(username, conn):
2 cur = conn.execute('''SELECT * FROM users
3 WHERE user=?''', (username,))
4 row = cur.fetchone()
5 return row
• What’s wrong with this?• The user of the function has to process the tuple• Exposes changes in the database schema• This is ugly and prone to failure
Tim Dawborn Databases & SQL Lecture 2, January, 2016
Python/sqlite3 DB Design API JOINs 17
Idea 2: Let’s use classes and objects
1 class User:
2 def __init__(self, username, fname, lname):
3 self.username = username
4 self.fname = fname
5 self.lname = lname
6
7 @staticmethod
8 def find(username):
9 cur = conn.execute('''SELECT * FROM users
10 WHERE user=?''', (username,))
11 row = cur.fetchone()
12 if row is None:
13 raise UserNotFound('{} does not exist'.format(username))14 return User(row[0], row[1], row[2])
Tim Dawborn Databases & SQL Lecture 2, January, 2016
Python/sqlite3 DB Design API JOINs 18
Idea 2: Let’s use classes and objects
1 @staticmethod
2 def create(username, fname, lname):
3 cur = conn.execute('''INSERT INTO users
4 VALUES (?, ?, ?)''', (username, fname, lname))
5 return User(username, fname, lname)
6
7 @staticmethod
8 def delete(username):
9 cur = conn.execute('''DELETE FROM users
10 WHERE username = ?''', (username,))
Tim Dawborn Databases & SQL Lecture 2, January, 2016
Python/sqlite3 DB Design API JOINs 19
An API abstracts away from the database
• Defining the User object makes life much nicer for otherprogrammers
• No one else has to write SQL (like the function approach)
• Internalise the tuple processing and produce a standardizedobject representation
• We can transparently make changes to the database schemawithout affecting anyone else
• You should consider creating an object like this for each tablein your database
• And if you’re really good, you could try and figure out how toavoid rewriting for every new database table
Tim Dawborn Databases & SQL Lecture 2, January, 2016
Python/sqlite3 DB Design API JOINs 20
More Joins
• Okay, back to more complex SQL
• We can join more than two tables together
• Example: List the names of all females who ran the 100m inalphabetical order.
1 SELECT fname, lname
2 FROM people p
3 JOIN results r ON p.id = r.person
4 JOIN events e ON e.id = r.event
5 WHERE e.name = '100m'6 AND e.gender = 'F'7 ORDER BY lname;
Tim Dawborn Databases & SQL Lecture 2, January, 2016
Python/sqlite3 DB Design API JOINs 20
More Joins
• Okay, back to more complex SQL
• We can join more than two tables together
• Example: List the names of all females who ran the 100m inalphabetical order.
1 SELECT fname, lname
2 FROM people p
3 JOIN results r ON p.id = r.person
4 JOIN events e ON e.id = r.event
5 WHERE e.name = '100m'6 AND e.gender = 'F'7 ORDER BY lname;
Tim Dawborn Databases & SQL Lecture 2, January, 2016
Python/sqlite3 DB Design API JOINs 21
Nested SQL queries
• Remember: Every SQL query returns its result as a table.
• This means we can nest SQL queries: an outer query cancheck the results of an inner (nested) sub-query.
• We can build complex query from smaller building blocks.
Tim Dawborn Databases & SQL Lecture 2, January, 2016
Python/sqlite3 DB Design API JOINs 22
Nesting SQL Queries (continued)
• Example: Which females ran in the 100m?
1 SELECT r.person
2 FROM results r
3 JOIN events e ON e.id = r.event
4 WHERE e.name = '100m' AND e.gender = 'F'
• Use this as a sub-query to answer the original question:
1 SELECT fname, lname
2 FROM people
3 WHERE id IN (
4 SELECT r.person
5 FROM results r
6 JOIN events e ON e.id = r.event
7 WHERE e.name = '100m' AND e.gender = 'F'8 )
9 ORDER BY lname;
Tim Dawborn Databases & SQL Lecture 2, January, 2016
Python/sqlite3 DB Design API JOINs 22
Nesting SQL Queries (continued)
• Example: Which females ran in the 100m?
1 SELECT r.person
2 FROM results r
3 JOIN events e ON e.id = r.event
4 WHERE e.name = '100m' AND e.gender = 'F'
• Use this as a sub-query to answer the original question:
1 SELECT fname, lname
2 FROM people
3 WHERE id IN (
4 SELECT r.person
5 FROM results r
6 JOIN events e ON e.id = r.event
7 WHERE e.name = '100m' AND e.gender = 'F'8 )
9 ORDER BY lname;
Tim Dawborn Databases & SQL Lecture 2, January, 2016
Python/sqlite3 DB Design API JOINs 22
Nesting SQL Queries (continued)
• Example: Which females ran in the 100m?
1 SELECT r.person
2 FROM results r
3 JOIN events e ON e.id = r.event
4 WHERE e.name = '100m' AND e.gender = 'F'
• Use this as a sub-query to answer the original question:
1 SELECT fname, lname
2 FROM people
3 WHERE id IN (
4 SELECT r.person
5 FROM results r
6 JOIN events e ON e.id = r.event
7 WHERE e.name = '100m' AND e.gender = 'F'8 )
9 ORDER BY lname;Tim Dawborn Databases & SQL Lecture 2, January, 2016
Python/sqlite3 DB Design API JOINs 23
Scientific Databases: How complex SQL can become...
• Part of our work here at Sydney Uni is to look at howdatabases and SQL can help answering scientific questions.
• For example in the context of genomics:
• Another good example: SkyServerhttp://cas.sdss.org/dr6/en/tools/chart/navi.asp
Tim Dawborn Databases & SQL Lecture 2, January, 2016