ncss: databases and sql · grant lname schultz schultz robinson varvel nemes barber herdman newton...

34
NCSS: Databases and SQL Tim Dawborn Lecture 2, January, 2016

Upload: phamtuyen

Post on 03-May-2018

216 views

Category:

Documents


1 download

TRANSCRIPT

NCSS: Databases and SQL

Tim Dawborn

Lecture 2, January, 2016

Python/sqlite3 DB Design API JOINs 2

Outline

1 Connecting to an SQLite database using Python

2 What is a ‘good’ database design?

3 A nice API

4 More on joins and subqueries...

Tim Dawborn Databases & SQL Lecture 2, January, 2016

Python/sqlite3 DB Design API JOINs 3

Database Application Programming

• Python standard library contains a module for interacting withSQLite databases: sqlite3

• You work primarily with Connection and Cursor objects• The connect method creates a Connection object to a SQLite

database

1 >>> import sqlite3

2 >>> conn = sqlite3.connect('sports.db')3 >>> conn

4 <sqlite3.Connection object at 0x312720>

5 >>>

• Once you are finished, the connection should be closed

6 >>> conn.close()

7 >>>

Tim Dawborn Databases & SQL Lecture 2, January, 2016

Python/sqlite3 DB Design API JOINs 4

Cursor Objects

• Cursors are the standard way to interact with a database fromwithin a programming language

• Cursor objects allow you to execute a query and iteratethrough the results of a query

• A Connection object allows you to obtain a cursor whichpoints into the database

1 >>> cur = conn.cursor()

2 >>> cur

3 <sqlite3.Cursor object at 0x387b00>

4 >>>

Tim Dawborn Databases & SQL Lecture 2, January, 2016

Python/sqlite3 DB Design API JOINs 5

Executing Queries

• The cursor has an execute method which allows you toexecute one SQL query at a result

• The cursor object itself can then be iterated through to obtainthe resultant rows

1 >>> cur.execute('SELECT * FROM events;')2 >>> for row in cur:

3 ... print(row)

4

5 (0, '100m', 16, 'M', '09:10')6 (1, '200m', 16, 'M', '09:15')7 (2, '100m', 17, 'M', '09:00')8 (3, '100m', 17, 'F', '09:05')9 >>>

Tim Dawborn Databases & SQL Lecture 2, January, 2016

Python/sqlite3 DB Design API JOINs 6

Executing Queries

• The fetchone method returns one row at a result, or None

• The previous is equivalent to the following

1 >>> cur.execute('SELECT * FROM events;')2 >>> while True:

3 ... row = cur.fetchone()

4 ... if row is None:

5 ... break

6 ... print(row)

7

8 (0, '100m', 16, 'M', '09:10')9 (1, '200m', 16, 'M', '09:15')

10 (2, '100m', 17, 'M', '09:00')11 (3, '100m', 17, 'F', '09:05')12 >>>

Tim Dawborn Databases & SQL Lecture 2, January, 2016

Python/sqlite3 DB Design API JOINs 7

Data Types: Python vs. SQL

• We know SQLite has 5 data types

• However, Python has a lot more than these five ...• str, int, float, list, dict, set, tuple, ...

• How do we coerce Python data types to SQLite andvice-versa?

SQLite Python

NULL None

INTEGER int

REAL float

TEXT str

BLOB bytes

Tim Dawborn Databases & SQL Lecture 2, January, 2016

Python/sqlite3 DB Design API JOINs 7

Data Types: Python vs. SQL

• We know SQLite has 5 data types

• However, Python has a lot more than these five ...• str, int, float, list, dict, set, tuple, ...

• How do we coerce Python data types to SQLite andvice-versa?

SQLite Python

NULL None

INTEGER int

REAL float

TEXT str

BLOB bytes

Tim Dawborn Databases & SQL Lecture 2, January, 2016

Python/sqlite3 DB Design API JOINs 8

Joining tables

• Why do we need to join tables together?

• Why not store all data in just one huge table?id0012345678

fnameBarryBarryPrueAndrewMathewMaraScottAlecKarenGrant

lnameSchultzSchultzRobinsonVarvelNemesBarberHerdmanNewtonBarberOvzinsky

genderMMFMMFMMFM

age16161716131716161417

ename100m200m100m100m100m100m100m

100m200m

eage16161716131716

1417

egenderMMFMMFMM

M

at09:1009:1509:0509:1009:1009:0509:1009:10

09:00

result00:1500:4000:2000:17

00:2000:21

• Problem: A lot of data is repeated and too many NULLs...

Tim Dawborn Databases & SQL Lecture 2, January, 2016

Python/sqlite3 DB Design API JOINs 8

Joining tables

• Why do we need to join tables together?

• Why not store all data in just one huge table?

id0012345678

fnameBarryBarryPrueAndrewMathewMaraScottAlecKarenGrant

lnameSchultzSchultzRobinsonVarvelNemesBarberHerdmanNewtonBarberOvzinsky

genderMMFMMFMMFM

age16161716131716161417

ename100m200m100m100m100m100m100m

100m200m

eage16161716131716

1417

egenderMMFMMFMM

M

at09:1009:1509:0509:1009:1009:0509:1009:10

09:00

result00:1500:4000:2000:17

00:2000:21

• Problem: A lot of data is repeated and too many NULLs...

Tim Dawborn Databases & SQL Lecture 2, January, 2016

Python/sqlite3 DB Design API JOINs 8

Joining tables

• Why do we need to join tables together?

• Why not store all data in just one huge table?id0012345678

fnameBarryBarryPrueAndrewMathewMaraScottAlecKarenGrant

lnameSchultzSchultzRobinsonVarvelNemesBarberHerdmanNewtonBarberOvzinsky

genderMMFMMFMMFM

age16161716131716161417

ename100m200m100m100m100m100m100m

100m200m

eage16161716131716

1417

egenderMMFMMFMM

M

at09:1009:1509:0509:1009:1009:0509:1009:10

09:00

result00:1500:4000:2000:17

00:2000:21

• Problem: A lot of data is repeated and too many NULLs...

Tim Dawborn Databases & SQL Lecture 2, January, 2016

Python/sqlite3 DB Design API JOINs 8

Joining tables

• Why do we need to join tables together?

• Why not store all data in just one huge table?id0012345678

fnameBarryBarryPrueAndrewMathewMaraScottAlecKarenGrant

lnameSchultzSchultzRobinsonVarvelNemesBarberHerdmanNewtonBarberOvzinsky

genderMMFMMFMMFM

age16161716131716161417

ename100m200m100m100m100m100m100m

100m200m

eage16161716131716

1417

egenderMMFMMFMM

M

at09:1009:1509:0509:1009:1009:0509:1009:10

09:00

result00:1500:4000:2000:17

00:2000:21

• Problem: A lot of data is repeated and too many NULLs...

Tim Dawborn Databases & SQL Lecture 2, January, 2016

Python/sqlite3 DB Design API JOINs 9

Central goal: minimizing redundancy

• Central goal of database design: minimizing data redundancy

• There’s a large body of theoretical work on this

• Data Normalization — in a nutshell:• Keep a table about data of just one concept.

(such as persons or events)• Use foreign keys to link tables in your schema

people

idfnamelnamegenderage

results

eventpersonresult

events

idnameagegenderat

Tim Dawborn Databases & SQL Lecture 2, January, 2016

Python/sqlite3 DB Design API JOINs 9

Central goal: minimizing redundancy

• Central goal of database design: minimizing data redundancy

• There’s a large body of theoretical work on this

• Data Normalization — in a nutshell:• Keep a table about data of just one concept.

(such as persons or events)• Use foreign keys to link tables in your schema

people

idfnamelnamegenderage

results

eventpersonresult

events

idnameagegenderat

Tim Dawborn Databases & SQL Lecture 2, January, 2016

Python/sqlite3 DB Design API JOINs 10

A well-done database schema

• The Good:• each table is about a well defined concept only• updates affect (typically) a single row only

• The Bad:• many tables• tendency to introduce unique IDs• a lot of joins...

• The Ugly:• it’s not always possible and very experience-driven

Tim Dawborn Databases & SQL Lecture 2, January, 2016

Python/sqlite3 DB Design API JOINs 11

Piecing it all together

• How are we going to integrate all of this into our NCSSBook?

• Model-View-Controller mentality

• Loading and writing data to and from a SQLite database• Database is stored in a single flatfile• Ease of setup – no server required• Convenient for version control

• Correct data modelling is important

Tim Dawborn Databases & SQL Lecture 2, January, 2016

Python/sqlite3 DB Design API JOINs 12

Example: Account table

• What do you think about the following users table design?id

123456

user

steveelainemirandajessemathewtim

pass

1234abcdeffoobar1Two3156ytfv1h8joshua

email

[email protected]@[email protected]@[email protected]@iinet.net.au

school

St. Mary’sSt. Mary’sX Public SchoolY State HighY State HighMercedes College

city

SydneySydneyGlebeTownsvilleTownsvillePerth

friends

2,311,4,53,53,4,6,105,10

• friends should be a separate table rather than a CSV-string

• Better also have schools in a separate table

• Unencrypted passwords? Seriously!?

Tim Dawborn Databases & SQL Lecture 2, January, 2016

Python/sqlite3 DB Design API JOINs 12

Example: Account table

• What do you think about the following users table design?id

123456

user

steveelainemirandajessemathewtim

pass

1234abcdeffoobar1Two3156ytfv1h8joshua

email

[email protected]@[email protected]@[email protected]@iinet.net.au

school

St. Mary’sSt. Mary’sX Public SchoolY State HighY State HighMercedes College

city

SydneySydneyGlebeTownsvilleTownsvillePerth

friends

2,311,4,53,53,4,6,105,10

• friends should be a separate table rather than a CSV-string

• Better also have schools in a separate table

• Unencrypted passwords? Seriously!?

Tim Dawborn Databases & SQL Lecture 2, January, 2016

Python/sqlite3 DB Design API JOINs 13

Example: Login code

• What do you think about the following login code?1 import sqlite3

2 def login(username, password):

3 conn = sqlite3.connect('ncssbook.db')4 cur = conn.execute('SELECT user, pass FROM users')5 found = False

6 for row in cur:

7 if row[0] == username and row[1] == password:

8 found = True

9 conn.close()

10 return found

• Do not scan a table and filter in Python – use SQL!• row[0] or row[1] relies on the positions; use row["user"]

• Use just one connection for the whole program• Store passwords hashed or encrypted!

Tim Dawborn Databases & SQL Lecture 2, January, 2016

Python/sqlite3 DB Design API JOINs 13

Example: Login code

• What do you think about the following login code?1 import sqlite3

2 def login(username, password):

3 conn = sqlite3.connect('ncssbook.db')4 cur = conn.execute('SELECT user, pass FROM users')5 found = False

6 for row in cur:

7 if row[0] == username and row[1] == password:

8 found = True

9 conn.close()

10 return found

• Do not scan a table and filter in Python – use SQL!• row[0] or row[1] relies on the positions; use row["user"]

• Use just one connection for the whole program• Store passwords hashed or encrypted!

Tim Dawborn Databases & SQL Lecture 2, January, 2016

Python/sqlite3 DB Design API JOINs 14

Example: Login code – a better approach

1 import sqlite3

2

3 conn = sqlite3.connect('ncssbook.db')4

5 def login(username, password):

6 cur = conn.execute('''7 SELECT id

8 FROM users

9 WHERE user=? AND pass=?

10 ''', (username, password))

11 row = cur.fetchone()

12 user_id = None if row is None else row['id']13 conn.commit()

14 return user_id

Tim Dawborn Databases & SQL Lecture 2, January, 2016

Python/sqlite3 DB Design API JOINs 15

Making others’ lives easier

• There are some operations that we’ll do a lot:• Create a (user) row in a table• Find a (user) row in a table• Update a (user) row in a table• Delete a (user) row in a table

• It’s painful to have to write SQL every time we need to dothis

• for every single type of table in the database!

Tim Dawborn Databases & SQL Lecture 2, January, 2016

Python/sqlite3 DB Design API JOINs 16

Idea 1: Let’s use functions

1 def find_user(username, conn):

2 cur = conn.execute('''SELECT * FROM users

3 WHERE user=?''', (username,))

4 row = cur.fetchone()

5 return row

• What’s wrong with this?

• The user of the function has to process the tuple• Exposes changes in the database schema• This is ugly and prone to failure

Tim Dawborn Databases & SQL Lecture 2, January, 2016

Python/sqlite3 DB Design API JOINs 16

Idea 1: Let’s use functions

1 def find_user(username, conn):

2 cur = conn.execute('''SELECT * FROM users

3 WHERE user=?''', (username,))

4 row = cur.fetchone()

5 return row

• What’s wrong with this?• The user of the function has to process the tuple• Exposes changes in the database schema• This is ugly and prone to failure

Tim Dawborn Databases & SQL Lecture 2, January, 2016

Python/sqlite3 DB Design API JOINs 17

Idea 2: Let’s use classes and objects

1 class User:

2 def __init__(self, username, fname, lname):

3 self.username = username

4 self.fname = fname

5 self.lname = lname

6

7 @staticmethod

8 def find(username):

9 cur = conn.execute('''SELECT * FROM users

10 WHERE user=?''', (username,))

11 row = cur.fetchone()

12 if row is None:

13 raise UserNotFound('{} does not exist'.format(username))14 return User(row[0], row[1], row[2])

Tim Dawborn Databases & SQL Lecture 2, January, 2016

Python/sqlite3 DB Design API JOINs 18

Idea 2: Let’s use classes and objects

1 @staticmethod

2 def create(username, fname, lname):

3 cur = conn.execute('''INSERT INTO users

4 VALUES (?, ?, ?)''', (username, fname, lname))

5 return User(username, fname, lname)

6

7 @staticmethod

8 def delete(username):

9 cur = conn.execute('''DELETE FROM users

10 WHERE username = ?''', (username,))

Tim Dawborn Databases & SQL Lecture 2, January, 2016

Python/sqlite3 DB Design API JOINs 19

An API abstracts away from the database

• Defining the User object makes life much nicer for otherprogrammers

• No one else has to write SQL (like the function approach)

• Internalise the tuple processing and produce a standardizedobject representation

• We can transparently make changes to the database schemawithout affecting anyone else

• You should consider creating an object like this for each tablein your database

• And if you’re really good, you could try and figure out how toavoid rewriting for every new database table

Tim Dawborn Databases & SQL Lecture 2, January, 2016

Python/sqlite3 DB Design API JOINs 20

More Joins

• Okay, back to more complex SQL

• We can join more than two tables together

• Example: List the names of all females who ran the 100m inalphabetical order.

1 SELECT fname, lname

2 FROM people p

3 JOIN results r ON p.id = r.person

4 JOIN events e ON e.id = r.event

5 WHERE e.name = '100m'6 AND e.gender = 'F'7 ORDER BY lname;

Tim Dawborn Databases & SQL Lecture 2, January, 2016

Python/sqlite3 DB Design API JOINs 20

More Joins

• Okay, back to more complex SQL

• We can join more than two tables together

• Example: List the names of all females who ran the 100m inalphabetical order.

1 SELECT fname, lname

2 FROM people p

3 JOIN results r ON p.id = r.person

4 JOIN events e ON e.id = r.event

5 WHERE e.name = '100m'6 AND e.gender = 'F'7 ORDER BY lname;

Tim Dawborn Databases & SQL Lecture 2, January, 2016

Python/sqlite3 DB Design API JOINs 21

Nested SQL queries

• Remember: Every SQL query returns its result as a table.

• This means we can nest SQL queries: an outer query cancheck the results of an inner (nested) sub-query.

• We can build complex query from smaller building blocks.

Tim Dawborn Databases & SQL Lecture 2, January, 2016

Python/sqlite3 DB Design API JOINs 22

Nesting SQL Queries (continued)

• Example: Which females ran in the 100m?

1 SELECT r.person

2 FROM results r

3 JOIN events e ON e.id = r.event

4 WHERE e.name = '100m' AND e.gender = 'F'

• Use this as a sub-query to answer the original question:

1 SELECT fname, lname

2 FROM people

3 WHERE id IN (

4 SELECT r.person

5 FROM results r

6 JOIN events e ON e.id = r.event

7 WHERE e.name = '100m' AND e.gender = 'F'8 )

9 ORDER BY lname;

Tim Dawborn Databases & SQL Lecture 2, January, 2016

Python/sqlite3 DB Design API JOINs 22

Nesting SQL Queries (continued)

• Example: Which females ran in the 100m?

1 SELECT r.person

2 FROM results r

3 JOIN events e ON e.id = r.event

4 WHERE e.name = '100m' AND e.gender = 'F'

• Use this as a sub-query to answer the original question:

1 SELECT fname, lname

2 FROM people

3 WHERE id IN (

4 SELECT r.person

5 FROM results r

6 JOIN events e ON e.id = r.event

7 WHERE e.name = '100m' AND e.gender = 'F'8 )

9 ORDER BY lname;

Tim Dawborn Databases & SQL Lecture 2, January, 2016

Python/sqlite3 DB Design API JOINs 22

Nesting SQL Queries (continued)

• Example: Which females ran in the 100m?

1 SELECT r.person

2 FROM results r

3 JOIN events e ON e.id = r.event

4 WHERE e.name = '100m' AND e.gender = 'F'

• Use this as a sub-query to answer the original question:

1 SELECT fname, lname

2 FROM people

3 WHERE id IN (

4 SELECT r.person

5 FROM results r

6 JOIN events e ON e.id = r.event

7 WHERE e.name = '100m' AND e.gender = 'F'8 )

9 ORDER BY lname;Tim Dawborn Databases & SQL Lecture 2, January, 2016

Python/sqlite3 DB Design API JOINs 23

Scientific Databases: How complex SQL can become...

• Part of our work here at Sydney Uni is to look at howdatabases and SQL can help answering scientific questions.

• For example in the context of genomics:

• Another good example: SkyServerhttp://cas.sdss.org/dr6/en/tools/chart/navi.asp

Tim Dawborn Databases & SQL Lecture 2, January, 2016