databases and sql - information sciences institutecreate table users(name varchar(128), email...
TRANSCRIPT
Databases and SQL
4
What Is a Database• Like a dictionary, database software is designed to keep the
inserting and accessing of data very fast, even for large amounts of data.
• Database software maintains its performance by building indexes as data is added to the database to allow the computer to jump quickly to a particular entry.
• There are many different database systems including: Oracle, MySQL, Microsoft SQL Server, PostgreSQL, and SQLite.
• We focus on SQLite because it is a very common database and is already built into Python.
• SQLite is designed to be embedded into other applications to provide database support within them. For example, the Firefox browser also uses the SQLite database internally.
5
(Relational) Database Concepts
• You can think of a database as a spreadsheet with multiple sheets. The primary data structures in a database are: tables, rows, and columns.
6
7
Terminology
• Database - contains many tables
• Relation (or table) - contains tuples and attributes
• Tuple (or row) - a set of fields that generally represents an “object” like a person or a music track
• Attribute (also column or field) - one of possibly many elements of data corresponding to the object represented by the row
8
Tables / Relations
Columns / Attributes
Rows /Tuples
Create a Database File and a Table Named ‘Tracks’ with Two Columns
import sqlite3
conn = sqlite3.connect('music.sqlite3')
cur = conn.cursor()
cur.execute('DROP TABLE IF EXISTS Tracks ')
cur.execute('CREATE TABLE Tracks (title TEXT, plays INTEGER)')
conn.close()
9
Let’s Try it!
Removes table ‘Tracks’ if it exists
Creates a new table ‘Tracks’ with text columnnamed ‘title’ and integer column named ‘plays’
The Functions We Used• The connect operation makes a “connection” to the
database stored in the file music.sqlite3 in the current directory.
• If the file does not exist, it will be created.
• This is called a “connection” because sometimes the database is stored on a separate “database server”. In our examples the database will just be a local file.
• A cursor is like a file handle that we can use to perform operations on the data stored in the database. Calling cursor() is very similar conceptually to calling open() when dealing with text files.
10
We Can Put Data in the Tableimport sqlite3
conn = sqlite3.connect('music.sqlite3') cur = conn.cursor()
cur.execute('INSERT INTO Tracks (title, plays) VALUES ( ?, ? )', ( 'Thunderstruck', 20 ))cur.execute('INSERT INTO Tracks (title, plays) VALUES ( ?, ? )', ('MyWay', 15))conn.commit()
print 'Tracks:'cur.execute('SELECT title, plays FROM Tracks') for row in cur: print row
cur.execute('DELETE FROM Tracks WHERE plays < 100')conn.commit()cur.close()
11
What We Did• We INSERT-ed two rows into our table and used commit()
to force the data to be written to the database file.
• We used the SELECT command to retrieve the rows we just inserted from the table.
• We indicated which columns we would like (title, plays) and indicated which table we want to retrieve the data from.
12
title plays
Thunderstruck 20
My Way 15
Tracks
Continued…• After we execute the SELECT statement, the cursor is
something we can loop through in a for statement.• Finally, we execute an SQL command to DELETE the rows we
have just created.• Notice the use of WHERE to express a ‘selection criterion’.
The command applies only to rows that match the criterion.• And we call commit() again to force execution.
• The output:
13
Tracks:(u'Thunderstruck', 20)(u'My Way', 15)
14
SQL• Structured Query Language is the language we use to issue commands
to the database
• Create a table
• Retrieve some data
• Insert data
• Delete data
http://en.wikipedia.org/wiki/SQL
15
Database Model
http://en.wikipedia.org/wiki/Database_model
A database model or database schema is the structure or format of a database, described in a formal language
supported by the database management system. In other words, a “database model” is the application of a data
model when used in conjunction with a database management system.
16
SQLite Browser
• SQLite is a very popular database - it is free and fast and small• SQLite Browser allows us to directly manipulate SQLite files
• http://sqlitebrowser.org/• SQLite is embedded in Python and a number of other languages
17
SQLite is in lots of software...
http://www.sqlite.org/famous.html
If you search for “sqlite” on a Mac you’ll find it in lots of places!
On My Mac, SQLite Is Used in:
• Python (of course)• Google Drive• Microsoft Office• Anaconda• Adobe Acrobat• Twitter• Firefox• . . .
18
• The Mac’s OS:• The accounts database• The character palettes• . . .
• Safari• Mail• iTunes• Photos
• . . .
19
20
CREATE TABLE Users( name VARCHAR(128), email VARCHAR(128))
21
22
23
The Table with a Four Rows of Data
SQL: Structured Query Language
• Structured Query Language is the language we use to issue commands to the database • Create a table
• Retrieve some data• Insert data• Delete data
24
25
SQL: Insert
• The Insert statement inserts a row into a table
INSERT INTO Users (name, email) VALUES ('Kristin', '[email protected]')
INSERT INTO Users (name,email) VALUES ('Kristin','[email protected]')
26
27
SQL: Delete
• Deletes a row in a table based on a selection criteria
DELETE FROM Users WHERE email='[email protected]'
DELETE FROM Users WHERE email='[email protected]'
28
29
SQL: Update
• Allows the updating of a field with a where clause
UPDATE Users SET name='Charles' WHERE email='[email protected]'
UPDATE Users SET name='Charles' WHERE email='[email protected]'
30
31
Retrieving Records: Select
• The select statement retrieves a group of records - you can either retrieve all the records or a subset of the records with a WHERE clause
SELECT * FROM Users
SELECT * FROM Users WHERE email='[email protected]'
SELECT * FROM Users
SELECT * FROM Users WHERE email='[email protected]'
32
33
34
Sorting with ORDER BY
• You can add an ORDER BY clause to SELECT statements to get the results sorted in ascending or descending order
SELECT * FROM Users ORDER BY email
SELECT * FROM Users ORDER BY name
SELECT * FROM Users ORDER BY emailSELECT * FROM Users ORDER BY name
35
36
SQL Summary
SELECT * FROM Users
SELECT * FROM Users WHERE email='[email protected]'
UPDATE Users SET name="Charles" WHERE email='[email protected]'
INSERT INTO Users (name, email) VALUES ('Kristin', '[email protected]')
DELETE FROM Users WHERE email='[email protected]'
SELECT * FROM Users ORDER BY email
37
This is not too exciting (so far)
• Tables pretty much look like big fast programmable spreadsheets with rows, columns, and commands
• The power comes when we have more than one table and we can exploit the relationships between the tables
38
Complex Data Models and Relationships
http://en.wikipedia.org/wiki/Relational_model
39
Database Design
• Database design is an art form of its own with particular skills and experience
• Our goal is to avoid the really bad mistakes and design clean and easily understood databases
• Others may performance tune things later
• Database design starts with a picture...
40
41
42
Building a Data Model
• Drawing a picture of the data objects for our application and then figuring out how to represent the objects and their relationships
• Basic Rule: Don’t put the same string data in twice - use a relationship instead
• When there is one thing in the “real world” there should be one copy of that thing in the database
Basic Data Modeling
• Data modeling is a sophisticated skill • We will only introduce the most basic concepts of
relational data modeling
• Consider iTunes’ database of tracks/songs and albums
• We don’t just want a list of all songs, because we may also want to know facts about each albums or artists, for example, and we wouldn’t want to repeat those as columns for each song
43
44
Track Len Artist Album Genre Rating Count
45
For each “piece of info”...• Is the column an object or an
attribute of another object?
• Once we define objects, we need to define the relationships between objects. Track
Len
Artist
Album
Genre
Rating
Count
46
Track
Album
Artist
Genre
Len
Rating
Count
Track Rating Len Count
Album belongs-to
Artistbelongs-to
Genre belongs-to
47
Representing Relationships in a Database
48
Database Normalization (3NF)
• There is *tons* of database theory - way too much to understand without excessive predicate calculus
• Do not replicate data - reference data - point at data• Use integers for keys and for references• Add a special “key” column to each table which we will make
references to. By convention, many programmers call this column “id”
http://en.wikipedia.org/wiki/Database_normalization
49
We want to keep track of which band is the “creator” of each music track...What album does this song “belong to”??
Which album is this song related to?
50
Integer Reference Pattern
We use integers to reference rows in another table
Album
Artist
51
Key TerminologyFinding our way around....
52
Three Kinds of Keys
• Primary key - generally an integer auto-increment field
• Logical key - What the outside world uses for lookup
• Foreign key - generally an integer key pointing to a row in another table
Album id title artist_id ...
53
Primary Key Rules
Best practices
• Never use your logical key as the primary key
• Logical keys can and do change, albeit slowly
• Relationships that are based on matching string fields are less efficient than integers
User id login password name email created_at modified_at login_at
54
Foreign Keys
• A foreign key is when a table has a column that contains a key which points to the primary key of another table.
• When all primary keys are integers, then all foreign keys are integers - this is good - very good
Artist id name ...
Album id title artist_id ...
55
Track Rating Len Count
Album belongs-to
Artist
belongs-to
Genre belongs-to
56
Album
belongs-to
Album
idtitle
Track
idtitle
ratinglen
countalbum_id
TablePrimary keyLogical keyForeign key
Track
Len Rating
Count
Title
57
Album
idtitle
Track
idtitle
ratinglen
countalbum_id
TablePrimary keyLogical keyForeign key
Artist
idname
artist_id
Genre
idname
genre_id
Naming FK artist_id is a convention
58
We have relationships!
Artist
Genre
Album
Track
59
Using Join Across Tables
http://en.wikipedia.org/wiki/Join_(SQL)
60
Relational Power
• By removing the replicated data and replacing it with references to a single copy of each bit of data we build a “web” of information that the relational database can read through very quickly - even for very large amounts of data
• Often when you want some data it comes from a number of tables linked by these foreign keys
61
The JOIN Operation
• The JOIN operation links across several tables as part of a select operation
• You must tell the JOIN how to use the keys that make the connection between the tables using an ON clause
62
select Album.title, Artist.name from Album join Artist on Album.artist_id = Artist.id
What we want to see
The tables that hold the data
How the tables are linked
Artist
Album
63
select Track.title, Genre.name from Track join Genre on Track.genre_id = Genre.id
What we want to see
The tables that hold the data
How the tables are linked
64
Additional SQL Topics
• Indexes improve access performance for things like string fields
• Constraints on data - (cannot be NULL, etc..)
• Transactions - allow SQL operations to be grouped and done as a unit
65
Summary
• Relational databases allow us to scale to very large amounts of data
• The key is to have one copy of any data element and use relations and joins to link the data to multiple places
• This greatly reduces the amount of data which much be scanned when doing complex operations across large amounts of data
• Database and SQL design is a bit of an art form
READ CHAPTER 14 IN THE BOOK!!!