part i: introductioncbourke/computerscienceii/modules/module-sql/sql/... · 15 5 11132 fall 2011 23...
Post on 22-Jul-2020
6 Views
Preview:
TRANSCRIPT
Introduction to Databases & SQL
Dr. Christopher M. Bourkecbourke@cse.unl.edu
Outline
1. Introduction
2. Creating Tables
3. Manipulating Data
4. Querying Data
5. Extended Demonstration
6. Designing Databases
7. Extended Demonstration
Part I: Introduction
Motivating ExampleFlat Files
Consider the following data, stored in a flat file:
Course Course Name Student NUID Email
waits, tom 11223344 tomwaits@hotmail.comCSCE 156 Intro to CSII Lou Reed 11112222 reed@gmail.comCSCE 156 Intro to CSII Tom Waits 11223344 twaits@email.comCSCE 230 Computer Hardware Student, J. 12345678 jstudent@geocities.comCSCE 156 Intro to CSII Student, John 12345678 jstudent@geocities.comCSCE 230 Computer Hardware John Student 12345678 jstudent@geocities.comCSCE 235 Discrete Math Student, John 12345678 jstudent@geocities.comCSCE 235 Discrete Math Tom Waits 11223344 twaits@email.comNONE Null Tom Waits 11223344 twaits@email.com
Table: Course Enrollment Data
Problems with Flat Files
Problems:
I Formatting Issues
I Repetition of data
I Incomplete data
I Integrity of data
I Organizational problems: any aggregation requires processing allrecords
I Updating information is difficult (must enumerate all possiblechanges, side effects so that information is not lost)
I Concurrency Issues
Relational Databases
Solution: Relational Database Management System (RDBMS)
I Represents data as tuples (pairs, triples, etc.)
I Data is grouped into relations
I E. F. Codd (IBM, 1970)
Key Aspects I
I Data is stored in tables
I Tables have columns (or fields) that have a unique name and type(integer, string)
I Each column stores a single piece of data
I Each row represents a single record
Key Aspects II
I Each row may have a unique primary key (PK) which uniquelyidentifies each record
I Records in different tables are related to each other through foreignkeys (FK)
I Order of rows/columns is meaningless
Key Aspects III
studentId firstName lastName nuid
1 Tom Waits 112233442 Lou Reed 111122225 John Student 12345678
emailId studentId address
8 2 reed@gmail.com10 1 tomwaits@hotmail.com13 1 twaits@email.com14 5 jstudent@geocities.com
courseId name description
42123 CSCE 156 Computer Science II12333 CSCE 230 Computer Hardware11132 CSCE 235 Discrete Mathematics
enrollmentId studentId courseId semester
15 5 11132 Fall 201123 1 42123 Fall 201124 2 42123 Fall 201110 1 11132 Fall 201129 5 42123 Spring 201132 5 12333 Fall 2011
Key Aspects IV
I Supports Transactions: an interaction or batch of interactions treatedas one unit
I ConstraintsI Allowing or disallowing NULLI Disallowing “bad” values (ranges)I Enforcing formatting (capitalization, precision)I Limiting combinations of data fields
ACID Principles I
I Atomicity – Data transactions must be an all-or-nothing process
I Atomic operation: not divisible or decomposable
I Consistency – Transactions will retain a state of consistency
I All constraints, triggers, cascades preserve a valid state after thetransaction has completed
I Isolation – No transaction interferes or is even aware of another;state transitions are equivalent to serial transactions
I Durability – Once committed, a transaction remains so
I Data is to be protected from catastrophic error (power loss/crash)
RDBMS’s
I Commercial Systems: MS Access, MySQL (Oracle), MariaDB,PostgreSQL, Informix, DB2 SQLServer, Oracle, SQLite
I NoSQL (non relational databases)
I Key-value databases: MongoDB, Cassandra, Spanner
I Graph-based: Neo4j, OrientDB
I Redis
I Database SaaS: Amazon Web Services (Dynamo)
Advantages I
I Data is structured instead of “just there”
I Better organization
I Duplication is minimized (with proper normalization)
I Updating information is easier
I Organization of data allows easy access
I Organization allows aggregation and more complex information
Advantages II
I Data integrity can be enforced (data types and user definedconstraints)
I Faster
I Scalable
I Security
I Portability
I Concurrency
Structured Query Language
We interact with RDBMs using Structured Query Language (SQL)
I Common language/interface to most databases
I Developed by Chamberlin & Boyce, IBM 1974
I Implementations may violate standards: portability issues
I Comments: #
I Keywords are case insensitive
I Create & manage tables: create, alter, drop
Structured Query LanguageCRUD
Basic SQL functionality: CRUD:
I Create – insert new records into existing tables ( insert )
I Retrieve – get a (subset) of data from specific rows/columns( select )
I Update – modify data in fields in specified rows ( update )
I Destroy – delete specific rows from table(s) ( delete )
Part II: Creating a Database
MySQLGetting Started
I Need access to a database server
I Need to use a client to connect to the server
I CLI: mysql
I MySQL Workbenchhttps://www.mysql.com/products/workbench/
1 use databaseName;
2 show tables;
3 describe TableName;
MySQLCSCE Students
I You only have access to a database with the same name as your login
I Host: cse.unl.edu
I Port: 3306
I User: your CSE login ID
I Password: go to https://cse.unl.edu/account
I Do not use the same password as your login
I More: http://cse.unl.edu/faq
Creating TablesSyntax
1 create table TableName (
2 columnName columnType [options],
3 ...
4 );
Options:
I primary key
I auto_increment (for primary keys)
I not null
I default (value)
Column Types
I varchar(n) – variable character field of up to n characters
I integer (or int )
I decimal(65,30) , float , double
I Others: aliases, vendor-specific keywords, date/time, other dataformats (JSON)
Creating TablesExample
1 create table Book (
2 bookId integer primary key auto_increment not null,
3 title varchar(255) not null,
4 author varchar(255),
5 isbn varchar(255) not null default ’’,
6 dewey float,
7 numCopies integer default 0
8 );
Primary Keys
I A primary key gives each record a unique identity
I At most one primary key per table
I No two rows can have the same primary key value
I Must be able to uniquely identify all records (not just those that exist)
I PKs can be one or more columns (composite key)
I Should not use/allow NULL values
I Can/should
I External identifiers should not be used
I Should use integers, not varchars or floats
Keys
I Tables can have multiple (non-primary) keys
I May be a combination of columns (composite key)
I May be declared non-unique in which case it serves as an index(allows database lookup optimization)
I Syntax:key(column1, column2,...)
Indices and Unique Constraints
I The keyword index can also be used
I We can make them unique by using unique
I Syntax: constraint constraintName unique index(columnName)
I Multiple columns: ... index(c1, c2)
Foreign Keys I
I Relations between records in different tables can be made withforeign keys
I A foreign key is a column that references a key (PK or regular key) inanother table
I Inserts cannot occur if the referenced record does not exist
I Foreign keys establish relationships between tables:I One-to-one (avoid)I One-to-many relationsI Many-to-oneI Many-to-Many relations: requires a Join Table
I Syntax:foreign key (columnName) references tableName(columnName)
Demonstration I
Write SQL to define two tables: one for authors and one for books.Simplify the book table, but include an ISBN as well as a constraint tokeep it unique. Model the fact that an author may write more than onebook.
Demonstration ISolution
1 create table Author (
2 authorId integer primary key not null auto_increment,
3 firstName varchar(255),
4 lastName varchar(255) not null
5 );
6
7 create table Book (
8 bookId integer primary key not null auto_increment,
9 authorId integer not null,
10 title varchar(255),
11 isbn varchar(255) not null,
12 numCopies integer default 0,
13 foreign key (authorId) references Author(authorId),
14 constraint uniqueISBN UNIQUE INDEX(isbn)
15 );
Demonstration IISolution
Figure: Entity-Relation Diagram for the Book/Author tables
Part III: Manipulating Data
Inserting Data
I Creating data involves inserting new records into a table
I Values are specified with literals
I Numerical literals
I String literals: use single quote characters
I Syntax:insert into tableName (c1, c2, ...) values (v1, v2, ...);
Inserting Data IExamples
1 insert into Author (firstName, lastName) values
2 (’Normal’, ’Mailer’);
3 --or
4 insert into Author (authorId, firstName, lastName) values
5 (1234, ’Normal’, ’Mailer’);
6
7 insert into Book (title, authorId, isbn) values
8 (’The Naked and the Dead’, 1234, ’978-0312265052’);
9 insert into Book (title, authorId, isbn) values
10 (’The Naked and the Dead’,
11 (select authorId from Author where lastName = ’Mailer’),
12 ’978-0312265052’);
Inserting Data IIExamples
1 --Errors:
2 /*
3 insert into Book (title, authorId, isbn) values
4 (’The Executioner’s Song’, 1234, ’978-0312265052’);
5 */
6
7 insert into Book (title, authorId, isbn) values
8 (’The Executioner\’s Song’, 1234, ’978-1306746281’);
Updating Data
I Existing data can be modified using an update statement
I Should be used in conjunction with clauses
I Syntax:1 update tableName set c1 = v1, c2 = v2, ...
2 where [condition];
Updating DataExamples
1 --dangerous:
2 update Author set firstName = ’Norman’;
3
4 --correct:
5 update Author set firstName = ’Norman’
6 where authorId = 1234;
Deleting Data
I Records can be deleted using a delete statement
I Should be used in conjunction with clauses
I Unless you really want to delete everything
I Syntax:delete from tableName where [condition]
Deleting DataExamples
1 --we cannot delete norman:
2 delete from Author where lastName = ’Mailer’;
3 delete from Author where authorId = 1234;
4
5 delete from Book where bookId = 1;
6 delete from Book where title = ’The Naked and the Dead’;
7
8 --deletes all of Norman’s book entries:
9 delete from Book where authorId = 1234;
10 --now we could delete norman
Part IV: Querying Data
Querying Data
I Data can be retrieved using the select statement
I Syntax:select c1, c2, ... from tableName where [condition];
I Can select all columns by using the * wildcard:select * from Book;
Querying DataExamples
1 select * from Book;
2 select * from Book where authorId = 1234;
3 select * from Book where isbn = ’978-0312265052’;
4
5 select lastName, firstName from Author;
6 select lastName, firstName from Author where authorId = 1234;
Querying DataAliasing
I Names of the columns are part of the database
I SQL allows us to “rename” them in result of our query using aliasing
I Syntax: columnName as columnAlias
I Sometimes necessary if column has no name (aggregates)
1 select title as bookTitle,
2 numCopies as numberOfCopies
3 from Book;
Complex where Clause
I Queries can be quantified using the where clause
I Only records matching the condition will be affected (updated,deleted, selected)
I Compound conditions can be composed using parentheses and theand and or keywords
1 select * from Book
2 where numCopies > 10 AND
3 (title != ’The Naked and the Dead’
4 OR authorId = 1234);
To check nullity: where title is null , where title is not null
distinct Clause
I Many records may have the same column value
I To query only unique values you may use the distinct keyword
1 select distinct lastName from Author;
2 select distinct authorId from Book;
like Clause
I String ( varchar ) values can be searched/partially matched using thelike clause
I Used in conjunction with the string wildcard, %
1 select * from Book where isbn like ’123%’;
2 select * from Author where lastName like ’%ailer%’;
in Clause
I The in clause allows you to do conditionals on a set of values
I May be used in conjunction with a nested query
1 select * from Book where isbn in (’978-0312265052’,
2 ’789-65486548’, ’681-0654895052’);
3
4 select * from Book where authorId in
5 (select authorId from Author where lastName like ’M%’);
order by Clause
I In general, the order of the results of a select clause isnondeterministic
I To impose an order, you can use order by
I Can order along multiple columns
I Can order descending ( desc ) or ascending ( asc ) order
1 select * from Book order by title;
2 select * from Book order by authorId, title desc;
Aggregate Functions
I Aggregate functions allow us to compute data in the databasewithout processing all the data in code
I count provides a mechanism to count the number of records
I Other aggregate functions: max , min , avg , sum
I null values are ignored/treated as zero
Aggregate FunctionsExamples
1 --number of Book records:
2 select count(*) as numberOfTitles from Book;
3 --most common book:
4 select max(numCopies) from Book;
5
6 --nested queries:
7 select * from Book
8 where numCopies = (select max(numCopies) from Book);
GROUP BY clause
I The group by clause allows you to project data with common valuesinto a smaller set of rows
I Used in conjunction with aggregate functions to do more complicatedaggregates
I Example: find total copies of all books grouped by their author
1 select author, sum(numCopies) as totalCopies
2 from Book group by author;
group by clauseExample Illustrated
Table content:
title author numCopies
Naked and the Dead Norman Mailer 10Dirk Gently’s Holistic Detective Agency Douglas Adams 4Barbary Shore Norman Mailer 3The Hitchhiker’s Guide to the Galaxy Douglas Adams 2The Long Dark Tea-Time of the Soul Douglas Adams 1Ender’s Game Orson Scott Card 7
GROUP BY clauseExample Illustrated
Grouping:
title author numCopies
Naked and the Dead Norman Mailer 10Barbary Shore Norman Mailer 3Dirk Gently’s Holistic Detective Agency Douglas Adams 4The Hitchhiker’s Guide to the Galaxy Douglas Adams 2The Long Dark Tea-Time of the Soul Douglas Adams 1Ender’s Game Orson Scott Card 7
group by clauseExample Illustrated
Projection & aggregation:
author totalCopies
Norman Mailer 13Douglas Adams 7Orson Scott Card 7
having clause
I Projected data can be further filtered using the having clause
I having is evaluated after group by which is evaluated after any
where clause
1 select author, sum(numCopies) as totalCopies
2 from Book where isbn like ’%9’
3 group by author
4 having totalCopies > 10;
Joins
A join is a clause that combines records from two or more tables.
I Result is a set of columns/rows (a “table”)
I Results from tables A,B are joined by shared values in specifiedcolumns
I Common to join via foreign keys
I Table names can be aliased for convenienceI Types of joins we’ll look at:
I (inner) joinI left (outer) join
I Other types of joins: Self-join, cross join (cartesian product), rightouter joins, full outer joins
Inner Join I
Inner Join:
I Most common type of join
I Combines rows of table A with rows of table B for all records thatsatisfy some predicate
I Predicate provided by the on clause
I Common to omit “inner” and simply use the join keyword
Inner Join II
1 select * from Book b
2 join Author a on b.authorId = a.authorId;
3
4 select * from Author a
5 join Book b on a.authorId = b.authorId;
6
7 select * from Book b
8 join Author a on b.authorId = b.authorId
9 where a.title like ’The%’;
10
11 select a.firstName as firstName,
12 a.lastName as lastName,
13 b.title as bookTitle
14 from Book b
15 join Author a on b.authorId = b.authorId;
Inner JoinExample Illustrated
1 select s.studentId, s.lastName, e.address
2 from student s
3 join email e on s.studentId = e.studentId;
Inner JoinExample Illustrated
studentId lastName firstName
1234 Castro Starlin5678 Rizzo Anthony1122 Sveum Dale9988 Sandberg Ryne
(a) Student Table
emailId studentId address
1111 9988 rsandberg@cubbies.net1112 9988 rsberg@unl.edu1113 5678 rizzo@cubs.com1114 1234 number13@cubs.com
(b) Email Table
studentId lastName address
1234 Castro number13@cubs.com5678 Rizzo rizzo@cubs.com9988 Sandberg rsandberg@cubbies.net9988 Sandberg rsberg@unl.edu
Table: Joined tables
Left (Outer) Join I
I A Left Outer Join joins table A to table B, preserving all records intable A
I Records in A with no matching records in B will have null valuesfor columns in B
I Syntax: omit “outer” and use left join
Left (Outer) Join II
1 --no difference as a book cannot exist without an author:
2 select * from Book b
3 left join Author a on b.authorId = a.authorId;
4
5 --includes authors even if they have no book records:
6 select * from Author a
7 left join Book b on a.authorId = b.authorId;
8
9 select s.studentId, s.lastName, e.address
10 from student s
11 left join email e on s.studentId = e.studentId;
Left (Outer) Join III
student id last name adddress
1234 Castro number13@cubs.com5678 Rizzo rizzo@cubs.com9988 Sandberg rsandberg@cubbies.net9988 Sandberg rsberg@unl.edu1122 Sveum null
Table: Left-Joined Tables
Other Joins
Nice tutorial: http://www.codeproject.com/Articles/33052/Visual-Representation-of-SQL-Joins
Figure: Types of Joins
Part V: Extended Demonstration
SQL Query Demonstrations
Figure: Video Game Database
Part VI: Designing a Database
Database DesignTable Design
I Identify all “entities” in different tables
I Ask “what defines an entity” to determine the columns and theirtypes
I Properly define whether or not a column should be allowed to be nulland/or defaults
I Each table should have a primary key
I Relations between tables should be identified and defined with foreignkeys
I Security concerns: don’t store sensitive information (passwords) inplaintext
Normalization I
Normalizing a database is the process of separating data into differenttables to reduce or eliminate data redundancy and reduce anomolies(preserve integrity) when data is changed. It also reduces the amount ofbook keeping necessary to make such changes.
I 1 Normal Form: the domain of each attribute in each table has onlyatomic values: each column value represents a single value
I Example: Allowing multiple email records for one PersonI Storing these as (say) a CSV in one column is a violation of 1NFI Hardcoding a fixed number of columns (Email1, Email2, Email3) for
multiple values is a violation of 1NFI Separating records out into another table and associating them via a
FK conforms to 1NF
Normalization II
I 2 Normal Form: 1NF and no non-prime attribute is dependent on anyproper subset of prime attributes
I Mostly relevant for tables with composite PKs (multiple columns)I If another, non-prime column is dependent only on a subset of these,
its not 2NFI Must split it out groups of data into multiple tables and relate them
through FKs so that non-prime column is dependent only on the subsetof prime columns
I Example: a purchase record may contain: customerId, storeId,storeLocation
I Using an auto-generated primary key gives you 2NF automatically
Normalization III
I 3 Normal Form: 2NF and no non-prime column is transitivelydependent on the key
I No non-prime column may depend on another non-prime columnI Example: Storing a price-per-unit, quantity, and total (total can be
derived from the other two)I Example: OrderId-CustomerId-CustomerName (OrderId is the PK,
CustomerId is FK, CustomerName should be derivable fromCustomerId)
Normalization IV
I Bottom line: 3NF is common sense; design your database to benormalized to begin with
I Every non-key attribute must provide a fact about the key (1NF), thewhole key (2NF), and nothing but the key (3NF), so help me Codd1
1Edgar Codd, inventor of the relational model, 1970–1971
Best Practice Tip 1Primary Keys
Primary Keys
I Should be integers (not varchars nor floats)
I Best to allow the database to do key management (auto increment)
I Should not be based on external identifiers that are not controlled bythe database (SSNs, ISBNs, etc.)
I Be consistent in naming conventions ( tableNameId )
Good Practice Tip 2Use consistent naming conventions
I Short, simple, descriptive names
I Limit abbreviations, acronymsI Use consistent styling
I Tables: UpperCamelCaseI Columns: lowerCamelCase
I Use singular forms, avoid pluralizations
I Primary key field: tableNameId
I Foreign key fields should match the fields they refer to
I End goal: unambiguous, consistent, self-documenting
Good Practice Tip 3Ensure Good Data Integrity
Data can break code, code should not break data.
I Data/databases are a service to code
I Different code, different modules can access the same data
I The database does not use the code!
I Should do everything you can to prevent bad code from harming data(constraints, foreign & primary keys, etc).
I Database is your last line of defense against bad code
Good Practice Tip 4Keep Business Logic Out!
I Databases offer “programming functionality”
I Triggers, cascades, stored procedures, etc.
I Use them sparingly!!!
I RDMSs are for the management and storage of data, not businesslogic
I Demarcation of responsibility
I DBAs should not have to be Application Programmers, and vice versa
Good Practice Tip 5Hard vs Soft Deletes
I delete removes records entirely (“hard” delete)
I May not want the data to be permanently deleted
I Tables may be given an active/inactive flag (boolean)
I Deleting a record: set the flag to inactive (“soft” delete)
I Application is responsible for only querying for active records
I Extra care must be taken
Part VII: Database Design Demonstration
Designing A DatabaseDemonstration
Design a database to support a course roster system for the University ofNebraska–Lincoln system. The database design should be able to modelstudents (with their unique NUIDs), courses, and their relation (ability ofstudents to enroll in courses). The system will also need to email studentsabout updates in enrollment, so be sure your model is able to incorporatethis functionality.
Designing A Database
Figure: A normalized database design.
End Result
I Pieces of data are now organized and have a specific type
I No duplication of data
I Entities are represented by IDs, ensuring identity (Tom Waits is nowthe same as t. Waits)
I Data integrity is enforced (only one NUID per Student)I Relations are well-defined
I A student has email(s)I A course has student(s) and a student has course(s)
Data from flat file
studentId firstName lastName nuid
1 Tom Waits 112233442 Lou Reed 111122225 John Student 12345678
emailId studentId address
8 2 reed@gmail.com10 1 tomwaits@hotmail.com13 1 twaits@email.com14 5 jstudent@geocities.com
courseId name description
42123 CSCE 156 Computer Science II12333 CSCE 230 Computer Hardware11132 CSCE 235 Discrete Mathematics
enrollmentId studentId courseId semester
15 5 11132 Fall 201123 1 42123 Fall 201124 2 42123 Fall 201110 1 11132 Fall 201129 5 42123 Spring 201132 5 12333 Fall 2011
top related