sql. what is sql? dml ddl partially declarative based on the algebra via the tuple calculus, and...
TRANSCRIPT
SQL
What is SQL?• DML• DDL • Partially declarative• Based on the algebra via the tuple calculus, and therefore
its core has an elegant set-theoretic foundations• Provides a sound foundation for mathematically precise
optimization• Has very simple structure; optimized for moving large
numbers of tuples at once• Set-based, retrieval-based
What are relational databases?
Tables and SQL, a programming language engineered for
high volume data applications
“must always be correct” data management apps
transaction-based applications
non-network, non-object based data
Chapter 3: The syntax of SQL
SELECT select_list [FROM table_source] [WHERE search_condition] [ORDER BY order_by_list] [LIMIT row_limit]
Details of the Select clause
SELECT [ALL|DISTINCT] column_specification [[AS] result_column] [, column_specification [[AS] result_column]]
All columns in a base table
Column name in a base table
Calculation
Function
Details of Select, continued• Select * means all columns (attributes)• Using arithmetic
• SELECT invoice_total - payment_total – credit_total AS balance_due
• Using a function• SELECT CONCAT(first_name, ' ', last_name) AS full_name
• Renaming a column• SELECT invoice_number AS "Invoice Number",
invoice_date AS Date, invoice_total AS Total
FROM invoices
Operator precedence
Operator Name Order of precedence * Multiplication 1 / Division 1 DIV Integer division 1 % (MOD) Modulo (remainder) 1 + Addition 2 - Subtraction 2
Calculations & concatenation & strings
SELECT invoice_total, payment_total, credit_total,
invoice_total - payment_total - credit_total AS balance_due
FROM invoices
SELECT vendor_city, vendor_state, CONCAT(vendor_city, vendor_state)
FROM vendors
SELECT vendor_contact_first_name,vendor_contact_last_name CONCAT(LEFT(vendor_contact_first_name, 1),
LEFT(vendor_contact_last_name, 1)) AS initials
FROM vendors
Date function, round functionSELECT invoice_date,
DATE_FORMAT(invoice_date, '%m/%d/%y') AS 'MM/DD/YY',
DATE_FORMAT(invoice_date, '%e-%b-%Y') AS 'DD-Mon-YYYY'
FROM invoices
SELECT invoice_date, invoice_total,
ROUND(invoice_total) AS nearest_dollar,
ROUND(invoice_total, 1) AS nearest_dime
FROM invoices
Where clause formatWHERE [NOT] search_condition_1 {AND|OR}
[NOT] search_condition_2 ...
WHERE NOT (invoice_total >= 5000
OR NOT invoice_date <= '2011-08-01')
WHERE invoice_total < 5000
AND invoice_date <= '2011-08-01'
In phrase and nested selectsWHERE test_expression [NOT] IN ({subquery|expression_1 [, expression_2]...})
WHERE vendor_id IN
(SELECT vendor_id
FROM invoices
WHERE invoice_date = '2011-07-18')
Like clauseWHERE vendor_city LIKE 'SAN%'
Cities that will be retrieved
“San Diego”, “Santa Ana”
WHERE vendor_name LIKE 'COMPU_ER%'
Vendors that will be retrieved
“Compuserve”, “Computerworld”
Order By clauseSELECT vendor_name,
CONCAT(vendor_city, ', ', vendor_state, ' ',
vendor_zip_code) AS address
FROM vendors
ORDER BY vendor_name
Chapter 4: Manipulating multiple tables
• Join• Equijoin• Natural join• Outer join• Self join• N-way join
Explicit versus implicit joins
SELECT vendor_name, invoice_number, invoice_date,
line_item_amount, account_description
FROM vendors v
JOIN invoices i
ON v.vendor_id = i.vendor_id
JOIN invoice_line_items li
ON i.invoice_id = li.invoice_id
JOIN general_ledger_accounts gl
ON li.account_number = gl.account_number
WHERE invoice_total - payment_total - credit_total > 0
ORDER BY vendor_name, line_item_amount DESC
SELECT invoice_number, vendor_name
FROM vendors v, invoices i
WHERE v.vendor_id = i.vendor_id
ORDER BY invoice_number
Left outer join and natural joinSELECT vendor_name, invoice_number, invoice_total
FROM vendors LEFT JOIN invoices
ON vendors.vendor_id = invoices.vendor_id
ORDER BY vendor_name
SELECT invoice_number, vendor_name
FROM vendors
NATURAL JOIN invoices
ORDER BY invoice_number
Union operatorSELECT 'Active' AS source, invoice_number,
invoice_date, invoice_total
FROM active_invoices
WHERE invoice_date >= '2011-06-01'
UNION
SELECT 'Paid' AS source, invoice_number,
invoice_date, invoice_total
FROM paid_invoices
WHERE invoice_date >= '2011-06-01'
ORDER BY invoice_total DESC
Chapter 5: Aggregates in queriesAVG([ALL|DISTINCT] expression)
SUM([ALL|DISTINCT] expression)
MIN([ALL|DISTINCT] expression)
MAX([ALL|DISTINCT] expression)
COUNT([ALL|DISTINCT] expression)
COUNT(*) – counts nulls
Issues with aggregate data• This is when we intentionally create a new form of “object”
and shift to ones that are deliberately not related to individual tuples
• It’s selections and projections that often unintentionally lose a perspective on object identity
Aggregate examplesSELECT COUNT(*) AS number_of_invoices,
SUM(invoice_total – payment_total – credit_total)
AS total_due
FROM invoices
WHERE invoice_total – payment_total – credit_total > 0
SELECT 'After 1/1/2011' AS selection_date,
COUNT(*) AS number_of_invoices, ROUND(AVG(invoice_total), 2) AS avg_invoice_amt,
SUM(invoice_total) AS total_invoice_amt
FROM invoices
WHERE invoice_date > '2011-01-01‘
SELECT 'After 1/1/2011' AS selection_date,
COUNT(*) AS number_of_invoices,
MAX(invoice_total) AS highest_invoice_total,
MIN(invoice_total) AS lowest_invoice_total
FROM invoices
WHERE invoice_date > '2011-01-01'
Aggregate examples, continuedSELECT MIN(vendor_name) AS first_vendor,
MAX(vendor_name) AS last_vendor,
COUNT(vendor_name) AS number_of_vendors
FROM vendors
SELECT COUNT(DISTINCT vendor_id) AS number_of_vendors,
COUNT(vendor_id) AS number_of_invoices,
ROUND(AVG(invoice_total), 2) AS avg_invoice_amt,
SUM(invoice_total) AS total_invoice_amt
FROM invoices
WHERE invoice_date > '2011-01-01'
Having clause (with group by)SELECT select_list
FROM table_source
[WHERE search_condition]
[GROUP BY group_by_list]
[HAVING search_condition]
[ORDER BY order_by_list]
Group by and Having: complexities• Group by: how to vertically group tuples• Having: which groups will be included in the final result• Note: All of this is after the Where clause is applied• Group by: based on columns or expressions that have
columns in them• If there are any calculations done in the Select clause, this
happens after the Group by clause; i.e., it is performed for each group that results from the Group by
• Group by can be nested if you specify more than one column
• Order by operators are performed after the Having
Having example, group by exampleSELECT vendor_id, ROUND(AVG(invoice_total), 2)
AS average_invoice_amount
FROM invoices
GROUP BY vendor_id
HAVING AVG(invoice_total) > 2000
ORDER BY average_invoice_amount DESC
(Note: 2 is the number of decimals in the result.)
SELECT vendor_name,
COUNT(*) AS invoice_qty,
ROUND(AVG(invoice_total),2) AS invoice_avg
FROM vendors JOIN invoices
ON vendors.vendor_id = invoices.vendor_id
WHERE invoice_total > 500
GROUP BY vendor_name
ORDER BY invoice_qty DESC
These are the same…SELECT invoice_date,
COUNT(*) AS invoice_qty,
SUM(invoice_total) AS invoice_sum
FROM invoices
GROUP BY invoice_date
HAVING invoice_date BETWEEN '2011-05-01' AND '2011-05-31'
AND COUNT(*) > 1
AND SUM(invoice_total) > 100
ORDER BY invoice_date DESC
SELECT invoice_date, COUNT(*) AS invoice_qty,
SUM(invoice_total) AS invoice_sum
FROM invoices
WHERE invoice_date BETWEEN '2011-05-01' AND '2011-05-31'
GROUP BY invoice_date
HAVING COUNT(*) > 1 AND SUM(invoice_total) > 100
ORDER BY invoice_date DESC
We will shift our focus a bit…• We’ll run the actually queries that appear in chapters 5
onward• The slides will contain overview material and not the
actual queries
Chapter 6: subqueries• Often used to pass an aggregate value to a parent query• Often a good way to book-keep what might have been a
very complex WHERE clause, with perhaps a multiway join
• A good way to make a query look more readable to someone who uses it later
• Subqueries can also be reused in other queries• Note: you cannot use the SELECT attributes from the
embedded query in the parent query unless it directly references the appropriate table(s) in the outer FROM clause
Subqueries, continued• Important tool: IN operator, which is a set “element of”
operator (written with an epsilon), and you can negate the IN (not IN)
• Often you use SOME or ALL or ANY, if you want to return multiple values (i.e., multiple tuples, perhaps with only one attribute each)
• The default if you don’t use one of the operators above is to return a single value
• ALL is a set “for all elements of” operator (written with an upside down A)
• ANY is a set “there exists” operator (written with a backward E)
Subqueries, continued• The default is that a subquery only executes once, but
you can use a “correlated” query so that it will run once for each row processed by the parent query. • This breaks the “execute from the inside out” paradigm on an
uncorrelated subquery.• EXISTS is often used with correlated queries
• You can put a subquery in a HAVING, FROM, or SELECT clause as well• But such queries get messy and we will skip this for now.
• It’s a good idea to write and test subqueries independently whenever possible, unless they are trivial
Examples from chapter 6• 7: passing a single value• 6: ANY• 3: IN• 8: NOT EXISTS• 5: ALL
Chapter 7: changing the database state• This is when we need a transaction protocol• Updates must never overlap with each other or with read-
only queries• Read-only queries can overlap• But we want to increase through put by supporting as
much “concurrency” as possible• Each transaction has the potential to update the DB state
2 Phase Transactions• Each SQL program is within a begin and end transaction
pair• Each transaction has its own workspace for DB items it is
going to update• Any transactions that overlap in execution time will appear
to have run in some serial order• This is done by transactions requesting read and write locks
(also known as shared and exclusive locks)• Read locks can be shared with other readers• Write locks cannot be shared with readers or writers• All locks held until the end of the transaction• They are released and then the changes that a transaction
has made are moved to the DB
Serializability of transactions• The net effect is that the transactions that overlap in
execution time appear to have run in some serial order• Transactions can be undone by throwing away the local
store (conceptually, at least)• The write period at the end of the transaction must be
atomic• The two phases:
• Request read, write, upgrade locks (and wait on locks) and process
• Release locks and move updates to the DB
• There is a notion of “serializability” which means that the actual schedule of executed steps corresponds to some serial order of running the transactions
Interesting facts on transactions…• Various legal schedules might produce different results• A crash during phase two can lead the database
inconsistent• The transaction manager uses a lot of overhead
resources handling locks• We still need to be able to roll the database back and
rerun transaction logs • User must control the nature of overlapping transactions
or there might be very little true concurrency• In a distributed database, the lock manager is a
bottleneck because all components of the database must move in lockstep
Updating data: changing the DB stateINSERT INTO invoices VALUES
(115, 97, '456789', '2011-08-01', 8344.50, 0, 0, 1, '2011-08-31', NULL);
INSERT INTO invoices VALUES
(116, 97, '456701', '2011-08-02', 270.50, 0, 0, 1, '2011-09-01', NULL),
(117, 97, '456791', '2011-08-03', 4390.00, 0, 0, 1, '2011-09-02', NULL),
(118, 97, '456792', '2011-08-03', 565.60, 0, 0, 1, '2011-09-02', NULL);
Updating tables, continuedUSE ex;
INSERT INTO color_sample (color_number)
VALUES (606);
INSERT INTO color_sample (color_name)
VALUES ('Yellow');
INSERT INTO color_sample
VALUES (DEFAULT, DEFAULT, 'Orange');
INSERT INTO color_sample
VALUES (DEFAULT, 808, NULL);
INSERT INTO color_sample
VALUES (DEFAULT, DEFAULT, NULL);
Updating tables, continuedINSERT INTO invoice_archive
SELECT *
FROM invoices
WHERE invoice_total - payment_total - credit_total = 0;
INSERT INTO invoice_archive
(invoice_id, vendor_id, invoice_number, invoice_total, credit_total,
payment_total, terms_id, invoice_date, invoice_due_date)
SELECT
invoice_id, vendor_id, invoice_number, invoice_total, credit_total,
payment_total, terms_id, invoice_date, invoice_due_date
FROM invoices
WHERE invoice_total - payment_total - credit_total = 0;
Updating, continued
UPDATE invoicesSET payment_date = '2011-09-21', payment_total = 19351.18WHERE invoice_number = '97/522';
UPDATE invoicesSET terms_id = 1WHERE vendor_id = 95;
UPDATE invoicesSET credit_total = credit_total + 100WHERE invoice_number = '97/522';
Updating, continuedDELETE FROM general_ledger_accounts
WHERE account_number = 306;
DELETE FROM invoice_line_items
WHERE invoice_id = 78 AND invoice_sequence = 2;
DELETE FROM invoice_line_items
WHERE invoice_id = 12;
DELETE FROM invoice_line_items
WHERE invoice_id IN
(SELECT invoice_id
FROM invoices
WHERE vendor_id = 115);