sql basicsais-grid-2013.jinr.ru/docs/22/6-avtomonov_sql.pdf · in oracle if query has no join...
TRANSCRIPT
• Introduction to SQL
• JOINS
• Advanced queries
• Subqueries
• Common table expressions
• Hierarchical queries
• Analytic functions
AGENDA
Structured Query Language
• Programming language designed to manage data in relational databases (RDBMS).
DDL – Data Definition Language
• Adding, changing or deleting database objects
• Example : DROP TABLE [TABLE];
DML – Data Manipulation Language
• Inserting, deleting and updating data in a database
• Example : DELETE FROM [TABLE];
DCL – Data Control Language
• authorizes users to access and manipulate data
• Example : GRANT SELECT ON [TABLE] TO [USER];
SQL LANGUAGE
A query is an operation that retrieves data from one or more tables or
views.
May contain subqueries
May perform aggregation
Several tables may be joined together to retrieve data.
SELECT STATEMENT
JOINS
EQUIJOIN Values in the two corresponding columns of
the different tables must be equal
NON-EQUIJOIN The relationship between the columns of the
different tables must be other than equal
OUTERJOIN
(LEFT, RIGHT, FULL)
It returns also the rows that do not satisfy the
join condition
SELF-JOIN Joining data in a table to itself
JOINS
An outer join does not require each record in the two joined tables to have a
matching record.
There are 3 kinds of outer joins :
OUTER JOINS
DAVID 1 1 IT
JAMES 2 2 FINANCE
ADAM 3 5 HR
LEFT FULL RIGHT
DAVID 1 1 IT
JAMES 2 2 FINANCE
ADAM 3
DAVID 1 1 IT
JAMES 2 2 FINANCE
5 HR
DAVID 1 1 IT
JAMES 2 2 FINANCE
5 HR
ADAM 3
CARTESIAN PRODUCTS
Select * From OneTwoThree t1, OneTwoThree t2
Cartesian product is a result of a so-called CROSS JOIN.
In Oracle if query has no join condition, then Cartesian
product is returned
© http://dotnetslackers.com
uses equality only comparisons in the join-predicate
Example :
Show department name for all employees.
EQUIJOINS
Select e.emp_name, e.emp_deptno, d.dept_name
From emp e, dept d
Where e.emp_deptno = d.deptno
NON-EQUIJOIN, SELF-JOIN
Select e1.ename,e1.sal,e2.ename,e2.sal
From emp e1, emp e2
Where e1.sal > e2.sal
ENAME SAL ENAME SAL
KING 4500 FORD 3000
KING 4500 JONES 2975
KING 4500 CLARK 2450
KING 4500 MILLER 1300
FORD 3000 JONES 2975
FORD 3000 CLARK 2450
FORD 3000 MILLER 1300
JONES 2975 CLARK 2450
JONES 2975 MILLER 1300
CLARK 2450 MILLER 1300
Self-join allows compare different values against other rows in the same table
Example :
Show the list of employees that have less salary.
Equi-joins:
• ANSI syntax
SELECT e.name, d.name FROM employees e
INNER JOIN departments d ON e.dept_id=d.dept_id;
• Oracle
SELECT e.name, d.name FROM employees e, departments d
WHERE e.dept_id=d.dept_id;
Outer-joins :
• ANSI syntax (LEFT, RIGHT, FULL)
SELECT e.name, d.name FROM employees e
RIGHT OUTER JOIN departments d ON e.dept_id=d.dept_id;
• Oracle
SELECT e.name, d.name FROM employees e, departments d
WHERE e.dept_id(+)=d.dept_id;
JOINS SYNTAX ANSI VS ORACLE
Types Question
SUBQUERIES Who works in the same department as
Clark?
Correlated SUBQUERIES
Who are the employees that receive more
than the average salary of their
department?
Inline Views What are the employees salary and the
minimum salary in their department?
Common table expression
Hierarchical QUERIES What are the people I work under starting with CEO
Analytic functions What is the difference in currency exchange rates
between yesterday an today
ADVANCED SQL QUERIES
A subquery is a query within a query and it is used to answer multiple-part questions.
A subquery can reside in the WHERE clause, the FROM clause or the SELECT clause.
Can contain another subquery
* Can be correlated
SUBQUERIES
If a subquery is in the WHERE (or HAVING) clause of main query it is called a NESTED
SUBQUERY.
Example :
Which employees receive the least salary?
NS does not reference a main query
NS runs only once for entire outer query
NS is evaluated in “bottom-to-top” manner, i.e. inner most query is evaluated first
Select * From emp e
Where e.sal = (Select Min(sal) From emp)
NESTED SUBQUERY (NS)
A CS is a subquery that uses values from the outer query in its WHERE clause.
It is evaluated FOR EACH ROW produced by the parent query.
Example :
Which employees receive more then the average salary of their department?
CS is evaluated in “top-to-bottom” manner, the order is :
− The outer query receives a row
− For each candidate row a CS is executed once
− Depending on CS row is included into the result set
− The process repeats for all rows
Select * From emp e
Where sal > (Select Avg(sal) From emp i
Where i.deptno = e.deptno)
CORRELATED SUBQUERY (CS)
AKA derived table in MSSQL , and subselect in Postgres
An In-line view is a subquery in the FROM clause of a SQL statement just as if it was a
table. It acts as a data source!
What are the employees salary and the MINIMAL salary in their department?
Inline views can be used to define a set of rows for UPDATE or DELETE statements
Update (Select * from emp_new Where sal = 3000)
Set sal = 3100
Delete From (Select * From emp_new Where job = 'ANALYST')
Select e.ename,e.deptno, e.sal, t.msal
From emp e, (Select i.deptno, Min(sal) msal From emp i Group By i.deptno) t
Where e.deptno = t.deptno
Order By e.deptno, e.sal
INLINE VIEWS
There are 4 types of subqueries :
Who works in the same department as KING?
Who works in the same department as KING or SCOTT?
Who works in the same department and under the same boss as SCOTT?
Who works in the same department and under the same boss as SCOTT or ALLEN?
Select * From emp Where deptno = (Select deptno From emp Where ename = 'KING')
Select * From emp Where deptno In (
Select deptno From emp Where ename In ('KING','SCOTT'))
Select * From emp Where (deptno,mgr) =
(Select deptno, mgr From emp Where ename = 'SCOTT')
Select * From emp Where (deptno,mgr) In
(Select deptno, mgr From emp Where ename In ('SCOTT','ALLEN'))
SUBQUERIES
Subqueries can be used to:
Define the set of rows to be inserted into the target table of
an INSERT or CREATE TABLE statement
Define one or more values to be assigned to existing rows in an UPDATE statement
Create Table emp_new As
Select * From emp Where 1=0
Insert Into emp_new
(Select * From emp Where deptno = 20)
Update emp Set (sal, deptno) =
(Select sal,deptno From emp Where ename = 'KING')
Where ename = 'SCOTT'
SUBQUERIES USAGE
CTE - a temporary named result set, derived from a simple query and defined within the execution scope of a SELECT, INSERT,UPDATE, or DELETE statement.
WITH clause is part of SQL-99 standard, Oracle supports CTE since version 9iR2
CTE can be considered as views within one specific query.
Single WITH section can contain several temporary result set definitions.
Optimizer may treat it as an inline view or as a temporary table.
Developer can influence the decision by using hints : MATERIALIZE or INLINE.
WITH
subquery_name
AS
(the aggregation SQL statement)
SELECT
(query naming subquery_name);
COMMON TABLE EXPRESSION.
(SUBQUERY FACTORING)
Example:
Show each employees manager name and the number of people in the managers
department.
SELECT e.ename,
dc1.dept_count,
m.ename,
dc2.dept_count
FROM emp e,
(SELECT deptno, COUNT(*) AS
dept_count
FROM emp
GROUP BY deptno) dc1,
emp m,
(SELECT deptno, COUNT(*) AS
dept_count
FROM emp
GROUP BY deptno) dc2
WHERE e.deptno = dc1.deptno
AND e.mgr = m.empno
AND m.deptno = dc2.deptno;
WITH dept_count AS (
SELECT deptno, COUNT(*) AS dept_count
FROM emp
GROUP BY deptno)
SELECT e.ename,
dc1.dept_count,
m.ename,
dc2.dept_count
FROM emp e,
dept_count dc1,
emp m,
dept_count dc2
WHERE e.deptno = dc1.deptno
AND e.mgr = m.empno
AND m.deptno = dc2.deptno;
Recursive CTE
Oracle-specific CONNECT BY
Recursive CTE A recursive CTE requires four elements :
1. Anchor query (runs once and results form the top node in the hierarchy)
2. Recursive query (runs multiple times, forms remaining nodes)
3. UNION ALL statement to bind the Anchor and Recursive queries together.
4. INNER JOIN statement to bind the Recursive query to the results of the CTE.
HIERARCHICAL QUERY
Oracle has its own syntax to retrieve hierarchical data :
START WITH – specifies the root row(s) of the hierarchy.
CONNECT BY – specifies the relationship between parent rows and child rows of the hierarchy.
Reference between rows is defined by PRIOR operator. There could be several PRIOR
conditions.
Select empno,lpad(ename,Level*8,' ')
ename,mgr,Level
From emp
Connect By mgr = Prior empno
Start With ename = 'JONES'
empno ename mgr level
7566 JONES 7839 1
7788 SCOTT 7566 2
7876 ADAMS 7788 3
7902 FORD 7566 2
7369 SMITH 7902 3
CONNECT BY
Stored data may contain infinite loops, Oracle detects such cases and returns runtime error.
NB:
Hierarchical queries can be used to generate specific number of rows. The query below generates 10
sequence numbers :
The NOCYCLE parameter in the CONNECT BY condition causes Oracle to return the rows in spite of
the loop.
LOOPS AND PSEUDOCOLUMNS
Select Level From dual Connect By Level <= 10
Appeared in Oracle 8.1.6 in 1999
Accepted as ANSI standard SQL2003 (but termed "windowing functions")
Supported by major RDBMS : Oracle, DB2, PostgreSQL, etc. (but not MySQL and
SQLite)
AF usage can provide the following benefits, compared to regular queries:
Concise wording
- Many traditional non-analytical queries are hard to understand and debug.
Reduced network load
- One AF query can replace several regular ones.
Computing is done on DB server
- Calculations can be transferred from client to DB server, which can be more
appropriate for big data processing.
Better query performance
- Optimizer is aware of AF and special plans may improve performance.
ANALYTIC FUNCTIONS (AF)
Select * From currency Order By Day
DAY EURCHF
04/04/2013 1.078
05/04/2013 1.11
06/04/2013 1.029
07/04/2013 1.041
08/04/2013 1.066
Select c.*,
Lag(eurchf,1) Over (Order By Day) prev_eurchf
From currency c;
DAY EURCHF PREV_EURCHF
04/04/2013 1.078
05/04/2013 1.11 1.078
06/04/2013 1.029 1.11
07/04/2013 1.041 1.029
08/04/2013 1.066 1.041
Select day, EURCHF, Round(((EURCHF - prev_eurchf) / prev_eurchf )*100,3) pct_change
From ( Select day, EURCHF, LAG(EURCHF,1) Over (Order By Day) prev_eurchf
From currency)
DAY EURCHF PCT_CHANGE
04/04/2013 1.078
05/04/2013 1.11 2.968
06/04/2013 1.029 -7.297
07/04/2013 1.041 1.166
08/04/2013 1.066 2.402
-8
-6
-4
-2
0
2
4
5 6 7 8
ANALYTIC FUNCTION EXAMPLE
For analytic functions, you can use all of the regular group functions :
• SUM
• MAX
• MIN
• AVG
• COUNT
Plus list of additional analytical functions that can be used only for window queries :
• LAG
• LEAD
• FIRST
• LAST
• FIRST_VALUE
• LAST_VALUE
• ROW_NUMBER
• DENSE_RANK
ANALYTIC FUNCTIONS
Genaral syntax of analytic function :
Analytic clause syntax :
SELECT analytic_function([ arguments ])
OVER ([analytic_clause])
FROM TABLE
[PARTITION BY [expression list]]
ORDER BY [sort spec] [window spec]
AF SYNTAX
PRECEDING и FOLLOWING set the upper and lower bounds for aggregation.
Example: RANGE UNBOUNDED PRECEDING means “apply AF to every row from the
start of partition until the current one”
There can be different types of windows:
• Cumulative aggregate :
One of the bound is fixed (1st row of partition), the other is moving with the current row.
• Upper and lover bounds are fixed against the current row. The whole window is
moving.
As a special case bounds can be symmetrical with respect to the current row.
Can be created for a Range of values (RANGE) or Rows shift (ROWS)
WINDOWS
• Basic aggregation functions decrease level of detail, while AF don’t :
Basic aggregation Analytic function
Select deptno, Count(1) cnt
From emp Where deptno = 10
Group By deptno
Select ename,deptno,
Count(1) Over() cnt
From emp Where deptno = 10
ENAME DEPTNO CNT
CLARK 10 3
KING 10 3
MILLER 10 3
DEPTNO CNT
10 3
AF vs. BASIC AGGREGATION
Function type Usage
Ranking functions Position of employee in a salary
rating
“Floating” window
For each person what’s the average
salary of all employees on the same
position hired within last 5 months.
Ratio calculation
Which part of the total sum of
salaries for all managers does each
manager receive.
Access to following/preceding rows What the next higher and previous
lower salaries for an employee.
Complex math (variance, etc.) Data mining and analytics
TYPES OF ANALYTIC FUNCTIONS
QUESTIONS?