dbms module ii


DATABASE MANAGEMENT SYSTEM (FOR BOTH CSE-4TH-SEM/MECH-3RD SEM

DEPARTMENT) MODULE-II

Query language: a language in which a user requests information from the database.

Categories of languages

1. Procedural language.

2. Nonprocedural language.

Procedural language: in a procedural language the user instructs the system to perform a sequence of operations on the database to compute the desired result. Example: relational algebra.

Non-procedural language: in a non-procedural language the user describes the desired information without giving a specific procedure for obtaining that information. Examples: tuple relational calculus and domain relational calculus.

Relational Algebra:

- It is a procedural language.

- It consists of a set of operations that take one or two relations as input and produce a new relation as output.

- Six basic operators are used in relational algebra.

1. select: σ

2. project: ∏

3. union: ∪

4. set difference: –

5. Cartesian product: x

6. rename: ρ


The operators take one or two relations as inputs and produce a new

relation as a result.

1. Selection Operation:

- It is a unary operation.

- It is represented by the lowercase Greek letter sigma (σ).

- Syntax: σ<selection-condition>(<relation>)

- Notation: σp(r)

- p is called the selection predicate.

- Defined as: σp(r) = {t | t ∈ r and p(t)}

where p is a formula in propositional calculus consisting of terms connected by ∧ (and), ∨ (or), ¬ (not).

Each term has the form <attribute> op <attribute> or <attribute> op <constant>,

where op is one of: =, ≠, >, ≥, <, ≤

-Example of selection:

σ branch_name=“Perryridge”(account)
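The selection above can be sketched in Python by treating a relation as a list of dictionaries. This is only an illustration: the sample rows and the helper name `select` are assumptions, not part of the notes.

```python
# Hypothetical sample data; only the relation and attribute names come from the notes.
account = [
    {"account_number": "A-101", "branch_name": "Downtown", "balance": 500},
    {"account_number": "A-102", "branch_name": "Perryridge", "balance": 400},
    {"account_number": "A-201", "branch_name": "Perryridge", "balance": 900},
]

def select(relation, predicate):
    """sigma_p(r): keep the tuples t of r for which predicate(t) is true."""
    return [t for t in relation if predicate(t)]

# sigma branch_name = "Perryridge" (account)
perryridge_accounts = select(account, lambda t: t["branch_name"] == "Perryridge")
print(perryridge_accounts)
```

The predicate is passed as a function, so compound conditions built with `and`, `or` and `not` play the role of ∧, ∨ and ¬.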

2. Project Operation

- Notation: ∏A1, A2, …, Ak (r), where A1, A2, …, Ak are attribute names and r is a relation name.

- The result is defined as the relation of k columns obtained by erasing the columns that are not listed.

- Duplicate rows are removed from the result, since relations are sets.

-Example: To eliminate the branch_name attribute of account

∏account_number, balance (account)
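A minimal Python sketch of projection, including the duplicate elimination mentioned above. The sample rows and the helper name `project` are assumptions made for illustration.

```python
# Hypothetical sample data; only the relation and attribute names come from the notes.
account = [
    {"account_number": "A-101", "branch_name": "Downtown", "balance": 500},
    {"account_number": "A-102", "branch_name": "Perryridge", "balance": 400},
    {"account_number": "A-201", "branch_name": "Perryridge", "balance": 900},
]

def project(relation, attributes):
    """Pi_attributes(r): keep only the listed columns and drop duplicate rows."""
    seen = set()
    result = []
    for t in relation:
        row = tuple(t[a] for a in attributes)
        if row not in seen:  # relations are sets, so each row appears once
            seen.add(row)
            result.append(dict(zip(attributes, row)))
    return result

# Pi account_number, balance (account)
numbers_and_balances = project(account, ["account_number", "balance"])
# Projecting on branch_name alone shows the duplicate elimination.
branches = project(account, ["branch_name"])
print(branches)
```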

3. Union Operation

- Notation: r ∪ s

- Defined as: r ∪ s = {t | t ∈ r or t ∈ s}

- For r ∪ s to be valid:

1. r, s must have the same arity (same number of attributes)

2. The attribute domains must be compatible (example: the 2nd column of r deals with the same type of values as does the 2nd column of s)

- Example: to find all customers with either an account or a loan

∏customer_name (depositor) ∪ ∏customer_name (borrower)

[Figure: Union Operation Example, showing relations r, s and the result r ∪ s]

4. Set Difference Operation

- Notation: r – s

- Defined as: r – s = {t | t ∈ r and t ∉ s}

- Set differences must be taken between compatible relations:

- r and s must have the same arity

- Attribute domains of r and s must be compatible

[Figure: Set Difference Operation Example, showing relations r, s and the result r – s]
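Union and set difference map directly onto Python sets of tuples. The sketch below is illustrative only: the customer names and the helper names are assumptions, and the compatibility check covers arity but not attribute domains.

```python
# Hypothetical results of Pi customer_name(depositor) and Pi customer_name(borrower).
depositor_names = {("Hayes",), ("Johnson",), ("Smith",)}
borrower_names = {("Jones",), ("Smith",)}

def check_same_arity(r, s):
    """Both relations must have the same number of attributes."""
    arities = {len(t) for t in r | s}
    if len(arities) > 1:
        raise ValueError("relations must have the same arity")

def union(r, s):
    check_same_arity(r, s)
    return r | s   # {t | t in r or t in s}

def difference(r, s):
    check_same_arity(r, s)
    return r - s   # {t | t in r and t not in s}

all_customers = union(depositor_names, borrower_names)
loan_but_no_account = difference(borrower_names, depositor_names)
print(sorted(all_customers), sorted(loan_but_no_account))
```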

5. Cartesian Product Operation:

- Notation: r x s

- Defined as: r x s = {t q | t ∈ r and q ∈ s}

- Assume that the attributes of r(R) and s(S) are disjoint (that is, R ∩ S = ∅).

- If the attributes of r(R) and s(S) are not disjoint, then renaming must be used.

[Figure: Cartesian Product Operation Example, showing relations r, s and the result r x s]

Example Queries:

Q1. Find all loans of over $1200

Ans: σamount > 1200 (loan)

Q2. Find the loan number for each loan of an amount greater than $1200

Ans: ∏loan_number (σamount > 1200 (loan))

Q3. Find the names of all customers who have a loan, an account, or both, from the bank

Ans: ∏customer_name (borrower) ∪ ∏customer_name (depositor)

Q4. Find the names of all customers who have a loan at the Perryridge Branch

Ans: ∏customer_name (σbranch_name=“Perryridge” (σborrower.loan_number = loan.loan_number(borrower x loan)))

Q5. Find the names of all customers who have a loan at the Perryridge branch but do not have an account at any branch of the bank.

Ans: ∏customer_name (σbranch_name = “Perryridge” (σborrower.loan_number = loan.loan_number (borrower x loan))) – ∏customer_name (depositor)


Q6. Find the names of all customers who have a loan at the Perryridge branch.

Ans: ∏customer_name (σbranch_name = “Perryridge” (σborrower.loan_number = loan.loan_number (borrower x loan)))
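The Perryridge query combines a Cartesian product, two selections and a projection. A small Python sketch of that pipeline follows; the sample rows and the helper name `product` are assumptions. Attribute names are qualified as relation.attribute because borrower and loan share loan_number.

```python
# Hypothetical sample data; only the schemas come from the notes.
borrower = [
    {"customer_name": "Jones", "loan_number": "L-17"},
    {"customer_name": "Smith", "loan_number": "L-23"},
    {"customer_name": "Hayes", "loan_number": "L-15"},
]
loan = [
    {"loan_number": "L-15", "branch_name": "Perryridge", "amount": 1500},
    {"loan_number": "L-17", "branch_name": "Downtown", "amount": 1000},
    {"loan_number": "L-23", "branch_name": "Redwood", "amount": 2000},
]

def product(r, r_name, s, s_name):
    """r x s, with every attribute renamed to relation.attribute."""
    return [
        {**{f"{r_name}.{k}": v for k, v in t.items()},
         **{f"{s_name}.{k}": v for k, v in q.items()}}
        for t in r for q in s
    ]

pairs = product(borrower, "borrower", loan, "loan")
# sigma borrower.loan_number = loan.loan_number
matched = [t for t in pairs if t["borrower.loan_number"] == t["loan.loan_number"]]
# sigma branch_name = "Perryridge"
at_perryridge = [t for t in matched if t["loan.branch_name"] == "Perryridge"]
# Pi customer_name (a set removes duplicates)
customer_names = {t["borrower.customer_name"] for t in at_perryridge}
print(customer_names)
```

The product has |borrower| × |loan| tuples before any selection is applied, which is why the order of operations matters for efficiency.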

Formal Definition:

A basic expression in the relational algebra consists of one of the following:

- A relation in the database

- A constant relation

Let E1 and E2 be relational algebra expressions; the following are all relational algebra expressions:

- E1 ∪ E2

- E1 – E2

- E1 x E2

- σp (E1), where p is a predicate on attributes in E1

- ∏S (E1), where S is a list consisting of some of the attributes in E1

- ρx (E1), where x is the new name for the result of E1

Additional Operations

We define additional operations that do not add any power to the relational algebra, but that simplify common queries:

- Set intersection

- Natural join

- Division

- Assignment

Set Intersection Operation:

- Notation: r ∩ s

- Defined as: r ∩ s = {t | t ∈ r and t ∈ s}

- Assume: r, s have the same arity, and the attributes of r and s are compatible

- Note: r ∩ s = r – (r – s)


[Figure: Set Intersection Operation Example, showing relations r, s and the result r ∩ s]

Natural Join Operation:

- Let r and s be relations on schemas R and S respectively. Then r ⋈ s is a relation on schema R ∪ S obtained as follows:

- Consider each pair of tuples tr from r and ts from s.

- If tr and ts have the same value on each of the attributes in R ∩ S, add a tuple t to the result, where t has the same value as tr on R and t has the same value as ts on S.

- Example: R = (A, B, C, D), S = (E, B, D)

- Result schema = (A, B, C, D, E)

- r ⋈ s is defined as: ∏r.A, r.B, r.C, r.D, s.E (σr.B = s.B ∧ r.D = s.D (r x s))

[Figure: Natural Join Operation Example, showing relations r, s and the result r ⋈ s]
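The pairwise definition of the natural join can be sketched in Python as follows. The helper name `natural_join` and the sample tuples are assumptions; the schemas R = (A, B, C, D) and S = (E, B, D) follow the example above.

```python
def natural_join(r, s):
    """Join r and s on all attributes they share (R intersect S)."""
    if not r or not s:
        return []
    common = [a for a in r[0] if a in s[0]]
    result = []
    for tr in r:
        for ts in s:
            # keep the pair only if it agrees on every common attribute
            if all(tr[a] == ts[a] for a in common):
                result.append({**tr, **ts})  # schema R union S
    return result

# Hypothetical sample tuples.
r = [
    {"A": 1, "B": "a", "C": "x", "D": 10},
    {"A": 2, "B": "b", "C": "y", "D": 20},
    {"A": 3, "B": "a", "C": "z", "D": 30},
]
s = [
    {"E": "p", "B": "a", "D": 10},
    {"E": "q", "B": "b", "D": 99},
    {"E": "r", "B": "a", "D": 30},
]

joined = natural_join(r, s)
print(joined)
```

Each common attribute appears once in the result, which is the sense in which the natural join removes duplicate attributes.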

Outer Join:

- An extension of the join operation that avoids loss of information.

- Computes the join and then adds to the result the tuples from one relation that do not match any tuple in the other relation.

- Uses null values: null signifies that the value is unknown or does not exist.

- All comparisons involving null are (roughly speaking) false by definition.

[Figure: Outer Join Example, showing relation loan, relation borrower and the results of the joins]

Left Outer Join: keeps all the tuples of the left relation. A tuple of the left relation that matches no tuple of the right relation is still included, with null values for the attributes of the right relation.

Right Outer Join: keeps all the tuples of the right relation. A tuple of the right relation that matches no tuple of the left relation is still included, with null values for the attributes of the left relation.
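A minimal Python sketch of the left outer join, with `None` standing in for null. The sample rows and the helper name `left_outer_join` are assumptions; the right outer join is obtained here simply by swapping the two arguments.

```python
# Hypothetical sample data; only the schemas come from the notes.
loan = [
    {"loan_number": "L-170", "branch_name": "Downtown", "amount": 3000},
    {"loan_number": "L-230", "branch_name": "Redwood", "amount": 4000},
    {"loan_number": "L-260", "branch_name": "Perryridge", "amount": 1700},
]
borrower = [
    {"customer_name": "Jones", "loan_number": "L-170"},
    {"customer_name": "Smith", "loan_number": "L-230"},
    {"customer_name": "Hayes", "loan_number": "L-155"},
]

def left_outer_join(left, right, key, right_attrs):
    """Join on `key`; unmatched left tuples are padded with None (null)."""
    result = []
    for l in left:
        matches = [r for r in right if r[key] == l[key]]
        if matches:
            result.extend({**l, **r} for r in matches)
        else:
            padding = {a: None for a in right_attrs if a != key}
            result.append({**l, **padding})
    return result

loan_left_borrower = left_outer_join(
    loan, borrower, "loan_number", ["customer_name", "loan_number"])
# Right outer join of loan and borrower = left outer join with arguments swapped.
loan_right_borrower = left_outer_join(
    borrower, loan, "loan_number", ["loan_number", "branch_name", "amount"])
print(loan_left_borrower[-1], loan_right_borrower[-1])
```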

Relational Calculus Languages:

- Tuple Relational Calculus

- Domain Relational Calculus

- Query-by-Example (QBE)

Tuple Relational Calculus:

- A nonprocedural query language, where each query is of the form {t | P(t)}

- It is the set of all tuples t such that predicate P is true for t

- t is a tuple variable; t[A] denotes the value of tuple t on attribute A

- t ∈ r denotes that tuple t is in relation r

- P is a formula similar to that of the predicate calculus.

Banking Example

branch (branch_name, branch_city, assets)

customer (customer_name, customer_street, customer_city)

account (account_number, branch_name, balance)

loan (loan_number, branch_name, amount)

depositor (customer_name, account_number)

borrower (customer_name, loan_number)

Example Queries

Q1. Find the loan_number, branch_name, and amount for loans of over $1200

Ans: {t | t ∈ loan ∧ t[amount] > 1200}

Q2. Find the loan number for each loan of an amount greater than $1200

Ans: {t | ∃ s ∈ loan (t[loan_number] = s[loan_number] ∧ s[amount] > 1200)}

Notice that a relation on schema [loan_number] is implicitly defined by the query.

Q3. Find the names of all customers having a loan, an account, or both at the bank

Ans: {t | ∃ s ∈ borrower (t[customer_name] = s[customer_name]) ∨ ∃ u ∈ depositor (t[customer_name] = u[customer_name])}

Q4. Find the names of all customers who have a loan and an account at the bank

Ans: {t | ∃ s ∈ borrower (t[customer_name] = s[customer_name]) ∧ ∃ u ∈ depositor (t[customer_name] = u[customer_name])}

Q5. Find the names of all customers having a loan at the Perryridge branch

Ans: {t | ∃ s ∈ borrower (t[customer_name] = s[customer_name] ∧ ∃ u ∈ loan (u[branch_name] = “Perryridge” ∧ u[loan_number] = s[loan_number]))}

Q6. Find the names of all customers who have a loan at the Perryridge branch, but no account at any branch of the bank

Ans: {t | ∃ s ∈ borrower (t[customer_name] = s[customer_name] ∧ ∃ u ∈ loan (u[branch_name] = “Perryridge” ∧ u[loan_number] = s[loan_number])) ∧ ¬ ∃ v ∈ depositor (v[customer_name] = t[customer_name])}

Domain Relational Calculus:

- A nonprocedural query language equivalent in power to the tuple relational calculus

- Each query is an expression of the form: {< x1, x2, …, xn > | P(x1, x2, …, xn)}

- x1, x2, …, xn represent domain variables

- P represents a formula similar to that of the predicate calculus.

Example Queries

Q1. Find the loan_number, branch_name, and amount for loans of over $1200

Ans: {< l, b, a > | < l, b, a > ∈ loan ∧ a > 1200}

Q2. Find the names of all customers who have a loan of over $1200

Ans: {< c > | ∃ l, b, a (< c, l > ∈ borrower ∧ < l, b, a > ∈ loan ∧ a > 1200)}

Q3. Find the names of all customers who have a loan from the Perryridge branch and the loan amount

Ans: {< c, a > | ∃ l (< c, l > ∈ borrower ∧ ∃ b (< l, b, a > ∈ loan ∧ b = “Perryridge”))}

or, equivalently: {< c, a > | ∃ l (< c, l > ∈ borrower ∧ < l, “Perryridge”, a > ∈ loan)}
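Calculus queries describe the result set rather than the steps to compute it, which reads much like a Python set comprehension. The sketch below mirrors three of the example queries; the sample tuples are assumptions, with loan stored as (loan_number, branch_name, amount) and borrower as (customer_name, loan_number).

```python
# Hypothetical sample data following the banking schemas.
loan = [
    ("L-15", "Perryridge", 1500),
    ("L-17", "Downtown", 1000),
    ("L-23", "Redwood", 2000),
]
borrower = [("Hayes", "L-15"), ("Jones", "L-17"), ("Smith", "L-23")]

# {t | t in loan and t[amount] > 1200}
loans_over_1200 = {t for t in loan if t[2] > 1200}

# {t | exists s in loan (t[loan_number] = s[loan_number] and s[amount] > 1200)}
loan_numbers_over_1200 = {(s[0],) for s in loan if s[2] > 1200}

# {<c, a> | exists l (<c, l> in borrower and <l, "Perryridge", a> in loan)}
perryridge_customers = {
    (c, a)
    for (c, l) in borrower
    for (l2, b, a) in loan
    if l == l2 and b == "Perryridge"
}
print(loans_over_1200, loan_numbers_over_1200, perryridge_customers)
```

The first two comprehensions range over whole tuples, as in the tuple calculus; the third unpacks tuples into separate variables, as the domain calculus does.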

Database Design Life Cycle:

Database design means designing the logical and physical structure of the data stored in a database so that it meets the information needs of the different applications.

- The DDLC consists of the following phases for designing the database.

Requirements collection and analysis:

- It is the first step in database design.

- During this step the database designers gather the detailed requirements by interacting with potential users to identify their particular needs based on the problems.

Conceptual database design:

- It is the second step in database design.

- In this step a conceptual schema is created for the database that is independent of any specific database management system (DBMS).

- The conceptual schema includes a detailed description of the users' data requirements, the entity types, the relationships and the constraints.

- It is expressed in a high-level data model such as the entity-relationship model.

Data model mapping:

- It is also called logical database design.

- During this phase the conceptual schema is transformed into the data model of the DBMS that will be used for the implementation.

- This may be carried out for a DBMS package such as Oracle, MySQL or SQL Server.

Physical database design:

- In this phase we design the specification for storing the database in terms of physical storage structures, access paths and file organizations.

- The design corresponds to designing the internal schema of the three-level DBMS architecture.

Functional Dependency:

- Functional dependencies are constraints on the set of legal relations.

- They play an important role in database design.

- A functional dependency is denoted by α → β, between two sets of attributes α, β.

- Let R be a relation schema with α ⊆ R and β ⊆ R. The functional dependency α → β holds on R if, in any legal relation r(R), for all pairs of tuples t1 and t2 in r such that t1[α] = t2[α], it is also the case that t1[β] = t2[β].

- An FD represents an interrelationship among attributes of an entity represented by a relation.
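The definition can be checked against a single relation instance with a few lines of Python. The helper name `fd_holds` and the sample rows are assumptions. Note that a functional dependency is a constraint on all legal relations, so one instance can refute a dependency but cannot prove that it holds in general.

```python
def fd_holds(relation, lhs, rhs):
    """True if no two tuples agree on lhs but differ on rhs."""
    seen = {}
    for t in relation:
        key = tuple(t[a] for a in lhs)
        value = tuple(t[a] for a in rhs)
        # setdefault stores the first rhs value seen for this lhs value
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical sample instance of loan(loan_number, branch_name, amount).
loan = [
    {"loan_number": "L-15", "branch_name": "Perryridge", "amount": 1500},
    {"loan_number": "L-16", "branch_name": "Perryridge", "amount": 1300},
    {"loan_number": "L-17", "branch_name": "Downtown", "amount": 1000},
]

print(fd_holds(loan, ["loan_number"], ["amount"]))   # loan_number -> amount
print(fd_holds(loan, ["branch_name"], ["amount"]))   # branch_name -> amount
```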


Armstrong Axioms

- Let F be a set of functional dependencies. The closure of F is the set of all functional dependencies logically implied by F.

- We denote the closure of F by F+.

- We can find F+ from F by using the following rules:

- A) Reflexivity rule

- B) Augmentation rule

- C) Transitivity rule

- These three rules are sound because they do not generate any incorrect functional dependency.

- These three rules are complete because, for a given set F of FDs, they allow us to generate all of F+.

- These three rules are known as Armstrong's axioms.

I. Reflexivity Rule: If α is a set of attributes and β ⊆ α, then α → β is an FD.

Proof: Let t1 and t2 be two tuples of a relation r(R) such that t1[α] = t2[α]. This means that t1 and t2 have the same value on every attribute of α. Since β ⊆ α, it follows that t1[β] = t2[β]. So α → β is an FD.

II. Augmentation Rule: If α → β is an FD and γ is a set of attributes, then γα → γβ is an FD.

Proof: Let t1 and t2 be two tuples of a relation r(R) such that

t1[γα] = t2[γα] -------- (1)

From (1) we get

t1[γ] = t2[γ] -------- (2)

t1[α] = t2[α] -------- (3)

Since α → β is satisfied on R, from (3) we get

t1[β] = t2[β] -------- (4)

From (2) and (4) we get

t1[γβ] = t2[γβ] -------- (5)

From (1) and (5), γα → γβ holds on R (proved).

III. Transitivity Rule: If α → β and β → γ are FDs, then α → γ is an FD.

Proof: Let t1 and t2 be two tuples of a relation r(R) such that

t1[α] = t2[α] -------- (1)

Since α → β is satisfied on R, we get

t1[β] = t2[β] -------- (2)

Since β → γ is satisfied on R, from (2) we get

t1[γ] = t2[γ] -------- (3)

From (1) and (3), α → γ holds on R (proved).

Closure of attributes sets:

Let R be a relation schema with a set of functional dependencies F. Let α be a set of attributes. The closure of the attribute set α under F, denoted by α+, is the set of all attributes that are functionally determined by α under F.

Compute the candidate key:

1) Choose a set of attributes X, built from the left-hand sides of the FDs in F.

2) Compute X+ under the given set of FDs in F.

3) If X+ = R, and no proper subset of X has the same property, then X is a candidate key of R.

Example: Let us consider R = {A, B, C, D, E, F, G, H, I, J} and the set of FDs {AB->C, A->DE, B->F, F->GH, D->IJ}. Find the candidate key of R.

Ans:

A+ under F = {A, D, E, I, J}

B+ under F = {B, F, G, H}

D+ under F = {D, I, J}

F+ under F = {F, G, H}

AB+ under F = {A, B, C, D, E, F, G, H, I, J} = R

Neither A+ nor B+ is equal to R, hence AB is the candidate key of R.

Prime Attributes & Non-Prime Attributes: An attribute A in a relation schema R is a prime attribute if A is part of any candidate key of R. If A is not part of any candidate key of R, A is called a non-prime attribute.
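The closure computation and the candidate-key search can be sketched in Python. This is a brute-force illustration under stated assumptions: the function names are hypothetical, attributes are single letters so a string such as "AB" stands for the set {A, B}, and R is taken to contain J because the dependency D->IJ mentions it.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure: repeatedly apply FDs whose left side is already covered."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(schema, fds):
    """Try attribute sets from smallest to largest; keep the minimal ones whose closure is R."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            x = set(combo)
            if any(k <= x for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(x, fds) == set(schema):
                keys.append(x)
    return keys

R = set("ABCDEFGHIJ")
F = [("AB", "C"), ("A", "DE"), ("B", "F"), ("F", "GH"), ("D", "IJ")]

print(sorted(closure("A", F)))
print(sorted(closure("AB", F)))
keys = candidate_keys(R, F)
prime_attributes = set().union(*keys)
print(keys, prime_attributes)
```

The exhaustive search is exponential in the number of attributes, so it is only practical for small textbook schemas like this one.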


Database normalization

Normalization is the process of efficiently organizing data in a database. There are two goals of the normalization process: eliminating redundant data (for example, storing the same data in more than one table) and ensuring data dependencies make sense (only storing related data in a table). Both of these are worthy goals, as they reduce the amount of space a database consumes and ensure that data is logically stored.

First Normal Form - 1NF:

First normal form (1NF) sets the very basic rules for an organized database:

• Eliminate duplicate columns from the same table.

• Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).

The first normal form only says that the table should include only atomic values, i.e. one value per cell. For example, in Table 1 we cannot put both Volvo and SAAB in the same cell, even if we buy cars from both suppliers; we must use two different rows to store them. Most RDBMSs do not allow more than one value to be assigned to a cell, with the result that all tables are in first normal form.


Second Normal Form - 2NF

The second normal form says that a table, in addition to being in 1NF, is not allowed to contain any partial functional dependencies, that is, dependencies of a non-prime attribute on only some components of the primary key.

- A relation schema R is in 2NF if every non-prime attribute A in R is fully functionally dependent on the primary key of R.

- Functional dependency: functional dependencies are constraints on the set of legal relations, and they play an important role in database design.

- A functional dependency is denoted by α → β, between two sets of attributes α, β.

- Let R be a relation schema with α ⊆ R and β ⊆ R. The functional dependency α → β holds on R if, in any legal relation r(R), for all pairs of tuples t1 and t2 in r such that t1[α] = t2[α], it is also the case that t1[β] = t2[β].

- Full functional dependency: β is fully functionally dependent on α if α → β holds and β is not functionally dependent on any proper subset of α.

- An FD represents an interrelationship among attributes of an entity represented by a relation.

- A better definition of 2NF: to fulfill 2NF a table should fulfill 1NF and, in addition, every non-key attribute should be fully functionally dependent on every candidate key.

Third Normal Form - 3NF

A table is said to be in third normal form if all the non-key fields of the table are independent of the other non-key fields of the table.

- 3NF is based on the concept of transitive dependency.

- A functional dependency X->Y in a relation schema R is a transitive dependency if there is a set of attributes Z that is neither a candidate key nor a subset of any key, and both X->Z and Z->Y hold.

- When a non-key attribute depends on other non-key attributes, the dependency is called a transitive dependency.

Example table with columns: Sl.No, Origin, Destination, Distance

Here the non-key attribute Distance depends on the other non-key attributes Origin and Destination.

Since Sl.No -> (Origin, Destination) and (Origin, Destination) -> Distance, the dependency Sl.No -> Distance is a transitive dependency.


Boyce Codd Normal Form - BCNF

A table is in BCNF if every non-trivial functional dependency in the table is a dependency on a superkey.

- Trivial functional dependency: a trivial functional dependency is a functional dependency of an attribute on a superset of itself. {Employee ID, Employee Address} → {Employee Address} is trivial, as is {Employee Address} → {Employee Address}.

- BCNF is a simpler form of 3NF, but it is stricter than 3NF. This means that every BCNF relation is also in 3NF, but a relation in 3NF is not necessarily in BCNF.

- A relation schema R is in BCNF with respect to a set F of FDs if, for all FDs in F+ of the form α → β, where α ⊆ R and β ⊆ R, at least one of the following rules holds:

- A) α → β is a trivial functional dependency, that is, β ⊆ α.

- B) α is a superkey of schema R.

Example: the plot schema (PID, C_NAME, PLOT_NO, AREA, PRICE, TAX_RATE)

[Figure: dependency diagram of the plot schema with the functional dependencies FD1 to FD5]

The plot schema is not in 3NF, since the dependencies Area → Price and C_Name → Tax_Rate (FD3 and FD4) violate 3NF. Hence it is decomposed as follows:

R(PID, C_NAME, PLOT_NO, AREA)

R1(Area, Price)

R2(C_Name, Tax_Rate)

[Figure: dependency diagram of R with the functional dependencies FD1 to FD3]

The relation R is in 3NF but it is not in BCNF, since Area → C_Name violates BCNF because Area is not a superkey.

So R is decomposed into R3 and R4, in which PID is a superkey.

The BCNF relations are: R1(Area, Price), R2(C_Name, Tax_Rate), R3(Area, C_Name), R4(PID, Plot_no, Area)
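The BCNF test (every non-trivial FD must have a superkey on its left-hand side) can be sketched with an attribute-closure helper. The dependency diagram is not reproduced in these notes, so the FD list below is an assumption: PID and (C_Name, Plot_no) are taken to determine all attributes, and the other three dependencies are the ones named in the text. The function names are hypothetical, and the check looks only at the listed FDs rather than all of F+.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (each FD is a pair of attribute tuples)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(schema, fds):
    """Non-trivial FDs whose left-hand side is not a superkey of schema."""
    return [
        (lhs, rhs) for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and closure(lhs, fds) != set(schema)
    ]

PLOT = {"PID", "C_Name", "Plot_no", "Area", "Price", "Tax_Rate"}
# Assumed FDs: the first two are guesses about the lost diagram.
PLOT_FDS = [
    (("PID",), ("C_Name", "Plot_no", "Area", "Price", "Tax_Rate")),
    (("C_Name", "Plot_no"), ("PID", "Area", "Price", "Tax_Rate")),
    (("C_Name",), ("Tax_Rate",)),
    (("Area",), ("Price",)),
    (("Area",), ("C_Name",)),
]

violations = bcnf_violations(PLOT, PLOT_FDS)
for lhs, rhs in violations:
    print(", ".join(lhs), "->", ", ".join(rhs))

# A decomposed relation such as R1(Area, Price) has no violation.
r1_violations = bcnf_violations({"Area", "Price"}, [(("Area",), ("Price",))])
```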

Fourth Normal Form - 4NF:

A relation schema R is in 4NF with respect to a set D of functional and multivalued dependencies if, for all multivalued dependencies in D+ of the form α →→ β, where α ⊆ R and β ⊆ R, at least one of the following rules holds:

1) α →→ β is a trivial multivalued dependency.

2) α is a superkey of schema R.

If α →→ β is a multivalued dependency on schema R, then α →→ β is trivial if β ⊆ α or α ∪ β = R.

Table: loan_info

Loan_no   Cust_Name   Street    City
110       Ram         G.Nagar   B.Patana
110       Ram         I.Nagar   B.Patana
120       Hari        K.Nagar   Rkl

Here cust_name →→ street, city is a multivalued dependency and cust_name is not a superkey of R. We therefore replace loan_info with two schemas.

Borrower

Cust_name   Loan_no
Ram         110
Hari        120

Customer

Cust_Name   Street    City
Ram         G.Nagar   B.Patana
Ram         I.Nagar   B.Patana
Hari        K.Nagar   Rkl

Fifth Normal Form - 5NF: It is based on join dependencies and is also called project-join normal form (PJNF).

- A relation schema R is in PJNF with respect to a set D of functional, multivalued and join dependencies if, for all join dependencies in D+ of the form *(R1, R2, R3, …, Rn), where each Ri ⊆ R and R = R1 ∪ R2 ∪ R3 ∪ … ∪ Rn, at least one of the following rules holds:

1) *(R1, R2, R3, …, Rn) is a trivial join dependency.

2) Every Ri is a superkey of R.

Join dependency: A table T is subject to a join dependency if T can always be recreated by joining multiple tables each having a subset of the attributes of T.

If R = R1 ∪ R2 ∪ R3 ∪ … ∪ Rn, we say that a relation r(R) satisfies the join dependency *(R1, R2, R3, …, Rn) if r = ∏R1(r) ⋈ ∏R2(r) ⋈ … ⋈ ∏Rn(r). The dependency requires this to hold for all legal relations r(R).
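Whether an instance satisfies a join dependency can be tested by projecting it and joining the projections back together. The sketch below does this for the loan_info table from the 4NF example; the helper names are assumptions, and checking one instance illustrates the condition rather than proving it for all legal relations.

```python
def project(relation, attrs):
    """Projection with duplicate elimination; rows are returned as dicts."""
    rows = {tuple(t[a] for a in attrs) for t in relation}
    return [dict(zip(attrs, row)) for row in rows]

def natural_join(r, s):
    """Join on all shared attributes (a Cartesian product if there are none)."""
    if not r or not s:
        return []
    common = [a for a in r[0] if a in s[0]]
    return [{**tr, **ts} for tr in r for ts in s
            if all(tr[a] == ts[a] for a in common)]

def as_set(relation):
    """Order-independent form of a relation, for comparison."""
    return {frozenset(t.items()) for t in relation}

loan_info = [
    {"loan_no": 110, "cust_name": "Ram", "street": "G.Nagar", "city": "B.Patana"},
    {"loan_no": 110, "cust_name": "Ram", "street": "I.Nagar", "city": "B.Patana"},
    {"loan_no": 120, "cust_name": "Hari", "street": "K.Nagar", "city": "Rkl"},
]

# Decomposition used in the notes: Borrower and Customer.
borrower = project(loan_info, ["cust_name", "loan_no"])
customer = project(loan_info, ["cust_name", "street", "city"])
lossless = as_set(natural_join(borrower, customer)) == as_set(loan_info)

# A decomposition with no shared attribute produces spurious tuples.
addresses = project(loan_info, ["street", "city"])
spurious = natural_join(borrower, addresses)
print(lossless, len(spurious))
```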


Important Points

Functional dependency

In a given table, an attribute Y is said to have a functional dependency on a set of attributes X (written X → Y) if and only if each X value is associated with precisely one Y value. For example, in an "Employee" table that includes the attributes "Employee ID" and "Employee Date of Birth", the functional dependency {Employee ID} → {Employee Date of Birth} would hold. It follows from the previous two sentences that each {Employee ID} is associated with precisely one {Employee Date of Birth}.

Trivial functional dependency

- Let R be a relation schema. An FD α → β is a trivial dependency if β ⊆ α.

- An FD is said to be a trivial functional dependency if it is satisfied by all relations.

- Examples: A → A, AB → A

A trivial functional dependency is a functional dependency of an attribute on a superset of itself. {Employee ID, Employee Address} → {Employee Address} is trivial, as is {Employee Address} → {Employee Address}.

Full functional dependency

An attribute is fully functionally dependent on a set of attributes X if it is:

• functionally dependent on X, and

• not functionally dependent on any proper subset of X.

{Employee Address} has a functional dependency on {Employee ID, Skill}, but not a full functional dependency, because it is also dependent on {Employee ID}.


Transitive dependency

A transitive dependency is an indirect functional dependency, one in which X → Z only by virtue of X → Y and Y → Z.

Multivalued dependency

A multivalued dependency is a constraint according to which the presence of certain rows in a table implies the presence of certain other rows.

Join dependency

A table T is subject to a join dependency if T can always be recreated by joining multiple tables each having a subset of the attributes of T.

Superkey

A superkey is a combination of attributes that can be used to uniquely identify a database record. A table might have many superkeys.

Candidate key

A candidate key is a minimal superkey, that is, a superkey that does not have any extraneous attributes.

Non-prime attribute

A non-prime attribute is an attribute that does not occur in any candidate key. Employee Address would be a non-prime attribute in the "Employees' Skills" table.

Prime attribute

A prime attribute, conversely, is an attribute that does occur in some candidate key.


Primary key

Most DBMSs require a table to be defined as having a single unique key, rather than a number of possible unique keys. A primary key is a key which the database designer has designated for this purpose.

Query Processing

Query processing refers to the set of activities involved in retrieving or extracting data from a database:

- Translating a query in a high-level language (SQL) into a low-level language (relational algebra)

- Executing the query to retrieve the data.

Query Optimization: query optimization is the process of selecting the most efficient query evaluation plan from among the many strategies that are usually possible for processing a query.

- Query optimization reduces the execution time of a query.

[Figure: steps in query processing. Query in a high-level language → scanner, parser and validator → internal representation of the query → query optimizer (plan generation and cost estimation, using the system catalogue manager) → execution plan → query code generator → code to execute the query → run-time database processor → query result]

- A query expressed in a high-level language such as SQL must first be scanned, parsed and validated.

- Scanner: identifies the language tokens, such as SQL keywords, attribute names and relation names.

- Parser: checks the query syntax to determine whether the query is formulated according to the syntax rules of the language.

- Validator: checks that all the attribute and relation names are valid and semantically meaningful names that exist in the database.

- Internal representation of the query: it is usually a tree data structure called a query tree. It can also be represented as a query graph.

- Query optimizer: it is responsible for identifying an efficient execution plan for evaluating the query.

- I) The optimizer generates alternative plans and chooses the plan with the least estimated cost.

- II) To estimate the cost of a plan, the optimizer uses the system catalog / data dictionary.

- Code generator: it generates the code to execute the plan chosen by the query optimizer.

- Run-time database processor: it has the task of running the query code to produce the query result.
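The idea of choosing the cheapest among equivalent plans can be illustrated with a toy Python example. Everything here is an assumption made for illustration: the sample data, the two plan functions, and the cost measure (number of intermediate tuples produced). A real optimizer estimates costs from catalog statistics without executing the plans, whereas this sketch runs both plans and counts.

```python
# Hypothetical sample data: borrower(customer_name, loan_number),
# loan(loan_number, branch_name, amount).
borrower = [("Hayes", "L-15"), ("Jones", "L-17"), ("Smith", "L-23")]
loan = [
    ("L-15", "Perryridge", 1500),
    ("L-17", "Downtown", 1000),
    ("L-23", "Redwood", 2000),
]

def plan_select_late():
    """Cartesian product first, then both selections."""
    pairs = [(b, l) for b in borrower for l in loan]
    matched = [(b, l) for b, l in pairs if b[1] == l[0] and l[1] == "Perryridge"]
    cost = len(pairs) + len(matched)          # intermediate tuples produced
    return {b[0] for b, _ in matched}, cost

def plan_select_early():
    """Select the Perryridge loans first, then combine with borrower."""
    perryridge = [l for l in loan if l[1] == "Perryridge"]
    pairs = [(b, l) for b in borrower for l in perryridge]
    matched = [(b, l) for b, l in pairs if b[1] == l[0]]
    cost = len(perryridge) + len(pairs) + len(matched)
    return {b[0] for b, _ in matched}, cost

plans = {"select_late": plan_select_late, "select_early": plan_select_early}
results = {name: plan() for name, plan in plans.items()}
best_plan = min(results, key=lambda name: results[name][1])
print(results, best_plan)
```

Both plans return the same answer; only the amount of intermediate work differs, which is what the optimizer tries to minimize.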

****************************End of Module II*********************

Possible Questions: (2 Marks)

Difference between natural join and inner join:

Natural Join:

- It is a binary operation that allows us to combine certain selections and a Cartesian product into one operation.

- It is denoted by ⋈.

- It forms a Cartesian product of its two arguments and performs a selection forcing equality on those attributes that appear in both relations.

- It removes duplicate attributes.

Inner Join:

- It is a binary operation.

- It is represented by the command INNER JOIN.

- It generates a new relation that contains the tuples of both relations that satisfy the join condition.

- It does not remove the duplicate attributes.

Update anomalies:

- Redundancy causes problems with the storage, retrieval and updating of data.

- Redundancy can lead to update anomalies: inserting, modifying and deleting data may cause inconsistency.

Multivalued dependency

- A multivalued dependency is a constraint according to which the presence of certain rows in a table implies the presence of certain other rows.

- Let R be a relation schema and let α ⊆ R and β ⊆ R. The multivalued dependency of β on α in R is denoted by α →→ β.

Semi less join:

- It reduces the number of tuples in a relation before transferring it to another relation.

- It follows that the resulting relation will have two attributes with non-identical values in every tuple.

- One of these attributes is projected away and the other is renamed (if necessary).

- After applying the semi less join, the resulting relation has exactly the same set of tuples, but a different name and a different schema.
