week 1. before we start course website discussion forum hw
TRANSCRIPT
Week 1
Before we start
• Course website• http://www.cs.ucla.edu/classes/spring15/cs143/
• Discussion Forum• https://piazza.com/ucla/spring2015/cs143/home
• HW and Project Submisssion• https://ccle.ucla.edu/course/view/15S-COMSCI143-1
• Contact:• Hsuan Chiu [email protected]• Yuxi [email protected]
• Come to my office hour if you have questions• Hsuan Chiu BH 2432 Wed 4:30pm~5:30pm, Fri 12:00pm~2:00pm• Yuxi ChenBH 2432 Mon 1:30pm~2:30pm, Fri 12:00pm~2:00pm
• Start from next week, we will use Discussion 2B session as another office hour. Students enrolled in 2B please leverage the other two sessions.
The discussion class
• We usually try to• Go over material covered in the lecture very quickly• Answer your questions• Do more exercises• Go over the previous homework• Discuss projects• …
Today’s plan
• Introduction to DBMSs• Relational Algebra• Project 1
IntroductionChapter 1
Let’s start with a question
• How would you store the information of the books you have in your bookshelf at home?
• You can use a simple text file, or spreadsheet in your file system, and it’s going to work perfectly
• What if you need to store the information of all books in all the libraries in UCLA?
• Why it’s not wise to use the same approach?
Drawbacks of using file systems to store data
• Data redundancy and inconsistency• Multiple file formats, duplication of information in different files
• Difficulty in accessing data • Need to write a new program to carry out each new task
• Data isolation - multiple files and formats• Integrity problems
• Integrity constraints (e.g., account balance > 0) become “buried” in program code rather than being stated explicitly
• Hard to add new constraints or change existing ones
Drawbacks of using file systems (cont.)
• Atomicity of updates• Failures may leave database in an inconsistent state with partial updates
carried out• Example: Transfer of funds from one account to another should either
complete or not happen at all
• Concurrent access by multiple users• Concurrent access needed for performance• Uncontrolled concurrent accesses can lead to inconsistencies
• Example: Two people reading a balance (say 100) and updating it by withdrawing money (say 50 each) at the same time
• Security problems• Hard to provide user access to some, but not all, data
Solution?
• We need a system to support all the previous needs• That system is called Database Management System (DBMS)
Database management system (DBMS)
• DBMS contains information about a particular enterprise• Collection of interrelated data• Set of programs to access the data • An environment that is both convenient and efficient to use
• Database Applications:• Banking: transactions• Airlines: reservations, schedules• Universities: registration, grades• Sales: customers, products, purchases• Online retailers: order tracking, customized recommendations• Manufacturing: production, inventory, orders, supply chain• Human resources: employee records, salaries, tax deductions
• Databases touch all aspects of our lives
Instances and schemas
• Similar to types and variables in programming languages• Schema – the logical structure of the database
• Example: The database consists of information about a set of customers and accounts and the relationship between them
• Analogous to type information of a variable in a program• Instance – the actual content of the database at a particular point in time
• Analogous to the value of a variable
Schema for LibBook system
• Book (BID, BTitle, BPrice, BPubDate, …)• Library (LID, LName, LDept, …)• BoLi(BID, LID, Date, Number, …)• …
Instances for LibBook system
• Book
• Library
• BoLi
BID BTitle BPrice BPubDate
12 DB Concept 153 2009
32 OS Concept 102 2008
42 AI in practice 100 2005
15 Intro. To DB 50 2010
LID LName LDept
23 CSLib CS
12 Powell
65 Lib65 Humanities
BID LID Date Number
42 23 2009 3
12 23 2009 1
12 12 2005 2
Today: Relational Algebra
• Way to ask queries of relations.• Operators
Operation Symbol Explanation
Selection σ Selects rows
Projection π Selects columns
Union ⋃
Intersection ⋂
Cross-product ×
Join ⋈ Join tables on some condition
Division / Later..
Introduction to the Relational Algebra
Relation schema and instance
• A1, A2, …, An are attributes• R = (A1, A2, …, An ) is a relation schema
Example:
instructor = (ID, name, dept_name, salary)
• Formally, given sets D1, D2, …., Dn, a relation r is a subset of
D1 x D2 x … x Dn
• One (set) of the attributes should be key (identifier)
• Order of tuples is irrelevant
Models = Relations (almost)
Attribute = ColumnTuple = RowDomain = Allowed values for each columnKey = Row’s Identifier in the tableCardinality = Number of rowsDegree = Number of columns
• So from now on, we use table and relation interchangeably. • Although tables are ordered and can have duplicate items.
Database
• A database consists of multiple relations (tables)
• Information about an enterprise is broken up into parts
instructor student advisor
• Bad design: univ (instructor -ID, name, dept_name, salary, student_Id, ..)results in
• repetition of information (e.g., two students have the same instructor)
• the need for null values (e.g., represent an student with no advisor)
• Normalization theory (Chapter 7) deals with how to design “good” relational schemas
Relational query languages
• Now that we have a collection of relations as a relational database, we need a language to access (query) our database:
• Procedural vs non-procedural, or declarative• Relational operators
Selection
• Select rows• Example: σbirth_year < 1950(R)
Username Email Birth Year
meep [email protected] 1920
notmeep [email protected] 1920
beep [email protected] 1980
beepboop [email protected] 1985
Selection
• Select rows• Example: σbirth_year < 1950(R)
Username Email Birth Year
meep [email protected] 1920
notmeep [email protected] 1920
beep [email protected] 1980
beepboop [email protected] 1985
Projection
• Select columns• Example: πusername, email(R)
Username Email Birth Year
meep [email protected] 1920
notmeep [email protected] 1920
beep [email protected] 1980
beepboop [email protected] 1985
Projection
• Select columns• Example: πusername, email(R)
Username Email Birth Year
meep [email protected] 1920
notmeep [email protected] 1920
beep [email protected] 1980
beepboop [email protected] 1985
Selection & Projection
• What’s the difference between the following:– σbirth_year < 1950(πusername, email(R))– πusername, email(σbirth_year < 1950(R))
Username Email Birth Year
meep [email protected] 1920
notmeep [email protected] 1920
beep [email protected] 1980
beepboop [email protected] 1985
Selection & Projection
• What’s the difference between the following:– σbirth_year < 1950(πusername, email(R))– πusername, email(σbirth_year < 1950(R))
Username Email Birth Year
meep [email protected] 1920
notmeep [email protected] 1920
beep [email protected] 1980
beepboop [email protected] 1985
Selection & Projection
• What’s the difference between the following:– σbirth_year < 1950(πusername, email(R))– πusername, email(σbirth_year < 1950(R))
Username Email Birth Year
meep [email protected] 1920
notmeep [email protected] 1920
beep [email protected] 1980
beepboop [email protected] 1985
Selection & Projection
• What’s the difference between the following:– σbirth_year < 1950(πusername, email(R))– πusername, email(σbirth_year < 1950(R))
Username Email Birth Year
meep [email protected] 1920
notmeep [email protected] 1920
beep [email protected] 1980
beepboop [email protected] 1985
Selection & Projection
• What’s the difference between the following:– σbirth_year < 1950(πusername, email(R))– πusername, email(σbirth_year < 1950(R))
Username Email Birth Year
meep [email protected] 1920
notmeep [email protected] 1920
beep [email protected] 1980
beepboop [email protected] 1985
Union
• Set union between two relations with same fields
• Example: σbirth_year < 1950(R) U σbirth_year%2==0(R)
Username Email Birth Year
meep [email protected] 1920
notmeep [email protected] 1920
beep [email protected] 1980
beepboop [email protected] 1985
Union
• Set union between two relations with same fields• Example: σbirth_year < 1950(R) U σbirth_year%2==0(R)
Username Email Birth Year
meep [email protected] 1920
notmeep [email protected] 1920
Username Email Birth Year
meep [email protected] 1920
notmeep [email protected] 1920
beep [email protected] 1980
⋃
Union
• Set union between two relations with same fields
• Example: σbirth_year < 1950(R) U σbirth_year%2==0(R)
Username Email Birth Year
meep [email protected] 1920
notmeep [email protected] 1920
beep [email protected] 1980
Intersection
• Set intersection between two relations with same fields
• Example: σbirth_year < 1950(R) σ⋂ birth_year%2==0(R)
Username Email Birth Year
meep [email protected] 1920
notmeep [email protected] 1920
beep [email protected] 1980
beepboop [email protected] 1985
Intersection
• Set intersection between two relations with same fields
• Example: σbirth_year < 1950(R) σ⋂ birth_year%2==0(R)Username Email Birth Year
meep [email protected] 1920
notmeep [email protected] 1920
Username Email Birth Year
meep [email protected] 1920
notmeep [email protected] 1920
beep [email protected] 1980
⋂
Intersection
• Set intersection between two relations with same fields
• Example: σbirth_year < 1950(R) σ⋂ birth_year%2==0(R)
Username Email Birth Year
meep [email protected] 1920
notmeep [email protected] 1920
Set Difference
• A – B takes out all rows in B from A.• Example: σbirth_year%2==0(R) - σbirth_year < 1950(R)
Username Email Birth Year
meep [email protected] 1920
notmeep [email protected] 1920
beep [email protected] 1980
beepboop [email protected] 1985
Set Difference
• A – B takes out all rows in B from A.• Example: σbirth_year%2==0(R) - σbirth_year < 1950(R)
Username Email Birth Year
meep [email protected] 1920
notmeep [email protected] 1920
Username Email Birth Year
meep [email protected] 1920
notmeep [email protected] 1920
beep [email protected] 1980
-
Set Difference
• A – B takes out all rows in B from A.• Example: σbirth_year%2==0(R) - σbirth_year < 1950(R)
Username Email Birth Year
beep [email protected] 1980
Cross product
• A x B takes all rows from A and combines them with all rows from B.
• Example: πusername(R) x πbirth_year(R)
Username Email Birth Year
meep [email protected] 1920
notmeep [email protected] 1920
beep [email protected] 1980
beepboop [email protected] 1985
Cross product
• A x B takes all rows from A and combines them with all rows from B.
• Example: πusername(R) x πbirth_year(R)
Username
meep
notmeep
beep
beepboop
Birth Year
1920
1980
1985X =
Cross product
• A x B takes all rows from A and combines them with all rows from B.
• Example: πusername(R) x πbirth_year (R)
Username Birth Year
meep 1920
meep 1980
meep 1985
notmeep 1920
notmeep 1980
notmeep 1985
beep 1920
beep 1980
beep 1985
beepboop 1920
beepboop 1980
beepboop 1985
Username
meep
notmeep
beep
beepboop
Birth Year
1920
1980
1985X =
Join
• A B joins A and B based on some column.⋈– Natural join: Default, joins on matching
columns.– Can also specify a particular join predicate.
• Example: πusername,email(R) π⋈ username,birth_year(R)
Username Email
meep [email protected]
notmeep [email protected]
Username Birth Year
meep 1920
beep 1980
notmeep 1985
⋈
Join
• A B joins A and B based on some column.⋈• Example: πusername,email(R) π⋈ username,birth_year(R)• 1. cross product
Username_1 Email Username_2 Birth_Year
meep [email protected] meep 1920
meep [email protected] beep 1980
meep [email protected] notmeep 1985
notmeep [email protected] meep 1920
notmeep [email protected] beep 1980
notmeep [email protected] notmeep 1985
Join
• A B joins A and B based on some column.⋈• Example: πusername,email(R) π⋈ username,birth_year(R)• 2. selection on username_1=username_2
Username_1 Email Username_2 Birth_Year
meep [email protected] meep 1920
meep [email protected] beep 1980
meep [email protected] beepboop 1985
notmeep [email protected] meep 1920
notmeep [email protected] beep 1980
notmeep [email protected] notmeep 1985
Join
• A B joins A and B based on some column.⋈• Example: πusername,email(R) π⋈ username,birth_year(R)• 3. project attrubutes
Username Email Birth_Year
meep [email protected] 1920
notmeep [email protected] 1985
Division● A / B projects the columns from A that are
not in B, while selecting elements of A that fully intersect with B.
● Think of it as opposite of cross-product!● Example: who takes both cs186 and cs162?
Person Class
meep cs186
meep cs162
moop cs186
moop cs61
beep cs186
beep cs162
Class
cs186
cs162
=
Person
meep
beep/
Division
● Can be expressed with the following equation:
● Things to eliminate: Things that are in the cross product of πx(A) and B that are NOT in A.
● Subtract the left hand side (projection) of those candidates from πx(A).
πx(A) - πx((πx(A) x B) - A)
Division example 2
pno weight
p1 22
p2 17
p3 17
P4 6
P5 30
sno pno
s1 p1
s2 p3
s2 p5
s3 p3
s3 p4
s4 p6
s5 p1
s5 p2
s5 p3
s5 p4
s5 p5
s5 p6
Table: spj
Table: p
Query: Find supplier id ‘sno’ from table spj that they supply all components of weight equal to 17
Division
Idea: Find the values that do not belong in the answer,
and remove them from the list of possible answers.
πx((πx(A) x B) - A)
πx(A) - πx((πx(A) x B) - A)
Division
• 1. formulate all possible answer
(πx(A) x B) : (πsno(spj) x πpno(σweight = 17(p) ))
sno
s1
s2
s3
s4
s5
pno
p2
p3
X =
Sno pno
s1 p2
s1 p3
s2 p2
s2 p3
s3 p2
s3 p3
s4 p2
s4 p3
s5 p2
s5 p3
Division• 2. find out disqualified answer (πx(A) x B) - A
Sno pno
s1 p2
s1 p3
s2 p2
s2 p3
s3 p2
s3 p3
s4 p2
s4 p3
s5 p2
s5 p3
sno pno
s1 p1
s2 p3
s2 p5
s3 p3
s3 p4
s4 p6
s5 p1
s5 p2
s5 p3
s5 p4
s5 p5
s5 p6
Sno pno
s1 p2
s1 p3
s2 p2
s3 p2
s4 p2
s4 p3
Find which supplier who do not support both p2 and p3
- =
Temp table:tmp
Division
• 3. remove those disqualified answer from the list of possible answers.
πx(A) - πx((πx(A) x B) - A)=πsno(spj)-πsno(tmp)
sno
s1
s2
s3
s4
s5
-
sno
s1
s2
s3
s4
sno
s5=
Division
• Since division operator is not implemented in SQL, you need to use more complex SQL query to present.
• But the basic concept is if you see a query expressed with “All”, you may need to consider about using division.
• “Which students are registered on ALL the courses given by Soini?” • “Which boys are registered on those courses that are taken by ALL t
he girls?”
Exercise
• Consider these relations in database. Primary key attributes are underlined. • employee ( eid, street, city)• works ( eid, c-name, salary)• company ( cid, c-name, city)• manager ( pid, mgrid)• Assume the companies may be located in several cities. Find all companies
located in every city in which U-Bank is located. • Find the eids and cities of residence of all employees who work for U-Bank. • Find the eid, streets, and cities of residence of all employees who work for U-
Bank and earn more than $10,000 per annum.• Find the eids of all employees in this database who live in the same city as the
company for which they work.• Assume the companies may be located in several cities. Find all companies
located in every city in which U-Bank is located.
Project 1A
• Set up environment. • VM Installation. http://yellowstone.cs.ucla.edu/~hchiu/cs143/• Download CS143.ova
• Run a test.• 1. start VM. • 2. create a php file.• 3. put under folder www.
• Demo: bmi and calculator.
Form example.
<html><body><form action="handle.php" method="GET">Name: <input type="text" name="name" />Age: <input type="text" name="age" /><input type="submit" /></form><?phpif($_GET["name"]){echo "Welcome ".$_GET["name"]."<br/>";echo "You are ".$_GET["age"]." years old.";}?></body></html>
Hint for project 1A
evalExample:<html><body><?php$equ= "3+4*5";eval("\$ans= $equ;");echo "ans= ".$ans;?></body></html>
Hint for project 1A
preg_match<html><body><? preg_match("/^(http:\/\/)?([^\/]+)/i", "http://www.php.net/index.html", $matches); print $host = $matches[0]."<BR>"; print $host = $matches[1]."<BR>"; print $host = $matches[2]."<BR>"; // get last two segments of host name preg_match("/[^\.\/]+\.[^\.\/]+$/","http://www.php.net",$matches); echo "domain name is: ".$matches[0]."\n"; ?> </body></html>
matches
• If matches is provided, then it is filled with the results of search. • $matches[0] will contain the text that matched the full pattern, • $matches[1] will have the text that matched the first captured
parenthesized subpattern, and so on.