cs 542 database index structures
DESCRIPTION
TRANSCRIPT
CS 542 Database Management SystemsControlling Database Integrity and Performance
J Singh January 31, 2011
2© J Singh, 2011 2
Today’s Topics
• Database Integrity– Primary Key Constraints – Prevent Duplicates– Foreign Key Constraints – Prevent Dangling References– Attribute Constraints – Prevent Inconsistent Attribute Values– Tuple Constraints – More vigilant checking of attribute values– Assertions – Paranoid integrity checking
• Views
• Performance Topics– Indexes
• Discussion of presentation topic proposals
3© J Singh, 2011 3
Primary Key Constraints
• What are Primary Keys good for?
– Uniquely identify the subject of each tuple
– Ensure that there are no duplicates
– Cannot be null – that would imply a NULL subject.
P1 P2 … Pm A1 A2 … An
Thing1
::
Thingn
• A table may not have more than one primary key
– A Primary Key may consist of one or more columns
– Multiple Unique keys are OK
• For Table R, <P1, P2, …, Pm> together constitute a primary key if for each tuple in R,
– <P1, P2, …, Pm> are unique
– P1, P2, …, Pm are non-null
• <U1, U2, …, Um> together constitute a unique key if for each tuple in R,
– <U1, U2, …, Um> are unique
– But U1, U2, …, Um can be null
4© J Singh, 2011 4
Name CountryCode
Helsinki FIN
City
Foreign Key Constraints (p1)
• Foreign Key– Must point to a Key
ReferenceCREATE TABLE City ( :: CountryCode char(3) REFERENCES Country(Code))
• Key Reference– Must be unique or primary
key– Try: INSERT INTO city(Name, CountryCode) value ('xyzzy',
'XYZ');
– Try: UPDATE cityset CountryCode='XYZ' where
CountryCode='FIN';
– Key reference must already exist before a referencing tuple can be added
Code Name …
FIN Finland
Country
Foreign Key
Key Reference
Name CountryCode
Helsinki FIN
Xyzzy XYZ
City
Main Idea: Prevent Dangling Tuples
5© J Singh, 2011 5
Foreign Key Constraints (p2)
• Alternative methods of defining a foreign key– CREATE TABLE City ( CountryCode char(3) REFERENCES COUNTRY(Code), …)
– CREATE TABLE City ( CountryCode char(3), …, FOREIGN KEY CountryCode [CONSTRAINT [ctyREFcntry]] REFERENCES
COUNTRY(Code))
– CREATE TABLE City ( CountryCode char(3), …) Then, later, ALTER TABLE City ADD [CONSTRAINT [ctyREFcntry]] FOREIGN KEY CountryCode REFERENCES
COUNTRY(Code);
Notation: [] signifies optional
6© J Singh, 2011 6
Foreign Key Constraints (p3)
• Changes to Key References– Try: DELETE FROM country WHERE code=‘FIN’;– Try: UPDATE country SET Code='XYZ' WHERE Code='FIN‘;
• Referential Integrity Options– Restrict (default)
• Reject request– Cascade
• Reflect changes back– Set Null
• Set the foreign key to NULL
Code Name …
FIN Finland
Country
Foreign Key
Key Reference
Name CountryCode
Helsinki FIN
City
7© J Singh, 2011 7
Foreign Key Constraints (p4)
• Chicken and Egg definitionsCREATE TABLE chicken ( cID INT PRIMARY KEY, eID INT REFERENCES egg(eID));
CREATE TABLE egg( eID INT PRIMARY KEY, cID INT REFERENCES
chicken(cID));
• Consistently fails– Can’t define a foreign key
to a table before it has been defined
• Solution– Define the tables w/o
constraintsCREATE TABLE chicken( cID INT PRIMARY KEY, eID INT); CREATE TABLE egg( eID INT PRIMARY KEY, cID INT);
– And then add foreign keysALTER TABLE chicken ADD CONSTRAINT c_e FOREIGN KEY (eID) REFERENCES egg(eID);ALTER TABLE egg ADD CONSTRAINT e_c FOREIGN KEY (cID) REFERENCES chicken(cID);
8© J Singh, 2011 8
Foreign Key Constraints (p5)
• Chicken and Egg insertionINSERT INTO chicken VALUES(1, 1001);INSERT INTO egg VALUES(1001, 1);
• Still consistently fails– Need a way to postpone
constraint checking– How long to postpone?
• Until transaction commit
• Solution– Define the tables with deferred
constraint-checkingALTER TABLE chicken ADD CONSTRAINT c_e FOREIGN KEY (eID) REFERENCES egg(eID) INITIALLY DEFERRED DEFERRABLE;ALTER TABLE egg ADD CONSTRAINT e_c FOREIGN KEY (cID) REFERENCES chicken(cID) INITIALLY DEFERRED DEFERRABLE;
• And thenINSERT INTO chicken VALUES(1, 1001);
INSERT INTO egg VALUES(1001, 1);COMMIT;
9© J Singh, 2011 9
Attribute-Based Constraints
• NOT NULL– The most common
• Reasonability Constraints– Validate incoming data? e.g.,
• Population Density < 30000– Specification:
Population INT(11) NOT NULL CHECK (Population <= 30000 * SurfaceArea),
– The condition in CHECK(cond) can take any value that a condition in WHERE(cond) can take
• Including subqueries– The attribute constraint is checked when assigned
• Can be violated underneath as long as it is not re-evaluated• For example, if we update SurfaceArea, the violation won’t be
flagged
Not implemented in all databases, e.g., MySQL
10© J Singh, 2011 10
Tuple-Based Constraints
• Validate the entire tuple whenever anything in that tuple is updated
– More integrity enforcement than with attribute-based constraints e.g.,
• Population Density <= 30000– Specification:
Population INT(11) NOT NULL,CHECK (Population <= 30000 * SurfaceArea),
– The condition in CHECK(cond) can take any value that a condition in WHERE(cond) can take
• Including subqueries– The attribute constraint is checked when tuple is updated
• If we update SurfaceArea, the violation will be flagged• But the violation of CHECK (Population > ( SELECT SUM(Population) FROM City WHERE City.CountryCode = Code))
which specifies a subquery involving another table, will not be flagged
Not implemented in all databases, e.g., MySQL
11© J Singh, 2011 11
Assertions
• Validate the entire database whenever anything in the database is updated
– Part of the database, not any specific table– Specification: Table-like
CREATE ASSERTION CountryPop CHECK ( NOT EXISTS (SELECT * FROM Country WHERE Population < (SELECT SUM(Population) FROM City WHERE City.CountryCode = Code)))
– Difficult to implement efficiently• Often not implemented• I don’t know of any implementations
• Can be implemented for specific cases using Triggers, see Section 7.5
12© J Singh, 2011 12
Views
• Also called Virtual Views• Don’t actually exist in the database but behave as if they do• Can be subsets of the data or joins – actually, arbitrary queries
• Subset example,CREATE VIEW ct AS SELECT c.Name AS nm, c.countrycode AS
cntry FROM city c WHERE population > 0
• Join exampleCREATE VIEW CityLanguage as SELECT city.name, city.countrycode, lang.language as
Language FROM city, countrylanguage as lang WHERE city.countrycode = lang.countrycode AND lang.isOfficial = ‘T‘;
13© J Singh, 2011 13
Operations on Views (p1)
• SELECT SELECT * FROM CityLanguage WHERE Language='Dutch';
– Shouldn’t ‘temporarily’ create the table and SELECT from it.
– Should use the definition of CityLanguage to make a query, i.e.,
SELECT * FROM (SELECT …blabla… FROM city, countrylanguage as lang WHERE city.countrycode = lang.countrycode AND lang.isOfficial = 'T') WHERE Language='Dutch';
14© J Singh, 2011 14
Operations on Views (p2)
• UPDATE, INSERT not always possible, except– Can sometimes be implemented using INSTEAD OF triggers– Modifications are permitted when the view is derived from a
single table R and• The WHERE clause does not involve R in a Subquery• The FROM clause can only consist of one occurrence of R• The valued of all attributes not specified in the view definition can
be ‘manufactured’ by the database– Example. For the view ct
CREATE VIEW ct AS SELECT c.Name AS nm, c.countrycode AS cntry
FROM city c WHERE population > 0
the queryINSERT INTO ct (nm, cntry) values ('FirSPA', 'FIN')
can be automatically rewritten as INSERT INTO CITY (Name, CountryCode) values ('FirSPA', 'FIN')
15© J Singh, 2011 15
Top-Down Datalog Recursion Revisited
• IDB’s are conceptualized (and implemented) as Views
for IDB predicate p(x,y, …)FOR EACH subgoal of p DO IF subgoal is IDB, recursive call; IF subgoal is EDB, look up
Rule Why unsafe?
isHappy(x) isRich(y)
y could recursively loop back to x
Bachelor(x) NOT isMarried(x)
Negated, may result in infinite recursion
IsCheap(x) x < 10
It’s safe!
16© J Singh, 2011 16
Indexes
• Motivation:– Preventing the need for linear search through a big table– Example query:
SELECT * FROM City WHERE CountryCode = 'FIN';– Another:
SELECT * FROM City WHERE Population > (0.4 * ( SELECT Population FROM Country WHERE CountryCode = Code));
– Expected time for first example: O(n). For the second, O(n2)• Declaration
CREATE INDEX CityIndex ON City(CountryCode);CREATE INDEX CityPopIndex ON City(Population);CREATE INDEX CountryPopIndex ON Country(Population);
Main Idea: Data Structures for Fast Search
17© J Singh, 2011 17
Selection of Indexes (p1)
• Why not create an index for every attribute?
• Useful indexes, and not so useful ones– Primary key?– Unique key?– From previous examples,
• CityIndex?• CityPopIndex?• CountryPopIndex?
18© J Singh, 2011 18
Selection of Indexes (p2)
• The Mantra:– Don’t define indexes too early: know your workload first
• Be as empirical as is practical– The Greedy approach to index selection:
• Start with no indexes• Evaluate candidate indexes, choose the one potentially most
effective• Repeat
– Query execution will take advantage of defined indexes
CS 542 Database Management SystemsReport Proposals
J Singh January 31, 2011
20© J Singh, 2011 20
Report Proposals – General Observations
• Simply Impressive!
• Corrective Themes– When in doubt, prefer depth over breadth– Tilt the balance toward obtaining and working with real
data– Focus on your contributions– Separate the report from the project
• If your intent in the project is to do a significant piece of development, make the report about the design
• Go light on implementation; toy application is good to get your feet wet but leave the heavy lifting for the project
– For big papers, don’t try to swallow it whole. Take a piece and focus on that.
21© J Singh, 2011 21
Next meeting
• February 7
• Index Structures, Chapter 14